KR102597074B1

KR102597074B1 - Image generating device for generating 3D images corresponding to user input sentences and operation method thereof

Info

Publication number: KR102597074B1
Application number: KR1020230065200A
Authority: KR
Inventors: 이호영; 김규철; 최호섭; 황현영; 신기영; 임지윤
Original assignee: Toonsquare Co., Ltd. (주식회사 툰스퀘어)
Priority date: 2023-05-19
Filing date: 2023-05-19
Publication date: 2023-11-01
Also published as: KR102597074B9

Abstract

An embodiment of the present disclosure discloses an image generating device that generates a 3D image corresponding to a user's input sentence, and a method of operating the image generating device. The image generating device includes a display that provides an input interface, a memory that stores at least one instruction, and at least one processor that executes the at least one instruction stored in the memory. By executing the at least one instruction, the at least one processor may acquire the user's input sentence through the input interface, segment at least one sentence component included in the acquired input sentence, generate a 3D image based on the at least one segmented sentence component, and provide the generated 3D image through the display.

Description

Image generating device that generates a 3D image corresponding to a user's input sentence and its operating method {IMAGE GENERATING DEVICE FOR GENERATING 3D IMAGES CORRESPONDING TO USER INPUT SENTENCES AND OPERATION METHOD THEREOF}

The present invention relates to an image generating device and, more specifically, to an image generating device for generating a 3D image corresponding to a user's input sentence and a method of operating the same.

Recently, technologies have been developed that receive a sentence, such as text, as input and generate an image or video based on that sentence.

In particular, with advances in AI (Artificial Intelligence) technology, models that generate images or videos, such as GANs (Generative Adversarial Networks), VAEs (Variational AutoEncoders), and Transformers, are being developed.

As a result, many companies now offer services that create webtoons, pictures, videos, and the like from input sentences.

Registration No. 2081229 (Registration Date: February 19, 2020)

When an image is created from an input sentence that mentions multiple people, it may be difficult to generate an image that accounts for the interactions between those people.

In addition, when determining the view of the image, it may be difficult to choose an appropriate view that accounts for the number of people in the sentence and their movements.

In addition, when entering a sentence to create an image, the user may have difficulty composing a sentence that will produce the desired image.

The problems to be solved by the present disclosure are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

According to the present disclosure for achieving the above-described technical problem, an image generating device that generates a 3D image corresponding to a user's input sentence may be provided. The image generating device may include a display that provides an input interface, a memory that stores at least one instruction, and at least one processor that executes the at least one instruction stored in the memory. By executing the at least one instruction, the at least one processor may obtain the user's input sentence through the input interface, segment at least one sentence component included in the obtained input sentence, generate a 3D image based on the at least one segmented sentence component, and provide the generated 3D image through the display.

In addition, according to the present disclosure, a method of operating an image generating device that generates a 3D image corresponding to a user's input sentence may be provided. The method may include providing an input interface through a display, obtaining the user's input sentence through the input interface, segmenting at least one sentence component included in the obtained input sentence, generating a 3D image based on the at least one segmented sentence component, and providing the generated 3D image through the display.
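The operating method above is a fixed sequence of stages: obtain the sentence, segment it, generate the image, and display it. The following is a minimal sketch of that sequence in Python, assuming each stage can be supplied as a plain callable. The names `SentenceComponents` and `run_pipeline` are hypothetical and are not taken from the disclosure; the real segmentation and 3D generation models are left abstract.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class SentenceComponents:
    """Hypothetical container for the segmented sentence components."""
    background: List[str] = field(default_factory=list)
    persons: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)


def run_pipeline(
    read_sentence: Callable[[], str],
    segment: Callable[[str], SentenceComponents],
    generate_3d: Callable[[SentenceComponents], Any],
    show: Callable[[Any], None],
) -> Any:
    """Sketch of the operating method: obtain, segment, generate, provide."""
    sentence = read_sentence()        # obtain the input sentence via the input interface
    components = segment(sentence)    # segment the sentence into components
    image = generate_3d(components)   # generate a 3D image from the components
    show(image)                       # provide the generated image through the display
    return image
```

Keeping each stage injectable means a stand-in (for example, a stub generator) can replace any step without changing the order of operations the method describes.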

In addition, a computer-readable recording medium on which a computer program for executing the method of the present disclosure is recorded may be further provided.

According to the above-described means for solving the problem, the image generating device that generates a 3D image corresponding to a user's input sentence extracts the sentence components included in the input sentence and generates the 3D image based on the extracted components, which can increase the accuracy of 3D image generation.

In addition, when the input sentence includes multiple people, the 3D image can be generated by taking into account the interactions between them, and an appropriate view can be determined for the 3D image.
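The disclosure states that the view is chosen by considering the number of people and their actions, but this section gives no concrete rules. The sketch below shows only the shape of such a decision. The three view categories, the `INTERACTIVE_ACTIONS` set, and all thresholds are invented for illustration and are not the patent's method.

```python
from enum import Enum


class View(Enum):
    CLOSE_UP = "close-up"
    MEDIUM = "medium"
    WIDE = "wide"


# Hypothetical set of actions treated as interactions between people.
INTERACTIVE_ACTIONS = frozenset({"fight", "take", "hug", "dance"})


def choose_view(num_persons: int, action: str) -> View:
    """Choose a camera view from the person count and the action (illustrative rules only)."""
    if num_persons < 0:
        raise ValueError("num_persons must be non-negative")
    if num_persons == 0 or num_persons >= 3:
        # Background-only scenes and groups get the widest framing.
        return View.WIDE
    if num_persons == 2:
        # Two people interacting need room for both bodies and the space between them.
        return View.WIDE if action in INTERACTIVE_ACTIONS else View.MEDIUM
    # A single person: frame the character tightly.
    return View.CLOSE_UP
```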

In addition, user convenience can be improved by providing an input interface through which the user can sequentially enter the sentence components needed to generate a 3D image.
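An input interface that asks for sentence components one at a time can be sketched as a loop over prompts. The sketch assumes the three component kinds named in this description (background, person, action). The prompt order, the English sentence template, and the function names `collect_components` and `compose_sentence` are hypothetical. The `ask` callable stands in for whatever UI element actually collects the text.

```python
from typing import Callable, Dict, Sequence

# Hypothetical prompt order; the disclosure names background, person and action elements.
DEFAULT_STEPS = ("background", "person", "action")


def collect_components(ask: Callable[[str], str], steps: Sequence[str] = DEFAULT_STEPS) -> Dict[str, str]:
    """Ask for each sentence component in turn, re-prompting when the answer is empty."""
    answers: Dict[str, str] = {}
    for step in steps:
        answer = ""
        while not answer:
            answer = ask(f"Enter the {step}: ").strip()
        answers[step] = answer
    return answers


def compose_sentence(answers: Dict[str, str]) -> str:
    """Join the collected components into one input sentence (simple English template, an assumption)."""
    return f"{answers['person']} {answers['action']} in {answers['background']}"
```

In an interactive session, `collect_components(input)` would prompt on the console. A GUI would pass a callable bound to its own text fields instead.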

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a block diagram showing the configuration of an image generating device according to an embodiment of the present disclosure.
FIG. 2 is a diagram for explaining the operation of an image generating device according to an embodiment of the present disclosure.
FIG. 3A is a diagram for explaining the operation of an image generating device that generates a 3D image based on a change in the person element according to an embodiment of the present disclosure.
FIG. 3B is a diagram for explaining the operation of an image generating device that generates a 3D image based on a change in the person element according to an embodiment of the present disclosure.
FIG. 4A is a diagram for explaining the operation of an image generating device that generates a 3D image based on the number of people included in the person element and on the motion element according to an embodiment of the present disclosure.
FIG. 4B is a diagram for explaining the operation of an image generating device that generates a 3D image based on the number of people included in the person element and on the motion element according to an embodiment of the present disclosure.
FIG. 5 is a diagram for explaining the operation of an image generating device that generates a 3D image based on the movement and direction of each of a plurality of recognized people according to an embodiment of the present disclosure.
FIG. 6 is a diagram for explaining the operation of an image generating device that generates a 3D image by considering interactions between a plurality of recognized people according to an embodiment of the present disclosure.
FIG. 7A is a diagram for explaining the operation of an image generating device that generates a background image according to an embodiment of the present disclosure.
FIG. 7B is a diagram for explaining the operation of an image generating device that generates a person image according to an embodiment of the present disclosure.
FIG. 7C is a diagram for explaining the operation of an image generating device that generates a motion image of a person according to an embodiment of the present disclosure.
FIG. 8 is a diagram for explaining an operation of providing a single-frame 3D image corresponding to a motion retrieved from a video including a plurality of frames.
FIG. 9 is a diagram for explaining an operation of determining the view of a 3D image based on the number of people and the motion element included in a user's input sentence according to an embodiment of the present disclosure.
FIG. 10 is a diagram for explaining the operation of an image generating device that receives a plurality of sentence components through a plurality of respective interfaces according to an embodiment of the present disclosure.
FIG. 11A is a diagram for explaining an operation of editing a generated 3D image according to an embodiment of the present disclosure.
FIG. 11B is a diagram for explaining an operation of editing a generated 3D image according to an embodiment of the present disclosure.
FIG. 11C is a diagram for explaining an operation of editing a generated 3D image according to an embodiment of the present disclosure.
FIG. 12 is a flowchart for explaining a method of operating an image generating device according to an embodiment of the present disclosure.

Like reference numerals refer to like elements throughout the present disclosure. The present disclosure does not describe every element of the embodiments. Content that is general in the technical field to which the present disclosure pertains, or that overlaps between embodiments, is omitted. The terms 'unit', 'module', 'member', and 'block' used in the specification may be implemented in software or hardware. Depending on the embodiment, a plurality of 'units, modules, members, or blocks' may be implemented as a single component, or a single 'unit, module, member, or block' may include a plurality of components.

In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Terms such as "first" and "second" are used to distinguish one component from another, and the components are not limited by these terms.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

Identification codes for the steps are used for convenience of explanation and do not indicate the order of the steps. Each step may be performed in an order different from the one stated, unless the context clearly specifies a particular order.

In this specification, the 'image generating device according to the present disclosure' includes any device that can perform computational processing and provide results to the user. For example, the device according to the present disclosure may include all of a computer, a server, and a portable terminal, or may take the form of any one of them.

Here, the computer may include, for example, a notebook computer, desktop, laptop, tablet PC, slate PC, or the like equipped with a web browser.

The server processes information by communicating with external devices and may include an application server, computing server, database server, file server, game server, mail server, proxy server, web server, and the like.

The portable terminal is, for example, a wireless communication device that ensures portability and mobility. It may include all types of handheld wireless communication devices, such as PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (Wideband Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals and smartphones. It may also include wearable devices such as watches, rings, bracelets, anklets, necklaces, glasses, contact lenses, and head-mounted devices (HMDs).

Hereinafter, the operating principle and embodiments of the present disclosure will be described with reference to the attached drawings.

FIG. 1 is a block diagram showing the configuration of an image generating device according to an embodiment of the present disclosure.

Referring to FIG. 1, in one embodiment, the image generating device 100 may generate a 3D image corresponding to a sentence input by a user of the image generating device 100 and may provide the generated 3D image to the user. However, the present disclosure is not limited thereto, and the image generating device 100 may also generate a 2D image corresponding to the input sentence. Hereinafter, for convenience of explanation, the image generating device 100 is described as generating a 3D image corresponding to an input sentence. In addition, a 3D image may be either a single-frame image or a video consisting of multiple frames.
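Because a generated "3D image" may be either a single frame or a multi-frame video, and 2D output is also allowed, a result type only needs a list of frames and a dimensionality flag. This is a hypothetical data structure for illustration. The class names and the abstract `bytes` payload are assumptions, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One rendered frame; the pixel or mesh payload is left abstract in this sketch."""
    index: int
    data: bytes = b""


@dataclass
class GeneratedImage:
    """Hypothetical result type: one frame is a still image, several frames form a video."""
    frames: List[Frame]
    is_3d: bool = True  # the disclosure also allows 2D output

    def __post_init__(self) -> None:
        if not self.frames:
            raise ValueError("a generated image needs at least one frame")

    @property
    def is_video(self) -> bool:
        return len(self.frames) > 1
```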

In one embodiment, the image generating device 100 may include a display 110, a memory 120, and at least one processor 130. The components shown in FIG. 1 are not essential for implementing the image generating device 100 according to the present disclosure, and the image generating device 100 described herein may have more or fewer components than those listed above. The display 110, the memory 120, and the at least one processor 130 may be electrically and/or physically connected to one another.

In one embodiment, the display 110 may display information processed by the image generating device 100, including the input interface provided by the image generating device 100.

In one embodiment, the display 110 may display execution screen information of an application program (for example, the input interface) running on the image generating device 100, or UI (User Interface) and GUI (Graphic User Interface) information according to that execution screen information.

In one embodiment, the at least one processor 130 may control the display 110 to display the input interface, thereby providing the input interface to the user of the image generating device 100. The user may provide an input sentence to the image generating device 100 through the input interface displayed on the display 110.

In one embodiment, the at least one processor 130 may control the display 110 to display the 3D image generated to correspond to the user's input sentence, thereby providing the generated 3D image to the user through the display 110.

In one embodiment, the memory 120 may store:
- data supporting various functions of the image generating device 100;
- programs for the operation of the at least one processor 130;
- input/output data (for example, sentences, music files, still images, and videos);
- a plurality of application programs (or applications) running on the device;
- data for the operation of the device;
- at least one instruction.

At least some of these application programs may be downloaded from an external server via wireless communication.

The memory 120 may include at least one type of storage medium among flash memory, hard disk, SSD (Solid State Disk), SDD (Silicon Disk Drive), multimedia card micro, card-type memory (for example, SD or XD memory), RAM (random access memory), SRAM (static random access memory), ROM (read-only memory), EEPROM (electrically erasable programmable read-only memory), PROM (programmable read-only memory), magnetic memory, magnetic disk, and optical disk. The memory may also be a database that is separate from the image generating device 100 but connected to it by wire or wirelessly.

In one embodiment, the at least one processor 130 may control the overall operation of the image generating device 100 by executing at least one instruction stored in the memory 120. In particular, the at least one processor 130 may execute at least one instruction stored in the memory 120 to generate a 3D image corresponding to the user's input sentence.

In one embodiment, the image generating device 100 may further include a communication interface, which may perform data communication between the image generating device 100 and an external server or other nearby electronic devices.

In one embodiment, the communication interface may include one or more components that enable communication with an external server or other nearby electronic devices, for example, at least one of a wireless communication module, a short-range communication module, and a location information module.

In addition to a Wi-Fi module and a WiBro (Wireless Broadband) module, the wireless communication module may include modules supporting various other wireless communication schemes. These include GSM (Global System for Mobile Communications), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), UMTS (Universal Mobile Telecommunications System), TDMA (Time Division Multiple Access), LTE (Long Term Evolution), 4G, 5G, and 6G.

The wireless communication module may include a wireless communication interface that includes an antenna and a transmitter for transmitting data signals. The wireless communication module may further include a data signal conversion module. Under the control of the at least one processor 130, this module modulates a digital control signal output from the at least one processor 130 into an analog wireless signal for transmission through the wireless communication interface.

The wireless communication module may include a wireless communication interface that includes an antenna and a receiver for receiving data signals. The wireless communication module may further include a data signal conversion module that demodulates an analog wireless signal received through the wireless communication interface into a digital control signal.

The short-range communication module supports short-range communication using at least one of the following technologies: Bluetooth™, RFID (Radio Frequency Identification), IrDA (Infrared Data Association), UWB (Ultra Wideband), ZigBee, NFC (Near Field Communication), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, and Wireless USB (Wireless Universal Serial Bus).

In one embodiment, the image generating device 100 may further include a user interface for receiving information from the user. When information is input through the user interface, the at least one processor 130 may control the operation of the image generating device 100 in accordance with the input information.

In one embodiment, the input interface displayed on the display 110 may be an interface for displaying the information provided through the user input interface.

In one embodiment, the user interface may include touch keys. For example, a touch key may be a virtual key, soft key, or visual key displayed on a touchscreen-type display through software processing, or it may be a touch key disposed on a part other than the touchscreen. The user may enter an input sentence using the touch keys.

The user interface may also include devices such as a mouse and a keyboard. In one embodiment, the user may enter an input sentence using a keyboard or the like.

The user interface may also include a voice input interface such as a microphone. In one embodiment, the user may enter an input sentence using a microphone or the like.

In one embodiment, the image generating device 100 may communicate with an external server or a nearby electronic device through the communication interface. The external server may include an application server, computing server, database server, file server, game server, mail server, proxy server, web server, and the like.

In one embodiment, an external server or nearby electronic device may include a model that implements the method described in the present disclosure for generating a 3D image corresponding to a user's input sentence. The image generating device 100 may send the input sentence obtained through the user interface to that external server or nearby electronic device through the communication interface.

In one embodiment, the image generating device 100 may receive a 3D image from the external server or nearby electronic device and provide it to the user through the display 110. In this way, the user may use a 3D image generation model hosted on an external server or external electronic device through the image generating device 100.
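Offloading generation to an external server means the device code should not care where the model runs. One way to express that is a small interface with a remote implementation behind it. In the sketch below, the JSON request shape `{"sentence": ...}` and the `transport` callable are hypothetical stand-ins for the communication interface (for example, an HTTP POST). The disclosure does not specify any wire format.

```python
import json
from typing import Callable, Protocol


class Generator(Protocol):
    def generate(self, sentence: str) -> dict: ...


class RemoteGenerator:
    """Forwards the sentence to an external server through a hypothetical `transport` callable."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def generate(self, sentence: str) -> dict:
        request = json.dumps({"sentence": sentence}).encode("utf-8")
        response = self._transport(request)  # send the request and wait for the server's reply
        return json.loads(response.decode("utf-8"))


def handle_input(sentence: str, generator: Generator, show: Callable[[dict], None]) -> None:
    """Device-side flow is identical whether the generator is local or remote."""
    show(generator.generate(sentence))
```

A local model would implement the same `generate` method, so switching between on-device and server-side generation only changes which object is passed to `handle_input`.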

Hereinafter, for convenience of explanation, the operation of generating a 3D image in response to a user's input sentence is described as being performed in the image generating device 100.

FIG. 2 is a diagram for explaining the operation of an image generating device according to an embodiment of the present disclosure.

Referring to FIGS. 1 and 2, in one embodiment, FIG. 2 shows a 3D image 400 generated in response to a user's input sentence. The sentence is provided to the image generating device 100 through an input interface 300 that the image generating device 100 presents to the user. The input interface 300 and the 3D image 400 may be provided to the user together in a single integrated interface 200.

In one embodiment, a user of the image generating device 100 may provide an input sentence to the image generating device 100 through the input interface 300, and the at least one processor 130 may obtain the user's input sentence through the input interface 300.

In one embodiment, the at least one processor 130 may extract at least one sentence component from the obtained input sentence using natural language processing (NLP), for example a recurrent neural network (RNN), a long short-term memory (LSTM) network, or a Transformer.

In one embodiment, the at least one sentence component may include a background element 310 of the sentence, such as a classroom, a school, or a park; a person element 320, such as a man, a woman, a name, an animal, or an object; and action elements 330, 340, and 350, such as fighting, reading a book, eating, taking an object from someone, or singing.

In one embodiment, the at least one processor 130 may use automatic segmentation and extraction to separate the background element 310, the person element 320, and the action elements 330, 340, and 350 from the at least one sentence component, and extract the information of the input sentence from the separated elements. For example, referring to FIG. 2, the at least one processor 130 may extract from the separated elements that the background of the input sentence is “classroom”, the persons are “a woman and a man”, and the actions are “taking the soccer ball away” and “fighting”.
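To make the three-way split concrete, the following is a minimal sketch that sorts the words of an English input sentence into background, person, and action elements. It is an assumption-laden stand-in: the disclosure uses learned NLP models (RNN, LSTM, Transformer), whereas this sketch uses small hypothetical keyword lexicons (`BACKGROUND_WORDS`, `PERSON_WORDS`, `ACTION_WORDS`) and a hypothetical `segment` function purely to show the shape of the output.

```python
from dataclasses import dataclass, field

# Hypothetical lexicons standing in for a learned NLP model.
BACKGROUND_WORDS = {"classroom", "school", "park"}
PERSON_WORDS = {"man", "woman", "boy", "girl", "dog"}
ACTION_WORDS = {"fighting", "reading", "eating", "taking", "singing", "leaning"}


@dataclass
class SentenceComponents:
    background: list[str] = field(default_factory=list)
    persons: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)


def segment(sentence: str) -> SentenceComponents:
    """Split an input sentence into background, person and action elements."""
    parts = SentenceComponents()
    for raw in sentence.lower().split():
        token = raw.strip(".,!?")  # drop trailing punctuation
        if token in BACKGROUND_WORDS:
            parts.background.append(token)
        elif token in PERSON_WORDS:
            parts.persons.append(token)
        elif token in ACTION_WORDS:
            parts.actions.append(token)
    return parts
```

For example, `segment("In the classroom a woman and a man are fighting after taking the ball.")` yields the background `["classroom"]`, the persons `["woman", "man"]`, and the actions `["fighting", "taking"]`, mirroring the FIG. 2 example.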

In one embodiment, the at least one processor 130 may generate the 3D image 400 based on the at least one extracted sentence component. The 3D image 400 may include at least one image component, each corresponding to one of the extracted sentence components: a background image 410 corresponding to the background element 310, person images 420 and 430 corresponding to the person element 320, and an action image 440 corresponding to the action elements 330, 340, and 350. Here, the action image 440 may be an element representing the actions of the persons shown in the person images 420 and 430.

FIGS. 3A and 3B are diagrams for explaining the operation of an image generating device that generates a 3D image based on changes in the person element according to an embodiment of the present disclosure. Hereinafter, components identical to those described with reference to FIG. 2 are given the same reference numerals, and redundant descriptions are omitted.

Referring to FIGS. 1, 3A, and 3B, the person element 320 in the input sentence shown in FIG. 3A is “A가 B를” (“A [does something to] B”), whereas the person element 321 in the input sentence shown in FIG. 3B is “동생이 형을” (“the younger brother [does something to] the older brother”).

In one embodiment, from the person element 320 “A가 B를” shown in FIG. 3A, the at least one processor 130 may extract that the persons in the input sentence are “A” and “B” and that “A” is the agent of the action. Because no further information, such as the age or gender of “A” and “B”, can be obtained, the at least one processor 130 may generate, based on the extracted information, a 3D image in which an arbitrary person image 420 corresponding to “A” performs the action on an arbitrary person image 421 corresponding to “B”.

In one embodiment, from the person element 321 “동생이 형을” shown in FIG. 3B, the at least one processor 130 may extract that the persons in the input sentence are the “younger brother” and the “older brother” and that the “younger brother” is the agent of the action. Using the additional information conveyed by “younger brother” and “older brother”, the at least one processor 130 may generate a 3D image in which the person image 420 corresponding to the “younger brother” performs the action on the person image 422 corresponding to the “older brother”. In this case, the figure in the person image 422 for the “older brother” may be larger than the figure in the person image 420 for the “younger brother”.
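The difference between FIGS. 3A and 3B can be sketched as a lookup from relational person terms to model scale factors. This is only an illustration under stated assumptions: the disclosure says the older brother's figure may be larger, but the table `RELATIVE_SCALE`, its numbers, and the function `person_scales` are hypothetical.

```python
# Hypothetical relative-size table; the disclosure only states that the
# "older brother" figure may be larger than the "younger brother" figure.
RELATIVE_SCALE = {
    "older brother": 1.0,
    "younger brother": 0.85,
}
DEFAULT_SCALE = 1.0  # used for unspecified persons such as "A" and "B"


def person_scales(agent: str, patient: str) -> tuple[float, float]:
    """Return scale factors for the person acting and the person acted on."""
    return (
        RELATIVE_SCALE.get(agent, DEFAULT_SCALE),
        RELATIVE_SCALE.get(patient, DEFAULT_SCALE),
    )
```

With no age cue (“A” and “B”) both figures fall back to the default scale, while “younger brother” acting on “older brother” yields a smaller agent figure.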

The present disclosure is not limited thereto: when the background element or action element in the input sentence changes, the at least one processor 130 may likewise generate a 3D image that reflects the information in the changed background element or action element.

FIGS. 4A and 4B are diagrams for explaining the operation of an image generating device that generates a 3D image based on the number of persons included in the person element and on the action element, according to an embodiment of the present disclosure. Hereinafter, components identical to those described with reference to FIG. 2 are given the same reference numerals, and redundant descriptions are omitted.

Referring to FIGS. 1, 4A, and 4B, in one embodiment, the at least one processor 130 may estimate the pose of the segmented person element 320 based on the segmented action element. In one embodiment, the action element of the input sentence shown in FIG. 4A is “is leaning”, and the person element 320 is “a man”.

In one embodiment, the at least one processor 130 may estimate the pose of the segmented person element 320 using a pose estimation algorithm, which recognizes the joint positions of the person element 320 to determine its skeleton. The pose estimation algorithm may be, for example, OpenPose, AlphaPose, or DensePose.

In one embodiment, in FIG. 4A, the at least one processor 130 may estimate the pose of “a man”, that is, determine the skeleton of “a man”.

In one embodiment, the at least one processor 130 may estimate the pose of “a man” in the 3D image corresponding to the user's input sentence based on the action element “is leaning” and the person element “a man”. For example, the at least one processor 130 may set the skeleton of “a man” in the 3D image to a standing, leaning posture.
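One way to picture turning a standing skeleton into a leaning one is to rotate the upper-body joints backward about the hip. The sketch below assumes a y-up coordinate system with the person facing +z, a hypothetical joint naming (`mid_hip`, `neck`, and so on, loosely modeled on OpenPose-style keypoints), and a hypothetical function `lean_back`; it is not the disclosure's pose estimation method.

```python
import math

# Hypothetical joint subset that is tilted when leaning back.
UPPER_BODY = ("neck", "head", "l_shoulder", "r_shoulder")


def lean_back(skeleton: dict[str, tuple[float, float, float]], degrees: float):
    """Tilt upper-body joints backward (toward -z) about the mid-hip joint.

    Assumes y is up and the person faces +z. Lower-body joints are unchanged.
    """
    hx, hy, hz = skeleton["mid_hip"]
    a = math.radians(degrees)
    leaned = dict(skeleton)
    for name in UPPER_BODY:
        x, y, z = skeleton[name]
        dy, dz = y - hy, z - hz
        # Rotation about the x-axis through the hip; a positive angle moves
        # joints above the hip toward -z while preserving their distance to it.
        leaned[name] = (
            x,
            hy + dy * math.cos(a) + dz * math.sin(a),
            hz - dy * math.sin(a) + dz * math.cos(a),
        )
    return leaned
```

Because it is a pure rotation, bone lengths from the hip are preserved; a practical system would also respect joint limits and contact with the surface being leaned on.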

In one embodiment, the at least one processor 130 may determine, based on the estimated pose of the person element, position and orientation information for the person element performing the segmented action element. Based on that information, the at least one processor 130 may determine the position and orientation of the corresponding person image in the 3D image and generate the 3D image accordingly.

In one embodiment, the at least one processor 130 may determine the position and orientation of “a man” based on his standing, leaning pose. When the input sentence also includes a background element, the at least one processor 130 may position “a man” so that he stands leaning with his back against the background element, and orient him so that he faces forward, away from the background element.
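The “back to the background element, facing forward” rule can be sketched geometrically. The following assumes the background element is a wall described by a point and an inward-facing normal on the floor plane; the function name `place_against_wall` and the default offset are hypothetical, and the disclosure itself uses a pre-trained correction algorithm rather than this fixed rule.

```python
import math


def place_against_wall(wall_point, wall_normal, offset=0.25):
    """Hypothetical rule: stand `offset` metres in front of a wall, back to it,
    facing along the wall's inward normal.

    wall_point and wall_normal are (x, z) floor-plane coordinates.
    Returns ((x, z) position, yaw in degrees measured from +z toward +x).
    """
    nx, nz = wall_normal
    length = math.hypot(nx, nz)
    nx, nz = nx / length, nz / length  # normalise the facing direction
    position = (wall_point[0] + nx * offset, wall_point[1] + nz * offset)
    yaw = math.degrees(math.atan2(nx, nz))
    return position, yaw
```

For a wall at z = -3 whose normal points toward +z, the figure is placed at (0.0, -2.75) with a yaw of 0 degrees, that is, facing the camera side of the room.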

In doing so, the at least one processor 130 may determine the position and orientation of “a man” using a correction algorithm pre-trained on training data that pairs the poses of various persons with their positions and orientations.

In one embodiment, the person element 320 in FIG. 4A includes a single person. When the person element 320 includes a single person, the at least one processor 130 may determine the position and orientation of the person performing the segmented action element based on that person's estimated pose, and from it the position and orientation of the person image in the 3D image. Accordingly, when the person element in the user's input sentence includes a single person, the at least one processor 130 can determine the position and orientation of the person image in the 3D image by considering that one person's pose alone.

In one embodiment, the person element 321 in FIG. 4B includes a plurality of persons. When the person element 321 includes a plurality of persons, the at least one processor 130 may determine the position and orientation of the persons performing the segmented action element by taking into account the interaction among their individually estimated poses, and from them the position and orientation of the person images in the 3D image.

In one embodiment, the at least one processor 130 may estimate the pose of each of the “man and woman” (321), for example using multi-person pose estimation. The at least one processor 130 may estimate each of their poses based on the action elements “side by side” (340) and “are leaning” and the person element “man and woman” (321).

In one embodiment, the at least one processor 130 may determine the position and orientation of the “man and woman” (321) using a correction algorithm pre-trained on training data that pairs the poses of various persons with their positions and orientations, taking into account the interaction between the two estimated poses. Accordingly, when the person element in the user's input sentence includes a plurality of persons, the at least one processor 130 can determine the position and orientation of the person images in the 3D image by considering the interaction among the persons' poses.

FIG. 5 is a diagram for explaining the operation of an image generating device that generates a 3D image based on the movement and orientation of each of a plurality of recognized persons according to an embodiment of the present disclosure. Hereinafter, components identical to those described with reference to FIG. 2 are given the same reference numerals, and redundant descriptions are omitted.

In one embodiment, referring to FIG. 5, the at least one processor 130 may generate a 3D image 400 comprising a plurality of frames that corresponds to the user's input sentence provided through the input interface 300. A 3D image 400 comprising a plurality of frames may be a video.

In one embodiment, the at least one processor 130 may generate a single-frame 3D image corresponding to the input sentence and then assign an identifier to each person in that frame. For example, if the input sentence includes “two men” (320) and the at least one processor 130 generates a single-frame 3D image containing person images for the “two men”, the at least one processor 130 may assign an identifier to each of the two man images 420 and 421.

In one embodiment, when the two men are “scuffling” (340) across a plurality of frames, the at least one processor 130 may use the identifiers to track each man, follow the path each one moves along over the frames, and determine from those paths the position and orientation of the persons in the 3D image.
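Identifier-based tracking can be illustrated with greedy nearest-neighbour association between consecutive frames. This is a minimal sketch under assumptions: persons are reduced to (x, z) floor positions, the function `track` and its `max_jump` threshold are hypothetical, and the disclosure does not specify how identifiers are associated across frames.

```python
import math


def track(frames, max_jump=1.0):
    """Give each person a persistent identifier by greedy nearest-neighbour matching.

    frames: list of frames, each a list of (x, z) person positions.
    Returns one {identifier: position} dict per frame.
    """
    tracks = {i: pos for i, pos in enumerate(frames[0])}  # identifiers assigned in frame 0
    history = [dict(tracks)]
    next_id = len(tracks)
    for detections in frames[1:]:
        unmatched = set(tracks)
        current = {}
        for pos in detections:
            # Closest still-unclaimed person from the previous frame, if any.
            best = min(unmatched, key=lambda i: math.dist(tracks[i], pos), default=None)
            if best is not None and math.dist(tracks[best], pos) <= max_jump:
                unmatched.remove(best)
                current[best] = pos
            else:
                current[next_id] = pos  # treat as a person entering the scene
                next_id += 1
        tracks = current  # persons not seen in this frame are dropped
        history.append(current)
    return history
```

Greedy matching depends on detection order; an optimal assignment (for example, the Hungarian algorithm) is the usual choice when persons are close together, as in a scuffle.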

FIG. 6 is a diagram for explaining the operation of an image generating device that generates a 3D image by considering interactions among a plurality of recognized persons according to an embodiment of the present disclosure. Hereinafter, components identical to those described with reference to FIGS. 2 and 6 are given the same reference numerals, and redundant descriptions are omitted.

Referring to FIGS. 2, 4B, 5, and 6, in one embodiment, the at least one processor 130 may apply a different method of determining the position and orientation of the person element depending on the number of persons in the person element 320 of the input sentence.

In one embodiment, when the person element 320 includes a single person, there is no interaction to consider: the at least one processor 130 may estimate the pose of the person element 320 based on the person element 320 and the action element 340, and determine the position and orientation of the person element 320 from the estimated pose.

In one embodiment, when the person element 320 includes a plurality of persons, the at least one processor 130 may determine the position and orientation of the person element 320 by considering the interaction among those persons. In one embodiment, the at least one processor 130 may estimate the pose of each person based on the person element 320 and the action element 340, and determine each person's position and orientation from the estimated poses.

The at least one processor 130 may then correct each person's position and orientation using a modeling algorithm suited to the number of persons, so that the corrected positions and orientations reflect the interaction among the persons.

In one embodiment, the modeling algorithm may include a social force model or an attention mechanism. A social force model, grounded in behavioral science, models the interactions among a plurality of persons. An attention mechanism analyzes the degree to which each of the persons influences the others while they perform their actions.
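As an illustration of the social-force option, the sketch below applies only the pairwise exponential repulsion term of a Helbing-style social force model to nudge overlapping figures apart. All parameter values are illustrative assumptions, the driving (goal-seeking) term of the full model is omitted, and the function `social_force_correct` is hypothetical rather than the disclosure's correction algorithm.

```python
import math


def social_force_correct(positions, radius=0.3, strength=0.5, falloff=0.2,
                         steps=50, dt=0.1):
    """Push overlapping people apart with a Helbing-style exponential repulsion.

    positions: list of (x, z) floor positions, one per person.
    radius, strength, falloff, steps and dt are illustrative values only.
    """
    pts = [list(p) for p in positions]
    for _ in range(steps):
        forces = [[0.0, 0.0] for _ in pts]
        for i, pi in enumerate(pts):
            for j, pj in enumerate(pts):
                if i == j:
                    continue
                dx, dz = pi[0] - pj[0], pi[1] - pj[1]
                d = math.hypot(dx, dz)
                if d == 0.0:
                    continue  # coincident points have no defined direction
                # Repulsion grows exponentially as bodies (2 * radius) overlap.
                mag = strength * math.exp((2 * radius - d) / falloff)
                forces[i][0] += mag * dx / d
                forces[i][1] += mag * dz / d
        for p, f in zip(pts, forces):  # explicit Euler step
            p[0] += f[0] * dt
            p[1] += f[1] * dt
    return [tuple(p) for p in pts]
```

Because each pairwise force is equal and opposite, the group's centroid stays put while individuals separate, which keeps the corrected figures where the scene placed them.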

In one embodiment, referring to FIG. 6, the person element 320 includes “three men” and the action element 340 is “are scuffling”. Based on the person element 320 and the action element 340, the at least one processor 130 may estimate the pose of each of the three scuffling men and assign an identifier to each of them.

The at least one processor 130 may extract the position and orientation of each of the three men based on their estimated poses and identifiers, correct these using the modeling algorithm so that the interaction among the three men is taken into account, and generate the 3D image 400 from the corrected positions and orientations. For example, if one of the three men is left-handed and the other two are right-handed, the positions and orientations may be corrected, in view of their interaction, so that the scuffle between the left-handed man and the right-handed men looks natural.
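For the attention-mechanism option, the degree to which each man influences the others can be expressed as scaled dot-product self-attention weights over per-person feature vectors. This is a simplified sketch: queries and keys are the raw features with no learned projections, and the feature encoding (for instance pose, handedness, and position) and the function `attention_weights` are hypothetical.

```python
import math


def attention_weights(features):
    """Scaled dot-product self-attention weights between people.

    features: one equal-length feature vector per person (hypothetical encoding).
    Row i gives how strongly person i attends to each person; rows sum to 1.
    """
    d = len(features[0])
    weights = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

A correction step could then weight each person's adjustment by these rows, so that, for example, the left-handed man's stance is corrected mainly with respect to the opponents he attends to most.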

Compared with determining the position and orientation of the person element 320 without regard to how many persons it includes, generating the 3D image from positions and orientations corrected for the interaction among persons, when the person element 320 includes a plurality of persons, yields images of persons moving more naturally. The 3D image can also be generated so that the actions of all the persons appear together in a single scene.

FIG. 7A is a diagram for explaining the operation of an image generating device that generates a background image according to an embodiment of the present disclosure. FIG. 7B is a diagram for explaining the operation of an image generating device that generates a person image according to an embodiment of the present disclosure. FIG. 7C is a diagram for explaining the operation of an image generating device that generates an action image of a person according to an embodiment of the present disclosure.

Referring to FIGS. 1, 7A, 7B, and 7C, in one embodiment, when generating a 3D image corresponding to an input sentence, the at least one processor 130 may generate the background image 410, the person image 420, and the action image 440 in that order.

Referring to FIG. 7A, the at least one processor 130 may segment the background element from the input sentence and load the corresponding background image from a background asset dump containing a plurality of basic background images. In one embodiment, the at least one processor 130 may extract background information from the segmented background element and load the background image 410 that corresponds to it from the background asset dump. In one embodiment, a first scene 500 containing only the background image 410 may be shown to the user through the display.

Referring to FIG. 7B, the at least one processor 130 may segment the person element from the input sentence and load the corresponding person image from a person asset dump containing a plurality of basic person images. In one embodiment, the at least one processor 130 may extract person information from the segmented person element and load the person image 420 that corresponds to it from the person asset dump. In one embodiment, a second scene 510 containing only the background image 410 and the person image 420 may be shown to the user through the display. The person information may include, for example, the person's occupation, clothing, appearance, and gender.

Referring to FIG. 7C, the at least one processor 130 may segment the action element from the input sentence and determine the pose, position, and orientation of the loaded person image 420. In one embodiment, the at least one processor 130 may extract action information, which may include the person's pose, position, and orientation, from the segmented action element, and may use 3D modeling to adjust the pose, position, and orientation of the loaded person image according to the extracted action information. The present disclosure is not limited thereto, however: the at least one processor 130 may instead load action data corresponding to the segmented action element from an action asset dump containing poses, motions, and the like of a plurality of persons, and use the loaded action data to determine the pose, position, and orientation of the loaded person image.

The background asset dump, the person asset dump, and the action asset dump may be stored in the memory 120. Alternatively, the image generating device 100 may receive them, through the communication interface, from an external server or nearby electronic device on which they are stored.
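The ordered background, person, action assembly with a local-then-remote asset lookup can be sketched as follows. Everything named here is a hypothetical placeholder: the in-memory `LOCAL_ASSETS` table stands in for the asset dumps in memory 120, the file names are invented, and `fetch_remote` is any caller-supplied function standing in for retrieval over the communication interface.

```python
from typing import Callable, Optional

# Hypothetical in-memory asset dumps; values stand for asset file names.
LOCAL_ASSETS = {
    "background": {"classroom": "classroom.glb"},
    "person": {"man": "man_default.glb"},
    "motion": {"leaning": "lean_idle.anim"},
}

Fetcher = Callable[[str, str], Optional[str]]


def load_asset(kind: str, key: str, fetch_remote: Optional[Fetcher] = None) -> Optional[str]:
    """Look up an asset locally, falling back to a remote source if one is given."""
    asset = LOCAL_ASSETS.get(kind, {}).get(key)
    if asset is None and fetch_remote is not None:
        asset = fetch_remote(kind, key)
    return asset


def build_scene(background: str, person: str, motion: str,
                fetch_remote: Optional[Fetcher] = None) -> list[dict]:
    """Assemble a scene in order: background, then person, then motion.

    Returns snapshots after each stage (cf. first scene 500, second scene 510).
    """
    scene = {"background": load_asset("background", background, fetch_remote)}
    stages = [dict(scene)]
    scene["person"] = load_asset("person", person, fetch_remote)
    stages.append(dict(scene))
    scene["motion"] = load_asset("motion", motion, fetch_remote)
    stages.append(dict(scene))
    return stages
```

Returning the intermediate snapshots makes it easy to show the user the background-only and background-plus-person scenes before the final posed scene is ready.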

FIG. 8 is a diagram for explaining an operation of providing a single-frame 3D image corresponding to a searched-for action in a video comprising a plurality of frames.

Referring to FIG. 8, the 3D image 400 generated to correspond to the input sentence entered in the input interface 300 may be a video showing a plurality of actions over a plurality of frames. In one embodiment, when the 3D image 400 is a video, the at least one processor 130 may provide, through the display 110, a search interface 360 for searching for actions of persons in the 3D image 400. By entering into the search interface 360 a search action element describing the action to find, for example “uppercut”, the user can retrieve the 3D image of the one frame in the 3D image 400 that corresponds to that action.

In one embodiment, the at least one processor 130 may obtain the search action element through the search interface 360 and, based on it, obtain a search action image showing the action to be found. The at least one processor 130 may generate the search action image itself from the search action element, for example by providing the search action element as input to a generative adversarial network (GAN), which generates a search action image containing the action to be found.

Alternatively, the at least one processor 130 may provide the search action element to an external server or a nearby electronic device and then obtain the search motion image generated by that server or device.

The at least one processor 130 may compare the search motion image with the 3D image comprising the plurality of frames and, among the plurality of actions corresponding respectively to the frames of the 3D image, detect the 3D image of the single frame containing the action that corresponds to the action element in the search motion image.
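The comparison step above does not specify how the search motion image and the frames are matched. As a minimal sketch, assuming each frame and the search motion image have already been reduced to fixed-length pose feature vectors (for example, flattened joint coordinates; this representation is an assumption, not something the text specifies), the best-matching frame can be picked by cosine similarity. The names `cosine_similarity` and `find_best_frame` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_best_frame(query: Sequence[float], frames: Sequence[Sequence[float]]) -> int:
    """Return the index of the frame whose (assumed) pose features best match the query."""
    if not frames:
        raise ValueError("the 3D video contains no frames")
    return max(range(len(frames)), key=lambda i: cosine_similarity(query, frames[i]))
```

For example, with frame features `[[1, 0, 0], [0, 1, 0], [0.7, 0.7, 0]]` and a query of `[0, 1, 0.1]`, `find_best_frame` returns index `1`. A learned embedding would replace the raw features in practice; the selection step (argmax over similarity) stays the same.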

The at least one processor 130 may provide the 3D image of the detected frame to the user through the display 110.

This allows the user to locate the frame showing a desired action within a multi-frame 3D image easily, without having to search through the frames manually.

FIG. 9 is a diagram for explaining an operation of determining the view of a 3D image based on the number of people and the action elements included in a user's input sentence, according to an embodiment of the present disclosure.

Referring to FIG. 9, in one embodiment, the at least one processor 130 may determine, based on at least one of the segmented background element, the segmented person element 320, and the segmented action element 340, at least one of the number of lights, the positions of the lights, the color of the lights, the weather, the sound, and the point of view of the generated 3D image 400. Here, the point of view of the 3D image may refer to the position or framing of the camera viewing the 3D image.

In one embodiment, when the segmented person element 320 includes only one person, the at least one processor 130 may set the point of view of the 3D image to a bust shot, close-up, full-body shot, half-body shot, or the like. When the segmented person element 320 includes a plurality of people, the at least one processor 130 may set the point of view of the 3D image to a full shot, long shot, top view, or the like.

In one embodiment, when determining the point of view of the 3D image, the at least one processor 130 may consider not only the number of people included in the segmented person element 320 but also the segmented action element 340.

In one embodiment, the at least one processor 130 may determine the point of view of the 3D image by considering not only the positions, postures, and distances of the people included in the person element 320 but also their action elements. In one embodiment, when a plurality of people included in the person element 320 are physically fighting, the point of view may be set to a full shot so that all of the people in the fight are shown. As another example, when the input sentence is “students brawling in the classroom,” the point of view may be chosen so that, rather than focusing narrowly on the fight between the “students,” both the overall atmosphere of the classroom and the students' fight are clearly visible and read as a single scene.

Here, the at least one processor 130 may use a long short-term memory (LSTM) network to determine the point of view of the 3D image based on the postures, actions, number, and the like of the person elements included in the input sentence.
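The shot-selection behavior described in the preceding paragraphs can be illustrated with a small rule table. The text names an LSTM for this decision; the sketch below deliberately replaces it with fixed rules, assuming the person count and a normalized action keyword have already been extracted. The `Shot` enum, `choose_shot`, and the `INTERACTION_ACTIONS` keyword set are hypothetical.

```python
from enum import Enum


class Shot(Enum):
    BUST = "bust shot"
    FULL_BODY = "full-body shot"
    FULL = "full shot"
    LONG = "long shot"


# Hypothetical keywords for actions that involve whole bodies or several people at once.
INTERACTION_ACTIONS = {"fighting", "brawling", "dancing"}


def choose_shot(num_people: int, action: str) -> Shot:
    """Rule-based stand-in for the viewpoint decision described in the text."""
    if num_people < 1:
        raise ValueError("the person element must contain at least one person")
    if num_people == 1:
        # Single subject: frame tightly unless the action needs the whole body in view.
        return Shot.FULL_BODY if action in INTERACTION_ACTIONS else Shot.BUST
    # Several subjects: a full shot keeps every participant of an interaction in frame;
    # otherwise a long shot shows the group in its surroundings.
    return Shot.FULL if action in INTERACTION_ACTIONS else Shot.LONG
```

A learned model such as the LSTM mentioned above could also weigh positions and postures, which a keyword rule cannot; the table only shows the shape of the mapping from (people, action) to a camera framing.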

In addition, the at least one processor 130 may vary the position, number, and color of the lights, the weather, the sound, and so on according to conditions contained in the input sentence, so as to change the effect of the 3D image. In one embodiment, when the input sentence contains background elements that affect the effect of the 3D image, such as “in front of a cold corpse,” “on a dreary night,” or “in a rainy park,” the at least one processor 130 may set the position, number, and color of the lights, the weather, the sound, and so on based on those elements.

In one embodiment, the at least one processor 130 may use a convolutional neural network (CNN) to extract features from images corresponding to the background elements that affect the effect of the 3D image, and determine the position, number, and color of the lights, the weather, the sound, and so on of the 3D image based on the extracted features. The at least one processor 130 may also use a fully convolutional network (FCN) to segment the images corresponding to those background elements and determine the same properties of the 3D image based on the segmented images.
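As an illustration of how conditions in the sentence could drive lighting, weather, and sound, here is a keyword-rule sketch. The text describes CNN- or FCN-based analysis for this step; the lookup table below is a hypothetical stand-in, and the default values, keywords, and the `SceneEffects` / `derive_effects` names are all assumptions. When several keywords match, later rules override earlier ones.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class SceneEffects:
    light_count: int = 2
    light_color: str = "white"
    weather: str = "clear"
    sound: str = "none"


# Hypothetical keyword rules; insertion order matters because later matches override earlier ones.
EFFECT_RULES: Dict[str, Dict[str, Any]] = {
    "rain": {"weather": "rain", "sound": "rainfall", "light_color": "gray"},
    "night": {"light_count": 1, "light_color": "dim blue"},
    "corpse": {"light_color": "cold white", "sound": "silence"},
}


def derive_effects(sentence: str) -> SceneEffects:
    """Apply every rule whose keyword occurs in the sentence, starting from the defaults."""
    effects = SceneEffects()
    text = sentence.lower()
    for keyword, overrides in EFFECT_RULES.items():
        if keyword in text:
            effects = replace(effects, **overrides)
    return effects
```

For “In a rainy park at night” this yields rain weather with a rainfall sound and a single dim blue light, since the `night` rule overrides the light color set by the `rain` rule.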

FIG. 10 is a diagram for explaining the operation of an image generating device that receives a plurality of sentence components, each through a separate interface, according to an embodiment of the present disclosure.

Referring to FIG. 10, in one embodiment, the input interface may include a first input interface 370, a second input interface 380, and a third input interface 390.

In one embodiment, the first input interface 370 may be an interface for entering a background element, for example “classroom.” The second input interface 380 may be an interface for entering a person element, for example “student,” or the person's position, for example “in the middle.” The third input interface 390 may be an interface for entering an action element, for example “brawling.”

In one embodiment, the user may enter the sentence elements of the input sentence one at a time through the first to third input interfaces 370, 380, and 390. This reduces the amount of unnecessary information the user provides to the image generating device 100 and makes the image generating device 100 more convenient to use.

The user may also use the first to third input interfaces 370, 380, and 390 to select and modify an element of the generated 3D image 400. In one embodiment, the 3D image 400 may include a background image, person images 420 and 421, and an action image 440 corresponding to the background element, the person element, and the action element, respectively. Using the first to third input interfaces 370, 380, and 390, the user may edit and re-enter the sentence element corresponding to at least one image element of the 3D image 400 that the user wants to modify. This gives the user greater freedom in making modifications.
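The three-interface input of FIG. 10 and the per-element modification described above can be sketched as an immutable record with one field per interface. This is a hypothetical model: the class and function names, and the English word order used to reassemble the sentence, are assumptions. It shows how replacing one element leaves the others untouched, and how the set of changed fields could tell the generator which image element to regenerate.

```python
from dataclasses import dataclass, fields, replace
from typing import Set


@dataclass(frozen=True)
class SentenceComponents:
    background: str  # entered through the first input interface (370)
    person: str      # entered through the second input interface (380)
    action: str      # entered through the third input interface (390)

    def as_sentence(self) -> str:
        # Hypothetical English word order, used only for display.
        return f"{self.person} {self.action} in the {self.background}"


def edit_component(components: SentenceComponents, name: str, value: str) -> SentenceComponents:
    """Return a copy with one sentence element replaced; the original is unchanged."""
    if name not in {f.name for f in fields(components)}:
        raise ValueError(f"unknown sentence element: {name}")
    return replace(components, **{name: value})


def changed_components(old: SentenceComponents, new: SentenceComponents) -> Set[str]:
    """Names of the elements whose image (background, person, or action) needs regenerating."""
    return {f.name for f in fields(old) if getattr(old, f.name) != getattr(new, f.name)}
```

For example, changing only the background from “classroom” to “gym” reports `{"background"}`, so only the background image would need to be reloaded.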

FIG. 11A is a diagram for explaining an operation of editing a generated 3D image according to an embodiment of the present disclosure. FIG. 11B is a diagram for explaining an operation of editing a generated 3D image according to an embodiment of the present disclosure. FIG. 11C is a diagram for explaining an operation of editing a plurality of components included in a generated 3D image separately, layer by layer, according to an embodiment of the present disclosure.

Referring to FIGS. 1 and 11A, in one embodiment, the at least one processor 130 may provide, through the display 110, an editing interface 600 for editing the generated 3D image 400. In one embodiment, the user may provide input for modifying the generated 3D image through the editing interface 600.

The at least one processor 130 may obtain correction input information through the editing interface 600 and, based on the obtained correction input information, modify at least one of the position, the direction information, and the action of a person included in the 3D image 400.

In one embodiment, the user may use the editing interface 600 to resize, reposition, delete, or add assets included in the 3D image 400, for example a background image, a person image, or an action image.

In one embodiment, when the 3D image 400 includes a plurality of frames, the user may finely adjust, through the editing interface 600, how the components of the image change over time. In this case, the editing interface 600 may include a time bar.
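One common way to realize time-bar editing, offered here only as an assumption since the text does not say how intermediate frames are computed, is keyframing: the user pins a person's position at a few frames and the frames in between are linearly interpolated. The `interpolate_position` helper below is hypothetical and assumes keyframes are sorted by strictly increasing frame index.

```python
from bisect import bisect_right
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def interpolate_position(keyframes: List[Tuple[int, Vec3]], frame: int) -> Tuple[float, ...]:
    """Linearly interpolate a position at `frame` from (frame_index, position) keyframes."""
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    indices = [k for k, _ in keyframes]
    # Before the first or after the last keyframe, hold the nearest keyframe's position.
    if frame <= indices[0]:
        return keyframes[0][1]
    if frame >= indices[-1]:
        return keyframes[-1][1]
    # Find the pair of keyframes that brackets `frame`.
    i = bisect_right(indices, frame)
    (f0, p0), (f1, p1) = keyframes[i - 1], keyframes[i]
    t = (frame - f0) / (f1 - f0)
    return tuple(a + (b - a) * t for a, b in zip(p0, p1))
```

With keyframes at frame 0 `(0, 0, 0)` and frame 10 `(10, 0, 2)`, frame 5 evaluates to `(5.0, 0.0, 1.0)`. Moving a keyframe on the time bar then changes every in-between frame without editing them one by one.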

Referring to FIG. 11B, the editing interface 610 may display a plurality of motion models. The user may select one of the motion models to modify the action of a person included in the 3D image.

Referring to FIG. 11C, the editing interface may include a plurality of editing layers 620, 630, and 640. Each of the editing layers 620, 630, and 640 groups some of the components included in the 3D image, and the user may use the editing layers 620, 630, and 640 to edit the components of each layer individually. In this way, only the components in the layer being edited are modified, without affecting components in other layers.
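The layer separation can be sketched as a scene that stores assets per layer, so that an edit addressed to one layer cannot touch another. The layer names, asset identifiers, and property keys below are hypothetical; the text states only that components are grouped into editing layers 620, 630, and 640 and edited individually.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class Layer:
    name: str
    assets: Dict[str, Dict[str, Any]] = field(default_factory=dict)


class LayeredScene:
    def __init__(self, layer_names: Iterable[str]) -> None:
        self.layers: Dict[str, Layer] = {n: Layer(n) for n in layer_names}

    def add_asset(self, layer: str, asset_id: str, **props: Any) -> None:
        self.layers[layer].assets[asset_id] = dict(props)

    def edit_asset(self, layer: str, asset_id: str, **changes: Any) -> None:
        # Only the addressed layer's asset is updated; other layers are never read or written.
        self.layers[layer].assets[asset_id].update(changes)

    def remove_asset(self, layer: str, asset_id: str) -> None:
        del self.layers[layer].assets[asset_id]
```

Moving a student asset in a hypothetical `"people"` layer, for instance, leaves a classroom asset in a `"background"` layer exactly as it was.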

FIG. 12 is a flowchart for explaining a method of operating an image generating device according to an embodiment of the present disclosure.

Referring to FIGS. 1, 2, and 12, the method of operating the image generating device 100 to generate a 3D image 400 corresponding to a user's input sentence may include providing an input interface 300 (S100). In one embodiment, a user of the image generating device 100 may enter an input sentence into the image generating device 100 through the input interface 300.

In one embodiment, the method of operating the image generating device 100 may include obtaining the user's input sentence through the input interface 300 (S200).

In one embodiment, the method of operating the image generating device 100 may include segmenting at least one sentence component included in the obtained input sentence (S300). In one embodiment, in the segmenting step (S300), information on the at least one segmented sentence component may also be extracted.

In one embodiment, the method of operating the image generating device 100 may include generating the 3D image 400 based on the at least one segmented sentence component (S400). In one embodiment, in the generating step (S400), the 3D image may be generated based on the extracted information on the at least one sentence component.

In one embodiment, the method of operating the image generating device 100 may include providing the generated 3D image 400 to the user through the display 110 (S500). In this way, the image generating device 100 may generate a 3D image corresponding to the input sentence and provide it to the user who entered that sentence.
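Steps S100 to S500 can be strung together as a toy pipeline to show the data flow. Everything model-specific is replaced by placeholders: segmentation (S300) is a hypothetical keyword lookup, generation (S400) returns a plain scene description rather than a rendered 3D image, and the display (S500) is any callable. None of these names or vocabularies come from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical vocabularies standing in for the unspecified segmentation model.
BACKGROUNDS = {"classroom", "park"}
PEOPLE = {"student", "students", "teacher"}
ACTIONS = {"fighting", "running", "uppercut"}


@dataclass
class SentenceComponents:
    background: List[str]
    person: List[str]
    action: List[str]


def segment(sentence: str) -> SentenceComponents:
    """S300: split the sentence into background, person, and action elements."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return SentenceComponents(
        background=[w for w in words if w in BACKGROUNDS],
        person=[w for w in words if w in PEOPLE],
        action=[w for w in words if w in ACTIONS],
    )


def generate_scene(c: SentenceComponents) -> Dict[str, Any]:
    """S400 placeholder: describe the scene instead of rendering a 3D image."""
    return {
        "background": c.background[0] if c.background else "default",
        "characters": c.person,
        "animations": c.action,
    }


def run(sentence: str, display: Callable[[Dict[str, Any]], None] = print) -> Dict[str, Any]:
    # S100/S200: the input interface is modelled as the `sentence` argument.
    scene = generate_scene(segment(sentence))
    display(scene)  # S500: hand the result to the display
    return scene
```

For example, `run("Students fighting in the classroom.")` prints a scene with background `"classroom"`, characters `["students"]`, and animations `["fighting"]`.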

The method according to an embodiment of the present disclosure described above may be implemented as a program (or application) and stored on a medium so that it can be executed in combination with a server, which is hardware.

The program may include code written in a computer language, such as C, C++, JAVA, or machine language, that the computer's processor (CPU) can read through the computer's device interface, so that the computer reads the program and executes the methods implemented as the program. Such code may include functional code related to functions that define the functions necessary to execute the methods, and may include control code related to the execution procedure necessary for the computer's processor to execute those functions according to a predetermined procedure. The code may further include memory-reference code indicating at which location (address) in the computer's internal or external memory any additional information or media required for the processor to execute the functions should be referenced. In addition, when the computer's processor needs to communicate with another remote computer or server to execute the functions, the code may further include communication-related code specifying how to communicate with that remote computer or server using the computer's communication module and what information or media to transmit and receive during communication.

The storage medium means a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only briefly, such as a register, cache, or memory. Specific examples of the storage medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, but are not limited thereto. That is, the program may be stored on various recording media on various servers accessible to the computer, or on various recording media on the user's computer. The medium may also be distributed over network-connected computer systems, so that computer-readable code is stored in a distributed manner.

The steps of the methods or algorithms described in connection with the embodiments of the present disclosure may be implemented directly in hardware, as a software module executed by hardware, or as a combination of the two. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present disclosure pertains.

Although embodiments of the present disclosure have been described above with reference to the accompanying drawings, those skilled in the art to which the present disclosure pertains will understand that the present disclosure may be embodied in other specific forms without changing its technical idea or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

Claims

An image generating device comprising:
a display providing an input interface;
a memory storing at least one instruction; and
at least one processor executing the at least one instruction stored in the memory,
wherein the at least one processor,
by executing the at least one instruction,
obtains the user's input sentence through the input interface,
segments at least one sentence component included in the obtained input sentence, wherein the at least one sentence component includes a background element of the sentence, a person element of the sentence, and an action element of the sentence,
generates a 3D image based on the at least one segmented sentence component,
provides the generated 3D image through the display,
when the person element includes one person, estimates a pose of the segmented person element based on the segmented action element,
generates the 3D image by determining position and direction information of the person element performing the segmented action element based on the estimated pose of the person element,
when the person element includes a plurality of people, determines position and direction information of the plurality of people performing the segmented action element by considering interactions between the plurality of estimated poses of the respective people,
assigns an identifier to each person included in an image frame of the generated 3D image, and
determines the position and direction information of each person included in the 3D image by tracking the movement path of each person using the identifier.

The image generating device according to claim 1, wherein,
when the person element includes one person,
the at least one processor, by executing the at least one instruction, determines position and direction information of the person performing the segmented action element based on the estimated pose of the person, and,
when the person element includes a plurality of people,
the at least one processor, by executing the at least one instruction, determines position and direction information of the plurality of people performing the segmented action element by considering interactions between the plurality of estimated poses of the respective people.

The image generating device according to claim 1, wherein
the input interface includes
a first input interface for inputting the background element, a second input interface for inputting the person element, and a third input interface for inputting the action element, and
the at least one processor, by executing the at least one instruction,
acquires the background element through the first input interface, the person element through the second input interface, and the action element through the third input interface.

The image generating device according to claim 1, wherein
the at least one processor, by executing the at least one instruction,
loads a background image corresponding to the segmented background element from a background asset pile containing a plurality of basic background images,
loads a person image corresponding to the segmented person element from a person asset pile containing a plurality of basic person images, and
generates the 3D image by determining pose, position, and direction information of the loaded person image based on the segmented action element.

The image generating device according to claim 1, wherein
the at least one processor, by executing the at least one instruction,
determines, based on at least one of the segmented background element, the segmented person element, and the segmented action element, at least one of the number of lights, the positions of the lights, the color of the lights, the weather, the sound, and the point of view of the generated 3D image.

The image generating device according to claim 1, wherein
the at least one processor, by executing the at least one instruction,
provides, through the display, an editing interface for editing the generated 3D image,
obtains, through the editing interface, correction input information for modifying the generated 3D image, and
generates and provides a 3D image in which at least one of the position, direction information, size, and motion of the person in the 3D image is modified based on the correction input information.

The image generating device according to claim 1, wherein,
when the generated 3D image is a video showing a plurality of motions of a person included in the 3D image over a plurality of frames,
the at least one processor, by executing the at least one instruction,
provides, through the display, a search interface for searching for motions of the person in the 3D image,
obtains, through the search interface, a search sentence including a search action element for searching for a motion of the person included in the 3D image,
generates a search motion image based on the search action element,
compares the generated search motion image with the 3D image to detect a 3D image of a single frame representing the motion corresponding to the search action element among the plurality of motions included in the 3D image, and
provides the 3D image of the single frame through the display.

A method of operating an image generating device that generates a 3D image corresponding to a user's input sentence, the method comprising:
providing, by the image generating device, an input interface through a display;
acquiring, by the image generating device, the input sentence of the user through the input interface;
segmenting, by the image generating device, at least one sentence component included in the obtained input sentence, wherein the at least one sentence component includes a background element of the sentence, a person element of the sentence, and an action element of the sentence;
generating, by the image generating device, the 3D image based on the at least one segmented sentence component; and
providing, by the image generating device, the generated 3D image through the display,
wherein the image generating device,
when the person element includes one person, estimates a pose of the segmented person element based on the segmented action element,
generates the 3D image by determining position and direction information of the person element performing the segmented action element based on the estimated pose of the person element,
when the person element includes a plurality of people, determines position and direction information of the plurality of people performing the segmented action element by considering interactions between the plurality of estimated poses of the respective people,
assigns an identifier to each person included in an image frame of the generated 3D image, and
determines the position and direction information of each person included in the 3D image by tracking the movement path of each person using the identifier.

A computer-readable recording medium coupled to a computer and storing a program for executing the method of claim 9.