JP2020043384A

JP2020043384A - Information processing apparatus, method, and program for generating composite image for user

Info

Publication number: JP2020043384A
Application number: JP2018166814A
Authority: JP
Inventors: 千貴米倉; Kazutaka Yonekura
Original assignee: Alt Inc
Current assignee: Alt Inc
Priority date: 2018-09-06
Filing date: 2018-09-06
Publication date: 2020-03-19
Anticipated expiration: 2038-09-06
Also published as: WO2020050097A1; JP6715524B2

Abstract

To provide an information processing apparatus, a method, and a program capable of achieving a service that fulfills a user's desire to impersonate another person. SOLUTION: In a computer system 1000, an information processing apparatus for generating a composite image for a user includes: first acquisition means for acquiring at least one user image; second acquisition means for acquiring at least one base image for which modification is permitted; and generation means for generating a composite image on the basis of the at least one base image and the at least one user image. SELECTED DRAWING: Figure 4

Description

本発明は、ユーザのための合成画像を生成するための情報処理装置、方法、プログラムに関する。 The present invention relates to an information processing apparatus, method, and program for generating a composite image for a user.

人はだれしも他人に成り代わってみたいという願望を抱いている。例えば、映画の俳優に成り代わってみたい、広告の芸能人に成り代わってみたい等の願望を抱いている。このような願望を叶えてくれるサービスは存在していない。 Everyone has a desire to impersonate others. For example, they have a desire to impersonate a movie actor or an entertainer in advertising. There is no service that fulfills such a desire.

本発明の発明者は、ユーザの他人に成り代わってみたいという願望を叶えてくれるサービスを実現することが新たなメディア体験につながると考えた。 The inventor of the present invention considered that realizing a service that fulfills a user's desire to impersonate another person would lead to a new media experience.

本発明は、ユーザの他人に成り代わってみたいという願望を叶えてくれるサービスを実現することが可能な情報処理装置等を提供することを目的とする。 An object of the present invention is to provide an information processing apparatus or the like that can realize a service that fulfills a user's desire to impersonate another person.

本発明は、例えば、以下の項目を提供する。
（項目１）
ユーザのための合成画像を生成するための情報処理装置であって、
少なくとも１つのユーザ画像を取得する第１の取得手段と、
改変が許諾されている少なくとも１つのベース画像を取得する第２の取得手段と、
前記少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて合成画像を生成する生成手段と
を備える情報処理装置。
（項目２）
前記合成画像をユーザに提供するための提供手段をさらに備える、項目１に記載の情報処理装置。
（項目３）
前記提供手段は、前記合成画像を提供することの要求を前記ユーザから受信することなしに、自動的に前記合成画像を前記ユーザに提供する、項目２に記載の情報処理装置。
（項目４）
前記提供手段は、
前記ユーザに前記合成画像を提供可能であることを通知することと、
前記合成画像を提供することの要求を前記ユーザから受信することと、
前記合成画像を提供することの要求を前記ユーザから受信することに応答して、前記合成画像を前記ユーザに提供することと
を行う、項目２に記載の情報処理装置。
（項目５）
前記第２の取得手段は、複数のベース画像を取得し、
前記提供手段は、
前記ユーザに前記複数のベース画像の選択肢を提供することと、
前記複数のベース画像のうちの少なくとも１つを選択する入力を前記ユーザから受信することと、
前記複数のベース画像のうちの少なくとも１つを選択する入力を前記ユーザから受信することに応答して、前記選択された少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて生成された合成画像を前記ユーザに提供することと
を行う、項目２に記載の情報処理装置。
（項目６）
前記少なくとも１つのベース画像内に複数の人物が写っており、
前記提供手段は、
前記ユーザに前記少なくとも１つのベース画像内の複数の人物の選択肢を提供することと、
前記少なくとも１つのベース画像内の複数の人物のうちの少なくとも１人を選択する入力を前記ユーザから受信することと、
前記少なくとも１つのベース画像内の複数の人物のうちの少なくとも１人を選択する入力を前記ユーザから受信することに応答して、前記合成画像を前記ユーザに提供することと
を行い、
前記合成画像は、前記少なくとも１つのベース画像内の前記選択された少なくとも１人の人物の少なくとも一部と前記少なくとも１つのユーザ画像内の人物の少なくとも一部とを変換した合成画像である、項目２、４〜５のいずれか一項に記載の情報処理装置。
（項目７）
前記生成手段は、
前記少なくとも１つのベース画像の少なくとも一部と前記少なくとも１つのユーザ画像内の人物の少なくとも一部とを変換した合成画像を生成することと
を行う、項目１〜６のいずれか一項に記載の情報処理装置。
（項目８）
前記生成手段は、
前記少なくとも１つのベース画像内の人物の顔と前記少なくとも１つのユーザ画像内の人物の顔とを変換した合成画像を生成することと
を行う、項目１〜７のいずれか一項に記載の情報処理装置。
（項目９）
前記ベース画像および前記ユーザ画像は音声を含み、
前記生成手段は、
前記少なくとも１つのベース画像内の人物の音声と前記少なくとも１つのユーザ画像内の人物の音声とを変換した合成画像を生成することと
を行う、項目１〜８に記載の情報処理装置。
（項目１０）
前記生成手段は、
前記少なくとも１つのベース画像内の人物の体型と前記少なくとも１つのユーザ画像内の人物の体型とを変換した合成画像を生成することと
を行う、項目１〜９のいずれか一項に記載の情報処理装置。
（項目１１）
前記第１の取得手段は、第１の人物および第２の人物を含む複数の人物のユーザ画像を取得し、
前記生成手段は、
前記少なくとも１つのベース画像内の第１の人物の少なくとも一部と前記ユーザ画像内の第１の人物の少なくとも一部とを変換し、前記少なくとも１つのベース画像内の第２の人物の少なくとも一部と前記ユーザ画像内の第２の人物の少なくとも一部とを変換した合成画像を生成することと
を行う、項目１〜１０のいずれか一項に記載の情報処理装置。
（項目１２）
前記生成手段は、
前記ユーザ画像内の第１の人物と前記ユーザ画像内の第２の人物との間の関係に基づいて、前記ユーザ画像内の第１の人物の少なくとも一部が合成されるべき前記少なくとも１つのベース画像内の第１の人物を決定し、前記ユーザ画像内の第２の人物の少なくとも一部が合成されるべき前記少なくとも１つのベース画像内の第２の人物を決定すること
をさらに行う、項目１１に記載の情報処理装置。
（項目１３）
前記少なくとも１つのベース画像の各々は、一組のサブベース画像を含み、各サブベース画像は、内容が同一であるが、写っている人物がそれぞれ異なっており、
前記生成手段は、
前記少なくとも１つのユーザ画像内の人物に最も類似する人物が写っているサブベース画像を決定することと、
前記決定されたサブベース画像内の前記人物の少なくとも一部と前記少なくとも１つのユーザ画像内の人物の少なくとも一部とを変換した合成画像を生成することと
を行う、項目１〜１２のいずれか一項に記載の情報処理装置。
（項目１４）
前記ユーザ画像は、ユーザ自身の画像である、項目１〜１３のいずれか一項に記載の情報処理装置。
（項目１５）
前記合成画像は、広告動画である、項目１〜１４のいずれか一項に記載の情報処理装置。
（項目１６）
ユーザのための合成画像を生成するためのプログラムであって、前記プログラムは、プロセッサ部を備える情報処理装置において実行され、前記プログラムは、
少なくとも１つのユーザ画像を取得することと、
改変が許諾されている少なくとも１つのベース画像を取得することと、
前記少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて合成画像を生成することと
を含む処理を実行することを前記プロセッサ部に行わせる、プログラム。
（項目１７）
ユーザのための合成画像を生成するための方法であって、前記方法は、プロセッサ部を備える情報処理装置において実行され、前記方法は、
前記プロセッサ部が、少なくとも１つのユーザ画像を取得することと、
前記プロセッサ部が、改変が許諾されている少なくとも１つのベース画像を取得することと、
前記プロセッサ部が、前記少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて合成画像を生成することと
を含む、方法。
（項目１８）
ユーザのための合成画像を提供するための端末装置であって、前記端末装置はサーバ装置と通信することが可能であり、前記端末装置は、
少なくとも１つのユーザ画像を取得する取得手段と、
前記サーバ装置に前記少なくとも１つのユーザ画像を送信する送信手段と、
前記サーバ装置から、改変が許諾されている少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて生成された合成画像を受信する受信手段と
前記合成画像を出力する出力手段と
を備える端末装置。
（項目１９）
合成画像を生成することの許可をユーザから受信する受信手段をさらに備える、項目１８に記載の端末装置。
（項目２０）
ユーザのための合成画像を提供するためのプログラムであって、前記プログラムは、プロセッサ部を備える端末装置において実行され、前記端末装置は、サーバ装置と通信することが可能であり、前記プログラムは、
少なくとも１つのユーザ画像を取得することと、
前記サーバ装置に前記少なくとも１つのユーザ画像を送信することと、
前記サーバ装置から、改変が許諾されている少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて生成された合成画像を受信することと
前記合成画像を出力することと
を含む処理を実行することを前記プロセッサ部に行わせる、プログラム。
（項目２１）
ユーザのための合成画像を提供するための方法であって、前記方法は、サーバ装置と通信することが可能な端末装置において実行され、前記方法は、
少なくとも１つのユーザ画像を取得することと、
前記サーバ装置に前記少なくとも１つのユーザ画像を送信することと、
前記サーバ装置から、改変が許諾されている少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて生成された合成画像を受信することと、
前記合成画像を出力することと
を含む処理を実行することを前記プロセッサ部に行わせる、プログラム。
（項目２２）
ユーザのための合成画像を生成するためのコンピュータシステムであって、前記コンピュータシステムは、サーバ装置と、前記サーバ装置と通信することが可能な少なくとも１つの端末装置とを備え、
前記端末装置は、
少なくとも１つのユーザ画像を取得することと、
前記サーバ装置に前記少なくとも１つのユーザ画像を送信することと
を行うように構成され、
前記サーバ装置は、
前記少なくとも１つのユーザ画像を取得する前記サーバ装置から受信することと、
改変が許諾されている少なくとも１つのベース画像を取得することと、
前記少なくとも１つのベース画像と前記少なくとも１つのユーザ画像とに基づいて合成画像を生成することと、
前記合成画像を前記端末装置に送信することと
を行うように構成され、
前記端末装置は、
前記サーバ装置から、前記合成画像を受信することと、
前記合成画像を出力することと
を行うようにさらに構成されている、コンピュータシステム。
The present invention provides, for example, the following items.
(Item 1)
An information processing apparatus for generating a composite image for a user, comprising:
first acquisition means for acquiring at least one user image;
second acquisition means for acquiring at least one base image for which modification is permitted; and
generation means for generating a composite image based on the at least one base image and the at least one user image.
(Item 2)
The information processing apparatus according to item 1, further comprising providing means for providing the composite image to the user.
(Item 3)
The information processing apparatus according to item 2, wherein the providing means automatically provides the composite image to the user without receiving, from the user, a request to provide the composite image.
(Item 4)
The information processing apparatus according to item 2, wherein the providing means performs:
notifying the user that the composite image can be provided;
receiving, from the user, a request to provide the composite image; and
providing the composite image to the user in response to receiving the request from the user.
(Item 5)
The information processing apparatus according to item 2, wherein the second acquisition means acquires a plurality of base images, and
the providing means performs:
providing the user with the plurality of base images as options;
receiving, from the user, an input selecting at least one of the plurality of base images; and
in response to receiving the input, providing the user with a composite image generated based on the selected at least one base image and the at least one user image.
(Item 6)
The information processing apparatus according to any one of items 2, 4, and 5, wherein a plurality of persons appear in the at least one base image,
the providing means performs:
providing the user with the plurality of persons in the at least one base image as options;
receiving, from the user, an input selecting at least one of the plurality of persons in the at least one base image; and
providing the composite image to the user in response to receiving the input, and
the composite image is a composite image obtained by converting at least a part of the selected at least one person in the at least one base image and at least a part of a person in the at least one user image.
(Item 7)
The information processing apparatus according to any one of items 1 to 6, wherein the generation means performs
generating a composite image obtained by converting at least a part of the at least one base image and at least a part of a person in the at least one user image.
(Item 8)
The information processing apparatus according to any one of items 1 to 7, wherein the generation means performs
generating a composite image obtained by converting a face of a person in the at least one base image and a face of a person in the at least one user image.
(Item 9)
The information processing apparatus according to any one of items 1 to 8, wherein the base image and the user image include audio, and
the generation means performs
generating a composite image obtained by converting a voice of a person in the at least one base image and a voice of a person in the at least one user image.
(Item 10)
The information processing apparatus according to any one of items 1 to 9, wherein the generation means performs
generating a composite image obtained by converting a body shape of a person in the at least one base image and a body shape of a person in the at least one user image.
(Item 11)
The information processing apparatus according to any one of items 1 to 10, wherein the first acquisition means acquires user images of a plurality of persons including a first person and a second person, and
the generation means performs
generating a composite image obtained by converting at least a part of a first person in the at least one base image and at least a part of the first person in the user image, and by converting at least a part of a second person in the at least one base image and at least a part of the second person in the user image.
(Item 12)
The information processing apparatus according to item 11, wherein the generation means further performs
determining, based on a relationship between the first person in the user image and the second person in the user image, the first person in the at least one base image with whom at least a part of the first person in the user image is to be composited, and the second person in the at least one base image with whom at least a part of the second person in the user image is to be composited.
(Item 13)
The information processing apparatus according to any one of items 1 to 12, wherein each of the at least one base image includes a set of sub-base images, the sub-base images having the same content but each showing a different person, and
the generation means performs:
determining the sub-base image showing the person most similar to the person in the at least one user image; and
generating a composite image obtained by converting at least a part of the person in the determined sub-base image and at least a part of the person in the at least one user image.
(Item 14)
The information processing apparatus according to any one of items 1 to 13, wherein the user image is an image of the user himself or herself.
(Item 15)
The information processing apparatus according to any one of items 1 to 14, wherein the composite image is an advertisement video.
(Item 16)
A program for generating a composite image for a user, the program being executed in an information processing apparatus including a processor unit, the program causing the processor unit to execute processing including:
acquiring at least one user image;
acquiring at least one base image for which modification is permitted; and
generating a composite image based on the at least one base image and the at least one user image.
(Item 17)
A method for generating a composite image for a user, wherein the method is performed in an information processing apparatus including a processor unit, the method comprising:
The processor unit acquiring at least one user image;
The processor unit acquiring at least one base image for which modification is permitted;
The processor unit generating a composite image based on the at least one base image and the at least one user image.
(Item 18)
A terminal device for providing a composite image for a user, the terminal device being capable of communicating with a server device, the terminal device comprising:
acquisition means for acquiring at least one user image;
transmission means for transmitting the at least one user image to the server device;
reception means for receiving, from the server device, a composite image generated based on at least one base image for which modification is permitted and the at least one user image; and output means for outputting the composite image.
(Item 19)
The terminal device according to item 18, further comprising reception means for receiving, from the user, permission to generate the composite image.
(Item 20)
A program for providing a composite image for a user, the program being executed in a terminal device including a processor unit, the terminal device being capable of communicating with a server device, the program causing the processor unit to execute processing including:
acquiring at least one user image;
transmitting the at least one user image to the server device;
receiving, from the server device, a composite image generated based on at least one base image for which modification is permitted and the at least one user image; and outputting the composite image.
(Item 21)
A method for providing a composite image for a user, the method being performed in a terminal device capable of communicating with a server device, the method comprising:
acquiring at least one user image;
transmitting the at least one user image to the server device;
receiving, from the server device, a composite image generated based on at least one base image for which modification is permitted and the at least one user image; and
outputting the composite image.
(Item 22)
A computer system for generating a composite image for a user, the computer system comprising a server device and at least one terminal device capable of communicating with the server device, wherein
the terminal device is configured to perform:
acquiring at least one user image; and
transmitting the at least one user image to the server device,
the server device is configured to perform:
receiving the at least one user image from the terminal device;
acquiring at least one base image for which modification is permitted;
generating a composite image based on the at least one base image and the at least one user image; and
transmitting the composite image to the terminal device, and
the terminal device is further configured to perform:
receiving the composite image from the server device; and
outputting the composite image.
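
The arrangement of Item 1 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation described in this publication: the names `Image`, `CompositeImageApparatus`, `label_only_synthesizer`, and the `modification_permitted` flag are all hypothetical, and the actual conversion of a face, voice, or body shape is replaced by a stand-in that merely labels the result.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Image:
    """A still or moving image (hypothetical representation)."""
    name: str
    modification_permitted: bool = False  # hypothetical licensing flag


# Hypothetical stand-in for the real conversion step (face, voice, body shape).
Synthesizer = Callable[[Image, Image], str]


def label_only_synthesizer(base: Image, user: Image) -> str:
    """Return a label instead of real image data; for illustration only."""
    return f"{base.name}+{user.name}"


class CompositeImageApparatus:
    def __init__(self, catalog: Sequence[Image],
                 synthesize: Synthesizer = label_only_synthesizer) -> None:
        self._catalog = list(catalog)
        self._synthesize = synthesize

    def acquire_user_images(self, uploads: Sequence[Image]) -> List[Image]:
        """First acquisition means: at least one user image is required."""
        if not uploads:
            raise ValueError("at least one user image is required")
        return list(uploads)

    def acquire_base_images(self) -> List[Image]:
        """Second acquisition means: keep only base images whose modification is permitted."""
        permitted = [img for img in self._catalog if img.modification_permitted]
        if not permitted:
            raise LookupError("no base image with modification permitted")
        return permitted

    def generate(self, base_images: Sequence[Image],
                 user_images: Sequence[Image]) -> List[str]:
        """Generation means: one composite per base image, using the first user image."""
        user = user_images[0]
        return [self._synthesize(base, user) for base in base_images]
```

In this sketch the permission check sits in the second acquisition step, so a base image without permission never reaches the generation step; where such a check actually belongs is a design assumption, not something the items specify.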

本発明によれば、ユーザの他人に成り代わってみたいという願望を叶えてくれるサービスを実現することが可能なサーバ装置等を提供することが可能である。これにより、新たなメディア体験をユーザに提供することが可能である。 According to the present invention, it is possible to provide a server device or the like that can realize a service that fulfills a user's desire to impersonate another person. This makes it possible to provide a new media experience to the user.

ユーザのための合成画像を提供するという新たなサービスのフローを概略的に示す図。 A diagram schematically showing the flow of the new service of providing a composite image for a user.
変換される前の元画像を再生している様子および変換された後の合成画像を再生している様子を示す図。 A diagram showing the original image being played before conversion and the composite image being played after conversion.
ベース画像をユーザに選択させるための選択画面１０の一例が表示された端末装置１００を示す図。 A diagram showing the terminal device 100 displaying an example of the selection screen 10 for having the user select a base image.
成り代わりたい人物をユーザに選択させるための選択画面２０の一例が表示された端末装置１００を示す図。 A diagram showing the terminal device 100 displaying an example of the selection screen 20 for having the user select the person whose place the user wants to take.
合成画像を視聴可能である旨の通知を表示する通知画面３０の一例が表示された端末装置１００を示す図。 A diagram showing the terminal device 100 displaying an example of the notification screen 30, which notifies the user that a composite image can be viewed.
ユーザのための合成画像を提供するためのコンピュータシステム１０００の構成の一例を示す図。 A diagram showing an example of the configuration of the computer system 1000 for providing a composite image for a user.
端末装置１００の構成の一例を示すブロック図。 A block diagram showing an example of the configuration of the terminal device 100.
サーバ装置２００の構成の一例を示すブロック図。 A block diagram showing an example of the configuration of the server device 200.
サーバ装置２００のプロセッサ部２３０の構成の一例を示すブロック図。 A block diagram showing an example of the configuration of the processor unit 230 of the server device 200.
ユーザのための合成画像を提供するためのコンピュータシステム１０００における処理の一例を示すフローチャート。 A flowchart showing an example of processing in the computer system 1000 for providing a composite image for a user.
ユーザのための合成画像を提供するためのコンピュータシステム１０００における処理の一例を示すフローチャート。 A flowchart showing an example of processing in the computer system 1000 for providing a composite image for a user.
ユーザのための合成画像を提供するためのコンピュータシステム１０００における処理の一例を示すフローチャート。 A flowchart showing an example of processing in the computer system 1000 for providing a composite image for a user.
ユーザのための合成画像を提供するためのコンピュータシステム１０００における処理の一例を示すフローチャート。 A flowchart showing an example of processing in the computer system 1000 for providing a composite image for a user.

（定義） (Definition)
本明細書において「画像」は、静止画および動画を含む。静止画および動画は、音声を含んでもよいし、含まなくてもよい。音声を含まない静止画または動画は、映像と呼ぶ。画像は、静止画よりも動画であることが好ましい。動画は静止画よりも情報量が多く、かつ、表現の幅が大きいからである。動画は、静止画に比べて、ユーザにとってより魅力的なコンテンツを表現することができる。 As used herein, "image" includes still images and moving images. Still images and moving images may or may not include audio. A still image or moving image that does not include audio is referred to as video. The image is preferably a moving image rather than a still image, because a moving image carries more information than a still image and has a wider range of expression. A moving image can therefore present content that is more attractive to the user than a still image.

本明細書において「ユーザ画像」は、ユーザ本人が写っている画像、または、ユーザの家族、親族もしくは友人等のユーザに関連する人物が写っている画像を含む。ユーザに関連する人物は、例えば、ユーザと血縁関係または婚姻関係でつながりを持つ人物であり得る。ユーザに関連する人物は、例えば、ユーザがその人物の肖像権について責任を負うことができる人物を意味する。 As used herein, "user image" includes an image in which the user himself or herself appears, or an image in which a person related to the user, such as a family member, relative, or friend of the user, appears. A person related to the user may be, for example, a person connected to the user by blood or marriage. A person related to the user means, for example, a person whose portrait rights the user can take responsibility for.

本明細書において「ベース画像」は、合成画像のベースとなる画像を意味する。ベース画像は、例えば、映画、番組、広告等の企業が著作権を有する画像であり得る。ベース画像は、例えば、有名人自らが撮影した画像等の有名人自身が著作権を有する画像であり得る。ベース画像は、例えば、ユーザ画像であってもよい。 As used herein, "base image" means an image that serves as the base of a composite image. The base image may be, for example, an image whose copyright is held by a company, such as a movie, a program, or an advertisement. The base image may also be, for example, an image whose copyright is held by a celebrity, such as an image taken by the celebrity himself or herself. The base image may also be, for example, a user image.

以下、図面を参照しながら、本発明の実施の形態を説明する。 Hereinafter, embodiments of the present invention will be described with reference to the drawings.

１．ユーザのための合成画像を提供するという新たなサービス 1. New service of providing composite images for users
本発明の発明者は、ユーザのための合成画像を提供するという新たなサービスを開発した。そのサービスとは、映画、番組、広告等に登場する人物の画像の少なくとも一部（例えば、顔の画像）および／またはその人物の音声をその人物とは異なるユーザの画像の少なくとも一部（例えば、顔の画像）および／またはそのユーザの音声に変換した合成画像をそのユーザに提供するというものである。この新たなサービスにより、ユーザは、あたかも自分がその映画、番組、広告等に出演したかのような画像を視聴することができるようになる。 The inventor of the present invention has developed a new service of providing a composite image for a user. The service provides a user with a composite image in which at least a part of the image of a person appearing in a movie, program, advertisement, or the like (for example, the image of the face) and/or that person's voice has been converted into at least a part of the image of the user, who is a different person (for example, the image of the face), and/or the user's voice. With this new service, the user can view an image as if the user had appeared in the movie, program, advertisement, or the like.

上述したように、人はだれしも他人に成り代わってみたいという願望を抱いている。この新たなサービスによれば、映画の俳優に成り代わってみたい、広告の芸能人に成り代わってみたい等の願望を仮想的に画像上で叶えることができる。ユーザは、この新たなサービスにより、今までにない新たなメディア体験をすることができる。 As mentioned above, everyone has a desire to impersonate others. According to this new service, a desire to take the place of an actor in a movie or an entertainer in an advertisement can be virtually fulfilled on an image. With this new service, users can enjoy a new media experience like never before.

図１Ａは、ユーザのための合成画像を提供するという新たなサービスのフローを概略的に示す図である。端末装置１００を使用するユーザが、このサービスを利用する場合を例に説明する。 FIG. 1A is a diagram schematically illustrating a flow of a new service of providing a composite image for a user. A case in which a user using the terminal device 100 uses this service will be described as an example.

まず、ステップＳ１において、ユーザは、ユーザのための合成画像を提供するという新たなサービスを利用するために、利用登録を行う。例えば、端末装置１００を用いて専用アプリケーションを起動し、必要情報を入力することによって利用登録をすることができる。利用登録の際、ユーザは、他人に成り代わりたい願望があることを表明することができる。これは、例えば、ユーザ画像に基づいて合成画像を生成することの許可として専用アプリケーションに入力されるようにしてもよい。このアプリケーションは、例えば、端末装置１００にインストールされているローカルアプリケーションであってもよいし、ウェブブラウザを介して利用可能なウェブアプリケーションであってもよい。端末装置１００は、スマートフォンとして描かれているが、タブレット、パーソナルコンピュータ、スマートグラス等のユーザと相互作用する任意の端末装置であり得る。 First, in step S1, the user registers for the new service of providing a composite image for the user. For example, the user can register by launching a dedicated application on the terminal device 100 and entering the necessary information. At registration, the user can state a desire to take the place of another person. This statement may, for example, be entered into the dedicated application as permission to generate a composite image based on the user image. The application may be, for example, a local application installed on the terminal device 100 or a web application available via a web browser. Although the terminal device 100 is depicted as a smartphone, it can be any terminal device that interacts with the user, such as a tablet, a personal computer, or smart glasses.

利用登録が完了すると、ステップＳ２において、ユーザは、自身が写っている画像をサーバ装置２００にアップロードする。画像は、例えば、過去に撮影した画像であってもよいし、アップロードに際して撮影した画像であってもよい。ユーザは、画像の他に、例えば、自身の音声もアップロードすることができる。音声は、例えば、過去に録音された音声であってもよいし、アップロードに際して録音された音声であってもよいし、画像に含まれる音声であってもよい。音声が画像に含まれる音声である場合は、音声付き画像をアップロードすることにより、画像および音声のアップロードが達成される。例えば、ユーザは、画像または音声をアップロードする代わりに、またはこれに加えて、Ｆａｃｅｂｏｏｋ、Ｉｎｓｔａｇｒａｍ等のＳＮＳ上に既にアップロードしてある画像または音声の所在を指定することによって、サーバ装置２００に画像または音声を取得させるようにしてもよい。 When registration is complete, in step S2 the user uploads an image in which the user appears to the server device 200. The image may be, for example, an image captured in the past or an image captured at the time of upload. In addition to the image, the user can also upload, for example, his or her own voice. The voice may be, for example, audio recorded in the past, audio recorded at the time of upload, or audio included in the image. When the voice is audio included in the image, uploading the image with its audio accomplishes the upload of both the image and the voice. Instead of, or in addition to, uploading an image or voice, the user may, for example, specify the location of an image or voice already uploaded to a social networking service such as Facebook or Instagram, so that the server device 200 acquires the image or voice from that location.

ユーザの画像（および音声）がサーバ装置２００にアップロードされた後、ステップＳ３において、サーバ装置２００が、ユーザのための合成画像を生成する。 After the image (and sound) of the user is uploaded to the server device 200, in step S3, the server device 200 generates a composite image for the user.

ユーザのための合成画像が生成された後、ステップＳ４において、ユーザのための合成画像が端末装置１００に提供される。 After the composite image for the user is generated, the composite image for the user is provided to the terminal device 100 in step S4.

ステップＳ５において、ユーザのための合成画像が、端末装置１００で再生される。 In step S5, the composite image for the user is reproduced on the terminal device 100.

例えば、図１Ｂに示されるように、映画中の俳優Ａの顔の画像をユーザの顔の画像に変換した合成画像が端末装置１００で再生される。図１Ｂの左側の図が変換される前の元画像を再生している様子を示し、図１Ｂの右側の図が変換された後の合成画像を再生している様子を示している。例えば、映画中の俳優Ａの音声をユーザの音声に変換するようにしてもよい。これにより、ユーザは、自分が俳優Ａの代わりに登場する映画を端末装置１００において視聴することができる。 For example, as shown in FIG. 1B, a composite image in which the image of the face of actor A in a movie has been converted into the image of the user's face is played on the terminal device 100. The left side of FIG. 1B shows the original image being played before conversion, and the right side of FIG. 1B shows the composite image being played after conversion. The voice of actor A in the movie may, for example, also be converted into the user's voice. The user can thereby watch, on the terminal device 100, a movie in which the user appears in place of actor A.
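
Steps S1 to S5 above can be sketched as a small Python simulation. This is a rough illustration under stated assumptions, not the system of this publication: the `Server` class, its method names, and `run_flow` are hypothetical, images are represented by plain strings, and the composite is a descriptive label rather than real image data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    """Hypothetical stand-in for the server device 200."""
    consent: Dict[str, bool] = field(default_factory=dict)
    uploads: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, user_id: str, allow_composites: bool) -> None:
        # S1: registration, including permission to generate composites.
        self.consent[user_id] = allow_composites

    def upload(self, user_id: str, image: str) -> None:
        # S2: only registered users can upload a user image.
        if user_id not in self.consent:
            raise PermissionError("user is not registered")
        self.uploads.setdefault(user_id, []).append(image)

    def generate(self, user_id: str, base_image: str) -> str:
        # S3: generation requires both permission and an uploaded image.
        if not self.consent.get(user_id, False):
            raise PermissionError("user has not permitted composite generation")
        images = self.uploads.get(user_id)
        if not images:
            raise LookupError("no user image uploaded")
        return f"{base_image} with face from {images[-1]}"


def run_flow(server: Server, user_id: str, user_image: str,
             base_image: str) -> List[str]:
    """Walk through S1 to S5 once and return a log of the steps."""
    log: List[str] = []
    server.register(user_id, allow_composites=True)
    log.append("S1 registered")
    server.upload(user_id, user_image)
    log.append("S2 uploaded")
    composite = server.generate(user_id, base_image)
    log.append("S3 generated")
    log.append("S4 provided")  # server device -> terminal device
    log.append(f"S5 playing: {composite}")
    return log
```

For example, `run_flow(Server(), "u1", "selfie.jpg", "movie_11")` returns a five-entry log whose last entry describes the composite being played on the terminal.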

ステップＳ５では、例えば、ユーザが合成画像の元となる画像（ベース画像）を選択することに応答して、合成画像が提供されるようにしてもよい。 In step S5, for example, the composite image may be provided in response to the user selecting the image on which the composite image is based (the base image).

図１Ｃは、ベース画像をユーザに選択させるための選択画面１０の一例が表示された端末装置１００を示す。図１Ｃの選択画面１０では、ベース画像の選択肢として、複数の映画が表示されている。ユーザは、選択画面１０において、出演する俳優の代わりに登場してみたい映画を選択することができる。ユーザが登場したい映画を選択すると、その選択された映画中の俳優の顔の画像および／または音声をユーザの顔および／または音声に変換した合成画像が、端末装置１００で再生される。例えば、ユーザが選択画面の映画１１を選択すると、図１Ｂの右側の図に示されるように、映画中の俳優Ａの顔の画像および音声をユーザの顔の画像および音声に変換した合成画像が端末装置１００で再生される。 FIG. 1C shows the terminal device 100 displaying an example of the selection screen 10 for having the user select a base image. On the selection screen 10 of FIG. 1C, a plurality of movies are displayed as base image options. On the selection screen 10, the user can select a movie in which the user would like to appear in place of an actor in that movie. When the user selects a movie, a composite image in which the image of the face and/or the voice of an actor in the selected movie has been converted into the user's face and/or voice is played on the terminal device 100. For example, when the user selects the movie 11 on the selection screen, a composite image in which the image of the face and the voice of actor A in the movie have been converted into the image of the user's face and the user's voice is played on the terminal device 100, as shown on the right side of FIG. 1B.

あるいは、ステップＳ５では、例えば、選択画面１０でベース画像を選択した後に、そのベース画像に出演する人物のうちの成り代わりたい人物を選択することに応答して、合成画像が提供されるようにしてもよい。 Alternatively, in step S5, for example, the composite image may be provided in response to the user selecting a base image on the selection screen 10 and then selecting, from among the persons appearing in that base image, the person whose place the user wants to take.

図１Ｄは、成り代わりたい人物をユーザに選択させるための選択画面２０の一例が表示された端末装置１００を示す。選択画面２０は、例えば、図１Ｃの選択画面１０から遷移した画面である。図１Ｄの選択画面２０では、成り代わることが可能な人物の選択肢として、複数の俳優が表示されている。ユーザは、選択画面２０において、成り代わりたい俳優を選択することができる。ユーザが図１Ｃの選択画面１０で登場したい映画を選択し、図１Ｄの選択画面２０で成り代わりたい俳優を選択すると、その選択された映画中の選択された俳優の顔の画像および／または音声をユーザの顔の画像および／または音声に変換した合成画像が、端末装置１００で再生される。例えば、ユーザが選択画面１０の映画１１を選択し、選択画面２０の俳優２１を選択すると、図１Ｂの右側の図に示されるように、映画中の俳優Ａの顔の画像および／または音声をユーザの顔の画像および／または音声に変換した合成画像が端末装置１００で再生される。 FIG. 1D shows the terminal device 100 displaying an example of the selection screen 20 for having the user select the person whose place the user wants to take. The selection screen 20 is, for example, a screen reached from the selection screen 10 of FIG. 1C. On the selection screen 20 of FIG. 1D, a plurality of actors are displayed as options for persons whose place the user can take. On the selection screen 20, the user can select the actor whose place the user wants to take. When the user selects a movie on the selection screen 10 of FIG. 1C and selects an actor on the selection screen 20 of FIG. 1D, a composite image in which the image of the face and/or the voice of the selected actor in the selected movie has been converted into the image of the user's face and/or the user's voice is played on the terminal device 100. For example, when the user selects the movie 11 on the selection screen 10 and the actor 21 on the selection screen 20, a composite image in which the image of the face and/or the voice of actor A in the movie has been converted into the image of the user's face and/or the user's voice is played on the terminal device 100, as shown on the right side of FIG. 1B.

あるいは、ステップＳ５では、例えば、ベース画像を選択することなしに、成り代わりたい人物を選択することに応答して、合成画像が提供されるようにしてもよい。 Alternatively, in step S5, for example, a composite image may be provided in response to selecting a person to be replaced without selecting a base image.

例えば、図１Ｃの選択画面１０を経ることなく図１Ｄの選択画面２０を表示し、成り代わりたい俳優をユーザに選択させるようにすることができる。ユーザが選択画面２０で成り代わりたい俳優を選択すると、或る画像（例えば、映画）において選択された俳優の顔の画像および／または音声をユーザの顔の画像および／または音声に変換した合成画像が、端末装置１００で再生される。このとき、或る画像は、例えば、ランダムに決定されるベース画像であってもよいし、ベース画像提供者またはこのサービスの提供者による恣意的なベース画像であってもよい。例えば、ユーザが選択画面２０の俳優２１を選択すると、図１Ｂの右側の図に示されるように、映画中の俳優Ａの顔の画像および／または音声をユーザの顔の画像および／または音声に変換した合成画像が端末装置１００で再生される。 For example, the selection screen 20 of FIG. 1D can be displayed without going through the selection screen 10 of FIG. 1C, so that the user selects the actor whose place the user wants to take. When the user selects an actor on the selection screen 20, a composite image in which the image of the face and/or the voice of the selected actor in a certain image (for example, a movie) has been converted into the image of the user's face and/or the user's voice is played on the terminal device 100. The certain image may be, for example, a base image determined at random, or a base image chosen at the discretion of the base image provider or the provider of this service. For example, when the user selects the actor 21 on the selection screen 20, a composite image in which the image of the face and/or the voice of actor A in the movie has been converted into the image of the user's face and/or the user's voice is played on the terminal device 100, as shown on the right side of FIG. 1B.

上記３つの例では、ユーザが登場したいベース画像および／またはユーザが成り代わりたい俳優をユーザが選択することにより、ユーザは自分の好みに応じたパーソナライズされた合成画像を見ることができ、これにより、ユーザは、新たなメディア体験をすることができる。 In the above three examples, the user selects a base image that the user wants to appear and / or an actor that the user wants to take over, so that the user can see a personalized composite image according to his or her preference, The user can have a new media experience.
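
The three selection patterns above (base image only, base image and person, person only) can be sketched as one resolver function in Python. This is a minimal sketch with hypothetical names (`Catalog`, `resolve_selection`); defaulting to the first listed person when only a base image is chosen is an assumption made here for illustration, since the text does not say how the person is chosen in that case.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical catalog: base image title -> persons whose place can be taken.
Catalog = Dict[str, List[str]]


def resolve_selection(catalog: Catalog,
                      base: Optional[str] = None,
                      person: Optional[str] = None,
                      rng: Optional[random.Random] = None) -> Tuple[str, str]:
    """Return the (base image, person) pair to use for the composite."""
    rng = rng or random.Random()
    if base is not None:
        people = catalog[base]  # raises KeyError for an unknown base image
        if person is None:
            # Pattern 1: base image only. Assumption: use the first listed person.
            return base, people[0]
        if person not in people:
            raise ValueError(f"{person} does not appear in {base}")
        # Pattern 2: base image and person both selected by the user.
        return base, person
    if person is None:
        raise ValueError("select a base image, a person, or both")
    # Pattern 3: person only. The base image is picked at random among
    # the base images in which that person appears.
    candidates = sorted(t for t, people in catalog.items() if person in people)
    if not candidates:
        raise LookupError(f"no base image features {person}")
    return rng.choice(candidates), person
```

Passing a seeded `random.Random` as `rng` makes the third pattern reproducible; a provider-chosen base image, which the text also allows, could replace the random choice.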

あるいは、ステップＳ５では、例えば、ユーザが端末装置１００で動画を視聴しているときに合成画像を視聴可能である旨の通知を受信し、ユーザがその通知に対して合成画像を提供する要求を送信したときに、合成画像が提供されるようにしてもよい。 Alternatively, in step S5, for example, the user may receive a notification that a composite image can be viewed while watching a video on the terminal device 100, and the composite image may be provided when the user responds to that notification by sending a request to provide the composite image.

図１Ｅは、合成画像を視聴可能である旨の通知を表示する通知画面３０の一例が表示された端末装置１００を示す。通知画面３０は、例えば、図１Ｂの左側に示される映画を視聴しているときに、映画の再生が休止されて表示される画面である。図１Ｅの通知画面３０では、「面白い広告を視聴することができます。視聴しますか？」というメッセージが表示されており、このメッセージに対して「はい」または「いいえ」で応答するようになっている。ユーザが通知画面３０で「はい」を選択すると、或る画像（例えば、広告）に登場する人物の顔の画像および／または音声をユーザの顔の画像および／または音声に変換した合成画像が、端末装置１００で再生される。ユーザが通知画面３０で「いいえ」を選択すると、休止されていた動画の再生が開始されるか、または、或る画像（例えば、広告）が変換されることなくそのまま端末装置１００で再生される。 FIG. 1E shows the terminal device 100 displaying an example of the notification screen 30, which notifies the user that a composite image can be viewed. The notification screen 30 is, for example, a screen displayed when playback of the movie shown on the left side of FIG. 1B is paused while the user is watching it. The notification screen 30 of FIG. 1E displays the message "You can watch an interesting advertisement. Would you like to watch it?", to which the user responds with "Yes" or "No". When the user selects "Yes" on the notification screen 30, a composite image in which the image of the face and/or the voice of a person appearing in a certain image (for example, an advertisement) has been converted into the image of the user's face and/or the user's voice is played on the terminal device 100. When the user selects "No" on the notification screen 30, either playback of the paused video resumes, or the certain image (for example, the advertisement) is played on the terminal device 100 as it is, without conversion.

あるいは、ステップＳ５では、例えば、ユーザが端末装置１００で動画を視聴しているときに、突然、或る画像（例えば、広告）に登場する人物の顔の画像および／または音声をユーザの顔の画像および／または音声に変換した合成画像が、端末装置１００で再生される。例えば、図１Ｂの左側に示される映画を視聴しているときに、突然、映画の再生が休止されて、広告に登場する人物の顔の画像および／または音声をユーザの顔の画像および音声に変換した合成画像が端末装置１００で再生される。合成画像の再生が終了すると、休止されていた動画の再生が開始される。 Alternatively, in step S5, for example, while the user is watching a moving image on the terminal device 100, a composite image in which the face image and/or voice of a person appearing in a certain image (for example, an advertisement) has been converted into the user's face image and/or voice is suddenly played on the terminal device 100. For example, while the user is watching the movie shown on the left side of FIG. 1B, playback of the movie is suddenly paused, and a composite image in which the face image and/or voice of a person appearing in an advertisement has been converted into the user's face image and voice is played on the terminal device 100. When playback of the composite image ends, playback of the paused moving image resumes.

上記２つの例では、偶然にパーソナライズされた画像に出会うことができ、これにより、ユーザは、新たなメディア体験をすることができる。 In the above two examples, the user can come across a personalized image by chance, which gives the user a new media experience.

図１Ａに示される例では、ユーザがログインした上で能動的に自身の画像をサーバ装置２００にアップロードしたが、例えば、端末装置１００のカメラ（例えばＷｅｂカメラ）が撮影した画像を自動的にサーバ装置２００にアップロードするようにしてもよい。これにより、演者の顔の画像および／または音声を自身の顔の画像および／または音声に変換した合成画像がユーザの意図とは無関係に生成されることになる。生成された合成画像は、例えば、リアルタイムで端末装置１００で再生されるようにしてもよいし、時間をずらして端末装置１００で再生されるようにしてもよい。これにより、ユーザは、まったく予期せずパーソナライズされた画像を見ることになり、ユーザは、新たなメディア体験をすることができる。 In the example shown in FIG. 1A, the user logs in and actively uploads his or her own image to the server device 200. However, for example, images captured by a camera (for example, a web camera) of the terminal device 100 may instead be uploaded to the server device 200 automatically. In that case, a composite image in which the performer's face image and/or voice has been converted into the user's own face image and/or voice is generated regardless of the user's intention. The generated composite image may, for example, be played on the terminal device 100 in real time, or played on the terminal device 100 at a later time. The user thus sees a personalized image completely unexpectedly, which gives the user a new media experience.

上述したユーザのための合成画像を提供するという新たなサービスは、例えば、以下に説明するユーザのための合成画像を提供するためのコンピュータシステム１０００によって実現されることができる。 The above-described new service of providing a composite image for a user can be realized by, for example, a computer system 1000 for providing a composite image for a user described below.

２．ユーザのための合成画像を提供するためのコンピュータシステムの構成 2. Configuration of a computer system for providing a composite image for a user
図２は、ユーザのための合成画像を提供するためのコンピュータシステム１０００の構成の一例を示す。 FIG. 2 shows an example of the configuration of a computer system 1000 for providing a composite image for a user.

コンピュータシステム１０００は、少なくとも１つの端末装置１００と、少なくとも１つの端末装置１００にネットワーク４００を介して接続されているサーバ装置２００と、サーバ装置２００に接続されているデータベース部３００とを含む。 The computer system 1000 includes at least one terminal device 100, a server device 200 connected to the at least one terminal device 100 via a network 400, and a database unit 300 connected to the server device 200.

端末装置１００は、スマートフォン、タブレット、パーソナルコンピュータ、スマートグラス等のユーザと相互作用する任意の端末装置であり得る。端末装置１００は、例えば、映画館、遊園地、商業施設、駅等の施設に設置されたカメラとそのカメラからの映像を投影するスクリーンとを備えるコンピュータシステムを含み得る。端末装置１００は、ネットワーク４００を介してサーバ装置２００と通信することができる。ここで、ネットワーク４００の種類は問わない。例えば、端末装置１００は、インターネットを介してサーバ装置２００と通信してもよいし、ＬＡＮを介してサーバ装置２００と通信してもよい。図２には３つの端末装置１００が描写されているが、端末装置１００の数はこれに限定されない。端末装置１００の数は、１以上の任意の数であり得る。 The terminal device 100 may be any terminal device that interacts with a user, such as a smartphone, a tablet, a personal computer, or smart glasses. The terminal device 100 may include, for example, a computer system comprising a camera installed in a facility such as a movie theater, an amusement park, a commercial facility, or a station, and a screen onto which video from that camera is projected. The terminal device 100 can communicate with the server device 200 via the network 400. The type of the network 400 does not matter. For example, the terminal device 100 may communicate with the server device 200 via the Internet, or via a LAN. Although three terminal devices 100 are depicted in FIG. 2, the number of terminal devices 100 is not limited to this and may be any number of one or more.

サーバ装置２００は、ネットワーク４００を介して少なくとも１つの端末装置１００と通信することができる。例えば、サーバ装置２００は、サーバ装置２００に接続されているデータベース部３００から画像を取得し、取得した画像を少なくとも１つの端末装置１００に送信することができる。例えば、サーバ装置２００は、ネットワーク４００を介してサーバ装置２００に接続され得るベース画像提供者の端末装置５００からベース画像を取得し、取得したベース画像をデータベース部３００に格納のために送信することができる。 The server device 200 can communicate with the at least one terminal device 100 via the network 400. For example, the server device 200 can acquire an image from the database unit 300 connected to the server device 200 and transmit the acquired image to the at least one terminal device 100. For example, the server device 200 can acquire a base image from the terminal device 500 of a base image provider, which can be connected to the server device 200 via the network 400, and transmit the acquired base image to the database unit 300 for storage.

サーバ装置２００に接続されているデータベース部３００には、例えば、少なくとも１つの端末装置１００のユーザのユーザ画像、ベース画像、サーバ装置２００によって生成された合成画像等が格納される。データベース部３００には、例えば、ユーザの身体的特徴を示す情報（例えば、身長、体重、胸囲、胴囲、腰囲、肌の質感、肌年齢等）、ユーザの属性（例えば、年齢、性別、国籍、出身地等）、ユーザの挙動の特徴（例えば、癖、仕草等）、ユーザの人間関係における情報（例えば、恋人の情報（例えば、恋人の属性、身体的特徴、画像等）、配偶者の情報（例えば、配偶者の属性、身体的特徴、画像等）、両親の情報（例えば、両親の属性、身体的特徴、画像等）、兄弟の情報（例えば、兄弟の属性、身体的特徴、画像等）、親族の情報（例えば、親族の属性、身体的特徴、画像等）等）が格納され得る。データベース部３００には、例えば、ベース画像の改変が許諾されているか否かの情報、ベース画像の改変が許諾される期間の情報、ベース画像内の改変が許諾された人物の情報等がベース画像と関連付けられて格納され得る。データベース部３００は、任意の記憶手段によって実装され得る。 The database unit 300 connected to the server device 200 stores, for example, user images of the users of the at least one terminal device 100, base images, composite images generated by the server device 200, and the like. The database unit 300 may also store, for example, information indicating the user's physical characteristics (for example, height, weight, chest circumference, waist circumference, hip circumference, skin texture, skin age, etc.), the user's attributes (for example, age, gender, nationality, birthplace, etc.), characteristics of the user's behavior (for example, habits, gestures, etc.), and information on the user's personal relationships (for example, information on a partner, a spouse, parents, siblings, or relatives, in each case including, for example, that person's attributes, physical characteristics, images, etc.). The database unit 300 may further store, in association with each base image, for example, information on whether modification of the base image is permitted, information on the period for which modification of the base image is permitted, and information on the persons in the base image whose modification is permitted. The database unit 300 may be implemented by any storage means.

図３Ａは、端末装置１００の構成の一例を示す。 FIG. 3A shows an example of the configuration of the terminal device 100.

端末装置１００は、通信インターフェース部１１０と、入力部１２０と、表示部１３０と、メモリ部１４０と、プロセッサ部１５０とを備える。 The terminal device 100 includes a communication interface unit 110, an input unit 120, a display unit 130, a memory unit 140, and a processor unit 150.

通信インターフェース部１１０は、ネットワーク４００を介した通信を制御する。端末装置１００のプロセッサ部１５０は、通信インターフェース部１１０を介して、端末装置１００の外部から情報を受信することが可能であり、端末装置１００の外部に情報を送信することが可能である。通信インターフェース部１１０は、任意の方法で通信を制御し得る。 The communication interface unit 110 controls communication via the network 400. The processor unit 150 of the terminal device 100 can receive information from outside the terminal device 100 via the communication interface unit 110, and can transmit information to the outside of the terminal device 100. The communication interface unit 110 can control communication by an arbitrary method.

入力部１２０は、ユーザからの情報を端末装置１００に入力することを可能にする。入力部１２０は、例えば、ログインのためのアカウント名およびパスワード、ユーザの人間関係における情報、ユーザ画像に基づいて合成画像を生成することの許可等を端末装置１００に入力することを可能にする。入力部１２０がユーザからの情報をどのような態様で端末装置１００に入力することを可能にするかは問わない。例えば、入力部１２０がタッチパネルである場合には、ユーザがタッチパネルにタッチすることによって情報を入力するようにしてもよい。あるいは、入力部１２０がマウスである場合には、ユーザがマウスを操作することによって情報を入力するようにしてもよい。あるいは、入力部１２０がキーボードである場合には、ユーザがキーボードのキーを押下することによって情報を入力するようにしてもよい。あるいは、入力部１２０がマイクである場合には、ユーザがマイクに音声を入力することによって情報を入力するようにしてもよい。 The input unit 120 enables information from a user to be input to the terminal device 100. The input unit 120 enables the terminal device 100 to input, for example, an account name and a password for login, information on a human relationship of the user, permission to generate a composite image based on the user image, and the like. It does not matter how the input unit 120 can input information from the user to the terminal device 100. For example, when the input unit 120 is a touch panel, the user may input information by touching the touch panel. Alternatively, when the input unit 120 is a mouse, the user may input information by operating the mouse. Alternatively, when the input unit 120 is a keyboard, the user may input information by pressing a key on the keyboard. Alternatively, when the input unit 120 is a microphone, the user may input information by inputting voice to the microphone.

表示部１３０は、情報を表示するための任意のディスプレイであり得る。 Display unit 130 can be any display for displaying information.

メモリ部１４０には、端末装置１００における処理を実行するためのプログラムやそのプログラムの実行に必要とされるデータ等が格納されている。メモリ部１４０には、例えば、ユーザのための合成画像を生成するためのプログラム（例えば、後述する図４に示される処理を実現するプログラム）またはユーザのための合成画像を提供するためのプログラム（例えば、後述する図５、図６、図７に示される処理を実現するプログラム）の一部または全部が格納されている。メモリ部１４０には、任意の機能を実装するアプリケーションが格納されていてもよい。ここで、プログラムをどのようにしてメモリ部１４０に格納するかは問わない。例えば、プログラムは、メモリ部１４０にプリインストールされていてもよい。あるいは、プログラムは、ネットワーク４００を経由してダウンロードされることによってメモリ部１４０にインストールされるようにしてもよい。メモリ部１４０は、任意の記憶手段によって実装され得る。 The memory unit 140 stores a program for executing processing in the terminal device 100, data required for executing the program, and the like. The memory unit 140 includes, for example, a program for generating a composite image for a user (for example, a program for realizing a process illustrated in FIG. 4 described later) or a program for providing a composite image for a user ( For example, part or all of a program for realizing the processes shown in FIGS. 5, 6, and 7 described below is stored. The memory unit 140 may store an application that implements an arbitrary function. Here, it does not matter how the program is stored in the memory unit 140. For example, the program may be preinstalled in the memory unit 140. Alternatively, the program may be installed in the memory unit 140 by being downloaded via the network 400. The memory unit 140 can be implemented by any storage means.

プロセッサ部１５０は、端末装置１００全体の動作を制御する。プロセッサ部１５０は、メモリ部１４０に格納されているプログラムを読み出し、そのプログラムを実行する。これにより、端末装置１００を所望のステップを実行する装置として機能させることが可能である。プロセッサ部１５０は、単一のプロセッサによって実装されてもよいし、複数のプロセッサによって実装されてもよい。 The processor unit 150 controls the operation of the terminal device 100 as a whole. The processor unit 150 reads out a program stored in the memory unit 140 and executes the program. This allows the terminal device 100 to function as a device that executes a desired step. The processor unit 150 may be implemented by a single processor, or may be implemented by a plurality of processors.

端末装置１００は、上記構成に加えて、例えば、画像を撮影可能である任意のカメラを備え得る。カメラは、端末装置１００に内蔵のカメラであってもよいし、端末装置１００に取り付けられる外部カメラであってもよい。 The terminal device 100 may include, for example, an arbitrary camera capable of capturing an image in addition to the above configuration. The camera may be a camera built in the terminal device 100 or an external camera attached to the terminal device 100.

図３Ａに示される例では、端末装置１００の各構成要素が端末装置１００内に設けられているが、本発明はこれに限定されない。端末装置１００の各構成要素のいずれかが端末装置１００の外部に設けられることも可能である。例えば、表示部１３０を別個のハードウェア（例えば、テレビ）として構成することができる。例えば、入力部１２０、表示部１３０、メモリ部１４０、プロセッサ部１５０のそれぞれが別々のハードウェア部品で構成されている場合には、各ハードウェア部品が任意のネットワークを介して接続されてもよい。このとき、ネットワークの種類は問わない。各ハードウェア部品は、例えば、ＬＡＮを介して接続されてもよいし、無線接続されてもよいし、有線接続されてもよい。端末装置１００は、特定のハードウェア構成には限定されない。例えば、プロセッサ部１５０をデジタル回路ではなくアナログ回路によって構成することも本発明の範囲内である。端末装置１００の構成は、その機能を実現できる限りにおいて上述したものに限定されない。 In the example shown in FIG. 3A, each component of the terminal device 100 is provided in the terminal device 100, but the present invention is not limited to this. Any of the components of the terminal device 100 may be provided outside the terminal device 100. For example, the display unit 130 can be configured as separate hardware (for example, a television). For example, when each of the input unit 120, the display unit 130, the memory unit 140, and the processor unit 150 is configured by separate hardware components, each hardware component may be connected via an arbitrary network. . At this time, the type of network does not matter. Each hardware component may be connected, for example, via a LAN, may be connected wirelessly, or may be connected by wire. The terminal device 100 is not limited to a specific hardware configuration. For example, it is within the scope of the present invention to configure the processor unit 150 with an analog circuit instead of a digital circuit. The configuration of the terminal device 100 is not limited to the above as long as the function can be realized.

図３Ｂは、サーバ装置２００の構成の一例を示す。 FIG. 3B shows an example of the configuration of the server device 200.

サーバ装置２００は、通信インターフェース部２１０と、メモリ部２２０と、プロセッサ部２３０とを備える。 The server device 200 includes a communication interface unit 210, a memory unit 220, and a processor unit 230.

通信インターフェース部２１０は、ネットワーク４００を介した通信を制御する。また、通信インターフェース部２１０は、データベース部３００との通信も制御する。サーバ装置２００のプロセッサ部２３０は、通信インターフェース部２１０を介して、サーバ装置２００の外部から情報を受信することが可能であり、サーバ装置２００の外部に情報を送信することが可能である。例えば、サーバ装置２００のプロセッサ部２３０は、少なくとも１つの端末装置１００からネットワーク４００を介して、ユーザ画像を受信する。例えば、サーバ装置２００のプロセッサ部２３０は、少なくとも１つの端末装置１００にネットワーク４００を介して合成画像を送信する。例えば、サーバ装置２００のプロセッサ部２３０は、ベース画像提供者の端末装置５００からネットワーク４００を介してベース画像を受信し得る。通信インターフェース部２１０は、任意の方法で通信を制御し得る。 The communication interface unit 210 controls communication via the network 400. The communication interface unit 210 also controls communication with the database unit 300. The processor unit 230 of the server device 200 can receive information from outside the server device 200 and transmit information to the outside of the server device 200 via the communication interface unit 210. For example, the processor unit 230 of the server device 200 receives a user image from the at least one terminal device 100 via the network 400. For example, the processor unit 230 of the server device 200 transmits a composite image to the at least one terminal device 100 via the network 400. For example, the processor unit 230 of the server device 200 can receive a base image from the terminal device 500 of the base image provider via the network 400. The communication interface unit 210 can control communication by any method.

例えば、通信インターフェース部２１０は、ベース画像提供者によって改変が許諾された画像のみを受信するように構成されてもよい。改変が許諾された画像は、例えば、改変が許諾された人物が写っている画像を含む。これは、例えば、ベース画像提供者の端末装置５００が画像をサーバ装置２００に送信するときに、ベース画像提供者が画像の改変を許諾したことを示す情報も共に送信させることによって達成され得る。改変を許諾したことを示す情報は、改変を許諾する期間、改変を許諾する人物の情報を含み得る。改変を許諾したことを示す情報の一例は、ベース動画内の人物Ａについて１年間改変を許諾する等の情報である。 For example, the communication interface unit 210 may be configured to receive only images whose modification has been permitted by the base image provider. Images whose modification is permitted include, for example, images showing a person whose modification is permitted. This can be achieved, for example, by having the terminal device 500 of the base image provider transmit, together with the image sent to the server device 200, information indicating that the base image provider has permitted modification of the image. The information indicating that modification is permitted may include the period for which modification is permitted and information on the person whose modification is permitted. One example of such information is that modification of person A in the base moving image is permitted for one year.

メモリ部２２０には、サーバ装置２００の処理の実行に必要とされるプログラムやそのプログラムの実行に必要とされるデータ等が格納されている。例えば、ユーザのための合成画像を生成するためのプログラム（例えば、後述する図４に示される処理を実現するプログラム）またはユーザのための合成画像を提供するためのプログラム（例えば、後述する図５、図６、図７に示される処理を実現するプログラム）の一部または全部が格納されている。メモリ部２２０は、任意の記憶手段によって実装され得る。 The memory unit 220 stores programs required to execute the processing of the server device 200, data required to execute those programs, and the like. For example, it stores part or all of a program for generating a composite image for a user (for example, a program that implements the processing shown in FIG. 4, described later) or a program for providing a composite image for a user (for example, a program that implements the processing shown in FIGS. 5, 6, and 7, described later). The memory unit 220 may be implemented by any storage means.

プロセッサ部２３０は、サーバ装置２００全体の動作を制御する。プロセッサ部２３０は、メモリ部２２０に格納されているプログラムを読み出し、そのプログラムを実行する。これにより、サーバ装置２００を所望のステップを実行する装置として機能させることが可能である。プロセッサ部２３０は、単一のプロセッサによって実装されてもよいし、複数のプロセッサによって実装されてもよい。 The processor unit 230 controls the operation of the entire server device 200. The processor unit 230 reads out a program stored in the memory unit 220 and executes the program. This allows the server device 200 to function as a device that executes a desired step. The processor unit 230 may be implemented by a single processor, or may be implemented by a plurality of processors.

図３Ｂに示される例では、サーバ装置２００の各構成要素がサーバ装置２００内に設けられているが、本発明はこれに限定されない。サーバ装置２００の各構成要素のいずれかがサーバ装置２００の外部に設けられることも可能である。例えば、メモリ部２２０、プロセッサ部２３０のそれぞれが別々のハードウェア部品で構成されている場合には、各ハードウェア部品が任意のネットワークを介して接続されてもよい。このとき、ネットワークの種類は問わない。各ハードウェア部品は、例えば、ＬＡＮを介して接続されてもよいし、無線接続されてもよいし、有線接続されてもよい。サーバ装置２００は、特定のハードウェア構成には限定されない。例えば、プロセッサ部２３０をデジタル回路ではなくアナログ回路によって構成することも本発明の範囲内である。サーバ装置２００の構成は、その機能を実現できる限りにおいて上述したものに限定されない。 In the example shown in FIG. 3B, each component of the server device 200 is provided in the server device 200, but the present invention is not limited to this. Any of the components of the server device 200 may be provided outside the server device 200. For example, when each of the memory unit 220 and the processor unit 230 is configured by separate hardware components, each hardware component may be connected via an arbitrary network. At this time, the type of network does not matter. Each hardware component may be connected, for example, via a LAN, may be connected wirelessly, or may be connected by wire. The server device 200 is not limited to a specific hardware configuration. For example, it is within the scope of the present invention to configure the processor unit 230 with an analog circuit instead of a digital circuit. The configuration of the server device 200 is not limited to the above as long as the function can be realized.

図２、図３Ｂに示される例では、データベース部３００は、サーバ装置２００の外部に設けられているが、本発明はこれに限定されない。データベース部３００をサーバ装置２００の内部に設けることも可能である。このとき、データベース部３００は、メモリ部２２０を実装する記憶手段と同一の記憶手段によって実装されてもよいし、メモリ部２２０を実装する記憶手段とは別の記憶手段によって実装されてもよい。いずれにせよ、データベース部３００は、サーバ装置２００のための格納部として構成される。データベース部３００の構成は、特定のハードウェア構成に限定されない。例えば、データベース部３００は、単一のハードウェア部品で構成されてもよいし、複数のハードウェア部品で構成されてもよい。例えば、データベース部３００は、サーバ装置２００の外付けハードディスク装置として構成されてもよいし、ネットワークを介して接続されるクラウド上のストレージとして構成されてもよい。 In the examples shown in FIGS. 2 and 3B, the database unit 300 is provided outside the server device 200, but the present invention is not limited to this. The database unit 300 may be provided inside the server device 200. At this time, the database unit 300 may be implemented by the same storage unit that implements the memory unit 220, or may be implemented by a storage unit that is different from the storage unit that implements the memory unit 220. In any case, the database unit 300 is configured as a storage unit for the server device 200. The configuration of the database unit 300 is not limited to a specific hardware configuration. For example, the database unit 300 may be configured by a single hardware component, or may be configured by a plurality of hardware components. For example, the database unit 300 may be configured as an external hard disk device of the server device 200, or may be configured as a storage on a cloud connected via a network.

図３Ｃは、サーバ装置２００のプロセッサ部２３０の構成の一例を示す。 FIG. 3C illustrates an example of a configuration of the processor unit 230 of the server device 200.

プロセッサ部２３０は、ユーザ画像処理部２３１と、ベース画像処理部２３２と、顔変換モデル形成部２３３と、音声変換モデル形成部２３４と、顔コンバート部２３５と、音声コンバート部２３６と、映像音声結合部２３７とを含む。音声変換モデル形成部２３４、音声コンバート部２３６、映像音声結合部２３７は、図３Ｃにおいて破線で示されている。ユーザ画像およびベース画像が音声を含まない場合には、音声変換モデル形成部２３４、音声コンバート部２３６、映像音声結合部２３７は、不要であり、プロセッサ部２３０は、音声変換モデル形成部２３４、音声コンバート部２３６、映像音声結合部２３７を備える必要はない。 The processor unit 230 includes a user image processing unit 231, a base image processing unit 232, a face conversion model forming unit 233, a voice conversion model forming unit 234, a face conversion unit 235, a voice conversion unit 236, and a video/audio combining unit 237. The voice conversion model forming unit 234, the voice conversion unit 236, and the video/audio combining unit 237 are indicated by broken lines in FIG. 3C. When the user image and the base image contain no audio, the voice conversion model forming unit 234, the voice conversion unit 236, and the video/audio combining unit 237 are unnecessary, and the processor unit 230 need not include them.

ユーザ画像処理部２３１は、ユーザ画像取得部２３１１と、映像処理に関連するブロック（映像切り出し部２３１３、顔切り出し部２３１５）と、音声処理に関連するブロック（音声切り出し部２３１４、音声調節部２３１６）とを含む。音声処理に関連するブロックは、図３Ｃにおいて破線で示されている。ユーザ画像が音声を含まない場合には、音声処理に関連するブロックは、不要であり、ユーザ画像処理部２３１は、音声処理に関連するブロックを備える必要はない。 The user image processing unit 231 includes a user image acquisition unit 2311, blocks related to video processing (a video extraction unit 2313 and a face extraction unit 2315), and blocks related to audio processing (an audio extraction unit 2314 and an audio adjustment unit 2316). The blocks related to audio processing are indicated by broken lines in FIG. 3C. When the user image contains no audio, the blocks related to audio processing are unnecessary, and the user image processing unit 231 need not include them.

ユーザ画像取得部２３１１は、ユーザ画像を取得する。ユーザ画像取得部２３１１は、例えば、少なくとも１つの端末装置１００から送信されたユーザ画像を取得してもよいし、データベース部３００に格納されているユーザ画像を取得してもよい。ベース画像取得部２３２２は、ベース画像に加えて、ベース画像に写っている人物の他の画像も取得するようにしてもよい。 The user image acquisition unit 2311 acquires a user image. The user image acquisition unit 2311 may acquire, for example, a user image transmitted from at least one terminal device 100, or may acquire a user image stored in the database unit 300. The base image obtaining unit 2322 may obtain, in addition to the base image, another image of the person shown in the base image.

映像切り出し部２３１３は、ユーザ画像取得部２３１１によって受信されたユーザ画像から映像を切り出す処理を行う一方、音声切り出し部２３１４は、ユーザ画像取得部２３１１によって受信されたユーザ画像から音声を切り出す処理を行う。これにより、後続の処理において、映像と音声とが別個に処理されるようになる。映像切り出し部２３１３および音声切り出し部２３１４は、例えば、ｆｆｍｐｅｇ等のソフトウェアによって実装され得る。 The video clipping unit 2313 performs a process of clipping a video from the user image received by the user image acquisition unit 2311, while the audio clipping unit 2314 performs a process of clipping a voice from the user image received by the user image acquisition unit 2311. . This allows video and audio to be processed separately in subsequent processing. The video clipping unit 2313 and the audio clipping unit 2314 can be implemented by software such as ffmpeg, for example.
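
As a minimal sketch of how the video/audio separation might be delegated to ffmpeg, the following helpers only build command-line argument lists. The function names, the file names in the usage note, and the choice of mono audio at 16 kHz are assumptions made for illustration; the specification names ffmpeg but does not specify options.

```python
from typing import List


def video_only_args(src: str, dst: str) -> List[str]:
    """Arguments that keep the video stream and drop the audio.

    -an disables audio output; -c:v copy copies the video stream without
    re-encoding it.
    """
    return ["ffmpeg", "-i", src, "-an", "-c:v", "copy", dst]


def audio_only_args(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Arguments that keep the audio stream and drop the video.

    -vn disables video output; -ac 1 downmixes to mono; -ar sets the audio
    sample rate. The 16 kHz default is an assumed value, not from the source.
    """
    return ["ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", str(sample_rate), dst]
```

Each list could then be passed to `subprocess.run(args, check=True)`, for example with the hypothetical file names `user.mp4`, `user_video.mp4`, and `user_audio.wav`.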

顔切り出し部２３１５は、映像に写っている人物の顔部分を切り出す処理を行う。例えば、ユーザ画像が動画である場合には、顔切り出し部２３１５は、動画映像から静止画映像を切り出す処理を行い、次いで、静止画映像から人物の顔部分を切り出す処理を行う。顔切り出し部２３１５によって切り出された顔部分は、顔変換モデル形成部２３３のための学習用データとして用いられる。顔切り出し部２３１５は、例えば、ｆａｃｅｓｗａｐ等のソフトウェアによって実装され得る。 The face cutout unit 2315 performs a process of cutting out a face portion of a person appearing in the video. For example, when the user image is a moving image, the face cutout unit 2315 performs a process of cutting out a still image from a moving image, and then performs a process of cutting out a face portion of a person from the still image. The face portion cut out by the face cutout unit 2315 is used as learning data for the face conversion model forming unit 233. The face cutout unit 2315 can be implemented by software such as faceswap, for example.

音声調節部２３１６は、音声に含まれ得る音声以外の音を除去し、処理しやすい形式に調節する処理を行う。音声調節部２３１６は、例えば、音声のサンプルレートを調節する処理を行う。音声調節部２３１６はさらに、音声解析処理を行う。これにより、音声の特徴（例えば、周波数分布等）を抽出することができ、音声認識が可能になる。音声調節部２３１６によって抽出された音声の特徴は、音声変換モデル形成部２３４のための学習用データとして用いられる。音声調節部２３１６は、例えば、ｆｆｍｐｅｇ、ｃｙｃｌｅＧＡＮｖｏｉｃｅ等のソフトウェアによって実装され得る。 The audio adjustment unit 2316 removes non-speech sounds that may be contained in the audio and adjusts the audio into a format that is easy to process. The audio adjustment unit 2316 adjusts, for example, the sample rate of the audio. The audio adjustment unit 2316 further performs audio analysis, which makes it possible to extract features of the speech (for example, its frequency distribution) and enables speech recognition. The speech features extracted by the audio adjustment unit 2316 are used as training data for the voice conversion model forming unit 234. The audio adjustment unit 2316 may be implemented by software such as ffmpeg or cycleGANvoice.
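
The two operations mentioned above, sample-rate adjustment and extraction of a frequency distribution, can be illustrated with a dependency-free sketch. Both functions are simplified stand-ins chosen for readability: `resample_linear` uses plain linear interpolation with no anti-aliasing filter, and `magnitude_spectrum` is a naive discrete Fourier transform. Neither is claimed to be what ffmpeg or cycleGANvoice does internally.

```python
import cmath
import math
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Change the sample rate by linear interpolation (no low-pass filtering)."""
    if not samples:
        return []
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..n/2 for one frame (O(n^2), for illustration)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
```

For a pure tone occupying exactly one DFT bin, the largest entry of `magnitude_spectrum` falls at that bin, which is the kind of frequency-distribution feature a later training step could consume.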

ベース画像処理部２３２は、ベース画像取得部２３２２と、映像処理に関連するブロック（映像切り出し部２３２３、顔切り出し部２３２５）と、音声処理に関連するブロック（音声切り出し部２３２４、音声調節部２３２６）とを含む。音声処理に関連するブロックは、図３Ｃにおいて破線で示されている。ベース画像が音声を含まない場合には音声処理に関連するブロックは、不要であり、ベース画像処理部２３２は、音声処理に関連するブロックを備える必要はない。ベース画像処理部２３２は、ベース画像に加えて、ベース画像に写っている人物の別の画像も、ベース画像と同様に処理することができる。ベース画像に加えて、ベース画像に写っている人物の別の画像も処理することにより、後段の顔変換モデル形成部２３３または音声変換モデル形成部２３４で学習に用いられるデータ量が増加するため、形成される変換モデルの精度が向上する。 The base image processing unit 232 includes a base image acquisition unit 2322, blocks related to video processing (a video extraction unit 2323 and a face extraction unit 2325), and blocks related to audio processing (an audio extraction unit 2324 and an audio adjustment unit 2326). The blocks related to audio processing are indicated by broken lines in FIG. 3C. When the base image contains no audio, the blocks related to audio processing are unnecessary, and the base image processing unit 232 need not include them. In addition to the base image, the base image processing unit 232 can process other images of the person appearing in the base image in the same way as the base image. Processing these other images as well increases the amount of data available for training in the downstream face conversion model forming unit 233 or voice conversion model forming unit 234, which improves the accuracy of the resulting conversion model.

ベース画像取得部２３２２は、ベース画像を取得する。ベース画像取得部２３２２は、例えば、ベース画像提供者の端末装置５００から送信されたベース画像を取得してもよいし、データベース部３００に格納されているベース画像を取得してもよい。ベース画像取得部２３２２は、例えば、改変が許諾されたベース画像のみを取得するように構成され得る。例えば、ベース画像取得部２３２２は、ベース画像を取得する前に、そのベース画像の改変が許諾されているか否かを判定し、改変が許諾されている場合にのみそのベース画像を取得するようにすることができる。例えば、ベース画像取得部２３２２は、取得したベース画像の改変が許諾されているか否かを判定し、改変が許諾されている場合にのみそのベース画像を次の処理ブロックに渡すことができる。ベース画像の改変が許諾されているか否かの判定は、例えば、ベース画像に付されている情報を参照すること、ベース画像の権利を管理する権利管理団体のサーバ装置に問い合わせること、ベース画像提供者に改変の許諾の明示を要求すること等によって行われ得る。 The base image acquisition unit 2322 acquires a base image. The base image acquisition unit 2322 may, for example, acquire a base image transmitted from the terminal device 500 of the base image provider, or acquire a base image stored in the database unit 300. The base image acquisition unit 2322 may be configured, for example, to acquire only base images whose modification is permitted. For example, before acquiring a base image, the base image acquisition unit 2322 can determine whether modification of that base image is permitted and acquire it only if modification is permitted. Alternatively, the base image acquisition unit 2322 can determine whether modification of an already acquired base image is permitted and pass the base image to the next processing block only if modification is permitted. Whether modification of a base image is permitted can be determined, for example, by referring to information attached to the base image, by querying a server device of a rights management organization that manages the rights to the base image, or by requesting the base image provider to explicitly state that modification is permitted.
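
The permission check described above can be sketched as a filter over license records attached to a base image. The record layout (`ModificationLicense`, `BaseImage`) and the function names are hypothetical; the specification only states that the permission information may include a period and a person, and it leaves open whether the information is attached to the image, obtained from a rights management server, or requested from the provider. This sketch covers only the first of those options.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModificationLicense:
    person: str        # person in the base image whose modification is permitted
    valid_from: date   # first day of the permitted period
    valid_until: date  # last day of the permitted period (inclusive)


@dataclass(frozen=True)
class BaseImage:
    image_id: str
    licenses: Tuple[ModificationLicense, ...] = ()  # information attached to the image


def is_modification_permitted(image: BaseImage, person: str, today: date) -> bool:
    """True if some attached license covers this person on this date."""
    return any(
        lic.person == person and lic.valid_from <= today <= lic.valid_until
        for lic in image.licenses
    )


def acquire_base_image(image: BaseImage, person: str, today: date) -> Optional[BaseImage]:
    """Pass the image on to the next processing block only when permitted."""
    return image if is_modification_permitted(image, person, today) else None
```

With a one-year license for person A, as in the example given for the communication interface unit 210, `acquire_base_image` returns the image inside that year and `None` afterwards or for any other person.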

映像切り出し部２３２３および音声切り出し部２３２４は、映像切り出し部２３１３および音声切り出し部２３１４と同様の処理を行う。映像切り出し部２３２３は、ベース画像取得部２３２２によって受信されたベース画像から映像を切り出す処理を行う一方、音声切り出し部２３２４は、ベース画像取得部２３２２によって受信されたベース画像から音声を切り出す処理を行う。これにより、後続の処理において、映像と音声とが別個に処理されるようになる。映像切り出し部２３２３および音声切り出し部２３２４は、例えば、ｆｆｍｐｅｇ等のソフトウェアによって実装され得る。 The video extraction unit 2323 and the audio extraction unit 2324 perform the same processing as the video extraction unit 2313 and the audio extraction unit 2314. The video extraction unit 2323 extracts the video from the base image received by the base image acquisition unit 2322, while the audio extraction unit 2324 extracts the audio from the base image received by the base image acquisition unit 2322. This allows the video and the audio to be processed separately in subsequent processing. The video extraction unit 2323 and the audio extraction unit 2324 may be implemented by software such as ffmpeg.

顔切り出し部２３２５は、顔切り出し部２３１５と同様の処理を行う。顔切り出し部２３２５は、映像に写っている人物の顔を切り出す処理を行う。例えば、ベース画像が動画である場合には、顔切り出し部２３２５は、動画映像から静止画映像を切り出す処理を行い、次いで、静止画映像から人物の顔を切り出す処理を行う。顔切り出し部２３２５によって切り出された顔部分は、顔変換モデル形成部２３３のための学習用データとして用いられる。顔切り出し部２３２５は、例えば、ｆａｃｅｓｗａｐ等のソフトウェアによって実装され得る。 The face clipping unit 2325 performs the same processing as the face clipping unit 2315. The face cutout unit 2325 performs a process of cutting out the face of a person appearing in the video. For example, when the base image is a moving image, the face cutout unit 2325 performs a process of cutting out a still image from a moving image, and then performs a process of cutting out a human face from the still image. The face portion cut out by the face cutout unit 2325 is used as learning data for the face conversion model forming unit 233. The face cutout unit 2325 may be implemented by software such as faceswap, for example.

音声調節部２３２６は、音声調節部２３１６と同様の処理を行う。音声調節部２３２６は、音声に含まれ得る音声以外の音を除去し、処理しやすい形式に調節する処理を行う。音声調節部２３２６は、例えば、音声のサンプルレートを調節する処理を行う。音声調節部２３２６はさらに、音声解析処理を行う。これにより、音声の特徴（例えば、周波数分布等）を抽出することができ、音声認識が可能になる。音声調節部２３２６によって抽出された音声の特徴は、音声変換モデル形成部２３４のための学習用データとして用いられる。音声調節部２３２６は、例えば、ｆｆｍｐｅｇ、ｃｙｃｌｅＧＡＮｖｏｉｃｅ等のソフトウェアによって実装され得る。 The audio adjustment unit 2326 performs the same processing as the audio adjustment unit 2316. The audio adjustment unit 2326 removes non-speech sounds that may be contained in the audio and adjusts the audio into a format that is easy to process. The audio adjustment unit 2326 adjusts, for example, the sample rate of the audio. The audio adjustment unit 2326 further performs audio analysis, which makes it possible to extract features of the speech (for example, its frequency distribution) and enables speech recognition. The speech features extracted by the audio adjustment unit 2326 are used as training data for the voice conversion model forming unit 234. The audio adjustment unit 2326 may be implemented by software such as ffmpeg or cycleGANvoice.

顔変換モデル形成部２３３は、顔切り出し部２３１５によって切り出された顔部分と、顔切り出し部２３２５によって切り出された顔部分とを学習用データとして学習することにより、ベース画像に写っている人物の顔をユーザ画像に写っている人物の顔に変換するための顔変換モデルを形成する処理を行う。顔変換モデル形成部２３３は、例えば、多層ニューラルネットワークを用いたディープラーニング技術を利用する。このとき、学習用データを用いて学習する処理は、顔切り出し部２３１５によって切り出された顔部分と、顔切り出し部２３２５によって切り出された顔部分とを用いて、多層ニューラルネットワークの各隠れ層のノードの重み係数を調節する処理であり、これによって形成される顔変換モデルでは、ベース画像に写っている人物の顔映像を入力すると、ユーザ画像に写っている人物の顔に変換された顔映像（例えば、ベース画像に写っている人物の顔の表情をあたかもユーザ画像に写っている人物がしたかのような映像）が出力されるようになる。顔変換モデル形成部２３３は、学習に用いられる顔部分が多いほど、精度の高い顔変換モデルを形成することが可能である。顔変換モデル形成部２３３は、例えば、ｆａｃｅｓｗａｐ等のソフトウェアによって実装され得る。 The face conversion model forming unit 233 learns, as learning data, the face portion cut out by the face cutout unit 2315 and the face portion cutout by the face cutout unit 2325, so that the face of the person in the base image is learned. Is performed to form a face conversion model for converting into a face of a person in the user image. The face conversion model forming unit 233 uses, for example, a deep learning technique using a multilayer neural network. At this time, the process of learning using the learning data is performed by using the face part cut out by the face cutout unit 2315 and the face part cut out by the face cutout unit 2325 by using the node of each hidden layer of the multilayer neural network. In the face conversion model formed by this, when the face image of the person shown in the base image is input, the face image (the face image converted to the face of the person shown in the user image) For example, a video (as if a person appearing in the user image had an expression of the face of the person appearing in the base image) is output. The face conversion model forming unit 233 can form a face conversion model with higher accuracy as the number of face parts used for learning increases. The face conversion model forming unit 233 can be implemented by software such as faceswap, for example.

The face conversion unit 235 uses the face conversion model formed by the face conversion model forming unit 233 to generate face video in which the face of the person in the base image is converted into the face of the person in the user image, and composites that face video onto the video of the base image. This produces composite video in which the person in the base image appears to be the person in the user image. When the base image is a moving image, the face conversion unit 235 generates, for each still image making up the moving image, face video in which the face of the person in the base image is converted into the face of the person in the user image, and composites it onto that still image. The composited still images are then joined together to build a moving image in which the person in the base image (moving image) appears to be the person in the user image. The face conversion unit 235 can be implemented by software such as faceswap.
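
The frame-by-frame handling of a moving image can be sketched as follows. This is a toy illustration under stated assumptions, not the implementation of the face conversion unit 235: frames are plain nested lists of pixel values, the face region of each frame is given as a box, and `convert_face` stands in for the trained face conversion model. All names here are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # toy grayscale frame: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width) of the face region


def crop(frame: Image, box: Box) -> Image:
    """Cut the face region out of a frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


def paste(frame: Image, patch: Image, box: Box) -> Image:
    """Return a copy of the frame with the patch composited at the box position."""
    top, left, height, width = box
    out = [row[:] for row in frame]
    for dy in range(height):
        out[top + dy][left:left + width] = patch[dy]
    return out


def convert_frames(
    frames: Sequence[Image],
    boxes: Sequence[Box],
    convert_face: Callable[[Image], Image],  # stand-in for the face conversion model
) -> List[Image]:
    """Convert the face in every still frame and composite it back, in frame order."""
    return [paste(f, convert_face(crop(f, b)), b) for f, b in zip(frames, boxes)]
```

The returned list preserves frame order, so the composited frames can be joined back into a moving image.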

The voice conversion model forming unit 234 forms a voice conversion model for converting the voice contained in the base image into the voice contained in the user image, by learning from the voice features extracted by the audio adjustment unit 2316 and the voice features extracted by the audio adjustment unit 2326 as training data. The voice conversion model forming unit 234 uses, for example, deep learning with a multilayer neural network. In that case, learning from the training data means adjusting the weight coefficients of the nodes in each hidden layer of the multilayer neural network using the voice features extracted by the audio adjustment units 2316 and 2326. When the voice contained in the base image is input to the resulting voice conversion model, the model outputs audio converted to the voice contained in the user image (for example, audio in which the person in the user image appears to speak the words spoken in the base image). The more voice features are used for learning, the more accurate the voice conversion model that the voice conversion model forming unit 234 can form. The voice conversion model forming unit 234 can be implemented by software such as cycleGANvoice.

The voice conversion unit 236 uses the voice conversion model formed by the voice conversion model forming unit 234 to generate audio in which the voice contained in the base image is converted into the voice contained in the user image. This produces audio in which the voice in the base image sounds as if it were the voice of the person in the user image. The voice conversion unit 236 can be implemented by software such as cycleGANvoice.

The video/audio combining unit 237 combines the composite video generated by the face conversion unit 235 with the audio generated by the voice conversion unit 236 to generate a composite image with audio. The video/audio combining unit 237 can be implemented by software such as ffmpeg.
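
As one possible illustration of combining the composite video with the converted audio using ffmpeg, the sketch below builds a command line that takes the video stream from one file and the audio stream from another. The helper name `build_mux_command`, the AAC audio codec, and the choice to copy the video stream without re-encoding are assumptions made for this example only.

```python
from typing import List


def build_mux_command(video_path: str, audio_path: str, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that muxes one video stream with one audio stream.

    Hypothetical helper: codec choices are illustrative assumptions.
    """
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it already exists
        "-i", video_path,    # input 0: composite video
        "-i", audio_path,    # input 1: converted audio
        "-map", "0:v:0",     # take the first video stream of input 0
        "-map", "1:a:0",     # take the first audio stream of input 1
        "-c:v", "copy",      # keep the video stream as is
        "-c:a", "aac",       # encode the audio as AAC
        "-shortest",         # stop at the end of the shorter stream
        out_path,
    ]
```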

The composite image generated by the video/audio combining unit 237 is output via the communication interface unit 210 to the outside of the server device 200 (for example, to at least one terminal device 100 or to the database unit 300).

In the example shown in FIG. 3C, all components of the processor unit 230 are provided within the same processor unit 230, but the present invention is not limited to this. A configuration in which the components of the processor unit 230 are distributed across a plurality of processor units is also within the scope of the present invention.

3. Processing in the Computer System for Providing a Composite Image for a User
FIG. 4 shows an example of processing in the computer system 1000 for providing a composite image for a user. The example shown in FIG. 4 illustrates the processing for generating a composite image in the computer system 1000. The following description takes as its example the generation of a composite image of one person appearing in a user image.

In step S401, the processor unit 150 of the terminal device 100 acquires at least one user image. The processor unit 150 may acquire the user image, for example, by controlling a camera with which the terminal device 100 may be equipped, by reading a user image stored in a storage means (for example, memory, storage, or an external storage device), or by retrieving a user image on the network 400 via the communication interface unit 110.

In step S402, the processor unit 150 of the terminal device 100 transmits at least one user image to the server device 200 via the communication interface unit 110, and the processor unit 230 of the server device 200 receives the at least one user image via the communication interface unit 210. Preferably, the processor unit 150 of the terminal device 100 transmits a plurality of user images to the server device 200, and the processor unit 230 of the server device 200 receives the plurality of user images. Receiving more user images allows more user images to be used in the processing described later, which improves the quality of the generated composite image. The received user images may, for example, be stored in the database unit 300 for use in subsequent processing, or be stored temporarily in the memory unit 220 for use in subsequent processing.

In step S402, instead of or in addition to transmitting the user image to the server device 200, the processor unit 150 of the terminal device 100 may transmit the location of the user image on the network (for example, a URL). When the server device 200 receives the location of the user image on the network, the server device 200 accesses that location and acquires the user image.

In step S403, the user image acquisition unit 2311 of the processor unit 230 acquires at least one user image. The user image acquisition unit 2311 may, for example, acquire the user image received from at least one terminal device 100 in step S402, or acquire from the database unit 300 a user image stored in the database unit 300.

Once the user image has been acquired, in step S404 the video cutout unit 2313, the audio cutout unit 2314, the face cutout unit 2315, and the audio adjustment unit 2316 of the processor unit 230 process the at least one user image. For example, the video cutout unit 2313 cuts out the video from the user image and the face cutout unit 2315 cuts out the face portion from that video; then the audio cutout unit 2314 cuts out the audio from the user image and the audio adjustment unit 2316 adjusts that audio and extracts the voice features.

In step S405, the processor unit 230 of the server device 200 receives, via the communication interface unit 210, at least one base image transmitted from the terminal device 500 of a base image provider. The received base image may, for example, be stored in the database unit 300 for use in subsequent processing, or be stored temporarily in the memory unit 220 for use in subsequent processing. In addition to the base image, the server device 200 may receive other images of the person appearing in the base image. Receiving more images of the person appearing in the base image allows more images of that person to be used in the processing described later, which improves the quality of the generated composite image. The received images of the person appearing in the base image may likewise be stored in the database unit 300, or stored temporarily in the memory unit 220, for use in subsequent processing.

In step S405, instead of or in addition to receiving the base image (and the images of the person appearing in the base image) transmitted from the terminal device 500 of the base image provider, the processor unit 230 of the server device 200 may receive the location on the network (for example, a URL) of the base image (and of the images of the person appearing in the base image). When the server device 200 receives that location, the server device 200 accesses it and acquires the base image (and the images of the person appearing in the base image).

In step S405, the processor unit 230 of the server device 200 preferably receives and/or acquires only base images for which modification has been permitted. Images for which modification has been permitted include, for example, images for which modification of the person appearing in the base image has been permitted. By receiving or acquiring only base images for which modification has been permitted, it becomes possible to eliminate the possibility that a composite image generated by the server device 200 from the base image infringes rights that may attach to the base image (for example, copyright or portrait rights).

For example, the server device 200 may obtain consent from the base image provider in advance to the effect that modification is permitted for every base image transmitted to the server device 200, or it may receive, together with each base image from the terminal device 500 of the base image provider, information indicating that modification of that base image is permitted. In the latter case, the information indicating that modification of the base image is permitted may be stored in the database unit 300 in association with the base image.

In step S406, the base image acquisition unit 2321 of the processor unit 230 acquires at least one base image. The base image acquisition unit 2321 may, for example, acquire the base image received from the terminal device 500 of the base image provider in step S405, or acquire from the database unit 300 a base image stored in the database unit 300.

Step S406 may include, for example, the base image acquisition unit 2321 of the processor unit 230 determining whether modification of the base image to be acquired is permitted. This determination may be made, for example, by the base image acquisition unit 2321 referring to information attached to the base image, referring to information stored in the database unit 300 in association with the base image, sending an inquiry via the communication interface unit 210 to the server device of a rights management organization that manages the rights to the base image, or sending a request via the communication interface unit 210 to the terminal device 500 of the base image provider asking for an explicit statement that modification is permitted. For example, the base image acquisition unit 2321 may acquire the base image only when modification is permitted, or may proceed to step S407 only when modification is permitted.
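
The permission check in step S406 might be sketched as follows. This is a minimal illustration covering only the first two sources named above (information attached to the base image and information stored in association with it); the key name `modification_permitted`, the dictionary-based stand-in for the database unit 300, and the deny-by-default policy are all assumptions made for the example.

```python
from typing import Mapping


def is_modification_permitted(
    image_id: str,
    attached_info: Mapping[str, object],  # metadata carried with the base image
    permission_db: Mapping[str, bool],    # stand-in for records in the database unit
) -> bool:
    """Return True only when a source explicitly states that modification is permitted."""
    # 1. Information attached to the base image itself takes precedence.
    if "modification_permitted" in attached_info:
        return attached_info["modification_permitted"] is True
    # 2. Otherwise fall back to the record stored in association with the image.
    #    An image with no record is treated as not permitted (assumed policy).
    return permission_db.get(image_id, False) is True


def acquire_if_permitted(image_id, attached_info, permission_db, load_image):
    """Acquire the base image only when modification is permitted, else return None."""
    if not is_modification_permitted(image_id, attached_info, permission_db):
        return None
    return load_image(image_id)
```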

Once the base image has been acquired, in step S407 the video cutout unit 2323, the audio cutout unit 2324, the face cutout unit 2325, and the audio adjustment unit 2326 of the processor unit 230 process the at least one base image. For example, the video cutout unit 2323 cuts out the video from the base image and the face cutout unit 2325 cuts out the face portion from that video; then the audio cutout unit 2324 cuts out the audio from the base image and the audio adjustment unit 2326 adjusts that audio and extracts the voice features.

When processing of the at least one user image and the at least one base image is complete, in step S408 the face conversion model forming unit 233 and the voice conversion model forming unit 234 of the processor unit 230 form the conversion models. For example, the face conversion model forming unit 233 forms the face conversion model, and then the voice conversion model forming unit 234 forms the voice conversion model. The face conversion model and the voice conversion model are formed using, for example, deep learning.

When the face conversion model and the voice conversion model have been formed, in step S409 the face conversion unit 235, the voice conversion unit 236, and the video/audio combining unit 237 of the processor unit 230 generate the composite image. For example, the face conversion unit 235 uses the face conversion model formed in step S408 to generate face video in which the face of the person in the base image is converted into the face of the person in the user image, and composites that face video onto the video of the base image. The voice conversion unit 236 uses the voice conversion model formed in step S408 to generate audio in which the voice contained in the base image is converted into the voice contained in the user image. The video/audio combining unit 237 then combines the generated composite video with the generated audio to generate a composite image with audio.

In this way, the composite image is generated in the server device 200.

The example above describes acquiring the user image in step S401, but step S401 is not strictly necessary. When the location of the user image on the network is transmitted in step S402 as described above, the terminal device 100 does not need to hold the user image.

The example above describes processing video and audio serially within each step, but video and audio may instead be processed in parallel. In that case, the video processing may wait for the audio processing to complete before proceeding to the next step, or may proceed to the next step without waiting for the audio processing to complete. The same applies in the reverse direction.
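
A minimal sketch of the parallel variant, in which the video branch and the audio branch of a step run concurrently and the step waits for both before moving on, could look like this. The function name and the two callables are hypothetical placeholders for the video-side and audio-side processing; the use of a thread pool is one possible choice, not something stated in this description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

I = TypeVar("I")
V = TypeVar("V")
A = TypeVar("A")


def process_in_parallel(
    image: I,
    process_video: Callable[[I], V],  # e.g. cut out video, then the face portion
    process_audio: Callable[[I], A],  # e.g. cut out audio, then adjust it
) -> Tuple[V, A]:
    """Run the video and audio branches concurrently and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        video_future = pool.submit(process_video, image)
        audio_future = pool.submit(process_audio, image)
        # result() blocks, so this variant waits for both branches to complete.
        return video_future.result(), audio_future.result()
```

The variant that does not wait would instead hand each future to the next step as soon as it completes.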

The example above describes steps S405 to S407 being performed after steps S402 to S404, but steps S402 to S404 and steps S405 to S407 may be performed in either order. For example, steps S402 to S404 may be performed after at least one of steps S405 to S407, or at least one of steps S402 to S404 may be performed in parallel with at least one of steps S405 to S407.

For example, at any time before the server device 200 generates the composite image in step S409, the processor unit 150 of the terminal device 100 may transmit to the server device 200, via the communication interface unit 110, permission to generate a composite image from the user image, and the processor unit 230 of the server device 200 may receive it. This makes it possible to eliminate the possibility that a composite image generated by the server device 200 from the user image infringes rights that may attach to the user image (for example, copyright or portrait rights). The permission to generate a composite image from the user image may be given each time a user image is transmitted to the server device 200, or at predetermined intervals (for example, once a day, once a week, once a month, or once a year). Alternatively, registering to use the service for providing a composite image for a user, described above with reference to FIG. 1A, may be regarded as granting permission to generate a composite image from the user image. In that case, the registration information is transmitted to the server device 200 as the permission to generate a composite image from the user image.

For example, when a plurality of persons appear in the base image and modification has been permitted for each of them, a composite image may be generated in which the face and voice of one of the persons in the base image are converted into the face and voice of the person in the user image. Which of the persons in the base image is to be converted may, for example, be set by the user or be determined automatically by the server device 200.

For example, when the user makes the setting, the server device 200 presents the persons appearing in the base image to the terminal device 100 as options, the user enters into the terminal device 100 an input selecting one of those persons, and the terminal device 100 transmits that input to the server device 200. Based on the input, the server device 200 generates a composite image in which the face and voice of the selected person in the base image are converted into the face and voice of the person in the user image.

For example, when the server device 200 makes the determination automatically, it may select from the persons appearing in the base image, based on the attributes (for example, gender or age) and physical characteristics (for example, height or body type) of the person in the user image, either a person who is more similar to the person in the user image or a person who is less similar, or it may select a person at random. For example, if the person in the user image is a man, a composite image can be generated automatically in which the face and voice of a man in the base image are converted into the face and voice of the man in the user image. As another example, if the person in the user image is a slightly plump woman, a composite image can be generated automatically in which the face and voice of a slender woman in the base image are converted into the face and voice of that woman. Having the server device 200 determine the conversion target automatically leads to the creation of unexpected composite images and so provides the user with a new media experience.
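
The three selection policies described above (more similar, less similar, random) can be illustrated with a short sketch. The `Person` fields, the `distance` formula, and its weights are assumptions invented for this example; the description does not specify how similarity is computed.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Person:
    name: str
    gender: str
    age: int
    height_cm: float


def distance(a: Person, b: Person) -> float:
    """Toy dissimilarity score; the weights are arbitrary illustrative choices."""
    gender_penalty = 0.0 if a.gender == b.gender else 1.0
    return gender_penalty + abs(a.age - b.age) / 100 + abs(a.height_cm - b.height_cm) / 100


def select_target(
    user: Person,
    candidates: Sequence[Person],
    mode: str = "most_similar",
    rng: Optional[random.Random] = None,
) -> Person:
    """Pick the person in the base image whose face and voice will be converted."""
    if not candidates:
        raise ValueError("the base image contains no selectable person")
    if mode == "most_similar":
        return min(candidates, key=lambda c: distance(user, c))
    if mode == "least_similar":
        return max(candidates, key=lambda c: distance(user, c))
    if mode == "random":
        return (rng or random).choice(list(candidates))
    raise ValueError(f"unknown mode: {mode}")
```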

The example above describes generating a composite image of one person appearing in a user image, but the present invention is not limited to this. The processing in the computer system 1000 for providing a composite image for a user can also generate a composite image of a plurality of persons appearing in user images.

For example, in the example of FIG. 4, user images of a plurality of persons (for example, at least one user image in which a plurality of persons appear, or a plurality of user images in each of which one of the persons appears) are received in step S402, and steps S403 to S404 are performed for each person appearing in the user images. In step S405, at least one base image in which a plurality of persons appear is received, and steps S406 to S407 are performed for each person appearing in the base image. Step S408 is then performed for each person appearing in the user images and each person appearing in the base image.

For example, steps S403 to S404 are performed for the first person in the user image, steps S406 to S407 are performed for the first person in the base image, and step S408 is performed for the first person in the user image and the first person in the base image. This forms a face conversion model for converting the face of the first person in the base image into the face of the first person in the user image, and a voice conversion model for converting the voice of the first person in the base image into the voice of the first person in the user image.

Likewise, steps S403 to S404 are performed for the second person in the user image, steps S406 to S407 are performed for the second person in the base image, and step S408 is performed for the second person in the user image and the second person in the base image. This forms a face conversion model for converting the face of the second person in the base image into the face of the second person in the user image, and a voice conversion model for converting the voice of the second person in the base image into the voice of the second person in the user image.

In general, steps S403 to S404 are performed for the n-th person in the user image, steps S406 to S407 are performed for the n-th person in the base image, and step S408 is performed for the n-th person in the user image and the n-th person in the base image. This forms a face conversion model for converting the face of the n-th person in the base image into the face of the n-th person in the user image, and a voice conversion model for converting the voice of the n-th person in the base image into the voice of the n-th person in the user image (where n is an integer of 2 or more).

Using the conversion models formed in this way for each person, in step S409 the face video in which the face of the first person in the base image is converted into the face of the first person in the user image is composited onto the video of the base image, the face video in which the face of the second person in the base image is converted into the face of the second person in the user image is composited onto the video of the base image, and so on up to the face video in which the face of the n-th person in the base image is converted into the face of the n-th person in the user image. Similarly, audio is generated in which the voice of the first person in the base image is converted into the voice of the first person in the user image, audio is generated in which the voice of the second person in the base image is converted into the voice of the second person in the user image, and so on up to the n-th person. The composited video and the generated audio are then combined to generate a composite image with audio.
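
The per-person flow of steps S408 and S409 can be outlined as a loop over pairs of persons. This is a schematic sketch under stated assumptions: persons are identified by plain strings, `form_model` stands in for the model forming units, and `convert` stands in for the conversion units; none of these names come from the description.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

Model = TypeVar("Model")
Media = TypeVar("Media")
Pair = Tuple[str, str]  # (person in base image, person in user image)


def form_models_per_pair(
    pairs: List[Pair],
    form_model: Callable[[str, str], Model],  # stand-in for step S408
) -> Dict[Pair, Model]:
    """Form one conversion model for each (base person, user person) pair."""
    return {(base, user): form_model(base, user) for base, user in pairs}


def apply_models(
    media: Media,
    pairs: List[Pair],
    models: Dict[Pair, Model],
    convert: Callable[[Media, Model], Media],  # stand-in for step S409
) -> Media:
    """Apply the 1st, 2nd, ..., n-th model in turn to the same base media."""
    for pair in pairs:
        media = convert(media, models[pair])
    return media
```

The same two functions can be called once with face models and once with voice models before the video and audio are combined.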

When a composite image of a plurality of persons is generated, which person in the base image is paired with each person in the user image for the face and voice conversion may, for example, be set by the user or be determined automatically by the server device 200.

For example, when the user makes the setting, the server device 200 presents the persons appearing in the base image to the terminal device 100 as options, the user enters into the terminal device 100 an input associating each of the persons in the user image with one of the persons in the base image, and the terminal device 100 transmits that input to the server device 200. Based on the input, the server device 200 generates a composite image in which the face and voice of each associated person in the base image are converted into those of the corresponding person in the user image.

For example, when the server device 200 makes the determination automatically, it may select, from among the persons appearing in the base image, a person who is more similar to, or less similar to, the person appearing in the user image, based on attributes (for example, gender and age), physical characteristics (for example, height and body type), and the like of the persons appearing in the user image and the base image; alternatively, it may select a person at random. The attributes and physical characteristics of the person appearing in the user image may be obtained, for example, by referring to information stored in the database unit 300, or may be recognized from the user image using existing image recognition technology. The attributes and physical characteristics of a person appearing in the base image may be recognized from the base image using, for example, existing image recognition technology. For example, if the person in the user image is a man, a composite image can be generated automatically in which the face and voice of a man in the base image and the face and voice of the man in the user image are converted. For example, if the person in the user image is a slightly plump woman, a composite image can be generated automatically in which the face and voice of a slender woman in the base image and the face and voice of that woman in the user image are converted. Having the server device 200 automatically determine the person to be converted leads to the creation of unexpected composite images, which is entertaining and provides the user with a new media experience.
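The three automatic selection policies (more similar, less similar, random) might be sketched as below. The `Person` fields, the `dissimilarity` scoring formula, and its weights are hypothetical; the description only says that attributes and physical characteristics are used, not how they are compared.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record of recognized attributes and physical characteristics."""
    name: str
    gender: str
    age: int
    height_cm: float


def dissimilarity(a, b):
    """Illustrative score: 0 means identical attributes, larger means less alike."""
    gender_penalty = 0.0 if a.gender == b.gender else 1.0
    age_gap = abs(a.age - b.age) / 100.0
    height_gap = abs(a.height_cm - b.height_cm) / 100.0
    return gender_penalty + age_gap + height_gap


def choose_target(user, candidates, mode="most_similar", rng=None):
    """Pick the base-image person to convert, according to the given policy."""
    if not candidates:
        raise ValueError("no persons found in the base image")
    if mode == "most_similar":
        return min(candidates, key=lambda c: dissimilarity(user, c))
    if mode == "least_similar":
        return max(candidates, key=lambda c: dissimilarity(user, c))
    if mode == "random":
        return (rng or random).choice(candidates)
    raise ValueError(f"unknown mode: {mode}")
```

Choosing `least_similar` or `random` corresponds to the "unexpected" pairings the paragraph mentions.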

For example, when the server device 200 makes the determination automatically, it may select, from among the persons appearing in the base image, persons whose relationship is more similar to, or less similar to, the relationship of the persons appearing in the user image, based on the human relationships (for example, lovers, siblings, or parent and child) among the persons appearing in the user image and the human relationships among the persons appearing in the base image. The relationships among the persons appearing in the user image may be obtained, for example, by referring to information stored in the database unit 300, or may be estimated from the user image using existing image recognition technology. For example, from an image in which a man and a woman of about the same age appear smiling, it is estimated that the man and the woman are lovers or a married couple. The relationships among the persons appearing in the base image may be recognized from the base image using, for example, existing image recognition technology. For example, when the man and woman appearing in the user image are estimated to be a married couple, a composite image can be generated automatically in which the face and voice of the hero in the base image and the face and voice of the man in the user image are converted, and the face and voice of the heroine in the base image and the face and voice of the woman in the user image are converted. For example, when the adult and child appearing in the user image are estimated to be parent and child, a composite image can be generated automatically in which the face and voice of an adult in the base image and the face and voice of the adult in the user image are converted, and the face and voice of a child in the base image and the face and voice of the child in the user image are converted. Even if the relationship estimated from the image differs from the actual relationship, a composite image generated on the basis of that different relationship is likewise unexpected and entertaining, and this too provides the user with a new media experience.

The base image may be a movie, program, advertisement, or the like in which actors, entertainers, athletes, and so on appear, but the base image may also be, for example, an image shot using a copyright-free model. A copyright-free model is a model used to shoot base images intended for composite images; in other words, an image shot using a copyright-free model is a base image dedicated to composite images. With an actor, entertainer, athlete, or the like, a company may become unable to use that person's image because a contract expires, the person commits a crime, or the like. With a copyright-free model, however, it is assumed that the model's face and voice will be converted with the face and voice of the person appearing in the user image, so the model's face and voice are never distributed as they are, and the risk that the image becomes unusable because of contract expiry, crime, or the like is extremely small.

By shooting a large number of images of the copyright-free model in advance and performing the processing of steps S406 and S407 in advance, the load of the processing for generating a composite image can be reduced and the speed of composite image generation can be improved. In addition, because training can then be performed using a large number of images, the accuracy of the conversion model also improves.

The copyright in an image shot using a copyright-free model may belong to a television station, a video production company, the photographer, or the like, but an image shot using a copyright-free model can be regarded as one for which modification is permitted. Such images are therefore also easy to handle from a rights standpoint.

The copyright-free model is preferably, for example, a model having an average face and/or voice, because an average face and/or voice improves the accuracy of the conversion model. The definition of "average" may vary, for example, from user to user. For example, the average may be defined based on the user's nationality: if the user's nationality is Japanese, it may be the average face and/or voice of Japanese people, and if the user's nationality is Chinese, it may be the average face and/or voice of Chinese people. For example, the average may be defined based on where the user is from: if the user is from the Kansai region, it may be the average face and/or voice of people from the Kansai region, and if the user is from the Kyushu region, it may be the average face and/or voice of people from the Kyushu region.

The conversion model formed in step S408 described above becomes more accurate the more similar the face or voice of the person appearing in the base image used to form the conversion model is to the face or voice of the person appearing in the user image. It is therefore preferable that the face and/or voice of the copyright-free model be similar to the face and/or voice of the person appearing in the user image to be used. This can be achieved, for example, by preparing a plurality of sub-base images for one base image. The sub-base images have the same content but each shows a different person. For example, the sub-base images differ only in the person who appears. For example, the sub-base images are images of a movie or advertisement video with the same script, each shot using a different copyright-free model. For example, the sub-base images are images of the same concept, each shot using a different copyright-free model.

For example, each sub-base image may be shot in advance using one of a plurality of copyright-free models having an average face and/or voice. Before a composite image is generated, the sub-base image showing the copyright-free model most similar to the person appearing in the user image is determined, and the composite image is generated in step S409 based on the determined sub-base image. As a result, the composite image is generated from a highly accurate conversion model, and the quality of the composite image improves. The copyright-free model most similar to the person appearing in the user image can be determined, for example, by calculating the similarity between images using known image recognition technology.
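Selecting the sub-base image whose model is most similar to the user could look roughly like the following. The description does not specify the similarity measure; cosine similarity over precomputed feature vectors is an assumption made here for illustration, and `pick_sub_base_image` and the feature dictionaries are hypothetical names.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("zero-length feature vector")
    return dot / norm


def pick_sub_base_image(user_features, sub_base_features):
    """Return the ID of the sub-base image whose model best matches the user.

    sub_base_features maps a sub-base image ID to the feature vector of the
    copyright-free model shown in it (assumed to be computed elsewhere).
    """
    if not sub_base_features:
        raise ValueError("no sub-base images available")
    return max(
        sub_base_features,
        key=lambda image_id: cosine_similarity(user_features, sub_base_features[image_id]),
    )
```

The chosen ID would then identify which sub-base image is passed to step S409.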

The examples above describe generating a composite image in which the face and voice are converted, but the present invention is not limited to this. Generating a composite image in which at least a part of a person appearing in the base image and at least a part of a person appearing in the user image are converted, not limited to the face and voice, is also within the scope of the present invention. This can be achieved by processing similar to that described above with reference to FIG. 4, for example by constructing, in place of the face conversion model and voice conversion model described above, a conversion model that converts at least a part of the person appearing in the base image and at least a part of the person appearing in the user image. Such a conversion model is, for example, a model trained using at least a part of the person appearing in the base image, obtained from the base image, and at least a part of the person appearing in the user image, obtained from the user image.

For example, the at least a part of the person appearing in the base image and the user image may be the person's body shape. For example, by performing processing similar to that described above with reference to FIG. 4 so as to construct a conversion model that converts the body shape of the person appearing in the base image and the body shape of the person appearing in the user image, a composite image can be generated in which those body shapes are converted. This makes it possible to generate a composite image in which the body shape of an actor appearing in a base image (for example, a movie) is converted into the user's own body shape.

For example, the at least a part of the person appearing in the base image and the user image may be the person's face, voice, and body shape. This makes it possible to generate a composite image in which the face, voice, and body shape of an actor appearing in a base image (for example, a movie) are converted into the user's own face, voice, and body shape, which provides the user with the new media experience of watching a composite image as if the user had appeared in the movie.

The examples above describe generating a face video in which the face of the person appearing in the base image is converted into the face of the person appearing in the user image, and generating a composite image by compositing that face video onto the video of the base image, but the present invention is not limited to this. Generating a face video in which the face of the person appearing in the user image is converted into the face of the person appearing in the base image, and generating a composite image by compositing that face video onto the user image, is also within the scope of the present invention.

This means, for example, generating an image in which the face of an actor, entertainer, athlete, or the like is composited onto an image shot by the user. For example, by compositing an actor's face onto an image of the user dancing, a composite image can be generated that looks as if the actor were dancing. For example, by compositing an entertainer's face onto an image of the user performing manzai (stand-up comedy), a composite image can be generated that looks as if the entertainer were performing manzai. This allows the individuality of actors, entertainers, athletes, and the like to circulate beyond the limits of time and/or physical constraints, creates new content, and provides the user with a new media experience.

The composite image generated by the processing described above is provided to the user by the processing described below.

FIG. 5 shows an example of processing in the computer system 1000 for providing a composite image for a user. The example shown in FIG. 5 is a case in which the composite image is provided automatically, without the user making any request for it.

In step S501, the processor unit 230 of the server device 200 transmits the composite image generated by the processing described with reference to FIG. 4 to the terminal device 100 via the communication interface unit 210. The processor unit 230 of the server device 200 transmits the composite image to the terminal device 100 automatically, without receiving from the user a request to provide it. The terminal device 100 receives the composite image via the communication interface unit 110.

For example, before step S501, the processor unit 230 of the server device 200 may determine the terminal device 100 to which the composite image is to be transmitted. For example, the processor unit 230 of the server device 200 determines the destination terminal device 100 at random or according to an arbitrary rule. The processor unit 230 of the server device 200 transmits the composite image to the determined terminal device 100 at, for example, a predetermined timing. The predetermined timing may be, for example, immediately before the terminal device 100 plays a predetermined image, immediately after the terminal device 100 plays the predetermined image, or while the predetermined image is being played.
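A minimal sketch of the destination and timing decision before step S501 follows. The `Timing` enum, the playback event names, and the shape of the `rule` callback are hypothetical; the description only says the destination is chosen randomly or by an arbitrary rule and lists three example timings.

```python
import random
from enum import Enum


class Timing(Enum):
    """The three example timings named in the description."""
    BEFORE_PLAYBACK = "before"
    DURING_PLAYBACK = "during"
    AFTER_PLAYBACK = "after"


# Hypothetical playback events reported for a terminal, keyed by timing.
_EVENT_FOR_TIMING = {
    Timing.BEFORE_PLAYBACK: "about_to_play",
    Timing.DURING_PLAYBACK: "playing",
    Timing.AFTER_PLAYBACK: "finished",
}


def choose_destination(terminals, rule=None, rng=None):
    """Pick the terminal to receive the composite image.

    If rule is given, it is a function that takes the list of terminals and
    returns one of them; otherwise a terminal is picked at random.
    """
    if not terminals:
        raise ValueError("no terminals available")
    if rule is not None:
        return rule(terminals)
    return (rng or random).choice(terminals)


def should_send(timing, playback_event):
    """True when the observed playback event matches the configured timing."""
    return _EVENT_FOR_TIMING[timing] == playback_event
```

For instance, a server configured with `Timing.BEFORE_PLAYBACK` would send the composite image when it observes the `"about_to_play"` event for the chosen terminal.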

When the composite image is received, in step S502 the processor unit 150 of the terminal device 100 outputs the composite image via the display unit 130.

When the processing shown in FIG. 5 is implemented, for example, while the user is watching a video on a video-sharing site such as YouTube, a composite advertisement video in which the performer's face in a certain advertisement video has been converted into the user's face suddenly plays. This can be achieved, for example, by moving the user at a predetermined timing from the video-sharing site such as YouTube to a site, provided by the server device 200, that serves composite advertisement videos, and providing the composite advertisement video on that site. Alternatively, in the case of the single information processing device described later, it can be achieved by having the information processing device generate the composite advertisement video locally and play it, without moving the user away from the video-sharing site such as YouTube.

When the processing shown in FIG. 5 is implemented, for example, while the user is watching a TV program, a composite advertisement video in which the performer's face in a certain advertisement video has been converted into the user's face suddenly plays. This can be achieved, for example, in the case of the single information processing device described later, by having the information processing device create the composite advertisement image locally from the broadcast image and play it.

As a result, the user of the terminal device 100 can suddenly and unexpectedly encounter a personalized image, and thereby have a new media experience. Furthermore, by watching a personalized advertisement video in which the performer's face has been converted into the user's face, the user can virtually experience, for example, what it would be like to use the advertised product (for example, cosmetics, hair styling products, or clothes) or the advertised service (for example, a beauty salon or training gym). This also leads to a new media experience.

FIG. 6 shows an example of processing in the computer system 1000 for providing a composite image for a user. The example shown in FIG. 6 is a case in which the user is notified that a composite image is available for viewing, and the composite image is provided when the user requests it.

In step S601, the processor unit 230 of the server device 200 transmits a notification that a composite image can be provided to the terminal device 100 via the communication interface unit 210. The terminal device 100 receives the notification via the communication interface unit 110. The notification may be transmitted from the server device 200 to the terminal device 100 either directly or indirectly. An indirectly transmitted notification may be, for example, a notification embedded in an image that is being played, or is about to be played, on the terminal device 100. For example, the server device 200 transmits to the provider of that image a request to embed the notification, the provider embeds the notification in the image in response, and the image with the embedded notification is transmitted to the terminal device 100.

Upon receiving the notification, the processor unit 150 of the terminal device 100 provides, via the display unit 130, an interface that allows the user to input a request for the composite image to be provided. The user can input the request via this interface.

When the processor unit 150 of the terminal device 100 receives the request for the composite image from the user in step S602, the processor unit 150 transmits the request to the server device 200 via the communication interface unit 110 in step S603. The processor unit 230 of the server device 200 receives the request via the communication interface unit 210.

Upon receiving the request for the composite image, in step S604 the processor unit 230 of the server device 200 transmits the composite image generated by the processing described with reference to FIG. 4 to the terminal device 100 via the communication interface unit 210. The terminal device 100 receives the composite image via the communication interface unit 110.

When the composite image is received, in step S605 the processor unit 150 of the terminal device 100 outputs the composite image via the display unit 130.
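The exchange in steps S601 to S605 can be mimicked with two small in-process classes. This is a sketch only: `Server`, `Terminal`, their method names, and the use of plain strings for composite images are assumptions, and real devices would communicate through their communication interface units rather than direct method calls.

```python
class Server:
    """Stands in for the server device 200 (hypothetical class)."""

    def __init__(self, composites):
        # Maps a terminal ID to its already generated composite image.
        self.composites = composites

    def notify(self, terminal):
        # S601: tell the terminal that a composite image can be provided.
        terminal.on_notification(self)

    def handle_request(self, terminal_id):
        # Receives the request sent in S603 and answers with the image (S604).
        return self.composites[terminal_id]


class Terminal:
    """Stands in for the terminal device 100 (hypothetical class)."""

    def __init__(self, terminal_id, user_accepts):
        self.terminal_id = terminal_id
        self.user_accepts = user_accepts  # simulates the user's input in S602
        self.displayed = None

    def on_notification(self, server):
        if not self.user_accepts:
            return  # S602: the user did not request the composite image
        composite = server.handle_request(self.terminal_id)  # S603 and S604
        self.displayed = composite  # S605: output via the display unit
```

In this sketch nothing is displayed unless the user accepts, which is the difference from the automatic delivery of FIG. 5.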

When the processing shown in FIG. 6 is implemented, for example, while the user is watching a video on a video-sharing site such as YouTube, a screen is displayed notifying the user that a composite image is available for viewing, and when the user inputs a request for the composite image in response, a composite advertisement video in which the performer's face in a certain advertisement video has been converted into the user's face plays. This can be achieved, for example, by displaying a notification screen on the video-sharing site such as YouTube and, when the user responds to the notification screen, moving the user to a site, provided by the server device 200, that serves composite advertisement videos and providing the composite advertisement video on that site. Alternatively, in the case of the single information processing device described later, it can be achieved by having the information processing device generate the composite advertisement video locally and play it, without moving the user away from the video-sharing site such as YouTube.

When the processing shown in FIG. 6 is implemented, for example, while the user is watching a TV program, a screen is displayed notifying the user that a composite image is available for viewing, and when the user inputs a request for the composite image in response, a composite advertisement video in which the performer's face in a certain advertisement video has been converted into the user's face plays. This can be achieved, for example, in the case of the single information processing device described later, by displaying the notification screen through data broadcasting or the like and, when the user responds to the notification screen, having the information processing device create the composite advertisement image locally from the broadcast image and play it.

As a result, the user of the terminal device 100 can unexpectedly encounter a personalized image, and thereby have a new media experience. Furthermore, by watching a personalized advertisement video in which the performer's face has been converted into the user's face, the user can virtually experience, for example, what it would be like to use the advertised product (for example, cosmetics, hair styling products, or clothes) or the advertised service (for example, a beauty salon or training gym). This also leads to a new media experience.

In the example shown in FIG. 6, after step S601, an interface may also be provided that allows the user to select the person in the base image whom the user can replace in the composite image. Via this interface, the user can select the person in the base image whom the user wants to replace. In this case, the composite image transmitted to the terminal device 100 in step S604 is a composite image in which at least a part of the selected person in the base image has been converted into at least a part of the person in the user image. This composite image may be generated by performing at least part of the processing shown in FIG. 4 after step S603, or may have been generated in advance by performing the processing shown in FIG. 4.

FIG. 7 shows an example of processing in the computer system 1000 for providing a composite image for a user. The example shown in FIG. 7 is a case in which the user selects a base image in which the user wants to appear in place of a performer, and a composite image generated based on the selected base image is provided.

In step S701, the processor unit 230 of the server device 200 transmits, to the terminal device 100 via the communication interface unit 210, a plurality of base images from which a composite image can be generated, as options. The terminal device 100 receives the options via the communication interface unit 110.

Upon receiving the options, the processor unit 150 of the terminal device 100 provides, via the display unit 130, an interface that allows the user to select at least one of the base image options. Via this interface, the user can select, from among the base images, at least one base image in which the user wants to appear in place of a performer.

When the processor unit 150 of the terminal device 100 receives from the user an input selecting a base video in step S702, the processor unit 150 transmits the input selecting the base video to the server device 200 via the communication interface unit 110 in step S703. The processor unit 230 of the server device 200 receives this input via the communication interface unit 210.

Upon receiving the input selecting the base video, in step S704 the processor unit 230 of the server device 200 transmits a composite image generated based on the selected base image to the terminal device 100 via the communication interface unit 210. The terminal device 100 receives the composite image via the communication interface unit 110. This composite image may be generated by performing at least part of the processing shown in FIG. 4 after step S703, or may have been generated in advance by performing the processing shown in FIG. 4.

When the composite image is received, in step S705 the processor unit 150 of the terminal device 100 outputs the composite image via the display unit 130.
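The FIG. 7 flow leaves open whether the composite image is generated ahead of time or only after the selection arrives. One way to accommodate both is a small cache in front of the generator, sketched below. `CompositeStore`, `serve_selection`, and the `generate` callback (a stand-in for the FIG. 4 processing) are hypothetical names introduced for illustration.

```python
class CompositeStore:
    """Caches composite images keyed by (user ID, base image ID)."""

    def __init__(self, generate):
        # generate(user_id, base_id) is a stand-in for the FIG. 4 processing.
        self._generate = generate
        self._cache = {}

    def pregenerate(self, user_id, base_id):
        """Generate ahead of time, before any selection is received."""
        self._cache[(user_id, base_id)] = self._generate(user_id, base_id)

    def get(self, user_id, base_id):
        """Return the cached composite, generating it on demand if missing."""
        key = (user_id, base_id)
        if key not in self._cache:
            self._cache[key] = self._generate(user_id, base_id)
        return self._cache[key]


def serve_selection(store, options, user_id, selected_base_id):
    """Handle the selection received after S703 and return the image to send."""
    if selected_base_id not in options:
        raise ValueError(f"not an offered base image: {selected_base_id}")
    return store.get(user_id, selected_base_id)
```

Pregenerating trades storage for lower latency at selection time, while on-demand generation avoids work for base images the user never picks.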

When the processing shown in FIG. 7 is implemented, for example, the user is offered a choice of movies in which the user can appear, and when the user selects one of them, a composite movie in which the performer's face in the selected movie has been converted into the user's face plays.

As a result, the user of the terminal device 100 can watch a personalized image that matches the user's own preferences, and thereby have a new media experience.

In the example shown in FIG. 7, after step S701, an interface may also be provided that allows the user to select the person in the base image whom the user can replace in the composite image. Via this interface, the user can select the person in the base image whom the user wants to replace. In this case, the composite image transmitted to the terminal device 100 in step S704 is a composite image in which at least a part of the selected person in the selected base image has been converted into at least a part of the person in the user image. This composite image may be generated by performing at least part of the processing shown in FIG. 4 after step S703, or may have been generated in advance by performing the processing shown in FIG. 4.

In the examples described above, each step of the processing shown in FIGS. 4, 5, 6, and 7 is executed in the terminal device 100 or the server device 200, but the present invention is not limited to this. Each step of the processing shown in FIGS. 4, 5, 6, and 7 can be executed by at least one information processing apparatus including a processor unit. That is, a single information processing apparatus capable of performing both the steps described above for the terminal device 100 and the steps described above for the server device 200 is also within the scope of the present invention.

In the examples described above with reference to FIGS. 4, 5, 6, and 7, the processing of each step shown in FIGS. 4, 5, 6, and 7 has been described as being realized by the processor unit 150 and a program stored in the memory unit 140, or by the processor unit 230 and a program stored in the memory unit 220, but the present invention is not limited to this. At least one of the steps shown in FIGS. 4, 5, 6, and 7 may be realized by a hardware configuration such as a control circuit.

The present invention is not limited to the embodiments described above. It is understood that the scope of the present invention should be interpreted only by the claims. It is also understood that those skilled in the art can, from the description of the specific preferred embodiments of the present invention, implement an equivalent scope based on the description of the present invention and common general technical knowledge.

INDUSTRIAL APPLICABILITY The present invention is useful in providing a server device or the like capable of realizing a service that fulfills a user's desire to impersonate another person. This makes it possible to provide the user with a new media experience.

Reference Signs List
100 terminal device
200 server device
300 database unit
400 network
500 terminal device of base image provider
1000 computer system

Claims

An information processing apparatus for generating a composite image for a user, comprising:
first acquisition means for acquiring at least one user image;
second acquisition means for acquiring at least one base image for which modification is permitted; and
generation means for generating a composite image based on the at least one base image and the at least one user image.

The information processing apparatus according to claim 1, further comprising providing means for providing the composite image to the user.

The information processing apparatus according to claim 2, wherein the providing means automatically provides the user with the composite image without receiving a request to provide the composite image from the user.

The information processing apparatus according to claim 2, wherein the providing means performs:
notifying the user that the composite image can be provided;
receiving from the user a request to provide the composite image; and
providing the composite image to the user in response to receiving the request to provide the composite image from the user.

The information processing apparatus according to claim 2, wherein
the second acquisition means acquires a plurality of base images, and
the providing means performs:
providing the user with a choice of the plurality of base images;
receiving from the user an input selecting at least one of the plurality of base images; and
providing the user, in response to receiving the input selecting at least one of the plurality of base images, with the composite image generated based on the selected at least one base image and the at least one user image.

The information processing apparatus according to any one of claims 2, 4, and 5, wherein
a plurality of persons are shown in the at least one base image,
the providing means performs:
providing the user with a choice of the plurality of persons in the at least one base image;
receiving from the user an input selecting at least one of the plurality of persons in the at least one base image; and
providing the composite image to the user in response to receiving the input selecting at least one of the plurality of persons in the at least one base image, and
the composite image is a composite image obtained by converting at least a part of the selected at least one person in the at least one base image into at least a part of a person in the at least one user image.

The information processing apparatus, wherein the generation means generates a composite image obtained by converting at least a part of the at least one base image into at least a part of a person in the at least one user image.

The information processing apparatus, wherein the generation means generates a composite image obtained by converting a person's face in the at least one base image into a person's face in the at least one user image.

The information processing apparatus according to claim 1, wherein
the base image and the user image include sound, and
the generation means generates a composite image obtained by converting the voice of a person in the at least one base image into the voice of a person in the at least one user image.

The information processing apparatus according to any one of claims 1 to 9, wherein the generation means generates a composite image obtained by converting the body shape of a person in the at least one base image into the body shape of a person in the at least one user image.

The information processing apparatus according to any one of claims 1 to 10, wherein
the first acquisition means acquires user images of a plurality of persons including a first person and a second person, and
the generation means generates a composite image obtained by converting at least a part of a first person in the at least one base image into at least a part of the first person in the user image, and converting at least a part of a second person in the at least one base image into at least a part of the second person in the user image.

The information processing apparatus according to claim 11, wherein the generation means determines, based on a relationship between the first person in the user image and the second person in the user image, the first person in the at least one base image with which at least a part of the first person in the user image is to be combined and the second person in the at least one base image with which at least a part of the second person in the user image is to be combined.

The information processing apparatus according to claim 1, wherein
each of the at least one base image includes a set of sub-base images, each sub-base image having the same content but showing a different person, and
the generation means performs:
determining the sub-base image that includes the person most similar to the person in the at least one user image; and
generating a composite image obtained by converting at least a part of the person in the determined sub-base image into at least a part of the person in the at least one user image.

The information processing apparatus according to claim 1, wherein the user image is a user's own image.

The information processing device according to claim 1, wherein the composite image is an advertisement moving image.

A program for generating a composite image for a user, wherein the program is executed in an information processing apparatus including a processor unit, and the program causes the processor unit to execute processing including:
acquiring at least one user image;
acquiring at least one base image for which modification is permitted; and
generating a composite image based on the at least one base image and the at least one user image.

A method for generating a composite image for a user, wherein the method is performed in an information processing apparatus including a processor unit, the method comprising:
The processor unit acquiring at least one user image;
The processor unit acquiring at least one base image for which modification is permitted;
The processor unit generating a composite image based on the at least one base image and the at least one user image.

A terminal device for providing a composite image for a user, wherein the terminal device can communicate with a server device, the terminal device comprising:
acquiring means for acquiring at least one user image;
transmitting means for transmitting the at least one user image to the server device;
receiving means for receiving, from the server device, a composite image generated based on at least one base image for which modification is permitted and the at least one user image; and
output means for outputting the composite image.

The terminal device according to claim 18, further comprising a receiving unit configured to receive a permission to generate a composite image from a user.

A program for providing a composite image for a user, wherein the program is executed in a terminal device including a processor unit, the terminal device can communicate with a server device, and the program causes the processor unit to execute processing including:
acquiring at least one user image;
transmitting the at least one user image to the server device;
receiving, from the server device, a composite image generated based on at least one base image for which modification is permitted and the at least one user image; and
outputting the composite image.

A method for providing a composite image for a user, the method being performed in a terminal device capable of communicating with a server device, the method comprising:
Obtaining at least one user image;
Transmitting the at least one user image to the server device;
Receiving, from the server device, a composite image generated based on at least one base image for which modification is permitted and the at least one user image;
Outputting the composite image.

A computer system for generating a composite image for a user, the computer system comprising a server device and at least one terminal device capable of communicating with the server device, wherein
the terminal device performs:
acquiring at least one user image; and
transmitting the at least one user image to the server device,
the server device performs:
receiving the at least one user image from the terminal device;
acquiring at least one base image for which modification is permitted;
generating a composite image based on the at least one base image and the at least one user image; and
transmitting the composite image to the terminal device, and
the terminal device further performs:
receiving the composite image from the server device; and
outputting the composite image.