JP7448879B2

JP7448879B2 - Image generation method, system, and computer program

Info

Publication number: JP7448879B2
Application number: JP2020032353A
Authority: JP
Inventors: 航平渡邉
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2020-02-27
Filing date: 2020-02-27
Publication date: 2024-03-13
Anticipated expiration: 2040-02-27
Also published as: JP2021135822A

Description

The present specification relates to a technique for generating image data that includes style conversion processing using a machine learning model.

Techniques are known for converting the style of an image using an image generation model based on a neural network or the like. For example, the image forming apparatus described in Patent Document 1 receives image data representing a conversion source image and image data representing a style reference image, and outputs image data representing a converted image. The converted image is an image in which the style of the style reference image is applied to the content of the conversion source image.

Japanese Patent Application Publication No. 2018-132855
Japanese Patent Application Publication No. 2011-197995
Japanese Patent Application Publication No. 2004-213598

However, with the above technique, only the single style of the style reference image is applied to one conversion source image, so flexible style conversion may not be possible.

This specification discloses a technique that can realize flexible style conversion.

The technique disclosed in this specification can be implemented as the following application examples.

[Application example 1] An image generation method comprising: an image acquisition step of acquiring input image data representing an input image; a partial image identification step of identifying, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image and is located at a position different from the first input partial image; a first conversion step of executing a first style conversion process using a machine learning model on first partial image data representing the first input partial image to generate first converted data representing a first converted partial image; a second conversion step of executing, on second partial image data representing the second input partial image, a second style conversion process that uses a machine learning model and differs from the first style conversion process, to generate second converted data representing a second converted partial image; and an output image generation step of generating, using the first converted data and the second converted data, output image data representing an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image.

According to the above configuration, output image data representing an output image based on the input image is generated using the first converted data, which is generated by executing the first style conversion process on the first partial image data representing the first input partial image, and the second converted data, which is generated by executing the second style conversion process on the second partial image data representing the second input partial image. The output image includes a first output partial image based on the first converted partial image represented by the first converted data and a second output partial image based on the second converted partial image represented by the second converted data. Because the output image data is generated by applying both the first style conversion process and the second style conversion process to a single piece of input image data, flexible style conversion can be realized.
[Application example 2]
The image generation method according to Application Example 1,
The first style conversion process is performed using first style image data indicating a first style image,
The second style conversion process is performed using second style image data indicating a second style image,
The first converted partial image is an image in which the style of the first style image is applied to the first input partial image,
The second converted partial image is an image in which the style of the second style image is applied to the second input partial image.
[Application example 3]
The image generation method according to Application Example 1 or 2,
The output image generation step includes:
a first step of generating, using the first converted data and the second converted data, intermediate image data representing an intermediate image that includes the first converted partial image and the second converted partial image; and
a second step of performing specific post-processing on the intermediate image data to generate the output image data.
[Application example 4]
The image generation method according to Application Example 3,
The specific post-processing includes processing that reduces, in the intermediate image, both the difference in pixel values between the first converted partial image and the portion adjacent to the first converted partial image and the difference in pixel values between the second converted partial image and the portion adjacent to the second converted partial image.
[Application example 5]
The image generation method according to Application Example 3 or 4,
The specific post-processing includes a third style conversion process that uses a machine learning model and differs from the first style conversion process and the second style conversion process.
[Application example 6]
The image generation method described in Application Example 5,
The third style conversion process is performed using the input image data as style image data.
[Application example 7]
The image generation method described in Application Example 6,
The input image includes an image showing a person's face,
The specific post-processing includes processing for correcting the skin color of the person's face on the input image data to generate the corrected input image data,
The third style conversion process is performed using the corrected input image data as style image data.
[Application example 8]
The image generation method according to any one of Application Examples 3 to 7,
The specific post-processing includes a fourth style conversion process that uses a machine learning model and differs from the first style conversion process and the second style conversion process,
The input image includes an image showing a person's face,
The fourth style conversion process is a process of changing the facial expression of the person.
[Application example 9]
The image generation method according to any one of Application Examples 1 to 8,
The input image includes an image showing a person's face,
The first input partial image is an image showing a first part of the person's face,
The second input partial image is an image showing a second part of the person's face that is located at a different position from the first part.
[Application example 10]
The image generation method according to any one of Application Examples 1 to 9, further comprising:
a type identification step of identifying the type of the input image,
When the input image is a first type input image,
In the first conversion step, a first type of first style conversion process is performed on the first partial image data,
In the second conversion step, a first type of second style conversion process is performed on the second partial image data,
When the input image is a second type input image,
In the first conversion step, a second type of first style conversion process is performed on the first partial image data,
In the second conversion step, a second type of second style conversion process is performed on the second partial image data.
[Application example 11]
The image generation method according to Application Example 10,
The input image includes an image showing a person's face,
The type of the input image is a type related to at least part of the person's gender, race, facial expression, and facial angle.
[Application example 12]
The image generation method according to any one of Application Examples 1 to 11,
The first style conversion process is performed using a first parameter that specifies the degree of difference between the first input partial image and the first converted partial image to be generated,
The second style conversion process is performed using a second parameter that specifies the degree of difference between the second input partial image and the second converted partial image to be generated,
The first parameter and the second parameter are adjusted independently.
[Application example 13]
The image generation method according to any one of Application Examples 1 to 12, further comprising:
a process selection step of selecting a process to be performed on the first partial image data representing the first input partial image; and
a color conversion step of performing, on the first partial image data, a color conversion process that converts the color of at least part of the first input partial image without using a machine learning model,
When the first style conversion process is selected in the process selection step, the first conversion step is executed without executing the color conversion step,
When the color conversion process is selected in the process selection step, the color conversion step is executed without executing the first conversion step.
[Application example 14]
The image generation method according to Application Example 13,
The input image includes an image showing a person's face,
The first input partial image is an image showing the eyes of the person,
The color conversion process is a process of converting the values of pixels corresponding to the white of the eye in the image showing the eyes into a specific value representing white.
[Application example 15]
The image generation method according to any one of Application Examples 1 to 14, further comprising:
an information acquisition step of acquiring first input information for the first style conversion process based on a first input by a user and acquiring second input information for the second style conversion process based on a second input by the user,
In the first conversion step, the first style conversion process is performed using the first input information,
In the second conversion step, the second style conversion process is performed using the second input information.
[Application example 16]
The image generation method according to Application Example 15,
The first input information includes data indicating an image corresponding to the first input partial image and having a style different from that of the first input partial image,
The second input information includes data indicating an image corresponding to the second input partial image and having a style different from the second input partial image.
[Application example 17]
The image generation method according to any one of Application Examples 1 to 16,
The input image includes an image showing a person's face,
The second input partial image is an image showing the mouth of the person,
The second style conversion process is a process of correcting tooth alignment in the image showing the mouth.
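
The flow common to the application examples above (identify partial images, convert each with its own process, then compose the output image) can be sketched as follows. This is a minimal illustration, not the implementation described in this specification: the function names, the region format, and the two stand-in transforms are hypothetical, and simple array operations take the place of the machine learning models.

```python
import numpy as np

def generate_output(input_image, regions, transforms):
    """Convert each partial image with its own process and paste it back.

    regions:    dict mapping a region name to (top, left, height, width)
    transforms: dict mapping the same names to callables that take and
                return an H x W x 3 uint8 array (stand-ins for the models)
    """
    output = input_image.copy()
    for name, (top, left, h, w) in regions.items():
        patch = input_image[top:top + h, left:left + w]
        converted = transforms[name](patch)
        output[top:top + h, left:left + w] = converted
    return output

# Hypothetical stand-in for a first style conversion process.
def warm_tone(patch):
    out = patch.astype(np.int16)
    out[..., 0] = np.clip(out[..., 0] + 20, 0, 255)  # raise the R value
    return out.astype(np.uint8)

# Hypothetical stand-in for a second, different style conversion process.
def invert(patch):
    return 255 - patch

image = np.full((8, 8, 3), 100, dtype=np.uint8)
regions = {"first": (0, 0, 2, 2), "second": (4, 4, 2, 2)}
transforms = {"first": warm_tone, "second": invert}
result = generate_output(image, regions, transforms)
```

Pixels outside the two regions are copied from the input unchanged, so the output image remains based on the input image while each region carries its own conversion.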

The technique disclosed in this specification can be realized in various forms, for example, as a system, an image generation device, a computer program for realizing the functions of these methods, devices, and systems, or a recording medium on which the computer program is recorded.

FIG. 1 is a block diagram showing the configuration of the system 1000 of this embodiment.
FIG. 2 is an explanatory diagram of the configuration of the generation network group GNG.
FIG. 3 is a flowchart of processing executed by the terminal device 200 of the first embodiment.
A diagram showing an example of an input image Iin and an output image Iout.
A diagram showing an example of a selection screen.
A flowchart of processing executed by the server 100 of the first embodiment.
A flowchart of processing executed by the terminal device 200.
A diagram showing a selection screen UD of the second embodiment.
A flowchart of processing executed by the server 100 of the second embodiment.

A. First Embodiment
A-1. Configuration of System 1000
Embodiments will now be described based on examples. FIG. 1 is a block diagram showing the configuration of the system 1000 of this embodiment. The system 1000 includes a server 100 and a terminal device 200. The system 1000 of the first embodiment is an image generation system that uses input image data to generate output image data representing an output image. The sewing machine 300 drawn with a broken line in FIG. 1 is a component of the system of the second embodiment, not of the first embodiment, and is therefore described in the second embodiment.

The server 100 is a computer connected to the Internet IT. The server 100 includes a CPU 110 serving as the controller of the server 100, a volatile storage device 120 such as a RAM, a nonvolatile storage device 130 such as a hard disk drive or flash memory, and a communication interface (IF) 160. The communication interface 160 is a wired or wireless interface for connecting to the Internet IT.

The volatile storage device 120 provides a buffer area that temporarily stores various intermediate data generated when the CPU 110 performs processing. The nonvolatile storage device 130 stores a computer program PGs, a style image data group SDG (described later), and a skin color data group SKG (described later).

The computer program PGs, the style image data group SDG, and the skin color data group SKG are provided by, for example, the operator of the server 100 and uploaded to the server 100. By executing the computer program PGs, the CPU 110 cooperates with the terminal device 200 to execute the processing, described later, that generates an output image.

The computer program PGs includes, as a module, a computer program that causes the CPU 110 to implement a generation network group GNG, which includes a plurality of generation networks GN described later.

The terminal device 200 is, for example, a portable terminal device such as a smartphone. The terminal device 200 includes a CPU 210, which is a processor serving as the controller of the terminal device 200, a nonvolatile storage device 220 such as a hard disk drive or flash memory, a volatile storage device 230 such as a RAM, an operation unit 240 such as a touch panel that receives user operations, a display device 250 such as a liquid crystal display overlaid with the touch panel, and a wireless communication interface 260 for communicating with external devices. The terminal device 200 is communicably connected to the server 100 via a wireless network NW and the Internet IT.

A computer program PGt is stored in the nonvolatile storage device 220 of the terminal device 200. The computer program PGt is provided by the operator of the server 100 described above, for example, by being downloaded from a predetermined server connected to the terminal device 200 via the Internet IT. By executing the computer program PGt, the CPU 210 cooperates with the server 100 to execute the processing, described later, that generates an output image.

A-2. Configuration of Generation Network Group
FIG. 2 is an explanatory diagram of the configuration of the generation network group GNG. As shown in the block diagram of FIG. 2(A), the generation network group GNG includes four generation networks GN1 to GN4. The two generation networks GN4 and GN5 indicated by broken lines are provided in the second embodiment and are therefore described in the second embodiment.

Each of the four generation networks GN1 to GN4 has the configuration shown as the generation network GN in FIG. 2(B). The generation network GN is a machine learning model that performs style conversion. In this embodiment, the generation network GN is the machine learning model disclosed in the paper "Xun Huang and Serge Belongie. Arbitrary style transfer in real-time with adaptive instance normalization. In ICCV, 2017."

A data pair consisting of content image data CD and style image data SD is input to the generation network GN. The content image data CD is image data representing a content image. For example, for the eye generation network GN1, the content image is an image showing a person's eyes (described later). The style image data SD is image data representing a style image. For example, for the eye generation network GN1, the style image is an image showing a person's eyes that has a style different from the content image (for example, eye color tone or makeup characteristics).

When a data pair is input, the generation network GN executes an operation using a plurality of parameters on the data pair to generate and output converted image data TD. The converted image data TD is data representing a converted image obtained by applying the style of the style image to the content image. For example, the converted image has the style of the style image while maintaining the shapes in the content image (for example, the shape of the eyes).

In this embodiment, the content image data CD, the style image data SD, and the converted image data TD are bitmap data representing an image composed of a plurality of pixels, specifically RGB image data that represents the color of each pixel by RGB values. An RGB value is a color value in the RGB color system that includes gradation values of three color components (hereinafter also called component values), namely an R value, a G value, and a B value. The images represented by the image data CD, SD, and TD have the same size, for example, 256 pixels high by 256 pixels wide.

As shown in FIG. 2(B), the generation network GN includes an encoder EC, a feature combination unit CC, a strength adjustment unit SA, and a decoder DC.

The content image data CD or the style image data SD is input to the encoder EC. The encoder EC performs dimension reduction processing on the input image data to generate feature data representing the features of the input image data. The encoder EC is, for example, a convolutional neural network, that is, a neural network having a plurality of layers including convolution layers that perform convolution. In this embodiment, the portion of the neural network called VGG19 from the input layer to the relu4_1 layer is used as the encoder EC. VGG19 is a trained neural network that was trained using image data registered in the image database called ImageNet, and its trained calculation parameters are publicly available. In this embodiment, the published trained calculation parameters are used as the calculation parameters of the encoder EC.

The feature combination unit CC is the "AdaIN layer" disclosed in the above paper. The feature combination unit CC generates converted feature data t using feature data f(c), obtained by inputting the content image data CD to the encoder EC, and feature data f(s), obtained by inputting the style image data SD to the encoder EC.
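
As a rough illustration of what the feature combination unit CC computes, the sketch below follows the adaptive instance normalization described in the cited paper: the content features are normalized per channel and then rescaled to the per-channel mean and standard deviation of the style features. The function name `adain`, the (channels, height, width) array layout, the `eps` stabilizer, and the random test features are assumptions made for this example, not details taken from this specification.

```python
import numpy as np

def adain(content_feat, style_feat, eps=1e-5):
    """Align per-channel statistics of content features to style features.

    Both inputs are arrays of shape (channels, height, width).
    """
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True) + eps  # avoid /0
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    normalized = (content_feat - c_mean) / c_std
    return s_std * normalized + s_mean

# Random arrays stand in for the encoder outputs f(c) and f(s).
rng = np.random.default_rng(0)
f_c = rng.normal(0.0, 1.0, size=(4, 8, 8))
f_s = rng.normal(3.0, 2.0, size=(4, 8, 8))
t = adain(f_c, f_s)
```

After the operation, each channel of `t` keeps the spatial pattern of the content features but has approximately the mean and standard deviation of the corresponding style channel.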

The strength adjustment unit SA adjusts the strength of the style conversion using a parameter α that indicates that strength. Specifically, the strength adjustment unit SA generates strength-adjusted converted feature data t_ad using the parameter α, the feature data f(c) of the content image data CD, and the converted feature data t. The converted feature data t_ad is given by equation (1) below.
t_ad = (1 - α) f(c) + α t … (1)
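
Equation (1) is a linear interpolation between the content features and the converted features. A minimal sketch, assuming the features are NumPy arrays (the function name `adjust_strength` and the sample values are hypothetical; the range check reflects the range 0 < α ≤ 1 given for the parameter in this description):

```python
import numpy as np

def adjust_strength(f_c, t, alpha):
    """Return t_ad = (1 - alpha) * f(c) + alpha * t, as in equation (1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    return (1.0 - alpha) * f_c + alpha * t

f_c = np.array([0.0, 2.0])   # stand-in for content features f(c)
t = np.array([4.0, 6.0])     # stand-in for converted features t
blended = adjust_strength(f_c, t, 0.25)
full = adjust_strength(f_c, t, 1.0)
```

With α = 1 the result equals t, which corresponds to the strongest style conversion; smaller values pull the result toward the content features.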

The parameter α takes a value in the range 0 < α ≤ 1. The closer α is to 1, the stronger the style conversion. In other words, the closer α is to 1, the closer the converted image represented by the converted image data TD is to the style image and the more it differs from the content image. The parameter α can therefore be regarded as a parameter that specifies the degree of difference between the content image and the converted image. As described later, the parameter α is specified by the user. During training of the decoder DC, α is set to 1.

The strength-adjusted converted feature data t_ad is input to the decoder DC. Using a plurality of calculation parameters, the decoder DC executes on the converted feature data t_ad a dimension restoration process that is the reverse of the processing of the encoder EC, and generates the converted image data TD described above. The decoder DC is a neural network having a plurality of layers including transposed convolution layers that perform transposed convolution.

The calculation parameters of the decoder DC are adjusted by the following training. A predetermined number (for example, tens of thousands) of data pairs, each consisting of content image data CD and style image data SD for training, are prepared. One adjustment process is executed using a predetermined batch size of data pairs selected from these data pairs.

In one adjustment process, the calculation parameters are adjusted according to a predetermined algorithm so that the loss function L, calculated using the batch of data pairs, becomes smaller. The predetermined algorithm is, for example, an algorithm using error backpropagation and gradient descent (Adam in this embodiment).

The loss function L is given by equation (2) below, using a content loss Lc, a style loss Ls, and a weight λ.
L = Lc + λ Ls … (2)

In this embodiment, the content loss Lc is the loss (also called error) between the feature data f(g(t)) of the converted image data TD and the converted feature data t. The feature data f(g(t)) is calculated by inputting the data pair to be used into the generation network GN and then inputting the resulting converted image data TD into the encoder EC. As described above, the converted feature data t is calculated by inputting the data pair to be used into the encoder EC and then inputting the resulting feature data f(c) and f(s) into the feature combination unit CC.

スタイル損失Ｌｓは、変換済画像データＴＤをエンコーダＥＣに入力した場合にエンコーダＥＣの複数個の層からそれぞれ出力されるデータ群と、スタイル画像データＳＤをエンコーダＥＣに入力した場合にエンコーダＥＣの複数個の層からそれぞれ出力されるデータ群と、の間の損失である。 The style loss Ls is the loss between two data groups: the data group output from each of a plurality of layers of the encoder EC when the converted image data TD is input to the encoder EC, and the data group output from each of the same layers when the style image data SD is input to the encoder EC.
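
As a non-limiting sketch, equation (2) can be evaluated as follows. The specification states only that Lc and Ls are losses between the named data; the use of mean squared error for Lc and of per-layer mean and standard deviation differences for Ls are assumptions made here for illustration, and the toy feature vectors stand in for real encoder outputs.

```python
import math

def mse(a, b):
    """Mean squared error between two equally sized feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mean_std(values):
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, math.sqrt(var)

def content_loss(feat_of_output, t):
    """Lc: error between f(g(t)) and the converted feature data t."""
    return mse(feat_of_output, t)

def style_loss(output_layers, style_layers):
    """Ls: summed over encoder layers; comparing mean/std is an assumption."""
    total = 0.0
    for out_feat, style_feat in zip(output_layers, style_layers):
        mu_o, sd_o = mean_std(out_feat)
        mu_s, sd_s = mean_std(style_feat)
        total += (mu_o - mu_s) ** 2 + (sd_o - sd_s) ** 2
    return total

def total_loss(feat_of_output, t, output_layers, style_layers, lam):
    """Equation (2): L = Lc + lambda * Ls."""
    return content_loss(feat_of_output, t) + lam * style_loss(output_layers, style_layers)

# Toy feature vectors standing in for encoder outputs (hypothetical values).
t = [0.0, 1.0, 2.0, 3.0]
f_g_t = [0.0, 1.0, 2.0, 5.0]
out_layers = [[0.0, 2.0], [1.0, 1.0]]
sty_layers = [[1.0, 3.0], [1.0, 1.0]]
loss = total_loss(f_g_t, t, out_layers, sty_layers, lam=10.0)
```

A larger weight λ makes the training favor matching the style statistics over preserving the content features.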

以上のような調整処理が複数回に亘って繰り返される。これによって、コンテンツ画像データＣＤとスタイル画像データＳＤとが入力される場合に、コンテンツ画像に対してスタイル画像のスタイルを適用して得られる変換済画像を示す変換済画像データＴＤが出力できるように、生成ネットワークＧＮがトレーニングされる。 The adjustment process described above is repeated a plurality of times. As a result, the generation network GN is trained so that, when content image data CD and style image data SD are input, it can output converted image data TD indicating a converted image obtained by applying the style of the style image to the content image.

生成ネットワークＧＮ１～ＧＮ４の基本的な構成は、図２（Ｂ）のネットワークＧＮに示す構成であるが、生成ネットワークＧＮ１～ＧＮ４のトレーニングに用いられるデータペアが互いに異なる。例えば、目用の生成ネットワークＧＮ１は、人物の目を示すデータペアを用いてトレーニングされている。鼻用の生成ネットワークＧＮ２は、人物の鼻を示すデータペアを用いてトレーニングされている。口用の生成ネットワークＧＮ３は、人物の口を示すデータペアを用いてトレーニングされている。顔用の生成ネットワークＧＮ４は、人物の顔の全体を示すデータペアを用いてトレーニングされている。このために、トレーニング済みの生成ネットワークＧＮ１～ＧＮ４では、複数個の演算パラメータの値が互いに異なっている。 The basic configuration of the generation networks GN1 to GN4 is the configuration shown in network GN in FIG. 2(B), but the data pairs used for training the generation networks GN1 to GN4 are different from each other. For example, the generation network GN1 for eyes is trained using data pairs representing human eyes. The generative network GN2 for the nose is trained using data pairs representing a person's nose. The mouth generation network GN3 is trained using data pairs representing a person's mouth. The face generation network GN4 is trained using data pairs representing the entire face of a person. For this reason, the values of a plurality of calculation parameters are different from each other in the trained generation networks GN1 to GN4.

Ａ－３．システムの動作 A-3. System Operation
図３は、第１実施例の端末装置２００が実行する処理のフローチャートである。この処理は、サーバ１００が提供するスタイル変換サービスを利用して、入力画像データに対してスタイル変換を行って得られる出力画像データを取得する処理である。この処理は、例えば、端末装置２００のコンピュータプログラムＰＧｔが実行された状態で、ユーザの開始指示に基づいて開始される。 FIG. 3 is a flowchart of processing executed by the terminal device 200 of the first embodiment. This processing uses the style conversion service provided by the server 100 to acquire output image data obtained by performing style conversion on input image data. This processing is started based on a start instruction from the user, for example, while the computer program PGt of the terminal device 200 is running.

図３のＳ１０５では、端末装置２００のＣＰＵ２１０は、入力画像Ｉｉｎを示す入力画像データを取得する。ＣＰＵ２１０は、例えば、不揮発性記憶装置１３０に格納された複数個の画像データの中から、ユーザによって指定された画像データを入力画像データとして取得する。あるいは、ＣＰＵ２１０は、ユーザの撮影指示に応じて端末装置２００が備えるデジタルカメラ（図示省略）に撮影を実行させ、該撮影によって生成される画像データを入力画像データとして取得する。入力画像データは、例えば、ＲＧＢ画像データである。 In S105 of FIG. 3, the CPU 210 of the terminal device 200 obtains input image data indicating the input image Iin. For example, the CPU 210 obtains image data specified by the user from among a plurality of pieces of image data stored in the nonvolatile storage device 130 as input image data. Alternatively, the CPU 210 causes a digital camera (not shown) included in the terminal device 200 to execute photography in response to a user's photography instruction, and obtains image data generated by the photography as input image data. The input image data is, for example, RGB image data.

図４は、入力画像Ｉｉｎと出力画像Ｉｏｕｔとの一例を示す図である。図４（Ａ）に示すように、本実施例の入力画像Ｉｉｎは、人物の顔ＦＣの全体を含む写真を示す画像である。 FIG. 4 is a diagram showing an example of an input image Iin and an output image Iout. As shown in FIG. 4A, the input image Iin of this embodiment is an image showing a photograph including the entire face FC of a person.

図３のＳ１１０では、ＣＰＵ２１０は、入力画像データを用いて、入力画像Ｉｉｎを含む選択画面ＵＤａを表示装置２５０に表示する。図５は、選択画面の一例を示す図である。図５（Ａ）の選択画面ＵＤａは、入力画像Ｉｉｎと、入力画像Ｉｉｎの種類に関する選択指示（具体的には、性別および人種の選択指示）を入力するためのプルダウンメニューＰＭ１、ＰＭ２と、選択画面の切替指示を入力するためのボタンＢＴ１、ＢＴ２と、を含んでいる。 In S110 of FIG. 3, the CPU 210 uses the input image data to display the selection screen UDa including the input image Iin on the display device 250. FIG. 5 is a diagram showing an example of a selection screen. The selection screen UDa in FIG. 5(A) includes the input image Iin, pull-down menus PM1 and PM2 for inputting selection instructions regarding the type of the input image Iin (specifically, selection instructions for gender and race), and buttons BT1 and BT2 for inputting instructions to switch the selection screen.

図３のＳ１１５では、ＣＰＵ２１０は、入力画像データをサーバ１００に送信する。なお、本実施例では、端末装置２００からサーバ１００へのデータの送信は、ＨＴＴＰ（Hypertext Transfer Protocol）に従うＨＴＴＰリクエストの送信として行われる。 In S115 of FIG. 3, the CPU 210 transmits the input image data to the server 100. In this embodiment, data is transmitted from the terminal device 200 to the server 100 as an HTTP request in accordance with HTTP (Hypertext Transfer Protocol).
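
For illustration only, the transmission of S115 might be sketched as building an HTTP POST request with the Python standard library. The endpoint URL, the content type, and the function name `build_upload_request` are hypothetical; the specification states only that the data is sent as an HTTP request.

```python
import urllib.request

def build_upload_request(server_url, image_bytes):
    """Build (but do not send) an HTTP POST request carrying the input image data."""
    return urllib.request.Request(
        url=server_url,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

# Hypothetical endpoint and payload; neither appears in the specification.
request = build_upload_request(
    "http://server.example/style-conversion/input", b"raw image bytes"
)
# Sending would be urllib.request.urlopen(request); it is omitted here
# because it requires a live server.
```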

サーバ１００が端末装置２００から送信される入力画像データを受信すると、サーバ１００のＣＰＵ１１０は、スタイル変換サービスを提供する処理を開始する。図６は、第１実施例のサーバ１００が実行する処理のフローチャートである。端末装置２００の図３の処理とサーバ１００の図６の処理とは、データの遣り取りを行いながら並行して実行される。 When the server 100 receives input image data transmitted from the terminal device 200, the CPU 110 of the server 100 starts processing to provide a style conversion service. FIG. 6 is a flowchart of processing executed by the server 100 of the first embodiment. The process shown in FIG. 3 by the terminal device 200 and the process shown in FIG. 6 by the server 100 are executed in parallel while exchanging data.

図６のＳ２０５では、サーバ１００のＣＰＵ１１０は、サーバ１００が端末装置２００から送信される入力画像データを受信する。図６のＳ２１０では、ＣＰＵ１１０は、入力画像データに対して所定の領域特定処理を実行して、入力画像Ｉｉｎの顔ＦＣに含まれる複数個の部位の領域を特定する。具体的には、図４（Ａ）に示すように、右目、左目、鼻、口の領域Ｐｅｒ、Ｐｅｌ、Ｐｎ、Ｐｍが特定される。領域特定処理には、公知の画像認識方法が用いられる。 In S205 of FIG. 6, the CPU 110 of the server 100 receives input image data transmitted from the terminal device 200. In S210 of FIG. 6, the CPU 110 performs a predetermined region specifying process on the input image data to specify regions of a plurality of body parts included in the face FC of the input image Iin. Specifically, as shown in FIG. 4(A), right eye, left eye, nose, and mouth regions Per, Pel, Pn, and Pm are specified. A known image recognition method is used for the area identification process.

例えば、ｙｏｌｏ(You only look once)と呼ばれる画像認識アルゴリズムは、畳込ニューラルネットワークを用いて、画像内のオブジェクトの位置と種類との認識を同時に行うことができる。本実施例では、右目、左目、鼻、口の４種類のオブジェクトの位置と種類を認識できるようにトレーニングされたｙｏｌｏの畳込ニューラルネットワークを用いて、右目、左目、鼻、口の領域Ｐｅｒ、Ｐｅｌ、Ｐｎ、Ｐｍが特定される。ｙｏｌｏは、例えば、論文「J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once:Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779-788.」に開示されている。 For example, an image recognition algorithm called YOLO (You only look once) uses a convolutional neural network to recognize the position and the type of each object in an image at the same time. In this embodiment, the right eye, left eye, nose, and mouth regions Per, Pel, Pn, and Pm are specified using a YOLO convolutional neural network trained to recognize the positions and types of four types of objects: right eye, left eye, nose, and mouth. YOLO is disclosed, for example, in the paper "J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, 'You only look once: Unified, real-time object detection,' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779-788."

図６のＳ２１２では、ＣＰＵ１１０は、特定された複数個の部位の領域Ｐｅｒ、Ｐｅｌ、Ｐｎ、Ｐｍを示す領域情報、例えば、これらの領域の入力画像Ｉｉｎ内の位置とサイズとを示す領域情報を、端末装置２００に送信する。 In S212 of FIG. 6, the CPU 110 transmits, to the terminal device 200, region information indicating the specified regions Per, Pel, Pn, and Pm of the plurality of parts, for example, region information indicating the positions and sizes of these regions within the input image Iin.

図３のＳ１２０では、端末装置２００のＣＰＵ２１０は、サーバ１００から送信される領域情報を受信し、該領域情報を用いて、複数個の部位の領域Ｐｅｒ、Ｐｅｌ、Ｐｎ、Ｐｍの特定結果を表示装置２５０に表示する。例えば、図５（Ａ）に示すように、選択画面ＵＤａの入力画像Ｉｉｎ上に、複数個の部位の領域Ｐｅｒ、Ｐｅｌ、Ｐｎ、Ｐｍを示す複数個の矩形の枠Ｓｅｒ、Ｓｅｌ、Ｓｎ、Ｓｍを表示する。なお、フローチャートでは省略するが、ＣＰＵ２１０は、矩形の枠Ｓｅｒ、Ｓｅｌ、Ｓｎ、Ｓｍの位置やサイズの修正指示がユーザから入力される場合には、該入力に応じて、対応する部位の領域Ｐｅｒ、Ｐｅｌ、Ｐｎ、Ｐｍの領域情報を修正する。修正後の領域情報は、サーバ１００に送信される。 In S120 of FIG. 3, the CPU 210 of the terminal device 200 receives the region information transmitted from the server 100 and uses it to display the result of specifying the regions Per, Pel, Pn, and Pm of the plurality of parts on the display device 250. For example, as shown in FIG. 5(A), a plurality of rectangular frames Ser, Sel, Sn, and Sm indicating the regions Per, Pel, Pn, and Pm are displayed on the input image Iin of the selection screen UDa. Although omitted from the flowchart, when the user inputs an instruction to correct the position or size of any of the rectangular frames Ser, Sel, Sn, and Sm, the CPU 210 corrects the region information of the corresponding region Per, Pel, Pn, or Pm in accordance with the input. The corrected region information is transmitted to the server 100.
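
One possible representation of the region information, and of a user correction applied to it, is sketched below. The field names, the JSON encoding, and the clamping of a corrected frame to the image bounds are all assumptions; the specification requires only that the region information indicate position and size.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Region:
    part: str      # hypothetical labels, e.g. "right_eye", "left_eye", "nose", "mouth"
    x: int         # left edge in input-image pixels
    y: int         # top edge in input-image pixels
    width: int
    height: int

def apply_correction(region, x, y, width, height, image_width, image_height):
    """Return a corrected region clamped so that it stays inside the input image."""
    x = max(0, min(x, image_width - 1))
    y = max(0, min(y, image_height - 1))
    width = max(1, min(width, image_width - x))
    height = max(1, min(height, image_height - y))
    return Region(region.part, x, y, width, height)

regions = [Region("right_eye", 60, 80, 40, 20), Region("nose", 90, 100, 30, 40)]
# The user drags the right-eye frame; the requested width exceeds the image.
corrected = apply_correction(regions[0], 55, 78, 500, 24, image_width=200, image_height=200)
# Serialized form that could be sent back to the server.
payload = json.dumps([asdict(r) for r in [corrected, regions[1]]])
```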

図３のＳ１２５では、ＣＰＵ２１０は、ユーザによって選択された性別と人種の情報をサーバ１００に送信する。例えば、図５（Ａ）のプルダウンメニューＰＭ１は、男性を示す選択肢と、女性を示す選択肢と、を含む。プルダウンメニューＰＭ２は、予め登録された人種を示す複数個の選択肢を含む。ユーザは、プルダウンメニューＰＭ１、ＰＭ２を操作して、複数個の選択肢のうちの１個の選択肢を選択して、ボタンＢＴ２を押下する。ＣＰＵ２１０は、ボタンＢＴ２が押下された時点で、プルダウンメニューＰＭ１、ＰＭ２にて選択されている選択肢に対応する性別および人種の情報を、サーバ１００に送信する。 In S125 of FIG. 3, the CPU 210 transmits information on the gender and race selected by the user to the server 100. For example, the pull-down menu PM1 in FIG. 5A includes an option indicating male and an option indicating female. The pull-down menu PM2 includes a plurality of choices indicating races registered in advance. The user operates the pull-down menus PM1 and PM2, selects one option from the plurality of options, and presses the button BT2. When the button BT2 is pressed, the CPU 210 transmits to the server 100 the gender and race information corresponding to the options selected in the pull-down menus PM1 and PM2.

図６のＳ２１５では、サーバ１００のＣＰＵ１１０は、端末装置２００から送信される性別および人種の情報を受信する。Ｓ２２０では、ＣＰＵ１１０は、受信された情報によって示される性別および人種に応じたスタイル画像データＳＤと肌色データとを、端末装置２００に送信する。例えば、サーバ１００の不揮発性記憶装置１３０に格納されたスタイル画像データ群ＳＤＧ（図１）は、性別および人種の組み合わせごとに、複数個のスタイル画像データＳＤを含んでいる。性別および人種の１つの組み合わせに対応する複数個のスタイル画像データＳＤは、顔の部位（本実施例では目、口、鼻）ごとに、顔の部位をそれぞれ示す複数個のスタイル画像データＳＤを含んでいる。例えば、受信された情報によって示される性別および人種に対応する複数個のスタイル画像データＳＤが、端末装置２００に送信される。サーバ１００の不揮発性記憶装置１３０に格納された肌色データ群ＳＫＧ（図１）は、性別および人種の組み合わせごとに、複数個の肌色データ（例えば、肌色を示すＲＧＢ値）を含んでいる。例えば、受信された情報によって示される性別および人種に対応する複数個の肌色データが端末装置２００に送信される。 In S215 of FIG. 6, the CPU 110 of the server 100 receives the gender and race information transmitted from the terminal device 200. In S220, the CPU 110 transmits, to the terminal device 200, the style image data SD and the skin color data corresponding to the gender and race indicated by the received information. For example, the style image data group SDG (FIG. 1) stored in the nonvolatile storage device 130 of the server 100 includes a plurality of style image data SD for each combination of gender and race. The plurality of style image data SD corresponding to one combination of gender and race includes, for each facial part (eyes, mouth, and nose in this embodiment), a plurality of style image data SD each showing that facial part. For example, the plurality of style image data SD corresponding to the gender and race indicated by the received information are transmitted to the terminal device 200. The skin color data group SKG (FIG. 1) stored in the nonvolatile storage device 130 of the server 100 includes a plurality of pieces of skin color data (for example, RGB values indicating skin colors) for each combination of gender and race. For example, the plurality of pieces of skin color data corresponding to the gender and race indicated by the received information are transmitted to the terminal device 200.

図３のＳ１２７では、端末装置２００のＣＰＵ２１０は、サーバ１００から送信されるスタイル画像データＳＤと肌色データとを受信する。 In S127 of FIG. 3, the CPU 210 of the terminal device 200 receives the style image data SD and skin color data transmitted from the server 100.

図３のＳ１３０では、ＣＰＵ２１０は、入力画像Ｉｉｎにて特定される顔の部位（目、鼻、口）の領域から注目領域を選択する。 In S130 of FIG. 3, the CPU 210 selects a region of interest from the regions of facial parts (eyes, nose, mouth) specified in the input image Iin.

図３のＳ１３５では、ＣＰＵ２１０は、注目領域用の選択画面を表示装置２５０に表示する。図５（Ｂ）の選択画面ＵＤｂは、目の領域用の選択画面である。選択画面ＵＤｂは、入力画像Ｉｉｎと、目のスタイル画像の選択指示を入力するための選択ウインドウＳＷｂと、目のスタイル変換の強度を入力するためのスライドバーＳＢｂと、ボタンＢＴ１、ＢＴ２と、を含んでいる。選択ウインドウＳＷｂは、選択肢として、Ｓ１２７にて受信された目の複数個のスタイル画像データＳＤによって示される複数個のスタイル画像ＳＩｅ１、ＳＩｅ２を含んでいる。図５（Ｃ）の選択画面ＵＤｃは、鼻の領域用の選択画面である。選択画面ＵＤｃは、後述する中間画像Ｉｍａと、鼻のスタイル画像の選択指示を入力するための選択ウインドウＳＷｃと、鼻のスタイル変換の強度を入力するためのスライドバーＳＢｃと、ボタンＢＴ１、ＢＴ２と、を含んでいる。選択ウインドウＳＷｃは、選択肢として、Ｓ１２７にて受信された鼻の複数個のスタイル画像データＳＤによって示される複数個のスタイル画像ＳＩｎ１、ＳＩｎ２を含んでいる。口の領域用の選択画面については図示を省略する。 In S135 of FIG. 3, the CPU 210 displays a selection screen for the region of interest on the display device 250. The selection screen UDb in FIG. 5(B) is the selection screen for the eye regions. The selection screen UDb includes the input image Iin, a selection window SWb for inputting an instruction to select an eye style image, a slide bar SBb for inputting the intensity of the eye style conversion, and the buttons BT1 and BT2. The selection window SWb includes, as options, a plurality of style images SIe1 and SIe2 indicated by the plurality of eye style image data SD received in S127. The selection screen UDc in FIG. 5(C) is the selection screen for the nose region. The selection screen UDc includes an intermediate image Ima described later, a selection window SWc for inputting an instruction to select a nose style image, a slide bar SBc for inputting the intensity of the nose style conversion, and the buttons BT1 and BT2. The selection window SWc includes, as options, a plurality of style images SIn1 and SIn2 indicated by the plurality of nose style image data SD received in S127. Illustration of the selection screen for the mouth region is omitted.

図３のＳ１４０では、ＣＰＵ２１０は、ユーザによって選択されたスタイル画像と強度とを示す情報をサーバ１００に送信する。例えば、注目領域が目の領域である場合には、ユーザは、図５（Ｂ）の選択ウインドウＳＷｂに表示された複数個のスタイル画像ＳＩｅ１、ＳＩｅ２の中から、用いるべき１個のスタイル画像を選択する。ユーザは、スライドバーＳＢｂのノブを操作して、用いるべき強度に対応する位置に移動させる。その後、ユーザは、ボタンＢＴ２を押下する。ＣＰＵ２１０は、ボタンＢＴ２が押下された時点で、選択ウインドウＳＷｂにて選択されているスタイル画像を示す情報（例えば、画像ＩＤ）と、スライドバーＳＢｂのノブの位置に対応する強度を示す情報（例えば、上述したパラメータα）と、をサーバ１００に送信する。 In S140 of FIG. 3, the CPU 210 transmits information indicating the style image and the intensity selected by the user to the server 100. For example, when the region of interest is the eye region, the user selects one style image to be used from among the plurality of style images SIe1 and SIe2 displayed in the selection window SWb of FIG. 5(B). The user operates the knob of the slide bar SBb to move it to the position corresponding to the intensity to be used. The user then presses the button BT2. When the button BT2 is pressed, the CPU 210 transmits, to the server 100, information indicating the style image selected in the selection window SWb (for example, an image ID) and information indicating the intensity corresponding to the position of the knob of the slide bar SBb (for example, the parameter α described above).

図６のＳ２２５では、サーバ１００のＣＰＵ１１０は、注目領域について選択されたスタイル画像と強度とを示す情報を端末装置２００から受信する。 In S225 of FIG. 6, the CPU 110 of the server 100 receives information indicating the style image and intensity selected for the region of interest from the terminal device 200.

図６のＳ２２７では、ＣＰＵ１１０は、用いるべきスタイル画像データＳＤを取得する。例えば、注目領域が目の領域Ｐｅｒ、Ｐｅｌである場合には、ＣＰＵ１１０は、Ｓ２２５にて受信された情報に基づいて、不揮発性記憶装置１３０に格納されたスタイル画像データ群ＳＤＧ（図１）から、用いるべき目のスタイル画像データＳＤを取得する。 In S227 of FIG. 6, the CPU 110 acquires the style image data SD to be used. For example, when the region of interest is the eye regions Per and Pel, the CPU 110 acquires the eye style image data SD to be used from the style image data group SDG (FIG. 1) stored in the nonvolatile storage device 130, based on the information received in S225.

図６のＳ２３０では、ＣＰＵ１１０は、注目領域のスタイル変換処理を実行する。ＣＰＵ１１０は、目の領域Ｐｅｒ、Ｐｅｌに対応する２個の部分画像ＰＩｅｒ、ＰＩｅｌ（図４（Ａ））を示す２個の部分画像データを、入力画像データからそれぞれ抽出する。ＣＰＵ１１０は、２個の部分画像データに対して、それぞれ、縮小処理または拡大処理を実行して、所定サイズ（本実施例では、縦２５６画素×横２５６画素）の２個の目のコンテンツ画像データＣＤを生成する。ＣＰＵ１１０は、右目のコンテンツ画像データＣＤとＳ２２７にて取得されたスタイル画像データＳＤとのデータペアを、目用の生成ネットワークＧＮ１に入力して、右目の変換済画像データＴＤを生成する。同様に、ＣＰＵ１１０は、左目のコンテンツ画像データＣＤとスタイル画像データＳＤとのデータペアを、目用の生成ネットワークＧＮ１に入力して、左目の変換済画像データＴＤを生成する。ＣＰＵ１１０は、生成された２個の変換済画像データＴＤに対して拡大処理または縮小処理を実行して、変換済画像データＴＤによって示される画像のサイズを元の部分画像と同じサイズに調整する。以下では、サイズが調整された後の変換済画像データＴＤを、変換済データと呼ぶ。注目領域が鼻の領域Ｐｎや口の領域Ｐｍである場合には、鼻用の生成ネットワークＧＮ２や口用の生成ネットワークＧＮ３を用いたスタイル変換処理によって、鼻や口の変換済データが生成される。 In S230 of FIG. 6, the CPU 110 executes style conversion processing for the region of interest. The CPU 110 extracts, from the input image data, two pieces of partial image data representing the two partial images PIer and PIel (FIG. 4(A)) corresponding to the eye regions Per and Pel. The CPU 110 executes reduction processing or enlargement processing on each of the two pieces of partial image data to generate two pieces of eye content image data CD of a predetermined size (in this embodiment, 256 pixels vertically by 256 pixels horizontally). The CPU 110 inputs the data pair consisting of the right-eye content image data CD and the style image data SD acquired in S227 into the eye generation network GN1 to generate right-eye converted image data TD. Similarly, the CPU 110 inputs the data pair consisting of the left-eye content image data CD and the style image data SD into the eye generation network GN1 to generate left-eye converted image data TD. The CPU 110 executes enlargement processing or reduction processing on the two pieces of generated converted image data TD to adjust the size of the image indicated by each piece of converted image data TD to the size of the original partial image. In the following, the converted image data TD after the size adjustment is referred to as converted data. When the region of interest is the nose region Pn or the mouth region Pm, converted data for the nose or the mouth is generated by style conversion processing using the nose generation network GN2 or the mouth generation network GN3.
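
The crop, resize, convert, and resize-back sequence of S230 can be sketched as follows. The image is modeled as a list of pixel rows, nearest-neighbor sampling is an assumed interpolation method, and `identity_generator` is a placeholder that returns the content unchanged where a trained network such as GN1 would be called.

```python
def crop(image, x, y, width, height):
    """Extract the partial image for a region from a list of pixel rows."""
    return [row[x:x + width] for row in image[y:y + height]]

def resize_nearest(image, out_width, out_height):
    """Enlarge or reduce by sampling the source pixel nearest each output pixel centre."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[(2 * r + 1) * in_height // (2 * out_height)]
              [(2 * c + 1) * in_width // (2 * out_width)]
         for c in range(out_width)]
        for r in range(out_height)
    ]

def convert_region(image, region, style_image, generator, net_size=256):
    """Crop a region, run it through a generator at net_size, and resize it back."""
    x, y, width, height = region
    content = resize_nearest(crop(image, x, y, width, height), net_size, net_size)
    converted = generator(content, style_image)
    return resize_nearest(converted, width, height)

def identity_generator(content, style):
    """Placeholder for a trained generation network (assumption for illustration)."""
    return content

image = [[r * 10 + c for c in range(8)] for r in range(6)]
style = [[0] * 256 for _ in range(256)]
region = (2, 1, 4, 3)  # x, y, width, height of a hypothetical eye region
converted_data = convert_region(image, region, style, identity_generator)
```

Because the placeholder generator changes nothing, the converted data here equals the original crop; with a real network only the size would be guaranteed to match.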

図６のＳ２３２では、ＣＰＵ１１０は、入力画像データのうちの注目領域に対応する部分画像データを変換済データに置換することによって、中間画像を示す中間画像データを生成する。図４（Ｂ）には、目の領域Ｐｅｒ、Ｐｅｌに対応する部分画像データが置換された後の中間画像Ｉｍａが示されている。中間画像Ｉｍａの顔ＦＣａでは、図４（Ａ）の入力画像Ｉｉｎの目の部分画像ＰＩｅｒ、ＰＩｅｌが、変換済データによって示される変換済部分画像ＴＩｅｒ、ＴＩｅｌに置換されている。中間画像Ｉｍａには、変換済部分画像ＴＩｅｒ、ＴＩｅｌと他の部分との境界に位置するスジＢＬが現れている。変換済部分画像ＴＩｅｒ、ＴＩｅｌと他の部分との境界では、画素の値が滑らかに変化しておらず、画素の値の差が大きくなっているためである。 In S232 of FIG. 6, the CPU 110 generates intermediate image data representing an intermediate image by replacing partial image data corresponding to the region of interest in the input image data with converted data. FIG. 4B shows an intermediate image Ima after the partial image data corresponding to the eye regions Per and Pel have been replaced. In the face FCa of the intermediate image Ima, the eye partial images PIer and PIel of the input image Iin in FIG. 4A are replaced with converted partial images TIer and TIel indicated by the converted data. In the intermediate image Ima, a streak BL located at the boundary between the converted partial images TIer, TIel and other parts appears. This is because the pixel values do not change smoothly at the boundaries between the converted partial images TIer and TIel and other parts, and the difference in pixel values becomes large.
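
The replacement of S232, and the reason a streak appears, can be illustrated with a small sketch. The single-channel pixel values and the helper names `paste_region` and `boundary_step` are hypothetical; the point is only that pasting converted data without blending leaves an abrupt jump in pixel values at the region boundary.

```python
import copy

def paste_region(image, converted, x, y):
    """Return a copy of image with the converted data written at (x, y)."""
    result = copy.deepcopy(image)
    for dy, row in enumerate(converted):
        result[y + dy][x:x + len(row)] = row
    return result

def boundary_step(image, x, y, height):
    """Largest pixel-value jump across the left edge of a pasted region."""
    return max(abs(image[r][x] - image[r][x - 1]) for r in range(y, y + height))

input_image = [[100] * 6 for _ in range(4)]   # uniform toy image
converted = [[160, 160], [160, 160]]          # toy converted data for one region
intermediate = paste_region(input_image, converted, x=2, y=1)
step = boundary_step(intermediate, x=2, y=1, height=2)
```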

図６のＳ２３５では、ＣＰＵ１１０は、中間画像データを端末装置２００に送信する。 In S235 of FIG. 6, CPU 110 transmits intermediate image data to terminal device 200.

図６のＳ２４０では、ＣＰＵ１１０は、顔の全ての部位の領域について処理されたか否かを判断する。未処理の領域がある場合には（Ｓ２４０：ＮＯ）、Ｓ２２５に処理が戻される。全ての部位の領域について処理された場合には（Ｓ２４０：ＹＥＳ）、Ｓ２４５に処理が進められる。 In S240 of FIG. 6, the CPU 110 determines whether all regions of the face have been processed. If there is an unprocessed area (S240: NO), the process returns to S225. If the regions of all parts have been processed (S240: YES), the process proceeds to S245.

図３のＳ１４５では、端末装置２００のＣＰＵ２１０は、サーバ１００から送信される中間画像データを受信する。Ｓ１４７では、ＣＰＵ２１０は、中間画像データを用いて、表示装置２５０に表示されている選択画面を更新する。例えば、図５（Ｃ）の選択画面ＵＤｃでは、入力画像Ｉｉｎに代えて、中間画像データによって示される中間画像Ｉｍａ（図４（Ｂ））が表示されている。ユーザは、表示装置２５０に表示される中間画像Ｉｍａを見て、注目領域のスタイル変換の結果を確認することができる。フローチャートでは、省略するが、ユーザは、注目領域のスタイル変換の結果に満足できない場合には、ボタンＢＴ１を押下することで、処理済みの注目領域について、再度、図３のＳ１３５～Ｓ１４７、および、図６のＳ２２５～Ｓ２３５を繰り返させることができる。 In S145 of FIG. 3, the CPU 210 of the terminal device 200 receives intermediate image data transmitted from the server 100. In S147, CPU 210 updates the selection screen displayed on display device 250 using the intermediate image data. For example, on the selection screen UDc in FIG. 5(C), an intermediate image Ima (FIG. 4(B)) indicated by intermediate image data is displayed instead of the input image Iin. The user can view the intermediate image Ima displayed on the display device 250 and check the result of style conversion of the region of interest. Although omitted in the flowchart, if the user is not satisfied with the result of the style conversion of the attention area, by pressing button BT1, the user can perform steps S135 to S147 of FIG. 3 again for the already processed attention area, and S225 to S235 in FIG. 6 can be repeated.

図３のＳ１５０では、ＣＰＵ２１０は、顔の全ての部位の領域について処理されたか否かを判断する。未処理の領域がある場合には（Ｓ１５０：ＮＯ）、Ｓ１３０に処理が戻される。全ての部位の領域について処理された場合には（Ｓ１５０：ＹＥＳ）、Ｓ１５５に処理が進められる。 In S150 of FIG. 3, the CPU 210 determines whether all regions of the face have been processed. If there is an unprocessed area (S150: NO), the process returns to S130. If the regions of all parts have been processed (S150: YES), the process proceeds to S155.

Ｓ１５５に処理が進められた時点で、サーバ１００において、図４（Ｃ）の中間画像Ｉｍｂを示す中間画像データが生成され、端末装置２００に送信されている。中間画像Ｉｍｂの顔ＦＣｂでは、図４（Ａ）の入力画像Ｉｉｎの各部位の部分画像ＰＩｅｒ、ＰＩｅｌ、ＰＩｎ、ＰＩｍが、変換済データによって示される変換済部分画像ＴＩｅｒ、ＴＩｅｌ、ＴＩｎ、ＴＩｍに置換されている。中間画像Ｉｍｂには、変換済部分画像ＴＩｅｒ、ＴＩｅｌ、ＴＩｎ、ＴＩｍと他の部分との境界に上述したスジＢＬが現れている。 By the time the process proceeds to S155, intermediate image data representing the intermediate image Imb of FIG. 4(C) has been generated in the server 100 and transmitted to the terminal device 200. In the face FCb of the intermediate image Imb, the partial images PIer, PIel, PIn, and PIm of the respective parts of the input image Iin in FIG. 4(A) have been replaced with the converted partial images TIer, TIel, TIn, and TIm indicated by the converted data. In the intermediate image Imb, the streaks BL described above appear at the boundaries between the converted partial images TIer, TIel, TIn, and TIm and the other portions.

図３のＳ１５５では、端末装置２００のＣＰＵ２１０は、図５（Ｄ）の肌色の選択画面ＵＤｄを表示装置２５０に表示する。図５（Ｄ）の選択画面ＵＤｄは、中間画像Ｉｍｂ（図）と、肌色の選択指示を入力するための選択ウインドウＳＷｄと、ボタンＢＴ１、ＢＴ２と、を含んでいる。選択ウインドウＳＷｄは、選択肢として、Ｓ１２７にて受信された複数個の肌色データによって示される肌色を有する矩形画像ＣＰ１、ＣＰ２を含んでいる。 In S155 of FIG. 3, the CPU 210 of the terminal device 200 displays the skin color selection screen UDd of FIG. 5(D) on the display device 250. The selection screen UDd in FIG. 5(D) includes an intermediate image Imb (figure), a selection window SWd for inputting a skin color selection instruction, and buttons BT1 and BT2. The selection window SWd includes, as options, rectangular images CP1 and CP2 having the skin color indicated by the plurality of skin color data received in S127.

図３のＳ１６０では、ＣＰＵ２１０は、ユーザによって選択された肌色を示す情報をサーバ１００に送信する。例えば、ユーザは、図５（Ｄ）の選択ウインドウＳＷｄに表示された複数個の矩形画像ＣＰ１、ＣＰ２の中から、１個の画像を選択して、ボタンＢＴ２を押下する。ＣＰＵ２１０は、ボタンＢＴ２が押下された時点で、選択ウインドウＳＷｄにて選択されている矩形画像が有する肌色を示す情報（例えば、色番号などのＩＤ）をサーバ１００に送信する。 In S160 of FIG. 3, CPU 210 transmits information indicating the skin color selected by the user to server 100. For example, the user selects one image from among the plurality of rectangular images CP1 and CP2 displayed in the selection window SWd of FIG. 5(D) and presses the button BT2. When the button BT2 is pressed, the CPU 210 transmits information (for example, an ID such as a color number) indicating the skin color of the rectangular image selected in the selection window SWd to the server 100.

図６のＳ２４５では、サーバ１００のＣＰＵ１１０は、選択された肌色を示す情報を端末装置２００から受信する。図６のＳ２５０では、ＣＰＵ１１０は、Ｓ２０５にて取得済みの入力画像データに対して肌色補正を実行して、補正済みの入力画像データを生成する。肌色補正処理は、公知の補正処理が用いられる。例えば、ＣＰＵ１１０は、入力画像データに対して公知の顔認識アルゴリズムを用いた認識処理を実行し、入力画像Ｉｉｎ内の人物の顔ＦＣの領域を特定する。顔認識アルゴリズムには、例えば、人物の顔の領域を認識できるようにトレーニングされた上述したｙｏｌｏの畳込ニューラルネットワークが用いられる。ＣＰＵ１１０は、人物の顔ＦＣの領域の複数個の画素のうち、肌色を示す所定の範囲内のＲＧＢ値を有する肌色画素を特定し、特定された複数個の肌色画素の平均のＲＧＢ値を算出する。ＣＰＵ１１０は、肌色画素の平均のＲＧＢ値と、ユーザによって選択された肌色を示すＲＧＢ値と、の差分に基づいて、ＲＧＢの各成分の補正量を決定する。ＣＰＵ１１０は、該補正量に応じてＲＧＢの各成分のトーンカーブを決定し、該トーンカーブを用いて、特定済みの複数個の肌色画素のＲＧＢ値を補正する。図４（Ｄ）には、補正済みの入力画像データによって示される補正済画像Ｉｃが示されている。補正済画像Ｉｃの人物の顔ＦＣｃは、ユーザによって選択された肌色を有している。 In S245 of FIG. 6, the CPU 110 of the server 100 receives information indicating the selected skin color from the terminal device 200. In S250 of FIG. 6, the CPU 110 performs skin color correction on the input image data acquired in S205 to generate corrected input image data. A known correction process is used for the skin color correction. For example, the CPU 110 executes recognition processing using a known face recognition algorithm on the input image data and specifies the region of the person's face FC in the input image Iin. The face recognition algorithm uses, for example, the YOLO convolutional neural network described above, trained to recognize the region of a person's face. From among the plurality of pixels in the region of the face FC, the CPU 110 identifies skin color pixels, that is, pixels whose RGB values fall within a predetermined range indicating skin color, and calculates the average RGB value of the identified skin color pixels. The CPU 110 determines a correction amount for each RGB component based on the difference between the average RGB value of the skin color pixels and the RGB value indicating the skin color selected by the user. The CPU 110 determines a tone curve for each RGB component according to the correction amount and uses the tone curves to correct the RGB values of the identified skin color pixels. FIG. 4(D) shows a corrected image Ic represented by the corrected input image data. The person's face FCc in the corrected image Ic has the skin color selected by the user.
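
The skin color correction of S250 might be sketched as below. The skin color RGB range and the piecewise-linear shape of the tone curve are assumptions; the specification states only that a predetermined range and a tone curve determined from the correction amount are used.

```python
def is_skin(pixel, lower=(95, 40, 20), upper=(255, 220, 200)):
    """True when every RGB component lies in an assumed skin-colour range."""
    return all(lo <= v <= hi for v, lo, hi in zip(pixel, lower, upper))

def tone_curve(value, anchor, shift):
    """Piecewise-linear curve through (0, 0), (anchor, anchor + shift), (255, 255)."""
    mapped_anchor = min(255.0, max(0.0, anchor + shift))
    if value <= anchor:
        out = value * mapped_anchor / anchor if anchor > 0 else mapped_anchor
    else:
        out = mapped_anchor + (value - anchor) * (255.0 - mapped_anchor) / (255.0 - anchor)
    return int(round(out))

def correct_skin(face_pixels, selected_rgb):
    """Shift skin-colour pixels toward the selected colour; leave other pixels as-is."""
    skin = [p for p in face_pixels if is_skin(p)]
    if not skin:
        return list(face_pixels)
    average = [sum(p[ch] for p in skin) / len(skin) for ch in range(3)]
    shift = [selected_rgb[ch] - average[ch] for ch in range(3)]  # correction amounts
    return [
        tuple(tone_curve(p[ch], average[ch], shift[ch]) for ch in range(3))
        if is_skin(p) else p
        for p in face_pixels
    ]

# Two skin-coloured pixels and one dark pixel (e.g. an eyebrow), all hypothetical.
face = [(200, 150, 120), (220, 170, 140), (30, 30, 30)]
corrected = correct_skin(face, selected_rgb=(230, 180, 150))
```

Anchoring the curve at the average skin color maps that average exactly onto the selected color while keeping the end points 0 and 255 fixed.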

Ｓ２５５では、中間画像データに対して、顔の全体のスタイル変換処理を実行して、出力画像データを生成する。例えば、ＣＰＵ１１０は、図４（Ｃ）の中間画像Ｉｍｂを示す中間画像データと、図４（Ｄ）の補正済画像Ｉｃを示す補正済みの入力画像データと、に対して、それぞれ、縮小処理または拡大処理を実行する。これによって、中間画像Ｉｍｂと補正済画像Ｉｃとのサイズは、所定のサイズ（本実施例では、縦２５６画素×横２５６画素）に調整される。ＣＰＵ１１０は、サイズが調整された後の中間画像データをコンテンツ画像データＣＤとし、サイズが調整された後の補正済みの入力画像データをスタイル画像データＳＤとして、顔用の生成ネットワークＧＮ４に入力することによって、顔全体の変換済画像データＴＤを生成する。ＣＰＵ１１０は、生成された顔全体の変換済画像データＴＤに対して拡大処理または縮小処理を実行して、変換済画像データＴＤによって示される画像のサイズを元の入力画像Ｉｉｎと同じサイズに調整する。サイズが調整された後の変換済画像データＴＤが、最終的な出力画像Ｉｏｕｔを示す出力画像データである。顔用の生成ネットワークＧＮ４において、強度を示すパラメータαは、上述した顔の各部位に対するスタイル変換処理（図６のＳ２３０）におけるパラメータαよりも小さな値に設定される。これは、各部位に対するスタイル変換処理によって中間画像Ｉｍｂに現れている顔の各部位の特徴が、顔の全体のスタイル変換処理によって失われることを抑制するためである。パラメータαの値が比較的小さい場合であっても、顔の肌色のような全体的な特徴は、出力画像Ｉｏｕｔに反映される。 In S255, the CPU 110 executes style conversion processing for the entire face on the intermediate image data to generate output image data. For example, the CPU 110 executes reduction processing or enlargement processing on each of the intermediate image data representing the intermediate image Imb of FIG. 4(C) and the corrected input image data representing the corrected image Ic of FIG. 4(D). As a result, the intermediate image Imb and the corrected image Ic are adjusted to a predetermined size (in this embodiment, 256 pixels vertically by 256 pixels horizontally). The CPU 110 inputs the size-adjusted intermediate image data as content image data CD and the size-adjusted corrected input image data as style image data SD into the face generation network GN4, thereby generating converted image data TD of the entire face. The CPU 110 executes enlargement processing or reduction processing on the generated converted image data TD of the entire face to adjust the size of the image indicated by the converted image data TD to the size of the original input image Iin. The converted image data TD after the size adjustment is the output image data indicating the final output image Iout. In the face generation network GN4, the parameter α indicating the intensity is set to a smaller value than the parameter α used in the style conversion processing for each part of the face described above (S230 in FIG. 6). This is to prevent the features of each facial part that appear in the intermediate image Imb as a result of the per-part style conversion from being lost in the style conversion of the entire face. Even when the value of the parameter α is relatively small, overall features such as the skin color of the face are reflected in the output image Iout.
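
The effect of a smaller α can be illustrated with a toy sketch of intensity-weighted feature blending. The blending formula used here, t = (1 − α)·f(c) + α·AdaIN(f(c), f(s)), is an assumption based on the commonly published adaptive instance normalization technique and is not quoted from this section; the feature vectors are likewise hypothetical.

```python
import math

def mean_std(values, eps=1e-5):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)

def adain(content_feat, style_feat):
    """Re-normalize content features to the mean and std of the style features."""
    c_mean, c_std = mean_std(content_feat)
    s_mean, s_std = mean_std(style_feat)
    return [s_std * (v - c_mean) / c_std + s_mean for v in content_feat]

def blend(content_feat, style_feat, alpha):
    """Assumed form: (1 - alpha) * f(c) + alpha * AdaIN(f(c), f(s))."""
    stylised = adain(content_feat, style_feat)
    return [(1.0 - alpha) * c + alpha * s for c, s in zip(content_feat, stylised)]

def distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

content = [0.0, 1.0, 2.0, 3.0]        # toy stand-in for f(c)
style = [10.0, 12.0, 14.0, 16.0]      # toy stand-in for f(s)
part_level = blend(content, style, alpha=1.0)   # strong conversion, as for a facial part
face_level = blend(content, style, alpha=0.3)   # weaker conversion for the whole face
```

With the smaller α, the blended features stay closer to the content features, which corresponds to preserving the per-part results in the intermediate image.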

図４（Ｅ）には、出力画像Ｉｏｕｔの一例が示されている。出力画像Ｉｏｕｔの人物の顔ＦＣｏは、中間画像Ｉｍｂの顔の部位の特徴を備えており、顔ＦＣｏの肌色は、補正済画像Ｉｃの顔ＦＣｃの肌色に近い色である。また、出力画像Ｉｏｕｔの人物の顔ＦＣｏでは、中間画像Ｉｍｂと比較して、スジＢＬは目立たない。すなわち、出力画像Ｉｏｕｔでは、スジＢＬを形成する境界における画素の値の差が小さくされている。スタイル画像として用いられる補正済画像Ｉｃの顔ＦＣｃはスジＢＬを含まないために、スタイル変換処理によって、補正済画像Ｉｃのスタイルが中間画像Ｉｍｂに適用されると、スジＢＬが軽減されるためである。 FIG. 4(E) shows an example of the output image Iout. The person's face FCo in the output image Iout has the features of the facial parts in the intermediate image Imb, and the skin color of the face FCo is close to the skin color of the face FCc in the corrected image Ic. In addition, the streaks BL are less noticeable in the face FCo of the output image Iout than in the intermediate image Imb. That is, in the output image Iout, the difference in pixel values at the boundaries that form the streaks BL is reduced. This is because the face FCc of the corrected image Ic, which is used as the style image, does not include the streaks BL, so the streaks BL are reduced when the style of the corrected image Ic is applied to the intermediate image Imb by the style conversion processing.

Ｓ２６０では、ＣＰＵ１１０は、生成された出力画像データを端末装置２００に送信して処理を終了する。 In S260, the CPU 110 transmits the generated output image data to the terminal device 200 and ends the process.

図３のＳ１６５では、端末装置２００のＣＰＵ２１０は、サーバ１００から送信される出力画像データを受信する。Ｓ１７０では、ＣＰＵ２１０は、出力画像データを出力する。出力画像データの出力の態様は、例えば、表示、保存、印刷を含む。例えば、ＣＰＵ２１０は、出力画像データによって示される出力画像Ｉｏｕｔを表示装置２５０に表示する。例えば、ＣＰＵ２１０は、ユーザの指示に基づいて、出力画像データを含むファイルを揮発性記憶装置１２０、不揮発性記憶装置１３０に保存する。例えば、ＣＰＵ２１０は、出力画像データを用いて、出力画像Ｉｏｕｔを示す印刷データを生成して、図示しないプリンタに送信する。 In S165 of FIG. 3, the CPU 210 of the terminal device 200 receives the output image data transmitted from the server 100. In S170, the CPU 210 outputs the output image data. Modes of outputting the output image data include, for example, display, storage, and printing. For example, the CPU 210 displays the output image Iout indicated by the output image data on the display device 250. For example, the CPU 210 stores a file containing the output image data in the volatile storage device 120 or the nonvolatile storage device 130 based on a user's instruction. For example, the CPU 210 uses the output image data to generate print data representing the output image Iout and transmits the print data to a printer (not shown).

以上説明した第１実施例では、サーバ１００のＣＰＵ１１０は、入力画像データを取得し（図６のＳ２０５）、入力画像データを用いて、入力画像Ｉｉｎの一部である第１入力部分画像（例えば、目の領域Ｐｅｒ、Ｐｅｌに対応する部分画像ＰＩｅｒ、ＰＩｅｌ）と、入力画像の一部であって第１入力部分画像とは異なる位置にある第２入力部分画像（例えば、鼻の領域Ｐｎに対応する部分画像ＰＩｎ）と、を特定する（図６のＳ２１０）。ＣＰＵ１１０は、第１入力部分画像を示す第１部分画像データ（例えば、目の部分画像ＰＩｅｒ、ＰＩｅｌを示す部分画像データ）に対して、機械学習モデル（例えば、目の生成ネットワークＧＮ１）を用いた第１スタイル変換処理を実行して、第１変換済部分画像（例えば、目の変換済部分画像ＴＩｅｒ、ＴＩｅｌ）を示す第１変換済データ（例えば、目の変換済部分画像ＴＩｅｒ、ＴＩｅｌを示す変換済データ）を生成する（図６のＳ２３０）。ＣＰＵ１１０は、第２入力部分画像を示す第２部分画像データ（例えば、鼻の部分画像ＰＩｎを示す部分画像データ）に対して、機械学習モデル（例えば、鼻の生成ネットワークＧＮ２）を用いた第２スタイル変換処理を実行して、第２変換済部分画像（例えば、鼻の変換済部分画像ＴＩｎ）を示す第２変換済データ（例えば、鼻の変換済部分画像ＴＩｎを示す変換済データ）を生成する（図６のＳ２３０）。ＣＰＵ１１０は、第１変換済データと第２変換済データとを用いて、入力画像Ｉｉｎに基づく出力画像Ｉｏｕｔを示す出力画像データを生成する（図６のＳ２３２、Ｓ２５０、Ｓ２５５）。図４（Ｅ）の出力画像Ｉｏｕｔは、第１入力部分画像に対応する第１出力部分画像（例えば、目の部分画像ＯＩｅｒ、ＯＩｅｌ）と、第２入力部分画像に対応する第２出力部分画像（鼻の部分画像ＯＩｎ）とを含む。第１出力部分画像（例えば、目の部分画像ＯＩｅｒ、ＯＩｅｌ）は、第１変換済部分画像（例えば、目の変換済部分画像ＴＩｅｒ、ＴＩｅｌ）に基づく画像である。第２出力部分画像（例えば、鼻の部分画像ＯＩｎ）は第２変換済部分画像（例えば、鼻の変換済部分画像ＴＩｎ）に基づく画像である。第１実施例によれば、このように、１個の入力画像データに対して第１スタイル変換処理と第２スタイル変換処理とを適用することで出力画像データが生成されるので、柔軟なスタイル変換を実現することができる。 In the first embodiment described above, the CPU 110 of the server 100 acquires input image data (S205 in FIG. 6) and uses the input image data to specify a first input partial image that is a part of the input image Iin (for example, the partial images PIer and PIel corresponding to the eye regions Per and Pel) and a second input partial image that is a part of the input image located at a different position from the first input partial image (for example, the partial image PIn corresponding to the nose region Pn) (S210 in FIG. 6). The CPU 110 executes first style conversion processing using a machine learning model (for example, the eye generation network GN1) on first partial image data indicating the first input partial image (for example, partial image data indicating the eye partial images PIer and PIel) to generate first converted data indicating a first converted partial image (for example, converted data indicating the converted eye partial images TIer and TIel) (S230 in FIG. 6). The CPU 110 executes second style conversion processing using a machine learning model (for example, the nose generation network GN2) on second partial image data indicating the second input partial image (for example, partial image data indicating the nose partial image PIn) to generate second converted data indicating a second converted partial image (for example, converted data indicating the converted nose partial image TIn) (S230 in FIG. 6). The CPU 110 uses the first converted data and the second converted data to generate output image data indicating an output image Iout based on the input image Iin (S232, S250, and S255 in FIG. 6). The output image Iout in FIG. 4(E) includes a first output partial image corresponding to the first input partial image (for example, the eye partial images OIer and OIel) and a second output partial image corresponding to the second input partial image (the nose partial image OIn). The first output partial image (for example, the eye partial images OIer and OIel) is an image based on the first converted partial image (for example, the converted eye partial images TIer and TIel). The second output partial image (for example, the nose partial image OIn) is an image based on the second converted partial image (for example, the converted nose partial image TIn). According to the first embodiment, the output image data is thus generated by applying the first style conversion processing and the second style conversion processing to a single piece of input image data, so that flexible style conversion can be realized.

Furthermore, in the above embodiment, the first style conversion process (for example, the style conversion process for the eye regions Per, Pel) is executed using style image data SD indicating a first style image (for example, the eye style image SIe1), and the second style conversion process (for example, the style conversion process for the nose region Pn) is executed using style image data SD indicating a second style image (for example, the nose style image SIn1) (FIG. 2(B), etc.). The first converted partial image (for example, the converted eye partial images TIer, TIel) is an image in which the style of the first style image (for example, the eye style image SIe1) is applied to the first input partial image (for example, the eye partial images PIer, PIel), and the second converted partial image (for example, the converted nose partial image TIn) is an image in which the style of the second style image (for example, the nose style image SIn1) is applied to the second input partial image (for example, the nose partial image PIn). As a result, output image data indicating an output image to which both the style of the first style image and the style of the second style image are applied can be generated, so more flexible style conversion can be realized.

Furthermore, the CPU 110 uses the first converted data and the second converted data to generate intermediate image data indicating an intermediate image (for example, the intermediate image Imb) that includes the first converted partial image (for example, the converted eye partial images TIer, TIel) and the second converted partial image (for example, the converted nose partial image TIn) (S232 in FIG. 6, FIG. 4(C)). The CPU 110 then executes specific post-processing (S255 in FIG. 6) on the intermediate image data to generate the output image data. Executing the specific post-processing makes it possible to generate appropriate output image data.

Specifically, a style conversion process for the entire face (S255 in FIG. 6) is performed as the specific post-processing of this embodiment. As described above, this process reduces the difference in pixel values in the intermediate image Ima between a converted partial image (for example, the converted eye and nose partial images TIer, TIel, TIn) and the portion adjacent to that converted partial image. As a result, for example, the streaks BL that appear in the intermediate image Ima are not noticeable in the output image Iout. In this way, the output image data can be generated so that the output image Iout has a natural appearance.

Furthermore, the style conversion process for the entire face in this embodiment (S255 in FIG. 6) is a third style conversion process using a machine learning model (for example, the face generation network GN4). Executing this third style conversion process on the entire image, in addition to the style conversion processes on the partial images, realizes even more flexible style conversion.

さらに、本実施例の第３スタイル変換処理（図６のＳ２５５の顔の全体のスタイル変換処理）は、入力画像データをスタイル画像データＳＤとして用いて実行される。この結果、例えば、上述したスジＢＬが目立たない自然な見栄えを有する出力画像を示す出力画像データを容易に生成することができる。 Further, the third style conversion process (the style conversion process for the whole face in S255 in FIG. 6) of this embodiment is executed using the input image data as the style image data SD. As a result, for example, it is possible to easily generate output image data showing an output image having a natural appearance in which the above-described streaks BL are not noticeable.

Furthermore, the specific post-processing of this embodiment includes a process of correcting the skin color of the person's face FC in the input image data to generate corrected input image data (S250 in FIG. 6). The third style conversion process (the style conversion process for the entire face in S255 of FIG. 6) is then executed using the corrected input image data as the style image data SD. As a result, the skin color of the person's face in the corrected input image (the corrected image Ic in FIG. 4(D)) is applied to the output image Iout as a style. Output image data indicating an output image Iout having any desired skin color can therefore be generated easily.

Furthermore, in this embodiment, as described above, the input image Iin includes an image showing a person's face FC (FIG. 4(A)). The first input partial image (for example, the partial images PIer, PIel) is an image showing a first part of the person's face FC (for example, the eyes), and the second input partial image (for example, the partial image PIn) is an image showing a second part of the person's face FC (for example, the nose). As a result, flexible style conversion can be realized for the first part and the second part of the person's face. For example, if an image of person A's eyes is selected as the eye style image and an image of person B's nose is selected as the nose style image, the style can be converted so that the eyes of the face FC in the input image Iin approach the eyes of person A and the nose of the face FC approaches the nose of person B.

さらに、本実施例では、端末装置２００から情報を受信することで入力画像Ｉｉｎの種類（例えば、人物の性別や人種）が特定される（図６のＳ２１５）。そして、入力画像Ｉｉｎの種類に応じて、Ｓ２３０のスタイル変換処理に用いるべきスタイル画像データＳＤの候補が変更される（図６のＳ２２０）。すなわち、Ｓ２３０では、入力画像Ｉｉｎの種類に応じて異なるスタイル変換処理が実行される。換言すれば、入力画像Ｉｉｎが第１種の入力画像（例えば、女性の顔の入力画像）である場合に、顔の各部位の部分画像データに対して第１種のスタイル変換処理が実行され、入力画像Ｉｉｎが第２種の入力画像（例えば、男性の顔の入力画像）である場合に、顔の各部位の部分画像データに対して第２種のスタイル変換処理が実行される。この結果、入力画像Ｉｉｎの種類に応じた柔軟なスタイル変換を実現できる。例えば、入力画像Ｉｉｎの人物の性別や人種などによって、ユーザに好まれるスタイル変換は異なり得ると考えられるので、本実施例によれば、ユーザのニーズに合致したスタイル変換を実現できる。 Further, in this embodiment, the type of input image Iin (for example, the gender and race of the person) is specified by receiving information from the terminal device 200 (S215 in FIG. 6). Then, candidates for style image data SD to be used in the style conversion process of S230 are changed depending on the type of input image Iin (S220 in FIG. 6). That is, in S230, different style conversion processes are performed depending on the type of input image Iin. In other words, when the input image Iin is a first type input image (for example, an input image of a woman's face), the first type style conversion process is performed on partial image data of each part of the face. , when the input image Iin is a second type input image (for example, an input image of a man's face), the second type style conversion process is performed on partial image data of each part of the face. As a result, flexible style conversion can be realized depending on the type of input image Iin. For example, it is considered that the style conversion preferred by the user may differ depending on the gender, race, etc. of the person in the input image Iin, so according to this embodiment, style conversion that meets the user's needs can be realized.

Furthermore, according to this embodiment, the user can operate the slide bars SBb, SBc on the selection screens UDb, UDc to set, for each part of the face, a parameter α indicating the strength of the style conversion (FIGS. 5(B) and 5(C), S140 in FIG. 3, S225 in FIG. 6). In other words, the first style conversion process (for example, the eye style conversion process) is executed using a first parameter α1, and the second style conversion process (for example, the nose style conversion process) is executed using a second parameter α2 that is adjusted independently of the first parameter α1. As a result, even more flexible style conversion can be realized. For example, it is easy to generate output image data indicating an output image Iout whose eyes differ greatly from the input image Iin while its nose differs only slightly from the input image Iin. Consequently, flexible and varied style conversion can be realized even when the number of prepared pieces of style image data SD is relatively small.
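How the parameter α acts inside the generation networks is not spelled out in this passage. As a rough illustration only, the following Python sketch treats α as a linear blending weight between the input partial image and a fully converted partial image. This pixel-space blend is an assumption introduced here for clarity and is not presented as the method of the embodiment, which may apply α inside the network instead. The function name `blend` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values (illustrative format)


def blend(original: Image, converted: Image, alpha: float) -> Image:
    """Mix a converted partial image into the original with strength alpha.

    alpha = 0.0 keeps the original unchanged; alpha = 1.0 uses the
    converted image as is. This is an illustrative assumption only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return [
        [round((1.0 - alpha) * o + alpha * c) for o, c in zip(row_o, row_c)]
        for row_o, row_c in zip(original, converted)
    ]
```

Because each part gets its own call, an eye region could be blended with a large α1 while a nose region uses a small α2, which mirrors the independent adjustment described above.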

Furthermore, according to this embodiment, the CPU 110 acquires eye style image data SD based on the user's instruction to select an eye style image (FIG. 5(B)) and acquires nose style image data SD based on the user's instruction to select a nose style image (FIG. 5(C)) (S227 in FIG. 6). The eye and nose style conversion processes are executed using the acquired eye and nose style image data SD (S230 in FIG. 6). As a result, flexible style conversion that follows the user's style image selection instructions can be realized. For example, by entering selection instructions, the user can have the server 100 generate output image data indicating an output image Iout in which similar styles are applied to the eyes and the nose, or output image data indicating an output image Iout in which greatly different styles are applied to the eyes and the nose.

以上の説明から解るように、目のスタイル画像の選択指示は、第１の入力の例であり、鼻のスタイル画像の選択指示は、第２の入力の例である。また、目のスタイル画像の選択指示に基づいて取得される目のスタイル画像データＳＤは、第１入力情報の例であり、鼻のスタイル画像の選択指示に基づいて取得される鼻のスタイル画像データＳＤは、第２入力情報の例である。 As can be seen from the above description, the instruction to select the eye style image is an example of the first input, and the instruction to select the nose style image is an example of the second input. Further, the eye style image data SD obtained based on the selection instruction of the eye style image is an example of the first input information, and the nose style image data obtained based on the selection instruction of the nose style image. SD is an example of second input information.

B. Second Embodiment
B-1. Configuration of System 1000
The basic configuration of the system 1000 of the second embodiment is the configuration shown in FIG. 1, as in the first embodiment. The following description therefore refers to FIG. 1 and covers the points that differ from the first embodiment.

第２実施例のシステム１０００は、第１実施例の構成に加えて、端末装置２００と通信可能に接続されるミシン３００を備える。ミシン３００は、刺繍データに基づいて、複数色の糸を布に縫い付けることによって布に刺繍模様を縫製する。 In addition to the configuration of the first embodiment, a system 1000 according to the second embodiment includes a sewing machine 300 that is communicably connected to a terminal device 200. The sewing machine 300 sews an embroidery pattern onto the cloth by sewing threads of multiple colors onto the cloth based on the embroidery data.

第２実施例の端末装置２００は、パーソナルコンピュータなどの据え置き型の端末装置である。第２実施例の端末装置２００の揮発性記憶装置２３０に格納されるコンピュータプログラムＰＧｔは、ミシン３００を制御するドライバプログラムである。コンピュータプログラムＰＧｔは、ミシン３００の製造者によって提供され、インターネットＩＴを介して端末装置２００に接続されたサーバからダウンロードされる形態で提供される。これに代えて、コンピュータプログラムＰＧｔは、ＣＤ－ＲＯＭやＤＶＤ－ＲＯＭなどに格納された形態で提供されても良い。ＣＰＵ２１０は、コンピュータプログラムＰＧｔを実行することによって、サーバ１００と協働して、後述する刺繍データを生成してミシン３００に供給する処理を実行する。 The terminal device 200 of the second embodiment is a stationary terminal device such as a personal computer. The computer program PGt stored in the volatile storage device 230 of the terminal device 200 of the second embodiment is a driver program that controls the sewing machine 300. The computer program PGt is provided by the manufacturer of the sewing machine 300, and is provided in the form of being downloaded from a server connected to the terminal device 200 via the Internet IT. Alternatively, the computer program PGt may be provided in a form stored on a CD-ROM, DVD-ROM, or the like. By executing the computer program PGt, the CPU 210 cooperates with the server 100 to perform a process of generating embroidery data, which will be described later, and supplying it to the sewing machine 300.

第２実施例のサーバ１００の不揮発性記憶装置１３０に格納されるコンピュータプログラムＰＧｓは、ミシン３００の製造者によって提供され、サーバ１００にアップロードされる。ＣＰＵ１１０は、コンピュータプログラムＰＧｓを実行することによって、端末装置２００と協働して、後述する刺繍データを生成してミシン３００に供給する処理を実行する。 The computer program PGs stored in the nonvolatile storage device 130 of the server 100 of the second embodiment is provided by the manufacturer of the sewing machine 300 and uploaded to the server 100. By executing the computer program PGs, the CPU 110 cooperates with the terminal device 200 to perform a process of generating embroidery data, which will be described later, and supplying it to the sewing machine 300.

B-2. Configuration of Generation Network Group
In the second embodiment, as in the first embodiment, the input image Iin is an image showing a photograph that includes the entire face FC of a person. When embroidery data is generated from image data such as a photograph, it is usual to preprocess the image data and generate the embroidery data from the preprocessed image data. This is because the number of thread colors used to sew an embroidery pattern (for example, several dozen colors) is smaller than the number of colors represented in a photograph (for example, about 10 million colors), and because clear outlines are preferable. Such preprocessing is generally performed by an experienced operator using an image processing program (also called photo retouching software). In the second embodiment, a style conversion process is used to generate, from the input image data, output image data indicating a preprocessed output image Iout.

第２実施例の生成ネットワーク群ＧＮＧは、第１実施例と同様に、生成ネットワークＧＮ１～ＧＮ４を含んでいる。第２実施例では、顔の各部位のスタイル変換は、出力画像Ｉｏｕｔが刺繍データの生成に適した画像になるように実行される。このために、生成ネットワークＧＮ１～ＧＮ４のトレーニングおよび後述する刺繍データの生成の際に用いられるスタイル画像データＳＤによって示されるスタイル画像は、刺繍データの生成に適した前処理済みの画像である。前処理の手法、例えば、輪郭線を明確にする手法、陰影の付け方、色の調整の手法には、多数の手法があり、例えば、作業者によって異なる。このために、様々な手法で前処理が行われた複数個の画像がスタイル画像として用いられる。 The generation network group GNG of the second embodiment includes generation networks GN1 to GN4, as in the first embodiment. In the second embodiment, style conversion of each part of the face is performed so that the output image Iout becomes an image suitable for generating embroidery data. For this reason, the style image represented by the style image data SD used in training the generation networks GN1 to GN4 and in generating embroidery data, which will be described later, is a preprocessed image suitable for generating embroidery data. There are many preprocessing methods, such as methods for clarifying outlines, adding shading, and adjusting colors, and these methods vary depending on the operator. For this purpose, a plurality of images that have been preprocessed using various methods are used as style images.

For example, for the eye generation network GN1, a large number of images obtained by preprocessing photographs of various eyes with various methods are used as the style image data SD for training. When embroidery data is generated, the style image data SD selectable via the selection screen UDb in FIG. 5(B) are a plurality of pieces of style image data SD indicating a plurality of style images obtained by preprocessing a representative eye photograph with a plurality of methods.

第２実施例の生成ネットワーク群ＧＮＧは、さらに、表情用の生成ネットワークＧＮ５と、歯列用の生成ネットワークＧＮ６と、を含んでいる。 The generation network group GNG of the second embodiment further includes a generation network GN5 for facial expressions and a generation network GN6 for dentition.

The facial expression generation network GN5 is a machine learning model; it is the generation network of a generative adversarial network (GAN) called StarGAN. The facial expression generation network GN5 executes a style conversion process that changes the facial expression. Specifically, when image data indicating a person's face and label data indicating a type of facial expression are input to the facial expression generation network GN5, the network outputs converted image data. The converted image indicated by the converted image data shows the face of the person indicated by the input image data, having the expression indicated by the label data. In this embodiment, the facial expression generation network GN5 is trained so that it can convert a face to expressions such as expressionless, a smile that does not show teeth (smile), a smile that shows teeth (grin), and a serious face (serious). StarGAN is disclosed in the paper: Yunjey Choi et al., "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation", arXiv preprint arXiv:1711.09020, 2017.
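The label data mentioned above can be pictured as a one-hot vector over the supported expressions, which is the usual way a target domain is passed to a StarGAN-style generator. The Python sketch below only builds such a vector. The expression names, their order, and the function name `expression_label` are assumptions chosen for illustration, and the encoding actually used by the generation network GN5 is not given in this description.

```python
from typing import List

# Hypothetical label order; the embodiment does not specify one.
EXPRESSIONS = ["neutral", "smile", "grin", "serious"]


def expression_label(name: str) -> List[float]:
    """Return a one-hot label vector for the requested expression."""
    if name not in EXPRESSIONS:
        raise ValueError(f"unknown expression: {name}")
    return [1.0 if candidate == name else 0.0 for candidate in EXPRESSIONS]
```

A caller would pass the face image together with, for example, `expression_label("grin")` to the generator.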

歯列用の生成ネットワークＧＮ６は、上述した生成ネットワークＧＮ１～ＧＮ４と同様の機械学習モデルである。歯列用の生成ネットワークＧＮ６は、歯が露出した表情を有する人物の顔を示す画像データがコンテンツ画像データＣＤとして入力され、歯が露出した表情を有し、歯列が矯正されている人物の顔を示す画像データがスタイル画像データＳＤとして入力される。生成ネットワークＧＮ６が出力する変換済画像データＴＤによって示される画像は、コンテンツ画像データＣＤによって示される人物の顔であって、歯列が矯正されている人物の顔である。 The generation network GN6 for the dentition is a machine learning model similar to the generation networks GN1 to GN4 described above. The generation network GN6 for dentition receives image data showing the face of a person with an expression with exposed teeth as content image data CD, and generates a generation network GN6 of a person with an expression with exposed teeth and whose dentition has been corrected. Image data showing a face is input as style image data SD. The image represented by the converted image data TD output by the generation network GN6 is the face of the person represented by the content image data CD, and is the face of a person whose teeth have been corrected.

B-3. System Operation
FIG. 7 is a flowchart of the process executed by the terminal device 200. This process uses the preprocessing service based on style conversion that the server 100 provides: it obtains output image data produced by preprocessing the input image data, and generates embroidery data from that output image data. The process is started based on a start instruction from the user, for example, while the computer program PGt of the terminal device 200 is running.

図７のＳ３０５では、端末装置２００のＣＰＵ２１０は、図４（Ａ）の人物の顔ＦＣを含む入力画像Ｉｉｎを示す入力画像データを取得する。なお、第１実施例と第２実施例とでは、用いられることが想定される画像（例えば、入力画像、スタイル画像、出力画像）は同じではないが、同様の人物の顔、部位を示す画像であるので、説明の便宜上、同じ図、同じ符号を用いて説明する。ＣＰＵ２１０は、例えば、不揮発性記憶装置１３０に格納された複数個の画像データの中から、ユーザによって指定された画像データを入力画像データとして取得する。 In S305 of FIG. 7, the CPU 210 of the terminal device 200 obtains input image data indicating the input image Iin including the face FC of the person shown in FIG. 4(A). Note that in the first and second embodiments, although the images that are expected to be used (for example, input images, style images, and output images) are not the same, images showing similar human faces and body parts are used. Therefore, for convenience of explanation, the same drawings and the same reference numerals will be used for explanation. For example, the CPU 210 obtains image data specified by the user from among a plurality of pieces of image data stored in the nonvolatile storage device 130 as input image data.

In S310, the CPU 210 displays a selection screen UD including the input image Iin on the display device 250. FIG. 8 shows the selection screen UD of the second embodiment. The selection screen UD in FIG. 8 includes the input image Iin, pull-down menus PM1 to PM3, selection windows SWa to SWd, slide bars SBa to SBc, check boxes CBa and CBb, and buttons BT3 and BT4.

The pull-down menus PM1 and PM2 are menus for entering selection instructions regarding the type of the input image Iin (specifically, selection instructions for gender and race), and are the same as the pull-down menus PM1 and PM2 in FIG. 5(A) of the first embodiment. The pull-down menu PM3 is a menu for entering a selection instruction as to whether to change the facial expression using the facial expression generation network GN5 described above and, if so, which type of expression to change to.

The selection windows SWb and SWc are selection windows for entering selection instructions for the eye and nose style images, and are the same as the selection windows SWb and SWc in FIGS. 5(B) and 5(C) of the first embodiment. The selection window SWa displays, as options, a plurality of style images SIm1, SIm2 indicated by a plurality of pieces of mouth style image data SD. Note that the style images in each selection window are not displayed at this point; they are displayed in S335, described later.

スライドバーＳＢａ～ＳＢｃは、図５（Ｂ）、（Ｃ）のスライドバーＳＢｂ、ＳＢｃと同様に、口、目、鼻のスタイル変換の強度を入力するためのスライドバーである。 Slide bars SBa to SBc are slide bars for inputting the strength of style conversion of the mouth, eyes, and nose, similar to slide bars SBb and SBc in FIGS. 5(B) and 5(C).

チェックボックスＣＢａは、後述する白目処理を実行するか否かを指定するためのチェックボックスである。チェックボックスＣＢｂは、歯列用の生成ネットワークＧＮ６を用いた歯列の矯正を行うか否かを指定するためのチェックボックスである。 The check box CBa is a check box for specifying whether or not to perform white eye processing, which will be described later. The check box CBb is a check box for specifying whether or not to correct the dentition using the generation network GN6 for the dentition.

図７のＳ３１５では、図３のＳ１１５と同様に、ＣＰＵ２１０は、入力画像データをサーバ１００に送信する。 In S315 of FIG. 7, the CPU 210 transmits the input image data to the server 100, similar to S115 of FIG.

When the server 100 receives the input image data transmitted from the terminal device 200, the CPU 110 of the server 100 starts a process of providing the preprocessing service using the style conversion process. FIG. 9 is a flowchart of the process executed by the server 100 of the second embodiment. As shown in S405 of FIG. 9, the CPU 110 of the server 100 executes the processes of S205 to S220 of FIG. 6 while exchanging data with the terminal device 200, as in the first embodiment.

In S205 of FIG. 6, the CPU 110 of the server 100 receives the input image data transmitted from the terminal device 200. In S210, the CPU 110 executes a predetermined region specifying process on the input image data to specify a plurality of parts included in the face FC of the input image Iin, namely the regions Per, Pel, Pn, and Pm of the right eye, left eye, nose, and mouth. In S212, the CPU 110 transmits region information indicating the regions Per, Pel, Pn, and Pm of the plurality of parts to the terminal device 200.

In S320 of FIG. 7, as in S120 of FIG. 3, the CPU 210 of the terminal device 200 receives the region information transmitted from the server 100 and uses it to display the identification results for the regions Per, Pel, Pn, and Pm of the plurality of parts on the display device 250. In S325 of FIG. 7, as in S125 of FIG. 3, the CPU 210 transmits the gender and race information selected by the user to the server 100.

図６のＳ２１５では、サーバ１００のＣＰＵ１１０は、端末装置２００から送信される性別および人種の情報を受信する。Ｓ２２０では、ＣＰＵ１１０は、受信された情報によって示される性別および人種に応じたスタイル画像データＳＤと肌色データとを、端末装置２００に送信する。 In S215 of FIG. 6, the CPU 110 of the server 100 receives the gender and race information transmitted from the terminal device 200. In S220, CPU 110 transmits style image data SD and skin color data according to the gender and race indicated by the received information to terminal device 200.

図７のＳ３３０では、端末装置２００のＣＰＵ２１０は、サーバ１００から送信されるスタイル画像データＳＤと肌色データとを受信する。図７のＳ３３５では、受信されたスタイル画像データＳＤによって示される口、目、鼻のスタイル画像ＳＩｍ１、ＳＩｍ２、ＳＩｅ１、ＳＩｅ２、ＳＩｎ１、ＳＩｎ２を、対応する選択ウインドウＳＷａ、ＳＷｂ、ＳＷｃに表示する（図８）。 In S330 of FIG. 7, the CPU 210 of the terminal device 200 receives the style image data SD and skin color data transmitted from the server 100. In S335 of FIG. 7, the style images SIm1, SIm2, SIe1, SIe2, SIn1, and SIn2 of the mouth, eyes, and nose indicated by the received style image data SD are displayed in the corresponding selection windows SWa, SWb, and SWc ( Figure 8).

In S340 of FIG. 7, the CPU 210 transmits the information for the conversion process selected on the selection screen UD to the server 100. Through the selection windows SWa to SWd and the slide bars SBa to SBc in FIG. 8, the user enters selection instructions for the style image to be used for each part of the face, the strength of the style conversion for each part, and the skin color that the face in the output image should have. Through the check boxes CBa and CBb, the user enters selection instructions as to whether to execute the white eye processing and whether to execute the dentition correction. Through the pull-down menu PM3, the user enters a selection instruction as to whether to change the facial expression and, if so, which type of expression to change to. However, when a selection instruction to execute the white eye processing is entered, the eye style image selection window SWb is disabled. That is, only one of the instruction to execute the white eye processing and the instruction to select an eye style image is valid. This is because, as described later, the server 100 can execute only one of the white eye processing and the eye style conversion process. With the selection instructions entered, the user then presses the button BT3 to instruct execution of the preprocessing. The CPU 210 transmits to the server 100 the information corresponding to the selection instructions that are entered at the time the button BT3 is pressed.
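The rule that the white eye processing and the eye style image selection cannot both be valid can be enforced when the terminal assembles the information it sends in S340. The Python sketch below shows one way to do this. The dictionary keys (`white_eye`, `eye_style`) and the function name `build_request` are hypothetical and are not taken from the embodiment, which does not describe the message format.

```python
from typing import Any, Dict


def build_request(selections: Dict[str, Any]) -> Dict[str, Any]:
    """Build the conversion request sent when the execute button is pressed.

    If white eye processing is selected, the eye style image selection is
    dropped, mirroring the disabled selection window SWb.
    """
    request = dict(selections)  # copy so the UI state is not modified
    if request.get("white_eye", False):
        request["eye_style"] = None
    return request
```

Clearing the field on the sending side keeps the server from having to resolve conflicting instructions, although a server-side check would still be a sensible safeguard.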

図９のＳ４１０では、サーバ１００のＣＰＵ１１０は、変換処理のための情報を端末装置２００から受信する。 In S410 of FIG. 9, the CPU 110 of the server 100 receives information for conversion processing from the terminal device 200.

図９のＳ４１５では、ＣＰＵ１１０は、Ｓ４１０にて受信された情報に基づいて、白目処理を実行することが選択されたか否かを判断する。白目処理を実行することが選択された場合には（Ｓ４１５：ＹＥＳ）、Ｓ４２０にて、ＣＰＵ１１０は、入力画像データに対して、白目処理を実行する。白目処理は、目の領域Ｐｅｒ、Ｐｅｌにおいて、目を示す画像の白目の部分を、見栄えの良い特定の白色で塗りつぶす処理である。例えば、ＣＰＵ１１０は、白目の部分に対応する画素の値を、白を示す特定の値（例えば、（２５５、２５５、２５５）のＲＧＢ値）に変換する。例えば、白および白に近似する色を示す所定範囲の値有する画素が、白目の部分に対応する画素として特定される。これによって、例えば、入力画像Ｉｉｎにおける白目の濁りが低減されて、刺繍模様にて表現される人物の顔の目の見栄えが向上する。白目処理は、機械学習モデルを用いずに目の部分画像ＰＩｅｒ、ＰＩｅｌの少なくとも一部の色を変換する処理である、と言うことができる。 In S415 of FIG. 9, the CPU 110 determines whether execution of the white eye process has been selected based on the information received in S410. If executing the white eye processing is selected (S415: YES), in S420, the CPU 110 executes the white eye processing on the input image data. The white eye processing is a process of filling the white part of the image showing the eyes with a specific white color that looks good in the eye regions Per and Pel. For example, the CPU 110 converts the value of a pixel corresponding to the white part of the eye to a specific value indicating white (for example, an RGB value of (255, 255, 255)). For example, pixels having values in a predetermined range indicating white and a color close to white are specified as pixels corresponding to the white part of the eye. As a result, for example, the cloudiness of the whites of the eyes in the input image Iin is reduced, and the appearance of the eyes of the person's face expressed by the embroidery pattern is improved. The white eye processing can be said to be a process of converting the color of at least a portion of the eye partial images PIer and PIel without using a machine learning model.
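The white eye processing described above is a plain color replacement, so it can be sketched without any machine learning model. In the following Python sketch, the image is a nested list of RGB tuples, eye regions are hypothetical `(top, left, height, width)` boxes, and a pixel counts as "white or close to white" when all three channels are at or above a threshold. The threshold value of 200 and the function name `whiten_sclera` are assumptions; the embodiment only states that a predetermined range is used.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
RgbImage = List[List[Pixel]]
Box = Tuple[int, int, int, int]  # hypothetical (top, left, height, width)

WHITE: Pixel = (255, 255, 255)


def whiten_sclera(
    image: RgbImage, eye_boxes: List[Box], threshold: int = 200
) -> RgbImage:
    """Fill near-white pixels inside the eye regions with pure white."""
    out = [row[:] for row in image]  # leave the input image untouched
    for top, left, height, width in eye_boxes:
        for y in range(top, top + height):
            for x in range(left, left + width):
                # Assumed criterion: every channel is at least threshold.
                if min(out[y][x]) >= threshold:
                    out[y][x] = WHITE
    return out
```

Pixels outside the eye boxes are never changed, so bright areas elsewhere in the face, such as teeth or highlights on the skin, are unaffected.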

図９のＳ４２５では、ＣＰＵ１１０は、目の領域Ｐｅｒ、Ｐｅｌをスタイル変換の対象領域から除外する。白目処理が実行された後にスタイル変換処理が行われると、スタイル変換処理後の画像に白目の濁りが現れる場合があり、白目処理の効果が低下するためである。 In S425 of FIG. 9, the CPU 110 excludes the eye areas Per and Pel from the style conversion target area. This is because if the style conversion process is performed after the white of the eyes process is executed, the white of the eyes may become cloudy in the image after the style conversion process, which reduces the effectiveness of the white of the eyes process.

If white-eye processing has not been selected (S415: NO), the CPU 110 skips S420 and S425 and proceeds to S430.

In S430 of FIG. 9, the CPU 110 selects a region of interest from the target regions, that is, those regions of the facial parts (eyes, nose, mouth) identified in the input image Iin that are to undergo style conversion processing. If the eye regions have been excluded, the target regions are the nose and mouth regions Pn and Pm; otherwise, the target regions are the eye, nose, and mouth regions Per, Pel, Pn, and Pm.

In S435 of FIG. 9, the CPU 110 acquires, based on the information received in S410, the style image data SD to be used in the style conversion processing of the region of interest from the style image data group SDG (FIG. 1) stored in the non-volatile storage device 130.

In S440 of FIG. 9, the CPU 110 executes style conversion processing of the region of interest, as in S230 of FIG. 6. In S442, as in S232 of FIG. 6, the CPU 110 generates intermediate image data representing an intermediate image by replacing the partial image data corresponding to the region of interest in the input image data with the converted data.

In S445 of FIG. 9, the CPU 110 determines whether all target regions have been processed. If an unprocessed region remains (S445: NO), the process returns to S430. If all target regions have been processed (S445: YES), the process proceeds to S450.
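The loop formed by S425 through S445 can be sketched as follows. This is only an illustration: each region is represented by a placeholder string, and the `convert` callback stands in for the style conversion of S440; both are assumptions, not the data structures of the embodiment.

```python
from typing import Callable, Dict, List

ALL_REGIONS = ["Per", "Pel", "Pn", "Pm"]  # eyes, nose, mouth
EYE_REGIONS = {"Per", "Pel"}


def target_regions(white_eye_selected: bool) -> List[str]:
    """S425: drop the eye regions when white-eye processing was selected."""
    if white_eye_selected:
        return [r for r in ALL_REGIONS if r not in EYE_REGIONS]
    return list(ALL_REGIONS)


def build_intermediate(input_parts: Dict[str, str],
                       white_eye_selected: bool,
                       convert: Callable[[str, str], str]) -> Dict[str, str]:
    """S430 to S445: convert each target region in turn and substitute the
    converted data for the corresponding partial image data (S442)."""
    intermediate = dict(input_parts)
    for region in target_regions(white_eye_selected):
        intermediate[region] = convert(region, input_parts[region])
    return intermediate
```

When white-eye processing is selected, the eye regions pass through unchanged, so the result of S420 is preserved in the intermediate image.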

In S450 of FIG. 9, as in S250 of FIG. 6, the CPU 110 executes skin color correction on the input image data to generate corrected input image data. In S455 of FIG. 9, as in S255 of FIG. 6, the CPU 110 executes style conversion processing of the entire face on the intermediate image data to generate output image data.

In S460 of FIG. 9, the CPU 110 determines, based on the information received in S410, whether changing the facial expression has been selected. If it has been selected (S460: YES), in S465 the CPU 110 further executes, on the output image data, style conversion processing for changing the facial expression. For example, the CPU 110 determines the type of facial expression after the change (for example, a smile without showing teeth, or a straight face) based on the information received in S410 and generates label data indicating that type. The CPU 110 inputs the output image data and the label data to the generation network GN5 for facial expressions, thereby generating output image data representing an output image (not shown) that includes the person's face with the changed expression.

If changing the facial expression has not been selected (S460: NO), the CPU 110 skips S465 and proceeds to S470.

In S470 of FIG. 9, the CPU 110 determines, based on the information received in S410, whether correcting the tooth alignment has been selected. If it has been selected (S470: YES), in S475 of FIG. 9 the CPU 110 executes style conversion processing for correcting the tooth alignment. For example, the CPU 110 inputs the output image data as content image data CD, and image data prepared in advance that shows the face of a person with corrected tooth alignment as style image data SD, to the generation network GN6 for tooth alignment, thereby generating output image data representing an output image (not shown) that includes the person's face with corrected tooth alignment.

If correcting the tooth alignment has not been selected (S470: NO), the CPU 110 skips S475 and proceeds to S480.

If neither the facial expression change nor the tooth alignment correction is executed, the output image data generated in S455 is the final output image data. If the facial expression change is executed and the tooth alignment correction is not, the output image data generated in S465 is the final output image data. If the tooth alignment correction is executed, the output image data generated in S475 is the final output image data.
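The three cases above follow from applying the optional steps in sequence, as the sketch below shows. The function name and the two callbacks, which stand in for the generation networks GN5 and GN6, are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def final_output(s455_output: T,
                 change_expression: bool,
                 correct_teeth: bool,
                 expression_net: Callable[[T], T],
                 teeth_net: Callable[[T], T]) -> T:
    """Chain the optional post-processing steps of FIG. 9."""
    output = s455_output
    if change_expression:   # S460: YES, so S465 runs
        output = expression_net(output)
    if correct_teeth:       # S470: YES, so S475 runs
        output = teeth_net(output)
    return output           # data transmitted in S480
```

Because S475 always takes the most recent output image data as its input, the data generated in S475 is final whenever tooth alignment correction is executed, whether or not S465 ran before it.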

In S480 of FIG. 9, the CPU 110 transmits the final output image data to the terminal device 200 and ends the process.

In S345 of FIG. 7, the CPU 210 of the terminal device 200 receives the output image data transmitted from the server 100. In S350, the CPU 210 displays the output image on the display device 250 using the output image data. Specifically, the output image is displayed in place of the input image Iin on the selection screen UD of FIG. 8. The user checks the output image on the selection screen UD and, if satisfied with it, presses the output button BT4. To generate the output image again, the user changes the selection instructions on the selection screen UD as appropriate and presses the preprocessing button BT3.

In S355 of FIG. 7, the CPU 210 determines whether the output button BT4 or the preprocessing button BT3 has been pressed. If the output button BT4 has been pressed (S355: YES), the CPU 210 proceeds to S360. If the preprocessing button BT3 has been pressed (S355: NO), the CPU 210 returns to S340.

In S360, the CPU 210 converts the output image data into embroidery data. Embroidery data represents an embroidery pattern; for example, it indicates, for each stitch, the coordinates of the needle drop points for forming the stitches of the embroidery pattern, the sewing order, and the color of the thread to be used. A known process, for example the process disclosed in Japanese Patent Application Publication No. 2019-41834, is used to convert the output image data into embroidery data.
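One possible in-memory representation of such embroidery data is sketched below. The field names and the helper are hypothetical and are not taken from the cited publication; the sketch only shows that the three items named above (needle drop point, sewing order, thread color) are enough to derive, for instance, the order in which threads must be loaded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stitch:
    order: int                        # position in the sewing order
    needle_drop: Tuple[float, float]  # needle drop point coordinates
    thread_color: str                 # thread to use for this stitch


def thread_sequence(stitches: List[Stitch]) -> List[str]:
    """Thread colors in sewing order, collapsing consecutive repeats."""
    colors: List[str] = []
    for stitch in sorted(stitches, key=lambda s: s.order):
        if not colors or colors[-1] != stitch.thread_color:
            colors.append(stitch.thread_color)
    return colors
```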

In S365, the CPU 210 transmits the embroidery data to the sewing machine 300. On receiving the embroidery data, the sewing machine 300 uses it to sew the embroidery pattern onto cloth.

According to the second embodiment described above, flexible style conversion processing can be realized when generating output image data, as in the first embodiment. As a result, for example, output image data that has undergone flexible preprocessing according to the user's preferences can be generated. Therefore, even if the user lacks the skill to perform preprocessing with a general image editing program, a variety of embroidery patterns matching the user's preferences can be sewn onto cloth.

For example, according to the second embodiment, style conversion processing that changes the expression of a person's face (S465 in FIG. 9) is executed as specific post-processing. As a result, flexible style conversion that includes changing the facial expression can be realized. For example, by preparing just one piece of input image data, the user can have the system 1000 generate output image data showing faces with various expressions and, in turn, have the sewing machine 300 sew embroidery patterns of faces with various expressions.

Furthermore, according to the second embodiment, the CPU 110 selects the processing to be executed on the partial image data representing the eye partial images PIer and PIel from white-eye processing and style conversion processing (S415 in FIG. 9). When style conversion processing is selected, the CPU 110 executes style conversion processing without executing white-eye processing; when white-eye processing is selected, the CPU 110 executes white-eye processing without executing style conversion processing. As a result, style conversion processing that uses a machine learning model and white-eye processing that does not can be used selectively for the eye partial image data, which improves the flexibility of the processing. For example, a user may prefer white-eye processing, which is simpler than style conversion processing, for the eyes; this embodiment can meet that need as well.

Furthermore, according to the second embodiment, style conversion processing that corrects the tooth alignment in the image showing the mouth is executed (S475 in FIG. 9). As a result, output image data representing an image with corrected tooth alignment can be generated easily.

B. Modifications:
(1) In each of the above embodiments, different style image data SD is used depending on the race and gender of the person in the input image Iin. This is not limiting; for example, different style image data SD may be used depending on the facial expression of the person in the input image Iin (for example, angry, laughing, straight face) or the angle of the face (for example, front, side, oblique). Also, in the above embodiments, these types of the input image Iin are identified based on the user's selection instructions, but they may instead be identified by image recognition processing, for example, using the image recognition algorithm called yolo described above.

(2) In each of the above embodiments, the parts targeted by the per-part style conversion processing (S230 in FIG. 6, S440 in FIG. 9) are the eyes, nose, and mouth. This is not limiting; the target parts may be other parts such as the head (hair), ears, cheeks, and chin.

(3) The input image Iin in each of the above embodiments is not limited to an image including a person's face FC and may be another image. For example, the input image Iin may be an image that includes a landscape, an animal, or a building and does not include a person. Whichever image is used as the input image, it is preferable that mutually different style conversion processing be executed on a first partial image that is a part of the image and on a second partial image located at a different position from the first partial image.

(4) The generation networks (machine learning models) used in each of the above embodiments are examples and are not limiting. For example, a common generation network may be used for the eyes, nose, and mouth. Also, for example, a generation network that can convert only to the single style of the style image used during training may be used. In this case, for example, as many generation networks as there are selectable style images may be prepared for the style conversion of one part (for example, the nose) and used selectively according to the selected style image.

(5) In each of the above embodiments, the style image data SD is selected from the style image data group SDG stored in the server 100. Instead, the style image data SD may be image data prepared by the user. In this case, the user inputs the prepared style image data SD into the terminal device 200. The input style image data SD is transmitted from the terminal device 200 to the server 100 and used for the style conversion processing in the server 100.

(6) In each of the above embodiments, the CPU 110 acquires the style image data SD selected by the user (for example, S227 in FIG. 6) and inputs that style image data SD to the generation network to execute the style conversion processing (for example, S230 in FIG. 6). Instead, a plurality of pieces of style image data SD may each be input in advance to the encoder EC of the generation network GN to generate a plurality of pieces of feature data. In this case, the feature data corresponding to the style image data SD selected by the user may be acquired and used to execute the style conversion processing.
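The precomputation described in modification (6) amounts to caching encoder outputs keyed by style image. A minimal sketch follows; `toy_encoder` is a stand-in that merely summarizes a list of numbers and is not the encoder EC of the embodiments, and the style identifiers are hypothetical.

```python
from typing import Callable, Dict, List

Features = List[float]


def precompute_style_features(
        style_images: Dict[str, List[float]],
        encoder: Callable[[List[float]], Features]) -> Dict[str, Features]:
    """Run the encoder once per style image, ahead of any user request."""
    return {style_id: encoder(image)
            for style_id, image in style_images.items()}


def toy_encoder(image: List[float]) -> Features:
    """Stand-in for encoder EC: returns the mean and the value range."""
    return [sum(image) / len(image), max(image) - min(image)]
```

At conversion time, the feature data is then obtained by a dictionary lookup on the selected style identifier, so the style image itself no longer has to pass through the encoder for each request.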

(7) In each of the above embodiments, the streaks BL appearing in the intermediate image Imb of FIG. 4(C) are reduced by executing style conversion processing of the entire face (for example, S255 in FIG. 6) as specific post-processing. Instead, other processing, for example smoothing with a filter, may be executed on the pixels in the streak BL portions. In general, it is preferable to execute processing that reduces the difference in pixel values at the portions forming the streaks BL, for example, between the converted partial image TIer of FIG. 4(C) and the portion adjacent to the converted partial image TIer.
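As one illustration of the filter-based alternative, the sketch below applies a 3x3 mean filter only to a given set of pixel positions. The choice of a mean filter, the grayscale image, and the way the streak pixels are supplied are all assumptions; the specification says only that a smoothing filter may be used.

```python
from typing import List, Set, Tuple


def smooth_pixels(image: List[List[float]],
                  targets: Set[Tuple[int, int]]) -> List[List[float]]:
    """Replace each target pixel (y, x) with the mean of its 3x3
    neighborhood, clipped at the image border. Other pixels are kept."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y, x in targets:
        neighbors = [image[ny][nx]
                     for ny in range(max(0, y - 1), min(height, y + 2))
                     for nx in range(max(0, x - 1), min(width, x + 2))]
        result[y][x] = sum(neighbors) / len(neighbors)
    return result
```

Applied to the pixels on both sides of a boundary between a converted partial image and its surroundings, this reduces the step in pixel values across the boundary while leaving the rest of the image untouched.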

(8) The processing in each of the above embodiments is an example and may be changed as appropriate by omitting or adding steps. For example, all or some of the white-eye processing in S420, the style conversion processing in S465, and the style conversion processing in S475 of FIG. 9 may be omitted. These processes may also be executed as appropriate within the processing of FIG. 6 of the first embodiment. In FIG. 6 or FIG. 9, the style conversion processing of the entire face (S255 in FIG. 6, S455 in FIG. 9) may be omitted. The style conversion strength parameter α may be a fixed value, or a common value may be used for the style conversion of every region.

(9) All or part of the processing executed by the server 100 in each of the above embodiments may be executed by the terminal device 200. For example, the identification of the facial part regions in S210 of FIG. 6 may be executed by the CPU 210 of the terminal device 200. Also, the converted data corresponding to each part region generated in S230 of FIG. 6 may be transmitted to the terminal device 200, and the terminal device 200 may generate the intermediate image data or the final output image data using the input image data and the converted data.

(10) The hardware configurations of the server 100 and the terminal device 200 in FIG. 1 are examples and are not limiting. For example, the processor of the server 100 or the terminal device 200 that performs the processing of each embodiment is not limited to a CPU and may be a GPU (Graphics Processing Unit), an ASIC (Application Specific Integrated Circuit), or a combination of these with a CPU. The server 100 may also be a plurality of computers that can communicate with each other via a network (for example, a so-called cloud server).

(11) In each of the above embodiments, part of the configuration realized by hardware may be replaced with software, and conversely, part or all of the configuration realized by software may be replaced with hardware. For example, the generation networks GN1 to GN6 may be realized by a hardware circuit such as an ASIC (Application Specific Integrated Circuit) instead of a program module.

The present invention has been described above based on embodiments and modifications, but the embodiments described above are intended to facilitate understanding of the present invention and do not limit it. The present invention may be changed and improved without departing from its spirit and the scope of the claims, and the present invention includes its equivalents.

100...server, 1000...system, 110...CPU, 120...volatile storage device, 130...non-volatile storage device, 160...communication interface, 200...terminal device, 210...CPU, 220...non-volatile storage device, 230...volatile storage device, 240...operation unit, 250...display device, 260...communication interface, 300...sewing machine, CC...feature combining unit, CD...content image data, DC...decoder, EC...encoder, GN1 to GN6...generation network, GNG...generation network group, IT...Internet, Ic...corrected image, Iin...input image, Ima, Imb...intermediate image, Iout...output image, NW...wireless network, PGs, PGt...computer program, SD...style image data, SDG...style image data group, TD...converted image data

Claims

An image generation method comprising:
an image acquisition step of acquiring input image data representing an input image;
a partial image identification step of identifying, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image located at a different position from the first input partial image;
a first conversion step of executing first style conversion processing using a machine learning model on first partial image data representing the first input partial image to generate first converted data representing a first converted partial image;
a second conversion step of executing second style conversion processing, which uses a machine learning model and differs from the first style conversion processing, on second partial image data representing the second input partial image to generate second converted data representing a second converted partial image; and
an output image generation step including a first step of generating, using the first converted data and the second converted data, intermediate image data representing an intermediate image that includes the first converted partial image and the second converted partial image, and a second step of executing specific post-processing on the intermediate image data to generate output image data representing an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein the specific post-processing includes third style conversion processing, which uses a machine learning model and differs from the first style conversion processing and the second style conversion processing, and
the third style conversion processing is executed using the input image data as style image data.

The image generation method according to claim 1, wherein
the input image includes an image showing a person's face,
the specific post-processing includes processing that corrects the skin color of the person's face in the input image data to generate corrected input image data, and
the third style conversion processing is executed using the corrected input image data as the style image data.

An image generation method comprising:
an image acquisition step of acquiring input image data representing an input image;
a partial image identification step of identifying, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image located at a different position from the first input partial image;
a first conversion step of executing first style conversion processing using a machine learning model on first partial image data representing the first input partial image to generate first converted data representing a first converted partial image;
a second conversion step of executing second style conversion processing, which uses a machine learning model and differs from the first style conversion processing, on second partial image data representing the second input partial image to generate second converted data representing a second converted partial image; and
an output image generation step including a first step of generating, using the first converted data and the second converted data, intermediate image data representing an intermediate image that includes the first converted partial image and the second converted partial image, and a second step of executing specific post-processing on the intermediate image data to generate output image data representing an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein the specific post-processing includes fourth style conversion processing, which uses a machine learning model and differs from the first style conversion processing and the second style conversion processing,
the input image includes an image showing a person's face, and
the fourth style conversion processing is processing that changes the facial expression of the person.

The image generation method according to any one of claims 1 to 3, wherein
the specific post-processing includes processing that reduces, in the intermediate image, the difference in pixel values between the first converted partial image and a portion adjacent to the first converted partial image, and the difference in pixel values between the second converted partial image and a portion adjacent to the second converted partial image.

An image generation method comprising:
an image acquisition step of acquiring input image data representing an input image;
a partial image identification step of identifying, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image located at a different position from the first input partial image;
a first conversion step of executing first style conversion processing using a machine learning model on first partial image data representing the first input partial image to generate first converted data representing a first converted partial image, the first style conversion processing being executed using a first parameter that specifies the degree of difference between the first input partial image and the first converted partial image to be generated;
a second conversion step of executing second style conversion processing, which uses a machine learning model and differs from the first style conversion processing, on second partial image data representing the second input partial image to generate second converted data representing a second converted partial image, the second style conversion processing being executed using a second parameter that specifies the degree of difference between the second input partial image and the second converted partial image to be generated, the second parameter being adjusted independently of the first parameter; and
an output image generation step of generating, using the first converted data and the second converted data, output image data representing an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image.

An image generation method comprising:
an image acquisition step of acquiring input image data representing an input image;
a partial image identification step of identifying, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image located at a different position from the first input partial image;
a process selection step of selecting the processing to be executed on first partial image data representing the first input partial image;
a first conversion step of executing first style conversion processing using a machine learning model on the first partial image data to generate first converted data representing a first converted partial image;
a color conversion step of executing, on the first partial image data, color conversion processing that converts the color of at least part of the first input partial image without using a machine learning model;
a second conversion step of executing second style conversion processing, which uses a machine learning model and differs from the first style conversion processing, on second partial image data representing the second input partial image to generate second converted data representing a second converted partial image; and
an output image generation step of generating, using the first converted data and the second converted data, output image data representing an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein, when the first style conversion processing is selected in the process selection step, the first conversion step is executed without executing the color conversion step, and
when the color conversion processing is selected in the process selection step, the color conversion step is executed without executing the first conversion step.

The image generation method according to claim 6, wherein
the input image includes an image showing a person's face,
the first input partial image is an image showing an eye of the person, and
the color conversion processing is processing that converts the values of pixels corresponding to the white part of the eye in the image showing the eye to a specific value indicating white.

An image generation method comprising:
an image acquisition step of acquiring input image data indicating an input image;
a partial image identification step of identifying, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
an information acquisition step of acquiring, based on a first input by a user, first input information for a first style conversion process using a machine learning model, and acquiring, based on a second input by the user, second input information for a second style conversion process that uses a machine learning model and differs from the first style conversion process;
a first conversion step of executing the first style conversion process, using the first input information, on first partial image data indicating the first input partial image to generate first converted data indicating a first converted partial image;
a second conversion step of executing the second style conversion process, using the second input information, on second partial image data indicating the second input partial image to generate second converted data indicating a second converted partial image; and
an output image generation step of generating, using the first converted data and the second converted data, output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image.
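
The flow of the steps in this claim can be sketched as follows. This is only an illustration under stated assumptions: images are plain 2D lists of integers, regions are given as hypothetical `(top, left, height, width)` boxes rather than being detected, and the two "style conversion processes" are ordinary callables standing in for machine learning models. The integer passed to each converter is a placeholder for the first and second input information obtained from the user.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]          # (top, left, height, width), assumed
StyleFn = Callable[[Image, int], Image]  # stand-in for an ML style conversion


def crop(image: Image, box: Box) -> Image:
    """Extract the partial image data for one region."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def paste(image: Image, patch: Image, box: Box) -> None:
    """Write a converted partial image back at the same position."""
    top, left, height, width = box
    for dy in range(height):
        image[top + dy][left:left + width] = patch[dy]


def generate_output(
    input_image: Image,
    regions: Dict[str, Box],
    converters: Dict[str, StyleFn],
    user_inputs: Dict[str, int],
) -> Image:
    """Convert each region with its own converter and its own user input."""
    output = [row[:] for row in input_image]  # leave the input untouched
    for name, box in regions.items():
        partial = crop(input_image, box)
        converted = converters[name](partial, user_inputs[name])
        paste(output, converted, box)
    return output
```

Each region is converted from the original input image rather than from the partially updated output, so the result does not depend on the order in which the regions are processed.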

The image generation method according to claim 8, wherein:
the first input information includes data indicating an image that corresponds to the first input partial image and has a style different from that of the first input partial image; and
the second input information includes data indicating an image that corresponds to the second input partial image and has a style different from that of the second input partial image.

The image generation method according to any one of claims 1 to 9, wherein:
the first style conversion process is executed using first style image data indicating a first style image;
the second style conversion process is executed using second style image data indicating a second style image;
the first converted partial image is an image in which the style of the first style image is applied to the first input partial image; and
the second converted partial image is an image in which the style of the second style image is applied to the second input partial image.

The image generation method according to any one of claims 1 to 10, wherein:
the input image includes an image showing a person's face;
the first input partial image is an image showing a first part of the person's face; and
the second input partial image is an image showing a second part of the person's face, the second part being located at a position different from that of the first part.

The image generation method according to any one of claims 1 to 11, further comprising a type identification step of identifying a type of the input image, wherein:
when the input image is an input image of a first type, a first-type first style conversion process is executed on the first partial image data in the first conversion step, and a first-type second style conversion process is executed on the second partial image data in the second conversion step; and
when the input image is an input image of a second type, a second-type first style conversion process is executed on the first partial image data in the first conversion step, and a second-type second style conversion process is executed on the second partial image data in the second conversion step.
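
One way to read this claim is as a lookup from the identified type of the input image to a pair of conversion processes, one for each partial image. The sketch below illustrates that reading only. The type names, the table `CONVERTERS_BY_TYPE`, and the string-returning converters are hypothetical placeholders for type-specific machine learning models, and how the type is identified is not shown.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[str], str]  # stand-in for an ML style conversion

# Hypothetical table: image type -> (first converter, second converter).
CONVERTERS_BY_TYPE: Dict[str, Tuple[Converter, Converter]] = {
    "smiling": (lambda p: f"{p}:eye-model-A", lambda p: f"{p}:mouth-model-A"),
    "neutral": (lambda p: f"{p}:eye-model-B", lambda p: f"{p}:mouth-model-B"),
}


def select_converters(
    image_type: str,
    table: Dict[str, Tuple[Converter, Converter]],
) -> Tuple[Converter, Converter]:
    """Pick the first and second style conversions for the given type."""
    if image_type not in table:
        raise ValueError(f"no converters registered for type {image_type!r}")
    return table[image_type]


def convert_parts(
    image_type: str,
    first_partial: str,
    second_partial: str,
    table: Dict[str, Tuple[Converter, Converter]] = CONVERTERS_BY_TYPE,
) -> Tuple[str, str]:
    """Apply the type-specific conversions to the two partial images."""
    first_conv, second_conv = select_converters(image_type, table)
    return first_conv(first_partial), second_conv(second_partial)
```

Keeping the mapping in a table means a further type can be supported by adding an entry, without changing the conversion steps themselves.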

The image generation method according to claim 12, wherein:
the input image includes an image showing a person's face; and
the type of the input image is a type related to at least one of the person's gender, race, facial expression, and face angle.

The image generation method according to any one of claims 1 to 13, wherein:
the input image includes an image showing a person's face;
the second input partial image is an image showing the mouth of the person; and
the second style conversion process is a process of correcting the alignment of teeth in the image showing the mouth.

A system comprising:
an image acquisition unit that acquires input image data indicating an input image;
a partial image identification unit that identifies, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a first conversion unit that executes a first style conversion process using a machine learning model on first partial image data indicating the first input partial image to generate first converted data indicating a first converted partial image;
a second conversion unit that executes a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on second partial image data indicating the second input partial image to generate second converted data indicating a second converted partial image; and
an output image generation unit including a first part that generates, using the first converted data and the second converted data, intermediate image data indicating an intermediate image including the first converted partial image and the second converted partial image, and a second part that executes specific post-processing on the intermediate image data to generate output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein:
the specific post-processing includes a third style conversion process that uses a machine learning model and differs from the first style conversion process and the second style conversion process; and
the third style conversion process is executed using the input image data as style image data.
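
The two-part output generation in this claim, with the original input image reused as the style reference during post-processing, can be sketched as below. This is an illustration under stated assumptions, not the claimed process: the post-processing here is a simple non-ML stand-in (`match_mean_brightness`, a hypothetical name) that shifts the intermediate image so its mean brightness matches that of the input image, whereas the claim calls for a style conversion process using a machine learning model.

```python
from typing import Callable, List

Image = List[List[float]]


def mean(image: Image) -> float:
    """Average pixel value of a non-empty grayscale image."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def match_mean_brightness(content: Image, style: Image) -> Image:
    """Non-ML stand-in for a style conversion that takes a style image.

    Shifts `content` so that its mean brightness equals that of `style`.
    """
    shift = mean(style) - mean(content)
    return [[v + shift for v in row] for row in content]


def post_process(
    input_image: Image,
    intermediate_image: Image,
    third_conversion: Callable[[Image, Image], Image] = match_mean_brightness,
) -> Image:
    """Second part of output generation: the input image acts as the style."""
    return third_conversion(intermediate_image, input_image)
```

Passing the input image as the style argument pulls the composited intermediate image back toward the overall look of the original input.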

A system comprising:
an image acquisition unit that acquires input image data indicating an input image;
a partial image identification unit that identifies, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a first conversion unit that executes a first style conversion process using a machine learning model on first partial image data indicating the first input partial image to generate first converted data indicating a first converted partial image;
a second conversion unit that executes a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on second partial image data indicating the second input partial image to generate second converted data indicating a second converted partial image; and
an output image generation unit including a first part that generates, using the first converted data and the second converted data, intermediate image data indicating an intermediate image including the first converted partial image and the second converted partial image, and a second part that executes specific post-processing on the intermediate image data to generate output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein:
the specific post-processing includes a fourth style conversion process that uses a machine learning model and differs from the first style conversion process and the second style conversion process;
the input image includes an image showing a person's face; and
the fourth style conversion process is a process of changing the facial expression of the person.

A system comprising:
an image acquisition unit that acquires input image data indicating an input image;
a partial image identification unit that identifies, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a first conversion unit that executes a first style conversion process using a machine learning model on first partial image data indicating the first input partial image to generate first converted data indicating a first converted partial image, wherein the first style conversion process is executed using a first parameter that specifies a degree of difference between the first input partial image and the first converted partial image to be generated;
a second conversion unit that executes a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on second partial image data indicating the second input partial image to generate second converted data indicating a second converted partial image, wherein the second style conversion process is executed using a second parameter that specifies a degree of difference between the second input partial image and the second converted partial image to be generated, and the second parameter is adjusted independently of the first parameter; and
an output image generation unit that generates, using the first converted data and the second converted data, output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image.
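
The claim does not state how the first and second parameters act on the conversion. As one possible interpretation only, the sketch below treats each parameter as a blend weight between the input partial image and a fully converted partial image, so that each region's degree of change can be set independently. The function name `apply_with_strength`, the `[0, 1]` range, and the linear blend are all assumptions for illustration.

```python
from typing import Callable, List

Image = List[List[float]]


def apply_with_strength(
    partial: Image,
    convert: Callable[[Image], Image],
    strength: float,
) -> Image:
    """Blend the input partial image with its converted version.

    strength = 0.0 returns the input unchanged, 1.0 returns the fully
    converted image. `convert` is a stand-in for an ML style conversion
    and must return an image of the same size as `partial`.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    converted = convert(partial)
    return [
        [(1.0 - strength) * p + strength * c for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(partial, converted)
    ]
```

With this shape, the first conversion might be called with a strength of 0.25 and the second with 0.9, and changing one value has no effect on the other region.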

A system comprising:
an image acquisition unit that acquires input image data indicating an input image;
a partial image identification unit that identifies, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a process selection unit that selects a process to be executed on first partial image data indicating the first input partial image;
a first conversion unit that executes a first style conversion process using a machine learning model on the first partial image data to generate first converted data indicating a first converted partial image;
a color conversion unit that executes, on the first partial image data, a color conversion process of converting at least some of the colors of the first input partial image without using a machine learning model;
a second conversion unit that executes a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on second partial image data indicating the second input partial image to generate second converted data indicating a second converted partial image; and
an output image generation unit that generates, using the first converted data and the second converted data, output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein:
when the first style conversion process is selected by the process selection unit, the color conversion unit does not execute the color conversion process and the first conversion unit executes the first style conversion process; and
when the color conversion process is selected by the process selection unit, the color conversion unit executes the color conversion process and the first conversion unit does not execute the first style conversion process.
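
The mutually exclusive selection in this claim can be sketched as a simple dispatch: exactly one of the two processes runs on the first partial image data. The enum `FirstProcess`, the function name `convert_first_region`, and the callable parameters are hypothetical names introduced for illustration; the callables stand in for the first conversion unit (machine learning based) and the color conversion unit (no machine learning model).

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class FirstProcess(Enum):
    """Process selected for the first partial image (hypothetical names)."""
    STYLE = "style"  # ML-based first style conversion process
    COLOR = "color"  # color conversion process without an ML model


def convert_first_region(
    partial: T,
    selected: FirstProcess,
    style_convert: Callable[[T], T],
    color_convert: Callable[[T], T],
) -> T:
    """Run only the selected process; the other one is never called."""
    if selected is FirstProcess.STYLE:
        return style_convert(partial)
    if selected is FirstProcess.COLOR:
        return color_convert(partial)
    raise ValueError(f"unsupported selection: {selected!r}")
```

Branching before either callable is invoked means that, when the color conversion is selected, no model inference is run for the first partial image.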

A system comprising:
an image acquisition unit that acquires input image data indicating an input image;
a partial image identification unit that identifies, using the input image data, a first input partial image that is a part of the input image and a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
an information acquisition unit that acquires, based on a first input by a user, first input information for a first style conversion process using a machine learning model, and acquires, based on a second input by the user, second input information for a second style conversion process that uses a machine learning model and differs from the first style conversion process;
a first conversion unit that executes the first style conversion process, using the first input information, on first partial image data indicating the first input partial image to generate first converted data indicating a first converted partial image;
a second conversion unit that executes the second style conversion process, using the second input information, on second partial image data indicating the second input partial image to generate second converted data indicating a second converted partial image; and
an output image generation unit that generates, using the first converted data and the second converted data, output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image.

A computer program causing a computer to realize:
a partial image acquisition function of acquiring first partial image data indicating a first input partial image that is a part of an input image, and second partial image data indicating a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a first conversion function of executing a first style conversion process using a machine learning model on the first partial image data to generate first converted data indicating a first converted partial image;
a second conversion function of executing a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on the second partial image data to generate second converted data indicating a second converted partial image; and
an output image generation function including a first function of generating, using the first converted data and the second converted data, intermediate image data indicating an intermediate image including the first converted partial image and the second converted partial image, and a second function of executing specific post-processing on the intermediate image data to generate output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein:
the specific post-processing includes a third style conversion process that uses a machine learning model and differs from the first style conversion process and the second style conversion process; and
the third style conversion process is executed using input image data indicating the input image as style image data.

A computer program causing a computer to realize:
a partial image acquisition function of acquiring first partial image data indicating a first input partial image that is a part of an input image, and second partial image data indicating a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a first conversion function of executing a first style conversion process using a machine learning model on the first partial image data to generate first converted data indicating a first converted partial image;
a second conversion function of executing a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on the second partial image data to generate second converted data indicating a second converted partial image; and
an output image generation function including a first function of generating, using the first converted data and the second converted data, intermediate image data indicating an intermediate image including the first converted partial image and the second converted partial image, and a second function of executing specific post-processing on the intermediate image data to generate output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein:
the specific post-processing includes a fourth style conversion process that uses a machine learning model and differs from the first style conversion process and the second style conversion process;
the input image includes an image showing a person's face; and
the fourth style conversion process is a process of changing the facial expression of the person.

A computer program causing a computer to realize:
a partial image acquisition function of acquiring first partial image data indicating a first input partial image that is a part of an input image, and second partial image data indicating a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a first conversion function of executing a first style conversion process using a machine learning model on the first partial image data to generate first converted data indicating a first converted partial image, wherein the first style conversion process is executed using a first parameter that specifies a degree of difference between the first input partial image and the first converted partial image to be generated;
a second conversion function of executing a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on the second partial image data to generate second converted data indicating a second converted partial image, wherein the second style conversion process is executed using a second parameter that specifies a degree of difference between the second input partial image and the second converted partial image to be generated, and the second parameter is adjusted independently of the first parameter; and
an output image generation function of generating, using the first converted data and the second converted data, output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image.

A computer program causing a computer to realize:
a partial image acquisition function of acquiring first partial image data indicating a first input partial image that is a part of an input image, and second partial image data indicating a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
a process selection function of selecting a process to be executed on the first partial image data;
a first conversion function of executing a first style conversion process using a machine learning model on the first partial image data to generate first converted data indicating a first converted partial image;
a color conversion function of executing, on the first partial image data, a color conversion process of converting at least some of the colors of the first input partial image without using a machine learning model;
a second conversion function of executing a second style conversion process, which uses a machine learning model and differs from the first style conversion process, on the second partial image data to generate second converted data indicating a second converted partial image; and
an output image generation function of generating, using the first converted data and the second converted data, output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image,
wherein:
when the first style conversion process is selected by the process selection function, the color conversion function does not execute the color conversion process and the first conversion function executes the first style conversion process; and
when the color conversion process is selected by the process selection function, the color conversion function executes the color conversion process and the first conversion function does not execute the first style conversion process.

A computer program causing a computer to realize:
a partial image acquisition function of acquiring first partial image data indicating a first input partial image that is a part of an input image, and second partial image data indicating a second input partial image that is a part of the input image and is located at a position different from that of the first input partial image;
an information acquisition function of acquiring, based on a first input by a user, first input information for a first style conversion process using a machine learning model, and acquiring, based on a second input by the user, second input information for a second style conversion process that uses a machine learning model and differs from the first style conversion process;
a first conversion function of executing the first style conversion process, using the first input information, on the first partial image data to generate first converted data indicating a first converted partial image;
a second conversion function of executing the second style conversion process, using the second input information, on the second partial image data to generate second converted data indicating a second converted partial image; and
an output image generation function of generating, using the first converted data and the second converted data, output image data indicating an output image based on the input image, wherein the output image includes a first output partial image corresponding to the first input partial image and a second output partial image corresponding to the second input partial image, the first output partial image is an image based on the first converted partial image, and the second output partial image is an image based on the second converted partial image.