JP7423905B2

JP7423905B2 - Machine learning model training method, data generation device, and trained machine learning model

Info

Publication number: JP7423905B2
Application number: JP2019092212A
Authority: JP
Inventors: 航平渡邉
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2019-05-15
Filing date: 2019-05-15
Publication date: 2024-01-30
Anticipated expiration: 2039-05-15
Also published as: WO2020230777A1; JP2020187583A

Description

The present specification relates to a machine learning model and to a technique for generating data using the machine learning model.

Generative adversarial networks (GANs) are known as machine learning models that have attracted attention in recent years. A generative adversarial network uses two networks, called a discriminator and a generator. The discriminator is trained to distinguish real data from fake data, and the generator is trained to generate fake data that causes the discriminator to make that distinction incorrectly.

Non-Patent Document 1 discloses a technique that uses a conditional generative adversarial network to convert an image belonging to a specific category (called a domain), for example a line drawing, into an image belonging to another category, for example a photograph. In this technique, the generator includes an encoder that encodes an input image and a decoder that decodes the encoded data to generate the converted image.

Non-Patent Document 1: Isola, P. et al. "Image-to-Image Translation with Conditional Adversarial Networks." IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).
Non-Patent Document 2: Taigman, Y. et al. "Unsupervised cross-domain image generation." arXiv preprint arXiv:1611.02200 (2016).

In the above technique, the generator is trained using the input data, the fake data, and the identification results produced by the discriminator. However, further improvement in the training of the generator is desired.

This specification discloses a technique for training a machine learning model effectively so that the machine learning model can be used to generate data in which the features of the input data are appropriately reflected.

The technique disclosed in this specification has been made to solve at least part of the above-mentioned problem, and can be realized as the following application examples.

[Application Example 1] A method for training a machine learning model, the method comprising:
a first step of inputting input data into a first machine learning model to output fake data corresponding to the input data, wherein the first machine learning model includes an encoder that performs, on the input data, a dimension reduction process that reduces the number of dimensions to generate first feature data, and a decoder that performs, on the first feature data, a dimension restoration process that restores the number of dimensions to generate the fake data, and wherein the first machine learning model performs the dimension reduction process and the dimension restoration process using a plurality of first calculation parameters;
a second step of inputting a plurality of data pairs, including a first pair and a second pair, into a second machine learning model to output a plurality of pieces of identification data corresponding to the plurality of data pairs, wherein the first pair is a pair of data consisting of the input data and real data corresponding to the input data, the second pair is a pair of data consisting of the input data and the fake data corresponding to the input data, the identification data indicates a result of identifying whether the corresponding data pair is the first pair or the second pair, and the second machine learning model generates the identification data by performing a calculation using a plurality of second calculation parameters;
a third step of adjusting the plurality of second calculation parameters, using the identification data and teacher data indicating a target value of the identification data, so that the difference between the identification data and the teacher data becomes small;
a fourth step of performing the dimension reduction process by the encoder on the fake data to generate second feature data; and
a fifth step of adjusting the plurality of first calculation parameters, using the identification data, the teacher data, the first feature data, and the second feature data, so that the difference between the identification data and the teacher data becomes large and the difference between the first feature data and the second feature data becomes small,
wherein the first machine learning model and the second machine learning model are trained in parallel by repeating the first to fifth steps a plurality of times.
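As an illustration only, the objectives of the third and fifth steps can be sketched as scalar loss functions. The choice of binary cross-entropy for the identification term, mean absolute difference for the feature term, the teacher values 1 and 0, and the names `bce`, `l1`, `discriminator_loss`, `generator_loss`, and `weight` are all assumptions made for this sketch; the application example itself only specifies which differences become small or large.

```python
import math

def bce(pred, target):
    """Binary cross-entropy between a prediction in (0, 1) and a target of 0 or 1."""
    eps = 1e-12  # guards against log(0)
    return -(target * math.log(pred + eps) + (1 - target) * math.log(1 - pred + eps))

def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def discriminator_loss(d_first_pair, d_second_pair):
    # Third step (assumed form): the teacher data is 1 for the first pair
    # (input, real) and 0 for the second pair (input, fake). Minimizing this
    # makes the identification data approach the teacher data.
    return bce(d_first_pair, 1.0) + bce(d_second_pair, 0.0)

def generator_loss(d_second_pair, feat_input, feat_fake, weight=1.0):
    # Fifth step (assumed form): minimizing the first term pushes the
    # identification data for the second pair away from its teacher value 0;
    # minimizing the second term pulls the first and second feature data together.
    adversarial = -bce(d_second_pair, 0.0)
    feature = l1(feat_input, feat_fake)
    return adversarial + weight * feature
```

Under these assumptions the generator loss falls both when the second model is fooled more strongly and when the encoder produces similar feature data for the input data and for the fake data.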

According to the above configuration, the first calculation parameters of the first machine learning model are adjusted not only so that the difference between the identification data and the teacher data becomes large, but also, using the first feature data and the second feature data, so that the difference between these feature data becomes small. As a result, the first calculation parameters can be adjusted so that the features of the input data are appropriately reflected in the first feature data. The first machine learning model can therefore be trained effectively, and a machine learning model trained with this method can generate appropriate fake data that reflects the features of the input data.
[Application Example 2]
The method according to Application Example 1, further comprising:
a sixth step of adjusting the plurality of first calculation parameters, using the first feature data and the second feature data and without using the identification data and the teacher data, so that the difference between the first feature data and the second feature data becomes small,
wherein the first machine learning model is trained by repeating the first step, the fourth step, and the sixth step a plurality of times, and thereafter the first machine learning model and the second machine learning model are trained in parallel by repeating the first to fifth steps a plurality of times.
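The ordering described in Application Example 2 can be pictured with the following sketch, which only enumerates which steps run in each iteration. The function name `training_schedule` and the iteration counts are hypothetical; the application example does not state how many repetitions each phase uses.

```python
def training_schedule(pre_iters, main_iters):
    """Yield (phase, steps) tuples in the order described for Application Example 2."""
    for _ in range(pre_iters):
        # Preliminary phase: only the first model is updated (steps 1, 4, and 6).
        yield ("pre", (1, 4, 6))
    for _ in range(main_iters):
        # Main phase: both models are updated in parallel (steps 1 to 5).
        yield ("main", (1, 2, 3, 4, 5))

# Illustrative counts only.
schedule = list(training_schedule(pre_iters=2, main_iters=3))
```

The point of the sketch is that the second machine learning model is not involved at all until the preliminary phase has finished.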

[Application Example 3] A method for training a machine learning model, the method comprising:
a first step of inputting input data into a first machine learning model to output fake data corresponding to the input data, wherein the first machine learning model includes an encoder that performs, on the input data, a dimension reduction process that reduces the number of dimensions to generate first feature data, and a decoder that performs, on the first feature data, a dimension restoration process that restores the number of dimensions to generate the fake data, and wherein the first machine learning model performs the dimension reduction process and the dimension restoration process using a plurality of first calculation parameters;
a second step of inputting a plurality of data pairs, including a first pair and a second pair, into a second machine learning model to output a plurality of pieces of identification data corresponding to the plurality of data pairs, wherein the first pair is a pair of data consisting of the input data and real data corresponding to the input data, the second pair is a pair of data consisting of the input data and the fake data corresponding to the input data, the identification data indicates a result of identifying whether the corresponding data pair is the first pair or the second pair, and the second machine learning model generates the identification data by performing a calculation using a plurality of second calculation parameters;
a third step of adjusting the plurality of second calculation parameters, using the identification data and teacher data indicating a target value of the identification data, so that the difference between the identification data and the teacher data becomes small;
a fourth step of performing the dimension reduction process by the encoder on the fake data to generate second feature data;
a fifth step of adjusting the plurality of first calculation parameters, using the identification data and the teacher data, so that the difference between the identification data and the teacher data becomes large; and
a sixth step of adjusting the plurality of first calculation parameters, using the first feature data and the second feature data and without using the identification data and the teacher data, so that the difference between the first feature data and the second feature data becomes small,
wherein the first machine learning model is trained by repeating the first step, the fourth step, and the sixth step a plurality of times, and thereafter the first machine learning model and the second machine learning model are trained in parallel by repeating the first to fifth steps a plurality of times.

According to the above configuration, the first machine learning model is first trained using the first feature data and the second feature data so that the difference between these feature data becomes small, and the first machine learning model and the second machine learning model are then trained in parallel. As a result, the first machine learning model can be trained effectively so that the features of the input data are appropriately reflected in the first feature data, and a machine learning model trained with this method can generate appropriate output data that reflects the features of the input data.
[Application Example 4]
The method according to Application Example 2 or 3,
wherein, in the sixth step, the first calculation parameters are further adjusted, using the real data and the fake data corresponding to specific input data, so that the difference between the real data and the fake data becomes small.
[Application Example 5]
The method according to any one of Application Examples 1 to 4,
wherein, in the fifth step, the first calculation parameters are further adjusted, using the real data and the fake data corresponding to specific input data, so that the difference between the real data and the fake data becomes small.
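To make the additional term of Application Example 4 concrete, the following sketch computes a sixth-step loss that uses no identification data: a feature term, plus an optional term comparing the real data with the fake data. Mean absolute difference, the weight `w_data`, and the names `mean_abs_diff` and `pretraining_loss` are assumptions for illustration; the data are represented as flat lists of values. The same additional term could be added to the fifth step in the manner of Application Example 5.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def pretraining_loss(feat_input, feat_fake, real=None, fake=None, w_data=1.0):
    """Sixth-step loss (assumed form): feature term, plus the optional
    real-versus-fake term described in Application Example 4."""
    loss = mean_abs_diff(feat_input, feat_fake)
    if real is not None and fake is not None:
        loss += w_data * mean_abs_diff(real, fake)
    return loss

# Feature term only, then feature term plus the real-versus-fake term.
feature_only = pretraining_loss([0.0, 0.0], [1.0, 1.0])
with_data_term = pretraining_loss([0.0, 0.0], [1.0, 1.0],
                                  real=[0.0] * 4, fake=[0.5] * 4)
```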

[Application Example 6] A data generation device comprising:
a generation unit that generates output data corresponding to input data using a trained first machine learning model; and
an output unit that outputs the output data,
wherein the first machine learning model includes an encoder that performs, on the input data, a dimension reduction process that reduces the number of dimensions to generate first feature data, and a decoder that performs, on the first feature data, a dimension restoration process that restores the number of dimensions to generate the output data, and is a model that performs the dimension reduction process and the dimension restoration process using a plurality of first calculation parameters,
the plurality of first calculation parameters of the trained first machine learning model have been adjusted by a training process, and
the training process includes:
a first process of inputting the input data into the first machine learning model to output, as fake data, the output data corresponding to the input data;
a second process of inputting a plurality of data pairs, including a first pair and a second pair, into a second machine learning model to output a plurality of pieces of identification data corresponding to the plurality of data pairs, wherein the first pair is a pair of data consisting of the input data and real data corresponding to the input data, the second pair is a pair of data consisting of the input data and the fake data corresponding to the input data, the identification data indicates a result of identifying whether the corresponding data pair is the first pair or the second pair, and the second machine learning model generates the identification data by performing a calculation using a plurality of second calculation parameters;
a third process of adjusting the plurality of second calculation parameters, using the identification data and teacher data indicating a target value of the identification data, so that the difference between the identification data and the teacher data becomes small;
a fourth process of performing the dimension reduction process by the encoder on the fake data to generate second feature data; and
a fifth process of adjusting the plurality of first calculation parameters, using the identification data, the teacher data, the first feature data, and the second feature data, so that the difference between the identification data and the teacher data becomes large and the difference between the first feature data and the second feature data becomes small,
the training process being a process of training the first machine learning model and the second machine learning model in parallel by repeating the first to fifth processes a plurality of times.

According to the above configuration, the first calculation parameters of the first machine learning model have been adjusted in the training process not only so that the difference between the identification data and the teacher data becomes large, but also, using the first feature data and the second feature data, so that the difference between these feature data becomes small. The first machine learning model has thus been trained effectively so that the features of the input data are appropriately reflected in the first feature data. The data generation device can therefore use the machine learning model to generate appropriate output data that reflects the features of the input data.

[Application Example 7] A data generation device comprising:
a generation unit that generates output data corresponding to input data using a trained first machine learning model; and
an output unit that outputs the output data,
wherein the first machine learning model includes an encoder that performs, on the input data, a dimension reduction process that reduces the number of dimensions to generate first feature data, and a decoder that performs, on the first feature data, a dimension restoration process that restores the number of dimensions to generate the output data, and is a model that performs the dimension reduction process and the dimension restoration process using a plurality of first calculation parameters,
the plurality of first calculation parameters of the trained first machine learning model have been adjusted by a training process, and
the training process includes:
a first process of inputting the input data into the first machine learning model to output, as fake data, the output data corresponding to the input data;
a second process of inputting a plurality of data pairs, including a first pair and a second pair, into a second machine learning model to output a plurality of pieces of identification data corresponding to the plurality of data pairs, wherein the first pair is a pair of data consisting of the input data and real data corresponding to the input data, the second pair is a pair of data consisting of the input data and the fake data corresponding to the input data, the identification data indicates a result of identifying whether the corresponding data pair is the first pair or the second pair, and the second machine learning model generates the identification data by performing a calculation using a plurality of second calculation parameters;
a third process of adjusting the plurality of second calculation parameters, using the identification data and teacher data indicating a target value of the identification data, so that the difference between the identification data and the teacher data becomes small;
a fourth process of performing the dimension reduction process by the encoder on the fake data to generate second feature data;
a fifth process of adjusting the plurality of first calculation parameters, using the identification data and the teacher data, so that the difference between the identification data and the teacher data becomes large; and
a sixth process of adjusting the plurality of first calculation parameters, using the first feature data and the second feature data and without using the identification data and the teacher data, so that the difference between the first feature data and the second feature data becomes small,
the training process being a process of training the first machine learning model by repeating the first process, the fourth process, and the sixth process a plurality of times, and thereafter training the first machine learning model and the second machine learning model in parallel by repeating the first to fifth processes a plurality of times.

According to the above configuration, the first machine learning model has been trained using the first feature data and the second feature data so that the difference between these feature data becomes small, and has then been trained further in parallel with the second machine learning model. As a result, the first machine learning model has been trained effectively so that the features of the input data are appropriately reflected in the first feature data. The data generation device can therefore use the machine learning model to generate appropriate fake data that reflects the features of the input data.
[Application Example 8]
The data generation device according to Application Example 6 or 7,
wherein the input data is image data that has a first attribute and a second attribute and does not have a third attribute, and
the output data is image data that has the first attribute and the third attribute and does not have the second attribute.

Note that the technique disclosed in this specification can be realized in various forms, for example as a training device that executes the above training method, a trained machine learning model, a computer program for realizing these methods and devices, or a recording medium on which the computer program is recorded.

FIG. 1 is a block diagram showing the configuration of the data generation device 200 of this embodiment.
FIG. 2 is a flowchart of the image generation process.
FIG. 3 is a diagram showing an example of input images and output images.
FIG. 4 is a block diagram showing the configuration of the generation network GN.
FIG. 5 is a block diagram showing the configuration of the encoder EC.
FIG. 6 is a block diagram showing the configuration of the decoder DC.
FIG. 7 is a block diagram showing the configuration of the training device 100 of this embodiment.
FIG. 8 is a diagram showing an example of input images and real images.
FIG. 9 is a conceptual diagram of the network system 1000 of this embodiment.
FIG. 10 is a block diagram showing the configuration of the identification network DN.
FIG. 11 is a flowchart of the training process.
FIG. 12 is a flowchart of the preliminary process.
FIG. 13 is a flowchart of the main process.

A. Embodiment
A-1. Configuration of the data generation device
Next, an embodiment will be described based on an example. FIG. 1 is a block diagram showing the configuration of the data generation device 200 of this embodiment.

The data generation device 200 is a computer such as a personal computer or a smartphone. The data generation device 200 includes a CPU 210 serving as the controller of the data generation device 200, a volatile storage device 220 such as a RAM, a nonvolatile storage device 230 such as a hard disk drive or a flash memory, a display unit 240 such as a liquid crystal display, an operation unit 250 such as a keyboard and a mouse, and a communication interface (IF) 270. The communication interface 270 is an interface for connecting to an external device (for example, a printer 300). The communication interface 270 includes, for example, a wired or wireless interface for connecting to a network NW to which the printer 300 is connected.

The volatile storage device 220 provides a buffer area that temporarily stores various intermediate data generated when the CPU 210 performs processing. The nonvolatile storage device 230 stores a computer program PGg and an input data group IG that includes a plurality of pieces of input data ID, described later. The volatile storage device 220 and the nonvolatile storage device 230 are internal memories of the data generation device 200.

The computer program PGg is provided, for example, by being downloaded from a server operated by the manufacturer of the printer 300. Alternatively, the computer program PGg may be provided stored on a DVD-ROM or the like. The CPU 210 executes the image generation process, described later, by executing the computer program PGg.

The computer program PGg includes, as a module, a generation network program GNM, which is a computer program that causes the CPU 210 to implement the functions of a generation network (generator) GN, described later.

The printer 300 is an inkjet or electrophotographic printing device, and prints an image on a print medium such as paper using ink or toner as a printing material.

A-2. Image generation process
FIG. 2 is a flowchart of the image generation process. This image generation process is started in the data generation device 200, for example, in response to a start instruction from the user.

In S10, the CPU 210 acquires one piece of input data ID. This input data ID is, for example, one piece of data selected, based on the user's designation, from the input data group IG stored in the nonvolatile storage device 230. The input data ID is image data representing an input image II.

FIG. 3 is a diagram showing an example of input images and output images. The input image II of this embodiment is an image showing a specific character in a first type of typeface (font). The input image II includes (H×W) pixels arranged in a matrix in the vertical and horizontal directions. H is the number of pixels in the vertical direction of the input image II, and W is the number of pixels in the horizontal direction of the input image II. In this embodiment, H = W = 256. For example, the input image IIa in FIG. 3(A) is an image showing the character "A" in the first type of typeface. The input image IIb in FIG. 3(B) is an image showing the character "B" in the first type of typeface.

In this embodiment, the input data ID is bitmap data representing an image that includes a plurality of pixels; specifically, it is RGB image data that represents the color of each pixel by an RGB value. An RGB value is a color value in the RGB color system that includes gradation values of three color components (hereinafter also called component values), namely an R value, a G value, and a B value. The R value, G value, and B value are, for example, gradation values with a predetermined number of gradations (for example, 256).

In S20, the CPU 210 inputs the input data ID into the generation network GN to generate output data OD corresponding to the input data ID. The output data OD is image data representing an output image OI and, like the input data ID, is RGB image data. The output image OI of this embodiment is an image showing the same character as the corresponding input image II in a second type of typeface that differs from the first type of typeface. For example, the output image OIa in FIG. 3(A) is the output image corresponding to the input image IIa, and shows the character "A" in the second type of typeface. The output image OIb in FIG. 3(B) is the output image corresponding to the input image IIb, and shows the character "B" in the second type of typeface. In this embodiment, the size of the output image OI is the same as the size of the input image II. The specific configuration of the generation network GN will be described later.

In S30, the CPU 210 outputs the generated output data OD. For example, the CPU 210 uses the output data OD to generate print data representing the output image OI and sends the print data to the printer 300. The printer 300 prints the output image OI on a print medium using the print data. Alternatively, the CPU 210 uses the output data OD to display the output image OI on the display unit 240.

A-3. Configuration of Generation Network GN
FIG. 4 is a block diagram showing the configuration of the generation network GN. The generation network GN includes an encoder EC and a decoder DC.

A-3-1. Configuration of Encoder EC
The encoder EC executes dimension reduction processing on the input data ID using a plurality of calculation parameters Pe, and generates feature data CD representing the features of the input data ID (that is, the features of the input image II). In this embodiment, the input data ID contains three component values (R value, G value, B value) for each of (256×256) pixels, so it is data containing (256×256×3) values, that is, (256×256×3)-dimensional data. The feature data CD in this embodiment is data containing (1×1×512) values, that is, 512-dimensional data. The dimension reduction processing thus reduces the number of dimensions of the input data ID.

FIG. 5 is a block diagram showing the configuration of the encoder EC. The encoder EC is a neural network having an input layer EL_0 and a plurality of convolutional layers EL_1 to EL_8.

The input layer EL_0 is the layer to which the input data ID is input. The input data ID received by the input layer EL_0 is passed unchanged to the first convolutional layer EL_1. The convolutional layer EL_1 executes the arithmetic processing described later on the (256×256×3)-dimensional input data ID and generates (A_1×B_1×C_1)-dimensional data (A_1, B_1, and C_1 are positive integers).

The k-th convolutional layer EL_k (k is an integer from 2 to 8) receives (A_(k-1)×B_(k-1)×C_(k-1))-dimensional processed data, which is obtained by applying predetermined post-processing (described later) to the (A_(k-1)×B_(k-1)×C_(k-1))-dimensional data generated by the (k-1)-th convolutional layer EL_(k-1). The convolutional layer EL_k executes the arithmetic processing described later on this processed data and generates (A_k×B_k×C_k)-dimensional data (A_k, B_k, and C_k are positive integers).

The arithmetic processing executed by each of the convolutional layers EL_1 to EL_8 includes convolution processing and bias addition processing. The convolution processing sequentially applies s filters of (p×q×r) dimensions to the input data and calculates correlation values indicating the correlation between the input data and each filter. In applying a filter, a plurality of correlation values are calculated one after another while the filter is slid across the data. One filter contains (p×q×r) weights. The bias addition processing adds, to each calculated correlation value, a bias prepared one per filter. The (p×q×r×s) weights contained in the s filters and the s biases corresponding to the s filters are the plurality of calculation parameters Pe mentioned above, and they are updated in the training process described later.
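The convolution and bias addition described above can be sketched in a few lines. This is a minimal single-channel, single-filter illustration in plain Python; the function name `conv2d_single`, the 2×2 filter, the stride of 2, and the absence of padding are illustrative assumptions and are not taken from this specification.

```python
def conv2d_single(image, kernel, bias, stride):
    """Slide one 2-D filter over a 2-D input (no padding) and add one bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Correlation value: sum of element-wise products under the filter.
            corr = sum(
                image[i * stride + u][j * stride + v] * kernel[u][v]
                for u in range(kh)
                for v in range(kw)
            )
            row.append(corr + bias)  # one bias is shared by the whole filter
        out.append(row)
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]  # hypothetical 2x2 filter
result = conv2d_single(image, kernel, bias=1, stride=2)
print(result)  # [[8, 12], [24, 28]]
```

With s filters, the same loop would run once per filter and produce s output channels, which is where the (p×q×r×s) weights and s biases come from.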

Each value in the data generated by each of the convolutional layers EL_1 to EL_8 is the correlation value described above with the bias added. The number of values contained in the data generated by each convolutional layer (for example, (A_1×B_1×C_1) for the convolutional layer EL_1) is determined by the stride of the convolution processing (the amount by which the filter is slid) and the number s of filters.

As the post-processing mentioned above, each value of the data generated by the convolutional layer EL_1 is input to an activation function and converted. In this embodiment, the so-called LeakyReLU (Leaky Rectified Linear Unit) is used as the activation function.

As the post-processing mentioned above, each value of the data generated by the convolutional layers EL_2 to EL_8 is converted by batch normalization and then further input to the activation function and converted. In the training process described later, batch normalization calculates the mean and variance of each value over the set (batch) of input data being used and normalizes each value. In the image generation process, each value is normalized using moving averages of the means and variances that were calculated for each batch during the training process.

The processed data obtained by applying the post-processing described above to the data generated by the convolutional layer EL_8 is the feature data CD mentioned above.

In this embodiment, the numbers of dimensions (A_1×B_1×C_1) to (A_8×B_8×C_8) of the data generated by the convolutional layers EL_1 to EL_8 are as follows.
(A_1×B_1×C_1)=(128×128×64)
(A_2×B_2×C_2)=(64×64×128)
(A_3×B_3×C_3)=(32×32×256)
(A_4×B_4×C_4)=(16×16×512)
(A_5×B_5×C_5)=(8×8×512)
(A_6×B_6×C_6)=(4×4×512)
(A_7×B_7×C_7)=(2×2×512)
(A_8×B_8×C_8)=(1×1×512)
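The list above can be reproduced with the usual output-size formula for a strided convolution. The filter counts are read off the list itself, but the 4×4 filter, stride of 2, and padding of 1 are assumptions (they match the encoder-decoder of Non-Patent Document 1 and halve the size at every layer); the specification does not state them.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Number of filters s for EL_1..EL_8, read off the dimension list above.
ENCODER_FILTERS = [64, 128, 256, 512, 512, 512, 512, 512]


def encoder_shapes(height=256, width=256):
    """Return the (A_k, B_k, C_k) triple for each convolutional layer."""
    shapes = []
    for s in ENCODER_FILTERS:
        height, width = conv_out(height), conv_out(width)
        shapes.append((height, width, s))
    return shapes


shapes = encoder_shapes()
for k, shape in enumerate(shapes, start=1):
    print(f"EL_{k}: {shape}")
```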

A-3-2. Configuration of Decoder DC
The decoder DC executes dimension restoration processing on the feature data CD generated by the encoder EC, using a plurality of calculation parameters Pd, and generates the output data OD described above. As stated above, the feature data CD in this embodiment is data containing (1×1×512) values, that is, 512-dimensional data. The output data OD, like the input data ID, is data containing (256×256×3) values, that is, (256×256×3)-dimensional data. The dimension restoration processing of this embodiment thus restores the number of dimensions from that of the feature data CD.

FIG. 6 is a block diagram showing the configuration of the decoder DC. The decoder DC is a neural network having a plurality of transposed convolutional layers DL_1 to DL_8.

The feature data CD is input to the first transposed convolutional layer DL_1. The transposed convolutional layer DL_1 executes the arithmetic processing described later on the feature data CD and generates (D_1×E_1×F_1)-dimensional data (D_1, E_1, and F_1 are positive integers).

The m-th transposed convolutional layer DL_m (m is an integer from 2 to 8) receives (D_(m-1)×E_(m-1)×F_(m-1))-dimensional processed data, which is obtained by applying predetermined post-processing (described later) to the (D_(m-1)×E_(m-1)×F_(m-1))-dimensional data generated by the (m-1)-th transposed convolutional layer DL_(m-1). In addition, the m-th transposed convolutional layer DL_m receives (A_(9-m)×B_(9-m)×C_(9-m))-dimensional processed data, which is obtained by applying the post-processing described above to the data generated by the (9-m)-th convolutional layer EL_(9-m) of the encoder EC. For example, as shown in FIGS. 5 and 6, the transposed convolutional layer DL_2 of the decoder DC receives processed data based on the data generated by EL_7 of the encoder EC, and the transposed convolutional layer DL_4 receives processed data based on the data generated by EL_5 of the encoder EC. The m-th transposed convolutional layer DL_m (m is an integer from 2 to 8) therefore receives processed data of {(D_(m-1)×E_(m-1)×F_(m-1))+(A_(9-m)×B_(9-m)×C_(9-m))} dimensions. The transposed convolutional layer DL_m executes the arithmetic processing described later on the input processed data and generates (D_m×E_m×F_m)-dimensional data (D_m, E_m, and F_m are positive integers).

The arithmetic processing executed by each of the transposed convolutional layers DL_1 to DL_8 includes transposed convolution processing and bias addition processing. The transposed convolution processing first increases the number of dimensions of the input data by inserting values (for example, zeros) as appropriate for the stride, and then performs a convolution operation using (p×q×r)-dimensional filters in the same manner as the convolution processing described above. The bias addition processing adds, to each correlation value calculated by the transposed convolution operation, a bias prepared one per filter. The (p×q×r×s) weights contained in the s filters and the s biases corresponding to the s filters are the plurality of calculation parameters Pd mentioned above, and they are updated in the training process described later.
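The "insert zeros, then convolve" view of transposed convolution can be checked on a one-dimensional example. The sketch below is an illustration under assumptions: the filter length of 4, stride of 2, and padding of 1 are not stated in this specification, and the function names are hypothetical. The second function is the equivalent scatter-style definition, included only to confirm that both give the same result.

```python
def transposed_conv1d(x, kernel, stride, padding):
    """Transposed convolution via zero insertion followed by a stride-1 convolution."""
    k = len(kernel)
    # 1) Insert (stride - 1) zeros between neighbouring input values.
    dilated = []
    for idx, value in enumerate(x):
        dilated.append(value)
        if idx < len(x) - 1:
            dilated.extend([0] * (stride - 1))
    # 2) Pad both ends with (k - 1 - padding) zeros.
    edge = k - 1 - padding
    padded = [0] * edge + dilated + [0] * edge
    # 3) Slide the flipped filter over the enlarged data with stride 1.
    flipped = kernel[::-1]
    return [
        sum(padded[i + j] * flipped[j] for j in range(k))
        for i in range(len(padded) - k + 1)
    ]


def transposed_conv1d_scatter(x, kernel, stride, padding):
    """Reference definition: each input value scatters a scaled copy of the filter."""
    k = len(kernel)
    full = [0] * ((len(x) - 1) * stride + k)
    for idx, value in enumerate(x):
        for j in range(k):
            full[idx * stride + j] += value * kernel[j]
    return full[padding:len(full) - padding]


x = [1, 2, 3]
kernel = [1, 2, 3, 4]
upsampled = transposed_conv1d(x, kernel, stride=2, padding=1)
print(upsampled)  # [2, 5, 8, 9, 14, 9]
```

With these assumed settings the output length is (n-1)·2 - 2·1 + 4 = 2n, which is the doubling seen between successive decoder layers.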

Each value in the data generated by each of the transposed convolutional layers DL_1 to DL_8 is the correlation value described above with the bias added. The number of values contained in the data generated by each transposed convolutional layer (for example, (D_1×E_1×F_1) for the transposed convolutional layer DL_1) is determined by the stride of the transposed convolution processing (the amount of values such as zeros that are inserted) and the number s of filters.

As the post-processing mentioned above, each value of the data generated by the transposed convolutional layers DL_1 to DL_3 is converted by the batch normalization described above. In the training process, each value converted by batch normalization is then, as further post-processing, converted by dropout and afterwards input to the activation function and converted. Dropout is processing that invalidates (sets to 0) a randomly selected portion of the values in order to suppress overfitting. The so-called ReLU (Rectified Linear Unit) is used as the activation function. In the image generation process, dropout is not performed, and each value converted by batch normalization is input directly to the activation function and converted.
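The difference between training and image generation for DL_1 to DL_3 can be sketched as below. The dropout rate of 0.5 and the function names are assumptions made for illustration; the specification does not give a rate. The sketch only zeroes values, as the text describes; many frameworks additionally rescale the surviving values, which is not shown here.

```python
import random


def relu(x):
    """Rectified Linear Unit."""
    return x if x > 0 else 0.0


def dropout(values, rate, rng):
    """Set each value to 0 independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in values]


def post_process_dl1_to_dl3(normalized, training, rate=0.5, rng=None):
    """Post-processing applied after batch normalization for DL_1..DL_3."""
    if training:
        # Dropout is applied only during the training process.
        normalized = dropout(normalized, rate, rng or random.Random())
    return [relu(v) for v in normalized]


values = [-1.0, 0.5, 2.0]  # values already converted by batch normalization
generated = post_process_dl1_to_dl3(values, training=False)
trained = post_process_dl1_to_dl3(values, training=True, rng=random.Random(0))
```

For DL_4 to DL_7 the same function with `training=False` describes both modes, since those layers never apply dropout.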

As the post-processing mentioned above, each value of the data generated by the transposed convolutional layers DL_4 to DL_7 is converted by the batch normalization described above and then input to the activation function and converted. In the post-processing for the transposed convolutional layers DL_4 to DL_7, dropout is performed in neither the training process nor the image generation process.

The (D_8×E_8×F_8)-dimensional data generated by the transposed convolutional layer DL_8 is the output data OD described above. The number of dimensions (D_8×E_8×F_8) of the data generated by the transposed convolutional layer DL_8 is therefore equal to the number of dimensions (256×256×3) of the output data OD.

In this embodiment, the numbers of dimensions (D_1×E_1×F_1) to (D_8×E_8×F_8) of the data generated by the transposed convolutional layers DL_1 to DL_8 are as follows.
(D_1×E_1×F_1)=(2×2×512)
(D_2×E_2×F_2)=(4×4×512)
(D_3×E_3×F_3)=(8×8×512)
(D_4×E_4×F_4)=(16×16×512)
(D_5×E_5×F_5)=(32×32×256)
(D_6×E_6×F_6)=(64×64×128)
(D_7×E_7×F_7)=(128×128×64)
(D_8×E_8×F_8)=(256×256×3)
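Combining this list with the encoder dimensions shows what each transposed convolutional layer receives. The sketch below assumes that the two inputs of DL_m are joined along the channel axis, which is possible because their vertical and horizontal sizes match; the specification only states the combined number of dimensions, so the concatenation axis and the function name are assumptions.

```python
# Output dimensions of EL_1..EL_8 and DL_1..DL_8 as listed in this embodiment.
ENCODER_OUT = [(128, 128, 64), (64, 64, 128), (32, 32, 256), (16, 16, 512),
               (8, 8, 512), (4, 4, 512), (2, 2, 512), (1, 1, 512)]
DECODER_OUT = [(2, 2, 512), (4, 4, 512), (8, 8, 512), (16, 16, 512),
               (32, 32, 256), (64, 64, 128), (128, 128, 64), (256, 256, 3)]


def decoder_input_shape(m):
    """Shape fed to DL_m (1-based), assuming channel-wise concatenation."""
    if m == 1:
        return ENCODER_OUT[7]          # feature data CD from EL_8
    prev = DECODER_OUT[m - 2]          # processed output of DL_(m-1)
    skip = ENCODER_OUT[(9 - m) - 1]    # processed output of EL_(9-m)
    assert prev[:2] == skip[:2], "spatial sizes must match to concatenate"
    return (prev[0], prev[1], prev[2] + skip[2])


inputs = {m: decoder_input_shape(m) for m in range(1, 9)}
for m, shape in inputs.items():
    print(f"DL_{m} input: {shape}")
```

For example, DL_2 receives 2×2×512 values from DL_1 and 2×2×512 values from EL_7, that is, 2×2×1024 values in total.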

A-4. Training of Generation Network GN
The generation network GN described above has been trained so that it can generate the desired output data OD when the input data ID is input. The training of the generation network GN is described below.

A-4-1. Configuration of Training Device
FIG. 7 is a block diagram showing the configuration of a training device 100 that executes the training of the generation network GN of this embodiment.

The training device 100 is a computer such as a personal computer or a server. The training device 100 includes a CPU 110 serving as the controller of the training device 100, a volatile storage device 120 such as a RAM, a non-volatile storage device 130 such as a hard disk drive or flash memory, a display unit 140 such as a liquid crystal display, an operation unit 150 such as a keyboard and mouse, and a communication interface (IF) 170 for connecting to an external device (for example, the printer 300).

The volatile storage device 120 provides a buffer area that temporarily stores various intermediate data generated when the CPU 110 performs processing. The non-volatile storage device 130 stores a computer program PGt, an input data group IG containing a plurality of input data ID for training, a real data group RG containing a plurality of real data RD, and teacher data LD. The volatile storage device 120 and the non-volatile storage device 130 are internal memories of the training device 100.

The computer program PGt is provided, for example, by download from a server operated by the manufacturer of the printer 300. Alternatively, the computer program PGt may be provided stored on a DVD-ROM or the like. The CPU 110 executes the training process described later by executing the computer program PGt.

The input data ID used in the training process is RGB image data of the same kind as the input data ID used in the image generation process described above. As described above, the input image II represented by the input data ID is an image showing a specific character in the first typeface (font). The real data RD is RGB image data of the same kind as the output data OD generated in the image generation process described above. The plurality of real data RD correspond one-to-one to the plurality of input data ID contained in the input data group IG. In this embodiment, the real image RI represented by the real data RD is an image showing, in the second typeface, the same character as the corresponding input image II.

FIG. 8 is a diagram showing an example of input images and real images. The input image IIc in FIG. 8(A) is an image showing the character "C" in the first typeface. The real image RIc corresponding to the input image IIc in FIG. 8(A) is an image showing the character "C" in the second typeface. The input image IId in FIG. 8(B) is an image showing the character "d" in the first typeface. The real image RId corresponding to the input image IId in FIG. 8(B) is an image showing the character "d" in the second typeface.

A predetermined number of pairs of input data ID and real data RD, for example 1000 pairs, are prepared by the operator who carries out the training.

The teacher data LD is data indicating the target values of the identification data DD that the identification network DN, described later, should output. The teacher data LD is prepared by the operator who carries out the training. The teacher data LD is described further below.

A-4-2. Configuration of Network System
FIG. 9 is a conceptual diagram of a network system 1000 of this embodiment. The network system 1000 is a system used to train the generation network GN and includes, in addition to the generation network GN described above, an identification network (discriminator) DN. The generation network GN and the identification network DN together constitute a so-called generative adversarial network (GAN).

A pair of data consisting of input data ID and the real data RD corresponding to that input data ID (also called a real data pair Pr) is input to the identification network DN. A pair of data consisting of input data ID and the output data OD corresponding to that input data ID (also called a fake data pair Pf) is also input to the identification network DN. The output data OD corresponding to input data ID means the output data OD generated by inputting that input data ID to the generation network GN. In a generative adversarial network, the output data OD is also called "fake data OD". The fake data OD generated by inputting, to the generation network GN, the input data ID that corresponds to a particular real data RD is also called the fake data OD corresponding to that real data RD.

When either a real data pair Pr or a fake data pair Pf is input, the identification network DN identifies whether the input data pair is real or fake. That is, the identification network DN executes arithmetic processing using a plurality of calculation parameters Pdn on the input data pair and outputs identification data DD indicating the result of identifying whether the input data pair is a real data pair or a fake data pair.

In this embodiment, the identification data DD is (30×30×1)-dimensional data containing (30×30×1) values. The identification network DN is trained so that it outputs identification data DD indicating a real data pair Pr when a real data pair Pr is input, and outputs identification data DD indicating a fake data pair Pf when a fake data pair Pf is input.

For this purpose, two kinds of teacher data LD (FIG. 7) are prepared in this embodiment: teacher data LDr corresponding to the real data pair Pr, and teacher data LDf corresponding to the fake data pair Pf. The teacher data LDr indicates that the input data pair is a real data pair Pr and is (30×30×1)-dimensional data in which every value is "1". The teacher data LDf indicates that the input data pair is a fake data pair Pf and is (30×30×1)-dimensional data in which every value is "0".

Suppose the identification data DD were one-dimensional data. In the course of reducing the number of dimensions down to one within the identification network DN, information indicating global features of the images represented by the input real data or fake data could be lost, in which case the training of the identification network DN might stop progressing. In addition, the number of layers of the identification network DN, and hence the number of calculation parameters, would become excessively large, and the time required for training would become excessively long. In this embodiment, making the identification data DD (30×30×1)-dimensional data suppresses the loss of information needed to identify whether an input data pair is real or fake, and enables fast training.

FIG. 10 is a block diagram showing the configuration of the identification network DN. The identification network DN is a neural network having an input layer L_0 and a plurality of convolutional layers L_1 to L_5.

The input layer L_0 is the layer to which either a fake data pair Pf or a real data pair Pr is input. The data pair received by the input layer L_0 is passed unchanged to the first convolutional layer L_1. The input data ID in a data pair has (256×256×3) dimensions, and the real data RD or fake data OD in a data pair also has (256×256×3) dimensions, so a data pair has (256×256×6) dimensions. The convolutional layer L_1 executes the arithmetic processing described later on the (256×256×6)-dimensional data pair and generates (G_1×H_1×I_1)-dimensional data (G_1, H_1, and I_1 are positive integers).

The n-th convolutional layer L_n (n is an integer from 2 to 5) receives (G_(n-1)×H_(n-1)×I_(n-1))-dimensional processed data, which is obtained by applying predetermined post-processing (described later) to the (G_(n-1)×H_(n-1)×I_(n-1))-dimensional data generated by the (n-1)-th convolutional layer L_(n-1). The convolutional layer L_n executes the arithmetic processing described later on this processed data and generates (G_n×H_n×I_n)-dimensional data (G_n, H_n, and I_n are positive integers).

The arithmetic processing executed by each of the convolutional layers L_1 to L_5 includes convolution processing and bias addition processing, in the same way as the arithmetic processing of the convolutional layers of the encoder EC. The weights contained in the filters used in the convolution processing and the bias corresponding to each filter are the plurality of calculation parameters Pdn mentioned above, and they are updated in the training process described later. In the convolution processing, zero padding that supplements the data is performed as appropriate in order to adjust the number of dimensions of the generated data.

As the post-processing mentioned above, each value of the data generated by the convolutional layer L_1 is input to an activation function and converted. In this embodiment, the so-called LeakyReLU (Leaky Rectified Linear Unit) is used as the activation function.

As the post-processing mentioned above, each value of the data generated by the convolutional layers L_2 to L_4 is converted by batch normalization and then further input to the activation function and converted. Batch normalization is as explained above in the description of the encoder EC.

The (G_5×H_5×I_5)-dimensional data generated by the convolutional layer L_5 is the identification data DD described above. The number of dimensions (G_5×H_5×I_5) of the data generated by the convolutional layer L_5 is therefore equal to the number of dimensions (30×30×1) of the identification data DD.

In this embodiment, the numbers of dimensions (G_1×H_1×I_1) to (G_5×H_5×I_5) of the data generated by the convolutional layers L_1 to L_5 are as follows.
(G_1×H_1×I_1)=(128×128×64)
(G_2×H_2×I_2)=(64×64×128)
(G_3×H_3×I_3)=(32×32×256)
(G_4×H_4×I_4)=(31×31×512)
(G_5×H_5×I_5)=(30×30×1)
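These dimensions are consistent with three stride-2 convolutions followed by two stride-1 convolutions with zero padding. The sketch below reproduces the list under the assumption of 4×4 filters and a padding of 1 on every layer; the filter counts come from the list, while the filter size, strides, and padding are not stated in this specification.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# (number of filters, stride) for L_1..L_5; kernel 4 and padding 1 are assumed.
DISCRIMINATOR_LAYERS = [(64, 2), (128, 2), (256, 2), (512, 1), (1, 1)]


def discriminator_shapes(size=256):
    """Return the (G_n, H_n, I_n) triple for each convolutional layer."""
    shapes = []
    for filters, stride in DISCRIMINATOR_LAYERS:
        size = conv_out(size, kernel=4, stride=stride, padding=1)
        shapes.append((size, size, filters))
    return shapes


shapes = discriminator_shapes()
for n, shape in enumerate(shapes, start=1):
    print(f"L_{n}: {shape}")
```

Under these assumptions the stride-1 layers shrink each side by exactly one pixel (32 to 31, then 31 to 30), which yields the (30×30×1) identification data DD.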

In this embodiment, an error value Egan between the identification data DD and the teacher data LD is used to train the identification network DN and the generation network GN. A sigmoid cross-entropy error is used as the error value Egan. For example, Egan is calculated using the following equation (1). The error value Egan becomes larger as the identification data DD approaches the corresponding teacher data LD; in other words, it becomes larger as the difference between the identification data DD and the corresponding teacher data LD becomes smaller.
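Equation (1) is not reproduced in this text. One form consistent with the description, namely a sigmoid cross-entropy over p values that grows as DD approaches LD, is sketched below using the symbols p, a_i, and t_i defined in the next paragraph. The 1/p normalization and the sign convention are assumptions chosen to agree with the surrounding statements, not the patent's confirmed formula.

```latex
E_{gan} = \frac{1}{p} \sum_{i=1}^{p} \left[ t_i \log a_i + (1 - t_i) \log (1 - a_i) \right] \tag{1}
```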

Here, p is the number of dimensions of the identification data DD and the teacher data LD (in this embodiment, p=(30×30×1)). a_i is the value obtained by inputting each value of the identification data DD to a sigmoid function to normalize it, and t_i is the value (0 or 1, as described above) of the teacher data LD corresponding to the value a_i.

In this embodiment, two further error values, E1 and E2, are used in addition to the error value Egan to train the generation network GN. The error value E1 indicates the error between the feature data CD of the input data ID described above and the feature data CDf of the fake data OD corresponding to that input data ID. The feature data CDf of the fake data OD is generated by inputting, to the encoder EC of the generation network GN, the fake data OD that was generated by inputting the input data ID to the generation network GN. That is, the feature data CDf is generated by executing, on the fake data OD, the same operations that are executed on the input data ID when generating the feature data CD. The mean squared error (MSE) is used for the error value E1. For example, the error value E1 is calculated using the following equation (2).
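Equation (2) is not reproduced in this text. Given that E1 is described as a mean squared error over q values, the standard form using the symbols q, b_i, and c_i defined in the next paragraph would be as follows; this is a reconstruction from the description, not a copy of the original formula.

```latex
E_1 = \frac{1}{q} \sum_{i=1}^{q} \left( b_i - c_i \right)^2 \tag{2}
```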

Here, q is the number of dimensions of the feature data CD and CDf (in this embodiment, q = 1×1×512). b_i is each value of the feature data CD, and c_i is the value of the feature data CDf corresponding to the value b_i. The error value E1 decreases as the feature data CDf approaches the feature data CD. In other words, the error value E1 decreases as the difference between the feature data CDf and the feature data CD decreases.
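The mean squared error named for E1 can be sketched as follows, using the standard MSE definition with the symbols b_i, c_i, and q given above. Representing the feature data as flat lists of floats and the function name `e1` are assumptions made for illustration.

```python
from typing import Sequence


def e1(cd: Sequence[float], cdf: Sequence[float]) -> float:
    """Mean squared error between feature data CD (b_i) and CDf (c_i)."""
    if len(cd) != len(cdf):
        raise ValueError("CD and CDf must have the same number of dimensions")
    q = len(cd)
    return sum((b - c) ** 2 for b, c in zip(cd, cdf)) / q
```

In the embodiment q would be 512; the sketch works for any common length.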

The error value E2 indicates the error between the real data RD and the fake data OD corresponding to that real data RD. For example, a mean absolute error (MAE) is used for the error value E2, and the error value E2 is calculated using equation (3) below.

Here, s is the number of dimensions of the real data RD and the fake data OD (in this embodiment, s = 256×256×3). d_i is each value of the real data RD, and e_i is the value of the output data (fake data) OD corresponding to the value d_i. As can be seen from equation (3), the error value E2 is the per-pixel error between the image represented by the real data RD and the image represented by the fake data OD. The error value E2 decreases as the fake data OD approaches the real data RD. In other words, the error value E2 decreases as the difference between the fake data OD and the real data RD decreases.
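The mean absolute error named for E2 can be sketched in the same way, using the standard MAE definition with the symbols d_i, e_i, and s given above. Flattening the image data into lists of floats and the function name `e2` are assumptions made for illustration.

```python
from typing import Sequence


def e2(rd: Sequence[float], od: Sequence[float]) -> float:
    """Mean absolute error between real data RD (d_i) and fake data OD (e_i)."""
    if len(rd) != len(od):
        raise ValueError("RD and OD must have the same number of dimensions")
    s = len(rd)
    return sum(abs(d - e) for d, e in zip(rd, od)) / s
```

For a 256×256 RGB image, s would be 196,608 component values.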

The identification network DN is trained so that the error value Egan increases. In other words, the identification network DN is trained to correctly identify whether an input data pair is real or fake.

The generation network GN is trained so that the error value Egan decreases. At the same time, the generation network GN is trained so that the error values E1 and E2 decrease. In other words, the generation network GN is trained so that the identification network DN mistakenly identifies the fake data pair Pf as a real data pair Pr, so that the feature data CDf approaches the feature data CD, and so that the fake data OD approaches the real data RD.

A-4-3. Training Process
A specific training process is described below. The training process adjusts the plurality of calculation parameters Pe and Pd of the generation network GN and the plurality of calculation parameters Pdn of the identification network DN, both described above, so that the generation network GN is trained to output appropriate output data (fake data) OD. The generation network GN incorporated in the computer program PGt stored in the nonvolatile storage device 230 of the data generation device 200 described above is a trained model obtained through this training process.

FIG. 11 is a flowchart of the training process. In S100, the CPU 110 executes pre-processing. The pre-processing adjusts the plurality of calculation parameters Pe and Pd of the generation network GN using only the generation network GN, without using the identification network DN. In GANs, the task of the generation network GN (generating image data) is heavier than the task of the identification network DN (identifying image data). It is therefore preferable to train the generation network GN to some extent in advance.

In S200, the CPU 210 executes main processing. The main processing uses both the identification network DN and the generation network GN to adjust the plurality of calculation parameters Pdn of the identification network DN and the plurality of calculation parameters Pe and Pd of the generation network GN. When the main processing is complete, the training of the generation network GN is complete, and the trained model of the generation network GN is finished.

A-4-3-1. Pre-processing
FIG. 12 is a flowchart of the pre-processing. In S110, the plurality of calculation parameters Pe and Pd of the generation network GN and the plurality of calculation parameters Pdn of the identification network DN are initialized. For example, the initial values of the calculation parameters Pe, Pd, and Pdn are set to random numbers drawn independently from the same distribution (e.g., a normal distribution).
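The initialization in S110 can be sketched as below. The function name `init_parameters`, the parameter counts, and the mean and standard deviation of the normal distribution are hypothetical values chosen only for illustration; the specification states only that the values are drawn independently from one distribution.

```python
import random
from typing import List


def init_parameters(count: int, rng: random.Random,
                    mean: float = 0.0, std: float = 0.02) -> List[float]:
    """Draw `count` initial values independently from one normal distribution."""
    return [rng.gauss(mean, std) for _ in range(count)]


rng = random.Random(0)                 # fixed seed only to make the sketch repeatable
params_pe = init_parameters(8, rng)    # stand-in for the encoder parameters Pe
params_pd = init_parameters(8, rng)    # stand-in for the decoder parameters Pd
params_pdn = init_parameters(8, rng)   # stand-in for the parameters Pdn of DN
```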

In S120, the CPU 210 selects one batch of real data pairs Pr, for example V pairs (V is an integer of 2 or more, e.g., V = 100), from the plurality of real data pairs Pr (e.g., 1000 pairs) stored in the nonvolatile storage device 230. The real data pairs Pr stored in the nonvolatile storage device 230 are divided in advance into a plurality of groups (batches), each containing V real data pairs Pr. The CPU 210 selects the V real data pairs Pr to be used by selecting one of these groups at a time, in order. Alternatively, the V real data pairs Pr may be selected at random each time from the plurality of real data pairs Pr stored in the nonvolatile storage device 230.
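The two selection strategies described for S120 can be sketched as follows. The function names `split_into_batches` and `sample_batch` are hypothetical, and integers stand in for the stored real data pairs Pr.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(pairs: Sequence[T], v: int) -> List[List[T]]:
    """Divide the stored pairs into consecutive groups of V pairs each."""
    if v < 2:
        raise ValueError("V must be an integer of 2 or more")
    return [list(pairs[i:i + v]) for i in range(0, len(pairs), v)]


def sample_batch(pairs: Sequence[T], v: int, rng: random.Random) -> List[T]:
    """Alternative: pick V distinct pairs at random on every iteration."""
    return rng.sample(list(pairs), v)


stored_pairs = list(range(1000))                  # placeholders for 1000 pairs Pr
batches = split_into_batches(stored_pairs, 100)   # groups selected one at a time
random_batch = sample_batch(stored_pairs, 100, random.Random(0))
```

With 1000 stored pairs and V = 100, the first strategy yields 10 groups, so ten iterations pass over every stored pair once.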

In S130, the CPU 210 inputs each of the V pieces of input data ID included in the selected V real data pairs Pr into the encoder EC of the generation network GN to generate the feature data CD of the V pieces of input data ID.

In S140, the CPU 210 inputs each of the V pieces of feature data CD into the decoder DC of the generation network GN to generate V pieces of fake data OD corresponding to the V pieces of real data RD and input data ID.

In S150, the CPU 210 calculates, for each of the V pieces of real data RD, the error value E2 between the real data RD and the fake data OD corresponding to that real data RD.

In S160, the CPU 210 inputs each of the V pieces of fake data OD into the encoder EC of the generation network GN to generate the feature data CDf of the V pieces of fake data OD.

In S170, the CPU 210 calculates, for each piece of feature data CD of the V pieces of input data ID, the error value E1 between the feature data CD and the corresponding feature data CDf.

In S180, the CPU 210 adjusts the plurality of calculation parameters Pe and Pd of the generation network GN using the V error values E1 and the V error values E2. Specifically, the CPU 210 calculates an index value using the loss function Lpre shown in equation (4) below. The CPU 210 adjusts the calculation parameters Pe and Pd according to a predetermined algorithm so that the index value calculated using the loss function Lpre decreases. The predetermined algorithm is, for example, an algorithm that uses error backpropagation and gradient descent.

Here, L1 is the average of the V error values E1, L2 is the average of the V error values E2, and α is a predetermined coefficient.
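Steps S130 to S180 can be sketched as one function that returns the index value for a batch. Because equation (4) is not reproduced here, the sketch assumes Lpre = L1 + α·L2; which term α actually multiplies is an assumption. The callables `encode` and `decode` are hypothetical stand-ins for the encoder EC and the decoder DC, and the parameter update itself (backpropagation and gradient descent) is left out.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Pair = Tuple[Vec, Vec]  # (input data ID, real data RD)


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def mae(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def pre_processing_index(batch: Sequence[Pair],
                         encode: Callable[[Vec], Vec],
                         decode: Callable[[Vec], Vec],
                         alpha: float) -> float:
    """Index value for one batch, assuming Lpre = L1 + alpha * L2."""
    e1_values: List[float] = []
    e2_values: List[float] = []
    for input_data, real_data in batch:
        cd = encode(input_data)                # S130: feature data CD
        od = decode(cd)                        # S140: fake data OD
        e2_values.append(mae(real_data, od))   # S150: error value E2
        cdf = encode(od)                       # S160: feature data CDf
        e1_values.append(mse(cd, cdf))         # S170: error value E1
    l1 = sum(e1_values) / len(e1_values)       # average of the V values E1
    l2 = sum(e2_values) / len(e2_values)       # average of the V values E2
    return l1 + alpha * l2                     # value to be decreased in S180
```

Note that the identification network DN does not appear anywhere in this computation, which is what distinguishes the pre-processing from the main processing.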

In S190, the CPU 210 determines whether the training is complete. In this embodiment, the CPU 210 determines that the training is complete when a completion instruction is input by the operator, and that it is not complete when an instruction to continue training is input. For example, the CPU 210 inputs a plurality of pieces of test input data ID, separate from the input data ID used for training, into the generation network GN to generate a plurality of pieces of fake data OD. The CPU 210 displays the images represented by the generated fake data OD on the display unit 240. The operator checks the displayed images to confirm whether images adequate for the pre-processing stage (for example, images in which the characters are recognizable to some degree) have been generated and, depending on the result, inputs a completion instruction or a continuation instruction via the operation unit 250. In a modification, the training may be determined to be complete when, for example, the processing of S120 to S180 has been repeated a predetermined number of times.

If it is determined that the training is not complete (S190: NO), the CPU 210 returns the process to S120. If it is determined that the training is complete (S190: YES), the CPU 210 ends the pre-processing.

A-4-3-2. Main Processing
FIG. 13 is a flowchart of the main processing. In S205, as in S120 of FIG. 12, the CPU 210 selects one batch of real data pairs Pr, for example V real data pairs Pr, from the plurality of real data pairs Pr stored in the nonvolatile storage device 230. Note that the batch size of the main processing, that is, the number of real data pairs Pr selected in this step, may differ from the batch size of the pre-processing.

In S210, as in S130 of FIG. 12, the CPU 210 inputs each of the V pieces of input data ID included in the selected V real data pairs Pr into the encoder EC of the generation network GN to generate the feature data CD of the V pieces of input data ID.

In S215, as in S140 of FIG. 12, the CPU 210 inputs each of the V pieces of feature data CD into the decoder DC of the generation network GN to generate V pieces of fake data OD corresponding to the V pieces of real data RD and input data ID.

In S220, the CPU 210 inputs the V real data pairs Pr into the identification network DN to generate V pieces of identification data DD corresponding to the V real data pairs Pr.

In S225, the CPU 210 inputs V fake data pairs Pf into the identification network DN to generate V pieces of identification data DD corresponding to the V fake data pairs Pf. Each of the V fake data pairs Pf is a pair consisting of one of the V pieces of input data ID included in the V real data pairs Pr and the fake data OD corresponding to that input data ID. The fake data OD corresponding to a piece of input data ID is the fake data OD generated by inputting that input data ID into the generation network GN in S210 and S215.

In S230, the CPU 210 calculates the error value Egan between the identification data DD and the teacher data LD for each of the V pieces of identification data DD generated for the real data pairs Pr in S220 and each of the V pieces of identification data DD generated for the fake data pairs Pf in S225.

In S235, the CPU 210 adjusts the plurality of calculation parameters Pdn of the identification network DN using the (2×V) calculated error values Egan. Specifically, the CPU 210 calculates an index value using the loss function Lgan shown in equation (5) below. The CPU 210 adjusts the calculation parameters Pdn according to a predetermined algorithm so that the index value calculated using the loss function Lgan increases. The predetermined algorithm is, for example, an algorithm that uses error backpropagation and gradient descent.

As shown in equation (5), the loss function Lgan is a function that gives the average of the (2×V) error values Egan.
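Written out from the description above, the average of the (2×V) error values takes the following form. The superscript index j, which numbers the V real data pairs and V fake data pairs of a batch, is notation introduced here for illustration and may differ from the notation of the original equation (5).

```latex
L_{gan} = \frac{1}{2V} \sum_{j=1}^{2V} E_{gan}^{(j)}
```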

In S240, as in S150 of FIG. 12, the CPU 210 calculates, for each of the V pieces of real data RD, the error value E2 between the real data RD and the fake data OD corresponding to that real data RD.

In S245, as in S160 of FIG. 12, the CPU 210 inputs each of the V pieces of fake data OD into the encoder EC of the generation network GN to generate the feature data CDf of the V pieces of fake data OD.

In S250, as in S170 of FIG. 12, the CPU 210 calculates, for each piece of feature data CD of the V pieces of input data ID, the error value E1 between the feature data CD and the corresponding feature data CDf.

In S255, the CPU 210 adjusts the plurality of calculation parameters Pe and Pd of the generation network GN using the (2×V) error values Egan, the V error values E1, and the V error values E2. Specifically, the CPU 210 calculates an index value using the loss function Lg shown in equation (6) below. The CPU 210 adjusts the calculation parameters Pe and Pd according to a predetermined algorithm so that the index value calculated using the loss function Lg decreases. The predetermined algorithm is, for example, an algorithm that uses error backpropagation and gradient descent.

Here, as described above, Lgan is the average of the (2×V) error values Egan, L1 is the average of the V error values E1, and L2 is the average of the V error values E2. β and γ are predetermined coefficients.
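The loss computations of S210 to S255 can be sketched for one batch as below. Several points are assumptions made for illustration: Egan is taken as the mean of t·log(a) + (1 − t)·log(1 − a), with teacher value 1 for real data pairs and 0 for fake data pairs; equation (6), which is not reproduced here, is taken as Lg = Lgan + β·L1 + γ·L2; and `encode`, `decode`, and `discriminate` are hypothetical stand-ins for the encoder EC, the decoder DC, and the identification network DN. The parameter updates themselves are omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Pair = Tuple[Vec, Vec]  # (input data ID, real data RD)


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def e_gan(dd: Sequence[float], label: float) -> float:
    """Assumed Egan: mean of t*log(a) + (1 - t)*log(1 - a), a = sigmoid(dd_i)."""
    terms = []
    for x in dd:
        a = 0.5 * (1.0 + math.tanh(0.5 * x))   # sigmoid, written to avoid overflow
        a = min(max(a, 1e-12), 1.0 - 1e-12)    # keep log() finite
        terms.append(label * math.log(a) + (1.0 - label) * math.log(1.0 - a))
    return mean(terms)


def main_processing_losses(batch: Sequence[Pair],
                           encode: Callable[[Vec], Vec],
                           decode: Callable[[Vec], Vec],
                           discriminate: Callable[[Vec, Vec], Vec],
                           beta: float, gamma: float) -> Tuple[float, float]:
    """Return (Lgan, Lg) for one batch under the assumptions stated above."""
    egan_values, e1_values, e2_values = [], [], []
    for input_data, real_data in batch:
        cd = encode(input_data)                                              # S210
        od = decode(cd)                                                      # S215
        egan_values.append(e_gan(discriminate(input_data, real_data), 1.0))  # S220, S230
        egan_values.append(e_gan(discriminate(input_data, od), 0.0))         # S225, S230
        e2_values.append(mean([abs(d - e) for d, e in zip(real_data, od)]))  # S240
        cdf = encode(od)                                                     # S245
        e1_values.append(mean([(b - c) ** 2 for b, c in zip(cd, cdf)]))      # S250
    l_gan = mean(egan_values)   # average of 2V values; DN is adjusted to increase it (S235)
    l_g = l_gan + beta * mean(e1_values) + gamma * mean(e2_values)  # GN decreases it (S255)
    return l_gan, l_g
```

The same quantity Lgan is pushed in opposite directions by the two networks, while the L1 and L2 terms affect only the generation network GN.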

In S260, the CPU 210 determines whether the training is complete. In this embodiment, the CPU 210 determines that the training is complete when a completion instruction is input by the operator, and that it is not complete when an instruction to continue training is input. For example, the CPU 210 inputs a plurality of pieces of test input data ID, separate from the input data ID used for training, into the generation network GN to generate a plurality of pieces of fake data OD. The CPU 210 displays the images represented by the generated fake data OD on the display unit 240. The operator checks the displayed images to confirm whether images adequate as final images (for example, images that accurately render a specific character in the second typeface) have been generated and, depending on the result, inputs a completion instruction or a continuation instruction via the operation unit 250. In a modification, the training may be determined to be complete when, for example, the processing of S205 to S255 has been repeated a predetermined number of times.

If it is determined that the training is not complete (S260: NO), the CPU 210 returns the process to S205. If it is determined that the training is complete (S260: YES), the CPU 210 ends the training process. When the training process described above ends, the generation network GN has become a trained model whose calculation parameters Pe and Pd have been adjusted. The training process can therefore be regarded as a process of generating (manufacturing) a trained model.

According to the embodiment described above, the main processing (FIG. 13) of the training process (FIG. 11) includes: a first step (S210 and S215 in FIG. 13) of inputting the input data ID into the generation network GN to output the fake data OD corresponding to the input data ID; a second step (S220 and S225 in FIG. 13) of inputting a plurality of data pairs, including the real data pairs Pr and the fake data pairs Pf, into the identification network DN to output a plurality of pieces of identification data DD corresponding to the plurality of data pairs; and a third step (S230 and S235 in FIG. 13) of using the identification data DD and the teacher data LD to adjust the plurality of calculation parameters Pdn of the identification network DN so that the difference between the identification data DD and the teacher data LD decreases (in this embodiment, so that the error value Egan increases).

The main processing further includes: a fourth step (S245 in FIG. 13) of performing dimension reduction processing by the encoder EC on the fake data OD to generate the feature data CDf; and a fifth step (S250 and S255 in FIG. 13) of using the identification data DD, the teacher data LD, the feature data CD, and the feature data CDf to adjust the plurality of calculation parameters Pe and Pd of the generation network GN so that the difference between the identification data DD and the teacher data LD increases (in this embodiment, so that the error value Egan decreases) and the difference between the feature data CD and the feature data CDf decreases (in this embodiment, so that the error value E1 decreases).

In the main processing, the generation network GN and the identification network DN are trained in parallel by repeating the first to fifth steps multiple times (S260 in FIG. 13). In this main processing, the plurality of calculation parameters Pe and Pd of the generation network GN are adjusted not only so that the difference between the identification data DD and the teacher data LD increases, but also, using the feature data CD and CDf, so that the difference between the feature data CD and CDf decreases. As a result, the calculation parameters Pe and Pd can be adjusted so that the features of the input data ID are appropriately reflected in the feature data CD. Therefore, a generation network GN trained by the above training process can generate appropriate fake data OD that reflects the features of the input data ID.

For example, suppose that the encoder EC appropriately maps the features of the input data ID (for example, the essential features of the character shown in the input image II that do not depend on its typeface) into the space of the feature data (also called the latent space), and that the decoder DC appropriately reproduces those features in the fake data OD. In that case, the difference between the feature data CD and the feature data CDf is expected to be sufficiently small. Training the generation network GN so that the difference between the feature data CD and the feature data CDf decreases therefore improves both the accuracy with which the encoder EC maps the features of the input data ID into the latent space and the accuracy with which the decoder DC reproduces those features in the fake data OD. In other words, the generation network GN can be trained so that, for example, it retains a specific attribute of the input data ID (e.g., which character is to be rendered) while converting another attribute that is to be converted (e.g., the typeface). In addition, the generation network GN can be trained so that it generates fake data OD with the target attribute (e.g., the typeface) converted not only for a limited set of input data ID but for diverse input data ID (e.g., input data ID representing various characters), which secures diversity. Furthermore, because training is kept from progressing in a direction in which the encoder EC maps incorrectly into the latent space, the training of the generation network GN can be stabilized.

Furthermore, in the main processing of this embodiment, the fifth step described above also uses the real data RD and the fake data OD corresponding to the same input data to adjust the plurality of calculation parameters Pe and Pd of the generation network GN so that the difference between the real data RD and the fake data OD decreases (in this embodiment, so that the error value E2 decreases) (S240 and S255 in FIG. 13). As a result, the generation network GN can be trained to generate the desired fake data OD. In other words, by training the generation network GN so that the difference between the real data RD and the fake data OD decreases, the generation network GN is trained not merely to generate fake data OD that causes the identification network DN to misjudge authenticity, but to generate fake data OD that is close to the real data RD. For example, in this embodiment, the generation network GN can be trained so that the typeface rendered in the real data RD is appropriately reproduced in the fake data OD.

Like the main processing (FIG. 13), the pre-processing (FIG. 12) of this embodiment includes a first step (S130 and S140 in FIG. 12) of inputting the input data ID into the generation network GN to output the fake data OD corresponding to the input data ID, and a fourth step (S160 in FIG. 12) of performing dimension reduction processing by the encoder EC on the fake data OD to generate the feature data CDf. The pre-processing further includes a sixth step (S180 in FIG. 12) of using the feature data CD and the feature data CDf, without using the identification data DD and the teacher data LD, to adjust the plurality of calculation parameters Pe and Pd of the generation network GN so that the difference between the feature data CD and the feature data CDf decreases (in this embodiment, so that the error value E1 decreases). In the pre-processing, the generation network GN is trained by repeating the first, fourth, and sixth steps multiple times (S190 in FIG. 12). The main processing described above is executed after the pre-processing (FIG. 11). In this way, the generation network GN is first trained so that the difference between the feature data CD and CDf decreases, and the generation network GN and the identification network DN are then trained in parallel. As a result, the pre-processing can ensure, to some extent, the accuracy with which the encoder EC maps the features of the input data ID into the latent space. The generation network GN can therefore be trained effectively so that the features of the input data ID are appropriately reflected in the first feature data (the feature data CD), and a generation network GN trained by the above training process can generate appropriate fake data OD that reflects the features of the input data ID.

As described above, in GANs the task of the generation network GN (generating image data) is heavier than the task of the identification network DN (identifying image data). In the main processing, the training of the generation network GN therefore tends to lag behind the training of the identification network DN. If the pre-processing were not executed, the training of the generation network GN could fall excessively behind the training of the identification network DN in the main processing. In that case, vanishing gradients may occur when the plurality of calculation parameters Pe and Pd of the generation network GN are adjusted, and the training of the generation network GN may stop progressing. According to this embodiment, the plurality of calculation parameters Pe and Pd of the generation network GN are adjusted in the pre-processing so that the difference between the feature data CD and the feature data CDf decreases, which suppresses this problem. This embodiment can therefore stabilize training and shorten training time.
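One common way to see why gradients can vanish is the following worked derivative. It rests on assumptions that the specification does not state explicitly: the contribution of a fake data pair to Egan is taken to be log(1 − a_i) with a_i = σ(x_i), where x_i is a raw value of the identification data DD and σ is the sigmoid function, and the teacher value for fake data pairs is taken to be 0.

```latex
\frac{\partial}{\partial x_i} \log\bigl(1 - \sigma(x_i)\bigr)
  = -\frac{\sigma(x_i)\,\bigl(1 - \sigma(x_i)\bigr)}{1 - \sigma(x_i)}
  = -\sigma(x_i),
\qquad
\sigma(x_i) \to 0 \quad \text{as} \quad x_i \to -\infty
```

Under these assumptions, when the identification network DN is far ahead and confidently rejects every fake data pair (a_i close to 0), the gradient passed back to the generation network GN through x_i is close to 0, so the parameters Pe and Pd barely change.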

Furthermore, in the pre-processing of this embodiment, the sixth step also uses the real data RD and the fake data OD corresponding to the same input data to adjust the plurality of calculation parameters Pe and Pd of the generation network GN so that the difference between the real data RD and the fake data OD decreases (in this embodiment, so that the error value E2 decreases) (S150 and S180 in FIG. 12). As a result, the plurality of calculation parameters Pe and Pd of the generation network GN can be adjusted before the main processing so that the desired fake data OD is generated, and the generation network GN can be trained efficiently to generate the desired fake data OD. This also further suppresses the problem of the training of the generation network GN lagging behind the training of the identification network DN.

Furthermore, in this embodiment, the input data ID is image data that has a first attribute (e.g., the attribute of being the character "A") and a second attribute (e.g., the attribute of being in a first typeface) and does not have a third attribute (e.g., the attribute of being in a second typeface). The output data (fake data) OD is image data that has the first attribute and the third attribute and does not have the second attribute. According to this embodiment, the trained generation network GN can therefore generate the output data (fake data) OD by converting the second attribute of the input data ID into the third attribute while maintaining the first attribute of the input data ID.

Furthermore, according to the training device 100 (FIG. 1) of this embodiment and the trained generation network GN, the generation network GN has been trained by the training process described above and can therefore generate appropriate fake data OD that reflects the features of the input data ID. In addition, it can generate fake data OD in which the attribute to be converted (for example, the font) has been converted, not only for a limited set of input data ID but also for a wide variety of input data ID (for example, input data ID representing various characters).

As can be understood from the above description, the real data pair Pr of this embodiment is an example of the first pair, and the fake data pair Pf is an example of the second pair. The feature data CD of the input data ID is an example of the first feature data, and the feature data CDf of the fake data OD is an example of the second feature data. The calculation parameters Pe and Pd of the generation network GN are examples of the first calculation parameters, and the calculation parameters Pdn of the identification network DN are examples of the second calculation parameters. The generation network GN is an example of the first machine learning model, and the identification network DN is an example of the second machine learning model.

B. Modifications:
(1) In the above embodiment, the error value E1 between the feature data CD and the feature data CDf is used in both the pre-processing and the main processing when training the generation network GN. This is not limiting, and the error value E1 may be used in only one of the pre-processing and the main processing. For example, in the above embodiment, the calculation parameters Pe and Pd of the generation network GN may be adjusted using the error value E1 in S255 of the main processing, while in S180 of the pre-processing they are adjusted using only the error value E2, without using the error value E1. Alternatively, the calculation parameters Pe and Pd of the generation network GN may be adjusted using only the error values Egan and E2, without using the error value E1, in S255 of the main processing, while in S180 of the pre-processing they are adjusted using the error value E1.

(2) In S255 of the main processing of the above embodiment, the generation network GN is trained, that is, the calculation parameters Pe and Pd are adjusted, using the error value Egan, the error value E1, and the error value E2. Instead, in S255 of the main processing, the calculation parameters Pe and Pd may be adjusted using only the error value Egan and the error value E1, without using the error value E2.

(3) In S180 of the pre-processing of the above embodiment, the generation network GN is trained, that is, the calculation parameters Pe and Pd are adjusted, using the error value E1 and the error value E2. Instead, in S180 of the pre-processing, the calculation parameters Pe and Pd may be adjusted using only the error value E1, without using the error value E2.
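Modifications (1) to (3) differ only in which error values feed the update of the generation network GN in the pre-processing (S180) and in the main processing (S255). The following Python sketch illustrates that selection. It assumes the selected error values are simply added with equal weight, which this passage does not state, and that `e_gan` is already expressed as a quantity the generation network tries to reduce. The function and flag names are hypothetical.

```python
def generator_loss(stage, e1, e2, e_gan=0.0, use_e1=True, use_e2=True):
    """Combine error values for one update of the generation network GN.

    stage  -- "pre" for the pre-processing (S180), "main" for the main processing (S255)
    e1     -- error between feature data CD and CDf
    e2     -- error between real data RD and fake data OD
    e_gan  -- adversarial error, used only in the main processing
    Assumption: the selected terms are summed with equal weight.
    """
    if stage not in ("pre", "main"):
        raise ValueError("stage must be 'pre' or 'main'")
    total = 0.0
    if stage == "main":
        total += e_gan  # the identification network is not involved in the pre-processing
    if use_e1:
        total += e1
    if use_e2:
        total += e2
    return total
```

With the defaults, the call corresponds to the embodiment (E1 and E2 in the pre-processing; Egan, E1, and E2 in the main processing). Passing `use_e2=False` with `stage="main"` corresponds to modification (2), and passing `use_e2=False` with `stage="pre"` corresponds to modification (3).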

(4) In the training process of the above embodiment, the pre-processing may be omitted, and the generation network GN may be trained by the main processing alone.
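The relationship between the two phases, including the option in (4) of omitting the pre-processing, can be sketched as a simple schedule. This is an illustration only: the iteration counts, the function name, and the order of the two updates inside one main-processing iteration are assumptions, not details taken from the embodiment.

```python
def training_schedule(pre_iterations, main_iterations, skip_pre=False):
    """Return the ordered list of (phase, action) updates.

    In the pre-processing only the generation network GN is updated.
    In the main processing both networks are updated in every iteration
    (the order within an iteration is an assumption of this sketch).
    """
    schedule = []
    if not skip_pre:
        for _ in range(pre_iterations):
            schedule.append(("pre", "update GN"))
    for _ in range(main_iterations):
        schedule.append(("main", "update DN"))
        schedule.append(("main", "update GN"))
    return schedule


if __name__ == "__main__":
    for phase, action in training_schedule(2, 1):
        print(phase, action)
```

Calling `training_schedule(n, m, skip_pre=True)` yields only main-processing updates, which corresponds to modification (4).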

(5) In the above embodiment, a sigmoid cross-entropy error is used for the error value Egan, a mean squared error is used for the error value E1, and a mean absolute error is used for the error value E2. Instead, other types of error values may be used for the error values Egan, E1, and E2. For example, a softmax cross-entropy error may be used for the error value Egan, a mean absolute error may be used for the error value E1, and a mean squared error may be used for the error value E2.
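For reference, the three error types named in (5) for the embodiment can be written compactly in plain Python. This is a generic sketch of the standard definitions, not the embodiment's implementation; the function names are hypothetical, and the sigmoid cross-entropy is assumed to take a raw score (logit) and a 0/1 target.

```python
import math


def sigmoid_cross_entropy(logit, target):
    """Binary cross-entropy of sigmoid(logit) against a 0.0/1.0 target.

    Uses the numerically stable form max(x, 0) - x*t + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_squared_error(a, b):
    """Average of squared element-wise differences between two sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_absolute_error(a, b):
    """Average of absolute element-wise differences between two sequences."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

Swapping which of the last two functions is applied to the feature data and which to the image data gives the alternative assignment mentioned in (5).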

(6) As described above, the generation network GN of the above embodiment is, in general terms, a model that converts image data that has a first attribute (for example, the attribute of being the character "A") and a second attribute (for example, the attribute of being in a first font) and does not have a third attribute (for example, the attribute of being in a second font) into image data that has the first attribute and the third attribute and does not have the second attribute. Such conversion is generally also called "style transfer" or "image-to-image translation". As another specific example, the generation network GN may be a model that converts a line drawing (an image drawn only with lines) showing a predetermined object (a still object such as a building or a shoe, or an animal such as a horse or a dog) into a color image showing the predetermined object. The generation network GN may also be, for example, a model that converts an aerial photograph showing a predetermined location into a map showing that location. In these cases as well, the generation network GN can be said to be a model that converts image data that has a first attribute (the attribute of showing a predetermined object, or of showing a predetermined location) and a second attribute (the attribute of being a line drawing, or of being an aerial photograph) and does not have a third attribute (the attribute of being a color image, or of being a map) into image data that has the first attribute and the third attribute and does not have the second attribute.

(7) In the above embodiment, both the input data ID and the fake data OD of the generation network GN are image data. This is not limiting, and either or both of the input data ID and the fake data OD may be data representing text or audio. For example, the generation network GN may be a model that converts image data representing a specific object into text or audio representing that object. Alternatively, the generation network GN may be a model that converts text representing a specific object into an image or audio representing that object.

(8) The specific configurations of the generation network GN and the identification network DN are not limited to those shown in FIGS. 5 and 6, and various other configurations may be used. For example, the number of convolution layers and transposed convolution layers in the generation network GN and the identification network DN may be changed as appropriate. The generation network GN and the identification network DN may also include pooling layers or fully connected layers in place of all or some of the convolution layers and transposed convolution layers. Various configurations may likewise be adopted for the post-processing performed on the values output from each layer. For example, any other activation function, such as PReLU, softmax, or sigmoid, may be used in the post-processing. Processing such as batch normalization and dropout may also be added to or omitted from the post-processing as appropriate. Furthermore, in the generation network GN of the above embodiment, the values output from the (9-m)-th convolution layer EL_(9-m) of the encoder EC are input to the m-th transposed convolution layer DL_m of the decoder DC, but this input may be omitted.

The identification network DN may also be a machine learning model other than a neural network that is trained using teacher data, for example, a support vector machine (SVM).
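The encoder-to-decoder input described in (8) pairs each transposed convolution layer DL_m with the convolution layer EL_(9-m) by index. The short Python sketch below shows only that index mapping. The function name, the `num_layers` parameter, and the generalization of `9 - m` to `num_layers + 1 - m` are illustrative assumptions, and this passage does not state for which values of m the connection exists.

```python
def skip_connection_source(m, num_layers=8):
    """Return the index k of the encoder layer EL_k whose output feeds DL_m.

    With the eight-layer encoder and decoder of the embodiment
    (EL_1 to EL_8, DL_1 to DL_8), k = 9 - m.
    """
    if not 1 <= m <= num_layers:
        raise ValueError("m must be between 1 and num_layers")
    return num_layers + 1 - m


if __name__ == "__main__":
    for m in range(1, 9):
        print(f"DL_{m} <- EL_{skip_connection_source(m)}")
```

The mapping is symmetric: early decoder layers are paired with late encoder layers and vice versa. Omitting the input, as (8) allows, would simply mean not using this pairing.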

(9) The hardware configurations of the data generation device 200 in FIG. 1 and the training device 100 in FIG. 7 are examples and are not limiting. For example, the processor of the data generation device 200 or the training device 100 is not limited to a CPU and may be a GPU (Graphics Processing Unit), an ASIC (Application Specific Integrated Circuit), or a combination of these with a CPU. The training device 100 and the data generation device 200 may also each be a plurality of computers that can communicate with each other via a network (for example, so-called cloud servers).

(10) In each of the above embodiments, part of the configuration realized by hardware may be replaced with software, and conversely, part or all of the configuration realized by software may be replaced with hardware. For example, the generation network GN and the identification network DN may be realized by a hardware circuit such as an ASIC (Application Specific Integrated Circuit) instead of by a program module.

The present invention has been described above based on the embodiment and the modifications. The embodiments described above are intended to facilitate understanding of the present invention and do not limit it. The present invention may be changed and improved without departing from its spirit and the scope of the claims, and the present invention includes equivalents thereof.

100…training device, 110…CPU, 120…volatile storage device, 130…non-volatile storage device, 140…display unit, 150…operation unit, 200…data generation device, 210…CPU, 220…volatile storage device, 230…non-volatile storage device, 240…display unit, 250…operation unit, 270…communication interface, 300…printer, 1000…network system, EL_0, L_0…input layer, EL_1 to EL_8, L_0 to L_5…convolution layer, DL_1 to DL_8…transposed convolution layer, EC…encoder, DC…decoder, ID…input data, OD…output data (fake data), CD, CDf…feature data, LD…teacher data, GN…generation network, DN…identification network, Pd, Pe, Pdn…calculation parameters, Pr…real data pair, PGg, PGt…computer program

Claims

1. A method for training a machine learning model, the method comprising:
a first step of inputting input data to a first machine learning model to output fake data corresponding to the input data, wherein the first machine learning model includes an encoder that performs dimension reduction processing on the input data to reduce the number of dimensions and generate first feature data, and a decoder that performs dimension restoration processing on the first feature data to restore the number of dimensions and generate the fake data, and the first machine learning model executes the dimension reduction processing and the dimension restoration processing using a plurality of first calculation parameters;
a second step of inputting a plurality of data pairs including a first pair and a second pair to a second machine learning model to output a plurality of pieces of identification data corresponding to the plurality of data pairs, wherein the first pair is a pair of data consisting of the input data and real data corresponding to the input data, the second pair is a pair of data consisting of the input data and the fake data corresponding to the input data, the identification data indicates a result of identifying whether the corresponding data pair is the first pair or the second pair, the second machine learning model generates the identification data by executing an operation using a plurality of second calculation parameters, and the task of the second machine learning model is lighter than the task of the first machine learning model;
a third step of adjusting the plurality of second calculation parameters, using the identification data and teacher data indicating a target value of the identification data, so that the difference between the identification data and the teacher data becomes small;
a fourth step of generating second feature data by causing the encoder to perform the dimension reduction processing on the fake data;
a fifth step of adjusting the plurality of first calculation parameters, using the identification data, the teacher data, the first feature data, and the second feature data, so that the difference between the identification data and the teacher data becomes large and the difference between the first feature data and the second feature data becomes small; and
a sixth step of adjusting the plurality of first calculation parameters, using the first feature data and the second feature data and without using the identification data and the teacher data, so that the difference between the first feature data and the second feature data becomes small,
wherein the first machine learning model is trained by repeating the first step, the fourth step, and the sixth step a plurality of times without executing the second step, the third step, and the fifth step, and thereafter the first machine learning model and the second machine learning model are trained in parallel by repeating the first to fifth steps a plurality of times.

2. A method for training a machine learning model, the method comprising:
a first step of inputting input data to a first machine learning model to output fake data corresponding to the input data, wherein the first machine learning model includes an encoder that performs dimension reduction processing on the input data to reduce the number of dimensions and generate first feature data, and a decoder that performs dimension restoration processing on the first feature data to restore the number of dimensions and generate the fake data, and the first machine learning model executes the dimension reduction processing and the dimension restoration processing using a plurality of first calculation parameters;
a second step of inputting a plurality of data pairs including a first pair and a second pair to a second machine learning model to output a plurality of pieces of identification data corresponding to the plurality of data pairs, wherein the first pair is a pair of data consisting of the input data and real data corresponding to the input data, the second pair is a pair of data consisting of the input data and the fake data corresponding to the input data, the identification data indicates a result of identifying whether the corresponding data pair is the first pair or the second pair, the second machine learning model generates the identification data by executing an operation using a plurality of second calculation parameters, and the task of the second machine learning model is lighter than the task of the first machine learning model;
a third step of adjusting the plurality of second calculation parameters, using the identification data and teacher data indicating a target value of the identification data, so that the difference between the identification data and the teacher data becomes small;
a fourth step of generating second feature data by causing the encoder to perform the dimension reduction processing on the fake data;
a fifth step of adjusting the plurality of first calculation parameters, using the identification data and the teacher data, so that the difference between the identification data and the teacher data becomes large; and
a sixth step of adjusting the plurality of first calculation parameters, using the first feature data and the second feature data and without using the identification data and the teacher data, so that the difference between the first feature data and the second feature data becomes small,
wherein the first machine learning model is trained by repeating the first step, the fourth step, and the sixth step a plurality of times without executing the second step, the third step, and the fifth step, and thereafter the first machine learning model and the second machine learning model are trained in parallel by repeating the first to fifth steps a plurality of times.

3. The method according to claim 1 or 2,
wherein, in the sixth step, the plurality of first calculation parameters are further adjusted, using the real data and the fake data corresponding to specific input data, so that the difference between the real data and the fake data becomes small.

4. The method according to any one of claims 1 to 3,
wherein, in the fifth step, the plurality of first calculation parameters are further adjusted, using the real data and the fake data corresponding to specific input data, so that the difference between the real data and the fake data becomes small.