JP7346767B1 - Learning device and inference device

Info

Publication number: JP7346767B1
Application number: JP2023118774A
Authority: JP
Inventors: 修二奥野
Original assignee: Individual
Current assignee: Individual
Priority date: 2023-07-21
Filing date: 2023-07-21
Publication date: 2023-09-19
Anticipated expiration: 2043-07-21

Abstract

[Problem] To provide a learning device that, in data processing using machine learning, can prepare higher-quality data even when training is performed on augmented original teacher data, and can thereby reliably suppress a decline in the learning efficiency of a machine learning model. [Solution] The learning device includes an input unit 110 that receives input of original teacher data; a data processing unit 113 that creates processed teacher data based on the original teacher data; a learning process execution unit 101 that uses a machine learning model 111 to be trained to execute a process of learning setting values of the machine learning model 111 based on teacher data including at least the processed teacher data; and a storage unit 12 that stores the setting values. With this configuration, the learning device can prepare higher-quality data even when training is performed on augmented original teacher data, and can suppress a decline in the learning efficiency of the machine learning model 111. [Selected drawing] FIG. 2

Description

The present invention relates to a learning device and an inference device that use a machine learning model, and in particular to a learning device and an inference device that use a machine learning model for image processing.

Conventionally, as a data processing method based on machine learning, a method is known in which a data set serving as teacher data is given to a computer program and the parameters of the program are learned, thereby generating a trained model capable of processing arbitrary data.

For example, in a machine learning process of the form "input image (teacher data) → learning program → output image (teacher data)", the parameters of the learning program are calculated so that the error between the input image and the output image is minimized, and a "trained model" is generated. Using this trained model, an output image can then be generated by inference from an input image: "input image (arbitrary data, e.g. a low-resolution image) → trained model → output image (inferred data, e.g. a high-resolution image)".

In recent years, machine learning using neural networks has been applied in many fields. In image recognition and speech recognition in particular, deep learning, which uses neural networks in a multilayer structure, achieves high recognition accuracy. Within multilayer deep learning, image processing is also performed using convolutional neural networks (CNNs), which apply convolutional layers and pooling layers multiple times to extract features of the input.

Examples of image processing using neural networks include a super-resolution device that increases the resolution of a signal (see, for example, Patent Document 1) and a diagnosis support device that makes it easy to grasp differences between diseased regions and thereby provides highly accurate diagnosis support (see, for example, Patent Document 2). A program, an image processing method, and an image processing device that avoid degradation caused by free deformation of digital images have also been disclosed (see, for example, Patent Document 3).

Patent Document 1: Japanese Patent Application Publication No. 2020-27557
Patent Document 2: Japanese Patent Application Publication No. 2018-38789
Patent Document 3: Japanese Patent No. 6570164

In machine learning, the quality and quantity of the teacher data translate directly into the performance of the machine learning model. Because preparing a large, high-quality training data set takes a great deal of effort, data augmentation is commonly used to obtain high performance from a small amount of training data.

For example, to learn efficiently from a small amount of teacher image data, edits such as enlargement/reduction, rotation, rhombic/trapezoidal deformation, shift, color shift, and sharpening/unsharpening are applied to the teacher data to augment it. Of these, enlargement/reduction, rotation, rhombic/trapezoidal deformation, and shift produce blur and aliasing when small image data is transformed as is, so augmentation always generates new data containing at least a certain amount of blur or noise. Conventional data augmentation thus discards the fine high-frequency data contained in a high-quality image, and training on augmented data whose image quality has degraded in this way can instead worsen the performance of the machine learning model.
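The loss of fine detail described above can be illustrated in one dimension. The following is a minimal sketch, not taken from the patent: the helper names `point_sample` and `box_average` are hypothetical, and a row of alternating pixels stands in for the high-frequency content of an image. Reducing the row by naive point sampling aliases the stripes into a solid color, while averaging blurs them into uniform grey; in both cases the original detail is gone.

```python
def point_sample(signal, factor):
    """Reduce by keeping every `factor`-th sample (no low-pass filtering)."""
    return signal[::factor]


def box_average(signal, factor):
    """Reduce by averaging each block of `factor` samples (a crude low-pass filter)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


# Fine stripes: alternating black (0) and white (1) pixels.
stripes = [i % 2 for i in range(12)]

aliased = point_sample(stripes, 2)   # only the black pixels are kept: stripes alias to solid black
averaged = box_average(stripes, 2)   # each black/white pair averages to mid-grey: stripes blur away
```

Neither result preserves the stripes, which is the kind of degradation the passage attributes to transforming small images directly.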

Moreover, none of Patent Documents 1 to 3 discloses preparing a large amount of high-quality training data from a small amount of training data.

The present invention has been made in view of the above problem, and its object is to provide a learning device that, in data processing using machine learning, can prepare higher-quality data even when training is performed on augmented original teacher data, and can reliably suppress a decline in the learning efficiency of the machine learning model. A further object is to provide an inference device that uses this machine learning model.

To achieve the above object, the present invention is a learning device that learns setting values of a machine learning model based on teacher data, comprising: an input unit that receives input of original teacher data; a data processing unit that creates processed teacher data based on the original teacher data; a machine learning model to be trained; a learning process execution unit that uses the machine learning model to execute a process of learning the setting values of the machine learning model to be trained, based on teacher data including at least the processed teacher data; and a storage unit that stores the setting values, wherein the data processing unit specifies a target resolution of the processed teacher data, obtains, based on the original teacher data, a provisional reference image whose size is a predetermined multiple of that of an image having the target resolution, accepts an edit of the provisional reference image including rotation, movement, or deformation, samples the edited provisional reference image at a sampling rate with a magnification higher than the predetermined multiple, and uses, as the processed teacher data, the sampled image converted to the size of an image having the target resolution.

In this learning device, it is preferable that the learning device further includes an outpainting model that automatically generates, in advance, the surroundings of the original teacher data to create larger teacher data, and that the data processing unit converts the teacher data created by the outpainting model into the processed teacher data.

In this learning device, it is preferable that the data processing unit determines, for the edited provisional reference image and according to the edit, an oversampling rate that is at least a magnification higher than the predetermined multiple, performs oversampling at the determined rate, and converts the image obtained by the oversampling into the processed teacher data.

In this learning device, it is preferable that the learning device further includes a super-resolution model that, when the resolution of the original teacher data is less than the predetermined multiple of the target resolution, creates a super-resolution image at the predetermined multiple based on the original teacher data, and that the data processing unit uses the super-resolution image created by the super-resolution model as the provisional reference image.

To achieve the above object, the present invention is also a learning device characterized in that at least part of a machine learning model whose setting values have been learned by the learning process execution unit of the above learning device is used as a loss function.

To achieve the above object, the present invention is also an inference device that uses a machine learning model to execute predetermined inference processing on target data, comprising: an input unit that receives input of the target data; a machine learning model to which the target data is input from the input unit; and an inference process execution unit that uses the machine learning model that executes inference processing to execute the predetermined inference processing on the target data, wherein the machine learning model is a machine learning model whose setting values have been learned by the learning process execution unit of the above learning device.

The present invention is also a computer program that causes a computer to operate as the learning device or the inference device described above.

To achieve the above object, the present invention is also a learning method for learning setting values of a machine learning model based on teacher data, including: an input step of receiving input of original teacher data; a data processing step of creating processed teacher data based on the original teacher data; a machine learning model to be trained; a learning process execution step of using the machine learning model to execute a process of learning the setting values of the machine learning model to be trained, based on teacher data including at least the processed teacher data; and a storage step of storing the setting values, wherein the data processing step specifies a target resolution of the processed teacher data, obtains, based on the original teacher data, a provisional reference image whose size is a predetermined multiple of that of an image having the target resolution, accepts an edit of the provisional reference image including rotation, movement, or deformation, samples the edited provisional reference image at a sampling rate with a magnification higher than the predetermined multiple, and uses, as the processed teacher data, the sampled image converted to the size of an image having the target resolution.

To achieve the above object, the present invention is also an inference method that uses a machine learning model to execute predetermined inference processing on target data, including: an input step of receiving input of the target data; a machine learning model to which the target data is input from the input step; and an inference process execution step of using the machine learning model that executes inference processing to execute the predetermined inference processing on the target data, wherein the machine learning model is a machine learning model whose setting values have been learned in the above learning process execution step.

The learning device according to the present invention includes an input unit that receives input of original teacher data; a data processing unit that creates processed teacher data based on the original teacher data; a learning process execution unit that uses a machine learning model to be trained to execute a process of learning the setting values of that machine learning model based on teacher data including at least the processed teacher data; and a storage unit that stores the setting values. The data processing unit (1) specifies a target resolution of the processed teacher data, (2) obtains, based on the original teacher data, a provisional reference image whose size is a predetermined multiple of that of an image having the target resolution, (3) accepts an edit of the provisional reference image including rotation, movement, or deformation, (4) samples the edited provisional reference image at a sampling rate with a magnification higher than the predetermined multiple, and (5) uses, as the processed teacher data, the sampled image converted to the size of an image having the target resolution. With this configuration, the learning device according to the present invention can prepare higher-quality data even when training is performed on augmented original teacher data in data processing using machine learning, and can reliably suppress a decline in the learning efficiency of the machine learning model.

FIG. 1 is a block diagram showing the configuration of an image processing device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of the image processing device.
FIG. 3 is a diagram showing an example of creating processed teacher data during the learning operation of the image processing device.
FIG. 4 is a diagram explaining data augmentation during the learning operation of the image processing device.
FIG. 5 is a conceptual diagram of a data set given during the learning operation of the image processing device.
FIG. 6(a) is a diagram explaining a loss function used during the learning operation in a modification of the image processing device, and FIG. 6(b) is a diagram showing an example of a composite image generated in the modification.

(Embodiment)
An image processing device according to an embodiment of the present invention is described with reference to FIGS. 1 to 5. In this embodiment, the image processing device functions as at least one of a learning device that learns setting values of a machine learning model based on predetermined training image data, and an inference device that uses a machine learning model to execute predetermined inference processing on target image data.

First, the processing units of the image processing device 1 are described with reference to FIG. 1. As shown in FIG. 1, the image processing device 1 includes a control unit 10, an image processing unit 11, a storage unit 12, a communication unit 13, a display unit 14, an operation unit 15, and a reading unit 16. Although the image processing device 1 and its operation are described below as a single server computer, the processing may be distributed across multiple computers.

The control unit 10 uses a processor such as a CPU and memory to control the components of the device and implement various functions. The image processing unit 11 uses a processor such as a GPU or a dedicated circuit and memory to execute image processing in response to control instructions from the control unit 10. The control unit 10 and the image processing unit 11 may be configured as a single piece of hardware (SoC: System on a Chip) that integrates processors such as a CPU and GPU, memory, and also the storage unit 12 and the communication unit 13.

The storage unit 12 uses a hard disk or flash memory. The storage unit 12 stores an image processing program 1P, a machine learning library 1L that provides the function of a machine learning model (for example, a CNN), a super-resolution model 1M, and an outpainting model 1N. The storage unit 12 also stores definition data that defines the machine learning model, parameters including the setting values of the trained machine learning model, and the like. Based on the super-resolution model 1M stored in the storage unit 12, the image processing unit 11 outputs a digital image whose resolution is higher than the original resolution. The outpainting model 1N and the super-resolution model 1M can be generated by a machine learning model.

The communication unit 13 is a communication module that provides a connection to a communication network such as the Internet. The communication unit 13 uses a network card, a wireless communication device, or a carrier communication module.

The display unit 14 uses a liquid crystal panel, an organic EL (electroluminescence) display, or the like. The display unit 14 can display images through processing performed by the image processing unit 11 under instructions from the control unit 10.

The operation unit 15 includes a user interface such as a keyboard or mouse. Physical buttons provided on the housing may be used, as may software buttons displayed on the display unit 14. The operation unit 15 notifies the control unit 10 of information on the user's operations. Specifically, when the original teacher data is edited, the operation unit 15 accepts from the user settings for various ways of transforming the teacher data, such as enlargement/reduction, rotation, horizontal shift, and changes to brightness, saturation, and contrast.

The reading unit 16 uses, for example, a disk drive, and can read an image processing program 2P, a machine learning library 2L, a super-resolution model 2M, and an outpainting model 2N stored on a recording medium 2 such as an optical disc. The image processing program 1P, machine learning library 1L, super-resolution model 1M, and outpainting model 1N stored in the storage unit 12 may be copies, made by the control unit 10 in the storage unit 12, of the image processing program 2P, machine learning library 2L, super-resolution model 2M, and outpainting model 2N read by the reading unit 16 from the recording medium 2.

Next, the image processing functions of the image processing device 1 are described with reference to FIG. 2. The control unit 10 of the image processing device 1 includes a learning process execution unit 101 and an inference process execution unit 102.
The learning process execution unit 101 functions as a machine learning model (machine learning engine) based on the machine learning library 1L, the definition data, and the parameter information stored in the storage unit 12. That is, the learning process execution unit 101 uses the machine learning model to be trained to execute a process of learning the setting values (parameters and the like) of that model, based on training image data or training text data. This learning of the setting values is a process that minimizes the difference between the input data and the answer data, for example by updating the parameters with mini-batch gradient descent. The learning process execution unit 101 also functions as an image processing execution unit that edits image data serving as input data, based on operations performed with the operation unit 15.
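As an illustration of the mini-batch gradient descent mentioned above, the following is a minimal sketch under stated assumptions, not the device's actual training procedure. It assumes a two-parameter linear model `w * x + b` in place of a CNN, a mean-squared-error loss between the model's prediction and the answer data, and hypothetical names and hyperparameters (`minibatch_gd`, `lr`, `batch_size`, `epochs`).

```python
import random


def minibatch_gd(xs, ys, lr=0.05, batch_size=4, epochs=500, seed=0):
    """Fit y = w * x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # the "setting values" being learned
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)             # new mini-batch split every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch-mean squared error with respect to w and b.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Toy teacher data: inputs and the corresponding answer data from y = 3x + 1.
xs = [i / 10 for i in range(20)]
ys = [3.0 * x + 1.0 for x in xs]
w, b = minibatch_gd(xs, ys)              # w and b approach 3 and 1
```

In the device described here, the learned values would then be stored in the storage unit 12; the sketch simply returns them.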

The inference process execution unit 102 performs image processing based on the image processing program 1P stored in the storage unit 12. That is, the inference process execution unit 102 uses the machine learning model to execute predetermined inference processing on input target data (target image data or inference-target text data). The inference process execution unit 102 also functions as an image processing execution unit that inputs image data serving as input data to the input unit 110, based on the user's operations with the operation unit 15.

The image processing unit 11 includes an input unit 110, a machine learning model 111, a generator 111a and a discriminator 111b serving as the machine learning model 111, an output unit 112, a processing unit 113, and a cropping unit 114. For example, a machine learning model such as a CNN is trained on the given teacher data, and the definition data and parameter information that function as the generator 111a, and those that function as the discriminator 111b, are stored in the storage unit 12, whereby the model is created.

Typical examples of training a machine learning model are classification (input-side teacher image data → machine learning model → output label ⇔ teacher label), image conversion (input-side teacher image data → machine learning model → output image data ⇔ output-side teacher image data), and image generation (input-side teacher text data → machine learning model → output image data ⇔ output-side teacher image data). Here, → denotes the flow of data and ⇔ denotes a comparison made during training.

The input unit 110 receives input of image data, classification labels, and text data for teaching (training), or of target image data or inference-target text data to be inferred (see 12a in FIG. 2). These data are recorded and held in the storage unit 12.

The data processing unit 113 performs processing for data augmentation of the teacher data. Specifically, the data processing unit 113 (1) specifies a target resolution of the processed (augmented) teacher data, (2) obtains, based on the original teacher data input to the input unit 110, a provisional reference image whose size is a predetermined multiple of that of an image having the target resolution, (3) accepts an edit of the provisional reference image including rotation, movement, or deformation, (4) samples the edited provisional reference image at a sampling rate with a magnification higher than the predetermined multiple, and (5) uses, as the processed teacher data, the sampled image converted to the size of an image having the target resolution.

For example, as shown in FIG. 3, (1) the target resolution of the processed teacher data 304 relative to the original teacher data 301 is specified (here, 1x). Next, (2) for the original teacher data 301 (here, 112 × 112 pixels), a provisional reference image 302 is prepared at a predetermined multiple of the target resolution (here, 2x, i.e. 224 × 224 pixels). The provisional reference image 302 may be obtained, for example, from image data stored in advance in the storage unit 12, or by having the image processing unit 11 generate, based on the super-resolution model 1M, a super-resolution image with a higher resolution than the original teacher data and store it temporarily in the storage unit 12 or the like.

Next, (3) an edit of the provisional reference image 302 (here, rotation) is accepted, and (4) the edited provisional reference image 303 is sampled at a sampling rate setting (here, 3x) higher than the predetermined multiple (here, 2x). Then, (5) the sampled image is reduced to the target resolution (here, the part of the image that protrudes from the region because of the rotation is cropped and the image is scaled by 1/3), and the result, converted to the size of an image having the target resolution, is used as the processed teacher data 304.
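Steps (1) to (5) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: images are square lists of rows of floats, bilinear upscaling stands in for the super-resolution model 1M (it adds no real detail), border pixels are repeated where the rotation leaves no source data, and the reduction is a simple box average. All function names are hypothetical.

```python
import math


def bilinear(img, x, y):
    """Sample img at real-valued (x, y), clamping coordinates to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rotate_resample(src, angle_deg, out_size):
    """Rotate src about its centre and resample it onto an out_size x out_size
    grid that covers the same field of view as src."""
    h, w = len(src), len(src[0])
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    out = []
    for j in range(out_size):
        row = []
        for i in range(out_size):
            # Output pixel centre in normalised coordinates, origin at image centre.
            u = (i + 0.5) / out_size - 0.5
            v = (j + 0.5) / out_size - 0.5
            # Inverse-rotate to find where this output pixel comes from in src.
            us, vs = c * u + s * v, -s * u + c * v
            row.append(bilinear(src, (us + 0.5) * w - 0.5, (vs + 0.5) * h - 0.5))
        out.append(row)
    return out


def box_downsample(img, factor):
    """Reduce a square image by averaging factor x factor blocks."""
    n = len(img) // factor
    return [[sum(img[j * factor + dj][i * factor + di]
                 for dj in range(factor) for di in range(factor)) / factor ** 2
             for i in range(n)] for j in range(n)]


def augment(original, angle_deg, ref_scale=2, sample_scale=3):
    target = len(original)                                     # (1) target resolution: 1x
    ref = rotate_resample(original, 0.0, target * ref_scale)   # (2) provisional reference image
    sampled = rotate_resample(ref, angle_deg,                  # (3) edit (rotation), sampled at
                              target * sample_scale)           # (4) the higher rate
    return box_downsample(sampled, sample_scale)               # (5) reduce to the target size


original = [[float((x + y) % 2) for x in range(8)] for y in range(8)]  # 8 x 8 checkerboard
augmented = augment(original, angle_deg=30.0)                          # 8 x 8 rotated result
```

With the sizes of FIG. 3, `original` would be 112 × 112, `ref` 224 × 224, and `sampled` 336 × 336 before the 1/3 reduction; the 8 × 8 input here only keeps the example fast.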

In this way, by preparing an image (the provisional reference image) larger than the target image (the image at the target resolution), for example twice its size, rotating or moving that image at a high sampling rate, and then reducing it to the target size, the image does not degrade.

In the description of this embodiment, the target resolution is the magnification from the original teacher data 301 to the processed teacher data 304. The control unit 10 may initially set the same resolution as the unprocessed digital image (the original) as the target resolution (1x). When the edit includes enlarging or reducing the data, the resolution of the image data after enlargement or reduction may be set as the target resolution (1 or more, or 1 or less). Furthermore, when the edit includes rotation or deformation, the sampling rate may be set according to the rotation angle.

The target resolution need not exactly match the resolution after processing; a nearby value may be set. In that case, the image degrades by the amount the resolution falls short. For example, if the processed teacher data is roughly the size of the original teacher data, the target resolution is 1x, the provisional reference image is 2x, and the oversampling is 3x. If the processed teacher data is to be made slightly larger than the original teacher data, the target resolution is 1.5x, the provisional reference image is 3x, and the oversampling is 5x. These values may be set per image or per pixel.
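The two numeric examples above can be written down as a small configuration object. The sketch below assumes that all three factors are expressed relative to the original teacher data; the text does not state the reference point explicitly, and the class name `ScalePlan` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScalePlan:
    target: float      # target resolution (1.0 = same size as the original)
    reference: float   # provisional reference image
    oversample: float  # sampling grid used while editing

    def __post_init__(self):
        # Assumed ordering: target < reference < oversample.
        if not (self.target < self.reference < self.oversample):
            raise ValueError("expected target < reference < oversample")

    @property
    def reduction(self):
        """Factor applied in the final reduction to the target size."""
        return self.target / self.oversample

    def sizes(self, width, height):
        """Pixel sizes of each stage for an original of width x height."""
        factors = (("target", self.target),
                   ("reference", self.reference),
                   ("oversample", self.oversample))
        return {name: (round(width * f), round(height * f)) for name, f in factors}

same_size = ScalePlan(target=1.0, reference=2.0, oversample=3.0)
slightly_larger = ScalePlan(target=1.5, reference=3.0, oversample=5.0)
```

Under this reading, the first plan reduces the sampled image by 1/3 and the second by 0.3.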

The data processing unit 113 may determine, according to the edit, an oversampling rate higher than the predetermined multiple for the edited provisional reference image, perform oversampling at the determined rate, and convert the image obtained by the oversampling into processed teacher data. The settings for this data processing may be recorded and held in the storage unit 12.

In the example of FIG. 3, one would nominally prepare an image twice the target size and reduce it to 1/2. When the edit is a rotation, however, the effective sampling rate differs between the horizontal/vertical directions and the diagonal directions: in the horizontal and vertical directions the number of samples falls short of the sampling rate, and in the diagonal directions it exceeds the sampling rate. As a result, aliasing noise occurs, particularly in the diagonal directions. This noise can be suppressed by performing the rotation at a raised sampling rate and then reducing the image to the target size. For example, an image twice the target size is rotated while being enlarged a further 1.5 times, and is then reduced to 1/3 (3x oversampling).
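One way to choose a sampling rate from the rotation angle is sketched below. The rule is an assumption made for illustration and is not given in the text: it scales the reference factor by |cos θ| + |sin θ|, the amount by which the bounding box of a rotated square grows, rounds up, and always stays at least one step above the reference factor. The function name is hypothetical.

```python
import math

def oversampling_factor(reference_factor, angle_deg):
    """Heuristic (assumed) integer sampling factor for a rotation by angle_deg."""
    a = math.radians(angle_deg)
    growth = abs(math.cos(a)) + abs(math.sin(a))   # 1.0 at 0 deg, sqrt(2) at 45 deg
    needed = math.ceil(reference_factor * growth - 1e-9)
    # The sampling rate must exceed the predetermined multiple.
    return max(needed, reference_factor + 1)
```

With a 2x reference image this rule gives 3x at 45 degrees, and with a 3x reference image it gives 5x, which agrees numerically with the examples in the text, although the text does not say how those figures were derived.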

In the case of image deformation, each pixel may grow or shrink, so the adjustment may be made per pixel so that each pixel has a provisional reference image larger than the target resolution and a sampling rate higher still. Alternatively, without considering pixels individually, a provisional reference image of sufficiently high resolution and a sufficiently high sampling rate may be set uniformly so that every pixel satisfies the condition. For example, a high-resolution provisional reference image is set to accommodate the enlarged parts, and a high sampling rate is set to accommodate the reduced or rotated parts.
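The uniform setting described above needs an estimate of how strongly a deformation enlarges or reduces the image locally. A minimal sketch, assuming the deformation is available as a function `warp(x, y)` returning the transformed coordinates (a hypothetical interface), estimates the local stretch from the singular values of a finite-difference Jacobian:

```python
import math

def local_scales(warp, x, y, h=1e-3):
    """Return (largest, smallest) local stretch of warp at (x, y)."""
    x0, y0 = warp(x, y)
    xa, ya = warp(x + h, y)
    xb, yb = warp(x, y + h)
    a, c = (xa - x0) / h, (ya - y0) / h   # partial derivatives with respect to x
    b, d = (xb - x0) / h, (yb - y0) / h   # partial derivatives with respect to y
    s = a * a + b * b + c * c + d * d     # trace of J^T J
    det = a * d - b * c                   # determinant of J
    disc = math.sqrt(max(s * s - 4 * det * det, 0.0))
    return math.sqrt((s + disc) / 2), math.sqrt(max((s - disc) / 2, 0.0))

def uniform_factors(warp, width, height, step=8):
    """Scan a coarse grid and return (max enlargement, max reduction)."""
    hi, lo = 1.0, 1.0
    for y in range(0, height, step):
        for x in range(0, width, step):
            big, small = local_scales(warp, x, y)
            hi, lo = max(hi, big), min(lo, small)
    if lo <= 0.0:
        raise ValueError("warp collapses the image somewhere")
    return hi, 1.0 / lo
```

Following the paragraph above, the first value would inform the resolution of the provisional reference image and the second the sampling rate; how the two numbers are turned into concrete factors is left open here.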

Regarding editing, the control unit 10 of the image processing device 1 edits image data based on the image processing program 1P stored in the storage unit 12. In particular, the control unit 10 accepts user operations via the operation unit 15 based on the image processing program 1P and applies the edits to the image data. Editing here means applying, via the operation unit 15, processing such as enlargement/reduction, rotation, rhombic or trapezoidal deformation, shift, color shift, and sharpening/unsharpening to the provisional reference image in order to augment the training data.

Next, data augmentation is described with reference to FIGS. 4 and 5. As shown in FIG. 4, the data processing unit 113 applies various kinds of processing to the original teacher data 401 to generate processed teacher data 402, which increases the apparent number of teacher data items used during learning. Using the data processing unit 113 according to this embodiment effectively suppresses degradation of the processed teacher data 402 and improves both the quantity and the quality of the augmented teacher data, and as a result the performance of the learning model can be improved dramatically.

For example, as shown in FIG. 5 (II), the original teacher data has conventionally been edited directly for augmentation, and even when the image stays the same size, a rotation or a shift by a fractional number of pixels degrades the augmented input-side teacher image data 503 and output-side teacher image data 504. This is because an image contains parts, such as boundaries, that change abruptly and do not behave like a waveform, so waveform-style processing such as image deformation loses the signal in those parts. As a result, even if the teacher data is augmented, the input-side teacher image data 503 and the output-side teacher image data 504 are themselves degraded, and the performance of the learning model suffers.
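The loss at an abrupt boundary can be seen in a one-dimensional toy case. The sketch below shifts a step edge by half a sample with linear interpolation and then shifts it back; linear interpolation is chosen here only as a simple stand-in for direct editing at the original resolution, and the function name is hypothetical.

```python
def shift_linear(signal, delta):
    """Shift a 1-D signal by delta samples using linear interpolation
    (values beyond the ends are clamped to the edge sample)."""
    n = len(signal)
    out = []
    for i in range(n):
        x = min(max(i - delta, 0.0), n - 1.0)
        x0 = int(x)
        x1 = min(x0 + 1, n - 1)
        f = x - x0
        out.append(signal[x0] * (1 - f) + signal[x1] * f)
    return out

edge = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
round_trip = shift_linear(shift_linear(edge, 0.5), -0.5)
print(round_trip)  # [0.0, 0.0, 0.25, 0.75, 1.0, 1.0]
```

The net shift is zero, yet the sharp 0-to-1 transition has been smeared over several samples, which is the kind of degradation attributed above to editing the teacher data directly.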

In contrast, as shown in FIG. 5 (I), using the data processing unit 113 according to this embodiment maintains the quality of the augmented input-side teacher image data 501 and output-side teacher image data 502, so that high performance of the machine learning model can be achieved.

When the image processing device 1 functions as an inference device, it is an inference device that performs predetermined inference processing on target data using a machine learning model. It comprises an input unit 110 that accepts input of the target data, a machine learning model 111 to which the target data is input from the input unit 110, and an inference processing execution unit 102 that performs the predetermined inference processing on the target data using the machine learning model 111. The machine learning model 111 is one whose setting values have been learned by the learning processing execution unit 101 of the learning device described above.

When used as a trained model, the machine learning model 111 performs data optimization processing (for example, resolution enhancement, classification, noise removal, or image generation) based on the already learned parameters. When the machine learning model 111 used for inference is a CNN, it may include multiple stages of convolutional and pooling layers defined by definition data, together with a fully connected layer; it extracts feature values from the image data and performs image processing based on the extracted feature values.

Specifically, at inference time, inference-target image data or inference-target text data is input to the machine learning model 111, and post-inference image data is obtained as the output from the output unit 112. The image data here is image data represented in YCbCr or RGB. The output is not limited to image data; in the case of classification, the identified class is output. The output unit 112 outputs the image data or the classification class to the storage unit 12. The output data may also be rendered as an image by the image processing unit 11 and output to the display unit 14.

Next, outpainting of image data in the learning device according to this embodiment is described. Conventionally, image data augmentation basically produces more image data of the same size, so when an image is scaled or rotated, the cropping unit 114 crops it or fills the resulting holes to return it to the original size. In conventional augmentation, holes opened by rotating or shrinking the image were filled by computation on the values of surrounding pixels, for example with a nearby color, to reduce the unnaturalness of the image and limit the loss of image quality.

In contrast, the image processing unit 11 according to this embodiment uses the outpainting model 1N stored in the storage unit 12 to automatically generate, in advance, the surroundings of the original teacher data and thereby create larger teacher data, and the data processing unit 113 converts the teacher data created by the outpainting model 1N into processed teacher data. With the conventional method, holes in the processed image data are filled with neighboring data or with black, which produces unnatural images. The image processing device 1 according to this embodiment instead uses the outpainting model 1N to generate a naturally continuous image around the teacher image data beforehand, enlarging it so that no holes arise. Combining this with the augmentation method of the data processing unit 113 described above yields processed teacher data of higher quality.
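How much surrounding image must be generated so that a rotation leaves no holes follows from plain geometry. The sketch below covers only a rotation about the image centre (no shift, scaling, or deformation) and says nothing about the outpainting model 1N itself; the function name is hypothetical.

```python
import math

def outpaint_margin(width, height, angle_deg):
    """Pixels to add on each side (horizontal, vertical) so that a
    width x height window stays fully covered after rotating the
    source image about its centre by angle_deg."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    # Bounding box of the output window mapped back into the source image.
    need_w = width * c + height * s
    need_h = width * s + height * c
    margin_x = max(0, math.ceil((need_w - width) / 2 - 1e-9))
    margin_y = max(0, math.ceil((need_h - height) / 2 - 1e-9))
    return margin_x, margin_y
```

For a 100 x 100 image rotated by 45 degrees this gives a margin of 21 pixels on every side; a 100 x 50 image rotated by 90 degrees needs 25 extra pixels above and below and none at the sides.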

As described above, when the image processing device 1 functions as a learning device that learns setting values in the machine learning model 111 based on teacher data, the learning device comprises an input unit 110 that accepts input of the original teacher data, a data processing unit 113 that creates processed teacher data based on the original teacher data, a learning processing execution unit 101 that, using the machine learning model 111 to be trained, executes processing to learn the setting values of that model based on teacher data including at least the processed teacher data, and a storage unit 12 that stores the setting values. The data processing unit 113 (1) specifies the target resolution of the processed teacher data, (2) obtains, based on the original teacher data, a provisional reference image that is a predetermined multiple of the size of an image having the target resolution, (3) accepts an edit to the provisional reference image including rotation, movement, or deformation, (4) samples the edited provisional reference image at a sampling rate higher than the predetermined multiple, and (5) uses, as the processed teacher data, the sampled image converted to the size of an image having the target resolution. With this configuration, even when learning is performed with augmented teacher data in data processing by machine learning, the learning device can prepare higher-quality data and reliably suppress a drop in the learning efficiency of the machine learning model.

That is, when the image processing device 1 functions as a learning device, the data processing unit 113, in augmenting the original teacher data, performs the deformation and other edits on a sufficiently large provisional reference image, obtained for example by super-resolution, and obtains the processed teacher data after oversampling at a large factor. This allows the processed teacher data to keep suitable image quality after deformation, and improves the performance of the machine learning model trained on that data. In addition, when the image processing device 1 functions as an inference device using this machine learning model, it can generate or classify post-inference image data with higher accuracy.

(Modification)
A modification of the image processing device 1 according to the embodiment of the present invention is described with reference to FIG. 6. In this modification, the machine learning model 111 trained by the learning processing execution unit 101 is used as a loss function, for example a content loss or a style loss. For example, as shown in FIG. 6(a), style transfer is a known CNN-based technique that takes two inputs, a content image 601 that determines structure and a style image 602 that determines artistic style, and outputs a composite image combining the structure of the former with the style of the latter.

In this style transfer, the content image 601 and the style image 602 are first input to the left and right pre-trained VGG (Visual Geometry Group) networks, respectively, vectors representing the character of each image are extracted from intermediate feature maps, and a loss function 603 is defined from them. Optimization then proceeds so as to minimize that loss. If the pre-trained VGG used here, which is one kind of machine learning model, has been trained on undegraded images generated by the data processing unit 113 of the above embodiment as the content image 601 and the style image 602, then when the optimization completes one can expect a composite image 604 combining content and style, as illustrated in FIG. 6(b), that captures high-frequency features (that is, has high resolution).
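The text names a content loss and a style loss but does not give their formulas. The sketch below uses the formulation commonly associated with style transfer, which is an assumption here: the content loss is the mean squared difference between feature maps, and the style loss is the mean squared difference between their Gram matrices. Feature maps are represented as a list of channels, each a flat list of activations; in practice they would come from intermediate layers of the pre-trained network. Normalisation constants and the weights `alpha` and `beta` vary between implementations.

```python
def gram(features):
    """Gram matrix of C channels, each a flat list of N activations."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def _flatten(rows):
    return [v for row in rows for v in row]

def _mse(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def content_loss(generated, content):
    """Penalises differences in structure, position by position."""
    return _mse(_flatten(generated), _flatten(content))

def style_loss(generated, style):
    """Penalises differences in channel correlations, ignoring position."""
    return _mse(_flatten(gram(generated)), _flatten(gram(style)))

def total_loss(generated, content, style, alpha=1.0, beta=1.0):
    return alpha * content_loss(generated, content) + beta * style_loss(generated, style)
```

Because the Gram matrix discards spatial position, two feature maps that differ only by a rearrangement of positions have zero style loss but non-zero content loss, which is why the two terms can separate structure from style.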

Of the hardware configuration of the image processing device 1 according to this embodiment, the communication unit 13, the display unit 14, and the reading unit 16 are not essential. The communication unit 13 may, for example, be used once to obtain the image processing program 1P and the machine learning library 1L stored in the storage unit 12 from an external server device and not be used afterwards. Likewise, the reading unit 16 may not be used after the image processing program 1P and the machine learning library 1L have been read from a storage medium. The communication unit 13 and the reading unit 16 may be the same device using serial communication such as USB.

The image processing device 1 may also be configured as a Web server that provides the functions of the machine learning model 111 described above to a Web client device having a display unit and a communication unit. In this case, the communication unit 13 is used to receive requests from the Web client device and to send back the processing results.

For the error used during learning, an appropriate function such as squared error, absolute error, or cross-entropy error should be chosen according to the input and output data and the purpose of learning. For example, cross-entropy error is used when the output is a classification. The approach is not restricted to error functions; other criteria may be used, allowing flexible operation. An external machine learning model may itself be used as this error function for the evaluation.
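The three error functions named above have the following standard definitions. The sketch uses plain Python lists; the function names and the small clamp `eps` that guards against log(0) are illustrative choices.

```python
import math

def squared_error(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def absolute_error(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def cross_entropy(probs, target_index, eps=1e-12):
    """Cross-entropy for one sample: probs are predicted class
    probabilities, target_index is the index of the correct class."""
    return -math.log(max(probs[target_index], eps))
```

Squared and absolute error suit image-valued outputs such as resolution enhancement or noise removal, while cross-entropy applies when the output is a class, in line with the example in the text.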

The present invention is not limited to the configuration of the above embodiment, and various modifications are possible without departing from the spirit of the invention. To achieve its object, the present invention can also be realized as an image processing method (a learning method and an inference method) whose steps correspond to the characteristic components of the image processing device (the learning device and the inference device), or as a program that includes those characteristic steps. Such a program can be stored in a ROM or the like, and can also be distributed via a recording medium such as a USB memory or via a communication network.

The present invention can also be realized as a computer system that sends input data to the image processing device or the computer program and receives and uses the output data from it. This system is a processing system that uses data obtained from a machine learning model trained by the processing described above, and it can provide a variety of services. The devices used in this system are, for example, an image processing device having a display unit and a communication unit, or an information processing device that can exchange information with a computer, such as a PC, a smartphone, a mobile terminal, or a game console.

1 Image processing device (learning device and inference device)
10 Control unit
12 Storage unit (learning result storage unit)
15 Operation unit
101 Learning processing execution unit
102 Inference processing execution unit
110 Input unit
111 Machine learning model
111a Generator
111b Discriminator
112 Output unit
113 Data processing unit
114 Cropping unit
301, 401 Original teacher data
302 Provisional reference image
303 Edited provisional reference image
304, 402 Processed teacher data

Claims

A learning device that learns setting values in a machine learning model based on teacher data, the learning device comprising:
an input unit that accepts input of original teacher data;
a data processing unit that creates processed teacher data based on the original teacher data;
a learning processing execution unit that, using a machine learning model to be trained, executes processing to learn the setting values in the machine learning model based on teacher data including at least the processed teacher data; and
a storage unit that stores the setting values,
wherein the data processing unit:
specifies a target resolution of the processed teacher data;
obtains, based on the original teacher data, a provisional reference image that is a predetermined multiple of the size of an image having the target resolution;
accepts an edit to the provisional reference image, the edit including rotation, movement, or deformation;
samples the edited provisional reference image at a sampling rate higher than the predetermined multiple; and
uses, as the processed teacher data, the sampled image converted to the size of the image having the target resolution.

The learning device according to claim 1, further comprising
an outpainting model that automatically generates, in advance, the surroundings of the original teacher data to create larger teacher data,
wherein the data processing unit converts the teacher data created by the outpainting model into the processed teacher data.

The learning device according to claim 1, wherein the data processing unit:
determines, according to the edit, an oversampling rate higher than the predetermined multiple for the edited provisional reference image;
performs oversampling at the determined rate; and
converts the image obtained by the oversampling into the processed teacher data.

The learning device according to claim 1, further comprising
a super-resolution model that, when the resolution of the original teacher data is less than the predetermined multiple of the target resolution, creates a super-resolution image of the predetermined-multiple size based on the original teacher data,
wherein the data processing unit uses the super-resolution image created by the super-resolution model as the provisional reference image.

A learning device according to any one of claims 1 to 4, wherein at least a part of a machine learning model whose setting values have been learned by the learning processing execution unit of the learning device is used as a loss function.

An inference device that performs predetermined inference processing on target data using a machine learning model, the inference device comprising:
an input unit that accepts input of the target data;
a machine learning model to which the target data is input from the input unit; and
an inference processing execution unit that performs the predetermined inference processing on the target data using the machine learning model,
wherein the machine learning model is a machine learning model whose setting values have been learned by the learning processing execution unit of the learning device according to any one of claims 1 to 4.

A computer program that causes a computer to operate as the learning device according to any one of claims 1 to 4.

A computer program that causes a computer to operate as the learning device according to claim 5.

A computer program that causes a computer to operate as the inference device according to claim 6.

A learning method for learning setting values in a machine learning model based on teacher data, the method comprising:
an input step of accepting input of original teacher data;
a data processing step of creating processed teacher data based on the original teacher data;
a learning processing execution step of executing, using a machine learning model to be trained, processing to learn the setting values in the machine learning model based on teacher data including at least the processed teacher data; and
a storing step of storing the setting values,
wherein the data processing step includes:
specifying a target resolution of the processed teacher data;
obtaining, based on the original teacher data, a provisional reference image that is a predetermined multiple of the size of an image having the target resolution;
accepting an edit to the provisional reference image, the edit including rotation, movement, or deformation;
sampling the edited provisional reference image at a sampling rate higher than the predetermined multiple; and
using, as the processed teacher data, the sampled image converted to the size of the image having the target resolution.

An inference method for performing predetermined inference processing on target data using a machine learning model, the method comprising:
an input step of accepting input of the target data;
a step of inputting the target data accepted in the input step to a machine learning model; and
an inference processing execution step of performing the predetermined inference processing on the target data using the machine learning model,
wherein the machine learning model is a machine learning model whose setting values have been learned in the learning processing execution step according to claim 10.