JP7440149B1

JP7440149B1 - Data generation method, data generation program, and data generation system

Info

Publication number: JP7440149B1
Application number: JP2023134067A
Authority: JP
Inventors: 健太郎三原
Original assignee: Eaglys Inc
Current assignee: Eaglys Inc
Priority date: 2023-08-21
Filing date: 2023-08-21
Publication date: 2024-02-28
Anticipated expiration: 2043-08-21

Abstract

【課題】データの内容を開示せずに、最適パラメータ設計を容易にする。【解決手段】制御部とユーザ記憶部とを有するユーザ端末と、準同型暗号方式で暗号化されたデータを含む演算を行う暗号演算部と親データから予測結果を出力する機械学習モデルを記憶する暗号データ記憶部とを有する演算装置と、を備えるシステムにおいて用いられる方法であって、制御部が公開鍵及び秘密鍵を生成するステップと、制御部がユーザ記憶部から取得した複数の親データを、公開鍵を用いて準同型暗号方式で暗号化するステップと、暗号演算部及び制御部が暗号化された複数の親データを機械学習モデルに入力して出力された複数の予測結果に基づき、複数の子データを生成する次世代生成ステップと、暗号データ記憶部が、生成された複数の子データを暗号化された複数の親データとして記憶する更新ステップと、次世代生成ステップと更新ステップとを繰り返すステップと、を備える。【選択図】図１ [Problem] To facilitate optimal parameter design without disclosing data contents. [Solution] A method used in a system comprising a user terminal having a control unit and a user storage unit, and a computing device having a cryptographic operation unit that performs operations involving data encrypted with a homomorphic encryption scheme and an encrypted data storage unit that stores a machine learning model that outputs prediction results from parent data, the method comprising: a step in which the control unit generates a public key and a private key; a step in which the control unit encrypts a plurality of parent data acquired from the user storage unit with the homomorphic encryption scheme using the public key; a next-generation generation step in which the cryptographic operation unit and the control unit generate a plurality of child data based on a plurality of prediction results output by inputting the encrypted plurality of parent data into the machine learning model; an update step in which the encrypted data storage unit stores the generated plurality of child data as the encrypted plurality of parent data; and a step of repeating the next-generation generation step and the update step. [Selected drawing] FIG. 1

Description

本開示は、データ生成方法、データ生成プログラム、およびデータ生成システムに関する。 The present disclosure relates to a data generation method, a data generation program, and a data generation system.

機械学習技術を活用して、最適なパラメータを設計する取り組みがある。例えば、特許文献１には、新規物質の開発を効率的に行うマテリアルズ・インフォマティクスに関する技術であって、化学データベースから新材料の候補を発見する発明が開示されている。 There are efforts to design optimal parameters using machine learning technology. For example, Patent Document 1 discloses an invention that is a technology related to materials informatics for efficiently developing new substances and discovers new material candidates from a chemical database.

特表２０２２－５１６６９７号公報 Japanese Translation of PCT International Application Publication No. 2022-516697

近年、様々な分野において、社内で保有するデータ（機械学習モデルに関する情報を含む。）を活用するだけでなく、社外の企業や組織と連携して、データを利活用しようとする機運が高まっている。しかしながら、機密性が高いデータの他社への提供や利活用は、情報漏洩や不正利用などセキュリティ上の課題がある。 In recent years, momentum has been growing in various fields not only to make use of data held in-house (including information on machine learning models) but also to utilize data in collaboration with outside companies and organizations. However, providing highly confidential data to other companies, or letting them use it, raises security issues such as information leakage and unauthorized use.

そこで、本開示では、上記課題を解決すべくなされたものであって、その目的は、各組織が保有するデータの内容を開示することなく利活用し、最適なパラメータ設計を容易にするデータ生成方法を提供することである。 The present disclosure has therefore been made to solve the above problem, and its object is to provide a data generation method that facilitates optimal parameter design by utilizing the data held by each organization without disclosing its contents.

上記目的を達成するため、本開示に係るデータ生成方法は、制御部と、複数の親データを記憶するユーザ記憶部と、を有するユーザ端末と、準同型暗号方式で暗号化されたデータを含む演算を行う暗号演算部と、親データから予測結果を出力する機械学習モデルを記憶する暗号データ記憶部と、を有する演算装置と、を備えるシステムにおいて用いられる方法であって、制御部が、公開鍵および秘密鍵を生成するステップと、制御部が、ユーザ記憶部から取得した複数の親データを、公開鍵を用いて準同型暗号方式で暗号化するステップと、暗号演算部および制御部が、暗号化された複数の親データを機械学習モデルに入力して出力された複数の予測結果に基づき、複数の子データを生成する次世代生成ステップと、暗号データ記憶部が、生成された複数の子データを、暗号化された複数の親データとして記憶する更新ステップと、次世代生成ステップと、更新ステップとを繰り返すステップと、を備える。 To achieve the above object, a data generation method according to the present disclosure is a method used in a system comprising a user terminal having a control unit and a user storage unit that stores a plurality of parent data, and a computing device having a cryptographic operation unit that performs operations involving data encrypted with a homomorphic encryption scheme and an encrypted data storage unit that stores a machine learning model that outputs a prediction result from parent data, the method comprising: a step in which the control unit generates a public key and a private key; a step in which the control unit encrypts the plurality of parent data acquired from the user storage unit with the homomorphic encryption scheme using the public key; a next-generation generation step in which the cryptographic operation unit and the control unit generate a plurality of child data based on a plurality of prediction results output by inputting the encrypted plurality of parent data into the machine learning model; an update step in which the encrypted data storage unit stores the generated plurality of child data as the encrypted plurality of parent data; and a step of repeating the next-generation generation step and the update step.

また、上記目的を達成するため、本開示に係るデータ生成プログラムは、制御部と、複数の親データを記憶するユーザ記憶部と、を有するユーザ端末と、準同型暗号方式で暗号化されたデータを含む演算を行う暗号演算部と、親データから予測結果を出力する機械学習モデルを記憶する暗号データ記憶部と、を有する演算装置と、を備えるシステムにおいて用いられるプログラムであって、制御部が、公開鍵および秘密鍵を生成するステップと、制御部が、ユーザ記憶部から取得した複数の親データを、公開鍵を用いて準同型暗号方式で暗号化するステップと、暗号演算部および制御部が、暗号化された複数の親データを機械学習モデルに入力して出力された複数の予測結果に基づき、複数の子データを生成する次世代生成ステップと、暗号データ記憶部が、生成された複数の子データを、暗号化された複数の親データとして記憶する更新ステップと、次世代生成ステップと、更新ステップとを繰り返すステップと、を備える。 Furthermore, to achieve the above object, a data generation program according to the present disclosure is a program used in a system comprising a user terminal having a control unit and a user storage unit that stores a plurality of parent data, and a computing device having a cryptographic operation unit that performs operations involving data encrypted with a homomorphic encryption scheme and an encrypted data storage unit that stores a machine learning model that outputs a prediction result from parent data, the program comprising: a step in which the control unit generates a public key and a private key; a step in which the control unit encrypts the plurality of parent data acquired from the user storage unit with the homomorphic encryption scheme using the public key; a next-generation generation step in which the cryptographic operation unit and the control unit generate a plurality of child data based on a plurality of prediction results output by inputting the encrypted plurality of parent data into the machine learning model; an update step in which the encrypted data storage unit stores the generated plurality of child data as the encrypted plurality of parent data; and a step of repeating the next-generation generation step and the update step.

また、上記目的を達成するため、本開示に係るデータ生成システムは、制御部と、複数の親データを記憶するユーザ記憶部と、を有するユーザ端末と、準同型暗号方式で暗号化されたデータを含む演算を行う暗号演算部と、親データから予測結果を出力する機械学習モデルを記憶する暗号データ記憶部と、を有する演算装置と、を備えるシステムであって、制御部が、公開鍵および秘密鍵を生成するステップと、制御部が、ユーザ記憶部から取得した複数の親データを、公開鍵を用いて準同型暗号方式で暗号化するステップと、暗号演算部および制御部が、暗号化された複数の親データを機械学習モデルに入力して出力された複数の予測結果に基づき、複数の子データを生成する次世代生成ステップと、暗号データ記憶部が、生成された複数の子データを、暗号化された複数の親データとして記憶する更新ステップと、次世代生成ステップと、更新ステップとを繰り返すステップと、を備える。 Furthermore, to achieve the above object, a data generation system according to the present disclosure is a system comprising a user terminal having a control unit and a user storage unit that stores a plurality of parent data, and a computing device having a cryptographic operation unit that performs operations involving data encrypted with a homomorphic encryption scheme and an encrypted data storage unit that stores a machine learning model that outputs a prediction result from parent data, the system comprising: a step in which the control unit generates a public key and a private key; a step in which the control unit encrypts the plurality of parent data acquired from the user storage unit with the homomorphic encryption scheme using the public key; a next-generation generation step in which the cryptographic operation unit and the control unit generate a plurality of child data based on a plurality of prediction results output by inputting the encrypted plurality of parent data into the machine learning model; an update step in which the encrypted data storage unit stores the generated plurality of child data as the encrypted plurality of parent data; and a step of repeating the next-generation generation step and the update step.

本開示によれば、各組織が保有するデータの内容を開示することなく利活用し、最適なパラメータ設計を容易にすることができる。 According to the present disclosure, it is possible to utilize the contents of data held by each organization without disclosing them, thereby facilitating optimal parameter design.

データ生成システム１の全体図である。 An overall diagram of the data generation system 1.
端末装置１００のハードウェア構成を示す図である。 A diagram showing the hardware configuration of the terminal device 100.
端末装置１００の機能的構成を示すブロック図である。 A block diagram showing the functional configuration of the terminal device 100.
材料データのデータ構造を示す図である。 A diagram showing the data structure of material data.
材料データの具体例を示す図である。 A diagram showing a specific example of material data.
探索対象の材料データを設定する画面の一例を示す図である。 A diagram showing an example of a screen for setting the material data to be searched.
目的変数を設定する画面の一例を示す図である。 A diagram showing an example of a screen for setting objective variables.
評価値の算出を説明する図である。 A diagram explaining the calculation of evaluation values.
演算装置２００の機能的構成を示すブロック図である。 A block diagram showing the functional configuration of the computing device 200.
実施形態１における特定困難化処理の例を示す図である。 A diagram showing an example of the process that makes identification difficult in Embodiment 1.
実施形態１における特定困難化処理の例を示す図である。 A diagram showing an example of the process that makes identification difficult in Embodiment 1.
遺伝アルゴリズムの適用例を示す図である。 A diagram showing an example of applying the genetic algorithm.
データ生成システム１における処理を示すシーケンス図である。 A sequence diagram showing processing in the data generation system 1.
データ生成システム１における処理を示すシーケンス図である。 A sequence diagram showing processing in the data generation system 1.
実施形態２における特定困難化処理の例を示す図である。 A diagram showing an example of the process that makes identification difficult in Embodiment 2.
演算装置３００の機能的構成を示すブロック図である。 A block diagram showing the functional configuration of the computing device 300.
データ生成システム２における処理を示すシーケンス図である。 A sequence diagram showing processing in the data generation system 2.
データ生成システム４の全体図である。 An overall diagram of the data generation system 4.
端末装置４００の機能的構成を示すブロック図である。 A block diagram showing the functional configuration of the terminal device 400.
演算装置５００の機能的構成を示すブロック図である。 A block diagram showing the functional configuration of the computing device 500.
データ生成システム４における処理を示すシーケンス図である。 A sequence diagram showing processing in the data generation system 4.

以下、本開示の実施形態について図面を参照して説明する。実施形態を説明する全図において、共通の構成要素には同一の符号を付し、繰り返しの説明を省略する。なお、以下の実施形態は、特許請求の範囲に記載された本開示の内容を不当に限定するものではない。また、実施形態に示される構成要素のすべてが、本開示の必須の構成要素であるとは限らない。 Embodiments of the present disclosure will be described below with reference to the drawings. In all the figures explaining the embodiments, common components are given the same reference numerals and repeated explanations will be omitted. Note that the following embodiments do not unduly limit the content of the present disclosure described in the claims. Furthermore, not all components shown in the embodiments are essential components of the present disclosure.

＜本発明に係るデータ生成方法の概要＞ <Overview of the data generation method according to the present invention>
機械学習技術を活用して、最適なパラメータを設計し、配送ルートの設定、需要の予測、工業製品の形状デザインや、新規物質の開発などに用いる取り組みがある。例えば、マテリアルズ・インフォマティクスでは、化学物質の特徴（組成等）を説明変数とし、その化学物質の特性（硬度等）を目的変数とする大量のデータで機械学習して生成させた予測モデルを用いることにより、新規物質の特性を実際に実験して確認する手間を削減することができる。 There are efforts to use machine learning technology to design optimal parameters for purposes such as setting delivery routes, forecasting demand, designing the shapes of industrial products, and developing new substances. For example, materials informatics uses a prediction model generated by machine learning on a large amount of data in which the characteristics of a chemical substance (composition, etc.) are the explanatory variables and the properties of that substance (hardness, etc.) are the objective variables, which reduces the effort of confirming the properties of a new substance through actual experiments.

また、最適なパラメータを探索する手法として、自然の進化を模倣した最適化アルゴリズムである遺伝アルゴリズム（Genetic Algorithm）が知られている。遺伝アルゴリズムでは、親世代の個体の集団に対して、交叉や突然変異等の、生存選択の操作を適用して、子世代集団を得ることを繰り返し、より良い個体を探索する方法である。例えば、このようなアルゴリズムにより探索されたパラメータを有する化学物質と、上述のような予測モデルとを用いることにより、効率的に新規物質の開発を行うことが可能である。 Additionally, a genetic algorithm, which is an optimization algorithm that imitates natural evolution, is known as a method for searching for optimal parameters. Genetic algorithms are a method of searching for better individuals by repeatedly applying survival selection operations such as crossover and mutation to a population of individuals in the parent generation to obtain a child generation population. For example, by using a chemical substance having parameters searched by such an algorithm and a predictive model as described above, it is possible to efficiently develop a new substance.
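As a rough illustration of the genetic algorithm loop described above, the following plaintext sketch applies single-point crossover, mutation, and survival selection to a parent population and repeats this for a number of generations. It is only a sketch under stated assumptions: the function name `evolve`, the Gaussian mutation, the elitist selection rule, and all numeric settings are hypothetical choices for illustration and are not taken from the patent, and no encryption is involved here.

```python
import random

def evolve(parents, fitness, generations=20, mutation_rate=0.1, seed=0):
    """Toy genetic algorithm; a lower fitness value means a better individual."""
    rng = random.Random(seed)
    population = [list(p) for p in parents]
    for _ in range(generations):
        children = []
        while len(children) < len(population):
            a, b = rng.sample(population, 2)        # pick two distinct parents
            cut = rng.randrange(1, len(a))          # single-point crossover
            child = a[:cut] + b[cut:]
            # mutation: perturb each gene with a small probability
            child = [g + rng.gauss(0, 1) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        # survival selection: keep the best individuals among parents and children
        population = sorted(population + children, key=fitness)[:len(parents)]
    return population

# Hypothetical target parameters and a distance-based fitness
target = [3.0, 1.0, 4.0]

def distance(individual):
    return sum(abs(g - t) for g, t in zip(individual, target))

init_rng = random.Random(1)
initial = [[init_rng.uniform(0, 10) for _ in range(3)] for _ in range(8)]
final = evolve(initial, distance)
print(min(map(distance, initial)), min(map(distance, final)))
```

Because the selection step keeps the best individuals from both generations, the best fitness in this sketch never gets worse from one generation to the next.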

近年、社外の企業等と連携し、各自が保有するデータを利活用しようとする機運が高まっており、機密性の高いデータを他社に秘匿したまま、データ連携する手法が望まれている。本発明に係るデータ生成方法によれば、予測モデルを提供するモデル提供者は、モデル利用者に対し、遺伝アルゴリズムによって生成される途中の子世代のデータを開示しないため、モデル利用者に予測モデルを推測されることなく、利用させることができる。また、モデル利用者は、予測対象データを、モデル提供者に開示することなく、最終的な最適解を得ることができる。 In recent years, momentum has been growing to collaborate with outside companies and utilize the data that each party holds, and there is demand for a way to link data while keeping highly confidential data hidden from other companies. With the data generation method according to the present invention, the model provider, who provides the prediction model, does not disclose to the model user the intermediate child-generation data generated by the genetic algorithm, and can therefore let the model user use the prediction model without the model being inferred. The model user, in turn, can obtain the final optimal solution without disclosing the prediction target data to the model provider.

具体的には、データを暗号化したまま演算することができる準同型暗号を用いることで実現する。一方で、全ての演算工程を暗号化したまま演算すると、計算コストが膨大になってしまう。そこで、遺伝アルゴリズムにおける親世代集団に対する生存選択の操作を、モデル利用者が予測モデルを推測できないように、予測モデルへの入力と出力との正しい対応関係が特定困難となる処理の実行後に行い、予測結果に対する評価値に基づいて子世代集団を生成する。 Specifically, this is achieved by using homomorphic encryption, which allows calculations to be made while data is encrypted. On the other hand, if calculations are performed with all calculation steps encrypted, the calculation cost will be enormous. Therefore, in order to prevent the model user from inferring the predictive model, the survival selection operation for the parent generation population in the genetic algorithm is performed after processing that makes it difficult to identify the correct correspondence between the input and output of the predictive model. A child generation group is generated based on the evaluation value for the prediction result.

これにより、遺伝アルゴリズムにおいて、子世代集団を効率的に生成することができ、計算コストを削減することができる。また、予測モデルをモデル利用者に推測されることを防ぐことができる。 Thereby, in the genetic algorithm, a child generation population can be efficiently generated, and calculation costs can be reduced. Further, it is possible to prevent the model user from guessing the prediction model.

＜実施形態１＞ <Embodiment 1>
本実施形態に係るデータ生成システム１では、上述したように、遺伝アルゴリズムにおける親世代集団に対する生存選択の操作を、モデル利用者が予測モデルを推測できないように、予測モデルの入力（予測対象データ）と出力（予測結果）との対応関係が特定困難となる処理として、対応関係を並び替える処理を行う。 In the data generation system 1 according to the present embodiment, as described above, so that the model user cannot infer the prediction model, the survival selection operation on the parent-generation population in the genetic algorithm is carried out together with a process that makes the correspondence between the inputs (prediction target data) and the outputs (prediction results) of the prediction model difficult to identify. In this embodiment, that process is a rearrangement of the correspondence.
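The idea of rearranging the correspondence can be pictured with a secret permutation: results are handed over in shuffled order, and only the party that holds the permutation can map them back to the original inputs. The sketch below is a minimal illustration under assumptions. This passage does not state which device applies the permutation or at which step, and the helper names `shuffle_with_permutation` and `restore_order` are hypothetical.

```python
import random

def shuffle_with_permutation(items, rng):
    """Return the items in a random order together with the secret permutation."""
    perm = list(range(len(items)))
    rng.shuffle(perm)
    shuffled = [items[i] for i in perm]   # shuffled[pos] came from items[perm[pos]]
    return shuffled, perm

def restore_order(shuffled_values, perm):
    """Undo the shuffle using the secret permutation."""
    restored = [None] * len(perm)
    for pos, original_index in enumerate(perm):
        restored[original_index] = shuffled_values[pos]
    return restored

rng = random.Random(42)
predictions = [20.0, 31.5, 18.2, 24.0]    # hypothetical prediction results
shuffled, perm = shuffle_with_permutation(predictions, rng)
print(shuffled)
print(restore_order(shuffled, perm))
```

Without `perm`, a party that sees only `shuffled` cannot tell which prediction belongs to which input, which is the property the rearrangement relies on.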

（データ生成システム１の構成） (Configuration of data generation system 1)
図１は、データ生成システム１の全体図である。図１を参照して、本実施形態に係るデータ生成システム１の全体図について説明する。 FIG. 1 is an overall diagram of the data generation system 1. The overall configuration of the data generation system 1 according to the present embodiment will be described with reference to FIG. 1.

本実施形態に係るデータ生成システム１は、ユーザによって使用される一つ以上の端末装置１００と、演算装置２００と、を備える。端末装置１００と、演算装置２００とは、ネットワークＮＷを介して通信可能に接続される。ネットワークＮＷは、ＷＡＮ（Wide Area Network）、ＬＡＮ（Local Area Network）等から構成される。データ生成システム１において、新規物質を探索したいユーザは、端末装置１００を介して、他の組織等が保有する予測モデルを提供する演算装置２００を利用する。 The data generation system 1 according to this embodiment includes one or more terminal devices 100 used by users and a computing device 200. The terminal device 100 and the computing device 200 are communicably connected via a network NW. The network NW is composed of a WAN (Wide Area Network), a LAN (Local Area Network), and the like. In the data generation system 1, a user who wants to search for a new substance uses, via the terminal device 100, the computing device 200, which provides a prediction model held by another organization or the like.

端末装置１００は、ユーザからの入力を受け付け、その入力内容を、ネットワークＮＷを介して演算装置２００に送信する。また、端末装置１００は、演算装置２００からネットワークＮＷを介して送信されたデータを受信し、ユーザに提示する。 The terminal device 100 accepts input from the user and transmits the input content to the computing device 200 via the network NW. The terminal device 100 also receives data transmitted from the computing device 200 via the network NW and presents it to the user.

演算装置２００は、予測モデルを提供するモデル提供者（または、データ生成システム１のプラットフォーマ）のサーバである。演算装置２００は、端末装置１００から送信されるデータについて暗号化された状態のまま、機械学習の推論処理や遺伝アルゴリズムによる新規物質の探索等の演算処理を行う。 The computing device 200 is a server of the model provider that provides the prediction model (or of the platform operator of the data generation system 1). The computing device 200 performs computations such as machine learning inference and the search for new substances using a genetic algorithm on the data transmitted from the terminal device 100 while that data remains encrypted.

（端末装置１００のハードウェア構成） (Hardware configuration of terminal device 100)
図２は、端末装置１００のハードウェア構成を示す図である。図２を参照して、本実施形態に係る端末装置１００のハードウェア構成について説明する。 FIG. 2 is a diagram showing the hardware configuration of the terminal device 100. The hardware configuration of the terminal device 100 according to this embodiment will be described with reference to FIG. 2.

端末装置１００は、ユーザによって使用されるユーザ端末であって、例えば、据え置き型や、ラップトップ型のＰＣ（Personal Computer）のような汎用コンピュータにより実現されてもよいし、スマートフォンやタブレットによって実現されてもよい。 The terminal device 100 is a user terminal used by a user, and may be realized by a general-purpose computer such as a desktop or laptop PC (Personal Computer), or by a smartphone or a tablet.

端末装置１００は、プロセッサ１１と、メモリ１２と、ストレージ１３と、通信ＩＦ１４と、入出力ＩＦ１５と、を備える。 The terminal device 100 includes a processor 11, a memory 12, a storage 13, a communication IF 14, and an input/output IF 15.

プロセッサ１１は、プログラムに記述された命令セットを実行するためのハードウェアであり、演算装置、レジスタ、周辺回路などにより構成される。 The processor 11 is hardware for executing a set of instructions written in a program, and is composed of an arithmetic unit, registers, peripheral circuits, and the like.

メモリ１２は、プログラム、および、プログラム等で処理されるデータ等を一時的に記憶するためのものである。メモリ１２は、例えば、ＤＲＡＭ（Dynamic Random Access Memory）等の揮発性のメモリにより実現される。メモリ１２に記憶されるプログラムは、本発明に係るデータ生成方法を実行するプログラム等である。 The memory 12 is for temporarily storing programs and data processed by the programs. The memory 12 is realized, for example, by a volatile memory such as DRAM (Dynamic Random Access Memory). The program stored in the memory 12 is a program for executing the data generation method according to the present invention.

ストレージ１３は、データを保存するための記憶装置である。ストレージ１３は、例えば、フラッシュメモリ、ＨＤＤ（Hard Disc Drive）等により実現される。 The storage 13 is a storage device for storing data. The storage 13 is realized by, for example, a flash memory, an HDD (Hard Disc Drive), or the like.

通信ＩＦ１４は、端末装置１００が外部の装置と通信するため、信号を送受信するためのインタフェースである。例えば、無線通信を行うための無線通信機、シリアル通信のためのＵＳＢ（Universal Serial Bus）コネクタ等である。 The communication IF 14 is an interface for transmitting and receiving signals so that the terminal device 100 communicates with an external device. Examples include a wireless communication device for wireless communication, a USB (Universal Serial Bus) connector for serial communication, and the like.

入出力ＩＦ１５は、ユーザからの入力を受け付けるための入力装置、および、ユーザに対し情報を提示するための出力装置とのインタフェースである。入力装置は、例えば、キーボードやマウス、タッチパネル、ボタン、マイクロフォン等により実現される。出力装置は、例えば、ディスプレイやプリンタ、スピーカ等により実現される。 The input/output IF 15 is an interface with an input device for receiving input from a user and an output device for presenting information to the user. The input device is realized by, for example, a keyboard, a mouse, a touch panel, a button, a microphone, or the like. The output device is realized by, for example, a display, a printer, a speaker, or the like.

なお、演算装置２００のハードウェア構成を、図２に示した端末装置１００のハードウェア構成としてもよい。演算装置２００は、例えば、ワークステーション、サーバコンピュータ、据え置き型のＰＣ（Personal Computer）、ラップトップ型のＰＣ等のような汎用コンピュータにより実現されてもよいし、クラウド・コンピューティングによって論理的に実現されてもよい。演算装置２００の各構成要素の動作は、上述の端末装置１００と同様に、メモリ１２に記憶されたプログラムに従ったプロセッサ１１により実現する。 Note that the computing device 200 may have the same hardware configuration as the terminal device 100 shown in FIG. 2. The computing device 200 may be realized by a general-purpose computer such as a workstation, a server computer, a desktop PC (Personal Computer), or a laptop PC, or may be realized logically by cloud computing. As with the terminal device 100 described above, the operation of each component of the computing device 200 is realized by the processor 11 acting in accordance with a program stored in the memory 12.

（端末装置１００の機能的構成） (Functional configuration of terminal device 100)
図３は、端末装置１００の機能的構成を示すブロック図である。図３を参照して、本実施形態に係る端末装置１００の機能的構成について説明する。 FIG. 3 is a block diagram showing the functional configuration of the terminal device 100. The functional configuration of the terminal device 100 according to this embodiment will be described with reference to FIG. 3.

端末装置１００は、通信部１１０と、入力部１２０と、出力部１３０と、記憶部１４０と、制御部１５０と、を備える。 The terminal device 100 includes a communication section 110, an input section 120, an output section 130, a storage section 140, and a control section 150.

通信部１１０は、端末装置１００が演算装置２００や他の装置と通信するための処理を行う。ここでいう他の装置とは、ネットワークで接続されたＰＣでもよいし、スマートフォン、タブレットやＸＲ(Cross Reality)（ＡＲ:Augmented Reality，ＭＲ:Mixed Reality，ＶＲ:Virtual Reality）などの端末であってもよい。また、他の装置は、入出力装置、例えば、フラッシュメモリやＨＤＤなどによりデータの入出力を行う装置であってもよい。 The communication unit 110 performs processing for the terminal device 100 to communicate with the computing device 200 and other devices. The other devices referred to here may be PCs connected via a network, or terminals such as smartphones, tablets, and XR (Cross Reality) devices (AR: Augmented Reality, MR: Mixed Reality, VR: Virtual Reality). The other device may also be an input/output device, for example, a device that inputs and outputs data using a flash memory, an HDD, or the like.

通信部１１０は、また、ネットワークＮＷを介した装置やローカルに接続された装置等と、セキュリティが確保されたセキュアな通信チャンネルでデータを送受信する。セキュアな通信チャンネルの構築や、通信方法は、共通鍵（セッション鍵など）や公開鍵等を用いた周知の技術であるため、説明を省略する。 The communication unit 110 also transmits and receives data to and from devices via the network NW, locally connected devices, and the like through a secure communication channel where security is ensured. The construction of a secure communication channel and the communication method are well-known techniques that use a common key (such as a session key), a public key, etc., and therefore their explanation will be omitted.

入力部１２０は、端末装置１００のユーザ入力を受け付けるインタフェースである。入力部１２０は、例えば、キーボードや、タッチパネル、仮想空間内表示パネル、音声入力を検出するマイクであるが、これらに限られない。ユーザは、入力部１２０を介して、新規物質の探索に必要なパラメータ等を入力することができる。パラメータとしては、例えば、探索対象の説明変数や探索の制約条件、予測したい目的変数、目的変数の目標値、および優先度（重み）等である。 The input unit 120 is an interface that receives user input from the terminal device 100. The input unit 120 is, for example, a keyboard, a touch panel, a display panel in a virtual space, or a microphone that detects voice input, but is not limited to these. The user can input parameters and the like necessary for searching for new substances via the input unit 120. Examples of the parameters include explanatory variables to be searched, search constraints, objective variables to be predicted, target values of objective variables, priorities (weights), and the like.

また、探索の停止条件を入力するようにしてもよい。停止条件としては、例えば、遺伝アルゴリズムで生成する世代数であったり、評価値（後述）の閾値であったり、探索にかかる最大実行時間であったりでもよい。入力例については、図６および図７において説明する。 Alternatively, search stopping conditions may be input. The stopping condition may be, for example, the number of generations generated by a genetic algorithm, a threshold value of an evaluation value (described later), or the maximum execution time required for the search. Input examples will be explained in FIGS. 6 and 7.
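The three kinds of stopping condition mentioned above (number of generations, evaluation-value threshold, maximum execution time) could be checked as in the following sketch. The class and function names are hypothetical, and the sketch assumes that a smaller evaluation value is better, which matches an evaluation value defined as a difference from a target but is otherwise an assumption.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class StopCondition:
    """Hypothetical container for user-supplied stopping conditions."""
    max_generations: Optional[int] = None
    evaluation_threshold: Optional[float] = None   # stop when best value <= threshold
    max_seconds: Optional[float] = None

def should_stop(cond, generation, best_evaluation, started_at, now=None):
    """Return True as soon as any configured condition is met."""
    now = time.monotonic() if now is None else now
    if cond.max_generations is not None and generation >= cond.max_generations:
        return True
    if (cond.evaluation_threshold is not None
            and best_evaluation <= cond.evaluation_threshold):
        return True
    if cond.max_seconds is not None and now - started_at >= cond.max_seconds:
        return True
    return False

cond = StopCondition(max_generations=50, evaluation_threshold=0.5, max_seconds=3600)
start = time.monotonic()
print(should_stop(cond, generation=10, best_evaluation=4.0, started_at=start))
print(should_stop(cond, generation=10, best_evaluation=0.3, started_at=start))
```

Leaving a field as `None` simply disables that condition, so a user can supply any subset of the three.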

出力部１３０は、情報を出力して端末装置１００のユーザに通知するインタフェースである。出力部１３０は、例えば、ディスプレイや、音声出力するスピーカであるが、これらに限られない。出力部１３０は、例えば、演算装置２００等から受信したデータをディスプレイに表示する。 The output unit 130 is an interface that outputs information to notify the user of the terminal device 100. The output unit 130 is, for example, a display or a speaker that outputs audio, but is not limited to these. The output unit 130 displays, for example, data received from the computing device 200 or the like on a display.

記憶部１４０は、例えば、ＲＡＭ等の揮発性のメモリ、フラッシュメモリ、ＨＤＤ等により構成され、端末装置１００が使用するデータ、および各種処理に用いられるコンピュータプログラムを記憶する。コンピュータプログラムは、所定のサーバ等からインストールされてもよいし、コンピュータ読み取り可能な可搬型記録媒体から公知のセットアッププログラム等を用いてインストールされてもよい。可搬型記録媒体は、例えばＣＤ－ＲＯＭ（Compact Disc Read Only Memory）、ＤＶＤ－ＲＯＭ（Digital Versatile Disc Read Only Memory）等である。 The storage unit 140 includes, for example, a volatile memory such as a RAM, a flash memory, an HDD, etc., and stores data used by the terminal device 100 and computer programs used for various processes. The computer program may be installed from a predetermined server or the like, or may be installed from a computer-readable portable recording medium using a known setup program or the like. The portable recording medium is, for example, a CD-ROM (Compact Disc Read Only Memory) or a DVD-ROM (Digital Versatile Disc Read Only Memory).

記憶部１４０は、「ユーザ記憶部」として機能し、鍵記憶部１４１と、材料データベース１４２と、を有する。 The storage unit 140 functions as a “user storage unit” and includes a key storage unit 141 and a material database 142.

鍵記憶部１４１は、後述する制御部１５０が生成した公開鍵および秘密鍵を記憶する。なお、公開鍵および秘密鍵は、端末装置１００とネットワークを介して、または直接的に接続される、セキュアな鍵管理システム（ＫＭＳ：Key Management System）において管理するようにしてもよい。また、鍵記憶部１４１は、公開鍵および秘密鍵の鍵対を複数記憶してもよいし、通信用の鍵（セッション鍵）を記憶してもよい。 The key storage unit 141 stores a public key and a private key generated by a control unit 150, which will be described later. Note that the public key and the private key may be managed in a secure key management system (KMS) that is connected to the terminal device 100 via a network or directly. Further, the key storage unit 141 may store a plurality of key pairs of a public key and a private key, or may store a communication key (session key).

材料データベース１４２は、複数の材料データを記憶するデータベースである。複数の材料データは、探索対象の未知の材料データであるが、既知の材料データを含んでもよい。材料データベース１４２は、探索対象の複数の材料データを、遺伝アルゴリズムの複数の親データとして格納する。また、材料データベース１４２は、探索結果の材料データ（子データ）を格納してもよい。 The material database 142 is a database that stores a plurality of material data. The plurality of material data is unknown material data to be searched for, but may also include known material data. The material database 142 stores a plurality of material data to be searched as a plurality of parent data of the genetic algorithm. Further, the material database 142 may store material data (child data) of search results.

材料データは、材料に関する複数の特徴を示す情報と、複数の特性に関する情報とを含む。材料データのデータ構造については、図４において説明する。 The material data includes information indicating a plurality of characteristics regarding the material and information regarding a plurality of properties. The data structure of material data will be explained with reference to FIG.

制御部１５０は、端末装置１００の各機能を制御し、予め記憶部１４０に記憶されているプログラムに基づいて動作するＣＰＵ（Central Processing Unit）等のプロセッサである。 The control unit 150 is a processor such as a CPU (Central Processing Unit) that controls each function of the terminal device 100 and operates based on a program stored in the storage unit 140 in advance.

制御部１５０は、鍵生成部１５１と、暗復号部１５２と、評価値算出部１５３と、を有する。 The control unit 150 includes a key generation unit 151, an encryption/decryption unit 152, and an evaluation value calculation unit 153.

鍵生成部１５１は、データを暗号化し、または暗号化されたデータを復号するための公開鍵および秘密鍵の鍵対を生成する。生成した公開鍵および秘密鍵は、鍵記憶部１４１に格納される。 The key generation unit 151 generates a key pair of a public key and a private key for encrypting data or decrypting encrypted data. The generated public key and private key are stored in the key storage unit 141.

暗復号部１５２は、材料データベース１４２に格納される複数の親データを、生成した公開鍵を用いて準同型暗号方式で暗号化する。準同型暗号方式としては、例えば、Paillier方式、Lifted-ElGamal方式、Somewhat Homomorphic Encryption方式、Fully Homomorphic Encryption方式等を含む。上述したように、材料データは材料に関する複数の特徴を示す情報を含んでおり、暗復号部１５２は、各特徴についての値ごとに暗号化する。 The encryption/decryption unit 152 encrypts a plurality of pieces of parent data stored in the material database 142 using a homomorphic encryption method using the generated public key. Homomorphic encryption methods include, for example, the Paillier method, the Lifted-ElGamal method, the Somewhat Homomorphic Encryption method, the Fully Homomorphic Encryption method, and the like. As described above, the material data includes information indicating a plurality of characteristics regarding the material, and the encryption/decryption unit 152 encrypts each value for each characteristic.
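To make the per-value encryption concrete, the sketch below implements a toy version of the Paillier scheme, one of the schemes named above, in pure Python. It encrypts each feature value separately with the public key and shows that ciphertexts can be added, or multiplied by a plaintext constant, without decryption. The primes are deliberately tiny and insecure, values are assumed to be non-negative integers smaller than n (real-valued features would need something like fixed-point scaling), and all function names are hypothetical. This is not the patent's implementation.

```python
import math
import random

def generate_keys(p=1789, q=1861):
    """Return (public_key, private_key) for toy primes p and q (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    g = n + 1                                           # standard simple generator
    mu = pow(lam, -1, n)       # with g = n + 1, L(g^lam mod n^2) equals lam mod n
    return (n, g), (lam, mu)

def encrypt(public_key, m, rng=random):
    n, g = public_key
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(public_key, private_key, c):
    n, _ = public_key
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(public_key, c1, c2):
    """Ciphertext of m1 + m2, computed without decrypting."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

def multiply_by_constant(public_key, c, k):
    """Ciphertext of k * m for a plaintext constant k."""
    n, _ = public_key
    return pow(c, k, n * n)

public_key, private_key = generate_keys()
features = [12, 30, 7]                                   # hypothetical feature values
encrypted = [encrypt(public_key, v) for v in features]   # one ciphertext per value
total = add_encrypted(public_key, encrypted[0], encrypted[1])
scaled = multiply_by_constant(public_key, encrypted[2], 6)
print(decrypt(public_key, private_key, total), decrypt(public_key, private_key, scaled))
```

Additions and constant multiplications are enough to evaluate a linear model on encrypted inputs. Models that need multiplication between ciphertexts would require one of the somewhat or fully homomorphic schemes listed above.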

また、暗復号部１５２は、演算装置２００から受信した暗号化されたデータを、公開鍵と対の秘密鍵を用いて復号する。 The encryption/decryption unit 152 also decrypts encrypted data received from the computing device 200 using the private key paired with the public key.

評価値算出部１５３は、演算装置２００の予測モデル（後述）が出力する予測結果（予測値）から、目標値に対する評価値を算出する。評価値は、二つの値がどの程度異なるかを示す値であり、例えば、絶対差、平方差、相対誤差や、特定の指標、重み付け等を用いるなど、周知の方法により評価してもよい。なお、本実施形態では、評価値は、予測結果と目標値との絶対差とする。例えば、予測結果が「２０」であり、目標値が「２４」の場合、評価値は、目標値と予測結果との絶対差「４」（＝｜２０－２４｜）となる。 The evaluation value calculation unit 153 calculates an evaluation value relative to the target value from the prediction result (predicted value) output by the prediction model (described later) of the computing device 200. The evaluation value indicates how much the two values differ, and may be computed by a well-known method, for example using the absolute difference, squared difference, relative error, a specific index, or weighting. In this embodiment, the evaluation value is the absolute difference between the prediction result and the target value. For example, if the prediction result is "20" and the target value is "24", the evaluation value is the absolute difference "4" (= |20 - 24|).
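The evaluation value described above can be written as a small function. The sketch below defaults to the absolute difference used in this embodiment and also offers the squared difference and relative error mentioned as alternatives. The function name and the `method` parameter are hypothetical.

```python
def evaluation_value(prediction, target, method="absolute"):
    """Return how far a prediction is from the target value."""
    diff = prediction - target
    if method == "absolute":
        return abs(diff)
    if method == "squared":
        return diff * diff
    if method == "relative":
        if target == 0:
            raise ValueError("relative error is undefined for a target of 0")
        return abs(diff) / abs(target)
    raise ValueError(f"unknown method: {method}")

# Example from the text: prediction 20, target 24
print(evaluation_value(20, 24))  # 4
```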

図４は、材料データのデータ構造を示す図である。図４を参照して、材料データのデータ構造について説明する。 FIG. 4 is a diagram showing the data structure of material data. The data structure of material data will be explained with reference to FIG. 4.

材料データ１０００は、材料ＩＤ１１００と、材料の特徴を示す情報１２００と、材料の特性を示す情報１３００とを含む。 The material data 1000 includes a material ID 1100, information 1200 indicating characteristics of the material, and information 1300 indicating properties of the material.

材料ＩＤ１１００は、複数の材料データのそれぞれを識別する識別子である。 The material ID 1100 is an identifier that identifies each piece of material data.

材料の特徴を示す情報１２００は、例えば、材料を構成する物質あり、複数の説明変数x1～xnとして表される。材料データごとに、材料の特徴を示す情報として、実験値、計算値等の数値が入力される。なお、説明変数は、材料の特徴を示す情報に限られず、材料の反応工程や生成工程でのプロセス条件（温度、圧力、時間等）を示す情報を含んでもよい。 Information 1200 indicating the characteristics of the material includes, for example, substances that constitute the material, and is expressed as a plurality of explanatory variables x1 to xn. For each material data, numerical values such as experimental values and calculated values are input as information indicating the characteristics of the material. Note that the explanatory variables are not limited to information indicating the characteristics of the material, but may also include information indicating process conditions (temperature, pressure, time, etc.) in the reaction step or production step of the material.

材料の特性を示す情報１３００は、例えば、材料の密度、硬度、伝導率、強度のような、材料の物理的特性、化学的特性、機械的特性、電気的特性などであり、複数の目的変数y1～ymとして表される。 The information 1300 indicating the properties of the material is, for example, the physical properties, chemical properties, mechanical properties, and electrical properties of the material, such as the density, hardness, conductivity, and strength of the material, and includes a plurality of objective variables. Represented as y1~ym.

材料ＩＤ１１００により識別される材料データが、探索対象の未知の材料データである場合、材料の特性を示す情報１３００には、情報が不明であることを示す記号等が入力される。一方、材料ＩＤ１１００により識別される材料が、既知の材料である場合、材料の特性を示す情報１３００には、特性の実験値、計算値等の数値が入力される。 If the material data identified by the material ID 1100 is unknown material data to be searched for, a symbol or the like indicating that the information is unknown is entered in the information 1300 indicating the properties of the material. On the other hand, if the material identified by the material ID 1100 is a known material, numerical values such as experimental or calculated values of the properties are entered in the information 1300 indicating the properties of the material.

なお、図３において説明したように、端末装置１００の暗復号部１５２は、図４の例では、探索対象の未知の材料データのセル（説明変数）ごとに暗号化する。 Note that, as described in FIG. 3, in the example of FIG. 4, the encryption/decryption unit 152 of the terminal device 100 encrypts each cell (explanatory variable) of the unknown material data to be searched.

図５は、材料データの具体例を示す図である。図５を参照して、材料データベース１４２に格納される材料データの具体例について説明する。 FIG. 5 is a diagram showing a specific example of material data. A specific example of material data stored in the material database 142 will be described with reference to FIG. 5.

材料データ２０００は、材料ＩＤと、材料の特徴（説明変数）と、材料の特性（目的変数）とを含むデータセットである。 The material data 2000 is a data set that includes a material ID, material characteristics (explanatory variables), and material properties (objective variables).

図５の例では、材料の特徴として、材料を構成する「原料Ａ」、「原料Ｂ」、「原料Ｃ」…、「添加剤Ｘ」、「添加剤Ｙ」、「添加剤Ｚ」を含む。また、材料の特性として、材料の「粘性」を含む。なお、図５の例では、目的変数は一つであるが、複数であってもよい。 In the example of FIG. 5, the characteristics of the material include "Raw material A", "Raw material B", "Raw material C", ..., "Additive X", "Additive Y", and "Additive Z", which constitute the material. The properties of the material include the "viscosity" of the material. Note that in the example of FIG. 5 there is one objective variable, but there may be a plurality of objective variables.

材料を識別する材料ＩＤごとに、材料の特徴（説明変数）と材料の特性（目的変数）とが対応付けられる。また、説明変数には、各材料の特徴を示す実験値等の数値が入力される。探索対象である材料データの目的変数には、数値が存在しない旨を示す「ＮＵＬＬ」が入力されるが、既知の材料の目的変数には、数値が入力される。 For each material ID that identifies a material, the material characteristics (explanatory variables) are associated with the material properties (objective variables). Numerical values such as experimental values indicating the characteristics of each material are entered in the explanatory variables. "NULL", indicating that no numerical value exists, is entered in the objective variable of material data to be searched, whereas a numerical value is entered in the objective variable of a known material.

材料ＩＤ「１」～「５」で識別される材料データは、探索対象の材料データであり、目的変数である粘性には「ＮＵＬＬ」と入力される。具体的には、材料ＩＤが「１」の場合、原料Ａは「８０」、原料Ｂは「０」、原料Ｃは「０」、…、添加剤Ｘは「４」、添加剤Ｙは「０」、添加剤Ｚは「８」、粘性は「ＮＵＬＬ」が入力されている。探索対象の材料データは、遺伝アルゴリズムの親データとして、材料データベース１４２に格納される。すなわち、複数の親データは、複数の説明変数に対応する各値を含むレコードが所定の順序に並ぶデータセットである。 The material data identified by the material IDs "1" to "5" are the material data to be searched, and "NULL" is entered for viscosity, which is the objective variable. Specifically, when the material ID is "1", raw material A is "80", raw material B is "0", raw material C is "0", ..., additive X is "4", additive Y is "0", additive Z is "8", and viscosity is "NULL". The material data to be searched is stored in the material database 142 as parent data for the genetic algorithm. That is, the plurality of parent data is a data set in which records, each containing the values corresponding to the plurality of explanatory variables, are arranged in a predetermined order.

材料ＩＤが「５１」～「５５」で識別される材料データは、既知の材料データであり、説明変数および目的変数には、数値が入力される。具体的には、材料ＩＤが「５１」の場合、原料Ａは「４０」、原料Ｂは「０」、原料Ｃは「０」、…、添加剤Ｘは「２」、添加剤Ｙは「０」、添加剤Ｚは「８」、粘性は「１０」が入力されている。既知の材料データは、例えば、探索済みの材料データであったり、予め実測値等が知られている材料データであったりしてもよい。既知の材料データは、機械学習の教師データとして利用されてもよい。 The material data identified by the material IDs "51" to "55" are known material data, and numerical values are entered in both the explanatory variables and the objective variable. Specifically, when the material ID is "51", raw material A is "40", raw material B is "0", raw material C is "0", ..., additive X is "2", additive Y is "0", additive Z is "8", and viscosity is "10". The known material data may be, for example, material data that has already been searched, or material data whose measured values are known in advance. Known material data may be used as training data for machine learning.
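As an illustration of the record layout described for FIG. 4 and FIG. 5, the following minimal Python sketch models one row of material data. The class name `MaterialRecord`, the field names, and the helper `is_search_target` are hypothetical and not part of the disclosure. Only the columns named in the text are included (the columns elided by "…" in FIG. 5 are omitted), and `None` stands in for "NULL".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MaterialRecord:
    """One row of material data (hypothetical layout)."""
    material_id: int
    features: Dict[str, float]   # explanatory variables
    viscosity: Optional[float]   # objective variable; None stands in for "NULL"


def is_search_target(record: MaterialRecord) -> bool:
    """A record whose objective variable is NULL is material data to be searched."""
    return record.viscosity is None


# Values quoted in the description for material IDs "1" and "51".
record_1 = MaterialRecord(
    1,
    {"raw_material_A": 80, "raw_material_B": 0, "raw_material_C": 0,
     "additive_X": 4, "additive_Y": 0, "additive_Z": 8},
    None,
)
record_51 = MaterialRecord(
    51,
    {"raw_material_A": 40, "raw_material_B": 0, "raw_material_C": 0,
     "additive_X": 2, "additive_Y": 0, "additive_Z": 8},
    10,
)

print(is_search_target(record_1), is_search_target(record_51))  # True False
```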

図６は、探索対象の材料データを設定する画面の一例を示す図である。図６を参照して、探索対象の材料データの設定例について説明する。 FIG. 6 is a diagram showing an example of a screen for setting material data to be searched. An example of setting material data to be searched will be described with reference to FIG. 6.

画面３０００は、材料の特徴を示す情報３１００と、選択欄３２００と、範囲を示す情報３３００とを含む。画面３０００は、例えば、端末装置１００の出力部１３０に表示される。ユーザは、入力部１２０を介して、材料データの探索に必要なパラメータを入力する。 Screen 3000 includes information 3100 indicating characteristics of the material, selection field 3200, and information 3300 indicating range. Screen 3000 is displayed on output unit 130 of terminal device 100, for example. The user inputs parameters necessary for searching for material data via the input unit 120.

材料を示す情報３１００は、材料データの特徴（説明変数）を示す。例えば、図６の例では、図５で示した説明変数すべてを示しているが、説明変数のうち、特定の変数のみを示すようにしてもよいし、ユーザに説明変数を入力させるようにしてもよいし、プルダウン表示により選択させるようにしてもよい。 The information 3100 indicating the material indicates the characteristics (explanatory variables) of the material data. For example, the example of FIG. 6 shows all the explanatory variables shown in FIG. 5, but only specific explanatory variables may be shown, the user may be prompted to enter the explanatory variables, or the explanatory variables may be selected from a pull-down display.

選択欄３２００は、説明変数の中で、最適化の対象となる説明変数を選択するチェックボックスを示す。図６の例では、最適化する対象として、「添加剤Ｘ」、「添加剤Ｙ」、「添加剤Ｚ」が選択されている。 A selection column 3200 shows checkboxes for selecting explanatory variables to be optimized from among the explanatory variables. In the example of FIG. 6, "Additive X", "Additive Y", and "Additive Z" are selected as targets to be optimized.

範囲を示す情報３３００は、最適化する対象として選択された説明変数の制約条件を示す。図６の例では、ユーザは、パラメータとして、最小値と最大値を入力し、この値の範囲で最適化を実行するように設定することができる。すなわち、図６の例では、材料の特徴のうち、「添加剤Ｘ」が「０～１０」の範囲、「添加剤Ｙ」が「２～５」の範囲、「添加剤Ｚ」が「０～３」の範囲になるような材料データを探索するよう設定されている。 The information 3300 indicating the range indicates the constraint conditions of the explanatory variables selected as optimization targets. In the example of FIG. 6, the user can enter a minimum value and a maximum value as parameters so that the optimization is performed within that range. That is, in the example of FIG. 6, the search is set to look for material data in which, among the material characteristics, "Additive X" falls within the range "0 to 10", "Additive Y" within the range "2 to 5", and "Additive Z" within the range "0 to 3".
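The range settings of FIG. 6 can be pictured as a simple bounds check. The sketch below is only an illustration: the dictionary `BOUNDS`, the key names, and the function `violated_bounds` are hypothetical, and the disclosure does not state how the constraints are enforced during the search.

```python
from typing import Dict, List, Tuple

# Minimum and maximum values entered on the screen of FIG. 6.
BOUNDS: Dict[str, Tuple[float, float]] = {
    "additive_X": (0, 10),
    "additive_Y": (2, 5),
    "additive_Z": (0, 3),
}


def violated_bounds(features: Dict[str, float],
                    bounds: Dict[str, Tuple[float, float]] = BOUNDS) -> List[str]:
    """Return the names of the variables whose values fall outside their range."""
    return [name for name, (low, high) in bounds.items()
            if not low <= features[name] <= high]


# Additive values of material ID "1" from FIG. 5.
print(violated_bounds({"additive_X": 4, "additive_Y": 0, "additive_Z": 8}))
# ['additive_Y', 'additive_Z']
```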

図７は、目的変数を設定する画面の一例を示す図である。図７を参照して、予測する目的変数の設定例について説明する。 FIG. 7 is a diagram showing an example of a screen for setting objective variables. An example of setting target variables to be predicted will be described with reference to FIG. 7.

画面４０００は、目的変数を示す情報４１００と、選択欄４２００と、目標値を示す情報４３００と、重みを示す情報４４００とを含む。画面４０００は、例えば、端末装置１００の出力部１３０に表示される。ユーザは、入力部１２０を介して、目的変数の設定に必要なパラメータを入力する。 Screen 4000 includes information 4100 indicating objective variables, selection field 4200, information 4300 indicating target values, and information 4400 indicating weights. Screen 4000 is displayed on output unit 130 of terminal device 100, for example. The user inputs parameters necessary for setting the objective variable via the input unit 120.

目的変数を示す情報４１００は、材料データの特性（目的変数）を示す。例えば、図７の例では、強度、溶解度、…、粘性、伝導性、耐性、という材料の特性が示されている。なお、目的変数は、ユーザに入力させるようにしてもよいし、プルダウン表示により選択させるようにしてもよい。 The information 4100 indicating the objective variables indicates the properties (objective variables) of the material data. For example, the example of FIG. 7 shows the material properties strength, solubility, ..., viscosity, conductivity, and resistance. Note that the objective variables may be entered by the user or selected from a pull-down display.

選択欄４２００は、予測したい目的変数を選択するチェックボックスを示す。図７の例では、予測する対象として、「溶解度」および「粘性」が選択されている。なお、目的変数は、複数選択してもよいし、一つだけ選択するようにしてもよい。 A selection column 4200 shows checkboxes for selecting the target variable to be predicted. In the example of FIG. 7, "solubility" and "viscosity" are selected as the targets to be predicted. Note that a plurality of objective variables or only one objective variable may be selected.

目標値を示す情報４３００は、目的変数の目標値を示しており、探索した材料データの目的変数の最終的な値を設定する。図７の例では、目標値として、「溶解度」が「１０」、「粘性」が「４０」と設定されている。 Information 4300 indicating the target value indicates the target value of the objective variable, and sets the final value of the objective variable of the searched material data. In the example of FIG. 7, the target values are set as "10" for "solubility" and "40" for "viscosity."

重みを示す情報４４００は、目的変数を複数選択した場合に、どの目的変数を重視するかを示すパラメータである。図７の例では、設定範囲が０～１００の重みについて、「溶解度」に対しては「４０」、「粘性」に対しては「６０」と入力されている。すなわち、「粘性」の方を重視して材料データを探索するように設定されている。 Information 4400 indicating weight is a parameter indicating which objective variable is to be emphasized when a plurality of objective variables are selected. In the example of FIG. 7, for weights with a setting range of 0 to 100, "40" is entered for "solubility" and "60" is entered for "viscosity." In other words, the setting is such that material data is searched with emphasis on "viscosity".
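The description does not say how the weights are combined with the evaluation values of the individual objective variables. One possible reading, shown here purely as an assumption, is a weighted average of the absolute differences from the target values. The function name `weighted_evaluation` and the predicted values used in the example are hypothetical; the target values and weights are those of FIG. 7.

```python
from typing import Dict


def weighted_evaluation(predicted: Dict[str, float],
                        targets: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Weighted average of absolute differences (an assumed combination rule)."""
    total_weight = sum(weights[name] for name in targets)
    weighted_sum = sum(weights[name] * abs(predicted[name] - targets[name])
                       for name in targets)
    return weighted_sum / total_weight


targets = {"solubility": 10, "viscosity": 40}   # target values from FIG. 7
weights = {"solubility": 40, "viscosity": 60}   # weights from FIG. 7
predicted = {"solubility": 12, "viscosity": 35}  # hypothetical prediction

print(weighted_evaluation(predicted, targets, weights))  # 3.8
```

With this rule, an error in "viscosity" contributes 1.5 times as much as the same error in "solubility", which matches the stated intent of emphasizing viscosity.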

図８は、評価値の算出を説明する図である。図８を参照して、端末装置１００における制御部１５０の評価値算出部１５３が評価値を算出する方法の一例について説明する。 FIG. 8 is a diagram illustrating calculation of evaluation values. With reference to FIG. 8, an example of a method by which the evaluation value calculation unit 153 of the control unit 150 in the terminal device 100 calculates an evaluation value will be described.

図８の例では、粘性を目的変数とし、その目標値を「４０」とした場合の、各材料ＩＤについての評価値を示している。評価値は、目標値と、目的変数の予測値との距離（絶対差）を表すものである。材料ＩＤ「１１」では、粘性の予測値が「２０」であるから、目標値「４０」との距離である「２０」（＝｜４０－２０｜）が評価値として算出される。 The example in FIG. 8 shows evaluation values for each material ID when viscosity is the objective variable and its target value is "40". The evaluation value represents the distance (absolute difference) between the target value and the predicted value of the target variable. For material ID "11", the predicted value of viscosity is "20", so "20" (=|40-20|), which is the distance from the target value "40", is calculated as the evaluation value.

同様に、材料ＩＤ「１３」では、粘性の予測値が「５０」であるから、目標値「４０」との距離である「１０」（＝｜４０－５０｜）が評価値として算出される。 Similarly, for material ID "13", the predicted viscosity value is "50", so "10" (=|40-50|), which is the distance from the target value "40", is calculated as the evaluation value.

図８の例では、材料ＩＤ「１１」～「１５」のうち、材料ＩＤ「１４」の評価値が「０」であり、材料ＩＤ「１４」の目的変数の予測値が、目標値を達成していることを示している。 In the example of FIG. 8, among material IDs "11" to "15", the evaluation value of material ID "14" is "0", which indicates that the predicted value of the objective variable for material ID "14" has reached the target value.
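The evaluation values of FIG. 8 can be reproduced with a few lines of Python. The sketch uses only the values stated in the text (the predicted value for material ID "14" is taken to be 40 because its evaluation value is 0); the variable names are illustrative.

```python
TARGET = 40  # target value of viscosity

# Predicted viscosity per material ID, as far as the description states them.
predicted = {11: 20, 13: 50, 14: 40}

# Evaluation value = absolute difference between target and predicted value.
evaluation = {material_id: abs(TARGET - value)
              for material_id, value in predicted.items()}
best = min(evaluation, key=evaluation.get)

print(evaluation)  # {11: 20, 13: 10, 14: 0}
print(best)        # 14
```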

（演算装置２００の機能的構成） (Functional configuration of arithmetic device 200)
図９は、演算装置２００の機能的構成を示すブロック図である。図９を参照して、実施形態１に係る演算装置２００の機能的構成について説明する。 FIG. 9 is a block diagram showing the functional configuration of the arithmetic device 200. With reference to FIG. 9, the functional configuration of the arithmetic device 200 according to the first embodiment will be described.

演算装置２００は、通信部２１０と、入力部２２０と、出力部２３０と、記憶部２４０と、制御部２５０と、を備える。 The arithmetic device 200 includes a communication section 210, an input section 220, an output section 230, a storage section 240, and a control section 250.

通信部２１０は、演算装置２００が端末装置１００や他の装置と通信するための処理を行う。通信部２１０は、端末装置１００の通信部１１０と同等の機能であるため、重複する説明は省略する。 The communication unit 210 performs processing for the arithmetic device 200 to communicate with the terminal device 100 and other devices. The communication unit 210 has the same function as the communication unit 110 of the terminal device 100, so a redundant explanation will be omitted.

入力部２２０は、演算装置２００のユーザ入力を受け付けるインタフェースである。入力部２２０は、例えば、キーボードや、タッチパネル、音声入力を検出するマイクであるが、これらに限られない。 The input unit 220 is an interface that accepts input from a user of the arithmetic device 200. The input unit 220 is, for example, a keyboard, a touch panel, or a microphone that detects voice input, but is not limited to these.

出力部２３０は、情報を出力して演算装置２００のユーザに通知するインタフェースである。出力部２３０は、例えば、ディスプレイや、音声出力するスピーカであるが、これらに限られない。出力部２３０は、例えば、端末装置１００等から受信したデータをディスプレイに表示する。 The output unit 230 is an interface that outputs information to notify the user of the arithmetic device 200. The output unit 230 is, for example, a display or a speaker that outputs audio, but is not limited to these. For example, the output unit 230 displays data received from the terminal device 100 or the like on a display.

記憶部２４０は、例えば、ＲＡＭ等の揮発性のメモリ、フラッシュメモリ、ＨＤＤ等により構成され、演算装置２００が使用するデータ、および各種処理に用いられるコンピュータプログラムを記憶する。記憶部２４０は、端末装置１００の記憶部１４０と同等の機能であるため、重複する説明は省略する。 The storage unit 240 is configured of, for example, a volatile memory such as a RAM, a flash memory, an HDD, etc., and stores data used by the arithmetic unit 200 and computer programs used for various processes. The storage unit 240 has the same function as the storage unit 140 of the terminal device 100, so a duplicate explanation will be omitted.

記憶部２４０は、「暗号データ記憶部」として機能し、モデルデータベース２４１と、親データ記憶部２４２と、を有する。 The storage unit 240 functions as a “cipher data storage unit” and includes a model database 241 and a parent data storage unit 242.

モデルデータベース２４１は、材料の特徴を説明変数とし、材料の特性を目的変数とする材料データを教師データとして機械学習して構築された予測モデルを格納する。教師データは、具体的には、複数の材料データのそれぞれについて、材料の特徴に関する複数の説明変数の値と、材料の特性に関する一または複数の目的変数の値とが対応付けられたデータセットである。 The model database 241 stores a prediction model constructed by machine learning using, as training data, material data in which material characteristics are the explanatory variables and material properties are the objective variables. Specifically, the training data is a data set in which, for each of a plurality of pieces of material data, the values of a plurality of explanatory variables regarding material characteristics are associated with the values of one or more objective variables regarding material properties.

予測モデルは、端末装置１００から取得した親データから予測結果を出力する機械学習モデルである。モデルデータベース２４１は、複数の予測モデルを格納してもよい。予測モデルは、例えば、線形回帰、ロジスティック回帰、サポートベクトルマシン（ＳＶＭ）、ニューラルネットなどモデル構築のアルゴリズムが異なったり、教師データの量、ラベリングの精度等が異なったり、説明変数の選択、前処理等が異なったり、学習プロセスのパラメータが異なっていたりしてもよい。 The prediction model is a machine learning model that outputs prediction results from parent data acquired from the terminal device 100. The model database 241 may store a plurality of prediction models. The prediction models may differ, for example, in the model-construction algorithm (linear regression, logistic regression, support vector machine (SVM), neural network, etc.), in the amount of training data or the labeling accuracy, in the selection or preprocessing of explanatory variables, or in the parameters of the learning process.

親データ記憶部２４２は、端末装置１００から取得した、暗号化された親データを格納する。また、親データ記憶部２４２は、後述する次世代生成部２５２において生成した子データを親データとして更新して格納する。端末装置１００から取得した親データは、すなわち、探索対象の材料データである。 The parent data storage unit 242 stores encrypted parent data obtained from the terminal device 100. Furthermore, the parent data storage unit 242 updates and stores child data generated by the next generation generation unit 252, which will be described later, as parent data. The parent data acquired from the terminal device 100 is, in other words, the material data to be searched.

制御部２５０は、演算装置２００の各機能を制御し、予め記憶部２４０に記憶されているプログラムに基づいて動作するＣＰＵ（Central Processing Unit）等のプロセッサである。 The control unit 250 is a processor such as a CPU (Central Processing Unit) that controls each function of the arithmetic device 200 and operates based on a program stored in the storage unit 240 in advance.

制御部２５０は、準同型暗号方式で暗号化されたデータを含む演算を行う「暗号演算部」として機能し、予測部２５１と、次世代生成部２５２と、特定困難化部２５３と、を有する。 The control unit 250 functions as a "cipher calculation unit" that performs calculations involving data encrypted using a homomorphic encryption method, and includes a prediction unit 251, a next generation generation unit 252, and an identification difficulty unit 253.

予測部２５１は、準同型暗号方式により暗号化された親データから、モデルデータベース２４１の予測モデルに基づいて、目的変数の予測結果を出力する。具体的には、予測部２５１は、端末装置１００において設定された目的変数に応じて、モデルデータベース２４１の予測モデルを選択する。そして、予測部２５１は、予測モデルを端末装置１００の公開鍵を用いて準同型暗号方式で暗号化する。次いで、予測部２５１は、暗号化した予測モデルに親データを入力し、目的変数の予測値を出力する。出力された予測値は、端末装置１００の公開鍵を用いて準同型暗号方式により暗号化されており、演算装置２００のユーザには、開示されない。 The prediction unit 251 outputs the prediction result of the objective variable from the parent data encrypted by the homomorphic encryption method, based on a prediction model in the model database 241. Specifically, the prediction unit 251 selects a prediction model from the model database 241 according to the objective variable set in the terminal device 100. The prediction unit 251 then encrypts the prediction model by the homomorphic encryption method using the public key of the terminal device 100. Next, the prediction unit 251 inputs the parent data into the encrypted prediction model and outputs the predicted value of the objective variable. The output predicted value is encrypted by the homomorphic encryption method using the public key of the terminal device 100, and is therefore not disclosed to the user of the arithmetic device 200.

次世代生成部２５２は、暗号化された複数の親データの中から所定の条件を満たす複数の親データに対して、遺伝アルゴリズムを適用して、複数の子データを生成する。所定の条件とは、例えば、評価値が上位１０位以内のデータであったり、評価値が所定の閾値以内のデータであったりしてもよい。また、それらの条件を満たすデータから、ランダムに選んだ親データについて遺伝アルゴリズムを適用してもよいし、ランダムに選んだ親データと、評価値に基づいて抽出した親データとを含めた複数の親データに対して遺伝アルゴリズムを適用してもよい。遺伝アルゴリズムを適用した子データの生成については、図１２において概要を説明するが、遺伝アルゴリズムは周知の技術であるので、ここでは詳細な説明は省略する。 The next generation generation unit 252 generates a plurality of child data by applying a genetic algorithm to a plurality of encrypted parent data that satisfy a predetermined condition. The predetermined condition may be, for example, data whose evaluation value is within the top 10, or data whose evaluation value is within a predetermined threshold. Additionally, genetic algorithms may be applied to randomly selected parent data from data that meet these conditions, or multiple sets of parent data, including randomly selected parent data and parent data extracted based on evaluation values, may be applied. A genetic algorithm may be applied to the parent data. The generation of child data using the genetic algorithm will be outlined in FIG. 12, but since the genetic algorithm is a well-known technique, detailed explanation will be omitted here.
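The two example selection conditions (top-ranked data, or data within a threshold) can be sketched as follows. This is a plaintext illustration only: in the described system the parent data are ciphertexts, and the function name `select_parents` and its parameters are hypothetical. A smaller evaluation value is treated as better because it is the distance from the target value.

```python
from typing import Dict, List, Optional


def select_parents(evaluation: Dict[int, float],
                   top_k: Optional[int] = None,
                   threshold: Optional[float] = None) -> List[int]:
    """Return material IDs ordered from best to worst that meet the condition."""
    ranked = sorted(evaluation, key=evaluation.get)
    if threshold is not None:
        ranked = [m for m in ranked if evaluation[m] <= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return ranked


evaluation_values = {11: 20, 13: 10, 14: 0}  # values stated for FIG. 8

print(select_parents(evaluation_values, top_k=2))      # [14, 13]
print(select_parents(evaluation_values, threshold=5))  # [14]
```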

特定困難化部２５３は、予測モデルへの入力（親データ）と出力（予測結果）の対応関係を、並び替える処理を行う。また、特定困難化部２５３は、並び替えられた対応関係を元に戻す処理（特定困難化処理の解除）を行う。並び替える処理を行うことにより、予測モデルに対する入力と出力との正しい対応関係を秘匿できるため、予測モデルが推定されることを防ぐことができる。 The identification difficulty unit 253 performs a process of rearranging the correspondence between the input (parent data) and the output (prediction result) to the prediction model. Further, the identification difficulty unit 253 performs processing to restore the rearranged correspondence relationships to their original state (cancellation of identification difficulty processing). By performing the rearrangement process, it is possible to hide the correct correspondence between the input and output of the prediction model, thereby preventing the prediction model from being estimated.

また、特定困難化部２５３は、複数の子データ（材料データ）に含まれる説明変数について対応関係を並び替える処理を行ってもよい。これにより、正しい子データ（入力）と出力との対応関係を秘匿できるため、予測モデルが推定されることを防ぐことができる。なお、並び替える処理の具体例については、図１０および図１１において説明する。 Further, the identification difficulty unit 253 may perform a process of rearranging the correspondence relationships for explanatory variables included in a plurality of child data (material data). This makes it possible to hide the correspondence between correct child data (input) and output, thereby preventing the prediction model from being estimated. Note that a specific example of the rearrangement process will be explained with reference to FIGS. 10 and 11.

図１０は、実施形態１における特定困難化処理の例を示す図である。図１０を参照して、特定困難化処理として、材料データの説明変数と目的変数との対応関係を並び替える処理について説明する。 FIG. 10 is a diagram illustrating an example of identification difficulty processing in the first embodiment. With reference to FIG. 10, a process of rearranging the correspondence between explanatory variables of material data and target variables will be described as a process to make identification difficult.

材料データ５１００は、材料の特徴に関する説明変数に対して、材料の特性に関する目的変数が対応しているデータセットである。材料ＩＤ５１１０と、材料ＩＤ５１２０は、同じ識別子を示している。例えば、材料ＩＤ「１」の説明変数と、材料ＩＤ「１」の目的変数とは対応関係にある。 The material data 5100 is a data set in which objective variables regarding material properties correspond to explanatory variables regarding material characteristics. Material ID 5110 and material ID 5120 indicate the same identifiers. For example, the explanatory variables of material ID "1" correspond to the objective variable of material ID "1".

材料データ５２００は、材料データ５１００に対して特定困難化処理として、説明変数と目的変数の対応関係を並び替える処理を行った後のデータセットである。材料ＩＤ５２１０は、材料ＩＤ５１１０と対応しており、同じ識別子を示す。材料ＩＤ５２２０は、材料ＩＤ５１２０の識別子（番号）を並び替えたものである。例えば、材料ＩＤ「１」の説明変数に対しては、材料ＩＤ「１」の目的変数（粘性「２０」）ではなく、材料ＩＤ「４」の目的変数（粘性「３０」）が対応付けられている。同様に、材料ＩＤ「２」の説明変数に対しては、材料ＩＤ「２」の目的変数（粘性「１０」）ではなく、材料ＩＤ「１」の目的変数（粘性「２０」）が対応付けられている。このように、特定困難化処理は、説明変数と目的変数との対応関係を並び替える。並び替えた後、例えば、仮材料ＩＤ５２３０として、新たに識別子を割り振ってもよい。 The material data 5200 is the data set obtained after the material data 5100 has undergone identification difficulty processing that rearranges the correspondence between explanatory variables and objective variables. Material ID 5210 corresponds to material ID 5110 and indicates the same identifiers. Material ID 5220 is obtained by rearranging the identifiers (numbers) of material ID 5120. For example, the explanatory variables of material ID "1" are associated not with the objective variable of material ID "1" (viscosity "20") but with the objective variable of material ID "4" (viscosity "30"). Similarly, the explanatory variables of material ID "2" are associated not with the objective variable of material ID "2" (viscosity "10") but with the objective variable of material ID "1" (viscosity "20"). In this manner, the identification difficulty processing rearranges the correspondence between explanatory variables and objective variables. After the rearrangement, new identifiers may be assigned, for example, as temporary material ID 5230.

予測結果を端末装置１００に送信する際は、並び替えられた予測結果を送信することで、端末装置１００では、材料ＩＤについての説明変数に対応する目的変数の正しい予測値を特定することができない。すなわち、モデル利用者である端末装置１００のユーザは、説明変数および目的変数からなる材料データの取得ができないため、これらの材料データに基づいて、予測モデルを推測することは困難である。したがって、モデル提供者は、モデル利用者に予測モデルを推測されることなく、予測モデルを利用させることができる。 When transmitting the prediction results to the terminal device 100, by transmitting the rearranged prediction results, the terminal device 100 cannot identify the correct predicted value of the objective variable corresponding to the explanatory variable for the material ID. . That is, since the user of the terminal device 100 who is a model user cannot acquire material data consisting of explanatory variables and objective variables, it is difficult to infer a predictive model based on these material data. Therefore, the model provider can allow the model user to use the predictive model without having the model user guess the predictive model.

また、特定困難化部２５３は、材料ＩＤ５２２０と、仮材料ＩＤ５２３０との対応関係を記憶しておくことで、特定困難化処理を解除することができる。 Further, the identification difficulty making unit 253 can cancel the identification difficulty processing by storing the correspondence relationship between the material ID 5220 and the temporary material ID 5230.
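The rearrangement of FIG. 10 and its cancellation can be sketched as a permutation that is stored and later inverted. The function names `obfuscate` and `restore` are hypothetical, the values are shown in plaintext for readability (in the described system they would be ciphertexts), and the disclosure does not prescribe how the rearrangement is chosen; a random shuffle is assumed here.

```python
import random
from typing import List, Sequence, Tuple


def obfuscate(values: Sequence[int], rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle the prediction results and return them with the stored permutation."""
    perm = list(range(len(values)))
    rng.shuffle(perm)
    # Position i of the output holds the value that was at position perm[i].
    return [values[i] for i in perm], perm


def restore(shuffled: Sequence[int], perm: Sequence[int]) -> List[int]:
    """Undo the rearrangement using the stored correspondence."""
    original = [0] * len(shuffled)
    for position, source in enumerate(perm):
        original[source] = shuffled[position]
    return original


# Viscosity values stated for material IDs "1", "2" and "4" in FIG. 10.
predictions = [20, 10, 30]
shuffled, stored_perm = obfuscate(predictions, random.Random(0))
print(restore(shuffled, stored_perm))  # [20, 10, 30]
```

Only the party holding `stored_perm` (the identification difficulty unit in the description) can map a shuffled prediction back to its input.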

図１１は、実施形態１における特定困難化処理の例を示す図である。図１１を参照して、特定困難化処理として、材料データの説明変数の対応関係を並び替える処理について説明する。 FIG. 11 is a diagram illustrating an example of the identification difficulty processing in the first embodiment. With reference to FIG. 11, a process of rearranging the correspondence of explanatory variables of material data will be described as a process to make identification difficult.

材料データ５３００は、材料の特徴に関する説明変数から構成される親データのデータセットである。材料データ５４００は、材料データ５３００の説明変数に対し特定困難化処理が実行されたデータセットである。 Material data 5300 is a dataset of parent data composed of explanatory variables related to material characteristics. Material data 5400 is a data set in which identification difficulty processing has been performed on the explanatory variables of material data 5300.

特定困難化処理として、材料データ５３００に対し、各材料ＩＤの説明変数「原料Ａ」に対応する値を並び替える処理を行う。例えば、材料ＩＤ「２１」～材料ＩＤ「２５」の「原料Ａ」に対応する値は、それぞれ「８０」，「１０」，「２０」，「７０」，「３０」であるが、並び替えにより、「原料Ａ」に対応する値を、それぞれ「１０」，「８０」，「７０」，「３０」，「２０」とする。同様に、材料ＩＤ「２１」～材料ＩＤ「２５」の「原料Ｂ」に対応する値は、それぞれ「１０」，「８０」，「２０」，「３０」，「７０」であるが、並び替えにより、「原料Ｂ」に対応する値を、それぞれ「２０」，「３０」，「１０」，「７０」，「８０」とする。 As identification difficulty processing, processing is performed for the material data 5300 to rearrange the values corresponding to the explanatory variable "raw material A" of each material ID. For example, the values corresponding to "raw material A" of material ID "21" to material ID "25" are "80", "10", "20", "70", and "30", respectively. Accordingly, the values corresponding to "raw material A" are set to "10", "80", "70", "30", and "20", respectively. Similarly, the values corresponding to "raw material B" of material ID "21" to material ID "25" are "10", "80", "20", "30", and "70", respectively. By changing, the values corresponding to "raw material B" are set to "20", "30", "10", "70", and "80", respectively.

このように、各説明変数の対応関係を並び替えると、材料データ５４００で示すデータセットとなる。すなわち、例えば、材料ＩＤ「２１’」で識別される材料データは、「原料Ａ」が「１０」、「原料Ｂ」が「２０」、「原料Ｃ」が「８０」、…、「添加剤Ｘ」が「０」、「添加剤Ｙ」が「６」、「添加剤Ｚ」が「０」から構成される。材料データ５３００に対して特定困難化処理を実行することにより、材料ＩＤ「２１」で識別される材料データは、材料データ５４００からは特定が困難になる。 Rearranging the correspondence of each explanatory variable in this way yields the data set shown as material data 5400. For example, the material data identified by material ID "21'" consists of "raw material A" of "10", "raw material B" of "20", "raw material C" of "80", ..., "additive X" of "0", "additive Y" of "6", and "additive Z" of "0". By performing the identification difficulty processing on the material data 5300, the material data identified by material ID "21" becomes difficult to identify from the material data 5400.

なお、図１１の例では、材料データ５３００の全ての説明変数の対応関係について並び替えを行ったが、各材料データの特定が困難になるのであれば、一部の説明変数についてのみ特定困難化処理を実行するようにしてもよい。また、特定困難化部２５３は、各説明変数について並び替えた材料ＩＤとの対応関係を記憶しておくことで、特定困難化処理を解除することができる。 In addition, in the example of FIG. 11, the correspondence of all the explanatory variables of the material data 5300 was rearranged, but if it becomes difficult to identify each material data, it may be necessary to make it difficult to identify only some of the explanatory variables. Processing may also be executed. Furthermore, the identification difficulty making unit 253 can cancel the identification difficulty processing by storing the correspondence relationship with the rearranged material IDs for each explanatory variable.
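The per-variable rearrangement of FIG. 11 can be sketched by shuffling each column independently and keeping one permutation per column so that the processing can be cancelled. The names `shuffle_columns` and `restore_columns` are hypothetical, a random shuffle is an assumption, and only the two columns whose original values are given in the text are used.

```python
import random
from typing import Dict, List, Tuple

Table = Dict[str, List[int]]


def shuffle_columns(table: Table, rng: random.Random) -> Tuple[Table, Dict[str, List[int]]]:
    """Shuffle each explanatory-variable column independently."""
    shuffled: Table = {}
    perms: Dict[str, List[int]] = {}
    for name, column in table.items():
        perm = list(range(len(column)))
        rng.shuffle(perm)
        perms[name] = perm
        shuffled[name] = [column[i] for i in perm]
    return shuffled, perms


def restore_columns(shuffled: Table, perms: Dict[str, List[int]]) -> Table:
    """Cancel the processing using the stored per-column correspondence."""
    restored: Table = {}
    for name, column in shuffled.items():
        original = [0] * len(column)
        for position, source in enumerate(perms[name]):
            original[source] = column[position]
        restored[name] = original
    return restored


# Values of material IDs "21" to "25" before the rearrangement (FIG. 11).
material_5300 = {
    "raw_material_A": [80, 10, 20, 70, 30],
    "raw_material_B": [10, 80, 20, 30, 70],
}
material_5400, column_perms = shuffle_columns(material_5300, random.Random(1))
print(restore_columns(material_5400, column_perms) == material_5300)  # True
```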

図１２は、遺伝アルゴリズムの適用例を示す図である。図１２を参照して、材料データ６１００の説明変数「添加剤Ｘ」、「添加剤Ｙ」、「添加剤Ｚ」を最適化する場合において、遺伝アルゴリズムにより親データ（材料ＩＤ「１」および材料ＩＤ「５」）から子データ（材料ＩＤ「Ｃ１」および材料ＩＤ「Ｃ２」）を生成する処理について説明する。 FIG. 12 is a diagram showing an example of application of the genetic algorithm. With reference to FIG. 12, a description is given of the process of generating child data (material ID "C1" and material ID "C2") from parent data (material ID "1" and material ID "5") by the genetic algorithm when optimizing the explanatory variables "Additive X", "Additive Y", and "Additive Z" of material data 6100.

遺伝アルゴリズムでは、親世代の個体（各材料データ）Ｎ個の中から、二つを選択して交叉を行う、または、一つを選択して突然変異を行うか、そのままコピーするという処理を子世代の個体数がＮ個になるまで繰り返される。そして、子世代がＮ個になれば、子世代を親世代として更新し、同じ処理を繰り返す。例えば、所定の回数まで繰り返したのち、親世代の中で、評価値が良い（つまり、目標値に近い。）個体を「解」として出力する。 In the genetic algorithm, the following process is repeated until the number of individuals in the child generation reaches N: from the N individuals (each piece of material data) of the parent generation, either two are selected and crossed over, or one is selected and either mutated or copied as is. Once the child generation reaches N individuals, the child generation replaces the parent generation and the same process is repeated. For example, after a predetermined number of repetitions, the individuals in the parent generation with good evaluation values (that is, close to the target value) are output as "solutions".
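The generation loop described above can be sketched in plaintext as follows. The function name `next_generation`, the operator probabilities `p_cross` and `p_mut`, the mutation range, and the example population are all assumptions made for illustration; the description leaves these details to well-known genetic algorithm practice, and in the described system the individuals are ciphertexts.

```python
import random
from typing import List

Individual = List[int]


def next_generation(parents: List[Individual], rng: random.Random,
                    p_cross: float = 0.6, p_mut: float = 0.2,
                    low: int = 0, high: int = 10) -> List[Individual]:
    """Build N children from N parents by crossover, mutation, or plain copy."""
    n = len(parents)
    width = len(parents[0])
    children: List[Individual] = []
    while len(children) < n:
        u = rng.random()
        if u < p_cross and n >= 2 and width > 1:
            # Select two parents and exchange the data after a random point.
            a, b = rng.sample(parents, 2)
            point = rng.randrange(1, width)
            children.append(a[:point] + b[point:])
            children.append(b[:point] + a[point:])
        elif u < p_cross + p_mut:
            # Select one parent and mutate a single value.
            child = list(rng.choice(parents))
            child[rng.randrange(width)] = rng.randint(low, high)
            children.append(child)
        else:
            # Select one parent and copy it as is.
            children.append(list(rng.choice(parents)))
    return children[:n]


rng = random.Random(42)
population = [[4, 0, 8], [2, 3, 1], [0, 5, 2], [7, 2, 0]]  # hypothetical additive values
for _ in range(3):  # the child generation replaces the parent generation each round
    population = next_generation(population, rng)
print(len(population))  # 4
```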

図１２の例において、材料ＩＤ「１」および材料ＩＤ「５」の材料データについて、一点交叉する場合、まず、交叉する場所をランダムに選ぶ。材料データ６１００では、添加剤Ｘと添加剤Ｙの間を交叉点Ｌ１とする。 In the example of FIG. 12, when the material data of material ID "1" and material ID "5" intersect at one point, first, the intersecting location is randomly selected. In the material data 6100, the intersection point L1 is between the additive X and the additive Y.

そして、交叉点Ｌ１より後ろのデータを入れ替える。材料データ６１００において、原料Ａから添加剤Ｚへデータが並んでいるとすると、交叉点Ｌ１より後ろにある、材料ＩＤ「１」の「添加剤Ｙ」と「添加剤Ｚ」の値を、材料ＩＤ「５」の「添加剤Ｙ」と「添加剤Ｚ」の値と入れ替える。 Then, data after the intersection point L1 is replaced. In the material data 6100, if data is arranged from raw material A to additive Z, the values of "additive Y" and "additive Z" of material ID "1" located after the intersection point L1 are Replace the values of "Additive Y" and "Additive Z" with ID "5".

材料データ６２００は、交叉により生成された子データ（材料ＩＤ「Ｃ１」および材料ＩＤ「Ｃ５」）である。交叉により、「添加剤Ｘ」、「添加剤Ｙ」、「添加剤Ｚ」について、材料ＩＤ「１」および材料ＩＤ「５」とは異なる組成の材料データが生成されている。 Material data 6200 is child data (material ID "C1" and material ID "C5") generated by crossover. Due to the crossover, material data of compositions different from those of material ID "1" and material ID "5" is generated for "additive X", "additive Y", and "additive Z".
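One-point crossover as described for FIG. 12 can be sketched as follows. The values for material ID "1" are those given for FIG. 5 (with the columns elided by "…" omitted); the values for material ID "5" are not stated in the text and are hypothetical, as is the function name `one_point_crossover`. The example is in plaintext, whereas the described system operates on encrypted cells.

```python
from typing import List, Tuple

# Column order: raw materials A, B, C, then additives X, Y, Z.
COLUMNS = ["A", "B", "C", "X", "Y", "Z"]


def one_point_crossover(parent_a: List[int], parent_b: List[int],
                        point: int) -> Tuple[List[int], List[int]]:
    """Swap all values located after the crossover point between two parents."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


parent_1 = [80, 0, 0, 4, 0, 8]   # material ID "1" (values from FIG. 5)
parent_5 = [60, 10, 0, 2, 3, 1]  # material ID "5" (hypothetical values)

# Crossover point L1 lies between additive X and additive Y.
point_l1 = COLUMNS.index("Y")
child_a, child_b = one_point_crossover(parent_1, parent_5, point_l1)
print(child_a)  # [80, 0, 0, 4, 3, 1]
print(child_b)  # [60, 10, 0, 2, 0, 8]
```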

（データ生成システム１における処理） (Processing in data generation system 1)
図１３は、データ生成システム１における処理を示すシーケンス図である。図１３を参照して、データ生成システム１において、端末装置１００に入力された探索対象の材料データ（親データ）から次世代の子データを生成（次世代生成ステップ）し、子データを親データとして更新（更新ステップ）する処理について説明する。 FIG. 13 is a sequence diagram showing processing in the data generation system 1. With reference to FIG. 13, a description is given of the processing in the data generation system 1 in which next-generation child data is generated from the material data to be searched (parent data) input to the terminal device 100 (next generation generation step), and the child data is then stored as the updated parent data (update step).

図１３では、端末装置１００、および演算装置２００（特に、予測部２５１、次世代生成部２５２、特定困難化部２５３）での処理の一例を示している。 FIG. 13 shows an example of processing in the terminal device 100 and the arithmetic device 200 (in particular, the prediction unit 251, the next generation generation unit 252, and the identification difficulty unit 253).

ステップＳ１００において、データ生成システム１では、パラメータ等の設定が行われる。具体的には、端末装置１００の鍵生成部１５１は、公開鍵および秘密鍵を生成し、記憶部１４０の鍵記憶部１４１に記憶する。また、端末装置１００は、生成した公開鍵を演算装置２００に送信し、演算装置２００は、公開鍵を記憶部２４０に記憶する。 In step S100, parameters and the like are set in the data generation system 1. Specifically, the key generation unit 151 of the terminal device 100 generates a public key and a private key, and stores them in the key storage unit 141 of the storage unit 140. Further, the terminal device 100 transmits the generated public key to the computing device 200, and the computing device 200 stores the public key in the storage unit 240.

また、端末装置１００は、材料データの探索に必要なパラメータを取得する。パラメータは、例えば、目的変数や目標値、最適化したい説明変数である。また、演算装置２００における次世代生成ステップと更新ステップを繰り返すステップの実行回数や、アルゴリズムの最大実行時間などの停止条件であってもよい。なお、停止条件は、予め、データ生成システム１の開発者側で最適な条件を設定しておき、適宜、端末装置１００のユーザによって変更できるようにしてもよい。 The terminal device 100 also acquires parameters necessary for searching for material data. The parameters are, for example, objective variables, target values, and explanatory variables to be optimized. Alternatively, the stop condition may be the number of times the arithmetic device 200 repeats the next generation generation step and the update step, or the maximum execution time of an algorithm. Note that the optimum stop conditions may be set in advance by the developer of the data generation system 1, and may be changed by the user of the terminal device 100 as appropriate.

また、端末装置１００は、探索対象の複数の材料データを、複数の親データとして取得し、記憶部１４０の材料データベース１４２に記憶する。 Further, the terminal device 100 acquires a plurality of material data to be searched as a plurality of parent data and stores them in the material database 142 of the storage unit 140.

ステップＳ１０１において、端末装置１００の暗復号部１５２は、材料データベース１４２から探索対象の複数の親データを読み出し、生成した公開鍵を用いて準同型暗号方式で暗号化する。親データは、説明変数単位（セル単位）で暗号化される。 In step S101, the encryption/decryption unit 152 of the terminal device 100 reads a plurality of parent data to be searched from the material database 142, and encrypts them using the homomorphic encryption method using the generated public key. Parent data is encrypted in explanatory variable units (cell units).

ステップＳ１０２において、演算装置２００の通信部２１０は、端末装置１００が暗号化した複数の親データを取得し、親データ記憶部２４２は、取得した複数の親データを記憶する。 In step S102, the communication unit 210 of the computing device 200 acquires a plurality of parent data encrypted by the terminal device 100, and the parent data storage unit 242 stores the acquired plurality of parent data.

In step S103, the prediction unit 251 of the arithmetic device 200 predicts the material properties. Specifically, the prediction unit 251 reads from the model database 241 the prediction model corresponding to the objective variable set in step S100. The prediction unit 251 then inputs the plurality of parent data, which are the material data to be searched, into the prediction model and outputs a prediction result (predicted value) for each parent data. The prediction model is encrypted in advance with the public key acquired from the terminal device 100 (that is, the public key used to encrypt the parent data) so that it lies in the same ciphertext space as the encrypted parent data. This allows the prediction results to be computed efficiently.

In step S104, the identification difficulty unit 253 acquires the plurality of prediction results output by the prediction unit 251.

In step S105, the identification difficulty unit 253 performs identification difficulty processing on the output prediction results, which makes it difficult to identify which prediction result corresponds to which parent data. Specifically, as shown in FIG. 10, the identification difficulty unit 253 rearranges the correspondence between the inputs (explanatory variables) and the outputs (objective variable) of the prediction model. The identification difficulty unit 253 stores the rearranged correspondence.

In step S106, the communication unit 110 of the terminal device 100 acquires, via the communication unit 210 of the arithmetic device 200, the plurality of prediction results on which the identification difficulty processing has been performed.

In step S107, the encryption/decryption unit 152 of the terminal device 100 reads the secret key from the key storage unit 141 and uses it to decrypt the acquired prediction results on which the identification difficulty processing has been performed. The prediction results are disclosed to the user of the terminal device 100, but because the correspondence between the explanatory variables of the parent data and the objective variable (prediction results) was rearranged in step S105, the user cannot tell which parent data each prediction result corresponds to. In other words, because the inputs and outputs of the prediction model are not disclosed to the user in their correct correspondence, it is difficult for the user to infer the prediction model. The provider of the prediction model can thus let model users make use of the prediction model while keeping it confidential.

In step S108, the evaluation value calculation unit 153 of the terminal device 100 calculates, from the decrypted prediction results, an evaluation value of each prediction result with respect to the target value. Specifically, the evaluation value calculation unit 153 calculates the distance between the target value set in step S100 and each prediction result (predicted value). In this embodiment the absolute difference is used as the distance, but any measure that indicates how far the predicted value is from the target value may be used.
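As a non-limiting illustration, the evaluation in step S108 may be sketched in Python as follows. The function name and the sample values are hypothetical and are not taken from the disclosure; only the use of the absolute difference follows the embodiment.

```python
def evaluation_values(predictions, target):
    """Return |prediction - target| for each decrypted predicted value.

    A smaller evaluation value means the prediction is closer to the target.
    """
    return [abs(p - target) for p in predictions]


# Hypothetical decrypted predicted values and target value.
decrypted_predictions = [20.0, 12.5, 50.0]
print(evaluation_values(decrypted_predictions, target=20.0))  # [0.0, 7.5, 30.0]
```

Any other distance, for example the squared difference, could be substituted in the list comprehension without changing the surrounding flow.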

In step S109, the identification difficulty unit 253 of the arithmetic device 200 acquires, via the communication unit 210, the plurality of evaluation values calculated in step S108. The evaluation values are transmitted to the arithmetic device 200 in a form that shows which prediction result each was calculated from. For example, an identifier corresponding to the ID of the prediction result may be attached to each evaluation value and transmitted to the arithmetic device 200 together with the correspondence between evaluation values and prediction results.

Because the evaluation values are not encrypted at the terminal device 100, they are disclosed to the user of the arithmetic device 200. However, neither the explanatory variables of the material data input to the prediction model nor the prediction results can be derived from the evaluation values. That is, the user of the arithmetic device 200 is not told which features (explanatory variables) of the material data a given property (objective variable) relates to, so it is difficult for that user to obtain the material data being predicted. The model user can therefore use the prediction model while keeping the material data to be searched secret from the model provider.

In step S110, the identification difficulty unit 253 of the arithmetic device 200 cancels the identification difficulty processing on the prediction results that correspond to the acquired evaluation values. Specifically, the identification difficulty unit 253 reverses the rearrangement performed in step S105, restoring the correct correspondence between the explanatory variables of the parent data and the prediction results.

In step S111, the next generation generation unit 252 of the arithmetic device 200 acquires the evaluation values that correspond to the parent data by way of the prediction results whose identification difficulty processing has been canceled. Each parent data corresponds to a prediction result, and each prediction result corresponds to the evaluation value calculated from it, so the evaluation value for each parent data can be obtained through the prediction result.
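The round trip of steps S105 to S111 may be sketched as follows, purely as an illustration. The sketch assumes that the rearrangement is a random permutation recorded by the arithmetic device; the disclosure refers to FIG. 10 for the actual rearrangement, so the permutation representation, the function names, and the plain numbers standing in for ciphertexts are all assumptions.

```python
import random


def shuffle_with_record(values, rng):
    """Return the values in random order together with the permutation used.

    perm[i] is the original index of the item now at position i.
    """
    perm = list(range(len(values)))
    rng.shuffle(perm)
    return [values[i] for i in perm], perm


def restore_order(shuffled_values, perm):
    """Undo shuffle_with_record by putting each item back at its original index."""
    restored = [None] * len(perm)
    for position, original_index in enumerate(perm):
        restored[original_index] = shuffled_values[position]
    return restored


rng = random.Random(0)
parent_ids = ["p0", "p1", "p2", "p3"]      # hypothetical parent data IDs
predictions = [20.0, 12.5, 50.0, 28.0]     # stand-ins for encrypted predictions
target = 20.0

# Arithmetic device (S105): rearrange and remember the permutation.
shuffled, perm = shuffle_with_record(predictions, rng)
# Terminal device (S107 to S108): decrypt and evaluate in the shuffled order.
shuffled_evals = [abs(p - target) for p in shuffled]
# Arithmetic device (S110 to S111): restore the order and pair with the parents.
parent_to_eval = dict(zip(parent_ids, restore_order(shuffled_evals, perm)))
print(parent_to_eval)
```

Because only the arithmetic device holds `perm`, the terminal device sees the evaluation inputs without knowing which parent each belongs to, which is the property the steps above rely on.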

In step S112, the next generation generation unit 252 generates the next generation. Specifically, it extracts the parent data whose evaluation values satisfy a predetermined condition and applies a genetic algorithm to the extracted parent data to generate a plurality of child data. The predetermined condition may be, for example, that the evaluation value ranks in the top 10 or that the evaluation value is within a predetermined threshold. An example of applying the genetic algorithm is shown in FIG. 11. Because the material data is encrypted per explanatory variable, the generated child data is not disclosed to the user of the arithmetic device 200.
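The selection and generation of step S112 may be illustrated by the following plaintext sketch. The single-point crossover and Gaussian mutation are generic genetic-algorithm operators chosen here as assumptions; they are not necessarily the operators of FIG. 11, and in the disclosed system each cell would be a ciphertext rather than a number. All function and parameter names are hypothetical.

```python
import random


def select_parents(parents, evals, top_k):
    """Keep the top_k parents with the smallest evaluation values."""
    ranked = sorted(zip(evals, range(len(parents))))
    return [parents[i] for _, i in ranked[:top_k]]


def crossover(a, b, rng):
    """Single-point crossover of two equal-length variable vectors (length >= 2)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(child, rng, rate=0.1, scale=0.05):
    """Perturb each variable with probability `rate` by Gaussian noise."""
    return [x + rng.gauss(0, scale) if rng.random() < rate else x for x in child]


def next_generation(parents, evals, n_children, top_k, rng):
    """Select good parents (top_k >= 2), then breed n_children from random pairs."""
    selected = select_parents(parents, evals, top_k)
    children = []
    for _ in range(n_children):
        a, b = rng.sample(selected, 2)
        children.append(mutate(crossover(a, b, rng), rng))
    return children


rng = random.Random(0)
parents = [[0.1, 0.5, 0.4], [0.3, 0.3, 0.4], [0.6, 0.2, 0.2], [0.2, 0.2, 0.6]]
evals = [5.0, 0.5, 2.0, 9.0]
children = next_generation(parents, evals, n_children=4, top_k=2, rng=rng)
```

Crossover only moves whole cells from one record to another, so under the assumption of per-variable encryption it needs no decryption; a mutation that changes a value would instead require a homomorphic operation on the ciphertext.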

In step S113, the identification difficulty unit 253 acquires the plurality of generated child data.

In step S114, the identification difficulty unit 253 performs identification difficulty processing on the explanatory variables of the generated child data. Specifically, as described with reference to FIG. 11, it rearranges the correspondence of the explanatory variables contained in each child data.

In step S115, the communication unit 110 of the terminal device 100 acquires the plurality of child data on which the identification difficulty processing has been performed.

In step S116, the encryption/decryption unit 152 of the terminal device 100 re-encrypts the plurality of child data. Specifically, it reads the secret key from the key storage unit 141 and uses it to decrypt the child data acquired in step S115. It then reads the public key from the key storage unit 141 and re-encrypts the decrypted child data with that public key using the homomorphic encryption scheme. The child data is encrypted per explanatory variable.

The re-encryption in step S116 is needed to control the error that accumulates during computation on homomorphically encrypted data and to keep the computation efficient. The child data is decrypted in step S116 for this purpose, but because identification difficulty processing was applied to it in step S114, the user of the terminal device 100 cannot identify the child data even in decrypted form.

In step S117, the identification difficulty unit 253 of the arithmetic device 200 acquires the re-encrypted child data via the communication unit 210.

In step S118, the identification difficulty unit 253 cancels the identification difficulty processing on the explanatory variables of the child data. Specifically, it reverses the rearrangement of the explanatory variables performed in step S114.

In step S119, the next generation generation unit 252 of the arithmetic device 200 acquires the child data whose identification difficulty processing has been canceled.

In step S120, the next generation generation unit 252 updates the parent data with the child data. Specifically, it causes the parent data storage unit 242 to store the child data acquired in step S119. That is, the parent data storage unit 242 stores the plurality of child data as the plurality of encrypted parent data.

In step S121, the prediction unit 251 of the arithmetic device 200 acquires the plurality of encrypted parent data.

As described above, steps S103 to S119 constitute the next-generation generation step, in which the control unit 250 of the arithmetic device 200 and the control unit 150 of the terminal device 100 generate a plurality of child data based on the prediction results output when the encrypted parent data are input to the machine learning model. Step S120 is the update step, in which the parent data storage unit 242 of the arithmetic device 200 stores the generated child data as the encrypted parent data.

From step S121 onward, the processing of steps S103 to S120 is repeated, so that successive generations of child data are generated.
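The repetition, together with the stop conditions set in step S100 (a repetition count or a maximum execution time), may be sketched as follows. The class `StopCondition`, the function `run_search`, and the callback `generate_children`, which stands in for the whole of steps S103 to S119, are hypothetical names introduced only for this illustration.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopCondition:
    max_generations: Optional[int] = None   # repetition count, if set
    max_seconds: Optional[float] = None     # maximum execution time, if set

    def __post_init__(self):
        if self.max_generations is None and self.max_seconds is None:
            raise ValueError("at least one stop condition is required")

    def reached(self, generation, started_at):
        """Return True once either configured limit has been hit."""
        if self.max_generations is not None and generation >= self.max_generations:
            return True
        if self.max_seconds is not None:
            return time.monotonic() - started_at >= self.max_seconds
        return False


def run_search(parents, generate_children, stop):
    """Alternate the next-generation generation step and the update step."""
    started_at = time.monotonic()
    generation = 0
    while not stop.reached(generation, started_at):
        children = generate_children(parents)  # stands in for S103 to S119
        parents = children                     # update step (S120)
        generation += 1
    return parents, generation


# Toy callback: each "generation" just increments every value.
final_parents, n = run_search([1, 2], lambda ps: [p + 1 for p in ps],
                              StopCondition(max_generations=3))
print(final_parents, n)  # [4, 5] 3
```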

The re-encryption in step S116 may be omitted when, depending on the homomorphic encryption algorithm, the processing performance of the arithmetic device 200, and so on, the homomorphic computation can be completed in a practical amount of time. When it is omitted, the child data generated by the next generation generation unit 252 in step S112 are used directly to update the parent data in step S120. Because the terminal device 100 then performs no re-encryption (the child data is never decrypted), identification difficulty processing to conceal the child data is unnecessary, and steps S113 to S119 can be omitted.

FIG. 14 is a sequence diagram showing processing in the data generation system 1. With reference to FIG. 14, the processing that outputs the searched material data, performed after the processing shown in FIG. 13, is described.

In step S200, the next-generation generation step and the update step are repeated between the terminal device 100 and the arithmetic device 200. The number of repetitions may be input as a setting parameter in step S100 of FIG. 13.

In step S201, the prediction unit 251 of the arithmetic device 200 acquires the parent data updated in step S200 (that is, the generated child data), inputs them to the prediction model, and outputs prediction results.

In step S202, the communication unit 110 of the terminal device 100 acquires, as the result of the search by the genetic algorithm, the parent data input to the prediction model in step S201 and the prediction results corresponding to those parent data.

In step S203, the encryption/decryption unit 152 of the terminal device 100 reads the secret key from the key storage unit 141 and decrypts the acquired parent data and prediction results.

Through the above processing, the user of the terminal device 100 obtains the searched material data and the predicted values of the objective variable predicted from that material data.

In the data generation system 1 according to this embodiment the search target is material data, but other search targets are also possible, for example parameters relating to design (shape, appearance, dimensions, etc.) and specifications (functionality, performance, standards, etc.) in the manufacturing process of industrial products, to delivery route optimization, or to demand forecasting.

(Explanation of effects)
According to the data generation method of the present invention, unknown material data is searched for using homomorphic encryption, which allows computation on data while it remains encrypted, together with material data generated by a genetic algorithm and a prediction model that predicts the properties of the material data. The survival selection on the parent generation population in the genetic algorithm is performed after processing that makes it difficult to identify the correspondence between the inputs (data to be predicted) and outputs (prediction results) of the prediction model, and the child generation population is generated based on the evaluation values of the prediction results. This prevents the model user from inferring the prediction model, so the provider of the prediction model can let model users make use of it while keeping it confidential.

Further, the genetic algorithm can generate the next-generation population efficiently, reducing computation cost.

In addition, the model user can use the prediction model without disclosing the data to be predicted to the model provider (or platform operator).

Further, in the data generation method according to the present invention, the identification difficulty processing is realized by rearranging correspondence relationships. This keeps sensitive data concealed at a lower computation cost than encrypting it with an encryption key.

Further, in the data generation method according to the present invention, identification difficulty processing is applied to the explanatory variables of the child data before the child data is decrypted for re-encryption, the re-encryption being needed to control the error accumulated during homomorphic computation and to keep the computation efficient. As a result, the model user cannot identify the child data even after decrypting it. The intermediate child-generation data produced by the genetic algorithm can therefore be kept secret from the model user, which prevents the prediction model from being inferred from the child-generation data and the prediction results.

<Embodiment 2>
In Embodiment 1, the identification difficulty processing rearranged correspondence relationships. In the data generation system 2 according to Embodiment 2, the identification difficulty processing adds noise.

The data generation system 2 includes an arithmetic device 300 in place of the arithmetic device 200 of the data generation system 1 according to Embodiment 1. The identification difficulty processing of this embodiment is described first, followed by the functional configuration of the arithmetic device 300.

FIG. 15 is a diagram showing an example of the identification difficulty processing in Embodiment 2. For simplicity, FIG. 15 shows a case in which the only objective variable of the material data is "viscosity". With reference to FIG. 15, processing that adds noise to the objective variable of the material data is described as the identification difficulty processing.

Table 7000 shows how the predicted values and evaluation values of the objective variable "viscosity" change with noise addition and noise removal for the material data identified by material IDs "31" to "35".

Column 7100 shows the predicted value of the objective variable "viscosity" for each of the material data identified by material IDs "31" to "35". No noise has been added to the predicted values in column 7100.

Column 7200 shows the noise values added to the predicted values for material IDs "31" to "35". For example, noise of "+10" is added to the predicted value "20" of material ID "31", and noise of "-2" is added to the predicted value "30" of material ID "34".

Column 7300 shows the predicted values after noise has been added. For example, adding noise of "+10" to the predicted value "20" of material ID "31" gives a noisy predicted value of "30".

Column 7400 shows the evaluation value of each noisy predicted value with respect to the target value when the target value is "20". In this embodiment, the evaluation value is the distance (absolute difference) between the target value and the predicted value. The absolute difference is only one example; other well-known evaluation methods may be used.

For example, the distance between the noisy predicted value "30" of material ID "31" and the target value "20" is "10" (= |30 - 20|). Likewise, the distance between the noisy predicted value "2" of material ID "32" and the target value "20" is "18" (= |2 - 20|).

Column 7500 shows the evaluation values after the noise has been removed from the noisy evaluation values. That is, each value in column 7500 is obtained by removing the corresponding noise value in column 7200 from the noisy evaluation value in column 7400. For example, removing the noise "+5" from the noisy evaluation value "35" of material ID "33" gives an evaluation value of "30".
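The arithmetic of Table 7000 may be reproduced by the following sketch for the two rows whose predicted value and noise are both stated in the text (material IDs "31" and "34"). The function names are hypothetical, and the noisy values for material ID "34" are computed here rather than quoted from the table. The sketch assumes that "removing" the noise means subtracting it from the evaluation value, which is consistent with the example given for material ID "33".

```python
TARGET = 20  # target value used in the description of Table 7000


def noisy_evaluation(predicted, noise, target=TARGET):
    """Evaluation value computed from the noisy predicted value (column 7400)."""
    return abs((predicted + noise) - target)


def remove_noise(noisy_eval, noise):
    """Subtract the stored noise from the evaluation value (column 7500)."""
    return noisy_eval - noise


rows = {"31": (20, +10), "34": (30, -2)}  # material ID -> (predicted, noise)
for material_id, (predicted, noise) in rows.items():
    e_noisy = noisy_evaluation(predicted, noise)
    print(material_id, predicted + noise, e_noisy, remove_noise(e_noisy, noise))
# 31 30 10 0
# 34 28 8 10
```

It may be noted that plain subtraction recovers |predicted - target| in these rows because both the predicted value and the noisy predicted value are at or above the target. When both lie below the target the sign of the correction flips, a case whose handling is not spelled out in the text.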

(Functional configuration of arithmetic device 300)
FIG. 16 is a block diagram showing the functional configuration of the arithmetic device 300. The functional configuration of the arithmetic device 300 according to this embodiment is described with reference to FIG. 16. Components shared with the arithmetic device 200 according to Embodiment 1 are given the same reference numerals, and their description is not repeated.

The arithmetic device 300 includes a control unit 350 in place of the control unit 250 of the arithmetic device 200. The control unit 350 includes the prediction unit 251, the next generation generation unit 252, and an identification difficulty unit 353.

The identification difficulty unit 353 adds noise to the predicted value output by the prediction unit 251 for each of the parent data (material data). It also removes the noise that was added to the predicted value from the evaluation value calculated at the terminal device 100 (cancellation of the identification difficulty processing). Adding noise prevents the predicted value corresponding to each parent data from being identified. Noise is added to the predicted values by operating on homomorphically encrypted values. The evaluation values calculated at the terminal device 100, by contrast, are plaintext, so noise is removed from them by operating on plaintext values.

The identification difficulty unit 353 also adds noise to the explanatory variables contained in the child data (material data). This prevents the features (explanatory variables) of the child data from being identified. Noise is added to and removed from the explanatory variables by operating on homomorphically encrypted values.

A specific example of the processing of the identification difficulty unit 353 is as described with reference to FIG. 15.

(Processing in data generation system 2)
FIG. 17 is a sequence diagram showing processing in the data generation system 2. The processing in the data generation system 2 is the same as that performed by the data generation system 1 according to Embodiment 1 (see FIGS. 13 and 14), except for the identification difficulty processing. Accordingly, only the steps relating to the identification difficulty processing, which differ from the data generation system 1, are described with reference to FIG. 17.

In step S305, the identification difficulty unit 353 performs identification difficulty processing on the output prediction results, which makes it difficult to identify which prediction result corresponds to which parent data. Specifically, as shown in FIG. 15, the identification difficulty unit 353 adds noise to the prediction result (predicted value) corresponding to each of the parent data. The identification difficulty unit 353 stores the added noise.

In step S310, the identification difficulty unit 353 cancels the identification difficulty processing on the prediction results that correspond to the acquired evaluation values. Specifically, the identification difficulty unit 353 removes from the evaluation values the noise added in step S305, restoring the correct correspondence between the explanatory variables of the parent data and the prediction results.

In step S314, the identification difficulty unit 353 performs identification difficulty processing on the explanatory variables of the generated child data. Specifically, it adds noise to the explanatory variables of the child data. Noise may be added to all of the explanatory variables contained in the child data or to only some of them. The identification difficulty unit 353 stores the added noise.

In step S318, the identification difficulty unit 353 cancels the identification difficulty processing on the explanatory variables of the child data. Specifically, it removes the noise added to the explanatory variables in step S314. The noise addition and removal in steps S314 and S318 are performed as operations on homomorphically encrypted values.
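Noise addition and removal on encrypted values (steps S314 and S318, with the re-encryption of step S116 in between) may be illustrated as follows. The disclosure does not name a particular homomorphic encryption scheme. This sketch swaps in a toy Paillier scheme, which is additively homomorphic, with small fixed primes; it is insecure and serves only to show the data flow. All function names, the primes, and the sample values are assumptions.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small fixed primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt an integer m (may be negative) under the public modulus n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(n, secret, c):
    """Decrypt c and map the result back to a signed integer."""
    lam, mu = secret
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    return m if m <= n // 2 else m - n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)


rng = random.Random(0)
n, secret = keygen()              # the terminal device holds `secret`
x, noise = 42, 7                  # hypothetical explanatory variable and noise

c_child = encrypt(n, x, rng)
# Arithmetic device (S314): add encrypted noise without seeing x.
c_noisy = add_encrypted(n, c_child, encrypt(n, noise, rng))
# Terminal device (S116): decrypt (sees only x + noise) and re-encrypt.
seen_by_terminal = decrypt(n, secret, c_noisy)
c_fresh = encrypt(n, seen_by_terminal, rng)
# Arithmetic device (S318): remove the noise homomorphically.
c_restored = add_encrypted(n, c_fresh, encrypt(n, -noise, rng))
```

Real-valued explanatory variables would need a fixed-point encoding before encryption in a scheme of this kind; lattice-based schemes that support approximate arithmetic are an alternative not shown here.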

By performing the identification difficulty processing described above, the correspondence between the inputs and outputs of the prediction model, as well as the child data, can be made difficult to identify without performing encryption.

<Embodiment 3>
In Embodiments 1 and 2, the identification difficulty processing was, respectively, rearrangement of correspondence relationships and addition of noise. In the data generation system 3 according to Embodiment 3, the identification difficulty processing combines the rearrangement processing and the noise addition processing.

Specifically, the processing that makes the correspondence between the inputs and outputs of the prediction model difficult to identify (corresponding, for example, to step S105 in FIG. 13) is the rearrangement of correspondence relationships, and the processing that makes the child data difficult to identify (corresponding, for example, to step S114 in FIG. 13) is the addition of noise.

The reverse combination may also be used. That is, the processing for making the correspondence between the inputs and outputs of the prediction model difficult to identify (for example, corresponding to step S105 in FIG. 13) may be the processing that adds noise, and the processing for making the child data difficult to identify (for example, corresponding to step S114 in FIG. 13) may be the processing that rearranges correspondence relationships.

Alternatively, as the identification difficulty processing, the processing that rearranges correspondence relationships may be performed first, followed by the processing that adds noise. Conversely, the processing that adds noise may be performed first, followed by the processing that rearranges correspondence relationships.

Combining processes that make data difficult to identify further reduces the risk of information being disclosed.
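The combined variant (rearranging first, then adding noise) and its cancellation can be sketched as below. The sketch works on plain integers standing in for ciphertexts so that the ordering logic is easy to follow; in the described system the same operations would be carried out on homomorphically encrypted values. The function names and the noise range are hypothetical.

```python
import random


def obfuscate(values, rng):
    """Rearrange the values, then add per-position noise.

    Returns the obfuscated list and the secret (permutation, noise)
    that is needed to cancel the processing later.
    """
    perm = list(range(len(values)))
    rng.shuffle(perm)
    noise = [rng.randint(1, 100) for _ in values]
    shuffled = [values[i] for i in perm]
    masked = [v + n for v, n in zip(shuffled, noise)]
    return masked, (perm, noise)


def restore(masked, secret):
    """Cancel the processing in reverse order: remove noise, then un-shuffle."""
    perm, noise = secret
    shuffled = [v - n for v, n in zip(masked, noise)]
    restored = [None] * len(shuffled)
    for position, original_index in enumerate(perm):
        restored[original_index] = shuffled[position]
    return restored


rng = random.Random(0)
predictions = [40, 17, 93, 58]          # hypothetical prediction results
masked, secret = obfuscate(predictions, rng)
recovered = restore(masked, secret)
```

Because the noise here is indexed by position after shuffling, cancellation has to undo the two steps in the opposite order; the noise-first variant would need the mirror-image ordering.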

<Embodiment 4>
In Embodiments 1 to 3, the terminal device 100 calculated the evaluation values. In the data generation system 4 according to Embodiment 4, the evaluation values are calculated on the arithmetic device side rather than on the terminal device side. Encryption and decryption of data are performed on the terminal device side, while all other arithmetic processing on the data is consolidated on the arithmetic device side, so that the arithmetic processing can be executed efficiently.

(Configuration of data generation system 4)
FIG. 18 is an overall diagram of the data generation system 4. The overall configuration of the data generation system 4 according to this embodiment is described with reference to FIG. 18.

The data generation system 4 according to this embodiment includes a terminal device 400 and an arithmetic device 500 in place of the terminal device 100 and the arithmetic device 200 of the data generation system 1 according to Embodiment 1.

(Functional configuration of terminal device 400)
FIG. 19 is a block diagram showing the functional configuration of the terminal device 400. Components shared with the terminal device 100 according to Embodiment 1 are given the same reference numerals, and their description is not repeated.

The terminal device 400 differs from the terminal device 100 in that it includes a control unit 450 in place of the control unit 150 of the terminal device 100. The control unit 450 includes a key generation unit 151 and an encryption/decryption unit 152. Unlike the control unit 150 of the terminal device 100, the control unit 450 does not include the evaluation value calculation unit 153.

The terminal device 100 decrypted the prediction results acquired from the arithmetic device 200, on which the identification difficulty processing had been executed, and calculated the evaluation values. In contrast, the terminal device 400 decrypts the prediction results on which the identification difficulty processing has been executed and then transmits them to the arithmetic device 500 via the communication unit 110.

(Functional configuration of arithmetic device 500)
FIG. 20 is a block diagram showing the functional configuration of the arithmetic device 500. Components shared with the arithmetic device 200 according to Embodiment 1 are given the same reference numerals, and their description is not repeated.

The arithmetic device 500 differs from the arithmetic device 200 in that it includes a control unit 550 in place of the control unit 250 of the arithmetic device 200. Unlike the control unit 250 of the arithmetic device 200, the control unit 550 includes an evaluation value calculation unit 554 in addition to the prediction unit 251, the next generation generation unit 252, and the identification difficulty unit 253.

The evaluation value calculation unit 554 calculates evaluation values with respect to the target value from the prediction results (predicted values) that were output by the prediction unit 251 and decrypted by the terminal device 400. An evaluation value indicates how much two values differ. As in Embodiment 1, for ease of explanation, the evaluation value in this embodiment is the absolute difference between the prediction result and the target value.

(Processing in data generation system 4)
FIG. 21 is a sequence diagram showing the processing in the data generation system 4. More specifically, it shows the processing in which the data generation system 4 generates next-generation child data from the search-target material data (parent data) input to the terminal device 400 (next generation generation step) and updates the parent data with the child data (update step).

The processing in the data generation system 4 shown in FIG. 21 differs from the processing in the data generation system 1 shown in Embodiment 1 (FIG. 13) in that steps S408 to S411 are performed in place of steps S108 to S111. That is, the processing steps of the data generation system 4 other than steps S408 to S411 are identical to those of the data generation system 1 and are given the same reference numerals, so their description is not repeated. Following the processing shown in FIG. 21, the processing shown in FIG. 14 is performed, and the material data found by the search is output.

Step S408 is performed after step S107. The terminal device 400 transmits to the arithmetic device 500 the plurality of prediction results that were decrypted in step S107 and on which the identification difficulty processing has been executed. The identification difficulty unit 253 of the arithmetic device 500 acquires the plurality of decrypted prediction results via the communication unit 210.

In step S409, the identification difficulty unit 253 of the arithmetic device 500 cancels the identification difficulty processing for the plurality of acquired prediction results. Specifically, the identification difficulty unit 253 restores the correspondence that was rearranged in step S105, so that each prediction result is again correctly associated with the explanatory variables of its parent data. The identification difficulty unit 253 then sends the plurality of parent data, together with the prediction results now correctly associated with them, to the evaluation value calculation unit 554.

In step S410, the evaluation value calculation unit 554 calculates, from the plurality of prediction results decrypted by the terminal device 400, a plurality of evaluation values of the respective prediction results with respect to the target value. Specifically, like the evaluation value calculation unit 153, the evaluation value calculation unit 554 calculates the distance between the target value set in step S100 and each prediction result (predicted value). In this embodiment, as in Embodiment 1, the absolute difference is used as the distance, but any calculation method may be used as long as the resulting value indicates how much the target value and the predicted value differ.
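The evaluation value calculation of step S410 can be illustrated with a short sketch. It assumes a single scalar target value and plaintext (already decrypted) predictions; the function names and the pluggable `distance` parameter are hypothetical and only reflect the remark that any measure of difference may be used.

```python
from typing import Callable, List, Sequence


def absolute_difference(predicted: float, target: float) -> float:
    """Distance used in the embodiment: |predicted - target|."""
    return abs(predicted - target)


def evaluation_values(
    predictions: Sequence[float],
    target: float,
    distance: Callable[[float, float], float] = absolute_difference,
) -> List[float]:
    """Return one evaluation value per prediction result."""
    return [distance(p, target) for p in predictions]


# Hypothetical decrypted predictions and target value.
values = evaluation_values([1.5, 4.0, 2.0], target=2.0)

# Any other measure of difference can be swapped in, e.g. squared error.
squared = evaluation_values([1.5, 4.0], 2.0, lambda p, t: (p - t) ** 2)
```

A smaller evaluation value means the prediction is closer to the target.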

In step S411, the next generation generation unit 252 of the arithmetic device 500 acquires, from the evaluation value calculation unit 554, the plurality of encrypted parent data and the evaluation values corresponding to those parent data. Based on the plurality of parent data and their corresponding evaluation values, the next generation generation unit 252 then generates the next generation in step S112. The subsequent processing is the same as that described in Embodiment 1 (FIG. 13).
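How parent data might be selected by evaluation value and recombined into child data can be sketched as follows. This is only an assumed illustration: the text does not specify the selection condition or the genetic operators here, so "keep the parents with the smallest evaluation values" and uniform crossover are stand-ins, and every name is hypothetical. Records are treated as opaque lists of field values, so the same index-based logic would also apply to lists of ciphertexts.

```python
import random


def select_parents(parents, evaluations, keep):
    """Return the `keep` parents whose evaluation values are smallest."""
    order = sorted(range(len(parents)), key=lambda i: evaluations[i])
    return [parents[i] for i in order[:keep]]


def crossover(a, b, rng):
    """Uniform crossover: each field is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def next_generation(parents, evaluations, keep, size, rng):
    """Build `size` children from the best `keep` parents (keep >= 2)."""
    pool = select_parents(parents, evaluations, keep)
    children = []
    for _ in range(size):
        a, b = rng.sample(pool, 2)
        children.append(crossover(a, b, rng))
    return children


rng = random.Random(1)
parents = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]   # opaque field values
evaluations = [3.0, 1.0, 2.0]
children = next_generation(parents, evaluations, keep=2, size=4, rng=rng)
```

Only the evaluation values are inspected when selecting; the field values themselves are moved around without being read.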

In the data generation system 4 according to this embodiment, the identification difficulty processing is realized by rearranging the correspondence between the inputs (prediction target data) and outputs (prediction results) of the prediction model. However, the identification difficulty processing may instead be realized by the processing that adds noise described in Embodiment 2, or by the combination of the processing that rearranges correspondence relationships and the processing that adds noise described in Embodiment 3.

(Explanation of effects)
In this embodiment, data is encrypted and decrypted in the terminal device 400, and all other arithmetic processing on the data is consolidated in the arithmetic device 500. This allows the arithmetic processing to be executed more efficiently.

The embodiments described above can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

1, 2, 3: data generation system; 100, 400: terminal device; 200, 300, 500: arithmetic device; 110, 210: communication unit; 120, 220: input unit; 130, 230: output unit; 140, 240: storage unit; 141: key storage unit; 142: material database; 150, 250, 350, 450: control unit; 151: key generation unit; 152: encryption/decryption unit; 153, 554: evaluation value calculation unit; 251: prediction unit; 252: next generation generation unit; 253, 353: identification difficulty unit.

Claims

A method used in a system comprising:
a user terminal having a control unit and a user storage unit storing a plurality of parent data; and
an arithmetic device having a cryptographic operation unit that performs operations involving data encrypted using a homomorphic encryption method, and an encrypted data storage unit that stores a machine learning model that outputs a prediction result from the parent data,
the method comprising:
a step in which the control unit generates a public key and a private key;
a step in which the control unit encrypts the plurality of parent data acquired from the user storage unit using the homomorphic encryption method using the public key;
a next generation generation step in which the cryptographic operation unit generates a plurality of child data based on a plurality of prediction results output by inputting the plurality of encrypted parent data to the machine learning model;
an updating step in which the encrypted data storage unit stores the plurality of generated child data as the plurality of encrypted parent data; and
a step of repeating the next generation generation step and the updating step,
wherein the next generation generation step includes executing, on the plurality of output prediction results, identification difficulty processing that makes it difficult to identify the plurality of prediction results corresponding to the plurality of parent data.

The data generation method according to claim 1, wherein the next generation generation step includes:
a step in which the cryptographic operation unit acquires the plurality of parent data encrypted by the control unit;
a step in which the cryptographic operation unit inputs the acquired plurality of parent data into the machine learning model and outputs a plurality of prediction results corresponding to the respective parent data;
a step in which the cryptographic operation unit performs, on the plurality of output prediction results, identification difficulty processing that makes it difficult to identify the plurality of prediction results corresponding to the plurality of parent data;
a step in which the control unit acquires the plurality of prediction results on which the identification difficulty processing has been performed, and decrypts them using the private key;
a step in which the control unit calculates, from the plurality of decrypted prediction results on which the identification difficulty processing has been performed, a plurality of evaluation values of the respective prediction results with respect to a target value;
a step in which the cryptographic operation unit acquires the plurality of calculated evaluation values;
a step in which the cryptographic operation unit cancels the identification difficulty processing for the plurality of prediction results corresponding to the plurality of acquired evaluation values;
a step in which the cryptographic operation unit acquires a plurality of evaluation values corresponding to the plurality of parent data via the plurality of prediction results for which the identification difficulty processing has been canceled;
a step in which the cryptographic operation unit extracts the plurality of parent data corresponding to evaluation values satisfying a predetermined condition; and
a step in which the cryptographic operation unit applies a genetic algorithm to the extracted plurality of parent data to generate the plurality of child data.

The data generation method according to claim 1, wherein the next generation generation step includes:
a step in which the cryptographic operation unit acquires the plurality of parent data encrypted by the control unit;
a step in which the cryptographic operation unit inputs the acquired plurality of parent data into the machine learning model and outputs a plurality of prediction results corresponding to the respective parent data;
a step in which the cryptographic operation unit performs, on the plurality of output prediction results, identification difficulty processing that makes it difficult to identify the plurality of prediction results corresponding to the plurality of parent data;
a step in which the control unit acquires the plurality of prediction results on which the identification difficulty processing has been performed, and decrypts them using the private key;
a step in which the cryptographic operation unit acquires the plurality of decrypted prediction results on which the identification difficulty processing has been performed;
a step in which the cryptographic operation unit cancels the identification difficulty processing for the plurality of acquired prediction results on which the identification difficulty processing has been performed;
a step in which the cryptographic operation unit calculates a plurality of evaluation values corresponding to the plurality of parent data from the plurality of prediction results for which the identification difficulty processing has been canceled;
a step in which the cryptographic operation unit extracts the plurality of parent data corresponding to evaluation values satisfying a predetermined condition; and
a step in which the cryptographic operation unit applies a genetic algorithm to the extracted plurality of parent data to generate the plurality of child data.

The data generation method according to claim 2 or claim 3, wherein
the plurality of parent data is a data set in which records, each including values corresponding to a plurality of explanatory variables, are arranged in a predetermined order,
the next generation generation step further includes:
a step in which the cryptographic operation unit performs the identification difficulty processing on the explanatory variables of the plurality of generated child data;
a step in which the control unit acquires the plurality of child data on which the identification difficulty processing has been performed, and decrypts the plurality of child data using the private key;
a step in which the control unit re-encrypts the plurality of decrypted child data using the homomorphic encryption method using the public key;
a step in which the cryptographic operation unit acquires the plurality of re-encrypted child data; and
a step in which the cryptographic operation unit cancels the identification difficulty processing for the explanatory variables of the acquired plurality of child data, and
in the updating step, the encrypted data storage unit stores the plurality of child data for which the identification difficulty processing has been canceled as the plurality of encrypted parent data.

4. The data generation method according to claim 2, wherein the identification difficulty process is a process of rearranging correspondence relationships.

4. The data generation method according to claim 2, wherein the identification difficulty process is a process of adding noise.

4. The data generation method according to claim 2, wherein the identification difficulty processing is a combination of processing for rearranging correspondence relationships or processing for adding noise.

The data generation method according to any one of claims 1 to 3, further comprising a step in which the control unit acquires the plurality of child data generated by performing the repeating step a plurality of times and the prediction results corresponding to the child data, and decrypts them using the private key.

The data generation method according to claim 8, further comprising a step in which the control unit receives the number of times the repeating step is executed.

A program used in a system comprising:
a user terminal having a control unit and a user storage unit storing a plurality of parent data; and
an arithmetic device having a cryptographic operation unit that performs operations involving data encrypted using a homomorphic encryption method, and an encrypted data storage unit that stores a machine learning model that outputs a prediction result from the parent data,
the program comprising:
a step in which the control unit generates a public key and a private key;
a step in which the control unit encrypts the plurality of parent data acquired from the user storage unit using the homomorphic encryption method using the public key;
a next generation generation step in which the cryptographic operation unit generates a plurality of child data based on a plurality of prediction results output by inputting the plurality of encrypted parent data to the machine learning model;
an updating step in which the encrypted data storage unit stores the plurality of generated child data as the plurality of encrypted parent data; and
a step of repeating the next generation generation step and the updating step,
wherein the next generation generation step includes executing, on the plurality of output prediction results, identification difficulty processing that makes it difficult to identify the plurality of prediction results corresponding to the plurality of parent data.

A system comprising:
a user terminal having a control unit and a user storage unit storing a plurality of parent data; and
an arithmetic device having a cryptographic operation unit that performs operations involving data encrypted using a homomorphic encryption method, and an encrypted data storage unit that stores a machine learning model that outputs a prediction result from the parent data,
the system comprising:
a step in which the control unit generates a public key and a private key;
a step in which the control unit encrypts the plurality of parent data acquired from the user storage unit using the homomorphic encryption method using the public key;
a next generation generation step in which the cryptographic operation unit generates a plurality of child data based on a plurality of prediction results output by inputting the plurality of encrypted parent data to the machine learning model;
an updating step in which the encrypted data storage unit stores the plurality of generated child data as the plurality of encrypted parent data; and
a step of repeating the next generation generation step and the updating step,
wherein the next generation generation step includes executing, on the plurality of output prediction results, identification difficulty processing that makes it difficult to identify the plurality of prediction results corresponding to the plurality of parent data.