JP7315843B2

JP7315843B2 - Generating method, generating program, and information processing device

Info

Publication number: JP7315843B2
Application number: JP2019185420A
Authority: JP
Inventors: 郁也森川
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-10-08
Filing date: 2019-10-08
Publication date: 2023-07-27
Anticipated expiration: 2039-10-08
Also published as: JP2021060872A

Description

The present invention relates to a generation method, a generation program, and an information processing apparatus.

Some computer systems are machine learning systems that perform machine learning based on collected information. Through machine learning, such a system generates, for example, a trained model for analyzing information. The machine learning system can then use the generated trained model to provide services such as information analysis.

A malicious third party may launch an attack against a machine learning system. Such attacks exploit the mechanisms of machine learning. Using machine learning mechanisms for the purpose of attacking a machine learning system is sometimes called adversarial machine learning.

Attacks on machine learning are carried out for various purposes. For example, an attack called an adversarial example manipulates query data in order to cause erroneous output data. An attack called training data estimation (model inversion) manipulates query data in order to derive the training data. An attack called poisoning manipulates the training data so that a model producing erroneous output data is trained. It is therefore important to protect machine learning systems appropriately from these attacks.

As technology related to protecting computers that perform machine learning, a vulnerability risk assessment system has been proposed that evaluates the risk arising from vulnerabilities in a system that executes information processing for a given business. An information processing system that can safely execute processing using the results of machine learning has also been proposed. Furthermore, an automatic deep learning system has been proposed that does not need to keep the connection between a client device and a server device open until deep learning completes, and that securely provides the latest learning results from the server device to the client device over the Internet.

JP 2017-224053 A
JP 2019-121141 A
JP 2018-190239 A

Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, "Explaining and Harnessing Adversarial Examples", ICLR 2015, arXiv:1412.6572v3, 20 March 2015
Matt Fredrikson, Somesh Jha, Thomas Ristenpart, "Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures", CCS '15 Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 12 October 2015, Pages 1322-1333
Luis Munoz-Gonzalez, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C. Lupu, Fabio Roli, "Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization", AISec '17 Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, 3 November 2017, Pages 27-38

To protect a machine learning system appropriately from attacks, it helps to know in advance which kinds of attacks are likely to be made against it. For example, if the likely attack methods are known, the security of the machine learning system can be improved by giving priority to countermeasures against those attacks. However, attacks on machine learning systems have various objectives, and the attack method and the data used in the attack differ for each objective. It is therefore not easy to assess correctly the possibility of an external attack on a machine learning system.

In one aspect, an object of the present disclosure is to make it easy to generate information indicating the possibility of an attack on a machine learning system.

In one proposal, a computer receives first information and second information. The first information includes at least one of: information indicating the availability of a trained model that defines output data in response to a query, information indicating the availability of training data used to train the trained model by machine learning, and information indicating a limit on the number of queries the trained model accepts. The second information includes at least one of: information indicating the manipulability of queries accepted by the trained model, and information indicating the manipulability of the training data. Based on the received first information, the computer generates third information indicating the possibility that an attacker accesses the trained model through each of one or more access forms. Then, based on the generated third information and the received second information, the computer generates fourth information indicating the possibility that an attack on the trained model is carried out through each of the one or more access forms in order to achieve a predetermined attack objective.

According to one aspect, information indicating the possibility of an attack on a machine learning system can be generated easily.

FIG. 1 is a diagram showing an example of a method of generating information indicating the possibility of an attack on a machine learning system.
FIG. 2 is a diagram showing an example of a computer system including a machine learning system.
FIG. 3 is a diagram showing a configuration example of the hardware of a computer for attack risk evaluation.
FIG. 4 is a diagram showing an example of an attack analysis procedure.
FIG. 5 is a diagram showing an example of a modeled machine learning system.
FIG. 6 is a diagram explaining a model extraction attack.
FIG. 7 is a diagram explaining an adversarial example attack.
FIG. 8 is a diagram explaining a training data estimation attack.
FIG. 9 is a diagram explaining a poisoning attack.
FIG. 10 is a diagram explaining attack methods according to accessibility to a trained model.
FIG. 11 is a diagram showing an example of an attack tree for a machine learning system.
FIG. 12 is a block diagram showing an attack risk estimation function.
FIG. 13 is a diagram showing an example of a data characteristic management table.
FIG. 14 is a diagram showing an example of a likelihood management table.
FIG. 15 is a flowchart showing an example of the procedure of attack risk estimation processing.
FIG. 16 is a flowchart showing an example of the procedure of likelihood calculation processing.
FIG. 17 is a diagram showing the process of calculating likelihood using an attack tree.
FIG. 18 is a diagram showing an example of a group of variables used in likelihood calculation processing.
FIG. 19 is a diagram showing an example of pseudocode of a likelihood calculation program.
FIG. 20 is a diagram showing an example of a method of calculating attack risk.
FIG. 21 is a diagram showing an example of an attack risk management screen.
FIG. 22 is a diagram showing an example of an attack risk detail screen.
FIG. 23 is a diagram showing a display example of a calculation details display section.
FIG. 24 is a diagram showing an example of structure definition information of an attack tree.

Hereinafter, embodiments will be described with reference to the drawings. The embodiments can be combined with one another as long as no contradiction arises.
[First embodiment]
First, a first embodiment will be described. In the first embodiment, an information processing apparatus implements a method of generating information indicating the possibility of an attack on a machine learning system.

FIG. 1 is a diagram showing an example of a method of generating information indicating the possibility of an attack on a machine learning system. The information processing apparatus 10 can generate such information by, for example, executing a generation program that describes the processing procedure of the generation method.

The information processing apparatus 10 is, for example, a computer having a storage unit 11 and a processing unit 12. The storage unit 11 is, for example, a memory or a storage device included in the information processing apparatus 10. The processing unit 12 is, for example, a processor or an arithmetic circuit included in the information processing apparatus 10.

The processing unit 12 accepts, from the analyst 3, input of first information 4 and second information 5, which indicate characteristics of the data used in the machine learning system 1. The processing unit 12 stores, for example, the input first information 4 and second information 5 in the storage unit 11.

The first information 4 includes at least one of the following three items. The first item that may be included in the first information 4 is information indicating the availability, to the attacker 2, of the trained model 1a that defines output data in response to query data (hereinafter simply called a query). The second item is information indicating the availability, to the attacker 2, of the training data used to train the trained model 1a by machine learning. The third item is information indicating a limit on the number of queries the trained model 1a accepts. The limit on the number of queries is, for example, the upper limit on the number of queries that the machine learning system 1 can accept within a unit period.

The second information 5 includes at least one of the following two items. The first item that may be included in the second information 5 is information indicating how far the attacker 2 can manipulate the queries accepted by the trained model 1a. The second item is information indicating how far the attacker 2 can manipulate the training data.
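The two inputs described above could be represented roughly as follows. This is only a minimal sketch: the class and field names are hypothetical, and encoding each item as an optional number is an assumption made for illustration (the description only requires that each piece of information contain at least one of its items).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FirstInformation:
    """Availability-related inputs (hypothetical names); at least one must be set."""
    model_availability: Optional[float] = None          # availability of the trained model to the attacker
    training_data_availability: Optional[float] = None  # availability of the training data to the attacker
    query_limit: Optional[float] = None                 # limit on the number of accepted queries

    def __post_init__(self) -> None:
        items = (self.model_availability, self.training_data_availability, self.query_limit)
        if all(item is None for item in items):
            raise ValueError("first information needs at least one item")


@dataclass(frozen=True)
class SecondInformation:
    """Manipulability-related inputs (hypothetical names); at least one must be set."""
    query_manipulability: Optional[float] = None          # how far the attacker can manipulate queries
    training_data_manipulability: Optional[float] = None  # how far the attacker can manipulate training data

    def __post_init__(self) -> None:
        if self.query_manipulability is None and self.training_data_manipulability is None:
            raise ValueError("second information needs at least one item")
```

Rejecting an empty input at construction time mirrors the "at least one of" wording; how missing items are treated later is left open here.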

Upon receiving the first information 4 and the second information 5, the processing unit 12 generates, based on the received first information 4, third information 6 indicating the possibility that the attacker 2 accesses the trained model 1a through each of one or more access forms. The one or more access forms include, for example, at least one of the following first to third access forms.

The first access form is access to information inside the trained model 1a. The information inside the trained model 1a to be accessed is, for example, the values of weight parameters in a neural network. Access by the attacker 2 through the first access form is called, for example, a white-box attack. The processing unit 12 calculates the possibility of access through the first access form based on, for example, the information indicating the availability of the trained model 1a.

The second access form is one in which queries are input to the trained model 1a. Because information inside the trained model 1a is not accessed in the second access form, access by the attacker 2 through it is called a black-box attack. The processing unit 12 calculates the possibility of access through the second access form based on, for example, the information indicating the availability of the trained model 1a and the information indicating the availability of the training data.

The third access form is access to information inside a surrogate model generated by inputting queries to the trained model 1a or by obtaining the training data. If the surrogate model closely resembles the trained model 1a, access by the attacker 2 to information inside the surrogate model poses the same risk of attack on the machine learning system 1 as access to information inside the trained model 1a itself. Such access by the attacker 2 to information inside the surrogate model is called, for example, an attack via a surrogate model. The processing unit 12 calculates the possibility of access through the third access form based on, for example, the information indicating the availability of the trained model 1a, the information indicating the availability of the training data, and the information indicating the limit on the number of queries.
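The dependency between each access form and the items of the first information 4 can be sketched as a lookup table. The mapping below follows the three paragraphs above; everything else is an assumption for illustration: the key names are hypothetical, and the rule for combining the selected values is not stated in this part of the description, so it is passed in by the caller.

```python
from typing import Callable, Dict, Mapping, Sequence

# Items of the first information that each access form draws on,
# as stated in the description (key names are hypothetical).
ACCESS_FORM_INPUTS: Dict[str, Sequence[str]] = {
    "white_box": ("model_availability",),
    "black_box": ("model_availability", "training_data_availability"),
    "surrogate_model": ("model_availability", "training_data_availability", "query_limit"),
}


def third_information(
    first_info: Mapping[str, float],
    combine: Callable[[Sequence[float]], float],
) -> Dict[str, float]:
    """Return one access-possibility value per access form.

    `combine` is a placeholder for the combination rule, which this
    sketch does not know; items missing from `first_info` are skipped.
    """
    result: Dict[str, float] = {}
    for form, keys in ACCESS_FORM_INPUTS.items():
        values = [first_info[key] for key in keys if key in first_info]
        if values:
            result[form] = combine(values)
    return result
```

For instance, `third_information({"model_availability": 1, "training_data_availability": 3, "query_limit": 2}, max)` uses `max` purely as a stand-in rule and yields one value for each of the three access forms.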

After generating the third information 6, the processing unit 12 generates fourth information 7a, 7b, ... based on the generated third information 6 and the received second information 5. The fourth information 7a, 7b, ... indicates the possibility that an attack on the trained model 1a is carried out through each of the one or more access forms in order to achieve a predetermined attack objective. For example, the predetermined attack objective of the fourth information 7a is an adversarial example. Adversarial examples are described in detail in the second embodiment (see FIG. 7). The predetermined attack objective of the fourth information 7b is poisoning. Poisoning is described in detail in the second embodiment (see FIG. 9).

In this way, when the analyst 3 inputs to the information processing apparatus 10 the first information 4 and the second information 5, which indicate characteristics of the data used in the machine learning system 1, the processing unit 12 automatically generates the fourth information 7a, 7b, .... That is, information indicating the possibility of an attack on the machine learning system 1 can be generated easily.

Furthermore, based on the fourth information 7a, 7b, ..., the analyst 3 can easily evaluate the vulnerability of the machine learning system 1 to attacks. That is, the fourth information 7a, 7b, ... shows, for each attack objective, by what access method the attacker 2 may access the trained model 1a and carry out the intended attack. When the processing unit 12 determines that a certain kind of attack is possible, the machine learning system 1 is vulnerable to that attack. By referring to the fourth information 7a, 7b, ..., the analyst 3 can therefore recognize which attacks the machine learning system 1 is vulnerable to.

Once the specific vulnerabilities of the machine learning system 1 are known, the analyst 3 can improve its security by operating the machine learning system 1 in a way that reduces those vulnerabilities.

The form in which the attacker 2 accesses the trained model 1a depends on what data the attacker 2 can obtain and on how many queries the machine learning system 1 can accept. The processing unit 12 therefore uses, for each access form, the appropriate items of the first information 4 to calculate the possibility of access through that form. This allows the possibility of access to be calculated accurately for each access form.

As the fourth information 7a, 7b, ..., the processing unit 12 can calculate a numerical value indicating how likely an attack is for each access form. For example, the processing unit 12 can calculate how likely an attack aimed at an adversarial example is. In this case, the processing unit 12 acquires, as the first information 4, at least one of a numerical value indicating how available the trained model 1a is, a numerical value indicating how available the training data is, and a numerical value indicating how high the limit on the number of accepted queries is. The processing unit 12 also acquires, as the second information 5, a numerical value indicating how manipulable the queries accepted by the trained model 1a are. Next, the processing unit 12 calculates, as the third information 6, a numerical value indicating how likely the trained model 1a is to be accessed through each of the one or more access forms. The processing unit 12 then takes the smaller of the numerical value in the second information 5 and the numerical value for a given access form in the third information 6, and includes it in the fourth information 7a as the numerical value indicating the possibility of an attack through that access form.
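The final step of this calculation, taking the smaller of the two values for each access form, can be written down directly. The following is a minimal sketch; the function name, the access-form labels, and the numeric scale are assumptions for illustration.

```python
from typing import Dict, Mapping


def adversarial_example_likelihood(
    query_manipulability: float,
    access_likelihoods: Mapping[str, float],
) -> Dict[str, float]:
    """For each access form, take the smaller of the query manipulability
    (second information) and that form's access value (third information)."""
    return {
        form: min(query_manipulability, access_value)
        for form, access_value in access_likelihoods.items()
    }
```

With a query manipulability of 2 and access values of 1 (white box) and 3 (black box), this gives 1 and 2 respectively: an attack through a given form is rated no higher than its weakest precondition.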

Because the possibility of an attack is expressed as a numerical value for each access form, the analyst 3 can easily decide the priority of countermeasures. That is, when the administrator of the machine learning system 1 gives priority to countermeasures against the access forms whose attack possibility values are high, damage from attacks can be prevented efficiently.
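Ranking the numerical values to decide countermeasure priority might look like the sketch below. The function name and the use of (attack objective, access form) pairs as keys are hypothetical choices, not something the description specifies.

```python
from typing import Dict, List, Tuple

AttackKey = Tuple[str, str]  # (attack objective, access form), hypothetical labels


def prioritize(likelihoods: Dict[AttackKey, float]) -> List[Tuple[AttackKey, float]]:
    """Order attacks from the highest attack-possibility value to the lowest."""
    return sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
```

The first entries of the returned list are the candidates for which countermeasures would be considered first.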

The processing unit 12 can also generate the fourth information 7a, 7b, ... based on the generated third information 6 together with the received first information 4 and second information 5. Using the first information 4 as well when generating the fourth information 7a, 7b, ... improves its accuracy. In this case too, the processing unit 12 can express the possibility of an attack as a numerical value for each access form.

For example, the processing unit 12 can express as a numerical value, for each access form, the possibility of an attack aimed at poisoning. In this case, the processing unit 12 acquires, as the first information 4, at least one of a numerical value indicating how available the trained model 1a is, a numerical value indicating how available the training data is, and a numerical value indicating how high the limit on the number of accepted queries is. The processing unit 12 also acquires, as the second information 5, a numerical value indicating how manipulable the training data is. Next, the processing unit 12 calculates, as the third information 6, a numerical value indicating how likely the trained model 1a is to be accessed through each of the one or more access forms. The processing unit 12 then takes the minimum of three values, namely the maximum of the one or more numerical values in the first information 4, the numerical value in the second information 5, and the numerical value for a given access form in the third information 6, and includes that minimum in the fourth information 7b as the numerical value indicating the possibility of an attack through that access form.
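The min-of-max rule for poisoning described in this paragraph can be sketched as follows. The function name, argument names, and access-form labels are hypothetical, and the numeric scale is assumed.

```python
from typing import Dict, Mapping, Sequence


def poisoning_likelihood(
    first_info_values: Sequence[float],
    training_data_manipulability: float,
    access_likelihoods: Mapping[str, float],
) -> Dict[str, float]:
    """For each access form, take the minimum of:
    the maximum of the supplied first-information values,
    the training data manipulability (second information), and
    that form's access value (third information)."""
    if not first_info_values:
        raise ValueError("first information needs at least one value")
    best_availability = max(first_info_values)
    return {
        form: min(best_availability, training_data_manipulability, access_value)
        for form, access_value in access_likelihoods.items()
    }
```

For example, with first-information values of 1 and 3, a training data manipulability of 2, and access values of 1 (white box) and 3 (surrogate model), the results are 1 and 2.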

For a poisoning attack by the attacker 2, being able to manipulate the training data is important, but so is the first information 4, such as the possibility that the attacker 2 can obtain the trained model 1a or the training data used to generate it. Using the first information 4 when calculating, for each access form, the numerical value indicating the possibility of a poisoning attack therefore yields a more accurate value.

[Second embodiment]
Next, a second embodiment will be described. The second embodiment automatically calculates the likelihood of attacks specific to adversarial machine learning, and quantifies and evaluates the risk to the machine learning system according to the likelihood for each type of attack objective. The likelihood in the second embodiment is an example of the fourth information 7a, 7b, ... described in the first embodiment.

FIG. 2 is a diagram showing an example of a computer system including a machine learning system. The machine learning system 30 is connected to a plurality of user terminals 31, 32, ... via, for example, a network 20. The machine learning system 30 analyzes, for example, queries sent from the user terminals 31, 32, ... using a trained model, and transmits the analysis results to the user terminals 31, 32, .... The user terminals 31, 32, ... are computers used by users who receive a service that uses the model generated by machine learning. The machine learning system 30 can, for example, set an upper limit on the number of queries it accepts from the same user within a fixed period. The higher this upper limit, the more queries an attacker can use in an attack.
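A per-user upper limit on queries within a fixed period could be enforced along the following lines. This is an illustrative sketch only, assuming a simple fixed-window counter; the class and method names are hypothetical, and the description does not say how the machine learning system 30 implements its limit.

```python
from collections import defaultdict
from typing import DefaultDict, Tuple


class QueryLimiter:
    """Fixed-window counter: at most `max_queries` per user per window."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # Counts keyed by (user, window index); old windows are never purged here.
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, user_id: str, now: float) -> bool:
        """Return True and record the query if the user is under the limit."""
        window_index = int(now // self.window_seconds)
        key = (user_id, window_index)
        if self._counts[key] >= self.max_queries:
            return False
        self._counts[key] += 1
        return True
```

Passing the current time in as `now` keeps the sketch deterministic; a deployment would more likely read a clock and discard counters for expired windows.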

A computer 100 for attack risk evaluation is also connected to the network 20. An analyst who manages the machine learning system 30 uses the computer 100 to evaluate the risk of adversarial machine learning attacks on the machine learning system 30.

FIG. 3 is a diagram showing a configuration example of the hardware of a computer for attack risk evaluation. The computer 100 as a whole is controlled by a processor 101. A memory 102 and a plurality of peripheral devices are connected to the processor 101 via a bus 109. The processor 101 may be a multiprocessor. The processor 101 is, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a DSP (Digital Signal Processor). At least some of the functions realized by the processor 101 executing a program may instead be realized by an electronic circuit such as an ASIC (Application Specific Integrated Circuit) or a PLD (Programmable Logic Device).

The memory 102 is used as the main storage device of the computer 100. The memory 102 temporarily stores at least part of the OS (Operating System) program and application programs to be executed by the processor 101. The memory 102 also stores various data used in processing by the processor 101. A volatile semiconductor storage device such as a RAM (Random Access Memory) is used as the memory 102, for example.

The peripheral devices connected to the bus 109 include a storage device 103, a graphics processing device 104, an input interface 105, an optical drive device 106, a device connection interface 107, and a network interface 108.

The storage device 103 writes data to and reads data from a built-in recording medium electrically or magnetically. The storage device 103 is used as an auxiliary storage device of the computer 100. The storage device 103 stores the OS program, application programs, and various data. An HDD (Hard Disk Drive) or an SSD (Solid State Drive), for example, can be used as the storage device 103.

A monitor 21 is connected to the graphics processing device 104. The graphics processing device 104 displays images on the screen of the monitor 21 in accordance with instructions from the processor 101. Examples of the monitor 21 include a display device using organic EL (Electro Luminescence) and a liquid crystal display device.

A keyboard 22 and a mouse 23 are connected to the input interface 105. The input interface 105 transmits signals sent from the keyboard 22 and the mouse 23 to the processor 101. The mouse 23 is one example of a pointing device, and other pointing devices can also be used. Other pointing devices include touch panels, tablets, touchpads, and trackballs.

The optical drive device 106 reads data recorded on an optical disc 24 using laser light or the like. The optical disc 24 is a portable recording medium on which data is recorded so as to be readable by reflection of light. Examples of the optical disc 24 include a DVD (Digital Versatile Disc), a DVD-RAM, a CD-ROM (Compact Disc Read Only Memory), and a CD-R (Recordable)/RW (ReWritable).

機器接続インタフェース１０７は、コンピュータ１００に周辺機器を接続するための通信インタフェースである。例えば機器接続インタフェース１０７には、メモリ装置２５やメモリリーダライタ２６を接続することができる。メモリ装置２５は、機器接続インタフェース１０７との通信機能を搭載した記録媒体である。メモリリーダライタ２６は、メモリカード２７へのデータの書き込み、またはメモリカード２７からのデータの読み出しを行う装置である。メモリカード２７は、カード型の記録媒体である。 The device connection interface 107 is a communication interface for connecting peripheral devices to the computer 100 . For example, the device connection interface 107 can be connected to the memory device 25 and the memory reader/writer 26 . The memory device 25 is a recording medium equipped with a communication function with the device connection interface 107 . The memory reader/writer 26 is a device that writes data to the memory card 27 or reads data from the memory card 27 . The memory card 27 is a card-type recording medium.

ネットワークインタフェース１０８は、ネットワーク２０に接続されている。ネットワークインタフェース１０８は、ネットワーク２０を介して、他のコンピュータまたは通信機器との間でデータの送受信を行う。 The network interface 108 is connected to the network 20. The network interface 108 transmits and receives data to and from other computers or communication devices via the network 20.

コンピュータ１００は、以上のようなハードウェア構成によって、第２の実施の形態の処理機能を実現することができる。機械学習システム３０および利用者端末３１，３２，・・・も、図３に示したコンピュータ１００と同様のハードウェアにより実現することができる。また、第１の実施の形態に示した情報処理装置１０も、図３に示したコンピュータ１００と同様のハードウェアにより実現することができる。 The computer 100 can implement the processing functions of the second embodiment with the hardware configuration described above. The machine learning system 30 and the user terminals 31, 32, ... can also be realized by hardware similar to the computer 100 shown in FIG. 3. The information processing apparatus 10 shown in the first embodiment can also be realized by hardware similar to the computer 100 shown in FIG. 3.

コンピュータ１００は、例えばコンピュータ読み取り可能な記録媒体に記録されたプログラムを実行することにより、第２の実施の形態の処理機能を実現する。コンピュータ１００に実行させる処理内容を記述したプログラムは、様々な記録媒体に記録しておくことができる。例えば、コンピュータ１００に実行させるプログラムをストレージ装置１０３に格納しておくことができる。プロセッサ１０１は、ストレージ装置１０３内のプログラムの少なくとも一部をメモリ１０２にロードし、プログラムを実行する。またコンピュータ１００に実行させるプログラムを、光ディスク２４、メモリ装置２５、メモリカード２７などの可搬型記録媒体に記録しておくこともできる。可搬型記録媒体に格納されたプログラムは、例えばプロセッサ１０１からの制御により、ストレージ装置１０３にインストールされた後、実行可能となる。またプロセッサ１０１が、可搬型記録媒体から直接プログラムを読み出して実行することもできる。 The computer 100 implements the processing functions of the second embodiment by executing a program recorded in a computer-readable recording medium, for example. A program describing the processing content to be executed by the computer 100 can be recorded in various recording media. For example, a program to be executed by the computer 100 can be stored in the storage device 103 . The processor 101 loads at least part of the program in the storage device 103 into the memory 102 and executes the program. The program to be executed by the computer 100 can also be recorded in a portable recording medium such as the optical disc 24, memory device 25, memory card 27, or the like. A program stored in a portable recording medium can be executed after being installed in the storage device 103 under the control of the processor 101, for example. Alternatively, the processor 101 can read and execute the program directly from the portable recording medium.

機械学習システム３０の分析者は、コンピュータ１００を利用して、機械学習システム３０に対する攻撃分析を行う。 An analyst of the machine learning system 30 uses the computer 100 to analyze attacks against the machine learning system 30.
図４は、攻撃分析の手順の一例を示す図である。分析者は、まずシステムのモデル化を行う（ステップＳ１１）。次に分析者は、攻撃目的種別のリストアップを行う（ステップＳ１２）。次に分析者は、攻撃者から攻撃を受けるリスク（攻撃リスク）を概算する（ステップＳ１３）。攻撃リスクは、例えば攻撃の尤度と攻撃の影響度とを掛け合わせたものである。敵対的機械学習における容易の尤度は、攻撃の容易さを示す数値である。敵対的機械学習における攻撃の影響度は、その攻撃が現実に実行された場合に想定される被害の大きさである。最後に、分析者は、リスクに対する対策方針を決める（ステップＳ１４）。 FIG. 4 is a diagram illustrating an example of an attack analysis procedure. The analyst first models the system (step S11). Next, the analyst lists the attack purpose types (step S12). Next, the analyst roughly estimates the risk of being attacked by an attacker (the attack risk) (step S13). The attack risk is, for example, the product of the likelihood of an attack and the impact of the attack. The likelihood of an attack in adversarial machine learning is a numerical value that indicates how easy the attack is. The impact of an attack in adversarial machine learning is the amount of damage expected if the attack were actually carried out. Finally, the analyst decides on a countermeasure policy for the risks (step S14).
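As a minimal sketch of the risk estimate in step S13, the product of likelihood and impact can be written as follows. The function name `attack_risk`, the 1 to 3 scoring scale, and the example scores are assumptions made for illustration; the description only states that the risk is, for example, the product of the two values.

```python
def attack_risk(likelihood: float, impact: float) -> float:
    """Rough attack risk: likelihood of the attack multiplied by its impact."""
    return likelihood * impact


# Hypothetical (likelihood, impact) scores on an assumed 1-3 scale.
candidates = {
    "model extraction": (3, 2),
    "poisoning": (1, 3),
}

# Risk per attack purpose type, then ranked from highest to lowest risk.
risks = {name: attack_risk(l, i) for name, (l, i) in candidates.items()}
ranked = sorted(risks, key=risks.get, reverse=True)
```

Ranking the products gives the analyst a first ordering of which attack purpose types to address in step S14.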

このように攻撃分析は４つのステップで行われる。このうち、攻撃リスクを概算する処理について、コンピュータ１００を用いて自動化することができる。そこで分析者は、機械学習システムの攻撃リスクの概算を行うため、機械学習システムのモデル化と、機械学システムに対する攻撃目的種別のリストアップとを行う。 Attack analysis is thus performed in four steps. Among these processes, the process of roughly estimating the attack risk can be automated using the computer 100 . Therefore, in order to roughly estimate the attack risk of the machine learning system, the analyst models the machine learning system and lists the types of attack objectives against the machine learning system.

図５は、モデル化された機械学習システムの一例を示す図である。図５に示すように、機械学習システム３０で行う機械学習は、訓練フェイズ４０と推論フェイズ５０とに分かれる。機械学習システム３０は、訓練フェイズ４０において、空のモデル４１に訓練データ４２を適用することによって、空のモデル４１に対する訓練を行う。 FIG. 5 is a diagram illustrating an example of a modeled machine learning system. As shown in FIG. 5, machine learning performed by the machine learning system 30 is divided into a training phase 40 and an inference phase 50 . Machine learning system 30 trains empty model 41 by applying training data 42 to empty model 41 in a training phase 40 .

訓練データ４２には、例えば入力データ４２ａと正解出力データ４２ｂとの組からなる複数のデータセットが含まれる。入力データ４２ａと正解出力データ４２ｂとは、いずれも数値列で表現される。例えば画像を用いた機械学習の場合、入力データ４２ａとして、該当画像の特徴を表す数値列が用いられる。 The training data 42 includes, for example, a plurality of data sets consisting of pairs of input data 42a and correct output data 42b. Both the input data 42a and the correct answer output data 42b are represented by numerical strings. For example, in the case of machine learning using images, as the input data 42a, a numerical string representing features of the image is used.

機械学習システム３０は、訓練データ４２内の入力データ４２ａを空のモデル４１に適用して解析を行い、出力データを得る。機械学習システム３０は、出力データと正解出力データ４２ｂとを比較し、不一致であれば、空のモデル４１を修正する。空のモデル４１の修正とは、例えば空のモデル４１を用いた解析に用いるパラメータ（ニューラルネットワークであればユニットへの入力データの重みパラメータ）を、出力データが正解に近づくように修正することである。 The machine learning system 30 applies the input data 42a in the training data 42 to the empty model 41, performs analysis, and obtains output data. The machine learning system 30 compares the output data with the correct output data 42b and, if they do not match, corrects the empty model 41. Correcting the empty model 41 means, for example, adjusting the parameters used for analysis with the empty model 41 (in a neural network, the weight parameters applied to the input data of each unit) so that the output data approaches the correct answer.

機械学習システム３０は、大量の訓練データ４２を用いて訓練を行うことで、入力データ４２ａに対して正解出力データ４２ｂと同じ出力データが得られる訓練済みモデル４３を生成することができる。訓練済みモデル４３は、例えば空のモデル４１と、訓練によって適切な値が設定されたモデルのパラメータ４４で表される。 By performing training using a large amount of training data 42, the machine learning system 30 can generate a trained model 43 that provides the same output data as the correct output data 42b for the input data 42a. The trained model 43 is represented by, for example, an empty model 41 and model parameters 44 to which appropriate values have been set by training.

このように生成される訓練済みモデルは、「ｙ＝ｆ（ｘ）」の形の関数と捉えることができる（ｘ、ｙは、それぞれベクトル）。すなわち機械学習における訓練は、大量のｘとｙの組から、それに合った関数ｆを決める作業である。 A trained model generated in this way can be regarded as a function of the form “y=f(x)” (x and y are vectors). In other words, training in machine learning is a task of determining a function f suitable for a large number of pairs of x and y.
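The idea of training as "finding f from many pairs of x and y" can be illustrated with a deliberately small sketch. It assumes a scalar linear model f(x) = w*x + b and a simple per-sample gradient update; the description itself speaks of vectors and, for example, neural network weights, so the model form, the learning rate, and the names `train`, `w`, and `b` are all illustrative assumptions.

```python
def train(pairs, lr=0.05, epochs=2000):
    """Fit f(x) = w * x + b to (x, y) pairs by repeated small corrections."""
    w, b = 0.0, 0.0  # the "empty model": parameters not yet trained
    for _ in range(epochs):
        for x, y in pairs:
            error = (w * x + b) - y  # compare output with the correct output
            w -= lr * error * x      # correct the parameters so that the
            b -= lr * error          # output moves toward the correct answer
    return w, b


# Training data generated from the (assumed) true relation y = 2x + 1.
pairs = [(x, 2.0 * x + 1.0) for x in range(5)]
w, b = train(pairs)
```

After training, `w` and `b` play the role of the model parameters 44 that, together with the model structure, make up the trained model.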

機械学習システム３０は、訓練済みモデル４３を生成後、その訓練済みモデル４３を用いて推論フェイズ５０を実施する。例えば機械学習システム３０は、クエリ５１の入力を受け付け、訓練済みモデル４３を用いて、クエリ５１に応じた出力データ５２を得る。例えばクエリ５１がメールの文章であるとき、機械学習システム３０は、そのメールがスパムか否かの判定結果を出力データとして出力する。また入力データが画像のとき、機械学習システム３０は、例えばその画像に写っている動物の種類を出力データとして出力する。 After generating trained model 43 , machine learning system 30 performs inference phase 50 using trained model 43 . For example, machine learning system 30 receives input of query 51 and uses trained model 43 to obtain output data 52 in response to query 51 . For example, when the query 51 is a text of an email, the machine learning system 30 outputs the determination result as to whether the email is spam or not as output data. Also, when the input data is an image, the machine learning system 30 outputs, for example, the type of animal appearing in the image as output data.

分析者は、システムのモデル化の作業において、例えば空のモデル４１の構造や、機械学習の訓練のアルゴリズムなどを明確化する。分析者は、機械学習システム３０のモデル化が完了すると、攻撃目的種別のリストアップを行う。機械学習システム３０に対する攻撃目的種別としては、攻撃の目的ごとに「モデル抽出」、「敵対的サンプル」、「訓練データ推定」、「ポイズニング」がある。 In the system modeling work, the analyst clarifies, for example, the structure of the empty model 41 and the machine learning training algorithm. After completing the modeling of the machine learning system 30, the analyst lists the attack purpose types. The attack purpose types for the machine learning system 30 are, by purpose of attack, "model extraction", "adversarial example", "training data estimation", and "poisoning".

図６は、モデル抽出の攻撃を説明する図である。攻撃者６０は、攻撃対象モデル６１を有する機械学習システム３０に対してアクセスし、攻撃対象モデル６１の複製を作成する。例えば攻撃者６０は、利用者端末３１を用いて、モデル抽出用の特別な操作を施したクエリを機械学習システム３０に送信する。そして攻撃者６０は、利用者端末３１により操作したクエリと出力データとの組を解析し、攻撃対象モデル６１と同じ複製モデル６３を生成する。 FIG. 6 is a diagram explaining a model extraction attack. The attacker 60 accesses the machine learning system 30 having the attack target model 61 and creates a copy of the attack target model 61. For example, the attacker 60 uses the user terminal 31 to send queries specially manipulated for model extraction to the machine learning system 30. The attacker 60 then uses the user terminal 31 to analyze the pairs of manipulated queries and output data, and generates a duplicate model 63 that is the same as the attack target model 61.

攻撃者６０は、モデル抽出により複製モデル６３を作成することができれば、その複製モデル６３を用いて、例えば機械学習システム３０で提供されているサービスと同じサービスを、他のコンピュータを用いて無断で実施することが可能となる。また攻撃者６０は、複製モデル６３を用いて、機械学習システム３０に対するさらなる攻撃の方法を研究することができる。 If the attacker 60 can create the duplicate model 63 by model extraction, the attacker 60 can use the duplicate model 63 to offer, for example, the same service as that provided by the machine learning system 30 on another computer without permission. The attacker 60 can also use the duplicate model 63 to study further methods of attacking the machine learning system 30.

図７は、敵対的サンプルの攻撃を説明する図である。例えばパンダが写った元画像６４があるものとする。この元画像６４を、攻撃対象モデル６１を有する機械学習システム３０へクエリとして入力すると、「パンダ」との出力データが得られる。 FIG. 7 is a diagram explaining an adversarial example attack. Assume, for example, that there is an original image 64 showing a panda. When this original image 64 is input as a query to the machine learning system 30 having the attack target model 61, the output data "panda" is obtained.

攻撃者６０は、例えば利用者端末３１を用いて、敵対的サンプルを行うための摂動画像６５を生成する。摂動画像６５は、その画像のみでは何の動物も表していない。そして攻撃者６０は、利用者端末３１により、元画像６４に半透明の摂動画像６５を合成する。その際、利用者端末３１は、摂動画像６５の透明度を高くする。これにより、合成により得られる敵対的サンプル画像６６には、元画像６４と同じパンダが表されている。 The attacker 60 uses, for example, the user terminal 31 to generate a perturbation image 65 for carrying out an adversarial example attack. The perturbation image 65 does not depict any animal by itself. The attacker 60 then uses the user terminal 31 to blend the translucent perturbation image 65 into the original image 64. In doing so, the user terminal 31 sets the transparency of the perturbation image 65 high. As a result, the adversarial example image 66 obtained by the blending still shows the same panda as the original image 64.
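The blending step can be sketched as a simple alpha blend. This is only an illustration under stated assumptions: images are represented as flat lists of pixel intensities in [0, 1], the names `blend` and `alpha` are hypothetical, and the perturbation values are arbitrary placeholders. How an effective perturbation image is actually crafted is not covered by this sketch.

```python
def blend(original, perturbation, alpha):
    """Overlay `perturbation` on `original` with opacity `alpha`.

    A small alpha corresponds to a highly transparent perturbation image.
    """
    return [(1 - alpha) * o + alpha * p for o, p in zip(original, perturbation)]


original = [0.2, 0.5, 0.8, 0.4]      # stand-in for the original image
perturbation = [1.0, 0.0, 1.0, 0.0]  # stand-in for the perturbation image
adversarial = blend(original, perturbation, alpha=0.02)

# Largest per-pixel change introduced by the blend.
max_change = max(abs(a - o) for a, o in zip(adversarial, original))
```

Because each pixel moves by at most `alpha` times the difference between the two images, a highly transparent perturbation leaves the blended image visually close to the original.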

攻撃者６０が、敵対的サンプル画像６６を機械学習システム３０へクエリとして入力すると、「パンダ」と判定することができず、他の動物（例えば「テナガザル」）を示す判定結果が得られる。 When the attacker 60 inputs the adversarial example image 66 into the machine learning system 30 as a query, the system fails to judge it as "panda" and instead returns a result indicating another animal (for example, "gibbon").

敵対的サンプルが可能となれば、攻撃者６０は、例えば機械学習システム３０における推論を、意図的に誤らせることができる。例えば機械学習システム３０が人物の顔認証を行っているとき、攻撃者６０は、認証用の顔画像として、摂動画像を合成した画像を機械学習システム３０に読み取らせることで、別人に成りすますことができてしまう。 If an adversarial example attack is possible, the attacker 60 can, for example, intentionally mislead the inference in the machine learning system 30. For example, when the machine learning system 30 performs face authentication, the attacker 60 can impersonate another person by having the machine learning system 30 read, as the face image for authentication, an image into which a perturbation image has been blended.

図８は、訓練データ推定の攻撃を説明する図である。例えば機械学習システム３０が、訓練フェイズ４０において、人物の画像を訓練データ６７として用いて訓練済みモデル４３を生成したものとする。攻撃者６０は利用者端末３１を用い、攻撃用に操作したクエリ５１ａを機械学習システム３０に入力する。攻撃者６０は利用者端末３１を用い、入力したクエリ５１ａ、推論によって得られた出力データ５２ａなどに基づいて、訓練フェイズ４０で訓練に用いられた訓練データ６７を推定する推定データ６８を生成する。推定データ６８としては、訓練データ６７と同一ではないが、類似する画像データが得られる。 FIG. 8 is a diagram explaining a training data estimation attack. Assume, for example, that the machine learning system 30 generated the trained model 43 in the training phase 40 using images of people as training data 67. The attacker 60 uses the user terminal 31 to input a query 51a manipulated for the attack into the machine learning system 30. Using the user terminal 31, the attacker 60 generates estimated data 68, which estimates the training data 67 used in the training phase 40, based on the input query 51a, the output data 52a obtained by inference, and the like. The estimated data 68 is image data that is similar to, but not identical to, the training data 67.

攻撃者６０は、訓練データ６７の推定を行うことで、例えば訓練データ６７に含まれる個人情報を取得することができる。また、機械学習システム３０が、例えば工場における製品の品質検査に用いられている場合、攻撃者６０は、訓練に用いられた製品の特徴に関する情報を取得できる。 By estimating the training data 67, the attacker 60 can acquire personal information included in the training data 67, for example. Also, if the machine learning system 30 is used, for example, for quality inspection of products in a factory, the attacker 60 can obtain information about the characteristics of the products used for training.

図９は、ポイズニングの攻撃を説明する図である。例えば機械学習システム３０は、訓練フェイズ４０において、訓練データ４２を用いて、決定境界４５によってデータを３つのグループに分類する訓練済みモデル４３を生成したものとする。攻撃者６０は、利用者端末３１を用い、ポイズニング用に操作した訓練データ６９を用いて、機械学習システム３０に訓練を実施させる。ポイズニング用に操作した訓練データ６９には、正しい訓練済みモデル４３では、正しく判定されないようなポイズニング用サンプル６９ａが含まれる。ポイズニング用サンプル６９ａは、入力データに対して誤った正解出力データが設定されている。機械学習システム３０は、ポイズニング用サンプル６９ａに応じて、決定境界４５を変更する。 FIG. 9 is a diagram explaining a poisoning attack. Assume, for example, that the machine learning system 30 used the training data 42 in the training phase 40 to generate a trained model 43 that classifies data into three groups by decision boundaries 45. Using the user terminal 31, the attacker 60 causes the machine learning system 30 to perform training with training data 69 manipulated for poisoning. The training data 69 manipulated for poisoning includes poisoning samples 69a that the correct trained model 43 would not judge as labeled. In each poisoning sample 69a, an incorrect label is set as the correct output data for the input data. The machine learning system 30 changes the decision boundary 45 in accordance with the poisoning samples 69a.

変更された決定境界４５ａは、ポイズニング用サンプル６９ａに適応させるために、誤った方向への変更が行われている。その結果、ポイズニングの攻撃を受けた後の訓練済みモデル４３ａを推論フェイズ５０で用いると、誤った出力データを出力する。 The modified decision boundary 45a is modified in the wrong direction to accommodate the poisoning sample 69a. As a result, using the trained model 43a in the inference phase 50 after being subjected to a poisoning attack produces erroneous output data.
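The boundary shift can be illustrated with a toy example. The sketch below assumes a one-dimensional, two-class nearest-centroid classifier whose decision boundary is the midpoint of the two class means; the description's example uses three groups and does not specify the model, so the classifier, the function names, and all sample values are illustrative assumptions.

```python
def boundary(class_a, class_b):
    """Decision boundary of a 1-D nearest-centroid classifier (midpoint of means)."""
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    return (mean_a + mean_b) / 2


def classify(x, b):
    """Points below the boundary are class "A", others are class "B"."""
    return "A" if x < b else "B"


clean_a, clean_b = [1.0, 2.0, 3.0], [7.0, 8.0, 9.0]
boundary_clean = boundary(clean_a, clean_b)

# Poisoning samples: inputs lying on the B side but labeled as class A.
poisoned_a = clean_a + [6.0, 6.5]
boundary_poisoned = boundary(poisoned_a, clean_b)

query = 5.5
before = classify(query, boundary_clean)
after = classify(query, boundary_poisoned)
```

With these values the boundary moves from 5.0 to about 5.85, so the query at 5.5 is classified as "B" before poisoning and as "A" afterwards.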

攻撃者６０は、機械学習システム３０に対してポイズニングの攻撃を行うことで、推論での判定精度を劣化させることができる。例えば機械学習システム３０が訓練済みモデル４３ａを用いて、サーバに入力されるファイルのフィルタリングを行っている場合、判定精度が劣化することで、ウィルスなどの危険性を有するファイルの入力がフィルタリングされずに許可される可能性がある。 By mounting a poisoning attack on the machine learning system 30, the attacker 60 can degrade the judgment accuracy of inference. For example, when the machine learning system 30 uses the trained model 43a to filter files input to a server, the degraded accuracy may allow dangerous files, such as those containing viruses, to pass through without being filtered.

図６～図９に示したように、どのような効果を得ることを目的とした攻撃なのかによって、機械学習システム３０に対する攻撃を分類できる。また訓練済みモデルに対するアクセス性の違いによって、攻撃者６０が採り得る攻撃手法が変わる。 As shown in FIGS. 6 to 9, attacks against the machine learning system 30 can be classified according to what effect the attack aims to achieve. Moreover, the attack method that the attacker 60 can take varies depending on the difference in accessibility to the trained model.

図１０は、訓練済みモデルに対するアクセス性に応じた攻撃手法を説明する図である。訓練済みモデル４３を入手可能な場合、ホワイトボックス（ＷＢ）攻撃が可能である。ＷＢ攻撃では、攻撃者６０は、訓練済みモデル４３からパラメータを抽出し、抽出したパラメータを確認しながら適確な攻撃方法を見つける。 FIG. 10 is a diagram illustrating an attack method according to accessibility to a trained model. White-box (WB) attacks are possible if a trained model 43 is available. In the WB attack, the attacker 60 extracts parameters from the trained model 43 and finds an appropriate attack method while confirming the extracted parameters.

ＷＢ攻撃は、攻撃者６０に高度な知識と技術が要求される。その一方、高度な知識と技術を有する攻撃者６０であれば、精密な攻撃を高確率で成功させることができる。 The WB attack requires the attacker 60 to have advanced knowledge and skill. On the other hand, an attacker 60 with advanced knowledge and skill can carry out a precise attack with a high probability of success.
訓練済みモデル４３を入手不可能な場合、機械学習システム３０に対して多量のクエリを送信可能であれば、ブラックボックス（ＢＢ）攻撃が可能である。ＢＢ攻撃では、攻撃者６０は、利用者端末３１を用いて、機械学習システム３０に対して多量のクエリを送信し、それらのクエリに対する出力データを得る。攻撃者６０は、取得した出力データに基づいて適当な攻撃手法を判断し、攻撃を行う。 If the trained model 43 cannot be obtained, a black-box (BB) attack is possible as long as a large number of queries can be sent to the machine learning system 30. In the BB attack, the attacker 60 uses the user terminal 31 to send a large number of queries to the machine learning system 30 and obtains the output data for those queries. The attacker 60 determines an appropriate attack technique based on the acquired output data and carries out the attack.

ＢＢ攻撃は、攻撃に高度な知識または技術は要求されない。そのため、専門知識のない人物であってもＢＢ攻撃の攻撃者６０になり得るが、粗雑な攻撃となり、攻撃成功率は低い。 The BB attack does not require advanced knowledge or skill to attack. Therefore, even a person without specialized knowledge can be the attacker 60 of the BB attack, but the attack is crude and the attack success rate is low.

訓練済みモデル４３を入手不可能であるがより精密な攻撃を行う場合、多くのクエリ送信が可能であるか、あるいは訓練データの入手または模倣が可能であれば、代理モデルを介した攻撃が可能である。代理モデルを介した攻撃では、攻撃者６０は、まず訓練データの入手が可能かを検討する。例えば機械学習システム３０が、一般に公開されたデータを訓練データとして用いていることが分かっている場合、攻撃者６０は、その訓練データを入手する。訓練データの入手が困難であれば、攻撃者６０は、利用者端末３１を用いて、機械学習システム３０に対して多量のクエリを送信し、それらのクエリに対する出力データを取得する。攻撃者６０は、クエリと出力データとを参照し、機械学習で使用された訓練データを推定する。 If the trained model 43 cannot be obtained but a more precise attack is to be carried out, an attack via a proxy model is possible, provided that many queries can be sent or that the training data can be obtained or imitated. In an attack via a proxy model, the attacker 60 first considers whether the training data can be obtained. For example, if the machine learning system 30 is known to use publicly available data as training data, the attacker 60 obtains that training data. If the training data is difficult to obtain, the attacker 60 uses the user terminal 31 to send a large number of queries to the machine learning system 30 and acquires the output data for those queries. The attacker 60 refers to the queries and the output data to estimate the training data used in the machine learning.

攻撃者６０は、取得した訓練データまたは推定した訓練データを用いて、代理モデル４３ｂを生成する。例えば攻撃者６０は、自身が管理する機械学習システム３０に訓練データを入力して、その機械学習システム３０に訓練を実施させ、訓練済みモデルを生成させる。このとき生成された訓練済みモデルが、攻撃対象の訓練済みモデル４３に対する代理モデル４３ｂである。攻撃者６０は、生成した代理モデル４３ｂからパラメータを抽出し、抽出したパラメータを確認しながら適確な攻撃方法を見つける。 The attacker 60 uses the obtained training data or the estimated training data to generate the proxy model 43b. For example, the attacker 60 inputs training data into the machine learning system 30 managed by the attacker 60 and causes the machine learning system 30 to perform training and generate a trained model. The trained model generated at this time is the proxy model 43b for the trained model 43 of the attack target. The attacker 60 extracts parameters from the generated proxy model 43b and finds an appropriate attack method while confirming the extracted parameters.

代理モデルを介した攻撃では、攻撃者６０に、ＷＢ攻撃なみの高度な知識と技術が要求される。代理モデルを介した攻撃の成功率は、代理モデル４３ｂの質や転移性（transferability）の高さに依存する。転移性とは、代理モデル４３ｂに対して成功する攻撃が、元の訓練済みモデル４３に対しても成功しやすいという性質である。 Attacks through the proxy model require the attacker 60 to have advanced knowledge and techniques comparable to WB attacks. The success rate of attack via the proxy model depends on the quality and transferability of the proxy model 43b. Transferability is the property that an attack that succeeds against the proxy model 43b is likely to succeed against the original trained model 43 as well.

以上のような攻撃の目的に応じた分類と、訓練済みモデルへのアクセス性に応じた攻撃手法との関係は、攻撃ツリーで表すことができる。 The relationship between the classification by attack purpose described above and the attack methods determined by accessibility to the trained model can be represented by an attack tree.
図１１は、機械学習システムに関する攻撃ツリーの一例を示す図である。攻撃ツリー７０には攻撃が実行可能となるための条件を表すノード７１ａ～７１ｅ，７２ａ～７２ｃと、攻撃目的種別を表すノード７３ａ～７３ｌとが含まれており、ノード間が矢印で接続されている。矢印の始点のノードは原因または条件を示し、矢印の終点のノードは結果を示す。 FIG. 11 is a diagram showing an example of an attack tree for a machine learning system. The attack tree 70 includes nodes 71a to 71e and 72a to 72c, which represent conditions under which an attack becomes executable, and nodes 73a to 73l, which represent attack purpose types; the nodes are connected by arrows. The node at the start of an arrow indicates a cause or condition, and the node at the end of the arrow indicates the result.

ノード７１ａ～７１ｅには、攻撃に用いられるデータについて、攻撃に用いるための条件が設定されている。ノード７１ａに示される条件は、訓練済みモデルの入手または模倣が可能なことである。ノード７１ｂに示される条件は、多くのクエリを機械学習システム３０に送信可能なことである。ノード７１ｃに示される条件は、訓練データの入手または模倣が可能なことである。ノード７１ｄに示される条件は、クエリの任意の操作が可能なことである。ノード７１ｅに示される条件は、訓練データの任意の操作が可能なことである。 In the nodes 71a to 71e, conditions for attacking data are set. The condition indicated at node 71a is that a trained model can be obtained or imitated. The condition indicated at node 71 b is that many queries can be sent to machine learning system 30 . The condition indicated at node 71c is that training data can be obtained or simulated. The condition indicated at node 71d is that arbitrary manipulation of the query is possible. The condition indicated at node 71e is that arbitrary manipulation of the training data is possible.

ノード７２ａ～７２ｃには、モデルアクセス性に応じた攻撃手法が示されている。ノード７２ａはＷＢ攻撃を示し、ノード７２ｂはＢＢ攻撃を示し、ノード７２ｃは代理モデルを介した攻撃を示す。 Nodes 72a-72c show attack techniques depending on model accessibility. Node 72a indicates a WB attack, node 72b indicates a BB attack, and node 72c indicates an attack through a proxy model.

ノード７３ａ～７３ｌには、モデルアクセス性に応じた攻撃手法それぞれで実施される攻撃目的種別が示されている。ノード７３ａに示される攻撃目的種別は、ＷＢ攻撃による訓練済みモデルの漏洩である。ノード７３ｂに示される攻撃目的種別は、ＢＢ攻撃による訓練済みモデルの漏洩である。ノード７３ｃに示される攻撃目的種別は、代理モデルを介した攻撃による訓練済みモデルの漏洩である。 Nodes 73a to 73l indicate attack purpose types to be executed by attack methods corresponding to model accessibility. The attack objective type indicated at node 73a is leakage of trained models by WB attack. The attack objective type shown at node 73b is the leakage of trained models by BB attacks. The attack objective type shown at node 73c is the leakage of trained models by attacking through surrogate models.

ノード７３ｄに示される攻撃目的種別は、ＷＢ攻撃による訓練データ推定である。ノード７３ｅに示される攻撃目的種別は、ＢＢ攻撃による訓練データ推定である。ノード７３ｆに示される攻撃目的種別は、代理モデルを介した攻撃による訓練データ推定である。なお代理モデルを生成できている時点で訓練データは入手されており、攻撃者６０にとって、代理モデルを介した攻撃により訓練データ推定をする意味は大きくない。そのため機械学習システム３０のリスクの分析時には、コンピュータ１００は、代理モデルを介した攻撃による訓練データ推定のリスクは、参考値として扱い、機械学習システム３０のリスクを表す情報からは除外する。 The attack objective type shown in node 73d is training data estimation by WB attack. The attack objective type shown at node 73e is training data estimation by BB attack. The attack objective type shown at node 73f is training data estimation by attacking via surrogate models. Note that the training data has already been obtained at the time the proxy model is generated, and it is not significant for the attacker 60 to estimate the training data by attacking via the proxy model. Therefore, when analyzing the risk of the machine learning system 30, the computer 100 treats the training data estimation risk due to the attack via the proxy model as a reference value and excludes it from the information representing the risk of the machine learning system 30.

ノード７３ｇに示される攻撃目的種別は、ＷＢ攻撃による敵対的サンプルである。ノード７３ｈに示される攻撃目的種別は、ＢＢ攻撃による敵対的サンプルである。ノード７３ｉに示される攻撃目的種別は、代理モデルを介した攻撃による敵対的サンプルである。 The attack purpose type shown at node 73g is an adversarial example by WB attack. The attack purpose type shown at node 73h is an adversarial example by BB attack. The attack purpose type shown at node 73i is an adversarial example by an attack via a proxy model.

ノード７３ｊに示される攻撃目的種別は、ＷＢ攻撃によるポイズニングである。ノード７３ｋに示される攻撃目的種別は、ＢＢ攻撃によるポイズニングである。ノード７３ｌに示される攻撃目的種別は、代理モデルを介した攻撃によるポイズニングである。 The attack purpose type indicated by the node 73j is poisoning by WB attack. The attack purpose type indicated by the node 73k is poisoning by BB attack. The attack objective type shown at node 73l is poisoning by attacking through a proxy model.

攻撃ツリー７０内に示される正方形のノード７４ａ～７４ｅは、左側に隣接するノードの条件に関する中間条件を示している。ノード７４ａ～７４ｅは、左側に矢印で接続されたノードのいずれかの条件が満たされれば（論理和）、右側に矢印で接続されたノードに示される事象が発生し得ることを表している。例えばノード７４ａは、訓練済みモデルの入手または模倣が可能であるか、あるいは多くのクエリが送信可能であれば、ＢＢ攻撃が可能となることを示している。 The square nodes 74a-74e shown in the attack tree 70 represent intermediate conditions with respect to the conditions of the left adjacent nodes. Nodes 74a to 74e represent that if any of the conditions of the nodes connected by arrows on the left are satisfied (logical sum), the event indicated by the nodes connected by arrows on the right can occur. For example, node 74a indicates that a BB attack is possible if a trained model can be obtained or mimicked, or if many queries can be sent.

またノード７５ａ～７５ｅは、ノード７５ａの左側に示される複数のノードのうちの上からｎ（ｎは１以上の整数）番目のノードからの矢印が、ノード７５ｂ～７５ｅそれぞれの右側の上からｎ番目のノードへの矢印となることを示している。例えばノード７２ａを始点としてノード７５ａを終点とする矢印は、ノード７２ａを始点としてノード７３ａを終点とする矢印、ノード７２ａを始点としてノード７３ｄを終点とする矢印、ノード７２ａを始点としてノード７３ｇを終点とする矢印の束を表している。 Nodes 75a to 75e indicate that the arrow from the n-th node from the top (n is an integer of 1 or more) among the nodes shown on the left side of node 75a continues as the arrow to the n-th node from the top on the right side of each of nodes 75b to 75e. For example, the arrow starting at node 72a and ending at node 75a represents a bundle of arrows: one from node 72a to node 73a, one from node 72a to node 73d, and one from node 72a to node 73g.

なおノード７３ｇ～７３ｌにおける矢印の合流点には「ＡＮＤ」と示されている。これは、各矢印の始点のノードに示される条件のすべてが満たされたとき（論理積）、該当ノードの事象が発生し得ることを示している。例えばノード７３ｇであれば、２つの条件の論理積となる。すなわちＷＢ攻撃が可能であり、かつクエリの操作が可能な場合に、ＷＢ攻撃による敵対的サンプルが生じ得ることが表されている。 Note that "AND" is shown at the points where arrows merge at nodes 73g to 73l. This indicates that the event of the node can occur only when all the conditions shown at the start nodes of the arrows are satisfied (logical AND). For example, node 73g is the logical AND of two conditions: an adversarial example by WB attack can occur when a WB attack is possible and the query can be manipulated.
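The OR and AND relations of the attack tree can be sketched as Boolean logic. This is a simplified reading of the text, not a reproduction of FIG. 11: the condition names are hypothetical, the proxy-model condition follows the prose description (many queries or obtainable training data), and pairing poisoning with training data manipulation is an assumption, since the text spells out the AND condition only for node 73g.

```python
def feasible_attacks(c):
    """Return feasibility of each (attack method, attack purpose) pair.

    `c` maps condition names (hypothetical labels for nodes 71a-71e) to bools.
    """
    methods = {
        # WB attack: the trained model can be obtained or imitated.
        "WB": c["model_available"],
        # BB attack: model available OR many queries can be sent (logical OR).
        "BB": c["model_available"] or c["many_queries"],
        # Proxy model: many queries OR training data obtainable (from the prose).
        "proxy": c["many_queries"] or c["training_data_available"],
    }
    result = {}
    for name, ok in methods.items():
        result[(name, "model leakage")] = ok
        result[(name, "training data estimation")] = ok
        # AND nodes: the method must be possible AND the data must be manipulable.
        result[(name, "adversarial example")] = ok and c["query_manipulable"]
        result[(name, "poisoning")] = ok and c["training_data_manipulable"]  # assumption
    return result


conditions = {
    "model_available": False,
    "many_queries": True,
    "training_data_available": False,
    "query_manipulable": True,
    "training_data_manipulable": False,
}
attacks = feasible_attacks(conditions)
```

The twelve resulting pairs correspond in number to nodes 73a to 73l; in this example only the BB and proxy-model variants of model leakage, training data estimation, and adversarial example come out as feasible.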

コンピュータ１００は、攻撃ツリー７０に基づいて、攻撃リスクの概算を行うことができる。 The computer 100 can estimate the attack risk based on the attack tree 70.
図１２は、攻撃リスクの概算機能を示すブロック図である。コンピュータ１００は、記憶部１１０、データ特性受け付け部１２０、尤度算出部１３０、攻撃リスク算出部１４０、および攻撃リスク表示部１５０を有する。 FIG. 12 is a block diagram showing the attack risk estimation function. The computer 100 has a storage unit 110, a data characteristic receiving unit 120, a likelihood calculation unit 130, an attack risk calculation unit 140, and an attack risk display unit 150.

記憶部１１０は、データ特性管理テーブル１１１と尤度管理テーブル１１２とを記憶する。データ特性管理テーブル１１１は、分析者によって入力された、機械学習システム３０で利用するデータについての特性（攻撃者６０が入手可能かまたは任意の操作が可能か）を管理するデータテーブルである。尤度管理テーブル１１２は、攻撃目的種別ごとに算出された尤度を管理するデータテーブルである。記憶部１１０は、例えばメモリ１０２またはストレージ装置１０３の記憶領域の一部である。 Storage unit 110 stores data characteristic management table 111 and likelihood management table 112 . The data property management table 111 is a data table for managing the property (whether the attacker 60 can obtain the data or whether it can be manipulated arbitrarily) about the data used by the machine learning system 30, which is input by the analyst. The likelihood management table 112 is a data table for managing the likelihood calculated for each attack purpose type. The storage unit 110 is part of the storage area of the memory 102 or the storage device 103, for example.

データ特性受け付け部１２０は、分析者から、機械学習システム３０で利用するデータについての特性を示す情報の入力を受け付ける。データ特性受け付け部１２０は、入力されたデータ特性の情報を、データ特性管理テーブル１１１に設定する。 The data characteristic receiving unit 120 receives input of information indicating characteristics of data used in the machine learning system 30 from the analyst. The data characteristic receiving unit 120 sets the input data characteristic information in the data characteristic management table 111 .

尤度算出部１３０は、データ特性管理テーブル１１１を参照し、攻撃ツリー７０に従って攻撃目的種別ごとの尤度を算出する。尤度算出部１３０は、算出した尤度を尤度管理テーブル１１２に設定する。 The likelihood calculation unit 130 refers to the data characteristic management table 111 and calculates the likelihood for each attack purpose type according to the attack tree 70 . Likelihood calculation section 130 sets the calculated likelihood in likelihood management table 112 .

攻撃リスク算出部１４０は、データ特性管理テーブル１１１と尤度管理テーブル１１２とを参照し、攻撃目的種別ごとの攻撃リスクを算出する。攻撃リスク算出部１４０は、算出した攻撃リスクを攻撃リスク表示部１５０に送信する。 The attack risk calculator 140 refers to the data characteristic management table 111 and the likelihood management table 112 to calculate the attack risk for each attack purpose type. The attack risk calculation unit 140 transmits the calculated attack risks to the attack risk display unit 150 .

攻撃リスク表示部１５０は、取得した攻撃リスクをモニタ２１に表示させる。また攻撃リスク表示部１５０は、分析者から、攻撃リスクの詳細表示の入力があった場合、データ特性管理テーブル１１１と尤度管理テーブル１１２とに設定された情報に基づいて、攻撃リスクの算出根拠となったデータをモニタ２１に表示させることもできる。 The attack risk display unit 150 causes the monitor 21 to display the acquired attack risk. Further, when the analyst inputs detailed display of the attack risk, the attack risk display unit 150 displays the calculation basis of the attack risk based on the information set in the data characteristic management table 111 and the likelihood management table 112. The resulting data can also be displayed on the monitor 21 .

なお、図１２に示した各要素間を接続する線は通信経路の一部を示すものであり、図示した通信経路以外の通信経路も設定可能である。また、図１２に示した各要素の機能は、例えば、その要素に対応するプログラムモジュールをコンピュータ１００に実行させることで実現することができる。 It should be noted that the lines connecting the respective elements shown in FIG. 12 indicate part of the communication paths, and communication paths other than the illustrated communication paths can also be set. Also, the function of each element shown in FIG. 12 can be realized, for example, by causing the computer 100 to execute a program module corresponding to the element.

次に図１３と図１４とを参照し、記憶部１１０に格納されているデータ特性管理テーブル１１１と尤度管理テーブル１１２とについて具体的に説明する。 Next, with reference to FIGS. 13 and 14, the data characteristic management table 111 and the likelihood management table 112 stored in the storage unit 110 will be described in detail.
図１３は、データ特性管理テーブルの一例を示す図である。データ特性管理テーブル１１１には、データの種類、攻撃されやすさ、影響度（資産価値）の欄が設けられている。 FIG. 13 is a diagram showing an example of the data characteristic management table. The data characteristic management table 111 has columns for data type, susceptibility to attack, and impact (asset value).

データの種類の欄には、機械学習システム３０が使用するデータの種類が設定される。データの種類としては、訓練データ、訓練済みモデル、クエリ、および出力データがある。 The type of data used by the machine learning system 30 is set in the data type column. Data types include training data, trained models, queries, and output data.

攻撃されやすさの欄には、データの種類ごとに、その種類のデータを用いた攻撃のされやすさが、入手・模倣、操作、多量送信の欄に設定される。入手・模倣の欄には、対応するデータの入手または模倣が容易なほど高い値が設定される。操作の欄には、対応するデータの任意の操作（改変）が容易なほど高い値が設定される。多量送信の欄には、対応するデータを機械学習システム３０へ多量に送信することが容易なほど高い値が設定される。例えばデータの種別「クエリ」の多量送信の欄には、単位時間あたりに送信可能なクエリの数が多いほど高い値が設定される。 In the susceptibility-to-attack column, the susceptibility to attacks using each type of data is set in three sub-columns: acquisition/imitation, manipulation, and mass transmission. In the acquisition/imitation column, a higher value is set the easier it is to obtain or imitate the corresponding data. In the manipulation column, a higher value is set the easier it is to arbitrarily manipulate (modify) the corresponding data. In the mass transmission column, a higher value is set the easier it is to send large amounts of the corresponding data to the machine learning system 30. For example, in the mass transmission column for the data type "query", a higher value is set the greater the number of queries that can be sent per unit time.

The impact (asset value) column holds the impact of an attack on the corresponding data, in two sub-columns: leakage and manipulation. The leakage sub-column holds the impact if the corresponding data is leaked. The manipulation sub-column holds the impact if the corresponding data is manipulated (altered). The impact can be based, for example, on asset value: the greater the loss of assets caused by an attack, the higher the impact is set.
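As an illustration only, the data characteristic management table could be held in memory roughly as follows. This is a minimal sketch, not the layout of FIG. 13: the class and field names are assumptions, the susceptibility values are the ones read in step S111 of the likelihood calculation, and the impact values are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCharacteristics:
    """One row of the table; None means the cell is not used for that data type."""
    # Susceptibility to attack: a higher value means easier for an attacker.
    acquisition: Optional[int] = None        # acquisition/imitation
    manipulation: Optional[int] = None       # manipulation (alteration)
    mass_transmission: Optional[int] = None  # sending in large amounts
    # Impact (asset value): a higher value means a greater loss.
    impact_leak: Optional[int] = None
    impact_manipulation: Optional[int] = None


# Susceptibility values follow step S111; impact values are hypothetical.
data_characteristics = {
    "training_data": DataCharacteristics(acquisition=10, manipulation=4, impact_leak=10),
    "trained_model": DataCharacteristics(acquisition=2, impact_leak=8, impact_manipulation=6),
    "query": DataCharacteristics(manipulation=6, mass_transmission=8),
    "output_data": DataCharacteristics(impact_manipulation=4),
}
```

Keying the rows by data type keeps each lookup, such as the mass transmission value of "query", a single dictionary access.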

FIG. 14 is a diagram showing an example of the likelihood management table. The likelihood management table 112 holds a likelihood for each attack purpose type. An attack purpose type is expressed as a pair of an attack method and the purpose of that attack.

The procedure by which the computer 100 estimates attack risk is described in detail below.
FIG. 15 is a flowchart showing an example of the attack risk estimation procedure. The processing shown in FIG. 15 is described below in order of step number.

[Step S101] The data characteristic receiving unit 120 receives, as the data characteristics of the data used by the machine learning system 30, input of the susceptibility to attack and the impact when attacked. The data characteristic receiving unit 120 sets the input values in the data characteristic management table 111.

[Step S102] The likelihood calculation unit 130 refers to the data characteristic management table 111 and calculates the likelihood for each attack purpose type. The likelihood calculation unit 130 sets the calculated likelihoods in the likelihood management table 112. The likelihood calculation process is described in detail later (see FIG. 16).

[Step S103] The attack risk calculation unit 140 calculates the attack risk for each attack purpose type. The attack risk is, for example, the likelihood of an attack purpose type multiplied by the impact of that attack purpose type.

[Step S104] The attack risk display unit 150 displays the attack risk for each attack purpose type on the monitor 21.
With this procedure, once the analyst inputs the data characteristics, the computer 100 automatically calculates the attack risk for each attack purpose type for the machine learning system 30.

Next, the likelihood calculation process is described in detail.
FIG. 16 is a flowchart showing an example of the likelihood calculation procedure. The processing shown in FIG. 16 is described below in order of step number.

[Step S111] The likelihood calculation unit 130 reads, from the data characteristic management table 111, the susceptibility value for each of the five conditions on the data used for an attack. For example, from the data characteristic management table 111 shown in FIG. 13, the likelihood calculation unit 130 reads the susceptibility "2" for the condition "trained model can be obtained/imitated", "8" for the condition "many queries can be sent", "10" for the condition "training data can be obtained/imitated", "6" for the condition "query can be manipulated", and "4" for the condition "training data can be manipulated".

[Step S112] The likelihood calculation unit 130 calculates the intermediate condition for the condition "many queries can be sent". This intermediate condition is the condition at node 74a of the attack tree 70 (see FIG. 11). For example, the likelihood calculation unit 130 takes the larger value, "8", of the susceptibility "2" of the condition "trained model can be obtained/imitated" and the susceptibility "8" of the condition "many queries can be sent" as the value of the intermediate condition at node 74a.

[Step S113] The likelihood calculation unit 130 calculates the intermediate condition for the condition "training data can be obtained/imitated". This intermediate condition is the condition at node 74b of the attack tree 70 (see FIG. 11). For example, the likelihood calculation unit 130 takes the larger value, "10", of the intermediate condition value "8" calculated in step S112 and the susceptibility "10" of the condition "training data can be obtained/imitated" as the value of the intermediate condition at node 74b.

[Step S114] The likelihood calculation unit 130 calculates the intermediate condition for each of the three attack methods, which are distinguished by how the trained model 43 is accessed. For example, the likelihood calculation unit 130 takes the susceptibility "2" of the condition "trained model can be obtained/imitated" as the value of the intermediate condition "WB attack". It takes the intermediate condition value "8" calculated in step S112 as the value of the intermediate condition "BB attack", and the intermediate condition value "10" calculated in step S113 as the value of the intermediate condition "attack via proxy model".

[Step S115] The likelihood calculation unit 130 calculates the likelihood for each of the four attack purpose types, which are classified by attack purpose. For example, the likelihood calculation unit 130 calculates a likelihood for each combination of an attack method, determined by accessibility to the trained model, and the effect of the attack indicated by the attack purpose type. Specifically, the likelihood calculation unit 130 performs the following calculations.

The likelihood calculation unit 130 takes the value "2" of the intermediate condition "WB attack" as the likelihood of "leakage of trained model by WB attack", the value "8" of the intermediate condition "BB attack" as the likelihood of "leakage of trained model by BB attack", and the value "10" of the intermediate condition "attack via proxy model" as the likelihood of "leakage of trained model by attack via proxy model".

The likelihood calculation unit 130 also takes the value "2" of the intermediate condition "WB attack" as the likelihood of "training data estimation by WB attack", the value "8" of the intermediate condition "BB attack" as the likelihood of "training data estimation by BB attack", and the value "10" of the intermediate condition "attack via proxy model" as the likelihood of "training data estimation by attack via proxy model".

The likelihood calculation unit 130 also takes the minimum, "2", of the value "2" of the intermediate condition "WB attack" and the value "6" of the condition "query can be manipulated" as the likelihood of "adversarial example by WB attack". It takes the minimum, "6", of the value "8" of the intermediate condition "BB attack" and the value "6" of the condition "query can be manipulated" as the likelihood of "adversarial example by BB attack". It takes the minimum, "6", of the value "10" of the intermediate condition "attack via proxy model" and the value "6" of the condition "query can be manipulated" as the likelihood of "adversarial example by attack via proxy model".

The likelihood calculation unit 130 also takes the minimum, "2", of the value "2" of the intermediate condition "WB attack", the intermediate condition value "10" of the condition "training data can be obtained/imitated", and the value "4" of the condition "training data can be manipulated" as the likelihood of "poisoning by WB attack". It takes the minimum, "4", of the value "8" of the intermediate condition "BB attack", the intermediate condition value "10" of the condition "training data can be obtained/imitated", and the value "4" of the condition "training data can be manipulated" as the likelihood of "poisoning by BB attack". Finally, it finds the minimum, "4", of the value "10" of the intermediate condition "attack via proxy model", the intermediate condition value "10" of the condition "training data can be obtained/imitated", and the value "4" of the condition "training data can be manipulated", and takes that minimum as the likelihood of "poisoning by attack via proxy model".

[Step S116] The likelihood calculation unit 130 outputs the likelihood for each attack purpose type. For example, the likelihood calculation unit 130 writes the likelihood for each attack purpose type to the likelihood management table 112.
In this way, the likelihood for each attack purpose type can be calculated according to the attack tree 70.

FIG. 17 is a diagram showing the likelihood calculation process using the attack tree. In FIG. 17, the value calculated for the condition or attack purpose type corresponding to each node is shown at the position of that node.

The likelihood calculation procedure that follows the structure of the attack tree 70 can be written, for example, inside the program module that describes the processing of the likelihood calculation unit 130.
FIG. 18 is a diagram showing an example of the variables used in the likelihood calculation process. For example, the variable that holds the susceptibility value of the condition "trained model can be obtained/imitated" is named "L_can_get_model". The variable for the susceptibility input value of the condition "many queries can be sent" is named "L_can_send_queries". The variable for the condition "training data can be obtained/imitated" is named "L_can_get_tdata". The variable for the condition "query can be manipulated" is named "L_can_manipulate_query". The variable for the condition "training data can be manipulated" is named "L_can_manipulate_tdata".

The values of the five conditions on the data used for an attack are updated by the intermediate conditions. The transitions of these condition values are managed in a two-dimensional array. Each condition is numbered in ascending order from "0", starting from the top of the attack tree 70. The input values for the conditions are set in array elements L[0][0] to L[0][4]. Intermediate condition values calculated from any of L[0][0] to L[0][4] are set in L[1][0] to L[1][4]. Intermediate condition values calculated from any of L[1][0] to L[1][4] are set in L[2][0] to L[2][4].

The value of the condition "WB attack" is set in element 0, "L_access[0]", of the array L_access[i] (i = 0 to 2). The value of the condition "BB attack" is set in element 1, "L_access[1]". The value of the condition "attack via proxy model" is set in element 2, "L_access[2]".

The value (likelihood) of the attack purpose type "leakage of trained model by WB attack" is set in element 0, "L_model_leak[0]", of the array L_model_leak[i] (i = 0 to 2). The likelihood of "leakage of trained model by BB attack" is set in element 1, "L_model_leak[1]". The likelihood of "leakage of trained model by attack via proxy model" is set in element 2, "L_model_leak[2]".

The value (likelihood) of the attack purpose type "training data estimation by WB attack" is set in element 0, "L_tdata_infer[0]", of the array L_tdata_infer[i] (i = 0 to 2). The likelihood of "training data estimation by BB attack" is set in element 1, "L_tdata_infer[1]". The likelihood of "training data estimation by attack via proxy model" is set in element 2, "L_tdata_infer[2]".

The value (likelihood) of the attack purpose type "adversarial example by WB attack" is set in element 0, "L_advex[0]", of the array L_advex[i] (i = 0 to 2). The likelihood of "adversarial example by BB attack" is set in element 1, "L_advex[1]". The likelihood of "adversarial example by attack via proxy model" is set in element 2, "L_advex[2]".

The value (likelihood) of the attack purpose type "poisoning by WB attack" is set in element 0, "L_poisoning[0]", of the array L_poisoning[i] (i = 0 to 2). The likelihood of "poisoning by BB attack" is set in element 1, "L_poisoning[1]". The likelihood of "poisoning by attack via proxy model" is set in element 2, "L_poisoning[2]".

Using the variables and arrays shown in FIG. 18, the likelihood calculation procedure that follows the attack tree 70 can be written as a program.
FIG. 19 is a diagram showing an example of pseudocode for the likelihood calculation program. Lines 1 to 5 of the likelihood calculation program 131 are instructions that set the input values for the conditions on the data used for an attack in L[0][0] to L[0][4].

Lines 6 and 7 of the likelihood calculation program 131 are instructions that use the values of L[0][0] to L[0][4] to set the first-stage intermediate condition values in L[1][0] to L[1][4]. Of these, line 7, "L[1][1] = max(L[0][0], L[0][1])", sets L[1][1] to the larger of L[0][0] and L[0][1].

Lines 8 and 9 of the likelihood calculation program 131 are instructions that use the values of L[1][0] to L[1][4] to set the second-stage intermediate condition values in L[2][0] to L[2][4]. Of these, line 9, "L[2][2] = max(L[1][1], L[1][2])", sets L[2][2] to the larger of L[1][1] and L[1][2].

Lines 10 to 12 are instructions that set the values of the conditions for the attack methods, which are distinguished by accessibility to the trained model.
Lines 13 to 18 are instructions that set the likelihood of each attack purpose type. Of these, line 16, "L_advex[i] = min(L_access[i], L[2][3])", sets L_advex[i] to the smaller of L_access[i] and L[2][3]. Line 17, "L_poisoning[i] = min(L_access[i], L[2][2], L[2][4])", sets L_poisoning[i] to the smallest of L_access[i], L[2][2], and L[2][4].

Line 19 is an instruction that outputs the likelihoods of the attack purpose types.
By causing the computer 100 to execute a likelihood calculation program 131 such as the one shown in FIG. 19, the likelihood calculation by the likelihood calculation unit 130 shown in FIGS. 16 and 17 can be realized. When the likelihood calculation is complete, the attack risk calculation unit 140 calculates the attack risk for each attack purpose type.
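The pseudocode of FIG. 19 is not reproduced in this text, so the following is only a hedged Python sketch that follows the description of it. The statements quoted above (lines 7, 9, 16, and 17) are taken from the description; the row copies, the assignments to L_access, and the first two assignments inside the loop are assumptions inferred from steps S112 to S115. The function name and the returned dictionary are illustrative.

```python
def calculate_likelihoods(can_get_model, can_send_queries, can_get_tdata,
                          can_manipulate_query, can_manipulate_tdata):
    """Evaluate the attack tree: OR nodes take max, AND nodes take min."""
    # Row 0 holds the five input values, numbered from the top of the tree.
    L = [[0] * 5 for _ in range(3)]
    L[0] = [can_get_model, can_send_queries, can_get_tdata,
            can_manipulate_query, can_manipulate_tdata]

    # Stage 1 (assumed: copy the row, then apply the quoted line 7).
    L[1] = list(L[0])
    L[1][1] = max(L[0][0], L[0][1])

    # Stage 2 (assumed: copy the row, then apply the quoted line 9).
    L[2] = list(L[1])
    L[2][2] = max(L[1][1], L[1][2])

    # Attack methods by accessibility: WB, BB, via proxy model (assumed mapping).
    L_access = [L[2][0], L[2][1], L[2][2]]

    L_model_leak, L_tdata_infer = [0] * 3, [0] * 3
    L_advex, L_poisoning = [0] * 3, [0] * 3
    for i in range(3):
        L_model_leak[i] = L_access[i]                             # per step S115
        L_tdata_infer[i] = L_access[i]                            # per step S115
        L_advex[i] = min(L_access[i], L[2][3])                    # quoted line 16
        L_poisoning[i] = min(L_access[i], L[2][2], L[2][4])       # quoted line 17

    return {"model_leak": L_model_leak, "tdata_infer": L_tdata_infer,
            "advex": L_advex, "poisoning": L_poisoning}


# Input values read in step S111.
likelihoods = calculate_likelihoods(2, 8, 10, 6, 4)
```

With the step S111 inputs, the result reproduces the values worked through in step S115, for example [2, 4, 4] for poisoning by WB attack, BB attack, and attack via proxy model.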

FIG. 20 is a diagram showing an example of how the attack risk is calculated. For each attack purpose type, the attack risk calculation unit 140 multiplies the attack likelihood obtained for that type by the impact of an attack of that type, and takes the product as the attack risk of that attack purpose type.

Note that the direct effect of a poisoning attack is that the trained model 43 is manipulated. Once the trained model 43 has been manipulated, inference performed with it produces output data that is also manipulated. The attack risk calculation unit 140 therefore calculates two attack risks for poisoning: the risk of trained model manipulation, using the impact of trained model manipulation, and the risk of output data manipulation, using the impact of output data manipulation.
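The multiplication described above, including the two risks derived from the single poisoning likelihood, can be sketched as follows. This is an illustration under assumptions: the likelihoods are those of the step S115 example, the impact values are hypothetical, the key names are invented for the sketch, and pairing "adversarial example" with the output data manipulation impact and "training data estimation" with the training data leakage impact is an assumption rather than something stated here.

```python
METHODS = ("WB", "BB", "proxy")

# risk type -> (likelihood key, impact key); poisoning yields two risk types.
RISK_TYPES = {
    "model_leak": ("model_leak", "model_leak"),
    "tdata_infer": ("tdata_infer", "tdata_leak"),
    "advex": ("advex", "output_manipulation"),
    "poisoning_model": ("poisoning", "model_manipulation"),
    "poisoning_output": ("poisoning", "output_manipulation"),
}


def attack_risks(likelihoods, impacts):
    """Return risk = likelihood * impact for every risk type and attack method."""
    risks = {}
    for risk_type, (likelihood_key, impact_key) in RISK_TYPES.items():
        risks[risk_type] = {
            method: likelihood * impacts[impact_key]
            for method, likelihood in zip(METHODS, likelihoods[likelihood_key])
        }
    return risks


# Likelihoods from the step S115 example; impacts are hypothetical.
likelihoods = {"model_leak": [2, 8, 10], "tdata_infer": [2, 8, 10],
               "advex": [2, 6, 6], "poisoning": [2, 4, 4]}
impacts = {"model_leak": 8, "tdata_leak": 10,
           "model_manipulation": 6, "output_manipulation": 4}
risks = attack_risks(likelihoods, impacts)
```

Because both poisoning risks share one likelihood, they differ only by the impact value, which is why the two are reported separately.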

The attack risk calculated by the attack risk calculation unit 140 is displayed on the monitor 21 by the attack risk display unit 150. For example, when the data characteristic receiving unit 120 receives input of data characteristics on the attack risk management screen, the attack risk display unit 150 displays the attack risk calculation result on that same screen.

FIG. 21 is a diagram showing an example of the attack risk management screen. The attack risk management screen 80 includes an assessment summary display portion 81, a system summary display portion 82, a data characteristic input portion 83, an attack risk display portion 84, and an attack risk details button 85.

The assessment summary display portion 81 holds information about the analyst who instructs the attack risk calculation. For example, it displays the analyst's name, the assessment date, and the analysis work time. The system summary display portion 82 displays the system name, the part to be analyzed, the machine learning algorithm used by the machine learning system 30, and the machine learning pipeline used by the machine learning system 30.

The data characteristic input portion 83 is an input area for information indicating the characteristics of the data used by the machine learning system 30. The analyst enters the susceptibility to attack and the impact for each data type in the data characteristic input portion 83. The values entered there are set in the data characteristic management table 111.

The attack risk display portion 84 is a display area for the attack risk to the machine learning system 30. In the example of FIG. 21, a numerical value (attack risk) indicating how high the risk of being subjected to that attack is displayed for each attack purpose type. For example, for each attack purpose type, the maximum of the attack risks calculated for the individual attack methods, which are distinguished by accessibility to the trained model, is displayed as the attack risk for that attack purpose type.

The attack risk details button 85 is a button for displaying the attack risk details screen, which shows the details of the attack risk. When the analyst presses the attack risk details button 85, the attack risk display unit 150 displays the attack risk details screen on the monitor 21.

FIG. 22 is a diagram showing an example of the attack risk details screen. The attack risk details screen 90 includes an attack tree display portion 91 and a calculation details display portion 92. The attack tree display portion 91 displays the attack tree 70 annotated with the value obtained for each condition and the likelihood for each attack purpose type, as shown in FIG. 17. The calculation details display portion 92 displays the detailed attack risk calculation.

FIG. 23 is a diagram showing a display example of the calculation details display portion. The calculation details display portion 92 has, for example, columns for attack method / attack risk type, likelihood, impact, and attack risk. The attack method / attack risk type column shows an attack method and the type of attack risk arising from that method. The likelihood column shows the likelihood calculated for the combination of attack method and attack risk type. The impact column shows the impact corresponding to the attack risk type. The attack risk column shows the attack risk calculated for the combination of attack method and attack risk type. Note that the attack risk of "training data estimation by attack via proxy model" is a reference value and is not used as a value indicating the risk to the machine learning system 30.

In the example of FIG. 23, for the attack risk type "leakage of trained model", the attack via proxy model has the highest attack risk. For the attack risk type "training data estimation", the BB attack has the highest attack risk. For the attack risk types "adversarial example", "poisoning (model manipulation)", and "poisoning (output data manipulation)", the BB attack and the attack via proxy model share the highest attack risk.
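Selecting the highest-risk attack method per attack risk type, with ties kept and the reference-only value excluded, might look like the sketch below. The function name, the dictionary layout, and the impact values behind the risk numbers are assumptions for illustration; only the likelihoods (from the step S115 example) and the exclusion of "training data estimation by attack via proxy model" come from the description.

```python
def highest_risk_methods(risks, reference_only=frozenset()):
    """Return, per risk type, the top risk and every method that reaches it."""
    result = {}
    for risk_type, per_method in risks.items():
        # Drop (risk type, method) pairs that are shown only as reference values.
        candidates = {m: r for m, r in per_method.items()
                      if (risk_type, m) not in reference_only}
        top = max(candidates.values())
        result[risk_type] = (top, [m for m, r in candidates.items() if r == top])
    return result


# Risks = step S115 likelihoods times hypothetical impacts (8, 10, 4, 6, 4).
risks = {
    "model_leak": {"WB": 16, "BB": 64, "proxy": 80},
    "tdata_infer": {"WB": 20, "BB": 80, "proxy": 100},
    "advex": {"WB": 8, "BB": 24, "proxy": 24},
    "poisoning_model": {"WB": 12, "BB": 24, "proxy": 24},
    "poisoning_output": {"WB": 8, "BB": 16, "proxy": 16},
}
top = highest_risk_methods(risks, reference_only={("tdata_infer", "proxy")})
```

Since the impact is the same for every attack method within one risk type, the ranking of methods depends only on the likelihoods, so any positive impact values would select the same methods.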

By displaying the attack risk calculation details in this way, the analyst can easily grasp which attack methods, and which attack purposes, the machine learning system 30 is vulnerable to. The analyst can then take measures, such as improving how the machine learning system 30 is operated or improving the machine learning algorithm, to reduce the risk for the attack methods and attack risk types whose attack risk is high. As a result, the safety of the machine learning system 30 can be improved.

[Other Embodiments]
In the second embodiment, the structure of the attack tree 70 is built into the likelihood calculation program 131, as shown in FIG. 19. Alternatively, information indicating the structure of the attack tree 70 can be held in database form in the memory 102 or the storage device 103.

FIG. 24 is a diagram showing an example of attack tree structure definition information. The attack tree structure definition information 132 has columns for variable name, join, and a plurality of arguments. The variable name column holds the names of the variables that store the values of the conditions, intermediate conditions, and attack purpose types corresponding to the nodes of the attack tree 70. The join column holds the logical operator used when the value of the corresponding variable is obtained by a logical operation on several arguments. When the join column holds the logical operator "OR", the value of the corresponding variable is the maximum of the values indicated by the arguments. When the join column holds the logical operator "AND", the value of the corresponding variable is the minimum of the values indicated by the arguments. The argument columns hold the source ("obtained from input" or a variable name) of each value used to determine the value of the corresponding variable.

Note that the "*" appended to variable names in FIG. 24 stands collectively for "_wb", "_bb", and "_sm". For example, "L_model_leak*" represents "L_model_leak_wb", "L_model_leak_bb", and "L_model_leak_sm".

When a variable name set in an argument column has "*" appended, the values of the variables (arguments) obtained by replacing "*" with "_wb", "_bb", and "_sm" are each used as the argument of the variable whose name ends in the same three characters. For example, the argument of "L_model_leak_wb" is "L_access_wb".
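The "*" shorthand can be sketched as a simple expansion step. The function name `expand_row` and the row layout are hypothetical. Only the suffixes and the pairing of "L_model_leak*" with "L_access*" come from the description above. How a row whose own name has no "*" should treat wildcard arguments is not stated, so this sketch leaves such rows unchanged.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[str], List[str]]

SUFFIXES = ("_wb", "_bb", "_sm")


def expand_row(name: str, op: Optional[str], args: List[str]) -> List[Row]:
    """Expand a row whose variable name ends in '*' into one row per suffix."""
    if not name.endswith("*"):
        # Assumption: rows without a wildcard name are passed through as-is.
        return [(name, op, list(args))]
    expanded: List[Row] = []
    for suffix in SUFFIXES:
        # A wildcard argument takes the same suffix as the variable itself.
        new_args = [a[:-1] + suffix if a.endswith("*") else a for a in args]
        expanded.append((name[:-1] + suffix, op, new_args))
    return expanded


for row in expand_row("L_model_leak*", None, ["L_access*"]):
    print(row)
# ('L_model_leak_wb', None, ['L_access_wb'])
# ('L_model_leak_bb', None, ['L_access_bb'])
# ('L_model_leak_sm', None, ['L_access_sm'])
```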

The likelihood calculation unit 130 can calculate the likelihood for each attack purpose type by determining the values of the variables in order, starting from the variables at the top of the attack tree structure definition information 132.

Although embodiments have been illustrated above, the configuration of each unit shown in the embodiments can be replaced with another configuration having a similar function. Any other components or steps may also be added. Furthermore, any two or more configurations (features) of the above-described embodiments may be combined.

1 machine learning system
1a trained model
2 attacker
3 analyst
4 first information
5 second information
6 third information
7a, 7b, ... fourth information
10 information processing device
11 storage unit
12 processing unit

Claims

A generation method in which a computer:
receives first information including at least one of information indicating the availability of a trained model that determines output data in response to a query, information indicating the availability of training data used to train the trained model by machine learning, and information indicating a limit on the number of queries accepted by the trained model, and second information including at least one of information indicating the operability of queries accepted by the trained model and information indicating the operability of the training data;
generates, based on the received first information, third information indicating the possibility that an attacker accesses the trained model in each of one or more access forms; and
generates, based on the generated third information and the received second information, fourth information indicating the possibility that an attack on the trained model is carried out in each of the one or more access forms in order to achieve a predetermined attack purpose.

The generation method according to claim 1, wherein:
the one or more access forms include at least one of a first access form in which information in the trained model is accessed, a second access form in which queries are input to the trained model, and a third access form in which information in a surrogate model generated by inputting queries to the trained model is accessed;
the possibility of access in the first access form is calculated based on the information indicating the availability of the trained model;
the possibility of access in the second access form is calculated based on the information indicating the availability of the trained model and the information indicating the availability of the training data; and
the possibility of access in the third access form is calculated based on the information indicating the availability of the trained model, the information indicating the availability of the training data, and the information indicating the limit on the number of queries.

The generation method according to claim 1 or 2, wherein:
at least one of a numerical value indicating the degree of availability of the trained model, a numerical value indicating the degree of availability of the training data used to train the trained model by machine learning, and a numerical value indicating how high the limit on the number of queries accepted by the trained model is, is acquired as the first information;
a numerical value indicating the degree of operability of queries accepted by the trained model is acquired as the second information;
a numerical value indicating how likely the trained model is to be accessed in each of the one or more access forms is calculated as the third information; and
the smaller of the numerical value calculated as the third information for one access form and the numerical value indicated in the second information is used as the fourth information for that access form.

The generation method according to claim 1 or 2, wherein the fourth information is generated based on the generated third information and the received first information and second information.

The generation method according to claim 4, wherein:
at least one of a numerical value indicating the degree of availability of the trained model, a numerical value indicating the degree of availability of the training data used to train the trained model by machine learning, and a numerical value indicating how high the limit on the number of queries accepted by the trained model is, is acquired as the first information;
a numerical value indicating the degree of operability of the training data is acquired as the second information;
a numerical value indicating how likely the trained model is to be accessed in each of the one or more access forms is calculated as the third information; and
the minimum among the maximum of the one or more numerical values indicated in the first information, the numerical value indicated in the second information, and the numerical value calculated as the third information for one access form is used as the fourth information for that access form.

A generation program that causes a computer to execute a process comprising:
receiving first information including at least one of information indicating the availability of a trained model that determines output data in response to a query, information indicating the availability of training data used to train the trained model by machine learning, and information indicating a limit on the number of queries accepted by the trained model, and second information including at least one of information indicating the operability of queries accepted by the trained model and information indicating the operability of the training data;
generating, based on the received first information, third information indicating the possibility that an attacker accesses the trained model in each of one or more access forms; and
generating, based on the generated third information and the received second information, fourth information indicating the possibility that an attack on the trained model is carried out in each of the one or more access forms in order to achieve a predetermined attack purpose.

An information processing device having a processing unit that:
receives first information including at least one of information indicating the availability of a trained model that determines output data in response to a query, information indicating the availability of training data used to train the trained model by machine learning, and information indicating a limit on the number of queries accepted by the trained model, and second information including at least one of information indicating the operability of queries accepted by the trained model and information indicating the operability of the training data;
generates, based on the received first information, third information indicating the possibility that an attacker accesses the trained model in each of one or more access forms; and
generates, based on the generated third information and the received second information, fourth information indicating the possibility that an attack on the trained model is carried out in each of the one or more access forms in order to achieve a predetermined attack purpose.