WO2023132061A1 - Training method, information processing device, and training program - Google Patents

Training method, information processing device, and training program

Info

Publication number
WO2023132061A1
WO2023132061A1 (PCT/JP2022/000378)
Authority
WO
WIPO (PCT)
Prior art keywords
data
training
determination
model
contamination
Prior art date
Application number
PCT/JP2022/000378
Other languages
French (fr)
Japanese (ja)
Inventor
Toshiya Shimizu
Ikuya Morikawa
Original Assignee
Fujitsu Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Limited
Priority to PCT/JP2022/000378 priority Critical patent/WO2023132061A1/en
Publication of WO2023132061A1 publication Critical patent/WO2023132061A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • The present invention relates to a training method, an information processing device, and a training program.
  • A poisoning attack is an attack that intentionally alters a machine learning model by mixing "unusual data (in other words, contaminated data)" into the training data. Even a small amount of contaminated data can significantly reduce the accuracy of a machine learning model.
  • One aspect aims to detect contaminated data accurately while shortening the time required to detect it.
  • In one aspect, the training method causes a computer to execute processing of: outputting a determination result obtained by using a determination model to determine whether each of a plurality of pieces of data, including first data contained in training data and second data generated according to the first data and a predetermined parameter, is the first data or the second data; updating the predetermined parameter so that the determination result fails the determination of the second data; and training the determination model using new second data generated according to the updated predetermined parameter.
  • FIG. 1 is a diagram illustrating machine learning in a related example.
  • FIG. 2 is a block diagram schematically showing a hardware configuration example of an information processing apparatus as an embodiment.
  • FIG. 3 is a block diagram schematically showing a software configuration example of the information processing apparatus shown in FIG. 2.
  • FIG. 4 is a flow chart illustrating the contaminated data learning phase in a machine learning model as an embodiment.
  • FIG. 5 is a flow chart illustrating details of the contaminated data generation algorithm shown in FIG. 4.
  • FIG. 6 is a flow chart illustrating the contaminated data detection phase in a machine learning model as an embodiment.
  • In the training phase, training data in which input data x and correct output data y are associated is input.
  • Training is performed based on the training data and an empty model (see A12) to obtain a trained model (see A13).
  • The trained model is composed of the empty model and model parameters, as indicated by A14.
  • For example, the query data x may be an email text, and the output data y may be a determination result of whether the email is spam.
  • For example, the query data x may be an image, and the output data y may be the species of an animal.
  • Machine learning for determining poisoning attacks can be divided into methods that actually perform training (in other words, methods that measure the impact of contaminated data on a machine learning model) and methods that do not perform training during operation (in other words, methods that perform anomaly detection on the data).
  • The method that actually performs training has the advantage that the accuracy of the machine learning model is higher than with the method that does not train during operation.
  • The method that actually performs training has the disadvantage that contaminated data cannot be found quickly because training takes time, and that it is difficult to apply to online learning and the like, in which learning is performed sequentially.
  • The method that does not perform training during operation has the advantage of detecting contaminated data quickly with a short execution time, but has the disadvantage of relatively low detection accuracy.
  • When "normal" data is known to some extent, sanitization is performed to detect contaminated data using the distribution of the normal data. For example, data that is more than a certain distance away from the center point of the normal data is regarded as contaminated data and detected.
  • A method that does not perform training during operation requires some advance information about the "normal data", such as its distribution. In a situation where it is unknown whether the training data is contaminated, the normal data itself is unknown, so it cannot be used as a reference. Moreover, an adaptive attack (in other words, an attack that knows the defense criteria) may evade detection of contaminated data.
  • FIG. 2 is a block diagram schematically showing a hardware configuration example of the information processing apparatus 1 according to the embodiment.
  • The information processing apparatus 1 includes a CPU 11, a memory unit 12, a display control unit 13, a storage device 14, an input interface (IF) 15, an external recording medium processing unit 16, and a communication IF 17.
  • The memory unit 12 is an example of a storage unit and is exemplified by Read Only Memory (ROM) and Random Access Memory (RAM).
  • A program such as a Basic Input/Output System (BIOS) may be written in the ROM of the memory unit 12.
  • The software programs in the memory unit 12 may be read into the CPU 11 and executed as appropriate.
  • The RAM of the memory unit 12 may be used as a temporary recording memory or a working memory.
  • The display control unit 13 is connected to the display device 131 and controls the display device 131.
  • The display device 131 is a liquid crystal display, an organic light-emitting diode (OLED) display, a cathode ray tube (CRT), an electronic paper display, or the like, and displays various information for an operator or the like.
  • The display device 131 may be combined with an input device, for example, a touch panel.
  • The storage device 14 is a storage device with high IO performance; for example, a Dynamic Random Access Memory (DRAM), an SSD, a Storage Class Memory (SCM), or an HDD may be used.
  • The input IF 15 is connected to input devices such as a mouse 151 and a keyboard 152 and may control those input devices.
  • The mouse 151 and the keyboard 152 are examples of input devices, and the operator performs various input operations via them.
  • The external recording medium processing unit 16 is configured so that a recording medium 160 can be attached.
  • The external recording medium processing unit 16 is configured to be able to read information recorded on the recording medium 160 when the recording medium 160 is attached.
  • The recording medium 160 is portable.
  • The recording medium 160 is, for example, a flexible disk, an optical disk, a magnetic disk, a magneto-optical disk, or a semiconductor memory.
  • The communication IF 17 is an interface for enabling communication with external devices.
  • The CPU 11 is an example of a processor and is a processing device that performs various controls and calculations.
  • The CPU 11 implements various functions by executing an Operating System (OS) and programs read into the memory unit 12.
  • The CPU 11 may be a multiprocessor including a plurality of CPUs, a multicore processor having a plurality of CPU cores, or a configuration having a plurality of multicore processors.
  • The device for controlling the operation of the entire information processing apparatus 1 is not limited to the CPU 11 and may be, for example, any one of an MPU, DSP, ASIC, PLD, or FPGA. The device may also be a combination of two or more of a CPU, MPU, DSP, ASIC, PLD, and FPGA.
  • MPU is an abbreviation for Micro Processing Unit
  • DSP is an abbreviation for Digital Signal Processor
  • ASIC is an abbreviation for Application Specific Integrated Circuit
  • PLD is an abbreviation for Programmable Logic Device
  • FPGA is an abbreviation for Field Programmable Gate Array.
  • FIG. 3 is a block diagram schematically showing a software configuration example of the information processing apparatus 1 shown in FIG. 2.
  • The CPU 11 of the information processing apparatus 1 shown in FIG. 2 functions as a feature extraction unit 111, a discriminator block 110, and a contamination candidate data generator 114 (contamination candidate data generator h_w).
  • The discriminator block 110 functions as a discriminator 112 (discriminator D) and a loss calculation unit 113.
  • In the contaminated data learning phase indicated by symbol B1 in FIG. 3, a large amount of contamination candidate data is generated in advance, and features and a discrimination method are learned from that data, so that a discriminator 112 capable of detecting contaminated data is created before the contaminated data detection phase indicated by symbol B2.
  • A data set X for training, testing, and the like is accepted as input.
  • The data set may be represented as X = [(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)], where x_i and y_i are data (in other words, first data) and labels, respectively.
  • The contamination candidate data generator 114 creates a database h_w(X) of contamination candidate data (in other words, second data) based on the data set X and the parameter w.
  • The parameter w is a parameter of the contamination candidate data generation algorithm.
  • In the contamination candidate data generation algorithm, the label to be contaminated and the degree to which the machine learning model is contaminated are parameterized and used as the parameter w.
  • The contamination candidate data generation algorithm may use BGD (Back-gradient Descent; Luis Munoz-Gonzalez, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C. Lupu, Fabio Roli, "Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization", Aug. 29, 2017), or other algorithms may be used.
  • In BGD, an appropriate machine learning model M for generating contamination candidate data is prepared, and a label pair (y_i, y_j) is defined that specifies which label should be contaminated so as to be misrecognized as which other label.
  • A function that is controlled by a parameter ε, representing the tradeoff between the accuracy and efficiency of BGD, and that generates contamination candidate data from initial-value data x is written bgd(M, y_i, y_j, ε, x).
  • The contamination candidate data generator h_w and the contamination candidate data can be created by the following procedure: (1) train one machine learning model M using the data set X; (2) set the parameter w = (y_1, y_2, ε) and the contamination candidate database h_w(X) = bgd(M, w, X); (3) keep the machine learning model M fixed and repeatedly generate contamination candidate data while updating only the parameter w.
  • Training the machine learning model M is not necessarily required to create the contamination candidate data generator h_w.
  • The contamination candidate data generator h_w may be created without a trained machine learning model M, for example by a technique that generates contamination candidate data from only the architecture of the model assumed to be trained.
  • The feature extraction unit 111 extracts features from the data set X and the contamination candidate database h_w(X).
  • The discriminator 112 uses a machine learning model (in other words, a determination model) to discriminate between contamination candidate data and normal data.
  • The discriminator 112 is trained so that it can discriminate between contamination candidate data and normal data.
  • When discriminating between contamination candidate data and normal data, the data set X itself may be used, or a data set converted into features by a feature extraction means such as Principal Component Analysis (PCA) may be used.
  • The loss calculation unit 113 updates the parameter w so that the discriminator 112 fails to discriminate between the contamination candidate data and the normal data.
  • For example, the loss calculation unit 113 feeds back to the contamination candidate data generator 114, as the parameter w, an evaluation result using a loss function that expresses the inability to discriminate between the data set X and the contamination candidate database h_w(X).
  • In the contaminated data detection phase, the feature extraction unit 111 extracts features of the detection target data set X'.
  • The discriminator 112 detects contaminated data by computing features of the data using the discriminator 112 itself or intermediate layers of the discriminator 112.
  • The feature extraction unit 111 and the contamination candidate data generator 114 receive input of the data set X (step S1).
  • The contamination candidate data generator 114 executes the contamination data generation algorithm h_w (step S2); details of the algorithm are described later with reference to FIG. 5.
  • The discriminator 112 uses a machine learning model to discriminate between contaminated data and normal data (step S3).
  • The contaminated data and normal data used for discrimination by the discriminator 112 may be data whose features have been extracted by the feature extraction unit 111.
  • The discriminator 112 is trained so that discrimination between contaminated data and normal data succeeds (step S4).
  • The loss calculation unit 113 updates the parameter w so that the discrimination by the discriminator 112 fails, and returns the updated parameter w to the contamination candidate data generator 114 (step S5).
  • The processing in steps S2 to S5 may be repeated until a predetermined termination condition is satisfied (for example, a fixed number of loops completes or the parameter w stops being updated), after which the contaminated data learning phase in the machine learning model ends.
  • The details of the contamination data generation algorithm (step S2) shown in FIG. 4 are described using the flowchart (steps S21 to S23) shown in FIG. 5.
  • The contamination candidate data generator 114 trains one machine learning model M using the data set X (step S21).
  • The contamination candidate data generator 114 sets the parameter w = (y_1, y_2, ε) and the contamination candidate database h_w(x) = bgd(M, w, x) (step S22), then updates the parameter w while keeping the machine learning model M fixed (step S23), after which the process returns to step S21.
  • The feature extraction unit 111 receives input of the detection target data set X' (step S11).
  • The feature extraction unit 111 extracts features from the data set X' (step S12).
  • The discriminator 112 uses the machine learning model M to discriminate between contaminated data and normal data (step S13), after which the contaminated data detection phase in the machine learning model ends.
  • The discriminator 112 outputs a determination result obtained by using a determination model to determine whether each of a plurality of pieces of data, including first data contained in the training data and second data generated according to the first data and a predetermined parameter, is the first data or the second data. The loss calculation unit 113 updates the predetermined parameter so that the determination result fails the determination of the second data. The discriminator 112 trains the determination model using new second data generated according to the updated predetermined parameter.
  • The discriminator 112 extracts features from each of the first data and the second data and performs determination using the determination model on the extracted features. This makes it possible to determine the first data and the second data efficiently.
  • The contamination candidate data generator 114 generates, as the second data, contaminated data for a poisoning attack on the determination model. This makes it possible to detect poisoning attacks in the contaminated data detection phase.
  • The contamination candidate data generator 114 can be trained adversarially by receiving the results of the discriminator block 110 and updating its learning.
  • The discriminator block 110 can likewise be trained adversarially by receiving and learning from the input data set X and the contamination candidate database h_w(X).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

According to the present invention: a determination result is output in which, by using a determination model, each of a plurality of pieces of data, including first data contained in training data (X) and second data generated according to the first data and prescribed parameters (w), is determined to be the first data or the second data; the prescribed parameters (w) are updated so that the determination of the second data fails; and the determination model is trained using new second data generated according to the updated prescribed parameters (w).

Description

Training method, information processing device, and training program
The present invention relates to a training method, an information processing device, and a training program.
In recent years, the development and use of systems and services using machine learning have progressed rapidly. At the same time, various security problems specific to machine learning have been found. In particular, research is progressing on poisoning attacks, which mix data that contaminates a machine learning model into its training data.
A poisoning attack is an attack that intentionally alters a machine learning model by mixing "unusual data (in other words, contaminated data)" into the training data. Even a small amount of contaminated data can significantly reduce the accuracy of a machine learning model.
JP 2018-92613 A
However, some poisoning attacks are difficult to detect with ordinary anomaly detection techniques, and it may not be possible to determine accurately or quickly whether a given piece of training data is intended as a poisoning attack.
In one aspect, an object is to detect contaminated data accurately while shortening the time required to detect it.
In one aspect, a training method causes a computer to execute processing of: outputting a determination result obtained by using a determination model to determine whether each of a plurality of pieces of data, including first data contained in training data and second data generated according to the first data and a predetermined parameter, is the first data or the second data; updating the predetermined parameter so that the determination result fails the determination of the second data; and training the determination model using new second data generated according to the updated predetermined parameter.
In one aspect, contaminated data can be detected accurately, and the time required to detect it can be shortened.
FIG. 1 is a diagram illustrating machine learning in a related example.
FIG. 2 is a block diagram schematically showing a hardware configuration example of an information processing apparatus as an embodiment.
FIG. 3 is a block diagram schematically showing a software configuration example of the information processing apparatus shown in FIG. 2.
FIG. 4 is a flow chart illustrating the contaminated data learning phase in a machine learning model as an embodiment.
FIG. 5 is a flow chart illustrating details of the contaminated data generation algorithm shown in FIG. 4.
FIG. 6 is a flow chart illustrating the contaminated data detection phase in a machine learning model as an embodiment.
[A] Related Example
FIG. 1 is a diagram illustrating machine learning in a related example.
In the training phase indicated by symbol A1, training data (see symbol A11) in which input data x and correct output data y are associated is input. Training is performed based on the training data and an empty model (see symbol A12) to obtain a trained model (see symbol A13). The trained model is composed of the empty model and model parameters, as indicated by symbol A14.
In the inference phase indicated by symbol A2, inference (y = f(x)) based on the trained model (see symbol A22) is performed on query data x (see symbol A21), and output data y (see symbol A23) is output.
For example, the query data x may be an email text, and the output data y may be a determination result of whether the email is spam. Also, for example, the query data x may be an image, and the output data y may be the species of an animal.
All input and output data are expressed as numeric strings. A trained model can be thought of as a function of the form y = f(x), where x and y are vectors. Training is the task of determining a function f that fits a large number of (x, y) pairs.
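As an illustration of this related-example pipeline only, a minimal sketch might look like the following; the logistic-regression model, the feature dimensions, and the use of scikit-learn are assumptions made for the example, since the document does not tie the related example to any particular model family.

```python
# Minimal sketch of the related example: training determines f from (x, y)
# pairs, and inference computes y = f(x) for a query x.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data (A11): input vectors x paired with correct outputs y.
X_train = rng.normal(size=(200, 8))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

# Training: the "empty model" (A12) plus training data yields the trained
# model f (A13), i.e. the empty model plus fitted parameters (A14).
f = LogisticRegression().fit(X_train, y_train)

# Inference phase (A2): compute y = f(x) for a query x (A21 -> A23).
x_query = rng.normal(size=(1, 8))
print(f.predict(x_query))
```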
Machine learning for determining poisoning attacks can be divided into methods that actually perform training (in other words, methods that measure the impact of contaminated data on a machine learning model) and methods that do not perform training during operation (in other words, methods that perform anomaly detection on the data).
The method that actually performs training has the advantage that the accuracy of the machine learning model is higher than with the method that does not train during operation. On the other hand, it has the disadvantage that contaminated data cannot be found quickly because training takes time, and it is difficult to apply to online learning and the like, in which learning is performed sequentially.
The method that does not perform training during operation has the advantage of detecting contaminated data quickly with a short execution time, but has the disadvantage of relatively low detection accuracy.
In methods that do not perform training during operation, when "normal" data is known to some extent, sanitization is performed to detect contaminated data using the distribution of the normal data. For example, data that is more than a certain distance away from the center point of the normal data is regarded as contaminated data and detected.
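A minimal sketch of such distance-based sanitization follows; the Euclidean distance to the mean and the three-standard-deviation cutoff are illustrative assumptions, not values given in the document.

```python
# Sketch of sanitization: flag points farther than a threshold from the
# center of the (assumed-known) normal data.
import numpy as np

def sanitize(X_normal: np.ndarray, X_check: np.ndarray) -> np.ndarray:
    """Return True for rows of X_check lying far from the normal center."""
    center = X_normal.mean(axis=0)
    dists = np.linalg.norm(X_normal - center, axis=1)
    threshold = dists.mean() + 3.0 * dists.std()  # cutoff rule is an assumption
    return np.linalg.norm(X_check - center, axis=1) > threshold

rng = np.random.default_rng(1)
normal = rng.normal(size=(500, 4))
mixed = np.vstack([rng.normal(size=(5, 4)), rng.normal(loc=8.0, size=(5, 4))])
print(sanitize(normal, mixed))  # the far-away half is flagged as contaminated
```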
A method that does not perform training during operation requires some advance information about the "normal data", such as its distribution. In a situation where it is unknown whether the training data is contaminated, the normal data itself is unknown, so it cannot be used as a reference. Furthermore, an adaptive attack (in other words, an attack that knows the defense criteria) may evade detection of contaminated data.
[B] Embodiment
An embodiment is described below with reference to the drawings. However, the embodiment shown below is merely an example, and there is no intention to exclude the application of various modifications and techniques not explicitly described in it. That is, the present embodiment can be implemented with various modifications without departing from its spirit. Also, each drawing is not intended to include only the constituent elements shown in it and may include other functions and the like.
[B-1] Configuration Example
FIG. 2 is a block diagram schematically showing a hardware configuration example of the information processing apparatus 1 according to the embodiment.
As shown in FIG. 2, the information processing apparatus 1 includes a CPU 11, a memory unit 12, a display control unit 13, a storage device 14, an input interface (IF) 15, an external recording medium processing unit 16, and a communication IF 17.
The memory unit 12 is an example of a storage unit and is exemplified by Read Only Memory (ROM) and Random Access Memory (RAM). A program such as a Basic Input/Output System (BIOS) may be written in the ROM of the memory unit 12. The software programs in the memory unit 12 may be read into the CPU 11 and executed as appropriate. The RAM of the memory unit 12 may be used as a temporary recording memory or a working memory.
The display control unit 13 is connected to the display device 131 and controls the display device 131. The display device 131 is a liquid crystal display, an Organic Light-Emitting Diode (OLED) display, a Cathode Ray Tube (CRT), an electronic paper display, or the like, and displays various information for an operator or the like. The display device 131 may be combined with an input device, for example, a touch panel.
The storage device 14 is a storage device with high IO performance; for example, a Dynamic Random Access Memory (DRAM), an SSD, a Storage Class Memory (SCM), or an HDD may be used.
The input IF 15 is connected to input devices such as a mouse 151 and a keyboard 152 and may control those input devices. The mouse 151 and the keyboard 152 are examples of input devices, and the operator performs various input operations via them.
The external recording medium processing unit 16 is configured so that a recording medium 160 can be attached, and so that it can read information recorded on the recording medium 160 while the medium is attached. In this example, the recording medium 160 is portable; for example, it is a flexible disk, an optical disk, a magnetic disk, a magneto-optical disk, or a semiconductor memory.
The communication IF 17 is an interface for enabling communication with external devices.
The CPU 11 is an example of a processor and is a processing device that performs various controls and calculations. The CPU 11 implements various functions by executing an Operating System (OS) and programs read into the memory unit 12. The CPU 11 may be a multiprocessor including a plurality of CPUs, a multicore processor having a plurality of CPU cores, or a configuration having a plurality of multicore processors.
The device for controlling the operation of the entire information processing apparatus 1 is not limited to the CPU 11 and may be, for example, any one of an MPU, DSP, ASIC, PLD, or FPGA, or a combination of two or more of a CPU, MPU, DSP, ASIC, PLD, and FPGA. Here, MPU is an abbreviation for Micro Processing Unit, DSP for Digital Signal Processor, ASIC for Application Specific Integrated Circuit, PLD for Programmable Logic Device, and FPGA for Field Programmable Gate Array.
FIG. 3 is a block diagram schematically showing a software configuration example of the information processing apparatus 1 shown in FIG. 2.
The CPU 11 of the information processing apparatus 1 shown in FIG. 2 functions as a feature extraction unit 111, a discriminator block 110, and a contamination candidate data generator 114 (contamination candidate data generator h_w). The discriminator block 110 functions as a discriminator 112 (discriminator D) and a loss calculation unit 113.
In the contaminated data learning phase indicated by symbol B1 in FIG. 3, a large amount of contamination candidate data is generated in advance, and features and a discrimination method are learned from that data, so that a discriminator 112 capable of detecting contaminated data is created before the contaminated data detection phase indicated by symbol B2.
In the contaminated data learning phase indicated by symbol B1, a data set X for training, testing, and the like is accepted as input. The data set may be represented as X = [(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)], where x_i and y_i are data (in other words, first data) and labels, respectively.
The contamination candidate data generator 114 creates a database h_w(X) of contamination candidate data (in other words, second data) based on the data set X and the parameter w. The parameter w is a parameter of the contamination candidate data generation algorithm.
In the contamination candidate data generation algorithm, the label to be contaminated and the degree to which the machine learning model is contaminated are parameterized and used as the parameter w. The contamination candidate data generation algorithm may use BGD (Back-gradient Descent; Luis Munoz-Gonzalez, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C. Lupu, Fabio Roli, "Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization", Aug. 29, 2017), an algorithm that contaminates machine learning models, or other algorithms may be used.
In BGD, an appropriate machine learning model M for generating contamination candidate data is prepared, and a label pair (y_i, y_j) is defined that specifies which label should be contaminated so as to be misrecognized as which other label. A function that is controlled by a parameter ε, representing the tradeoff between the accuracy and efficiency of BGD, and that generates contamination candidate data from initial-value data x is written bgd(M, y_i, y_j, ε, x). The contamination candidate data generator h_w and the contamination candidate data can then be created by the following procedures (1) to (3), sketched in code after the list.
(1) Train one machine learning model M using the data set X.
(2) Set the parameter w = (y_1, y_2, ε) and the contamination candidate database h_w(X) = bgd(M, w, X).
(3) With the machine learning model M fixed, update only the parameter w and repeatedly generate contamination candidate data.
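The following sketch shows only the structure of procedures (1) to (3). The body of bgd() here is not the back-gradient method of the cited paper; it is a hypothetical placeholder (feature interpolation plus a label flip) used to show how h_w is driven by the parameter w = (y_i, y_j, ε).

```python
# Structural sketch of procedures (1)-(3); bgd() is a placeholder heuristic,
# not the cited back-gradient optimization.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bgd(M, y_i, y_j, eps, X, y):
    """Placeholder: nudge class-y_i points toward the class-y_j mean and
    relabel them y_j; eps is the strength knob."""
    src = X[y == y_i]
    target_mean = X[y == y_j].mean(axis=0)
    return (1.0 - eps) * src + eps * target_mean, np.full(len(src), y_j)

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 6))
y = (X[:, 0] > 0).astype(int)

# (1) Train one machine learning model M using the data set X.
M = LogisticRegression().fit(X, y)

# (2) Fix w = (y_1, y_2, eps) and build the candidate database h_w(X).
w = (0, 1, 0.5)
h_w_X, h_w_labels = bgd(M, w[0], w[1], w[2], X, y)

# (3) Keep M fixed; regenerate candidates whenever w is updated.
w = (0, 1, 0.7)
h_w_X, h_w_labels = bgd(M, w[0], w[1], w[2], X, y)
```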
Note that training the machine learning model M is not necessarily required to create the contamination candidate data generator h_w. For example, by using a technique that generates contamination candidate data from only the architecture of the model assumed to be trained (Pang Wei Koh, Percy Liang, "Understanding Black-box Predictions via Influence Functions", Dec. 29, 2020), the contamination candidate data generator h_w may be created without a trained machine learning model M.
The feature extraction unit 111 extracts features from the data set X and the contamination candidate database h_w(X).
The discriminator 112 uses a machine learning model (in other words, a determination model) to discriminate between contamination candidate data and normal data. The discriminator 112 is trained so that it can discriminate between the two.
When discriminating between contamination candidate data and normal data, the data set X itself may be used, or a data set converted into features by a feature extraction means such as Principal Component Analysis (PCA) may be used.
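As a sketch of this optional feature-extraction step, PCA from scikit-learn could be applied as follows; the choice of two components is an arbitrary assumption.

```python
# Fit PCA on both populations, then discriminate in the reduced feature space.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))               # data set X (normal data)
h_w_X = rng.normal(loc=0.5, size=(60, 10))   # contamination candidates h_w(X)

pca = PCA(n_components=2).fit(np.vstack([X, h_w_X]))
features_X = pca.transform(X)
features_candidates = pca.transform(h_w_X)
```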
The loss calculation unit 113 updates the parameter w so that the discriminator 112 fails to discriminate between the contamination candidate data and the normal data. For example, the loss calculation unit 113 feeds back to the contamination candidate data generator 114, as the parameter w, an evaluation result using a loss function that expresses the inability to discriminate between the data set X and the contamination candidate database h_w(X).
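The document does not pin down this loss function. One natural reading, in the spirit of the GAN-style training discussed in the citations listed later in this publication, is a generator-side objective such as the following (an assumption, not the claimed formula):

```latex
% One possible form of the "inability to discriminate" loss (an assumption):
% update w so that the discriminator's estimated probability D(.) of
% "contaminated" is low on generated candidates.
\[
  \mathcal{L}(w) \;=\; \frac{1}{n} \sum_{i=1}^{n} \log D\bigl(h_w(x_i)\bigr),
  \qquad
  w \;\leftarrow\; \operatorname*{arg\,min}_{w} \, \mathcal{L}(w)
\]
```

Minimizing L(w) drives D(h_w(x_i)) toward zero, that is, toward candidates the discriminator mistakes for normal data.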
In the contaminated data detection phase indicated by symbol B2 in FIG. 3, when a detection target data set X' is given, the feature extraction unit 111 extracts features of the data set X'. The discriminator 112 detects contaminated data by computing features of the data using the discriminator 112 itself or intermediate layers of the discriminator 112.
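A sketch of the two phases end to end is given below, with the detection phase scoring X' using the trained discriminator. The MLP architecture, the 0.5 threshold, and the manual read-out of the first hidden layer are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X_normal = rng.normal(size=(240, 10))
X_candidates = rng.normal(loc=0.8, size=(60, 10))

# Learning phase (B1): D learns to separate normal data from candidates.
D = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000).fit(
    np.vstack([X_normal, X_candidates]),
    np.r_[np.zeros(240), np.ones(60)],
)

# Detection phase (B2): score the detection target data set X'.
X_detect = rng.normal(size=(20, 10))
p_contaminated = D.predict_proba(X_detect)[:, 1]
flagged = p_contaminated > 0.5  # threshold is an illustrative assumption

# Intermediate-layer features (first hidden layer, relu), computed by hand.
hidden_features = np.maximum(X_detect @ D.coefs_[0] + D.intercepts_[0], 0.0)
```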
[B-2] Operation Example
The contaminated data learning phase in the machine learning model as an embodiment is described according to the flowchart (steps S1 to S5) shown in FIG. 4.
The feature extraction unit 111 and the contamination candidate data generator 114 receive input of the data set X (step S1).
The contamination candidate data generator 114 executes the contamination data generation algorithm h_w (step S2). Details of the contamination data generation algorithm h_w are described later with reference to FIG. 5.
The discriminator 112 uses a machine learning model to discriminate between contaminated data and normal data (step S3). The contaminated data and normal data used for discrimination by the discriminator 112 may be data whose features have been extracted by the feature extraction unit 111.
The discriminator 112 is trained so that discrimination between contaminated data and normal data succeeds (step S4).
The loss calculation unit 113 updates the parameter w so that the discrimination by the discriminator 112 fails, and returns the updated parameter w to the contamination candidate data generator 114 (step S5). The processing in steps S2 to S5 may be repeated until a predetermined termination condition is satisfied (for example, a fixed number of loops completes or the parameter w stops being updated). The contaminated data learning phase in the machine learning model then ends.
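The loop over steps S2 to S5 can be sketched as below. The candidate generator reuses the placeholder heuristic from the earlier sketch, and the random search over ε in step S5 merely stands in for the update rule of the loss calculation unit 113, which the document leaves open.

```python
# Skeleton of the contaminated data learning phase (steps S1-S5).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 6))                     # step S1: data set X
y = (X[:, 0] > 0).astype(int)

def h_w(w, X, y):
    """Candidate generator (same placeholder heuristic as the earlier sketch)."""
    y_i, y_j, eps = w
    return (1.0 - eps) * X[y == y_i] + eps * X[y == y_j].mean(axis=0)

w = (0, 1, 0.5)
for _ in range(10):                               # termination: fixed loop count
    candidates = h_w(w, X, y)                     # step S2
    D = LogisticRegression().fit(                 # steps S3-S4: train D to succeed
        np.vstack([X, candidates]),
        np.r_[np.zeros(len(X)), np.ones(len(candidates))],
    )
    # Step S5: choose the eps whose candidates D most often mistakes for
    # normal data (a random-search update; an assumption, not a claimed rule).
    trials = [(D.predict(h_w((0, 1, e), X, y)).mean(), e)
              for e in rng.uniform(0.1, 0.9, size=5)]
    w = (0, 1, min(trials)[1])
```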
Next, the details of the contamination data generation algorithm (step S2) shown in FIG. 4 are described using the flowchart (steps S21 to S23) shown in FIG. 5.
The contamination candidate data generator 114 trains one machine learning model M using the data set X (step S21).
The contamination candidate data generator 114 sets the parameter w = (y_1, y_2, ε) and the contamination candidate database h_w(x) = bgd(M, w, x) (step S22).
The contamination candidate data generator 114 updates the parameter w while keeping the machine learning model M fixed (step S23). The process then returns to step S21.
Next, the contaminated data detection phase in the machine learning model as an embodiment is described according to the flowchart (steps S11 to S13) shown in FIG. 6.
The feature extraction unit 111 receives input of the detection target data set X' (step S11).
The feature extraction unit 111 extracts features from the data set X' (step S12).
The discriminator 112 uses the machine learning model M to discriminate between contaminated data and normal data (step S13). The contaminated data detection phase in the machine learning model then ends.
[C] Effects
According to the training method, the information processing apparatus 1, and the training program of the embodiment described above, for example, the following effects can be obtained.
The discriminator 112 outputs a determination result obtained by using a determination model to determine whether each of a plurality of pieces of data, including first data contained in the training data and second data generated according to the first data and a predetermined parameter, is the first data or the second data. The loss calculation unit 113 updates the predetermined parameter so that the determination result fails the determination of the second data. The discriminator 112 trains the determination model using new second data generated according to the updated predetermined parameter.
This makes it possible to detect contaminated data accurately and to shorten the time required to detect it. Contaminated data can also be detected in the contaminated data detection phase without any training. Furthermore, by computing the features of various types of contaminated data in advance, a wider range of contaminated data can be detected than with defenses based on normal data.
The discriminator 112 extracts features from each of the first data and the second data and performs determination using the determination model on the extracted features. This makes it possible to determine the first data and the second data efficiently.
The contamination candidate data generator 114 generates, as the second data, contaminated data for a poisoning attack on the determination model. This makes it possible to detect poisoning attacks in the contaminated data detection phase. In addition, the contamination candidate data generator 114 can be trained adversarially by receiving the results of the discriminator block 110 and updating its learning. Furthermore, the discriminator block 110 can itself be trained adversarially by receiving and learning from the input data set X and the contamination candidate database h_w(X).
[D] Others
The disclosed technology is not limited to the embodiment described above and can be implemented with various modifications without departing from the spirit of the embodiment. Each configuration and each process of the embodiment can be selected or discarded as necessary, or combined as appropriate.
1: information processing apparatus
11: CPU
110: discriminator block
111: feature extraction unit
112: discriminator
113: loss calculation unit
114: contamination candidate data generator
12: memory unit
13: display control unit
131: display device
14: storage device
15: input IF
151: mouse
152: keyboard
16: external recording medium processing unit
160: recording medium
17: communication IF

Claims (9)

  1.  A training method in which a computer executes processing of:
     outputting a determination result obtained by using a determination model to determine whether each of a plurality of pieces of data, including first data contained in training data and second data generated according to the first data and a predetermined parameter, is the first data or the second data;
     updating the predetermined parameter so that the determination result fails the determination of the second data; and
     training the determination model using new second data generated according to the updated predetermined parameter.
  2.  The training method according to claim 1, wherein the computer further executes processing of:
     extracting features from each of the first data and the second data; and
     performing determination using the determination model on the extracted features.
  3.  The training method according to claim 1 or 2, wherein the computer further executes processing of:
     generating, as the second data, contaminated data for a poisoning attack on the determination model.
  4.  An information processing device comprising a processor configured to:
     output a determination result obtained by using a determination model to determine whether each of a plurality of pieces of data, including first data contained in training data and second data generated according to the first data and a predetermined parameter, is the first data or the second data;
     update the predetermined parameter so that the determination result fails the determination of the second data; and
     train the determination model using new second data generated according to the updated predetermined parameter.
  5.  The information processing device according to claim 4, wherein the processor:
     extracts features from each of the first data and the second data; and
     performs determination using the determination model on the extracted features.
  6.  The information processing device according to claim 4 or 5, wherein the processor:
     generates, as the second data, contaminated data for a poisoning attack on the determination model.
  7.  A training program that causes a computer to execute processing of:
     outputting a determination result obtained by using a determination model to determine whether each of a plurality of pieces of data, including first data contained in training data and second data generated according to the first data and a predetermined parameter, is the first data or the second data;
     updating the predetermined parameter so that the determination result fails the determination of the second data; and
     training the determination model using new second data generated according to the updated predetermined parameter.
  8.  The training program according to claim 7, causing the computer to further execute processing of:
     extracting features from each of the first data and the second data; and
     performing determination using the determination model on the extracted features.
  9.  The training program according to claim 7 or 8, causing the computer to further execute processing of:
     generating, as the second data, contaminated data for a poisoning attack on the determination model.
PCT/JP2022/000378 2022-01-07 2022-01-07 Training method, information processing device, and training program WO2023132061A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/000378 WO2023132061A1 (en) 2022-01-07 2022-01-07 Training method, information processing device, and training program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/000378 WO2023132061A1 (en) 2022-01-07 2022-01-07 Training method, information processing device, and training program

Publications (1)

Publication Number Publication Date
WO2023132061A1

Family

ID=87073565

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/000378 WO2023132061A1 (en) 2022-01-07 2022-01-07 Training method, information processing device, and training program

Country Status (1)

Country Link
WO (1) WO2023132061A1 (en)

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AKIRAXAKIRAX (AKIRA TAKAHASHI): "There are an infinite number of GAN learning methods, so I decided to compare them", QIITA, 30 August 2019 (2019-08-30), pages 1-15, XP009547896. Retrieved from the Internet: <URL:https://qiita.com/akiraxakirax/items/a5c8ff3120e343d5aeee> *
LUIS MUNOZ-GONZALEZ; BJARNE PFITZNER; MATTEO RUSSO; JAVIER CARNERERO-CANO; EMIL C. LUPU: "Poisoning Attacks with Generative Adversarial Nets", ARXIV.ORG, 18 June 2019 (2019-06-18), XP081494189 *

Similar Documents

Publication Publication Date Title
JP7398068B2 (en) software testing
Zheng et al. Fault localization analysis based on deep neural network
Zhang et al. Towards characterizing adversarial defects of deep learning software from the lens of uncertainty
Hu et al. Ganfuzz: a gan-based industrial network protocol fuzzing framework
JP6860070B2 (en) Analytical equipment, log analysis method and analysis program
US10715570B1 (en) Generic event stream processing for machine learning
CA3132346A1 (en) User abnormal behavior recognition method and device and computer readable storage medium
Fan et al. Machine learning for black-box fuzzing of network protocols
WO2020164274A1 (en) Network verification data sending method and apparatus, and storage medium and server
WO2022180702A1 (en) Analysis function addition device, analysis function addition program, and analysis function addition method
CN104715190B (en) A kind of monitoring method and system of the program execution path based on deep learning
Rau et al. Transferring tests across web applications
CN114036531A (en) Multi-scale code measurement-based software security vulnerability detection method
Cavalcanti et al. Performance evaluation of container-level anomaly-based intrusion detection systems for multi-tenant applications using machine learning algorithms
WO2023177442A1 (en) Data traffic characterization prioritization
WO2020255414A1 (en) Learning assistance device, learning assistance method, and computer-readable recording medium
CN113222053B (en) Malicious software family classification method, system and medium based on RGB image and Stacking multi-model fusion
Zhao et al. Suzzer: A vulnerability-guided fuzzer based on deep learning
CN111738290B (en) Image detection method, model construction and training method, device, equipment and medium
WO2023132061A1 (en) Training method, information processing device, and training program
Jap et al. Practical side-channel based model extraction attack on tree-based machine learning algorithm
WO2023066237A1 (en) Artificial intelligence model learning introspection
Korstanje Machine Learning for Streaming Data with Python: Rapidly build practical online machine learning solutions using River and other top key frameworks
US10852354B1 (en) System and method for accelerating real X detection in gate-level logic simulation
Zhang et al. Vulnerability detection for smart contract via backward bayesian active learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22918650

Country of ref document: EP

Kind code of ref document: A1