WO2024013911A1 - Learning device, learning method, learning program, inference device, inference method, and inference program
- Publication number
- WO2024013911A1 (PCT/JP2022/027626)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- score
- model
- label
- loss
- correct
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the present invention relates to a learning device, a learning method, a learning program, an inference device, an inference method, and an inference program.
- Deep learning and deep neural networks have achieved great success in areas such as image recognition and speech recognition. For example, in image recognition using deep learning, when an image is input to a model that includes many deep learning nonlinear functions, it outputs a classification result of what the image depicts.
- convolutional neural networks and ReLU (Rectified Linear Unit)
- the conventional technology has a problem in that the model may not be sufficiently robust against adversarial attacks.
- the learning device inputs adversarially attacked data in which a pseudo adversarial attack was performed on the input data to the model, and calculates a score for each of multiple labels.
- an updating unit that updates parameters of the model so that, for a certain percentage of the input data having a large difference between the maximum incorrect answer score (the largest score calculated by the calculation unit for the incorrect answer labels, that is, the labels other than the correct answer label associated with the input data in advance among the plurality of labels) and the correct answer score calculated by the calculation unit for the correct answer label, the one-to-other loss, which is the loss between the correct answer score and the incorrect answer scores, is reduced.
- FIG. 1 is a diagram illustrating the configuration of a model.
- FIG. 2 is a diagram illustrating the configuration of the final layer of the model.
- FIG. 3 is a diagram showing a configuration example of the learning device according to the first embodiment.
- FIG. 4 is a flowchart showing the flow of learning processing according to the first embodiment.
- FIG. 5 is a flowchart showing the flow of processing for creating a loss function.
- FIG. 6 is a diagram showing an example of a computer that executes a learning program.
- the learning device of the embodiment performs learning (training) of a deep learning model (hereinafter simply referred to as a model).
- a deep learning model may also be referred to as a deep neural network.
- the model includes an input layer into which signals enter, one or more intermediate layers that transform the signals from the input layer in various ways, and a final layer that transforms the signals of the intermediate layers into outputs such as probabilities.
- the number of intermediate layers is L (L ≥ 1).
- the output of the last intermediate layer (the L-th intermediate layer in the example of FIG. 1) is input to the softmax function to obtain the output.
- the output of the softmax function corresponds to the output of the entire model.
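The final-layer computation described above can be sketched as follows. This is a minimal illustration, assuming NumPy and a hypothetical three-label output of the last intermediate layer; the values and the function name `softmax` are not taken from the source.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Map the final-layer inputs (logits) to one probability per label."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical output of the L-th intermediate layer for three labels.
logits = np.array([2.0, 1.0, -1.0])
scores = softmax(logits)        # output of the entire model
predicted_label = int(scores.argmax())
```

The scores are positive and sum to one, so they can be read as per-label probabilities.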
- FIG. 3 is a diagram illustrating a configuration example of a learning device according to the first embodiment.
- the learning device 10 includes a communication unit 11, a storage unit 12, and a control unit 13.
- the communication unit 11 performs data communication with other devices.
- the communication unit 11 is a NIC (Network Interface Card).
- the communication unit 11 may be an interface for inputting and outputting data between an input device (for example, a mouse and a keyboard) and an output device (for example, a display).
- the storage unit 12 is a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), or an optical disc. Note that the storage unit 12 may be a data-rewritable semiconductor memory such as a RAM (Random Access Memory), a flash memory, or an NVSRAM (Non Volatile Static Random Access Memory).
- the storage unit 12 stores an OS (Operating System) and various programs executed by the learning device 10.
- the storage unit 12 stores model information 121.
- the model information 121 consists of parameters for constructing a model.
- the model information 121 is, for example, the weights and biases of a deep neural network.
- the model information 121 is updated through learning.
- the control unit 13 controls the entire learning device 10.
- the control unit 13 may be realized by an electronic circuit or an integrated circuit.
- the control unit 13 has an internal memory for storing programs and control data that define various processing procedures, and executes each process using the internal memory. Further, the control unit 13 functions as various processing units by running various programs.
- the control unit 13 includes a calculation unit 131, a creation unit 132, and an update unit 133.
- the calculation unit 131 applies (inputs) input data to the model from among input data and labels selected from a data set (teacher data set) prepared in advance. Thereby, the calculation unit 131 calculates output data corresponding to the input data. Further, the model is constructed based on model information 121.
- the model may be referred to as a discriminator or a classifier as appropriate.
- the input data is the feature extracted from the image.
- the label is information that specifies the object appearing in the image.
- the score output by the model is the probability that an object corresponding to each label is included in the image.
- the creation unit 132 creates a loss function based on the output data of the model. For example, the creation unit 132 creates a loss function whose value decreases as the output data of the model more closely matches the label corresponding to the input data (that is, as the similarity increases).
- the updating unit 133 updates the model parameters, that is, the model information 121, so that the loss function is optimized (for example, minimized).
- the learning device 10 ends the learning process when the evaluation criteria are satisfied.
- the evaluation criteria are, for example, that the identification accuracy of the model on a separately prepared evaluation data set has exceeded a threshold, that the amount of parameter updates has converged, or that parameter updates have been repeated a certain number of times.
- the learning device 10 performs adversarial learning.
- in adversarial learning, a loss function different from that of general learning is created.
- adversarial learning in this embodiment is explained below.
- uppercase bold letters represent matrices
- lowercase bold letters represent column vectors.
- row vectors are expressed using transposition.
- the model is used for image recognition
- the present embodiment is not limited to image recognition, but is generally applicable to identification using deep learning.
- d is the dimension of the input data x. The input data is expressed as a vector for simplicity, but the structure of the data does not matter.
- the input data x is data representing characteristics of an image.
- the deep learning model repeats nonlinear functions and linear operations and outputs output data through the softmax function in the final layer.
- $z_\theta(x) = [z_{1,\theta}(x), z_{2,\theta}(x), \ldots, z_{M,\theta}(x)]^T$.
- $\theta$ is a parameter of the deep learning model, that is, the model information 121.
- $z_\theta(x)$ is called the logit.
- Equation (1) represents the score for each label in class classification, and the element with the largest output obtained from equation (2) is the recognition result of deep learning.
- Image recognition is one type of class classification, and the model $f_s(z_\theta(\cdot))$ that performs classification is called a classifier.
- $\theta$ is obtained by optimization of equation (5) using adversarially attacked data $x'$.
- the logit margin loss $\ell_{LM}(\cdot)$ in equation (6) is considered to represent the difficulty of classifying the data $(x, y)$.
- $z_{k,\theta}(x)$ is the $k$-th element of the logit $z_\theta(x)$.
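As an illustration of the logit margin loss, the sketch below computes the largest incorrect-label logit minus the correct-label logit, which is how the text later characterizes equation (6). The function name and the example logits are hypothetical, and NumPy is assumed.

```python
import numpy as np

def logit_margin_loss(logits: np.ndarray, label: int) -> float:
    """Largest logit among incorrect labels minus the logit of the correct label."""
    incorrect = np.delete(logits, label)  # logits of all labels except the correct one
    return float(incorrect.max() - logits[label])

# Hypothetical logits for adversarially attacked inputs whose correct label is 0.
z_easy = np.array([4.0, 1.0, 0.5])  # the correct label wins clearly
z_hard = np.array([1.0, 1.5, 0.5])  # label 1 outscores the correct label

easy_margin = logit_margin_loss(z_easy, 0)
hard_margin = logit_margin_loss(z_hard, 0)
```

A larger value means the sample is harder to classify; a positive value means the sample is misclassified.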
- the creation unit 132 creates a loss function using the one-versus-other loss (one-vs-the-rest loss, hereinafter also written one-to-other loss) $\ell_{OVR}$ shown in equation (7).
- $\phi$ is a convex, differentiable function with non-negative values that satisfies $\phi(z) < \phi(-z)$ for $z > 0$.
- $\phi(z) = \log(1 + e^{-z})$ (second term on the right side of equation (8)).
- $\ell_{OVR}$ in equation (7) is expressed as in equation (8).
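The text does not reproduce equations (7) and (8). As a hedged sketch, the code below assumes the commonly used one-vs-the-rest form, $\phi(z_y) + \sum_{k \neq y} \phi(-z_k)$ with $\phi(z) = \log(1 + e^{-z})$; the function names are hypothetical and NumPy is assumed.

```python
import numpy as np

def phi(z):
    """phi(z) = log(1 + exp(-z)), evaluated stably via logaddexp."""
    return np.logaddexp(0.0, -z)

def ovr_loss(logits: np.ndarray, label: int) -> float:
    """Assumed one-vs-the-rest form: phi(z_y) + sum over k != y of phi(-z_k)."""
    mask = np.ones(logits.shape[0], dtype=bool)
    mask[label] = False
    return float(phi(logits[label]) + phi(-logits[mask]).sum())

logits = np.array([2.0, -1.0, 0.5])
loss = ovr_loss(logits, label=0)

# Under the same assumption the loss simplifies to -z_y + sum_k log(1 + exp(z_k)).
closed_form = float(-logits[0] + np.logaddexp(0.0, logits).sum())
```

Under this assumption each label is treated as its own binary problem, so raising an incorrect logit is penalized even when the correct logit is already large.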
- the creation unit 132 applies the one-versus-other loss to the top H% of data samples, that is, those with the largest values of the logit margin loss $\ell_{LM}(x')$ under an adversarial attack, among the data (pairs of input data and label) included in the data set. Here, $0 \le H \le 100$.
- the data sample is a pair of adversarially attacked data $x'$, obtained by attacking input data included in the data set, and the label $y$ associated with that input data. It is assumed that there are multiple data samples. From these, the creation unit 132 creates the loss function of equation (9). Note that the first term in brackets [ ] on the right side of equation (9) is the cross entropy, and the second term is the one-to-other loss for the adversarial attack.
- $S$ is the set of data samples in the lower (100-H)%, where $\ell_{LM}(x')$ has a small value.
- $L$ is the set of the top H% of data samples, where $\ell_{LM}(x')$ has a large value.
- $\lambda$ is an adjustment parameter that determines the weight of the loss.
- for example, when there are 100 data samples and H is 20, the creation unit 132 sorts the 100 data samples in descending order of $\ell_{LM}(x')$. The creation unit 132 then takes the 1st to 20th of the sorted data samples and adds them to the set $L$. The creation unit 132 also takes the 21st to 100th of the sorted data samples and adds them to the set $S$.
- the loss of the set S is cross entropy, but the loss is not limited to this, and other losses may be used.
- optimization is performed by applying the one-to-other loss to the top H% of data samples.
- the updating unit 133 updates the parameters of the model so that equation (9) is optimized (minimized).
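The selection and combination described above can be sketched as follows. Equation (9) itself is not reproduced in the text, so this sketch makes several assumptions: the loss is averaged over all samples, the weight (called `lam` here) multiplies the one-to-other term, and the one-to-other loss has the form $\phi(z_y) + \sum_{k \neq y} \phi(-z_k)$ with $\phi(z) = \log(1 + e^{-z})$. The logits are random stand-ins for model outputs on adversarially attacked data.

```python
import numpy as np

def log_softmax(z):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def selective_loss(adv_logits, labels, H, lam):
    """Cross-entropy on the low-margin set S, weighted one-to-other loss on the top-H% set L."""
    n = adv_logits.shape[0]
    rows = np.arange(n)
    correct = adv_logits[rows, labels]
    masked = adv_logits.copy()
    masked[rows, labels] = -np.inf             # exclude the correct label from the max
    margin = masked.max(axis=1) - correct      # logit margin loss per sample
    n_large = int(round(n * H / 100.0))
    order = np.argsort(-margin, kind="stable")  # descending margin
    in_L = np.zeros(n, dtype=bool)
    in_L[order[:n_large]] = True
    ce = -log_softmax(adv_logits)[rows, labels]
    # phi(z_y) + sum_{k != y} phi(-z_k), with phi(z) = log(1 + exp(-z))
    ovr = (np.logaddexp(0.0, -correct)
           + np.logaddexp(0.0, adv_logits).sum(axis=1)
           - np.logaddexp(0.0, correct))
    per_sample = np.where(in_L, lam * ovr, ce)
    return float(per_sample.mean()), in_L, margin

rng = np.random.default_rng(0)
adv_logits = rng.normal(size=(100, 5))   # stand-in for model outputs on attacked data
labels = rng.integers(0, 5, size=100)
loss, in_L, margin = selective_loss(adv_logits, labels, H=20, lam=1.0)
```

With 100 samples and H = 20, the 20 samples with the largest margins receive the one-to-other loss and the remaining 80 receive cross-entropy. Setting H = 0 reduces the sketch to plain cross-entropy on the attacked data.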
- $\ell_{CE}$ is the cross-entropy loss, and KL is the KL divergence.
- the creation unit 132 creates the loss function of equation (11).
- The third term in [ ] on the right side of equation (11) differs from the KL divergence in equation (10) in that p_θ(k | x_i) p_θ(k | …
- FIG. 4 is a flowchart showing the flow of learning processing according to the first embodiment.
- the learning device 10 applies input data randomly selected from the data set to the discriminator (step S11).
- the classifier is an example of a model constructed from model information 121.
- the input data applied to the discriminator may be adversarial attacked data.
- the learning device 10 creates a loss function including a one-versus-other loss for the adversarial attack based on the output of the classifier and the labels included in the data set (step S12). Details of the process for creating the loss function will be described later.
- the learning device 10 updates the parameters of the classifier using the gradient of the loss function (step S13).
- the learning device 10 can update parameters using a known method such as error backpropagation.
- if the evaluation criteria are satisfied (step S14, Yes), the learning device 10 ends the process. On the other hand, if the evaluation criteria are not satisfied (step S14, No), the learning device 10 returns to step S11 and repeats the process.
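The control flow of steps S11 to S14 can be sketched with a toy model. Everything specific here is an assumption made for illustration: the two-blob data set, the linear softmax classifier, the learning rate, the accuracy threshold, and the update cap. Step S12 is replaced by a plain cross-entropy gradient on clean data, not the loss with the one-versus-other term.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n):
    """Toy two-label data: one Gaussian blob per label (hypothetical data set)."""
    X = np.vstack([rng.normal(-2.0, 1.0, size=(n, 2)),
                   rng.normal(2.0, 1.0, size=(n, 2))])
    y = np.repeat([0, 1], n)
    return X, y

X_train, y_train = make_blobs(100)
X_eval, y_eval = make_blobs(50)     # separately prepared evaluation data set

W = np.zeros((2, 2))                # model information: weights
b = np.zeros(2)                     # model information: biases

def scores(X):
    """Softmax scores of the linear classifier for each row of X."""
    z = X @ W.T + b
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

max_updates, target_acc, lr = 500, 0.95, 0.1
for update in range(1, max_updates + 1):
    idx = rng.choice(len(X_train), size=32, replace=False)  # S11: random selection
    p = scores(X_train[idx])
    # S12 stand-in: gradient of the mean cross-entropy with respect to the logits.
    grad_z = (p - np.eye(2)[y_train[idx]]) / len(idx)
    W -= lr * grad_z.T @ X_train[idx]                       # S13: parameter update
    b -= lr * grad_z.sum(axis=0)
    accuracy = float((scores(X_eval).argmax(axis=1) == y_eval).mean())
    if accuracy >= target_acc:                              # S14: evaluation criteria
        break
```

The loop stops either when the evaluation accuracy exceeds the threshold or when the update count is exhausted, which mirrors two of the evaluation criteria named in the text.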
- FIG. 5 is a flowchart showing the flow of processing for creating a loss function.
- the learning device 10 randomly selects some data samples from among the data samples applied to the classifier (step S121).
- the learning device 10 creates an adversarial attack against the selected data samples (step S122).
- the learning device 10 calculates the one-versus-other loss for the generated attack (step S123).
- the learning device 10 creates a set L of the top H% of data samples in descending order of one-to-other loss, and a set S of the remaining data samples (step S124).
- the learning device 10 calculates the one-to-other loss for the attack on the set L, applies cross-entropy to the other set S, and creates the sum of the one-to-other loss and the cross-entropy as the loss function (step S125).
- the loss function created here is as shown in equation (9).
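The text does not say which attack is used in step S122. As one possible stand-in, the sketch below applies a single sign-gradient step (the fast gradient sign method) to a linear softmax classifier; the weights, the input, and the budget `eps` are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(x, label, W, b):
    """Cross-entropy of a linear softmax classifier at input x."""
    return float(-np.log(softmax(W @ x + b)[label]))

def fgsm_linear(x, label, W, b, eps):
    """One sign-gradient step that increases the cross-entropy (assumed attack)."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    grad_x = W.T @ (p - onehot)     # gradient of the cross-entropy with respect to x
    return x + eps * np.sign(grad_x)

# Hypothetical classifier parameters and input with correct label 0.
W = np.array([[1.0, -1.0], [-1.0, 1.0]])
b = np.zeros(2)
x = np.array([0.5, 0.0])
eps = 0.3
x_adv = fgsm_linear(x, 0, W, b, eps)
```

Each component of the input moves by at most `eps`, and for this linear model the cross-entropy of the attacked input is never lower than that of the original input.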
- the calculation unit 131 inputs into the model adversarially attacked data obtained by performing a pseudo adversarial attack on the input data, and calculates a score for each of the plurality of labels.
- the score here includes not only the output from softmax but also logit, which has the same magnitude relationship as the output from softmax.
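A short check of the statement that logits and softmax outputs share the same magnitude relationship, assuming NumPy and hypothetical logit values:

```python
import numpy as np

# Hypothetical logits for two inputs and three labels.
logits = np.array([[0.2, 3.1, -1.0],
                   [5.0, 4.9, 4.8]])

shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Softmax is strictly increasing in each logit, so the label ranking is unchanged.
same_ranking = np.array_equal(np.argsort(logits, axis=1), np.argsort(probs, axis=1))
```

Because the ranking is preserved, either quantity can serve as the score when comparing the correct label against the incorrect labels.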
- the updating unit 133 updates the model parameters so that, for a certain percentage of the input data having a large difference between the maximum incorrect answer score (the largest score calculated by the calculation unit 131 for the incorrect answer labels, that is, the labels other than the correct answer label associated with the input data in advance among the plurality of labels) and the correct answer score calculated by the calculation unit 131 for the correct answer label, the one-to-other loss, which is the loss between the correct answer score and the incorrect answer scores, is reduced.
- the difference between the maximum value of the incorrect score calculated by the calculation unit 131 for the incorrect label and the correct score calculated by the calculation unit 131 for the correct label is, for example, as shown in equation (6). Further, the certain percentage is, for example, H%. Furthermore, the one-to-other loss for adversarially attacked data is as shown in equations (7) and (8), for example.
- the updating unit 133 updates the parameters of the model so that a loss function combining the one-to-other loss when adversarially attacked data is input into the model and the KL divergence when adversarially attacked data is input into the model becomes small. The loss function in this case is, for example, as shown in equation (10).
- the updating unit 133 updates the parameters of the model so that a loss function combining the one-to-other loss when the input data is input to the model and the KL divergence when the input data is input to the model becomes small.
- the loss function in this case is, for example, as shown in equation (11).
- this embodiment can also be applied to TRADES.
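For orientation, TRADES is generally described as a cross-entropy term on clean inputs plus a weighted KL divergence between the predictions for clean and adversarially attacked inputs. The sketch below shows only that general form for a single sample; it does not reproduce equations (10) or (11), and the weight name `beta` and the logit values are hypothetical.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def kl_divergence(logits_p, logits_q):
    """KL(p || q) between the softmax distributions of two logit vectors."""
    lp, lq = log_softmax(logits_p), log_softmax(logits_q)
    return float(np.sum(np.exp(lp) * (lp - lq)))

def trades_style_loss(clean_logits, adv_logits, label, beta):
    """Clean cross-entropy plus beta times KL(clean prediction || adversarial prediction)."""
    ce = -log_softmax(clean_logits)[label]
    return float(ce + beta * kl_divergence(clean_logits, adv_logits))

clean = np.array([2.0, 0.5, -1.0])   # hypothetical logits for clean input x
adv = np.array([0.8, 1.1, -0.5])     # hypothetical logits for attacked input x'
loss = trades_style_loss(clean, adv, label=0, beta=6.0)
```

The KL term vanishes when the attack does not change the prediction, so the added penalty grows only as the attacked output drifts away from the clean output.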
- the inference device performs the same processing as the calculation unit 131 using the model information 121 updated by the learning device 10.
- the inference device, the inference method, and the inference program can be realized using the calculation unit 131 of the learning device 10. Further, the inference device may be realized by a device different from the learning device 10 that has the same function as the calculation unit 131.
- the learning device 10, which functions as an inference device, inputs adversarially attacked data obtained by performing a pseudo adversarial attack on input data to a model, and calculates a score for each of a plurality of labels.
- each component of each device shown in the drawings is functionally conceptual, and does not necessarily need to be physically configured as shown in the drawings.
- the specific form of distributing and integrating each device is not limited to what is shown in the drawings, and all or part of the devices may be functionally or physically distributed or integrated in arbitrary units depending on various loads and usage conditions.
- each processing function performed by each device can be realized, in whole or in part, by a CPU (Central Processing Unit) and a program that is analyzed and executed by the CPU, or as hardware using wired logic. Note that the program may be executed not only by the CPU but also by another processor such as a GPU.
- the learning device 10 can be implemented by installing a learning program that executes the above-described learning process into a desired computer as packaged software or online software. For example, by causing the information processing device to execute the above learning program, the information processing device can be made to function as the learning device 10.
- the information processing device referred to here includes a desktop or notebook personal computer.
- information processing devices include mobile communication terminals such as smartphones, mobile phones, and PHSs (Personal Handyphone Systems), as well as slate terminals such as PDAs (Personal Digital Assistants).
- the learning device 10 can also be implemented as a learning server device that uses a terminal device used by a user as a client and provides services related to the above-mentioned learning processing to the client.
- a learning server device is implemented as a server device that provides a learning service that takes learning data as input and outputs parameters of a trained model.
- the learning server device may be implemented as a Web server, or may be implemented as a cloud that provides services related to the above-mentioned learning processing by outsourcing.
- FIG. 6 is a diagram showing an example of a computer that executes a learning program.
- Computer 1000 includes, for example, a memory 1010 and a CPU 1020.
- the computer 1000 also includes a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These parts are connected by a bus 1080.
- the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012.
- the ROM 1011 stores, for example, a boot program such as BIOS (Basic Input Output System).
- Hard disk drive interface 1030 is connected to hard disk drive 1090.
- Disk drive interface 1040 is connected to disk drive 1100.
- Serial port interface 1050 is connected to, for example, mouse 1110 and keyboard 1120.
- Video adapter 1060 is connected to display 1130, for example.
- the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, a program that defines each process of the learning device 10 is implemented as a program module 1093 in which computer-executable code is written.
- Program module 1093 is stored in hard disk drive 1090, for example.
- a program module 1093 for executing processing similar to the functional configuration of the learning device 10 is stored in the hard disk drive 1090.
- the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
- the setting data used in the processing of the embodiment described above is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads out the program module 1093 and program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary, and executes the processing of the embodiment described above.
- program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like.
- the program module 1093 and the program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). The program module 1093 and program data 1094 may then be read by the CPU 1020 from another computer via the network interface 1070.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2024533424A JPWO2024013911A1 | 2022-07-13 | 2022-07-13 | |
| PCT/JP2022/027626 WO2024013911A1 (ja) | 2022-07-13 | 2022-07-13 | Learning device, learning method, learning program, inference device, inference method, and inference program |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/027626 WO2024013911A1 (ja) | 2022-07-13 | 2022-07-13 | Learning device, learning method, learning program, inference device, inference method, and inference program |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024013911A1 (ja) | 2024-01-18 |
Family
ID=89536149
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/027626 Ceased WO2024013911A1 (ja) | 2022-07-13 | 2022-07-13 | Learning device, learning method, learning program, inference device, inference method, and inference program |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JPWO2024013911A1 |
| WO (1) | WO2024013911A1 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200134468A1 (en) * | 2018-10-26 | 2020-04-30 | Royal Bank Of Canada | System and method for max-margin adversarial training |
| JP2021111232A (ja) * | 2020-01-14 | 2021-08-02 | Fujitsu Limited (富士通株式会社) | Training data generation program, device, and method |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11768932B2 (en) * | 2019-06-28 | 2023-09-26 | Baidu Usa Llc | Systems and methods for fast training of more robust models against adversarial attacks |
| US11334671B2 (en) * | 2019-10-14 | 2022-05-17 | International Business Machines Corporation | Adding adversarial robustness to trained machine learning models |
- 2022-07-13: WO application PCT/JP2022/027626, published as WO2024013911A1 (ja), status: not active (Ceased)
- 2022-07-13: JP application JP2024533424A, published as JPWO2024013911A1 (ja), status: active (Pending)
Non-Patent Citations (1)
| Title |
|---|
| KANAI SEKITOSHI; YAMADA MASANORI; YAMAGUCHI SHIN'YA; TAKAHASHI HIROSHI; IDA YASUTOSHI: "Constraining Logits by Bounded Function for Adversarial Robustness", 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), IEEE, 18 July 2021 (2021-07-18), pages 1 - 8, XP033975171, DOI: 10.1109/IJCNN52387.2021.9533777 * |
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2024013911A1 | 2024-01-18 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22951125; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024533424; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: PCT application non-entry in European phase | Ref document number: 22951125; Country of ref document: EP; Kind code of ref document: A1 |