WO2023249184A1

WO2023249184A1 - Adversarial training system and adversarial training method

Info

Publication number: WO2023249184A1
Application number: PCT/KR2022/020413
Authority: WO
Inventors: 최대선; 류권상
Original assignee: 숭실대학교 산학협력단
Priority date: 2022-06-22
Filing date: 2022-12-15
Publication date: 2023-12-28
Also published as: KR20230175007A

Abstract

An adversarial training system according to one aspect of the disclosed invention may comprise: an image reception unit configured to receive an original image for training and an adversarial transformed image for training generated by adding noise to the original image for training; an image feature extraction unit configured to extract an original image feature from the original image for training and to extract an adversarial transformed image feature from the adversarial transformed image for training by using a deep learning model; and a machine training unit configured to train the deep learning model on the basis of the original image feature and the adversarial transformed image feature.

Description

Adversarial learning systems and adversarial learning methods

The present invention relates to a learning system and a learning method capable of training a deep learning model that achieves high classification accuracy for both original images and adversarial transformed images.

This invention is derived from research conducted as part of the Ministry of Science and ICT's Information Protection Core Technology Development program (project identification number: 1711134508, project number: 2021-0-00511-001, research project name: Development of Robust AI and distributed attack detection technology for edge AI security, project management agency: Information and Communications Planning and Evaluation Institute, project implementation agency: Soongsil University Industry-Academic Cooperation Foundation, research period: 2021.04.01 to 2026.12.31) and the Basic Research Laboratory Support Project (project identification number: 1711137712, project number: 2021R1A4A1029650, research project name: Autonomous Vehicle Security Basic Research Laboratory, project management agency: National Research Foundation of Korea, project implementation agency: Soongsil University Industry-Academic Cooperation Foundation, research period: 2021.06.01 to 2024.02.29). Meanwhile, the Korean government has no property interest in any aspect of the present invention.

An adversarial transformed image is an image created by adding adversarial perturbation that cannot be recognized by humans to the input image so that a deep neural network (DNN) for image classification misrecognizes it as a class other than the original class. It is also called an adversarial example.

In most cases, attackers of deep neural networks do not know any information about the deep learning model being attacked, such as network structure or training dataset. At this time, the attacker can deceive the deep learning model deployed in the real environment by performing a transfer attack. This utilizes the characteristics that deep learning models learn similar features to form similar classification boundaries and that adversarial examples that fool one model can fool other models.

Adversarial training is one of the effective methods for responding to adversarial attacks, including transfer attacks. Adversarial learning repeatedly generates adversarial examples, and a deep learning model is trained to accurately classify the generated adversarial examples. However, conventional adversarial learning methods have the problem of lower classification accuracy for original data compared to general learning methods because they train deep learning models to correctly classify only adversarial examples.

The present invention is intended to provide an adversarial learning system and an adversarial learning method that can classify adversarial examples more accurately than conventional deep learning model learning methods.

In addition, the present invention is intended to provide an adversarial learning system and an adversarial learning method that have higher classification accuracy for original data without adversarial transformation than conventional adversarial learning methods that learn to correctly classify only adversarial examples.

In addition, the present invention is intended to provide an adversarial learning system and an adversarial learning method that, through accurate classification even under transferable adversarial attacks, can prevent casualties caused by self-driving cars misrecognizing surrounding objects and financial damage caused by face misrecognition.

An adversarial learning system according to one aspect of the disclosed invention may include: an image receiving unit configured to receive an original image for learning and an adversarial transformed image for learning generated by adding noise to the original image for learning; an image feature extraction unit configured to extract, using a deep learning model, original image features from the original image for learning and adversarial transformed image features from the adversarial transformed image for learning; and a machine learning unit configured to train the deep learning model based on the original image features and the adversarial transformed image features.

In addition, the adversarial learning system may further include a label classification unit configured to generate, using the deep learning model, a label value for the original image for learning based on the original image features and a label value for the adversarial transformed image for learning based on the adversarial transformed image features, and the machine learning unit may be configured to train the deep learning model based on the original image features, the adversarial transformed image features, the label value of the original image for learning, and the label value of the adversarial transformed image for learning.

Additionally, the machine learning unit may be configured to: calculate a first loss function based on the original image features and the adversarial transformed image features; calculate a second loss function based on the label value of the original image for learning and the correct answer label value of the original image for learning; and calculate a third loss function based on the label value of the adversarial transformed image for learning and the correct answer label value of the adversarial transformed image for learning.

In addition, the machine learning unit may be configured to calculate a fourth loss function based on the first loss function, the second loss function, and the third loss function, and to train the deep learning model so that the fourth loss function decreases.

Additionally, the machine learning unit may be configured to calculate the fourth loss function by adding the second loss function and the third loss function to a value obtained by multiplying the first loss function by a preset constant.

In addition, the machine learning unit may be configured to calculate, using the first loss function, a value representing the difference between the original image features and the adversarial transformed image features, and to train the deep learning model so that the first loss function decreases as learning repeats.

In addition, the image receiving unit may be configured to receive an image to be inspected, the image feature extraction unit may be configured to extract an image feature to be inspected from the image to be inspected using the deep learning model, and the label classification unit may be configured to generate, using the deep learning model, a label value for the image to be inspected based on the image feature to be inspected.

An adversarial learning method according to one aspect of the disclosed invention is a method of operating an adversarial learning system and may include: receiving an original image for learning and an adversarial transformed image for learning generated by adding noise to the original image for learning; extracting, using a deep learning model, original image features from the original image for learning and adversarial transformed image features from the adversarial transformed image for learning; generating, using the deep learning model, a label value for the original image for learning based on the original image features and a label value for the adversarial transformed image for learning based on the adversarial transformed image features; and training the deep learning model based on the original image features, the adversarial transformed image features, the label value of the original image for learning, and the label value of the adversarial transformed image for learning.

Additionally, training the deep learning model may include: calculating a first loss function based on the original image features and the adversarial transformed image features; calculating a second loss function based on the label value of the original image for learning and the correct answer label value of the original image for learning; calculating a third loss function based on the label value of the adversarial transformed image for learning and the correct answer label value of the adversarial transformed image for learning; calculating a fourth loss function based on the first loss function, the second loss function, and the third loss function; and training the deep learning model so that the fourth loss function decreases.

Additionally, a computer-readable recording medium according to one aspect of the disclosed invention may store a computer program for executing the adversarial learning method described above.

According to one aspect of the disclosed invention, adversarial examples can be classified more accurately than conventional deep learning model learning methods.

Additionally, according to an embodiment of the present invention, the classification accuracy for original data to which no adversarial transformation has been applied may be higher than that of a conventional adversarial learning method that learns to correctly classify only adversarial examples.

Finally, according to an embodiment of the present invention, even if subjected to a transferable adversarial attack, it is possible to prevent casualties due to misrecognition of surrounding objects in an autonomous vehicle or financial damage due to misrecognition of faces through accurate classification.

Figure 1 is a configuration diagram of an adversarial learning system according to an embodiment.

Figure 2 is a diagram for explaining the characteristics of a transfer attack carried out as a black box attack.

Figure 3 is a diagram for explaining the process of learning a deep learning model according to an embodiment.

Figure 4 is a flowchart of an adversarial learning method according to one embodiment.

Figure 5 is a table showing the degree of improvement of the adversarial learning method according to one embodiment compared to conventional deep learning training methods.

Like reference numerals refer to like elements throughout the specification. This specification does not describe all elements of the embodiments, and content that is general in the technical field to which the disclosed invention pertains or that overlaps between embodiments is omitted. The term '~unit' used in the specification may be implemented as software or hardware, and depending on the embodiment, multiple '~units' may be implemented as one component, or one '~unit' may include a plurality of components.

Additionally, when a part "includes" a certain component, this means that it may further include other components rather than excluding other components, unless specifically stated to the contrary. Terms such as first and second are used to distinguish one component from another component, and the components are not limited by the above-mentioned terms. Singular expressions include plural expressions unless the context clearly makes an exception.

The identification code for each step is used for convenience of explanation. The identification code does not describe the order of the steps, and the steps may be performed in an order different from the stated one unless a specific order is clearly stated in the context. Hereinafter, the operating principle and embodiments of the disclosed invention will be described with reference to the attached drawings.

Referring to Figure 1, the adversarial learning system 100 according to an embodiment of the present invention may include an image reception unit 110, an image feature extraction unit 120, a label classification unit 130, a machine learning unit 140, and a memory 150.

The adversarial learning system 100 may be a system configured to learn a deep learning model 151 used to classify received images. The adversarial learning system 100 according to an embodiment of the present invention may be a system provided in a separate image classification device or a system provided in a server.

The image receiving unit 110 may receive an original image 200 for learning and an adversarial transformed image 300 for learning. The image receiving unit 110 may receive images input by a user through an input terminal, but is not limited to this. For example, the image receiving unit 110 may receive the original image 200 for learning and the adversarial transformed image 300 for learning by reading images previously stored in the memory 150, or by obtaining images that a communication unit included in the adversarial learning system 100 has received from a server. The image receiving unit 110 may transmit the received original image 200 for learning and the received adversarial transformed image 300 for learning to the image feature extraction unit 120.

The original image for training 200 and the adversarial transformation image 300 for training may be a plurality of training images used for training the deep learning model 151 used for image classification. The adversarial transformation image 300 for learning may be an image created by adding noise to the original image 200 for learning.
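As a non-limiting illustration of how such a training pair might be formed, the sketch below adds bounded noise to an original image and clips the result to the valid pixel range. The source does not specify how the noise is generated: the random sign noise, the `epsilon` bound, and the function name `make_adversarial_pair` are assumptions for illustration only, and practical adversarial examples are typically produced with gradient-based attacks rather than random noise.

```python
import random

def make_adversarial_pair(original, epsilon=0.03, seed=0):
    """Return (original, perturbed) where perturbed = clip(original + noise).

    `original` is a list of rows of pixel intensities in [0, 1].
    The random +/-epsilon noise is only a stand-in (assumption) for the
    adversarial perturbation, whose construction the source leaves open.
    """
    rng = random.Random(seed)
    perturbed = [
        [min(1.0, max(0.0, pixel + rng.choice((-epsilon, epsilon)))) for pixel in row]
        for row in original
    ]
    return original, perturbed

image = [[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]]
original, adversarial = make_adversarial_pair(image, epsilon=0.03)
```

Each perturbed pixel stays within `epsilon` of its original value, which is what keeps the change hard for a human to notice.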

The deep learning-based image classification technology according to one embodiment may be a technology that classifies images using a deep learning model 151 trained in advance on feature data extracted from images. At this time, a convolutional neural network (CNN) structure that stacks several convolution layers can be used to learn how to extract features from images, but the method for extracting features from images is not limited to this.

The image feature extractor 120 may extract features from the image received from the image receiver 110. The characteristics of a specific image may be information representing various characteristics of the image. For example, the characteristics of a specific image may be information about color, brightness, border, etc. in each pixel unit of the image, but are not limited thereto.

The image feature extraction unit 120 may use the deep learning model 151 to extract the original image features 501 from the original image 200 for learning and the adversarial transformed image features 502 from the adversarial transformed image 300 for learning. The image feature extraction unit 120 may transmit the original image features 501 and the adversarial transformed image features 502 to the label classification unit 130 and the machine learning unit 140.

The machine learning unit 140 may be configured to learn a deep learning model 151 based on the original image features 501 and the adversarial transformed image features 502. The machine learning unit 140 can learn the deep learning model 151 through iterative machine learning. The learned deep learning model 151 may be stored in the memory 150.

Machine learning can mean using a model composed of multiple parameters and optimizing the parameters with given data. Machine learning may include supervised learning, unsupervised learning, and reinforcement learning, depending on the type of learning problem. Supervised learning is learning the mapping between input and output, and can be applied when input and output pairs are given as data. Unsupervised learning is applied when there is only input and no output, and can find regularities between inputs. However, machine learning according to one embodiment is not necessarily limited to the above-described learning method.

The label classification unit 130 can generate, using the deep learning model 151, a label value for an image received by the adversarial learning system 100 based on the features of the image received from the image feature extraction unit 120. The label value for an image may be a numerical value related to the classification result for that image. That is, the adversarial learning system 100 can classify each of the received images by generating a label value for each of them.

The label classification unit 130 may generate a label value for the original image 200 for learning based on the original image features 501. Additionally, the label classifier 130 may generate a label value for the adversarial deformed image 300 for learning based on the adversarial deformed image features 502. The label classification unit 130 may transmit the generated label values to the machine learning unit 140.
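By way of illustration only, a label value can be pictured as a vector of class probabilities computed from the extracted features. The sketch below assumes a single linear layer followed by a softmax, which the source does not state; the names `classify`, `weights`, and `biases` are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """Map a feature vector to (label_value, predicted_class).

    Assumption: one linear layer plus softmax stands in for the label
    classification unit; the source does not fix this architecture.
    """
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    label_value = softmax(scores)
    predicted = max(range(len(label_value)), key=label_value.__getitem__)
    return label_value, predicted

label_value, predicted = classify(
    features=[1.0, 2.0],
    weights=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    biases=[0.0, 0.0, 0.0],
)
```

Here the class scores are 1.0, 2.0, and 1.5, so the second class (index 1) receives the largest probability.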

The machine learning unit 140 may train the deep learning model 151 based on the original image features 501, the adversarial transformed image features 502, the label value of the original image 200 for learning, and the label value of the adversarial transformed image 300 for learning. In this way, the machine learning unit 140 can train the deep learning model 151 by iterating, in the manner described above, over the pairs of original images 200 for learning and adversarial transformed images 300 for learning received by the adversarial learning system 100.

When repeated machine learning is completed, the adversarial learning system 100 can perform image classification on the image 400 to be inspected. The inspection target image 400 may be an image that the user actually wants to classify after the learning step.

The image receiver 110 may receive an image 400 to be inspected. The image receiver 110 may transmit the inspection target image 400 to the image feature extractor 120.

The image feature extraction unit 120 may extract image features to be inspected from the image to be inspected 400 using a deep learning model 151 that has been learned in advance. The image feature extraction unit 120 may transmit image features to be inspected to the label classification unit 130. The label classification unit 130 may generate a label value for the image to be inspected 400 based on the characteristics of the image to be inspected using a pre-trained deep learning model 151.

The deep learning model 151 used in the adversarial learning system 100 according to one embodiment does not use only the original image 200 for learning as training data; it may also use, as training data, pairs consisting of an original image and the image adversarially transformed from it. Because of this, the adversarial learning system 100 according to one embodiment can respond more effectively to transfer attacks carried out as black box attacks.

Referring to Figure 2, it can be seen that a transfer attack is possible by an adversarial image created by adding noise to the original image.

An adversarial transformation image is an image created by adding adversarial perturbation that cannot be recognized by humans to the input image so that a deep neural network for image classification misrecognizes it as a class other than the original class.

Among the techniques for analyzing or detecting images, there are detection techniques (for example, steganalysis-based detection) that exploit the tendency of neighboring pixels in an image to have similar values. Such techniques have poor detection performance for adversarial transformed images, in which noise has been added to the detection target image. Specifically, such a technique treats a pixel as a border if the difference in value is large in two or more directions among its eight neighboring pixels. If noise is present at the border of an object in the image, this detection can be evaded.
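The neighbor-difference rule described above can be sketched as follows. The threshold value, the grayscale list-of-lists image representation, the sample pixel values, and the function name `is_border` are assumptions for illustration; the text only states that a pixel is treated as a border when its value differs strongly from its neighbors in two or more of the eight directions.

```python
def is_border(image, row, col, threshold=0.5):
    """Return True if the pixel differs strongly from >= 2 of its 8 neighbors."""
    height, width = len(image), len(image[0])
    large_differences = 0
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if d_row == 0 and d_col == 0:
                continue  # skip the pixel itself
            r, c = row + d_row, col + d_col
            if 0 <= r < height and 0 <= c < width:
                if abs(image[r][c] - image[row][col]) > threshold:
                    large_differences += 1
    return large_differences >= 2

# A vertical step edge: dark on the left, bright on the right.
clean = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

# Noise that pulls the pixels next to the edge toward intermediate values
# (hypothetical values) removes the large neighbor differences.
noisy = [[0.0, 0.4, 0.6, 1.0] for _ in range(4)]
```

On `clean`, the pixel at row 1, column 1 is reported as a border, whereas on `noisy` no pixel along the former edge exceeds the threshold, so the border goes undetected.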

An image modifier can create an adversarial transformed image by adding noise (perturbation) to a normal image. For an adversarial transformed image created in this way, a detection technique that relies on neighboring pixels having similar values may fail to recognize an object's boundary as a boundary. When an adversarial transformed image is input to an image classification device, classification performance may be poor even if the classifier otherwise performs well. In addition, intentional transfer attacks using adversarial transformed images may be mounted against specific artificial intelligence models.

For an arbitrary artificial intelligence model, the attacker typically does not know any information about the target model, such as its network structure or training dataset, so general attacks are impossible. In other words, white-box attacks are generally impossible. However, if an attacker successfully attacks one artificial intelligence model (Model A) using an adversarial transformed image, the attacker can exploit transferability to successfully attack another similar model (Model B) with an adversarial transformed image. In other words, there is a problem that an adversarial example that fools one artificial intelligence model can fool other artificial intelligence models.
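Transferability can be illustrated with a toy example. In the sketch below, two linear classifiers with similar decision boundaries stand in for Model A and Model B; the weights, the input, the step size `epsilon`, and the sign-based perturbation are all hypothetical choices and are not taken from the source.

```python
def predict(weights, x):
    """Linear binary classifier: class 1 if w . x > 0, else class 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def craft_against(weights, x, epsilon):
    """Step x against the sign of the surrogate model's weights.

    For a linear model the gradient of the score with respect to x is w,
    so this is a sign-of-gradient step that lowers the class-1 score.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]

model_a = [1.0, -1.0]  # surrogate the attacker can inspect (hypothetical)
model_b = [0.8, -1.2]  # separate target with a similar boundary (hypothetical)

x = [0.7, 0.4]
x_adv = craft_against(model_a, x, epsilon=0.2)
```

Both models assign class 1 to `x`, and both assign class 0 to `x_adv`, even though the perturbation was computed using Model A alone.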

Conventional studies related to adversarial learning to solve problems caused by such adversarial deformed images use adversarial examples generated by manipulating training data for learning and learn to classify adversarial examples into normal classes. The model learned in this way has more sophisticated classification boundaries and is more resistant to adversarial examples. However, because conventional adversarial learning-related studies do not use normal data, they have the disadvantage of lower classification accuracy for inspection target images 400 that have not been adversarially transformed compared to learning methods that use only original images before transformation.

Referring to FIG. 3, the machine learning unit 140 may calculate a first loss function ($L_1$), a second loss function ($L_2$), and a third loss function ($L_3$).

The image feature extractor 120 can extract features of the received image through a deep learning model 151 based on a deep neural network (DNN) that uses convolutional layers. At this time, the feature of an image may be the output value of the last convolutional layer of the DNN-based deep learning model 151. That is, the original image feature 501 may be the output value ($F(x)$) of the last convolutional layer of the DNN-based deep learning model 151 for the original image 200 for learning, and the adversarial transformed image feature 502 may be the output value ($F(x')$) of the last convolutional layer of the DNN-based deep learning model 151 for the adversarial transformed image 300 for learning.

The machine learning unit 140 may calculate the first loss function ($L_1$) based on the original image feature 501 and the adversarial transformed image feature 502. Specifically, the machine learning unit 140 may calculate the first loss function ($L_1$) as in [Equation 1].

[Equation 1]

$$L_1 = d\big(F(x),\, F(x')\big)$$

Referring to [Equation 1], where $d(\cdot,\cdot)$ denotes a value representing the difference between its two arguments, it can be confirmed that the first loss function ($L_1$) is a loss function used to minimize the difference between the output value ($F(x)$) of the last convolutional layer for the original image 200 for learning and the output value ($F(x')$) of the last convolutional layer for the adversarial transformed image 300 for learning. The machine learning unit 140 can train the deep learning model 151 so that the first loss function ($L_1$) decreases as learning repeats.
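One way to realize a feature-difference loss of the kind described for [Equation 1] is the mean squared difference between the two feature vectors. This choice of distance is an assumption made for illustration, since the text only states that the first loss function represents the difference between the two last-layer outputs; the function name `feature_loss` and the sample feature values are likewise hypothetical.

```python
def feature_loss(original_features, adversarial_features):
    """Mean squared difference between two equally sized feature vectors."""
    if len(original_features) != len(adversarial_features):
        raise ValueError("feature vectors must have the same length")
    squared = [
        (o - a) ** 2 for o, a in zip(original_features, adversarial_features)
    ]
    return sum(squared) / len(squared)

# Identical features give zero loss; diverging features give a positive loss.
identical = feature_loss([0.2, 0.8, 0.5], [0.2, 0.8, 0.5])
shifted = feature_loss([0.2, 0.8, 0.5], [0.4, 0.6, 0.5])
```

Driving this value toward zero encourages the model to produce the same features for an image whether or not it has been perturbed.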

The label classifier 130 may generate a label value for each image based on the image feature, which is the output value of the last convolutional layer of the DNN-based deep learning model 151. In other words, the label classification unit 130 may generate the label value ($f(x)$) of the original image 200 for learning and the label value ($f(x')$) of the adversarial transformed image 300 for learning.

The machine learning unit 140 may calculate the second loss function based on the label value of the original image 200 for learning and the correct answer label value of the original image 200 for learning. The correct answer label value of one original image 200 for learning may be a predetermined correct answer value regarding how the original image 200 for learning will be classified.

[Equation 2]

$$L_2 = -\log\big(f(x) \cdot y\big)$$

Referring to [Equation 2], the second loss function ($L_2$) can be calculated as the value ($-\log(f(x) \cdot y)$) obtained by taking the logarithm of the inner product of the label value ($f(x)$) and the correct answer label value ($y$) of the original image ($x$) 200 for learning and negating the result. The more similar the label value ($f(x)$) is to the correct answer label value ($y$) of the original image 200 for learning, the smaller this value ($-\log(f(x) \cdot y)$) becomes. That is, the second loss function ($L_2$) is the loss function used to minimize the difference between the label value ($f(x)$) and the correct answer label value ($y$) of the original image 200 for learning.
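The logarithm-of-inner-product form described for [Equation 2] can be checked numerically. The sketch below assumes that the label value is a probability vector and that the correct answer label is one-hot encoded; neither encoding is stated explicitly in the text, and the function name `label_loss` and the sample values are hypothetical.

```python
import math

def label_loss(label_value, correct_label):
    """Negative log of the inner product of prediction and correct label."""
    inner = sum(p * t for p, t in zip(label_value, correct_label))
    return -math.log(inner)

correct = [1.0, 0.0, 0.0]                      # one-hot correct answer label (assumed encoding)
far = label_loss([0.4, 0.3, 0.3], correct)     # prediction far from the correct label
near = label_loss([0.9, 0.05, 0.05], correct)  # prediction close to the correct label
```

The loss is smaller for the prediction that places more probability on the correct class, matching the behavior described in the text. The same function can be applied to the label value of an adversarial transformed image.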

The machine learning unit 140 may calculate a third loss function based on the label value of the adversarial deformed image 300 for learning and the correct answer label value of the adversarial deformed image 300 for learning.

[Equation 3]

$$L_3 = -\log\big(f(x') \cdot y\big)$$

Referring to [Equation 3], the third loss function ($L_3$) can be calculated as the value ($-\log(f(x') \cdot y)$) obtained by taking the logarithm of the inner product of the label value ($f(x')$) and the correct answer label value ($y$) of the adversarial transformed image ($x'$) 300 for learning and negating the result. The more similar the label value ($f(x')$) is to the correct answer label value ($y$) of the adversarial transformed image 300 for learning, the smaller this value ($-\log(f(x') \cdot y)$) becomes. That is, the third loss function ($L_3$) is the loss function used to minimize the difference between the label value ($f(x')$) and the correct answer label value ($y$) of the adversarial transformed image 300 for learning.

[Equation 4]

$$L_4 = \gamma L_1 + L_2 + L_3$$

Referring to [Equation 4], the fourth loss function ($L_4$) may be a combination of the first loss function ($L_1$), the second loss function ($L_2$), and the third loss function ($L_3$).

The machine learning unit 140 may calculate the fourth loss function based on the first loss function, second loss function, and third loss function. That is, the machine learning unit 140 can calculate the fourth loss function ($L_4$) by multiplying the first loss function ($L_1$) by a preset constant ($\gamma$) and adding the second loss function ($L_2$) and the third loss function ($L_3$) to the result. The machine learning unit 140 may train the deep learning model 151 so that the fourth loss function decreases as learning repeats.
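The combination described for [Equation 4] reduces to a weighted sum. In the sketch below, the numeric loss values and the value chosen for the preset constant are placeholders, not values from the source, and the function name `combined_loss` is hypothetical.

```python
def combined_loss(first_loss, second_loss, third_loss, gamma):
    """Fourth loss: gamma * first + second + third."""
    return gamma * first_loss + second_loss + third_loss

# Placeholder loss values for one training pair.
total = combined_loss(first_loss=0.25, second_loss=0.5, third_loss=1.0, gamma=2.0)
```

With these placeholders the result is 2.0 * 0.25 + 0.5 + 1.0 = 2.0. A larger constant puts more weight on aligning the features of the original and adversarial images relative to the two classification terms.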

At least one component may be added or deleted in response to the performance of the components described above. Additionally, it will be easily understood by those skilled in the art that the mutual positions of the components may be changed in response to the performance or structure of the system.

Figure 4 is a flowchart of an adversarial learning method according to one embodiment. This is only a preferred embodiment for achieving the purpose of the present invention, and of course, some components may be added or deleted as needed.

Referring to FIG. 4, the image receiver 110 may receive an original image for training 200 and an adversarial transformation image 300 for training generated by adding noise to the original image 200 for training (1001).

The image feature extraction unit 120 may use the deep learning model 151 to extract the original image features 501 from the original image 200 for learning and the adversarial transformed image features 502 from the adversarial transformed image 300 for learning (1002).

The label classification unit 130 may use the deep learning model 151 to generate a label value for the original image 200 for learning based on the original image features 501 and a label value for the adversarial transformed image 300 for learning based on the adversarial transformed image features 502 (1003).

The machine learning unit 140 may calculate the first loss function, the second loss function, and the third loss function (1004). At this time, the machine learning unit 140 may calculate the first loss function based on the original image features 501 and the adversarial transformed image features 502, calculate the second loss function based on the label value of the original image 200 for learning and the correct answer label value of the original image 200 for learning, and calculate the third loss function based on the label value of the adversarial transformed image 300 for learning and the correct answer label value of the adversarial transformed image 300 for learning.

The machine learning unit 140 may calculate the fourth loss function based on the first loss function, second loss function, and third loss function (1005). At this time, the machine learning unit 140 may calculate the fourth loss function by adding the second loss function and the third loss function to the value obtained by multiplying the first loss function by a preset constant.

The machine learning unit 140 may train the deep learning model 151 based on the original image features 501, the adversarial transformed image features 502, the label value of the original image 200 for learning, and the label value of the adversarial transformed image 300 for learning (1006). Specifically, the machine learning unit 140 can train the deep learning model 151 so that the fourth loss function decreases as learning is repeated.
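Steps 1001 to 1006 can be sketched end to end on a toy problem. Everything concrete below is an assumption for illustration: the two-dimensional inputs, the tanh layer standing in for the last convolutional layer, the softmax classifier, the mean squared feature distance, the finite-difference gradient descent used in place of backpropagation, and all numeric values. Only the overall structure (paired inputs, three losses, weighted sum, iterative minimization) follows the text.

```python
import math

def forward(params, x):
    """Return (features, label_value) for a 2-D input.

    params holds 8 numbers: a 2x2 feature layer followed by a 2x2 classifier.
    The tanh layer is a toy stand-in for the last convolutional layer.
    """
    w, v = [params[0:2], params[2:4]], [params[4:6], params[6:8]]
    features = [math.tanh(sum(a * b for a, b in zip(row, x))) for row in w]
    scores = [sum(a * b for a, b in zip(row, features)) for row in v]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    return features, [e / sum(exps) for e in exps]

def fourth_loss(params, x, x_adv, y, gamma):
    """gamma * feature difference + label loss (original) + label loss (adversarial)."""
    feats, probs = forward(params, x)            # steps 1002 and 1003
    feats_adv, probs_adv = forward(params, x_adv)
    first = sum((a - b) ** 2 for a, b in zip(feats, feats_adv)) / len(feats)  # assumed distance
    second = -math.log(sum(p * t for p, t in zip(probs, y)))
    third = -math.log(sum(p * t for p, t in zip(probs_adv, y)))
    return gamma * first + second + third        # steps 1004 and 1005

def train(params, pairs, gamma=1.0, learning_rate=0.1, steps=100, h=1e-5):
    """Step 1006: lower the mean fourth loss by finite-difference gradient descent."""
    def mean_loss(p):
        return sum(fourth_loss(p, x, xa, y, gamma) for x, xa, y in pairs) / len(pairs)

    history = [mean_loss(params)]
    for _ in range(steps):
        gradient = []
        for i in range(len(params)):
            up = params[:i] + [params[i] + h] + params[i + 1:]
            down = params[:i] + [params[i] - h] + params[i + 1:]
            gradient.append((mean_loss(up) - mean_loss(down)) / (2 * h))
        params = [p - learning_rate * g for p, g in zip(params, gradient)]
        history.append(mean_loss(params))
    return params, history

# Step 1001: (original, adversarial, one-hot label) triples with hand-picked noise.
pairs = [
    ([1.0, 0.0], [0.9, 0.1], [1.0, 0.0]),
    ([0.0, 1.0], [0.1, 0.9], [0.0, 1.0]),
]
initial = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
trained, history = train(initial, pairs)
```

`history` records the mean fourth loss before training and after each step, so comparing its first and last entries shows whether the loss decreased as learning was repeated.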

The image reception unit 110, the image feature extraction unit 120, the label classification unit 130, and the machine learning unit 140 may each include any one of a plurality of processors included in the adversarial learning system 100. Additionally, the adversarial learning method according to the embodiment of the present invention described so far can be implemented in the form of a program that can be run by a processor.

Here, the program may include program instructions, data files, data structures, and the like, singly or in combination. Programs may be designed and produced using machine code or high-level language code. The program may be specially designed to implement the above-described adversarial learning method, or may be implemented using various functions or definitions known and available to those skilled in the art in the field of computer software. A program for implementing the above-described adversarial learning method may be recorded on a non-transitory recording medium readable by a processor. At this time, the recording medium may be the memory 150.

The memory 150 can store programs that perform the operations described above and the operations described later, and the processor can execute the programs stored in the memory 150. In the case where there are a plurality of processors and memories 150, they may be integrated into one chip or may be provided in physically separate locations. The memory 150 may include volatile memory such as Static Random Access Memory (S-RAM) or Dynamic Random Access Memory (D-RAM) for temporarily storing data. In addition, the memory 150 may include non-volatile memory such as read only memory (ROM), erasable programmable read only memory (EPROM), and electrically erasable programmable read only memory (EEPROM) for long-term storage of control programs and control data. The processor may include various logic circuits and operation circuits, process data according to a program provided from the memory 150, and generate control signals according to the processing results.

To verify the performance of the adversarial learning system 100 according to an embodiment of the present invention, an experiment was performed in which images of the CIFAR-10 and CIFAR-100 datasets were classified using conventional deep learning training methods and the adversarial learning method of the present invention.

Referring to Figure 5, it can be seen that the adversarial learning method (Feature-based Adversarial Training; FAT) according to one embodiment has better classification performance than other conventional methods (Natural Training, PGD Training, and TRADES). Specifically, the table at the top of Figure 5 shows the results of an experiment classifying images of the CIFAR-10 dataset, and the table at the bottom shows the results of an experiment classifying images of the CIFAR-100 dataset.

The value shown in each cell of the table indicates the classification accuracy that the learning method in the corresponding row achieves on the data set in the corresponding column. For example, for the original image set (Natural) to which no adversarial transformation has been applied, the adversarial learning method (Feature-based Adversarial Training; FAT) according to one embodiment has lower classification accuracy than the conventional general learning method (Natural Training), but higher classification accuracy than the conventional adversarial learning methods (PGD Training and TRADES). In addition, for the image sets to which adversarial transformation has been applied (FGSM, PGD-20, DeepFool, CW-20, and MIM-20), it can be confirmed that the adversarial learning method (FAT) according to one embodiment has higher classification accuracy than the conventional adversarial learning methods (PGD Training and TRADES).
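To make the row-and-column layout of such a table concrete, the following is a minimal Python sketch of how an accuracy table of this shape could be assembled. The function names (`accuracy`, `accuracy_table`), the method names (`method_a`, `method_b`), and the toy labels are hypothetical and chosen only for illustration; no values from Figure 5 are reproduced.

```python
def accuracy(predicted, correct):
    """Return the fraction of images whose predicted label matches the correct label."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("label lists must be non-empty and of equal length")
    hits = sum(1 for p, c in zip(predicted, correct) if p == c)
    return hits / len(correct)


def accuracy_table(predictions, correct):
    """Build {learning_method: {image_set: accuracy}}: one row per method, one column per set."""
    return {
        method: {image_set: accuracy(labels, correct) for image_set, labels in per_set.items()}
        for method, per_set in predictions.items()
    }


# Toy data (hypothetical): four images, two learning methods, two image sets.
correct_labels = [0, 1, 2, 3]
toy_predictions = {
    "method_a": {"Natural": [0, 1, 2, 3], "FGSM": [0, 0, 0, 0]},
    "method_b": {"Natural": [0, 1, 2, 0], "FGSM": [0, 1, 0, 0]},
}
table = accuracy_table(toy_predictions, correct_labels)
```

Each inner dictionary corresponds to one row of the table, and each key of that dictionary corresponds to one column (an image set).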

In summary, the adversarial learning method (FAT) according to one embodiment can classify the adversarially transformed image sets (FGSM, PGD-20, DeepFool, CW-20, and MIM-20) more accurately than the conventional learning method (Natural Training) and the conventional adversarial learning methods (PGD Training and TRADES), while also classifying the original image set (Natural), to which no adversarial transformation has been applied, more accurately than the conventional adversarial learning methods (PGD Training and TRADES).

As described above, the disclosed embodiments have been described with reference to the attached drawings. A person skilled in the art to which the present invention pertains will understand that the present invention can be practiced in forms different from the disclosed embodiments without changing the technical idea or essential features of the present invention. The disclosed embodiments are illustrative and should not be construed as limiting.

Claims

an image reception unit configured to receive an original image for learning and an adversarial transformed image for learning generated by adding noise to the original image for learning;

an image feature extraction unit configured to extract original image features from the original image for learning using a deep learning model and extract adversarial transformed image features from the adversarial transformed image for learning;

An adversarial learning system including a machine learning unit configured to learn the deep learning model based on the original image features and the adversarial transformed image features.
According to claim 1,

Further comprising a label classification unit configured to generate a label value for the original image for learning based on the original image features using the deep learning model, and to generate a label value for the adversarial transformed image for learning based on the adversarial transformed image features,

The machine learning unit,

An adversarial learning system configured to learn the deep learning model based on the original image features, the adversarial transformed image features, the label value of the original image for learning, and the label value of the adversarial transformed image for learning.
According to claim 2,

The machine learning unit:

calculate a first loss function based on the original image features and the adversarial transformed image features;

calculate a second loss function based on the label value of the original image for learning and the correct label value of the original image for learning; and

An adversarial learning system configured to calculate a third loss function based on the label value of the adversarial transformed image for learning and the correct label value of the adversarial transformed image for learning.
According to claim 3,

The machine learning unit,

calculate a fourth loss function based on the first loss function, the second loss function, and the third loss function; and

An adversarial learning system configured to learn the deep learning model so that the fourth loss function decreases.
According to claim 4,

The machine learning unit,

An adversarial learning system configured to calculate the fourth loss function by adding the second loss function and the third loss function to a value obtained by multiplying the first loss function by a preset constant.
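The combination of loss functions recited above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the claims do not fix the form of the individual loss functions, so the squared Euclidean distance used for the first loss and the cross-entropy used for the second and third losses are assumptions, as are the function names and the parameter name `weight` (standing in for the preset constant).

```python
import math


def feature_difference(original_features, adversarial_features):
    """First loss (assumed form): squared Euclidean distance between the two feature vectors."""
    return sum((o - a) ** 2 for o, a in zip(original_features, adversarial_features))


def cross_entropy(probabilities, correct_label):
    """Second/third loss (assumed form): negative log-probability of the correct label."""
    return -math.log(probabilities[correct_label])


def fourth_loss(original_features, adversarial_features,
                original_probs, adversarial_probs,
                original_label, adversarial_label, weight=1.0):
    """Fourth loss = weight * first loss + second loss + third loss."""
    first = feature_difference(original_features, adversarial_features)
    second = cross_entropy(original_probs, original_label)
    third = cross_entropy(adversarial_probs, adversarial_label)
    return weight * first + second + third
```

With identical feature vectors the first loss is zero, so the fourth loss reduces to the sum of the two classification losses; a larger `weight` penalizes feature differences between the original and adversarial transformed images more strongly.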
According to claim 3,

The machine learning unit,

calculate, as the first loss function, a value representing a difference between the original image features and the adversarial transformed image features; and

An adversarial learning system configured to learn the deep learning model so that the first loss function decreases as learning is repeated.
According to claim 2,

The image reception unit,

configured to receive an image to be inspected,

The image feature extraction unit,

Configured to extract inspection target image features from the image to be inspected using the deep learning model,

The label classification unit,

An adversarial learning system configured to generate a label value for the image to be inspected based on the inspection target image features using the deep learning model.
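The inspection flow recited above (receive an image, extract its features, generate a label value) can be sketched in Python as follows. The single linear layers standing in for the deep learning model, the function names, and the toy weights are all hypothetical and serve only to show the order of operations.

```python
def extract_features(image, feature_weights):
    """Hypothetical feature extractor: one linear layer applied to a flattened image."""
    return [sum(w * x for w, x in zip(row, image)) for row in feature_weights]


def classify(features, classifier_weights):
    """Hypothetical label classifier: index of the largest class score."""
    scores = [sum(w * f for w, f in zip(row, features)) for row in classifier_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def inspect_image(image, feature_weights, classifier_weights):
    """Receive an image to be inspected, extract its features, and return a label value."""
    features = extract_features(image, feature_weights)
    return classify(features, classifier_weights)


# Toy parameters standing in for a trained deep learning model.
feature_weights = [[1.0, 0.0], [0.0, 1.0]]
classifier_weights = [[1.0, -1.0], [-1.0, 1.0]]
label = inspect_image([0.2, 0.9], feature_weights, classifier_weights)
```

The same feature extractor and classifier are used at inspection time as during learning; only the loss calculation is absent.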
As a method of operating an adversarial learning system,

Receiving an original image for learning and an adversarial transformed image for learning generated by adding noise to the original image for learning;

Extracting original image features from the original image for learning and extracting adversarial transformed image features from the adversarial transformed image for learning using a deep learning model;

Generating a label value for the original image for learning based on the original image features using the deep learning model, and generating a label value for the adversarial transformed image for learning based on the adversarial transformed image features; and

An adversarial learning method comprising the step of learning the deep learning model based on the original image features, the adversarial transformed image features, the label value of the original image for learning, and the label value of the adversarial transformed image for learning.
According to claim 8,

The step of learning the deep learning model comprises:

calculating a first loss function based on the original image features and the adversarial transformed image features;

calculating a second loss function based on the label value of the original image for learning and the correct label value of the original image for learning;

calculating a third loss function based on the label value of the adversarial transformed image for learning and the correct label value of the adversarial transformed image for learning;

calculating a fourth loss function based on the first loss function, the second loss function, and the third loss function; and

An adversarial learning method comprising training the deep learning model so that the fourth loss function decreases.
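The learning steps recited above can be illustrated with a small, self-contained Python sketch. It is not the implementation of the disclosed system: the two-layer linear toy model, the squared-distance and cross-entropy loss forms, the finite-difference gradient, and all names and numeric values are assumptions made so the example runs with the standard library only. The sketch also assumes that the adversarial transformed image shares the correct label of its original image.

```python
import math


def forward(params, image):
    """Toy model: features = W @ image, probabilities = softmax(V @ features)."""
    w, v = params[:4], params[4:]
    features = [w[0] * image[0] + w[1] * image[1], w[2] * image[0] + w[3] * image[1]]
    logits = [v[0] * features[0] + v[1] * features[1], v[2] * features[0] + v[3] * features[1]]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return features, [e / total for e in exps]


def fourth_loss(params, image, adversarial_image, label, weight):
    """weight * feature difference + cross-entropy (original) + cross-entropy (adversarial)."""
    f_orig, p_orig = forward(params, image)
    f_adv, p_adv = forward(params, adversarial_image)
    first = sum((o - a) ** 2 for o, a in zip(f_orig, f_adv))
    second = -math.log(p_orig[label])
    third = -math.log(p_adv[label])
    return weight * first + second + third


def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central-difference gradient; a real system would use automatic differentiation."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad


def train(params, image, adversarial_image, label, weight=1.0, lr=0.01, steps=50):
    """Plain gradient descent on the fourth loss; returns final params and loss history."""
    def loss_fn(p):
        return fourth_loss(p, image, adversarial_image, label, weight)

    history = [loss_fn(params)]
    for _ in range(steps):
        grad = numerical_gradient(loss_fn, params)
        params = [p - lr * g for p, g in zip(params, grad)]
        history.append(loss_fn(params))
    return params, history


image = [1.0, 0.0]
adversarial_image = [0.9, 0.1]  # original image plus small noise (toy values)
initial = [0.5, -0.2, 0.1, 0.3, 0.2, 0.1, -0.1, 0.4]
trained, history = train(initial, image, adversarial_image, label=0)
```

The `history` list records the fourth loss before learning and after each update, so comparing its first and last entries shows whether the loss decreased over the run.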
A computer-readable, non-transitory recording medium storing a computer program to execute the adversarial learning method of claim 8.