WO2023070696A1 - Feature-Manipulation-Based Attack and Defense Method for Systems with Continuous Learning Capability - Google Patents

Feature-Manipulation-Based Attack and Defense Method for Systems with Continuous Learning Capability

Info

Publication number
WO2023070696A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
sample
samples
clean
Prior art date
Application number
PCT/CN2021/128193
Other languages
English (en)
French (fr)
Inventor
郭良轩
陈阳
余山
曲徽
黄旭辉
张金鹏
Original Assignee
中国科学院自动化研究所
中国航天科工集团第二研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院自动化研究所 and 中国航天科工集团第二研究院
Publication of WO2023070696A1 publication Critical patent/WO2023070696A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit

Definitions

  • the invention belongs to the technical fields of pattern recognition, machine learning, multi-task learning, and adversarial attack, and in particular relates to a feature-manipulation-based attack and defense method, system, and device for systems with continuous learning capability.
  • Deep artificial neural networks can extract high-level features from raw data, and use them as a basis to implement tasks such as pattern detection, recognition, and classification, and have shown great potential in learning complex mapping rules.
  • this capability is a "static" one, i.e. the mapping is usually fixed once training is complete.
  • when learning a new task, deep artificial neural networks often destroy the mappings established in previous tasks and therefore lack the ability to learn continuously. In the field of machine learning, this is often referred to as "catastrophic forgetting".
  • many application scenarios require deep artificial neural networks to learn new information and adjust themselves, and "catastrophic forgetting" is undoubtedly a shortcoming there.
  • the “continuous learning algorithm” came into being, aiming to balance the knowledge of old and new tasks so that the artificial intelligence system has the ability of continuous learning.
  • Such AI systems are called “continuous learning systems.”
  • the present invention proposes a feature-manipulation-based attack and defense method for systems with continuous learning capability (or intelligent systems), such as an image classification model based on continuous learning, which can covertly influence the learning process of a continuous learning system and manipulate its learning results.
  • the present invention proposes a feature-manipulation-based attack and defense method for systems with continuous learning capability, the method comprising:
  • Step S10 obtaining training samples corresponding to class B tasks to be classified and learned in the image training sample set as clean samples; the image training sample set contains M types of tasks to be classified and learned;
  • Step S20 using a pre-built feature extraction network to extract the features of the clean samples as features of the clean samples;
  • Step S30 obtaining the training samples corresponding to the C-type tasks to be classified and learned in the image training sample set as the target samples, and extracting the features of the target samples through the feature extraction network as the target anchor feature;
  • Step S40 based on the clean sample feature, combined with the target anchor point feature, generate an adversarial sample of the B-class task to be classified and learned through a preset attack sample generation algorithm;
  • Step S50: delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task.
  • Step S60: if the classification accuracy is lower than the set threshold, add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; after the addition, train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, jump to step S70. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer.
  • Step S70 classify the image to be classified based on the trained image classification model.
  • both the feature extraction network and the image classification model are constructed based on a deep neural network; wherein, the feature extraction network is constructed based on a deep neural network with a linear classification layer removed.
  • the loss function at the feature level of the image classification model during continuous learning is a loss function constructed based on a distance function; the distance function includes Euclidean distance.
  • the attack sample generation algorithm is:

$$X_{adv}^{0} = X_{clean}$$
$$\tilde{X}_{adv}^{N+1} = X_{adv}^{N} - \nabla_{X} J\big(F(X_{adv}^{N}),\, h_{s}\big)$$
$$X_{adv}^{N+1} = \mathrm{Clip}_{X,\epsilon}\{\tilde{X}_{adv}^{N+1}\}$$
$$\mathrm{Clip}_{X,\epsilon}\{X'\}(x,y) = \min\{\gamma,\ X(x,y)+\epsilon,\ \max\{\alpha,\ X(x,y)-\epsilon,\ X'(x,y)\}\}$$

  • where X_clean and X denote the clean sample, X_adv^N denotes the adversarial sample obtained at the N-th iteration, J(·,·) denotes the loss function, h_s denotes the target anchor feature, Clip_{X,ε}{X′} denotes the clipping function, (x, y) denotes pixel coordinates, ε denotes the noise-perturbation intensity, α and γ are preset bound parameters, F denotes the feature extraction network, X̃_adv^{N+1} denotes the unclipped adversarial sample of the (N+1)-th iteration, and ∇_X denotes the gradient with respect to the clean sample X.
  • the continuous learning algorithm is the OWM continuous learning algorithm.
  • the continuous learning capability system is an image classification model.
  • the system includes: a clean sample acquisition module, a clean sample feature extraction module, a target anchor point feature extraction module, an adversarial sample generation module, a continuous learning module, a defense optimization module, and an image classification module;
  • the clean sample acquisition module is configured to obtain training samples corresponding to class B tasks to be classified and learned in the image training sample set as clean samples; the image training sample set contains M types of tasks to be classified and learned;
  • the clean sample feature extraction module is configured to extract the feature of the clean sample through a pre-built feature extraction network as a clean sample feature;
  • the target anchor point feature extraction module is configured to obtain training samples corresponding to the C-type tasks to be classified and learned in the image training sample set as target samples, and extract the features of the target samples through the feature extraction network as target anchor point features ;
  • the adversarial sample generation module is configured to generate adversarial samples of the class-B task to be classified and learned, based on the clean sample features combined with the target anchor point features, through a preset attack sample generation algorithm;
  • the continuous learning module is configured to delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task;
  • the defense optimization module is configured, if the classification accuracy is lower than the set threshold, to add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; to add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; and, after the addition, to train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, control jumps to the image classification module. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer;
  • the image classification module is configured to classify the image to be classified based on the trained image classification model.
  • an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the above feature-manipulation-based attack and defense method for systems with continuous learning capability.
  • a computer-readable storage medium, characterized in that it stores computer instructions to be executed by a computer to implement the above feature-manipulation-based attack and defense method for systems with continuous learning capability.
  • the invention improves the safety and robustness of the existing intelligent system based on continuous learning.
  • the present invention proposes a new neural network attack algorithm that manipulates the capability and learning results of a continuous learning system using only the system's own learning ability, without directly attacking or modifying system parameters. It targets the training process, whereas traditional neural network attack algorithms usually target a static neural network model. In terms of algorithm design, it is highly extensible, easy to operate, and strongly covert;
  • the present invention will systematically quantify and analyze the robustness of mainstream continuous learning algorithms for the first time.
  • at present, the field's attention is still focused on proposing new algorithms that push the continuous learning performance of neural networks, and systematic research on the robustness of continuous learning algorithms is lacking. The present invention is therefore not only a beneficial supplement, but may also provide a new perspective for continuous learning algorithm design.
  • the present invention reveals the potential risks of existing continuous learning algorithms when they are applied in actual scenarios, and also provides effective defense strategies. Whether it is to study a new continuous learning algorithm in the future, or to put an existing continuous learning algorithm into practice, the present invention has positive significance.
  • Fig. 1 is a schematic flow diagram of an attack and defense method based on feature manipulation for a continuous learning capability system according to an embodiment of the present invention
  • FIG. 2 is an example diagram of a generated adversarial example according to an embodiment of the present invention.
  • Fig. 3 is a schematic diagram of an adversarial attack process for continuous learning in an embodiment of the present invention.
  • Fig. 4 is a schematic diagram of the effect on the image classification model after it is attacked; (a) compares, after all tasks are learned, the per-task accuracy of the attacked image classification model with that of the control group; (b) is a line graph of the test accuracy on the clean digit 3 for the attacked model and the control group over the course of learning;
  • Fig. 5 is a schematic diagram of the effect after defense optimization; (a) compares, after all tasks are learned, the per-task accuracy of the defended image classification model with that of the unattacked model; (b) compares, after all tasks are learned, the per-task accuracy of the attacked model with that of the defended model; (c) is a line graph of the test accuracy on the clean digit 3 for the control group, the attacked model, and the defended model over the course of learning;
  • FIG. 6 is a schematic structural diagram of a computer system suitable for realizing the electronic device of the embodiment of the present application according to an embodiment of the present invention.
  • the continuous learning capability system is an image classification model, as shown in FIG. 1 , the method includes the following steps:
  • Step S10 obtaining training samples corresponding to class B tasks to be classified and learned in the image training sample set as clean samples; the image training sample set contains M types of tasks to be classified and learned;
  • Step S20 using a pre-built feature extraction network to extract the features of the clean samples as features of the clean samples;
  • Step S30 obtaining the training samples corresponding to the C-type tasks to be classified and learned in the image training sample set as the target samples, and extracting the features of the target samples through the feature extraction network as the target anchor feature;
  • Step S40 based on the clean sample feature, combined with the target anchor point feature, generate an adversarial sample of the B-class task to be classified and learned through a preset attack sample generation algorithm;
  • Step S50: delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task.
  • Step S60: if the classification accuracy is lower than the set threshold, add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; after the addition, train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, jump to step S70. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer.
  • Step S70 classify the image to be classified based on the trained image classification model.
  • a concealed and delayed attack is proposed for the artificial neural network continuous learning system.
  • the damage to the system does not manifest immediately; instead, the performance of the target task drops sharply at a specific stage of continuous learning. This poses a major challenge to current continuous learning systems, and also offers ideas for the robust design of continuous learning algorithms.
  • to achieve this, the present invention: 1) presets an attack target task, and constructs a preset feature extraction network, independent of the continuous learning system, for extracting the feature vectors corresponding to the type of information handled by the target task; 2) uses the feature extraction network to define a measure of the key features in a sample, and determines the key features of the preset attack target task; 3) based on the key features of the preset target task, fine-tunes the features of the training samples of the preset task, which completes the covert attack on the preset target task in the continuous learning system.
  • the specific process is as follows:
  • Step S10 obtaining training samples corresponding to class B tasks to be classified and learned in the image training sample set as clean samples; the image training sample set contains M types of tasks to be classified and learned;
  • in this embodiment, the image training samples used by the continuous learning system (i.e., the image classification model) during classification learning are collected to build the image training sample set.
  • the image training sample set contains M kinds of tasks to be classified and learned.
  • for example, the MNIST training set is used as the image training set; it contains 60,000 pictures of 10 handwritten digits, i.e., the image training sample set of the present invention includes the 10 tasks 0-9 to be classified and learned.
  • the training samples corresponding to the class-B task to be learned are obtained from the image training sample set as clean samples; for example, in this embodiment the digit 3 is selected as the clean sample (i.e., the sample whose learning is to be attacked), referred to as "clean 3".
  • Step S20 using a pre-built feature extraction network to extract the features of the clean samples as features of the clean samples;
  • the feature extraction network is constructed based on a deep neural network.
  • the feature extraction network of the present invention takes a deep fully connected neural network as an example, preferably a three-layer fully connected network.
  • its structure is [784-800-10], and the network is trained on the entire MNIST training set: the first layer is an input layer of 784 neurons, matching the data dimension of the training samples; the second layer is a hidden layer of 800 neurons; and the last layer is a classification layer of 10 categories. The network is first trained as an ordinary multi-output classifier on the MNIST training set to obtain a feature vector extractor; the final linear classification layer is then removed, and the output of the penultimate layer is used as the feature of the data.
  • the feature extraction network in the present invention is constructed based on a deep neural network that removes the linear classification layer.
  • the Adam algorithm is used, the learning rate is 0.1, the weight decay rate is 0.0001, and the size of each batch is 256.
  • Step S30 obtaining the training samples corresponding to the C-type tasks to be classified and learned in the image training sample set as the target samples, and extracting the features of the target samples through the feature extraction network as the target anchor feature;
  • a certain spatial point or a certain subspace in the feature space corresponding to the image sample training set may be designated as the target feature.
  • the selection of target features depends on specific needs, and the sample features in a task other than the attack target task can be selected as its target features.
  • the number 5 is preferably used as the target sample, and the features of the target sample are extracted as the feature of the target anchor point.
  • Step S40 based on the clean sample feature, combined with the target anchor point feature, generate an adversarial sample of the B-class task to be classified and learned through a preset attack sample generation algorithm;
  • in this embodiment, all images of the digit 3 are taken out of the image training sample set, and the Euclidean distance is preferably used as the loss function.
  • in other embodiments, other distances may be used as the loss function according to actual needs; that is, the feature-level loss function of the image classification model during continuous learning is constructed from a distance function.
  • the feature-level loss function is defined as J = ‖h_clean − h_adv‖₂, where h_clean is the original feature of the digit 3 (the clean sample feature) and h_adv is the target feature obtained with reference to the digit-5 features (the adversarial sample feature).
  • with the feature extraction network fixed, the adversarial samples of the digit are iteratively updated by:
$$X_{adv}^{0} = X_{clean}$$
$$\tilde{X}_{adv}^{N+1} = X_{adv}^{N} - \nabla_{X} J\big(F(X_{adv}^{N}),\, h_{s}\big)$$
$$X_{adv}^{N+1} = \mathrm{Clip}_{X,\epsilon}\{\tilde{X}_{adv}^{N+1}\}$$
$$\mathrm{Clip}_{X,\epsilon}\{X'\}(x,y) = \min\{\gamma,\ X(x,y)+\epsilon,\ \max\{\alpha,\ X(x,y)-\epsilon,\ X'(x,y)\}\}$$

  • where X_clean and X denote the clean sample, X_adv^N denotes the adversarial sample obtained at the N-th iteration, J(·,·) denotes the loss function, h_s denotes the target anchor feature, Clip_{X,ε}{X′} denotes the clipping function, (x, y) denotes pixel coordinates, ε denotes the noise-perturbation intensity, α and γ are preset bound parameters, F denotes the feature extraction network, X̃_adv^{N+1} denotes the unclipped adversarial sample of the (N+1)-th iteration, and ∇_X denotes the gradient with respect to the clean sample X.
  • Step S50 delete the clean sample from the image training sample set, add the adversarial sample into the image training sample set, and train the image classification model through a continuous learning algorithm, and count the performance of the image classification model in class C task classification learning.
  • in this embodiment, a feed-forward neural network capable of continuous learning is constructed, preferably again a deep fully connected network with structure [784-800-10], as the image classification model; that is, the image classification model of the present invention is built on a deep neural network.
  • during continuous learning, the clean samples are deleted from the image training sample set and the adversarial samples are added to it; the network is trained on the tasks in the order 0-9, preferably with the OWM continuous learning algorithm (in other embodiments, other continuous learning algorithms may be selected as needed).
  • when the continuous learning system is learning the digit 3, we replace 90% of the clean samples (or all clean samples) with attack samples.
  • the desired attack effect is then triggered when the neural network actually learns task 5.
  • the attack process is shown in Figure 3, where a is the normal continuous learning process, b is the attacked process, the B task is the attacked task, and the C task is the trigger point of the attack.
  • Fig. 4 shows the attack effect of the method of the present invention.
  • after the attack, the performance on task 3 drops sharply relative to normal continuous learning (Fig. 4(a)).
  • during the process, performance on task 3 is normal immediately after it is learned, but drops sharply after the digit 5 is learned (Fig. 4(b)).
  • the "control" in Figures 4 and 5 refers to the per-digit classification accuracy (i.e., test accuracy) of an image classification model trained without adversarial samples added to its image training sample set.
  • Step S60: if the classification accuracy is lower than the set threshold, add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; after the addition, train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, jump to step S70. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer.
  • the first step is to expand the structure of the network by adding one neuron to the final classification layer, to teach the system a "none of the above" category, i.e., a rejection class.
  • the second step is, during training of each task, to add some auxiliary samples alongside the samples of the original task and learn the task on the merged data. This completes the defense; the specific procedure is as follows:
  • if the classification accuracy is below the set threshold, a head (neuron) is added to the linear classification layer to recognize random noise, so the structure of the network becomes [784-800-11].
  • for each task in the image training sample set containing adversarial samples, random noise pictures are generated at a ratio of 1:n (for example, "digit 0" originally has 100 training images; with n = 6, 600 random noise pictures are generated, so "digit 0" now has 700 training images in total); they are labeled as the 11th category and merged into the image training sample set containing the adversarial samples.
  • the 11-head network is then trained on this data-augmented image training sample set.
  • the effect of defense optimization is shown in Figure 5, where the bar graph represents the test accuracy of each task after all tasks are learned, and the line graph represents the test accuracy of the clean number 3 as the learning progresses.
  • after the attack, the accuracy on the digit 3 drops from 86.93% to 17.13%, a performance loss of 69.8 percentage points; with the defense, the accuracy drops from 86.93% to 38.61%, a loss of 48.32 percentage points, roughly 0.7 times the original loss of 69.8 points. In other words, the defense reduces the performance loss by about 30%.
  • Step S70 classify the image to be classified based on the trained image classification model.
  • the image to be classified is obtained and classified with the image classification model trained by the continuous learning algorithm (that is, if the classification accuracy was lower than the set threshold, the model trained in step S60 is used; otherwise, the model trained in step S50 is used), and the classification result is output.
  • in other embodiments, the feature-manipulation-based attack and defense method for systems with continuous learning capability of the present invention can also be applied, according to the actual application scenarios and needs, to the attack and defense of other intelligent systems, such as image detection and recognition systems; these are not elaborated one by one here.
  • the second embodiment of the present invention is a feature manipulation-based attack and defense system for a continuous learning capability system
  • the continuous learning capability system is an image classification model
  • the system includes: a clean sample acquisition module, a clean sample feature extraction module, a target anchor point feature extraction module, an adversarial sample generation module, a continuous learning module, a defense optimization module, and an image classification module;
  • the clean sample acquisition module is configured to obtain training samples corresponding to class B tasks to be classified and learned in the image training sample set as clean samples; the image training sample set contains M types of tasks to be classified and learned;
  • the clean sample feature extraction module is configured to extract the feature of the clean sample through a pre-built feature extraction network as a clean sample feature;
  • the target anchor point feature extraction module is configured to obtain training samples corresponding to the C-type tasks to be classified and learned in the image training sample set as target samples, and extract the features of the target samples through the feature extraction network as target anchor point features ;
  • the adversarial sample generation module is configured to generate adversarial samples of the class-B task to be classified and learned, based on the clean sample features combined with the target anchor point features, through a preset attack sample generation algorithm;
  • the continuous learning module is configured to delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task;
  • the defense optimization module is configured, if the classification accuracy is lower than the set threshold, to add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; to add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; and, after the addition, to train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, control jumps to the image classification module. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer;
  • the image classification module is configured to classify the image to be classified based on the trained image classification model.
  • the attack and defense system based on feature manipulation for systems with continuous learning capability provided by the above embodiment is illustrated only by the division into the functional modules described above.
  • in practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention may be further decomposed or combined.
  • for example, the modules of the above embodiment may be merged into one module, or further split into multiple sub-modules, to accomplish all or part of the functions described above.
  • the names of the modules and steps involved in the embodiments of the present invention are only used to distinguish each module or step, and are not regarded as improperly limiting the present invention.
  • An electronic device includes at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the feature-manipulation-based attack and defense method for systems with continuous learning capability described in the claims.
  • a computer-readable storage medium stores computer instructions to be executed by a computer to implement the feature-manipulation-based attack and defense method for systems with continuous learning capability described in the claims.
  • FIG. 6 shows a schematic structural diagram of a server computer system suitable for implementing the system, method, and device embodiments of the present application.
  • the server shown in FIG. 6 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
  • the computer system includes a central processing unit (CPU, Central Processing Unit) 601, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM, Read Only Memory) 602 or a program loaded from a storage section 608 into a random access memory (RAM, Random Access Memory) 603.
  • Various programs and data necessary for system operation are also stored in RAM 603 .
  • the CPU 601 , ROM 602 , and RAM 603 are connected to each other via a bus 604 .
  • An input/output (I/O, Input/Output) interface 605 is also connected to the bus 604 .
  • the following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube, a liquid crystal display, etc., and a speaker; a storage section 608 including a hard disk; and a communication section 609 including a network interface card such as a LAN card or a modem.
  • the communication section 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, optical disk, magneto-optical disk, semiconductor memory, etc. is mounted on the drive 610 as necessary so that a computer program read therefrom is installed into the storage section 608 as necessary.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via the communication portion 609 and/or installed from a removable medium 611 .
  • when the computer program is executed by the CPU 601, the above-mentioned functions defined in the method of the present application are performed.
  • the computer-readable medium mentioned above in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), Optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program codes are carried. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wires, optical cables, etc., or any suitable combination of the above.
  • Computer program code for performing the operations of the present application can be written in one or more programming languages or combinations thereof; these include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user computer through any kind of network, including a local or wide area network, or can be connected to an external computer (via the Internet, for example, using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Neurology (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical fields of pattern recognition, machine learning, multi-task learning, and adversarial attack, and in particular relates to a feature-manipulation-based attack and defense method for systems with continuous learning capability, aiming to solve the poor security and robustness of existing intelligent systems based on continuous learning. The method of the invention includes: obtaining clean image samples; extracting the features of the clean samples; obtaining target samples and extracting their features as target anchor features; generating adversarial samples from the clean sample features combined with the target anchor features through an attack sample generation algorithm; training the image classification model through a continuous learning algorithm and recording the classification accuracy of the clean samples while the class-C task is learned; adding first matrices as training samples at a ratio of 1:n and retraining; and classifying images with the trained image classification model. The invention improves the security and robustness of existing intelligent systems based on continuous learning.

Description

Feature-Manipulation-Based Attack and Defense Method for Systems with Continuous Learning Capability

Technical Field

The invention belongs to the technical fields of pattern recognition, machine learning, multi-task learning, and adversarial attack, and in particular relates to a feature-manipulation-based attack and defense method, system, and device for systems with continuous learning capability.
Background

Deep artificial neural networks can extract high-level features from raw data and, on that basis, perform tasks such as pattern detection, recognition, and classification; they have shown very strong potential in learning complex mapping rules. However, this capability is "static": once training is complete, the mapping is usually fixed. When learning a new task, deep artificial neural networks tend to destroy the mappings established for previous tasks and thus lack the ability to learn continuously. In the field of machine learning this is often called "catastrophic forgetting". Many application scenarios require deep artificial neural networks to learn new information and adjust themselves, and "catastrophic forgetting" is undoubtedly a weakness there. "Continuous learning algorithms" emerged in response, aiming to balance the knowledge of old and new tasks so that an artificial intelligence system can learn continuously. Such AI systems are called "continuous learning systems".

Many continuous learning algorithms and AI systems that overcome "catastrophic forgetting" are now available. Such systems can actively adapt to their environment in real-world scenarios and greatly improve the efficiency of human-machine collaboration. However, because the learning ability of a neural-network-based intelligent system is never switched off, that ability is fully exposed in real scenarios and vulnerable to third-party intrusion. At present, research on attacking the continuous learning process is still scarce, and corresponding defense algorithms receive even less attention, even though both are essential for deploying continuous learning algorithms in practice. On this basis, the present invention proposes a feature-manipulation-based attack and defense method for systems with continuous learning capability (also called intelligent systems), such as image classification models based on continuous learning, which can covertly influence the learning process of a continuous learning system and manipulate its learning results.
Summary of the Invention

To solve the above problem in the prior art, namely that existing intelligent systems based on continuous learning are exposed in real scenarios, are easy to exploit, attack, and mislead, and therefore have poor security and robustness, the present invention proposes a feature-manipulation-based attack and defense method for systems with continuous learning capability, the method comprising:

Step S10: obtain the training samples corresponding to the class-B task to be classified and learned in the image training sample set, as clean samples; the image training sample set contains M kinds of tasks to be classified and learned;

Step S20: extract the features of the clean samples through a pre-built feature extraction network, as clean sample features;

Step S30: obtain the training samples corresponding to the class-C task to be classified and learned in the image training sample set, as target samples, and extract the features of the target samples through the feature extraction network, as target anchor features;

Step S40: based on the clean sample features, combined with the target anchor features, generate adversarial samples of the class-B task through a preset attack sample generation algorithm;

Step S50: delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task;

Step S60: if the classification accuracy is lower than the set threshold, add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; after the addition, train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, jump to step S70. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer;

Step S70: classify the image to be classified with the trained image classification model.
In some preferred embodiments, both the feature extraction network and the image classification model are built on deep neural networks; the feature extraction network is built on a deep neural network with its linear classification layer removed.

In some preferred embodiments, the feature-level loss function of the image classification model during continuous learning is constructed from a distance function; the distance function includes the Euclidean distance.
In some preferred embodiments, the attack sample generation algorithm is:

$$X_{adv}^{0} = X_{clean}$$
$$\tilde{X}_{adv}^{N+1} = X_{adv}^{N} - \nabla_{X} J\big(F(X_{adv}^{N}),\, h_{s}\big)$$
$$X_{adv}^{N+1} = \mathrm{Clip}_{X,\epsilon}\{\tilde{X}_{adv}^{N+1}\}$$
$$\mathrm{Clip}_{X,\epsilon}\{X'\}(x,y) = \min\{\gamma,\ X(x,y)+\epsilon,\ \max\{\alpha,\ X(x,y)-\epsilon,\ X'(x,y)\}\}$$

where X_clean and X denote the clean sample, X_adv^N denotes the adversarial sample obtained at the N-th iteration, J(·,·) denotes the loss function, h_s denotes the target anchor feature, Clip_{X,ε}{X′} denotes the clipping function, (x, y) denotes pixel coordinates, ε denotes the noise-perturbation intensity, α and γ are preset bound parameters, F denotes the feature extraction network, X̃_adv^{N+1} denotes the unclipped adversarial sample of the (N+1)-th iteration, and ∇_X denotes the gradient with respect to the clean sample X.
In some preferred embodiments, the continuous learning algorithm is the OWM continuous learning algorithm.
In a second aspect, the present invention proposes a feature-manipulation-based attack and defense system for a system with continuous learning capability, the continuous learning capability system being an image classification model. The system includes: a clean sample acquisition module, a clean sample feature extraction module, a target anchor feature extraction module, an adversarial sample generation module, a continuous learning module, a defense optimization module, and an image classification module;

the clean sample acquisition module is configured to obtain the training samples corresponding to the class-B task to be classified and learned in the image training sample set, as clean samples; the image training sample set contains M kinds of tasks to be classified and learned;

the clean sample feature extraction module is configured to extract the features of the clean samples through a pre-built feature extraction network, as clean sample features;

the target anchor feature extraction module is configured to obtain the training samples corresponding to the class-C task to be classified and learned in the image training sample set, as target samples, and to extract the features of the target samples through the feature extraction network, as target anchor features;

the adversarial sample generation module is configured to generate adversarial samples of the class-B task, based on the clean sample features combined with the target anchor features, through a preset attack sample generation algorithm;

the continuous learning module is configured to delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task;

the defense optimization module is configured, if the classification accuracy is lower than the set threshold, to add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; to add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; and, after the addition, to train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, control jumps to the image classification module. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer;

the image classification module is configured to classify the image to be classified with the trained image classification model.
In a third aspect, the present invention proposes an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the above feature-manipulation-based attack and defense method for systems with continuous learning capability.

In a fourth aspect, the present invention proposes a computer-readable storage medium storing computer instructions to be executed by a computer to implement the above feature-manipulation-based attack and defense method for systems with continuous learning capability.
Beneficial effects of the invention:

The invention improves the security and robustness of existing intelligent systems based on continuous learning.

1) The invention proposes a new neural network attack algorithm that, without directly attacking or modifying system parameters, uses only the system's own learning ability to manipulate the capability and learning results of a continuous learning system. It mainly targets the training process of the neural network, whereas traditional neural network attack algorithms usually target a static neural network model. In terms of algorithm design, it is highly extensible, easy to operate, and strongly covert.

2) The invention systematically quantifies and analyzes the robustness of mainstream continuous learning algorithms for the first time. At present, the focus of the continuous learning field is still on proposing new algorithms that push the continuous learning performance of neural networks, and systematic research on the robustness of continuous learning algorithms is lacking. The invention is therefore not only a beneficial supplement, but may also provide a new perspective for continuous learning algorithm design.

3) The invention reveals the potential risks of existing continuous learning algorithms when applied in real scenarios and also provides effective defense strategies. Whether for future research on new continuous learning algorithms or for putting existing ones into practice, the invention has positive significance.
Brief Description of the Drawings

Other features, objects, and advantages of the present application will become more apparent from the detailed description of non-limiting embodiments made with reference to the following drawings.

Fig. 1 is a schematic flow diagram of the feature-manipulation-based attack and defense method for systems with continuous learning capability according to an embodiment of the invention;

Fig. 2 is an example diagram of generated adversarial samples according to an embodiment of the invention;

Fig. 3 is a schematic diagram of the adversarial attack process against continuous learning according to an embodiment of the invention;

Fig. 4 is a schematic diagram of the effect on the image classification model after it is attacked, according to an embodiment of the invention; (a) compares, after all tasks are learned, the per-task accuracy of the attacked image classification model with that of the control group; (b) is a line graph of the test accuracy on the clean digit 3 for the attacked model and the control group over the course of learning;

Fig. 5 is a schematic diagram of the effect after defense optimization according to an embodiment of the invention; (a) compares, after all tasks are learned, the per-task accuracy of the defended image classification model with that of the unattacked model; (b) compares, after all tasks are learned, the per-task accuracy of the attacked model with that of the defended model; (c) is a line graph of the test accuracy on the clean digit 3 for the control group, the attacked model, and the defended model over the course of learning;

Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application.
Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention, without creative effort, fall within the protection scope of the invention.

The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, not to limit it. It should also be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another.
A first embodiment of the invention is a feature-manipulation-based attack and defense method for a system with continuous learning capability, the continuous learning capability system being an image classification model. As shown in Fig. 1, the method includes the following steps:

Step S10: obtain the training samples corresponding to the class-B task to be classified and learned in the image training sample set, as clean samples; the image training sample set contains M kinds of tasks to be classified and learned;

Step S20: extract the features of the clean samples through a pre-built feature extraction network, as clean sample features;

Step S30: obtain the training samples corresponding to the class-C task to be classified and learned in the image training sample set, as target samples, and extract the features of the target samples through the feature extraction network, as target anchor features;

Step S40: based on the clean sample features, combined with the target anchor features, generate adversarial samples of the class-B task through a preset attack sample generation algorithm;

Step S50: delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task;

Step S60: if the classification accuracy is lower than the set threshold, add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; after the addition, train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, jump to step S70. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer;

Step S70: classify the image to be classified with the trained image classification model.
To describe the feature-manipulation-based attack and defense method of the invention more clearly, each step of one embodiment of the method is described in detail below with reference to the drawings.

The invention proposes a covert, delayed attack against artificial-neural-network continuous learning systems. The damage to the system does not manifest immediately; instead, the performance of the target task drops sharply at a specific stage of continuous learning. This poses a major challenge to current continuous learning systems and also offers ideas for the robust design of continuous learning algorithms.

To achieve this, the invention: 1) presets an attack target task, and constructs a preset feature extraction network, independent of the continuous learning system, for extracting the feature vectors corresponding to the type of information handled by the target task; 2) uses the feature extraction network to define a measure of the key features in a sample, and determines the key features of the preset attack target task; 3) based on the key features of the preset target task, fine-tunes the features of the training samples of the preset task, which completes the covert attack on the preset target task in the continuous learning system. The specific process is as follows:
Step S10: obtain the training samples corresponding to the class-B task to be classified and learned in the image training sample set, as clean samples; the image training sample set contains M kinds of tasks to be classified and learned.

In this embodiment, the image training samples used by the continuous learning system (i.e., the image classification model) during classification learning are collected to build the image training sample set, which contains M kinds of tasks to be classified and learned. For example, this embodiment uses the MNIST training set as the image training set; it contains 60,000 pictures of 10 handwritten digits, i.e., the image training sample set of the invention includes the 10 tasks 0-9 to be classified and learned.

The training samples corresponding to the class-B task are obtained from the image training sample set as clean samples. For example, in this embodiment the digit 3 is selected as the clean sample (i.e., the sample whose learning is to be attacked), referred to as "clean 3".
Step S20: extract the features of the clean samples through a pre-built feature extraction network, as clean sample features.

In this embodiment, the feature extraction network is built on a deep neural network. The feature extraction network of the invention takes a deep fully connected neural network as an example, preferably a three-layer fully connected network with structure [784-800-10], trained on the entire MNIST training set. [784-800-10] means the first layer of the network is an input layer of 784 neurons, matching the data dimension of the training samples; the second layer is a hidden layer of 800 neurons; and the last layer is a classification layer of 10 categories. We first train a feature vector extractor (i.e., the feature extraction network) on the MNIST training set using the conventional multi-output approach, then remove the final linear classification layer of the network and use the output of the penultimate layer as the feature of the data. That is, the feature extraction network of the invention is built on a deep neural network with its linear classification layer removed.

The feature extraction network is trained with the Adam algorithm, a learning rate of 0.1, a weight decay rate of 0.0001, and a batch size of 256; a minimal sketch of this setup follows.
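The following PyTorch sketch shows one way the [784-800-10] extractor and its training step could look. It is a minimal illustration under the assumptions stated above; the names (FeatureExtractor, train_step) and the data-handling details are ours, not the patent's:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """[784-800-10] fully connected network; after training, features are
    read from the 800-unit penultimate layer (hypothetical sketch)."""
    def __init__(self):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(784, 800), nn.ReLU())
        self.classifier = nn.Linear(800, 10)  # dropped at feature-extraction time

    def forward(self, x):                  # x: (batch, 784) flattened MNIST
        return self.classifier(self.hidden(x))

    def features(self, x):
        # output of the penultimate layer, used as the sample feature
        return self.hidden(x)

net = FeatureExtractor()
# hyperparameters as stated in the text: Adam, lr 0.1, weight decay 1e-4
optimizer = torch.optim.Adam(net.parameters(), lr=0.1, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):            # one batch of 256 samples
    optimizer.zero_grad()
    loss = criterion(net(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```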
Step S30: obtain the training samples corresponding to the class-C task to be classified and learned in the image training sample set, as target samples, and extract the features of the target samples through the feature extraction network, as target anchor features.

In this embodiment, a certain point or subspace of the feature space corresponding to the image training sample set may be designated as the target feature. The choice of target feature depends on specific needs; the sample features of a task other than the attack target task may be selected as the target feature. For example, this embodiment preferably uses the digit 5 as the target sample and extracts its features as the target anchor feature.
Step S40: based on the clean sample features, combined with the target anchor features, generate adversarial samples of the class-B task through a preset attack sample generation algorithm.

In this embodiment, all images of the digit 3 are taken out of the image training sample set, and the Euclidean distance is preferably used as the loss function. In other embodiments, other distances may be used as the loss function according to actual needs; that is, the feature-level loss function of the image classification model during continuous learning is constructed from a distance function.
In the invention, the feature-level loss function of the image classification model during continuous learning is defined as J = ‖h_clean − h_adv‖₂, where h_clean is the original feature of the digit 3, i.e., the clean sample feature, and h_adv is the target feature obtained with reference to the digit-5 features, i.e., the adversarial sample feature. With the feature extraction network fixed, the adversarial samples of the digit are iteratively updated as follows:

$$X_{adv}^{0} = X_{clean} \tag{1}$$
$$\tilde{X}_{adv}^{N+1} = X_{adv}^{N} - \nabla_{X} J\big(F(X_{adv}^{N}),\, h_{s}\big) \tag{2}$$
$$X_{adv}^{N+1} = \mathrm{Clip}_{X,\epsilon}\{\tilde{X}_{adv}^{N+1}\} \tag{3}$$
$$\mathrm{Clip}_{X,\epsilon}\{X'\}(x,y) = \min\{\gamma,\ X(x,y)+\epsilon,\ \max\{\alpha,\ X(x,y)-\epsilon,\ X'(x,y)\}\} \tag{4}$$

where X_clean and X denote the clean sample, X_adv^N denotes the adversarial sample obtained at the N-th iteration, J(·,·) denotes the loss function, h_s denotes the target anchor feature, Clip_{X,ε}{X′} denotes the clipping function, (x, y) denotes pixel coordinates, ε denotes the noise-perturbation intensity, α and γ are preset bound parameters, F denotes the feature extraction network, X̃_adv^{N+1} denotes the unclipped adversarial sample of the (N+1)-th iteration, and ∇_X denotes the gradient with respect to the clean sample X.

For example, with α = 0 and γ = 255, Clip_{X,ε}{X′}(x,y) = min{255, X(x,y)+ε, max{0, X(x,y)−ε, X′(x,y)}}.
After N rounds of iteration, the adversarial sample X_adv is obtained. Fig. 2 shows a generated adversarial sample next to the original sample; the two are very close, making the attack highly covert. A sketch of this generation loop is given below.
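As a concrete reading of equations (1)-(4), the sketch below iterates a plain gradient step on the feature-space loss and applies the clipping function. The unit step size and the function name generate_adversarial are our assumptions; the patent fixes only the loss, the clip, and the iteration scheme:

```python
import torch

def generate_adversarial(x_clean, h_target, feat_net, n_iters=100,
                         eps=32.0, alpha=0.0, gamma=255.0):
    """Push the features of x_clean toward the target anchor h_target,
    keeping every pixel within eps of the clean image and inside [alpha, gamma]."""
    x_adv = x_clean.clone()
    for _ in range(n_iters):
        x_adv.requires_grad_(True)
        # feature-level loss: Euclidean distance to the target anchor feature
        loss = torch.norm(feat_net.features(x_adv) - h_target, p=2)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_tilde = x_adv - grad                        # unclipped update
            x_adv = torch.min(                            # Clip_{X,eps}{X'}
                torch.clamp(x_clean + eps, max=gamma),
                torch.max(torch.clamp(x_clean - eps, min=alpha), x_tilde),
            )
    return x_adv.detach()
```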
Step S50: delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task.

In this embodiment, a feed-forward neural network capable of continuous learning is constructed, preferably again a deep fully connected network with structure [784-800-10], as the image classification model; that is, the image classification model of the invention is built on a deep neural network. During continuous learning, the clean samples are deleted from the image training sample set and the adversarial samples are added to it, and the network is trained on the tasks in the order 0-9, preferably with the OWM continuous learning algorithm (a reference sketch of OWM is given after this paragraph); in other embodiments, other continuous learning algorithms may be selected as needed. Here, when the continuous learning system is learning the digit 3, we replace 90% of the clean samples (or all clean samples) with attack samples. The desired attack effect is triggered when the neural network actually learns task 5. The attack process is shown in Fig. 3, where a is the normal continuous learning process and b is the attacked process; task B is the attacked task and task C is the trigger point of the attack. Fig. 4 shows the attack effect of the method. After the attack, the performance on task 3 (digit 3) drops sharply relative to normal continuous learning (Fig. 4(a)). During the process, performance on task 3 is normal immediately after it is learned, but drops sharply after the digit 5 is learned (Fig. 4(b)). The "control" in Figures 4 and 5 refers to the per-digit classification accuracy (i.e., test accuracy) of an image classification model trained without adversarial samples added to its image training sample set.
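The OWM algorithm itself is only named, not specified, in this document. For orientation, the sketch below follows the commonly published form of Orthogonal Weights Modification, in which each layer keeps a projector P and backpropagated gradients are projected onto the subspace orthogonal to the inputs of previously learned tasks; treat it as an external reference sketch, not the patent's own training code:

```python
import torch

class OWMLinear:
    """One linear layer trained with OWM-style gradient projection
    (sketch of the published algorithm; not taken from this patent)."""
    def __init__(self, in_dim, out_dim, alpha=1e-3):
        self.W = torch.randn(out_dim, in_dim) * 0.01
        self.P = torch.eye(in_dim)   # projector, shrinks as tasks accumulate
        self.alpha = alpha

    def forward(self, x):            # x: (batch, in_dim)
        return x @ self.W.t()

    def update_projector(self, x):
        # recursive-least-squares style update with the mean input of the batch
        u = x.mean(dim=0, keepdim=True).t()              # (in_dim, 1)
        k = self.P @ u / (self.alpha + u.t() @ self.P @ u)
        self.P = self.P - k @ (u.t() @ self.P)

    def step(self, grad_W, lr=0.1):
        # project the backprop gradient before applying it, so the update
        # barely disturbs responses to inputs from earlier tasks
        self.W = self.W - lr * grad_W @ self.P
```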
Step S60: if the classification accuracy is lower than the set threshold, add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; after the addition, train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, jump to step S70. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer.

Experiments show that adversarial samples can be generated from arbitrary samples, even from random noise, and the generated adversarial samples are effective, i.e., they mislead the network that generated them 100% of the time. This is instructive: random noise fills the whole sample space, and its volume and density far exceed those of the MNIST dataset. If such a sample is forced into a trained network, the network will classify it anyway. If we can "teach" the network to recognize these noise samples (MNIST task-unnecessary samples), we effectively "squeeze out" the space in which adversarial samples can live, thereby strengthening the model's robustness.
In this embodiment, the first step is to expand the structure of the network by adding one neuron to the final classification layer, to teach the system a "none of the above" category, i.e., a rejection class. The second step is, during training of each task, to add some auxiliary samples alongside the samples of the original task and learn the task on the merged data. This completes the defense; the specific procedure is as follows.

If the classification accuracy is below the set threshold, a head (neuron) is added to the linear classification layer to recognize random noise, so the structure of the network becomes [784-800-11]. For each task in the image training sample set containing adversarial samples, random noise pictures are generated at a ratio of 1:n (for example, "digit 0" originally has 100 training images; with n = 6, 600 random noise pictures are generated, so "digit 0" now has 700 training images in total); they are labeled as the 11th category and merged into the image training sample set containing the adversarial samples. The 11-head network is then trained on this data-augmented image training sample set (a sketch of this augmentation follows). The effect of the defense optimization is shown in Fig. 5, where the bar charts show the test accuracy of each task after all tasks are learned, and the line graph shows the test accuracy on the clean digit 3 as learning progresses.
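A minimal sketch of the noise-augmentation step described above follows. The 1:n ratio, the 11th rejection label, and the 0-255 pixel range come from the text; the function name and the choice of uniform noise are our assumptions:

```python
import torch

REJECT_LABEL = 10          # index of the added 11th head

def augment_task_with_noise(images, labels, n):
    """For one task, generate n random-noise images per real training image,
    label them with the rejection class, and merge them into the task data."""
    num_noise = n * images.shape[0]
    # uniform random pixels over the 0-255 range, same shape as the data
    noise = torch.randint(0, 256, (num_noise, images.shape[1])).float()
    noise_labels = torch.full((num_noise,), REJECT_LABEL, dtype=torch.long)
    return (torch.cat([images, noise], dim=0),
            torch.cat([labels, noise_labels], dim=0))

# e.g. "digit 0" with 100 images and n = 6 -> 700 training samples,
# 600 of which carry the rejection label
```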
Under attack, the accuracy on the digit 3 drops from 86.93% to 17.13%, a performance loss of 69.8 percentage points; with the defense, the accuracy drops from 86.93% to 38.61%, a loss of 48.32 percentage points, roughly 0.7 times the original loss of 69.8 points. That is, the defense reduces the performance loss by about 30%.
Step S70: classify the image to be classified with the trained image classification model.

In this embodiment, the image to be classified is obtained and classified with the image classification model trained by the continuous learning algorithm (that is, if the classification accuracy was lower than the set threshold, the model trained in step S60 is used; otherwise, the model trained in step S50 is used), and the classification result is output.

In other embodiments, the feature-manipulation-based attack and defense method for systems with continuous learning capability of the invention can also be applied, according to the actual application scenarios and needs, to the attack and defense of other intelligent systems, such as image detection and recognition systems; these are not elaborated one by one here.
A second embodiment of the invention is a feature-manipulation-based attack and defense system for a system with continuous learning capability, the continuous learning capability system being an image classification model. The system includes: a clean sample acquisition module, a clean sample feature extraction module, a target anchor feature extraction module, an adversarial sample generation module, a continuous learning module, a defense optimization module, and an image classification module;

the clean sample acquisition module is configured to obtain the training samples corresponding to the class-B task to be classified and learned in the image training sample set, as clean samples; the image training sample set contains M kinds of tasks to be classified and learned;

the clean sample feature extraction module is configured to extract the features of the clean samples through a pre-built feature extraction network, as clean sample features;

the target anchor feature extraction module is configured to obtain the training samples corresponding to the class-C task to be classified and learned in the image training sample set, as target samples, and to extract the features of the target samples through the feature extraction network, as target anchor features;

the adversarial sample generation module is configured to generate adversarial samples of the class-B task, based on the clean sample features combined with the target anchor features, through a preset attack sample generation algorithm;

the continuous learning module is configured to delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task;

the defense optimization module is configured, if the classification accuracy is lower than the set threshold, to add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; to add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; and, after the addition, to train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, control jumps to the image classification module. Here the first matrix is a pixel matrix constructed from random noise, and n is a positive integer;

the image classification module is configured to classify the image to be classified with the trained image classification model.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the system described above and the related explanations may refer to the corresponding process in the foregoing method embodiment, and are not repeated here.

It should be noted that the feature-manipulation-based attack and defense system for systems with continuous learning capability provided by the above embodiment is illustrated only by the division into the functional modules described above. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiments of the invention may be further decomposed or combined. For example, the modules of the above embodiment may be merged into one module or further split into multiple sub-modules to accomplish all or part of the functions described above. The names of the modules and steps involved in the embodiments of the invention are only used to distinguish the individual modules or steps and are not to be regarded as improper limitations of the invention.
A third embodiment of the invention is an electronic device, including at least one processor and a memory communicatively connected to the at least one processor; the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the feature-manipulation-based attack and defense method for systems with continuous learning capability described above.

A fourth embodiment of the invention is a computer-readable storage medium storing computer instructions to be executed by a computer to implement the feature-manipulation-based attack and defense method for systems with continuous learning capability described above.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes and related explanations of the feature-manipulation-based attack and defense apparatus, electronic device, and computer-readable storage medium described above may refer to the corresponding processes in the foregoing method examples, and are not repeated here.
Referring now to Fig. 6, it shows a schematic structural diagram of a computer system of a server suitable for implementing the system, method, and device embodiments of the present application. The server shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

As shown in Fig. 6, the computer system includes a central processing unit (CPU, Central Processing Unit) 601, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM, Read Only Memory) 602 or a program loaded from a storage section 608 into a random access memory (RAM, Random Access Memory) 603. Various programs and data necessary for system operation are also stored in the RAM 603. The CPU 601, ROM 602, and RAM 603 are connected to one another via a bus 604. An input/output (I/O, Input/Output) interface 605 is also connected to the bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, etc.; an output section 607 including a cathode ray tube, a liquid crystal display, etc., and a speaker; a storage section 608 including a hard disk; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 609 and/or installed from the removable medium 611. When the computer program is executed by the CPU 601, the above-mentioned functions defined in the method of the present application are performed. It should be noted that the computer-readable medium mentioned in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, RAM, ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, etc., or any suitable combination of the above.

Computer program code for performing the operations of the present application can be written in one or more programming languages or combinations thereof; these include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as C or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network or a wide area network, or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The terms "first", "second", and the like are used to distinguish similar objects, not to describe or indicate a particular order or sequence.

The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process, method, article, or device/apparatus that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device/apparatus.

The technical solutions of the invention have thus been described with reference to the preferred embodiments shown in the drawings. However, those skilled in the art will readily understand that the protection scope of the invention is obviously not limited to these specific embodiments. Without departing from the principles of the invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions will fall within the protection scope of the invention.

Claims (8)

  1. A feature-manipulation-based attack and defense method for a system with continuous learning capability, the continuous learning capability system being an image classification model, characterized in that the method comprises:
    Step S10: obtaining the training samples corresponding to the class-B task to be classified and learned in the image training sample set, as clean samples; the image training sample set contains M kinds of tasks to be classified and learned;
    Step S20: extracting the features of the clean samples through a pre-built feature extraction network, as clean sample features;
    Step S30: obtaining the training samples corresponding to the class-C task to be classified and learned in the image training sample set, as target samples, and extracting the features of the target samples through the feature extraction network, as target anchor features;
    Step S40: based on the clean sample features, combined with the target anchor features, generating adversarial samples of the class-B task through a preset attack sample generation algorithm;
    Step S50: deleting the clean samples from the image training sample set, adding the adversarial samples to it, training the image classification model through a continuous learning algorithm, and recording the classification accuracy of the clean samples while the image classification model learns the class-C task;
    Step S60: if the classification accuracy is lower than a set threshold, adding a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; adding first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; after the addition, training the image classification model with the added neuron until a trained image classification model is obtained; otherwise, jumping to step S70; wherein the first matrix is a pixel matrix constructed from random noise and n is a positive integer;
    Step S70: classifying the image to be classified with the trained image classification model.
  2. The feature-manipulation-based attack and defense method for a system with continuous learning capability according to claim 1, characterized in that both the feature extraction network and the image classification model are built on deep neural networks; wherein the feature extraction network is built on a deep neural network with its linear classification layer removed.
  3. The feature-manipulation-based attack and defense method for a system with continuous learning capability according to claim 1, characterized in that the feature-level loss function of the image classification model during continuous learning is a loss function constructed from a distance function; the distance function includes the Euclidean distance.
  4. The feature-manipulation-based attack and defense method for a system with continuous learning capability according to claim 3, characterized in that the attack sample generation algorithm is:

    $$X_{adv}^{0} = X_{clean}$$
    $$\tilde{X}_{adv}^{N+1} = X_{adv}^{N} - \nabla_{X} J\big(F(X_{adv}^{N}),\, h_{s}\big)$$
    $$X_{adv}^{N+1} = \mathrm{Clip}_{X,\epsilon}\{\tilde{X}_{adv}^{N+1}\}$$
    $$\mathrm{Clip}_{X,\epsilon}\{X'\}(x,y) = \min\{\gamma,\ X(x,y)+\epsilon,\ \max\{\alpha,\ X(x,y)-\epsilon,\ X'(x,y)\}\}$$

    where X_clean and X denote the clean sample, X_adv^N denotes the adversarial sample obtained at the N-th iteration, J(·,·) denotes the loss function, h_s denotes the target anchor feature, Clip_{X,ε}{X′} denotes the clipping function, (x, y) denotes pixel coordinates, ε denotes the noise-perturbation intensity, α and γ are preset bound parameters, F denotes the feature extraction network, X̃_adv^{N+1} denotes the unclipped adversarial sample of the (N+1)-th iteration, and ∇_X denotes the gradient with respect to the clean sample X.
  5. The feature-manipulation-based attack and defense method for a system with continuous learning capability according to claim 1, characterized in that the continuous learning algorithm is the OWM continuous learning algorithm.
  6. A feature-manipulation-based attack and defense system for a system with continuous learning capability, the continuous learning capability system being an image classification model, characterized in that the system comprises: a clean sample acquisition module, a clean sample feature extraction module, a target anchor feature extraction module, an adversarial sample generation module, a continuous learning module, a defense optimization module, and an image classification module;
    the clean sample acquisition module is configured to obtain the training samples corresponding to the class-B task to be classified and learned in the image training sample set, as clean samples; the image training sample set contains M kinds of tasks to be classified and learned;
    the clean sample feature extraction module is configured to extract the features of the clean samples through a pre-built feature extraction network, as clean sample features;
    the target anchor feature extraction module is configured to obtain the training samples corresponding to the class-C task to be classified and learned in the image training sample set, as target samples, and to extract the features of the target samples through the feature extraction network, as target anchor features;
    the adversarial sample generation module is configured to generate adversarial samples of the class-B task, based on the clean sample features combined with the target anchor features, through a preset attack sample generation algorithm;
    the continuous learning module is configured to delete the clean samples from the image training sample set, add the adversarial samples to it, train the image classification model through a continuous learning algorithm, and record the classification accuracy of the clean samples while the image classification model learns the class-C task;
    the defense optimization module is configured, if the classification accuracy is lower than the set threshold, to add a neuron to the linear classification layer of the image classification model to recognize categories other than the M categories to be learned; to add first matrices to the image training sample set containing the adversarial samples at a ratio of 1:n to the training samples of each task; and, after the addition, to train the image classification model with the added neuron until a trained image classification model is obtained; otherwise, control jumps to the image classification module; wherein the first matrix is a pixel matrix constructed from random noise and n is a positive integer;
    the image classification module is configured to classify the image to be classified with the trained image classification model.
  7. An electronic device, characterized by comprising:
    at least one processor; and a memory communicatively connected to the at least one processor;
    wherein the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the feature-manipulation-based attack and defense method for a system with continuous learning capability according to any one of claims 1-5.
  8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions, and the computer instructions are executed by a computer to implement the feature-manipulation-based attack and defense method for a system with continuous learning capability according to any one of claims 1-5.
PCT/CN2021/128193 2021-10-25 2021-11-02 Feature-manipulation-based attack and defense method for systems with continuous learning capability WO2023070696A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111242998.2 2021-10-25
CN202111242998.2A CN113919497A (zh) 2021-10-25 Feature-manipulation-based attack and defense method for systems with continuous learning capability

Publications (1)

Publication Number Publication Date
WO2023070696A1 true WO2023070696A1 (zh) 2023-05-04

Family

ID=79242793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128193 WO2023070696A1 (zh) 2021-10-25 2021-11-02 Feature-manipulation-based attack and defense method for systems with continuous learning capability

Country Status (2)

Country Link
CN (1) CN113919497A (zh)
WO (1) WO2023070696A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036869A (zh) * 2023-10-08 2023-11-10 之江实验室 A model training method and apparatus based on diversity and random strategies

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708460A (zh) * 2022-04-12 2022-07-05 济南博观智能科技有限公司 An image classification method, system, electronic device, and storage medium
CN115409818B (zh) * 2022-09-05 2023-10-27 江苏济远医疗科技有限公司 An enhanced training method applied to an endoscopic image target detection model

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334808A (zh) * 2019-06-12 2019-10-15 武汉大学 An adversarial attack defense method based on adversarial example training
CN111753881A (zh) * 2020-05-28 2020-10-09 浙江工业大学 A defense method for recognizing adversarial attacks based on concept-sensitivity quantification
US20210012188A1 (en) * 2019-07-09 2021-01-14 Baidu Usa Llc Systems and methods for defense against adversarial attacks using feature scattering-based adversarial training

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334808A (zh) * 2019-06-12 2019-10-15 武汉大学 An adversarial attack defense method based on adversarial example training
US20210012188A1 (en) * 2019-07-09 2021-01-14 Baidu Usa Llc Systems and methods for defense against adversarial attacks using feature scattering-based adversarial training
CN111753881A (zh) * 2020-05-28 2020-10-09 浙江工业大学 A defense method for recognizing adversarial attacks based on concept-sensitivity quantification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI XIAOBIN; SHAN LIANLEI; LI MINGLONG; WANG WEIQIANG: "Energy Minimum Regularization in Continual Learning", 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), IEEE, 10 January 2021 (2021-01-10), pages 6404 - 6409, XP033909282, DOI: 10.1109/ICPR48806.2021.9412744 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117036869A (zh) * 2023-10-08 2023-11-10 之江实验室 A model training method and apparatus based on diversity and random strategies
CN117036869B (zh) * 2023-10-08 2024-01-09 之江实验室 A model training method and apparatus based on diversity and random strategies

Also Published As

Publication number Publication date
CN113919497A (zh) 2022-01-11

Similar Documents

Publication Publication Date Title
Chakraborty et al. A survey on adversarial attacks and defences
Yuan et al. Adversarial examples: Attacks and defenses for deep learning
WO2023070696A1 (zh) Feature-manipulation-based attack and defense method for systems with continuous learning capability
Gong et al. Change detection in synthetic aperture radar images based on deep neural networks
Chakraborty et al. Adversarial attacks and defences: A survey
Guo et al. Fake face detection via adaptive manipulation traces extraction network
Farahnakian et al. A deep auto-encoder based approach for intrusion detection system
CN108111489B (zh) Url攻击检测方法、装置以及电子设备
Muñoz-González et al. Poisoning attacks with generative adversarial nets
Ponmalar et al. An intrusion detection approach using ensemble support vector machine based chaos game optimization algorithm in big data platform
Zhao et al. A malware detection method of code texture visualization based on an improved faster RCNN combining transfer learning
Huang et al. Robustness of on-device models: Adversarial attack to deep learning models on android apps
JP2022141931A (ja) 生体検出モデルのトレーニング方法及び装置、生体検出の方法及び装置、電子機器、記憶媒体、並びにコンピュータプログラム
Li et al. Black-box attack against handwritten signature verification with region-restricted adversarial perturbations
Liu et al. Adversaries or allies? Privacy and deep learning in big data era
Sun et al. Can shape structure features improve model robustness under diverse adversarial settings?
Chen et al. Patch selection denoiser: An effective approach defending against one-pixel attacks
Umer et al. Targeted forgetting and false memory formation in continual learners through adversarial backdoor attacks
CN111062019A (zh) 用户攻击检测方法、装置、电子设备
Mohammadi et al. A new metaheuristic feature subset selection approach for image steganalysis
Zhang et al. Effective presentation attack detection driven by face related task
Sharif et al. A deep learning based technique for the classification of malware images
WO2023185074A1 (zh) A group behavior recognition method based on complementary spatio-temporal information modeling
Cao et al. FePN: A robust feature purification network to defend against adversarial examples
CN115659387A (zh) 一种基于神经通路的用户隐私保护方法、电子设备、介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21962046

Country of ref document: EP

Kind code of ref document: A1