CN113360917A - Deep reinforcement learning model security reinforcement method and device based on differential privacy - Google Patents

Deep reinforcement learning model security reinforcement method and device based on differential privacy

Info

Publication number
CN113360917A
CN113360917A
Authority
CN
China
Prior art keywords
model
stealing
value
differential privacy
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110766183.8A
Other languages
Chinese (zh)
Inventor
陈晋音
王雪柯
胡书隆
章燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110766183.8A
Publication of CN113360917A
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 - Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 - Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/575 - Secure boot
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Abstract

The invention discloses a method and device for security reinforcement of a deep reinforcement learning model based on differential privacy. The method comprises the following steps: sample data from the environment as a training sample set, construct a target model with a deep reinforcement learning algorithm, and train the target model on the sample set; test the trained target model and sample state-action pairs as a stealing data set; construct a stealing model with a deep reinforcement learning algorithm; feed the stealing data set to the stealing model as training samples and train it with an imitation learning algorithm; add a differential privacy protection mechanism to the trained target model and feed the outputs of the target model, perturbed by the mechanism, to the stealing model; under the influence of these perturbed outputs, the stealing model takes incorrect attack actions.

Description

Deep reinforcement learning model security reinforcement method and device based on differential privacy
Technical Field
The invention relates to the field of data security, in particular to a method and a device for reinforcing the security of a deep reinforcement learning model based on differential privacy.
Background
With the rapid development of artificial intelligence, deep reinforcement learning algorithms, which combine the perception capability of deep learning with the decision-making capability of reinforcement learning, are widely applied in fields such as autonomous driving, automatic translation, and game AI.
However, recent research shows that deep reinforcement learning models are vulnerable to various types of malicious attacks. Security vulnerabilities in deep reinforcement learning algorithms pose a serious threat to the integrity, availability, and confidentiality of deep reinforcement learning systems, and as artificial intelligence becomes ever more closely tied to production and daily life, the need to address the security of artificial intelligence applications grows increasingly urgent.
An existing method for improving the security of deep learning models is the defense method against adversarial attacks on deep reinforcement learning models disclosed in Chinese patent application publication No. CN110968866A. The defense method comprises the following steps: use a visual prediction model built on a generative adversarial network to predict the current environment state from the previously input environment state, and obtain the next-frame predicted environment state value of that predicted current state under the deep reinforcement learning policy; obtain the actual current environment state output by the deep reinforcement learning model, and obtain the environment state value, under the deep reinforcement learning policy, of the actual current environment state with the perturbation added; compare the predicted environment state value and the perturbed environment state value with a discrimination model built on a generative adversarial network, and determine from the result whether the deep reinforcement learning model is under attack; when the model is under attack, extract the actual current environment state, apply a first layer of defense to it with a SqueezeNet-based first defense model, and apply a second layer of defense to the first-layer result with a DenseNet-based second defense model to obtain the defended actual current environment state; the deep reinforcement learning model then performs learning and prediction using the defended actual current environment state.
The defense method against attacks on the deep reinforcement learning model proposed in that application protects the reinforcement learning model with a visual prediction model, a discriminator, and additional defense models; it uses these components to defend the deep reinforcement learning model but does not perform security reinforcement on the model itself.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method and device for security reinforcement of a deep reinforcement learning model based on differential privacy. Without changing the model's output actions, the method blurs the output distribution of the deep model to the greatest possible extent and greatly lowers the level of model stealing attacks, thereby preventing an attacker from stealing the original model through its action-space distribution.
A deep reinforcement learning model security reinforcement method based on differential privacy comprises the following steps:
sampling data from the environment as a training sample set, constructing a target model with a deep reinforcement learning algorithm, and feeding the training sample set to the target model to train it;
testing the trained target model, and sampling state-action pairs as a stealing data set;
constructing a stealing model with a deep reinforcement learning algorithm, wherein the stealing model is used to imitate the actions of the attacked target model;
feeding the stealing data set to the stealing model as training samples, and training the stealing model with an imitation learning algorithm;
adding a differential privacy protection mechanism to the trained target model, and feeding the outputs of the target model, perturbed by the differential privacy protection mechanism, to the stealing model;
under the influence of the outputs perturbed by the differential privacy mechanism, the stealing model takes incorrect attack actions.
The training of the target model comprises the following steps:
using an experience replay mechanism, collecting and processing samples online to obtain an online sample set;
storing the online sample set and the training sample set in a replay memory unit to form transition samples;
during each training step, randomly sampling transition samples from the replay memory, feeding them to the current value network to obtain the current Q value, and updating the parameters with a stochastic gradient descent algorithm during training;
copying the parameters of the current value network to the target value network to obtain the optimization target of the current Q value, namely the target Q value;
updating the network parameters by minimizing the mean square error between the current Q value and the target Q value; after the target value network is introduced, the target Q value is kept unchanged for a period of time, which reduces the correlation between the current Q value and the target Q value to a certain extent and improves the stability of the algorithm;
the deep reinforcement learning algorithm clips the reward value and the error term to a bounded interval, which keeps the Q values and gradient values in a reasonable range and improves the stability of the algorithm; the optimal policy is obtained through gradient-descent optimization.
The deep reinforcement learning problem can be modeled as a Markov decision process, i.e., represented by a quadruple MDP = (S, A, R, P), where S is the set of states available in the decision process, A is the set of actions, R is the real-time reward for a state transition, and P is the state-transition probability. At the beginning of any time step t, the agent observes the environment to obtain the current state s_t and takes action a_t according to the current optimal policy π*; at the end of t, the agent receives its reward r_t and the next observed state s_{t+1}. The deep reinforcement learning algorithm uses a so-called 'hard' update of the target value network parameters: at fixed intervals, the parameters of the current value network are copied to the target value network.
When training the deep reinforcement network, the samples are generally required to be independent of one another and are drawn by random sampling, which greatly reduces the correlation between samples and improves the stability of the algorithm.
Typically, the output of the current value network is used to evaluate the value function of the current state-action pair, while the output of the target value network serves as the optimization target that this value function approximates.
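The training mechanism described above corresponds to a standard DQN-style loop. The following is a minimal PyTorch sketch of the replay memory, random sampling, stochastic-gradient update, and 'hard' target-network copy; the network sizes, hyper-parameters, and the discount factor γ are illustrative assumptions rather than values taken from the description.

    # Minimal DQN-style sketch of the replay memory and "hard" target-network update.
    # Network sizes, learning rate and gamma are illustrative assumptions.
    import random
    from collections import deque
    import torch
    import torch.nn as nn

    class QNet(nn.Module):
        def __init__(self, n_states, n_actions):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        def forward(self, s):
            return self.net(s)

    n_states, n_actions, gamma = 4, 2, 0.99
    current_net = QNet(n_states, n_actions)                  # current value network
    target_net = QNet(n_states, n_actions)                   # target value network
    target_net.load_state_dict(current_net.state_dict())
    optimizer = torch.optim.SGD(current_net.parameters(), lr=1e-3)
    replay = deque(maxlen=10000)                             # replay memory unit

    def store_transition(s, a, r, s_next):
        # transitions stored as tensors: s (float), a (long), r (float), s_next (float)
        replay.append((s, torch.tensor(a), torch.tensor(r, dtype=torch.float32), s_next))

    def train_step(batch_size=32):
        if len(replay) < batch_size:
            return
        batch = random.sample(replay, batch_size)            # random sampling reduces sample correlation
        s, a, r, s_next = map(torch.stack, zip(*batch))
        q = current_net(s).gather(1, a.view(-1, 1)).squeeze(1)        # current Q value Q(s, a | theta_i)
        with torch.no_grad():
            y = r + gamma * target_net(s_next).max(dim=1).values      # target Q value Y_i (standard DQN form)
        loss = nn.functional.mse_loss(q, y)                  # mean-square error between current and target Q
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    def hard_update():
        # "hard" target update: copy current-network parameters to the target network at fixed intervals
        target_net.load_state_dict(current_net.state_dict())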
The error function between the current Q value and the target Q value is:

L(θ_i) = E[(Y_i − Q(s, a | θ_i))²]

Taking the partial derivative with respect to the parameter θ gives the gradient:

∇_{θ_i} L(θ_i) = E[(Y_i − Q(s, a | θ_i)) ∇_{θ_i} Q(s, a | θ_i)]

where s is the current state, a the corresponding action, r the reward value, s' the next state, θ_i the model parameters, E the expectation, Y_i the target Q value, and Q(s, a | θ_i) the estimated value of taking action a in state s.
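In the standard DQN formulation on which this description is based, the target Q value Y_i is computed from the target value network, whose parameters θ_i^- are the periodically copied ('hard'-updated) parameters of the current value network:

Y_i = r + γ · max_{a'} Q(s', a' | θ_i^-)

where γ is the discount factor. Keeping θ_i^- fixed between hard updates is what keeps Y_i unchanged for a period of time and reduces its correlation with the current Q value.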
The optimal policy is:

π*(s) = argmax_{a∈A} Q*(s, a)

where s is the current state, a the corresponding action, A the action set, Q* the optimal value function, and π* the optimal policy.
The training of the stealing model comprises the following steps:
the action and the state output by the generator G are input into the discriminator in pairs to be compared with expert data by using an Actor network instead of the generator G, the output of the discriminator D: S multiplied by A → (0,1) is used as a reward value to guide strategy learning simulating the learning, and a discriminator loss function is expressed as:
Figure BDA0003151646270000041
wherein, piILRepresenting strategies obtained by imitation of learning,. pitAn expert strategy representing sampling, logD (s, a) in the first term representing the judgment of the arbiter on the real data, the second term representing the judgment of the arbiter on the real dataThe binomial log (1-D (s, a)) represents the judgment of the generated data;
specifically, through such a maximum and minimum game process, G and D are optimized circularly and alternately to train a required Actor network and a discriminant network;
in the training process, a loss function is minimized through gradient derivation so as to reversely update network parameters of the arbiter and the Actor, and the loss function is as follows:
Figure BDA0003151646270000042
wherein the content of the first and second substances,
Figure BDA0003151646270000043
is a simulation strategy piILThe entropy of the loss function is controlled by a constant lambda (lambda is more than or equal to 0) and is used as a strategy regular term in the loss function;
and the trained stealing model is used to generate adversarial-example attacks on the target model.
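The alternating discriminator and Actor updates described above can be sketched as follows in PyTorch. This is a minimal illustrative sketch rather than the patent's implementation: the network shapes, the optimizers, the REINFORCE-style policy surrogate, and the reward −log(1 − D(s, a)) derived from the discriminator output are assumptions chosen to match the description that the discriminator output guides imitation learning.

    # GAIL-style sketch: the Actor plays the role of generator G and the discriminator D
    # scores (state, action) pairs; D's output is turned into a reward for imitation.
    # Shapes, hyper-parameters and the policy surrogate are illustrative assumptions.
    import torch
    import torch.nn as nn

    n_states, n_actions = 4, 2
    actor = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(),
                          nn.Linear(64, n_actions), nn.Softmax(dim=-1))   # pi_IL, plays the role of G
    disc = nn.Sequential(nn.Linear(n_states + n_actions, 64), nn.ReLU(),
                         nn.Linear(64, 1), nn.Sigmoid())                  # D: S x A -> (0, 1)
    opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
    opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)

    def disc_step(expert_sa, policy_sa):
        # Maximize log D on expert (real) pairs and log(1 - D) on generated pairs,
        # implemented as minimizing the negated objective.
        loss_d = -(torch.log(disc(expert_sa) + 1e-8).mean()
                   + torch.log(1.0 - disc(policy_sa) + 1e-8).mean())
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    def actor_step(states, actions, lam=1e-3):
        # The discriminator output is converted into a reward that guides imitation;
        # a simple REINFORCE-style surrogate with an entropy regularizer (weight lam >= 0) is used.
        probs = actor(states)                                             # (B, n_actions)
        logp = torch.log(probs.gather(1, actions.view(-1, 1)).squeeze(1) + 1e-8)
        sa = torch.cat([states, nn.functional.one_hot(actions, n_actions).float()], dim=-1)
        with torch.no_grad():
            reward = -torch.log(1.0 - disc(sa).squeeze(1) + 1e-8)         # high when D judges (s, a) expert-like
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()   # H(pi_IL)
        loss_a = -(logp * reward).mean() - lam * entropy
        opt_a.zero_grad(); loss_a.backward(); opt_a.step()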
The differential privacy mechanism is expressed as follows:

M(d_se) = f(d_se) + N(0, σ²)

where N(0, σ²) is Gaussian noise with mean 0 and variance σ²; for a single application of the Gaussian mechanism, the noise scale satisfies σ ≥ √(2 ln(1.25/δ)) · Δf / ε with ε < 1, where Δf denotes the sensitivity of f over the input sequence d_se;
a differential privacy mechanism is added to the intermediate layer of the target model.
Approximating a real-valued function f with a differentially private mechanism is most commonly done by adding noise calibrated to the sensitivity S_f of f, which is defined as the maximum absolute distance |f(d_se) − f(d'_se)| between two adjacent input sequences d_se and d'_se.
In deep reinforcement learning, dynamic differential privacy (DDP) is added to the intermediate layer of the policy-execution (forward) DRL model. To guarantee that the given noise distribution satisfies (ε, δ)-DDP, the invention chooses the noise scale σ ≥ cΔs/ε with the constant c ≥ √(2 ln(1.25/δ)), for ε ∈ (0, 1); here Δs, the amount of noise added to the samples in the data set, is determined by the sensitivity of the real-valued function s, Δs = max |s(d_se) − s(d'_se)|. A security reinforcement mechanism is thus added dynamically to the model so that the policy's action distribution differs from the original action-space distribution, making it difficult for an attacker to infer the original model algorithm from the action-space distribution it observes.
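As a small numerical illustration of the calibration just described, the sketch below computes the admissible noise scale and perturbs a toy intermediate-layer output; the values of ε, δ, the sensitivity Δs, and the example output are illustrative assumptions.

    # Calibrating the Gaussian noise scale sigma >= c * delta_s / epsilon with
    # c = sqrt(2 * ln(1.25 / delta)) for epsilon in (0, 1). The concrete values of
    # epsilon, delta and the sensitivity delta_s are illustrative assumptions.
    import math
    import numpy as np

    epsilon, delta = 0.5, 1e-5        # privacy budget, epsilon in (0, 1)
    delta_s = 1.0                     # sensitivity of the real-valued function s (assumed)

    c = math.sqrt(2 * math.log(1.25 / delta))
    sigma = c * delta_s / epsilon     # smallest admissible noise scale

    rng = np.random.default_rng(0)
    layer_out = np.array([1.2, 0.7, 0.9])                   # toy intermediate-layer output
    noisy_out = layer_out + rng.normal(0.0, sigma, size=layer_out.shape)
    # The perturbed output blurs the action-space distribution exposed to a
    # model-stealing attacker.
    print(sigma, noisy_out)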
Specifically, the measure of the model stealing attack is defined as:

R_stl / R_test

This formula measures the effectiveness and extent of model stealing against the target model, where R_stl is the reward value obtained after model stealing and R_test is the test reward value of the original model.
The measure of model-stealing defense after adding the differential privacy protection mechanism is then:

(R_stl − R_defense) / R_test

This formula measures the defensive effect of the invention; intuitively, it measures the degree to which the model stealing attack is reduced under the defense of the invention, where R_defense is the reward value of the stealing model under the defense, R_stl is the reward value after model stealing, and R_test is the test reward value of the original model.
A deep reinforcement learning model security reinforcement device based on differential privacy, comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, wherein the computer processor implements any one of the above methods when executing the computer program.
Compared with the prior art, the invention has the advantages that:
(1) By introducing an exponential mechanism of differential privacy at the model input layer, the amount of information a model-stealing attacker can obtain from the model output is reduced; the output distribution of the deep model is blurred to the greatest possible extent without changing the model's output actions, and the level of model stealing attacks is greatly reduced, thereby preventing an attacker from stealing the original model through its action-space distribution.
Drawings
FIG. 1 is a general flowchart of a method for security reinforcement of a deep reinforcement learning model based on differential privacy according to the present invention;
FIG. 2 is a schematic diagram of the deep reinforcement learning model in the method for security reinforcement of the deep reinforcement learning model based on differential privacy provided by the invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
This embodiment provides a method for security reinforcement of a deep reinforcement learning model based on differential privacy. The method changes the action-space distribution of the deep reinforcement learning policy through a differential-privacy exponential mechanism: by introducing the exponential mechanism at the model input layer, the amount of information a model-stealing attacker can obtain from the model output is reduced, the output distribution of the deep model is blurred to the greatest possible extent without changing the model's output actions, and the level of model stealing attacks is greatly reduced, thereby preventing an attacker from stealing the original model through its action-space distribution.
FIG. 1 is the general flowchart of the method for security reinforcement of the deep reinforcement learning model based on differential privacy according to this embodiment; the method can be applied in the field of game AI, for example to train a game AI to play games automatically.
As shown in FIGS. 1-2, the method for security reinforcement of the deep reinforcement learning model based on differential privacy includes the following steps:
(1) sampling data from the environment as a training sample set, constructing a target model with a deep reinforcement learning algorithm, and feeding the training sample set to the target model to train it; the specific training process comprises:
(1.1) using an experience replay mechanism, collecting and processing samples online to obtain an online sample set;
(1.2) storing the online sample set and the training sample set in a replay memory unit to form transition samples;
(1.3) during each training step, randomly sampling transition samples from the replay memory, feeding them to the current value network to obtain the current Q value, and updating the parameters with a stochastic gradient descent algorithm during training;
(1.4) copying the parameters of the current value network to the target value network to obtain the optimization target of the current Q value, namely the target Q value;
(1.5) updating the network parameters by minimizing the mean square error between the current Q value and the target Q value; the error function between the current Q value and the target Q value is:

L(θ_i) = E[(Y_i − Q(s, a | θ_i))²]

Taking the partial derivative with respect to the parameter θ gives the gradient:

∇_{θ_i} L(θ_i) = E[(Y_i − Q(s, a | θ_i)) ∇_{θ_i} Q(s, a | θ_i)]

where s is the current state, a the corresponding action, s' the next state, θ_i the model parameters, E the expectation, Y_i the target Q value, and Q(s, a | θ_i) the estimated value of taking action a in state s.
(1.6) the deep reinforcement learning algorithm clips the reward value and the error term to a bounded interval, and the optimal policy is obtained through gradient-descent optimization; the optimal policy is:

π*(s) = argmax_{a∈A} Q*(s, a)

where s is the current state, a the corresponding action, A the action set, Q* the optimal value function, and π* the optimal policy.
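As a small illustration of step (1.6), the greedy policy π*(s) = argmax_a Q*(s, a) can be read directly off a trained value network; in the sketch below, q_net stands for the trained current value network and the state shape is an assumption.

    # Greedy action selection a = argmax_a Q(s, a) from a trained value network.
    import torch

    def select_action(q_net, state):
        # state: 1-D float tensor of shape (n_states,)
        with torch.no_grad():
            q_values = q_net(state.unsqueeze(0))      # shape (1, n_actions)
        return int(q_values.argmax(dim=1).item())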
(2) testing the trained target model, and sampling state-action pairs as a stealing data set;
(3) constructing a stealing model with a deep reinforcement learning algorithm, wherein the stealing model is used to imitate the actions of the attacked target model;
(4) feeding the stealing data set to the stealing model as training samples, and training the stealing model with an imitation learning algorithm; the training steps are as follows:
(4.1) an Actor network is used in place of the generator G; the actions and states it outputs are fed in pairs to the discriminator to be compared with the expert data, and the output of the discriminator D: S × A → (0, 1) is used as a reward value to guide the policy learning of imitation learning; the discriminator loss function is expressed as:

max_D E_{π_t}[log D(s, a)] + E_{π_IL}[log(1 − D(s, a))]

where π_IL is the policy obtained by imitation learning and π_t is the sampled expert policy; the first term, log D(s, a), represents the discriminator's judgment of the real (expert) data, and the second term, log(1 − D(s, a)), represents its judgment of the generated data;
(4.2) during training, the loss function is minimized by gradient derivation so as to update the network parameters of the discriminator and the Actor by back-propagation; the loss function is:

min_{π_IL} max_D E_{π_t}[log D(s, a)] + E_{π_IL}[log(1 − D(s, a))] − λH(π_IL)

where H(π_IL) is the entropy of the imitation policy π_IL, controlled by a constant λ (λ ≥ 0) and used as a policy regularization term in the loss function;
(4.3) the trained stealing model is used to generate adversarial-example attacks on the target model.
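A common way to mount the adversarial-example attack of step (4.3) with a stolen substitute model is an FGSM-style perturbation that is then transferred to the target model. The sketch below is one such illustrative instantiation and assumes the stealing model outputs action logits; the step size eps is an illustrative value.

    # FGSM-style perturbation crafted on the stealing (substitute) model and then
    # presented to the target model (transfer attack). Illustrative sketch only.
    import torch
    import torch.nn as nn

    def fgsm_state(stealing_model, state, taken_action, eps=0.01):
        # Perturb the observed state so that the substitute model is pushed away
        # from the action it would normally take.
        state = state.clone().detach().requires_grad_(True)
        logits = stealing_model(state.unsqueeze(0))                        # (1, n_actions)
        loss = nn.functional.cross_entropy(logits, torch.tensor([taken_action]))
        loss.backward()
        return (state + eps * state.grad.sign()).detach()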
(5) adding a differential privacy protection mechanism to an intermediate layer of the trained target model, and feeding the outputs of the target model, perturbed by the differential privacy protection mechanism, to the stealing model; the differential privacy mechanism is expressed as:

M(d_se) = f(d_se) + N(0, σ²)

where N(0, σ²) is Gaussian noise with mean 0 and variance σ²; for a single application of the Gaussian mechanism, the noise scale satisfies σ ≥ √(2 ln(1.25/δ)) · Δf / ε with ε < 1, where Δf = max |f(d_se) − f(d'_se)| is the sensitivity of f over the input sequence d_se.
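One way to realize adding the mechanism to an intermediate layer in practice is a forward hook that perturbs the layer's activations at inference time. The sketch below is an assumption about the wiring: the toy network, the hooked layer, and the value of σ are illustrative, not the patent's concrete architecture.

    # Attaching zero-mean Gaussian noise to an intermediate layer of the trained
    # target model with a PyTorch forward hook. Layer choice and sigma are assumptions.
    import torch
    import torch.nn as nn

    target_model = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
    sigma = 1.0   # noise scale calibrated as sketched earlier (sigma >= c * delta_s / epsilon)

    def dp_noise_hook(module, inputs, output):
        # Perturb the intermediate-layer output so that the observable action
        # distribution is blurred for a model-stealing attacker.
        return output + sigma * torch.randn_like(output)

    handle = target_model[1].register_forward_hook(dp_noise_hook)   # hook after the ReLU layer
    noisy_logits = target_model(torch.randn(1, 4))
    handle.remove()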
(6) under the influence of the outputs perturbed by the differential privacy mechanism, the stealing model takes incorrect attack actions.
The measure of the model stealing attack is defined as:

R_stl / R_test

This formula measures the effectiveness and extent of model stealing against the target model, where R_stl is the reward value obtained after model stealing and R_test is the test reward value of the original model.
The measure of model-stealing defense after adding the differential privacy protection mechanism is then:

(R_stl − R_defense) / R_test

This formula measures the defensive effect of the invention; intuitively, it measures the degree to which the model stealing attack is reduced under the defense of the invention, where R_defense is the reward value of the stealing model under the defense, R_stl is the reward value after model stealing, and R_test is the test reward value of the original model.

Claims (7)

1. A deep reinforcement learning model security reinforcement method based on differential privacy is characterized by comprising the following steps:
sampling data from the environment as a training sample set, constructing a target model with a deep reinforcement learning algorithm, and feeding the training sample set to the target model to train it;
testing the trained target model, and sampling state-action pairs as a stealing data set;
constructing a stealing model with a deep reinforcement learning algorithm, wherein the stealing model is used to imitate the actions of the attacked target model;
feeding the stealing data set to the stealing model as training samples, and training the stealing model with an imitation learning algorithm;
adding a differential privacy protection mechanism to the trained target model, and feeding the outputs of the target model, perturbed by the differential privacy protection mechanism, to the stealing model;
under the influence of the outputs perturbed by the differential privacy mechanism, the stealing model takes incorrect attack actions.
2. The method for security reinforcement of the deep reinforcement learning model based on differential privacy as claimed in claim 1, wherein the training of the target model comprises the following steps:
using an experience replay mechanism, collecting and processing samples online to obtain an online sample set;
storing the online sample set and the training sample set in a replay memory unit to form transition samples;
during each training step, randomly sampling transition samples from the replay memory, feeding them to the current value network to obtain the current Q value, and updating the parameters with a stochastic gradient descent algorithm during training;
copying the parameters of the current value network to the target value network to obtain the optimization target of the current Q value, namely the target Q value;
updating the network parameters by minimizing the mean square error between the current Q value and the target Q value;
and clipping the reward value and the error term to a bounded interval by the deep reinforcement learning algorithm, and obtaining the optimal policy through gradient-descent optimization.
3. The method for security reinforcement of the deep reinforcement learning model based on differential privacy as claimed in claim 2, wherein the error function between the current Q value and the target Q value is:

L(θ_i) = E[(Y_i − Q(s, a | θ_i))²]

Taking the partial derivative with respect to the parameter θ gives the gradient:

∇_{θ_i} L(θ_i) = E[(Y_i − Q(s, a | θ_i)) ∇_{θ_i} Q(s, a | θ_i)]

where s is the current state, a the corresponding action, s' the next state, θ_i the model parameters, E the expectation, Y_i the target Q value, and Q(s, a | θ_i) the estimated value of taking action a in state s.
4. The method for security reinforcement of the deep reinforcement learning model based on differential privacy as claimed in claim 2, wherein the optimal policy is:

π*(s) = argmax_{a∈A} Q*(s, a)

where s is the current state, a the corresponding action, A the action set, Q* the optimal value function, and π* the optimal policy.
5. The method for security reinforcement of the deep reinforcement learning model based on differential privacy as claimed in claim 1, wherein the training of the stealing model comprises the following steps:
an Actor network is used in place of the generator G; the actions and states it outputs are fed in pairs to the discriminator to be compared with the expert data, and the output of the discriminator D: S × A → (0, 1) is used as a reward value to guide the policy learning of imitation learning; the discriminator loss function is expressed as:

max_D E_{π_t}[log D(s, a)] + E_{π_IL}[log(1 − D(s, a))]

where π_IL is the policy obtained by imitation learning and π_t is the sampled expert policy; the first term, log D(s, a), represents the discriminator's judgment of the real (expert) data, and the second term, log(1 − D(s, a)), represents its judgment of the generated data;
during training, the loss function is minimized by gradient derivation so as to update the network parameters of the discriminator and the Actor by back-propagation; the loss function is:

min_{π_IL} max_D E_{π_t}[log D(s, a)] + E_{π_IL}[log(1 − D(s, a))] − λH(π_IL)

where H(π_IL) is the entropy of the imitation policy π_IL, controlled by a constant λ (λ ≥ 0) and used as a policy regularization term in the loss function;
and the trained stealing model is used to generate adversarial-example attacks on the target model.
6. The method for security reinforcement of the deep reinforcement learning model based on differential privacy as claimed in claim 1, wherein the differential privacy mechanism is expressed as:

M(d_se) = f(d_se) + N(0, σ²)

where N(0, σ²) is Gaussian noise with mean 0 and variance σ²; for a single application of the Gaussian mechanism, the noise scale satisfies σ ≥ √(2 ln(1.25/δ)) · Δf / ε with ε < 1, where Δf = max |f(d_se) − f(d'_se)| is the sensitivity of f over the input sequence d_se;
and the differential privacy mechanism is added to the intermediate layer of the target model.
7. A differential privacy based deep reinforcement learning model security reinforcement apparatus comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, characterized in that: the computer processor, when executing the computer program, implements the differential privacy based deep reinforcement learning model security reinforcement method of any one of claims 1-6.
CN202110766183.8A 2021-07-07 2021-07-07 Deep reinforcement learning model security reinforcement method and device based on differential privacy Pending CN113360917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110766183.8A CN113360917A (en) 2021-07-07 2021-07-07 Deep reinforcement learning model security reinforcement method and device based on differential privacy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110766183.8A CN113360917A (en) 2021-07-07 2021-07-07 Deep reinforcement learning model security reinforcement method and device based on differential privacy

Publications (1)

Publication Number Publication Date
CN113360917A true CN113360917A (en) 2021-09-07

Family

ID=77538674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110766183.8A Pending CN113360917A (en) 2021-07-07 2021-07-07 Deep reinforcement learning model security reinforcement method and device based on differential privacy

Country Status (1)

Country Link
CN (1) CN113360917A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200311540A1 (en) * 2019-03-28 2020-10-01 International Business Machines Corporation Layer-Wise Distillation for Protecting Pre-Trained Neural Network Models
CN112052456A (en) * 2020-08-31 2020-12-08 浙江工业大学 Deep reinforcement learning strategy optimization defense method based on multiple intelligent agents
CN112241554A (en) * 2020-10-30 2021-01-19 浙江工业大学 Model stealing defense method and device based on differential privacy index mechanism
CN112884131A (en) * 2021-03-16 2021-06-01 浙江工业大学 Deep reinforcement learning strategy optimization defense method and device based on simulation learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Quan et al.: "A Survey of Deep Reinforcement Learning" (深度强化学习综述), Chinese Journal of Computers (计算机学报) *
Zhao Jingwen: "Research on Privacy Protection for Deep Learning Based on Differential Privacy" (基于差分隐私的深度学习隐私保护研究), China Masters' Theses Full-text Database, Information Science and Technology Series (中国优秀博硕士学位论文全文数据库(硕士)信息科技辑) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547687A (en) * 2022-02-22 2022-05-27 浙江星汉信息技术股份有限公司 Question-answering system model training method and device based on differential privacy technology
WO2023206777A1 (en) * 2022-04-29 2023-11-02 浪潮(北京)电子信息产业有限公司 Model generation method and apparatus, operation control method and apparatus, device, and storage medium

Similar Documents

Publication Publication Date Title
CN107483486B (en) Network defense strategy selection method based on random evolution game model
CN112052456A (en) Deep reinforcement learning strategy optimization defense method based on multiple intelligent agents
CN112884131A (en) Deep reinforcement learning strategy optimization defense method and device based on simulation learning
CN113360917A (en) Deep reinforcement learning model security reinforcement method and device based on differential privacy
CN113179263A (en) Network intrusion detection method, device and equipment
Zennaro et al. Modelling penetration testing with reinforcement learning using capture‐the‐flag challenges: Trade‐offs between model‐free learning and a priori knowledge
CN111282267A (en) Information processing method, information processing apparatus, information processing medium, and electronic device
Mo et al. MCTSteg: A Monte Carlo tree search-based reinforcement learning framework for universal non-additive steganography
CN113392396A (en) Strategy protection defense method for deep reinforcement learning
CN111488904A (en) Image classification method and system based on confrontation distribution training
CN113420326A (en) Deep reinforcement learning-oriented model privacy protection method and system
CN113033822A (en) Antagonistic attack and defense method and system based on prediction correction and random step length optimization
CN115033878A (en) Rapid self-game reinforcement learning method and device, computer equipment and storage medium
CN112001480A (en) Small sample amplification method for sliding orientation data based on generation of countermeasure network
CN113704098B (en) Deep learning fuzzy test method based on Monte Carlo search tree seed scheduling
CN110598794A (en) Classified countermeasure network attack detection method and system
CN114358278A (en) Training method and device of neural network model
CN111144243B (en) Household pattern recognition method and device based on counterstudy
Lin et al. An uncertainty-incorporated approach to predict the winner in StarCraft II using neural processes
CN116306268A (en) Shield tunneling simulation model parameter identification method system based on federal reinforcement learning
CN113344071B (en) Intrusion detection algorithm based on depth strategy gradient
Cranford et al. Accounting for Uncertainty in Deceptive Signaling for Cybersecurity
EP4116853A1 (en) Computer-readable recording medium storing evaluation program, evaluation method, and information processing device
CN114036503B (en) Migration attack method and device, electronic equipment and storage medium
CN113313236B (en) Deep reinforcement learning model poisoning detection method and device based on time sequence neural pathway

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210907

RJ01 Rejection of invention patent application after publication