CN113360917A - Deep reinforcement learning model security reinforcement method and device based on differential privacy - Google Patents
Info
- Publication number
- CN113360917A (application CN202110766183.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- stealing
- value
- differential privacy
- reinforcement learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/575—Secure boot
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
Abstract
The invention discloses a method and a device for reinforcing the security of a deep reinforcement learning model based on differential privacy, wherein the method comprises the following steps: sampling data from the environment as a sample set to be trained, constructing a target model by using a deep reinforcement learning algorithm, and inputting the sample set to be trained into the target model to train it; testing the trained target model and sampling state-action pairs as a stealing data set; constructing a stealing model by using a deep reinforcement learning algorithm; inputting the stealing data set into the stealing model as training samples and training the stealing model with an imitation learning algorithm; adding a differential privacy protection mechanism to the trained target model and feeding the data output by the target model under that mechanism into the stealing model; under the influence of the differentially private data, the stealing model takes erroneous attack actions.
Description
Technical Field
The invention relates to the field of data security, in particular to a method and a device for reinforcing the security of a deep reinforcement learning model based on differential privacy.
Background
With the rapid development of artificial intelligence, deep reinforcement learning algorithms, which combine the perception capability of deep learning with the decision-making capability of reinforcement learning, are widely applied in fields such as autonomous driving, machine translation and game AI.
However, recent research shows that deep reinforcement learning models are vulnerable to different types of malicious attacks. Security vulnerabilities in deep reinforcement learning algorithms seriously threaten the integrity, availability and confidentiality of deep reinforcement learning systems, and as artificial intelligence becomes ever more closely tied to production and daily life, the need to address the security of artificial intelligence applications grows increasingly urgent.
An existing method for improving the security of deep learning models is the defense method against attacks on deep reinforcement learning models disclosed in Chinese patent application publication No. CN110968866A. That defense method comprises the following steps: predicting the current environmental state from the previous input environmental state using a visual prediction model built on a generative adversarial network, and obtaining the next-frame predicted environmental-state value of the predicted current state under the deep reinforcement learning strategy; acquiring the actual current environmental state output by the deep reinforcement learning model, and obtaining the environmental-state value of the actual current state with added perturbation under the deep reinforcement learning strategy; judging the predicted state value and the perturbed state value with a discrimination model built on a generative adversarial network, and determining from the result whether the deep reinforcement learning model is under attack; when it is, extracting the actual current environmental state, applying a first layer of defense with a SqueezeNet-based model and a second layer of defense with a DenseNet-based model to the first-layer result to obtain the defended actual current environmental state; and having the deep reinforcement learning model perform learning and prediction using the defended state.
The defense method proposed in that application protects the reinforcement learning model with a visual prediction model, a discriminator and additional defense models; it defends the model at inference time but does not perform security reinforcement of the deep reinforcement learning model itself.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a method and a device for reinforcing the security of a deep reinforcement learning model based on differential privacy, which blur the output distribution of the deep model to the greatest extent without changing the model's output actions and greatly reduce the level of model stealing attack, thereby preventing an attacker from reconstructing the original model from the action-space distribution.
A deep reinforcement learning model security reinforcement method based on differential privacy comprises the following steps:
sampling data from the environment as a sample set to be trained, constructing a target model by using a deep reinforcement learning algorithm, and inputting the sample set to be trained into the target model to train the target model;
testing the trained target model, and sampling state actions as a stealing data set;
constructing a stealing model by using a deep reinforcement learning algorithm, wherein the stealing model is used to imitate the behavior of the target model under attack;
inputting the stealing data set into the stealing model as training samples and training the stealing model with an imitation learning algorithm;
adding a differential privacy protection mechanism into a trained target model, and inputting data output by the target model under the action of the differential privacy protection mechanism into a stealing model;
the stealing model makes erroneous attack actions under the influence of the data processed by the differential privacy mechanism.
The training of the target model comprises the following steps:
using an experience playback mechanism, and carrying out online collection and processing to obtain an online sample set;
storing the online sample set and the sample set to be trained into a playback memory unit to form a transfer sample;
during each training step, randomly extracting transfer samples from the playback memory unit and inputting them into the current value network to obtain the current Q value, updating the parameters with a stochastic gradient descent algorithm during training;
copying parameters of the current value network to a target value network to obtain an optimization target of the current Q value, namely a target Q value;
updating network parameters by minimizing a mean square error between a current Q value and a target Q value; after the target value network is introduced, the target Q value is kept unchanged in a period of time, so that the correlation between the current Q value and the target Q value is reduced to a certain extent, and the stability of the algorithm is improved;
the depth reinforcement learning algorithm reduces the reward value and the error term to a limited interval, ensures that the Q value and the gradient value are in a reasonable range, improves the stability of the algorithm, and obtains an optimal strategy through gradient descent optimization.
The deep reinforcement learning problem can be modeled as a Markov decision process, i.e. represented by a quadruple MDP = (S, A, R, P), where S is the set of states available in the decision process, A is the set of actions, R is the real-time reward for a state transition, and P is the state transition probability. At the beginning of any time step t, the agent observes the environment to obtain the current state s_t and takes action a_t according to the current optimal strategy π*; at the end of t, the agent receives its reward r_t and the next observed state s_{t+1}. The deep reinforcement learning algorithm adopts a so-called "hard" target value network parameter update, i.e., the parameters of the current value network are assigned to the target value network at regular intervals.
When the deep reinforcement learning network is trained, the samples are generally required to be mutually independent; random sampling therefore greatly reduces the correlation among samples and improves the stability of the algorithm.
The output of the current value network is used to evaluate the value function of the current state-action pair, while the output of the target value network provides the optimization target that the value function approximates.
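The MDP quadruple (S, A, R, P) described above can be made concrete with a toy two-state environment; all states, actions, rewards and transition probabilities below are illustrative assumptions:

```python
import random

S = ["s0", "s1"]                      # state set S
A = ["left", "right"]                 # action set A
R = {("s0", "right"): 1.0, ("s0", "left"): 0.0,
     ("s1", "right"): 0.0, ("s1", "left"): 1.0}   # real-time reward R(s, a)
P = {("s0", "right"): {"s0": 0.2, "s1": 0.8},     # transition probabilities P(s' | s, a)
     ("s0", "left"):  {"s0": 0.9, "s1": 0.1},
     ("s1", "right"): {"s0": 0.5, "s1": 0.5},
     ("s1", "left"):  {"s0": 0.1, "s1": 0.9}}

def step(s, a, rng):
    """One time step t: the agent takes a_t in s_t, receives r_t and s_{t+1}."""
    r = R[(s, a)]
    s_next = rng.choices(list(P[(s, a)]), weights=P[(s, a)].values())[0]
    return r, s_next

rng = random.Random(0)
r, s_next = step("s0", "right", rng)
print(r, s_next)
```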
The error function between the current Q value and the target Q value is as follows:
L(θ_i) = E[(Y_i − Q(s, a | θ_i))²], with Y_i = r + γ max_{a'} Q(s', a' | θ_i⁻)
Taking the partial derivative with respect to the parameter θ gives the following gradient:
∇_{θ_i} L(θ_i) = E[(Y_i − Q(s, a | θ_i)) ∇_{θ_i} Q(s, a | θ_i)]
where s is the current state, a is the corresponding action, r is the reward value, s' is the next state, θ_i are the model parameters, θ_i⁻ are the target network parameters, γ is the discount factor, E denotes expectation, Y_i is the target Q value, and Q(s, a | θ_i) is the value of state s and action a.
The optimal strategy is as follows:
π*(s) = argmax_{a ∈ A} Q*(s, a)
where s is the current state, a is the corresponding action, A is the action set, Q* is the optimal value function, and π* is the optimal strategy.
The training of the stealing model comprises the following steps:
an Actor network is used in place of the generator G; the actions and states output by the Actor are input into the discriminator in pairs and compared with the expert data, and the output of the discriminator D: S × A → (0, 1) is used as a reward value to guide the policy learning of imitation learning; the discriminator loss function is expressed as:
L_D = E_{π_t}[log D(s, a)] + E_{π_IL}[log(1 − D(s, a))]
where π_IL is the policy obtained by imitation learning and π_t is the sampled expert policy; the first term, log D(s, a), represents the discriminator's judgment of real data, and the second term, log(1 − D(s, a)), represents its judgment of generated data;
specifically, through this minimax game process, G and D are optimized alternately in a loop to train the required Actor network and discriminator network;
during training, the loss function is minimized by gradient derivation so as to back-propagate updates to the network parameters of the discriminator and the Actor; the loss function is:
L = E_{π_IL}[log(1 − D(s, a))] − λ H(π_IL)
where H(π_IL) is the entropy of the imitation policy π_IL, controlled by a constant λ (λ ≥ 0) and used as a policy regularization term in the loss function;
and the trained stealing model is used to generate adversarial examples for attacking the target model.
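The discriminator-guided imitation learning above can be sketched with a logistic discriminator D(s, a) trained to separate expert state-action pairs from the imitator's, with −log(1 − D) serving as the imitator's reward signal. The feature dimension, data and learning rate are illustrative assumptions, and the Actor update itself is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 6                                   # dim of a concatenated (s, a) feature vector
w = np.zeros(DIM)                         # discriminator parameters

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

expert = rng.normal(loc=+1.0, size=(256, DIM))    # pi_t: sampled expert (s, a) pairs
imitator = rng.normal(loc=-1.0, size=(256, DIM))  # pi_IL: Actor (generator) rollouts

# Maximize E_expert[log D] + E_IL[log(1 - D)] by gradient ascent on w.
for _ in range(200):
    grad = expert.T @ (1 - sigmoid(expert @ w)) / len(expert) \
         - imitator.T @ sigmoid(imitator @ w) / len(imitator)
    w += 0.1 * grad

d_expert = float(sigmoid(expert @ w).mean())      # judgment of real data, approaches 1
d_fake = float(sigmoid(imitator @ w).mean())      # judgment of generated data, approaches 0
reward = -np.log(1.0 - sigmoid(imitator @ w) + 1e-8)  # reward signal guiding pi_IL
print(round(d_expert, 2), round(d_fake, 2))
```

In the full minimax loop the Actor would then be updated against `reward` (plus the entropy regularizer λH) while D is retrained, alternating until the imitator's state-action distribution is indistinguishable from the expert's.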
The differential privacy mechanism is represented as follows:
M(d_se) = f(d_se) + N(0, σ²)
where N(0, σ²) is a Gaussian distribution with mean 0 and variance σ²; a single application of the Gaussian mechanism with sensitivity Δf satisfies (ε, δ)-differential privacy when σ ≥ √(2 ln(1.25/δ)) · Δf / ε with ε < 1, where Δf denotes the sensitivity of the input sequence d_se;
a differential privacy mechanism is added to the target model middle layer.
A differential privacy mechanism approximates a real-valued function f by adding noise calibrated to the sensitivity Δf of f, which is defined as the maximum absolute distance |f(d_se) − f(d'_se)| between two adjacent input sequences d_se and d'_se.
In deep reinforcement learning, dynamic differential privacy (DDP) is added to the intermediate layer of the forward DRL model during strategy execution. To guarantee that the given noise distribution satisfies (ε, δ)-DDP, the invention selects a noise scale σ ≥ c·Δs/ε with a constant c satisfying c² > 2 ln(1.25/δ), for ε ∈ (0, 1); here Δs is the sensitivity of the real-valued function s whose noisy values are sampled into the data set. A security reinforcement mechanism is thus added dynamically to the model so that the distribution of strategy actions differs from the original action-space distribution, making it difficult for an attacker to infer the original model algorithm from the observed action-space distribution.
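The Gaussian noise calibration above (σ ≥ c·Δs/ε with c² > 2 ln(1.25/δ)) can be sketched as follows; the sensitivity, privacy parameters and the layer output being perturbed are illustrative assumptions:

```python
import math
import numpy as np

def gaussian_mechanism(values, sensitivity, eps, delta, rng):
    """Add Gaussian noise with scale sigma = c * sensitivity / eps, c = sqrt(2 ln(1.25/delta))."""
    assert 0 < eps < 1
    c = math.sqrt(2.0 * math.log(1.25 / delta))   # constant c with c^2 > 2 ln(1.25/delta)
    sigma = c * sensitivity / eps                 # noise scale
    return values + rng.normal(0.0, sigma, size=values.shape), sigma

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, -1.0])               # assumed intermediate-layer output
noisy, sigma = gaussian_mechanism(logits, sensitivity=1.0, eps=0.5, delta=1e-5, rng=rng)
print(sigma)
```

With these parameters σ is large relative to the layer's output range, so the distribution an observer sees is strongly blurred even when the final greedy action often survives the perturbation.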
Specifically, a measure of the model stealing attack is defined in terms of R_stl, the reward value after model stealing, and R_test, the test reward value of the original model; it measures the effectiveness and extent of model stealing against the target model.
The measure of the model stealing defense with the differential privacy protection mechanism added is then defined in terms of R_defense, the reward value of the stolen model under the defense, together with R_stl and R_test; it measures the defense effect of the invention, that is, the degree to which the model stealing attack is degraded under the defense.
A differential privacy based deep reinforcement learning model security strengthening device, comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer processor implements any one of the above methods when executing the computer program.
Compared with the prior art, the invention has the advantages that:
(1) By introducing the exponential mechanism of differential privacy into the model input layer, the amount of information that a model stealing attacker can obtain from the model output is reduced, the output distribution of the deep model is blurred to the greatest extent without changing the model's output actions, and the level of model stealing attack is greatly lowered, thereby preventing an attacker from reconstructing the original model from the action-space distribution.
Drawings
FIG. 1 is a general flowchart of a method for security reinforcement of a deep reinforcement learning model based on differential privacy according to the present invention;
fig. 2 is a deep reinforcement learning model schematic diagram of the deep reinforcement learning model security reinforcement method based on differential privacy provided by the invention.
Detailed Description
The invention is further described with reference to the following figures and specific examples.
This embodiment provides a deep reinforcement learning model security reinforcement method based on differential privacy, which alters the action-space distribution of the deep reinforcement learning strategy through a differential privacy exponential mechanism. By introducing the mechanism into the model input layer, the amount of information a model stealing attacker can obtain from the model output is reduced; the output distribution of the deep model is blurred to the greatest extent without changing the model's output actions, and the level of model stealing attack is greatly lowered, thereby preventing an attacker from reconstructing the original model from the action-space distribution.
Fig. 1 is a general flowchart of the method for reinforcing the security of the deep reinforcement learning model based on differential privacy according to this embodiment, and the method for reinforcing the security of the deep reinforcement learning model based on differential privacy according to this embodiment can be used in the field of game AI and used for training game AI to automatically play games.
As shown in fig. 1-2, the method for security reinforcement of the deep reinforcement learning model based on differential privacy includes the following steps:
(1) sampling data from the environment as a sample set to be trained, constructing a target model by using a deep reinforcement learning algorithm, and inputting the sample set to be trained into the target model to train the target model; the specific training process comprises
(1.1) using an experience playback mechanism, and carrying out online collection and processing to obtain an online sample set;
(1.2) storing the online sample set and the sample set to be trained into a playback memory unit to form a transfer sample;
(1.3) during each training step, randomly extracting transfer samples from the playback memory unit and inputting them into the current value network to obtain the current Q value, updating the parameters with a stochastic gradient descent algorithm during training;
(1.4) copying parameters of the current value network to a target value network to obtain an optimization target of the current Q value, namely the target Q value;
(1.5) updating network parameters by minimizing the mean square error between the current Q value and the target Q value; the error function between the current Q value and the target Q value is as follows:
L(θ_i) = E[(Y_i − Q(s, a | θ_i))²], with Y_i = r + γ max_{a'} Q(s', a' | θ_i⁻)
Taking the partial derivative with respect to the parameter θ gives the following gradient:
∇_{θ_i} L(θ_i) = E[(Y_i − Q(s, a | θ_i)) ∇_{θ_i} Q(s, a | θ_i)]
where s is the current state, a is the corresponding action, r is the reward value, s' is the next state, θ_i are the model parameters, θ_i⁻ are the target network parameters, γ is the discount factor, E denotes expectation, Y_i is the target Q value, and Q(s, a | θ_i) is the value of state s and action a.
(1.6) the deep reinforcement learning algorithm clips the reward value and the error term to a limited interval, and the optimal strategy is obtained through gradient descent optimization:
π*(s) = argmax_{a ∈ A} Q*(s, a)
where s is the current state, a is the corresponding action, A is the action set, Q* is the optimal value function, and π* is the optimal strategy.
(2) Testing the trained target model, and sampling state actions as a stealing data set;
(3) constructing a stealing model by using a deep reinforcement learning algorithm, wherein the stealing model is used to imitate the behavior of the target model under attack;
(4) inputting the stealing data set into the stealing model as training samples and training the stealing model with an imitation learning algorithm; the training steps are as follows:
(4.1) using the Actor network in place of the generator G, inputting the output action-state pairs into the discriminator to compare with expert data, and using the output of the discriminator D: S × A → (0, 1) as a reward value to guide the policy learning of imitation learning; the discriminator loss function is expressed as:
L_D = E_{π_t}[log D(s, a)] + E_{π_IL}[log(1 − D(s, a))]
where π_IL is the policy obtained by imitation learning and π_t is the sampled expert policy; the first term, log D(s, a), represents the discriminator's judgment of real data, and the second term, log(1 − D(s, a)), represents its judgment of generated data;
(4.2) during training, the loss function is minimized by gradient derivation so as to back-propagate updates to the network parameters of the discriminator and the Actor; the loss function is:
L = E_{π_IL}[log(1 − D(s, a))] − λ H(π_IL)
where H(π_IL) is the entropy of the imitation policy π_IL, controlled by a constant λ (λ ≥ 0) and used as a policy regularization term in the loss function;
and (4.3) using the trained stealing model to generate adversarial examples for attacking the target model.
(5) Adding a differential privacy protection mechanism to an intermediate layer of the trained target model, and inputting the data output by the target model under the action of the differential privacy protection mechanism into the stealing model; the differential privacy mechanism is represented as follows:
M(d_se) = f(d_se) + N(0, σ²)
where N(0, σ²) is a Gaussian distribution with mean 0 and variance σ²; a single application of the Gaussian mechanism with sensitivity Δf satisfies (ε, δ)-differential privacy when σ ≥ √(2 ln(1.25/δ)) · Δf / ε with ε < 1, where Δf denotes the sensitivity of the input sequence d_se.
(6) The stealing model makes erroneous attack actions under the influence of the data processed by the differential privacy mechanism;
a measure of the model stealing attack is defined in terms of R_stl, the reward value after model stealing, and R_test, the test reward value of the original model; it measures the effectiveness and extent of model stealing against the target model.
The measure of the model stealing defense with the differential privacy protection mechanism added is then defined in terms of R_defense, the reward value of the stolen model under the defense, together with R_stl and R_test; it measures the defense effect of the invention, that is, the degree to which the model stealing attack is degraded under the defense.
Claims (7)
1. A deep reinforcement learning model security reinforcement method based on differential privacy is characterized by comprising the following steps:
sampling data from the environment as a sample set to be trained, constructing a target model by using a deep reinforcement learning algorithm, and inputting the sample set to be trained into the target model to train the target model;
testing the trained target model, and sampling state actions as a stealing data set;
constructing a stealing model by using a deep reinforcement learning algorithm, wherein the stealing model is used to imitate the behavior of the target model under attack;
inputting the stealing data set into the stealing model as training samples and training the stealing model with an imitation learning algorithm;
adding a differential privacy protection mechanism into a trained target model, and inputting data output by the target model under the action of the differential privacy protection mechanism into a stealing model;
the stealing model makes erroneous attack actions under the influence of the data processed by the differential privacy mechanism.
2. The method for security reinforcement of the deep reinforcement learning model based on the differential privacy as claimed in claim 1, wherein the training of the target model comprises the following steps:
using an experience playback mechanism, and carrying out online collection and processing to obtain an online sample set;
storing the online sample set and the sample set to be trained into a playback memory unit to form a transfer sample;
during each training step, randomly extracting transfer samples from the playback memory unit and inputting them into the current value network to obtain the current Q value, updating the parameters with a stochastic gradient descent algorithm during training;
copying parameters of the current value network to a target value network to obtain an optimization target of the current Q value, namely a target Q value;
updating network parameters by minimizing a mean square error between the current Q value and the target Q value;
and clipping the reward value and the error term to a limited interval by the deep reinforcement learning algorithm, the optimal strategy being obtained through gradient descent optimization.
3. The method for security reinforcement of the deep reinforcement learning model based on the differential privacy as claimed in claim 2, wherein the error function between the current Q value and the target Q value is as follows:
L(θ_i) = E[(Y_i − Q(s, a | θ_i))²], with Y_i = r + γ max_{a'} Q(s', a' | θ_i⁻)
and taking the partial derivative with respect to the parameter θ gives the following gradient:
∇_{θ_i} L(θ_i) = E[(Y_i − Q(s, a | θ_i)) ∇_{θ_i} Q(s, a | θ_i)]
where s is the current state, a is the corresponding action, r is the reward value, s' is the next state, θ_i are the model parameters, θ_i⁻ are the target network parameters, γ is the discount factor, E denotes expectation, Y_i is the target Q value, and Q(s, a | θ_i) is the value of state s and action a.
4. The method for security reinforcement of the deep reinforcement learning model based on the differential privacy as claimed in claim 2, wherein the optimal strategy is as follows:
π*(s) = argmax_{a ∈ A} Q*(s, a)
where s is the current state, a is the corresponding action, A is the action set, Q* is the optimal value function, and π* is the optimal strategy.
5. The method for security reinforcement of the deep reinforcement learning model based on the differential privacy as claimed in claim 1, wherein the training of the stealing model comprises the following steps:
an Actor network is used in place of the generator G; the actions and states output by the Actor are input into the discriminator in pairs and compared with the expert data, and the output of the discriminator D: S × A → (0, 1) is used as a reward value to guide the policy learning of imitation learning; the discriminator loss function is expressed as:
L_D = E_{π_t}[log D(s, a)] + E_{π_IL}[log(1 − D(s, a))]
where π_IL is the policy obtained by imitation learning and π_t is the sampled expert policy; the first term, log D(s, a), represents the discriminator's judgment of real data, and the second term, log(1 − D(s, a)), represents its judgment of generated data;
during training, the loss function is minimized by gradient derivation so as to back-propagate updates to the network parameters of the discriminator and the Actor; the loss function is:
L = E_{π_IL}[log(1 − D(s, a))] − λ H(π_IL)
where H(π_IL) is the entropy of the imitation policy π_IL, controlled by a constant λ (λ ≥ 0) and used as a policy regularization term in the loss function;
and the trained stealing model is used to generate adversarial examples for attacking the target model.
6. The method for security reinforcement of the deep reinforcement learning model based on the differential privacy as claimed in claim 1, wherein: the differential privacy mechanism is represented as follows:
M(d_se) = f(d_se) + N(0, σ²)
where N(0, σ²) is a Gaussian distribution with mean 0 and variance σ²; a single application of the Gaussian mechanism with sensitivity Δf satisfies (ε, δ)-differential privacy when σ ≥ √(2 ln(1.25/δ)) · Δf / ε with ε < 1, where Δf denotes the sensitivity of the input sequence d_se; and
the differential privacy mechanism is added to the intermediate layer of the target model.
7. A differential privacy based deep reinforcement learning model security reinforcement apparatus comprising a computer memory, a computer processor, and a computer program stored in the computer memory and executable on the computer processor, characterized in that: the computer processor, when executing the computer program, implements the differential privacy based deep reinforcement learning model security reinforcement method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110766183.8A CN113360917A (en) | 2021-07-07 | 2021-07-07 | Deep reinforcement learning model security reinforcement method and device based on differential privacy |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110766183.8A CN113360917A (en) | 2021-07-07 | 2021-07-07 | Deep reinforcement learning model security reinforcement method and device based on differential privacy |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113360917A true CN113360917A (en) | 2021-09-07 |
Family
ID=77538674
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110766183.8A Pending CN113360917A (en) | 2021-07-07 | 2021-07-07 | Deep reinforcement learning model security reinforcement method and device based on differential privacy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113360917A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114547687A (en) * | 2022-02-22 | 2022-05-27 | 浙江星汉信息技术股份有限公司 | Question-answering system model training method and device based on differential privacy technology |
WO2023206777A1 (en) * | 2022-04-29 | 2023-11-02 | 浪潮(北京)电子信息产业有限公司 | Model generation method and apparatus, operation control method and apparatus, device, and storage medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200311540A1 (en) * | 2019-03-28 | 2020-10-01 | International Business Machines Corporation | Layer-Wise Distillation for Protecting Pre-Trained Neural Network Models |
CN112052456A (en) * | 2020-08-31 | 2020-12-08 | 浙江工业大学 | Deep reinforcement learning strategy optimization defense method based on multiple intelligent agents |
CN112241554A (en) * | 2020-10-30 | 2021-01-19 | 浙江工业大学 | Model stealing defense method and device based on differential privacy index mechanism |
CN112884131A (en) * | 2021-03-16 | 2021-06-01 | 浙江工业大学 | Deep reinforcement learning strategy optimization defense method and device based on simulation learning |
Non-Patent Citations (2)
Title |
---|
Liu Quan et al., "A Survey of Deep Reinforcement Learning", Chinese Journal of Computers * |
Zhao Jingwen, "Research on Privacy Protection in Deep Learning Based on Differential Privacy", China Master's Theses Full-text Database, Information Science and Technology Series * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107483486B (en) | Network defense strategy selection method based on random evolution game model | |
CN112052456A (en) | Deep reinforcement learning strategy optimization defense method based on multiple intelligent agents | |
CN112884131A (en) | Deep reinforcement learning strategy optimization defense method and device based on simulation learning | |
CN113360917A (en) | Deep reinforcement learning model security reinforcement method and device based on differential privacy | |
CN113179263A (en) | Network intrusion detection method, device and equipment | |
Zennaro et al. | Modelling penetration testing with reinforcement learning using capture‐the‐flag challenges: Trade‐offs between model‐free learning and a priori knowledge | |
CN111282267A (en) | Information processing method, information processing apparatus, information processing medium, and electronic device | |
Mo et al. | MCTSteg: A Monte Carlo tree search-based reinforcement learning framework for universal non-additive steganography | |
CN113392396A (en) | Strategy protection defense method for deep reinforcement learning | |
CN111488904A (en) | Image classification method and system based on confrontation distribution training | |
CN113420326A (en) | Deep reinforcement learning-oriented model privacy protection method and system | |
CN113033822A (en) | Antagonistic attack and defense method and system based on prediction correction and random step length optimization | |
CN115033878A (en) | Rapid self-game reinforcement learning method and device, computer equipment and storage medium | |
CN112001480A (en) | Small sample amplification method for sliding orientation data based on generation of countermeasure network | |
CN113704098B (en) | Deep learning fuzzy test method based on Monte Carlo search tree seed scheduling | |
CN110598794A (en) | Classified countermeasure network attack detection method and system | |
CN114358278A (en) | Training method and device of neural network model | |
CN111144243B (en) | Household pattern recognition method and device based on counterstudy | |
Lin et al. | An uncertainty-incorporated approach to predict the winner in StarCraft II using neural processes | |
CN116306268A (en) | Shield tunneling simulation model parameter identification method system based on federal reinforcement learning | |
CN113344071B (en) | Intrusion detection algorithm based on depth strategy gradient | |
Cranford et al. | Accounting for Uncertainty in Deceptive Signaling for Cybersecurity | |
EP4116853A1 (en) | Computer-readable recording medium storing evaluation program, evaluation method, and information processing device | |
CN114036503B (en) | Migration attack method and device, electronic equipment and storage medium | |
CN113313236B (en) | Deep reinforcement learning model poisoning detection method and device based on time sequence neural pathway |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210907 |