WO2021196701A1 - A method for responding to attacks and a federated learning device - Google Patents
A method for responding to attacks and a federated learning device
- Publication number
- WO2021196701A1 (PCT/CN2020/134270)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- intensity
- round
- monitoring
- malicious attacker
- federated learning
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Definitions
- the present invention relates to the field of Fintech technology and artificial intelligence technology, in particular to a method for responding to an attack and a federated learning device.
- Federated learning is a new type of machine learning concept that ensures the maximum protection of user privacy data through distributed training and encryption technology.
- under the federated learning mechanism, each participant contributes its encrypted data model to the alliance to jointly train a federated learning model, which is then made available to all participants.
- the present invention provides a method for responding to attacks and a federated learning device to solve the problem that no method for responding to attacks by malicious attackers exists in the prior art, so as to prevent malicious attackers from attacking the federated learning model and reduce the success rate of such attacks.
- the present invention provides a method for responding to attacks, including:
- the penalty loss of the malicious attacker is determined and sent to the malicious attacker.
- determining the penalty intensity of the alliance against the malicious attacker in this round includes:
- the penalty intensity of the alliance against the malicious attacker in this round is determined according to the loss intensity, the success rate, and the preset penalty intensity model.
- determining the attack probability of each participant's attack on the federated learning model includes:
- the parameters of the federated learning model are updated according to the model parameters, and the number of historical rounds in which the alliance has trained each participant is counted; the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained each participant;
- the attack probability of each participant attacking the federated learning model is determined.
- determining the target monitoring intensity of the previous round includes:
- the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model are used to determine the target monitoring intensity of the previous round.
- the target monitoring intensity is between a first threshold and a second threshold; determining the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, and the other quantities listed above includes:
- the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model are used to determine the maximum monitoring intensity of the previous round.
- identifying malicious attackers from the participating parties according to the attack probability includes:
- the list of malicious attackers includes the correspondence between the identification information of the malicious attacker and the attack probability of the malicious attacker;
- a malicious attacker is identified from the participating parties.
- the method further includes:
- the monitoring budget of the preset dynamic monitoring mechanism is updated according to the penalty loss, which is used to monitor the model parameters fed back by the participants in the next round of the alliance training.
- the present invention provides a federated learning device, and the federated learning device includes:
- the monitoring unit is used to monitor the model parameters fed back by each participant in this round of the alliance training according to the preset dynamic monitoring mechanism;
- the processing unit is configured to determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identify the malicious attacker from the participants according to the attack probability; determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and a preset monitoring intensity model, and determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and a preset penalty intensity model; and determine the penalty loss of the malicious attacker according to the penalty intensity;
- the sending unit is configured to send the penalty loss of the malicious attacker to the malicious attacker.
- the processing unit is specifically configured to:
- the penalty intensity of the alliance against the malicious attacker in this round is determined according to the loss intensity, the success rate, and the preset penalty intensity model.
- the processing unit is specifically configured to:
- the parameters of the federated learning model are updated according to the model parameters, and the number of historical rounds in which the alliance has trained each participant is counted; the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained each participant;
- the attack probability of each participant attacking the federated learning model is determined.
- the processing unit is specifically configured to:
- the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model are used to determine the target monitoring intensity of the previous round.
- the target monitoring intensity is between a first threshold and a second threshold; the processing unit is specifically configured to:
- the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model are used to determine the maximum monitoring intensity of the previous round.
- the processing unit is specifically configured to:
- the list of malicious attackers includes the correspondence between the identification information of the malicious attacker and the attack probability of the malicious attacker;
- a malicious attacker is identified from the participating parties.
- the monitoring unit is further configured to: update the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, which is used to monitor the model parameters fed back by the participants in the next round of alliance training.
- the present invention provides a federated learning device.
- the federated learning device includes: at least one processor and a memory; wherein the memory stores one or more computer programs; when the one or more computer programs stored in the memory are executed by the at least one processor, the federated learning device is enabled to perform the method of the above-mentioned first aspect or any possible design of the first aspect.
- the present invention provides a computer-readable storage medium storing computer instructions which, when run on a computer, enable the computer to perform the method of the first aspect or any possible design of the first aspect.
- the federated learning device monitors, through a preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training, and determines, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, so that the malicious attacker can be identified from the participants in time according to each participant's attack probability.
- the federated learning device can also determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and a preset monitoring intensity model, so that each participant can be effectively monitored with different monitoring intensities.
- the federated learning device can further determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and a preset penalty intensity model, determine the penalty loss of the malicious attacker according to the penalty intensity, and send it to the malicious attacker, thereby deterring the malicious attacker, preventing the malicious attacker from attacking the federated learning model, and effectively reducing the success rate of such attacks.
- FIG. 1 is a schematic flowchart of a method for responding to an attack provided by an embodiment of the present invention
- FIG. 2 is a schematic diagram of a process of monitoring each participant according to a preset dynamic monitoring mechanism by a federated learning device according to an embodiment of the present invention
- Figure 3 is a schematic structural diagram of a federated learning device provided by an embodiment of the present invention.
- Figure 4 is a schematic structural diagram of a federated learning device provided by an embodiment of the present invention.
- the embodiment of the present invention provides a method for responding to attacks, which fills the current gap in the federated learning field regarding how to respond to attacks by malicious attackers; at the same time, it prevents malicious attackers from attacking the federated learning model and thereby reduces the success rate of such attacks.
- the following specifically introduces the specific process of the federated learning device in the embodiment of the present invention to deal with the attack of the malicious attacker.
- FIG. 1 is a schematic flowchart of a method for responding to an attack according to an embodiment of the present invention.
- the method can be applied to a federated learning device.
- the method flow includes:
- since any participant in federated learning may launch an attack on the federated learning model, an attack can occur in any round of federated learning model training, where a round is a basic concept in federated learning; it can be understood as a fixed period of time, and each round can have multiple participants. Therefore, in the embodiment of the present invention, the federated learning device can monitor, according to the preset dynamic monitoring mechanism, the model parameters fed back by the participants in any round of alliance training, where the preset dynamic monitoring mechanism is a mechanism that dynamically monitors, with different monitoring intensities, the model parameters fed back by each participant in any round.
- for example, the federated learning device can use the monitoring intensity determined for any given round to monitor the model parameters fed back by each participant in that round of alliance training, so that the model parameters fed back by the participants can be monitored in a targeted way and each participant trained by the alliance in any round is effectively monitored. Taking this round as an example: when the monitoring intensity is 0, the federated learning device does not monitor the model parameters fed back by the participants in this round and directly aggregates the federated learning model; when the monitoring intensity is 0.5, the federated learning device monitors the model parameters fed back by half of the participants in this round, for example by randomly selecting half of the participants; when the monitoring intensity is 1, the federated learning device monitors the model parameters fed back by every participant in this round. The determination of the monitoring intensity is described in detail later.
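- a minimal sketch of how such a dynamic monitoring mechanism could select which participants to check in a given round is shown below; it assumes the monitoring intensity is a value in [0, 1] as in the 0 / 0.5 / 1 example above, and the function name and random-sampling strategy are illustrative rather than taken from the publication.

```python
import random


def select_participants_to_monitor(participants, monitoring_intensity, rng=random):
    """Pick the subset of participants whose fed-back model parameters are checked.

    monitoring_intensity is assumed to lie in [0, 1]:
      0   -> monitor nobody (aggregate the model directly),
      0.5 -> monitor a randomly chosen half of the participants,
      1   -> monitor every participant.
    """
    if monitoring_intensity <= 0:
        return []
    if monitoring_intensity >= 1:
        return list(participants)
    k = round(len(participants) * monitoring_intensity)
    return rng.sample(list(participants), k)


# Example: intensity 0.5 over four participants monitors two of them at random.
monitored = select_participants_to_monitor(["A", "B", "C", "D"], 0.5)
```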
- the federated learning device can update the parameters of the federated learning model according to the received model parameters, and count the number of historical rounds in which the alliance has trained the participants.
- the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained the participants. For example, when the federated learning device detects that the parameters of the federated learning model have been updated 20 times, it can determine that the number of historical rounds in which the alliance has trained the participants is 20.
- the federated learning device can also count and record the number of times each participant has attacked the federated learning model in each historical round. For example, the federated learning device can record the number of times each participant has attacked the federated learning model in any historical round, and subsequently, the federated learning device can count the number of times each participant has attacked the federated learning model in each historical round.
- the federated learning device can obtain the recorded number of times each participant attacked the federated learning model in each historical round, as well as the number of historical rounds in which the alliance has trained the participants, and can determine each participant's historical attack probability in each historical round from these two quantities. For example, if participant a attacked the federated learning model 0 times in each of historical rounds 1-10 and 16-20 and once in each of historical rounds 11-15, the federated learning device can determine that participant a's historical attack probability is 0 in rounds 1-10 and 16-20 and 0.05 in rounds 11-15.
- the federated learning device can determine the attack probability of each participant attacking the federated learning model according to each participant's historical attack probability in each historical round and the preset attack probability model. For example, if the preset attack probability model is formula (1), where p(a) is the participant's attack probability, a is the subscript denoting an attack, N is the number of historical rounds, and P_n is the participant's attack probability in any one historical round, the federated learning device can compute p(a) for each participant from the historical attack probabilities.
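- formula (1) is published only as an image and is not reproduced in this text; the sketch below is one plausible reading of it, inferred from the worked example above (a participant attacking once per round in rounds 11-15 of 20 rounds yields p(a) = 0.25), in which p(a) is the sum of the per-round probabilities P_n. The function and variable names are illustrative.

```python
def historical_attack_probability(attacks_in_round, total_rounds):
    """P_n: per-round historical attack probability, e.g. 1 attack over N = 20 rounds -> 0.05."""
    return attacks_in_round / total_rounds


def attack_probability(attack_counts):
    """p(a): assumed here to be the sum of the per-round probabilities P_n over the N rounds."""
    n_rounds = len(attack_counts)
    per_round = [historical_attack_probability(c, n_rounds) for c in attack_counts]
    return sum(per_round)


# Participant a: no attacks in rounds 1-10 and 16-20, one attack in each of rounds 11-15.
counts = [0] * 10 + [1] * 5 + [0] * 5
assert abs(attack_probability(counts) - 0.25) < 1e-9
```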
- in the embodiment of the present invention, the federated learning device determines each participant's attack probability against the federated learning model from the participant's historical attack probability in each historical round and the preset attack probability model, so that the malicious attacker can be identified from the participants in time according to these attack probabilities, allowing the federated learning device to subsequently punish the malicious attacker with the corresponding penalty measures and thereby prevent the malicious attacker from attacking the federated learning model.
- it should be noted that the above determines each participant's attack probability from the participant's historical attack probability in each historical round and the preset attack probability model; the federated learning device can also determine the attack probability directly from the number of historical rounds in which the alliance has trained the participants and the total number of times a participant attacked the federated learning model across those rounds.
- the federated learning device can determine the malicious attacker from the participants according to a preset list of malicious attackers and the malicious attacker's attack probability. For example, the federated learning device can record the identification information of the malicious attackers that attacked the federated learning model in each historical round and establish a correspondence between the identification information of the malicious attacker and the malicious attacker's attack probability, thereby obtaining the preset list of malicious attackers. After determining each participant's attack probability, the federated learning device can identify the malicious attacker from the participants according to the list of malicious attackers.
- for example, if the attack probability p(a) of participant a is 0.25 and the attack probability recorded for malicious attacker a in the list is also 0.25, the federated learning device can determine that participant a is malicious attacker a.
- the identification information of the malicious attacker may be any information that can uniquely identify the malicious attacker, such as account information and registration information of the malicious attacker, which is not specifically limited in the embodiment of the present invention.
- the list of malicious attackers recorded by the federated learning device can be dynamically updated, that is, the attack probability of a malicious attacker currently recorded in the list can be the attack probability corresponding to that malicious attacker in the previous round; when this round ends and the next round begins, the attack probability currently recorded in the list can be the attack probability corresponding to that malicious attacker in this round.
- by creating the list of malicious attackers, the federated learning device can identify malicious attackers from the participants in time according to each participant's attack probability, which in turn allows the federated learning device to subsequently deter the malicious attacker through its penalty loss and thus prevent the malicious attacker from attacking the federated learning model.
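- a hedged sketch of maintaining the preset list of malicious attackers as a mapping from identification information to the most recently recorded attack probability, and of matching this round's participants against it, is given below; the data structures and names are illustrative assumptions rather than the publication's own representation.

```python
def identify_malicious_attackers(participant_probs, malicious_list):
    """participant_probs: {participant_id: attack probability this round}.
    malicious_list: {attacker_id: attack probability recorded in the previous round}.

    A participant is flagged if its identification information appears in the list;
    the list entry is then refreshed with this round's probability (dynamic update).
    """
    flagged = {}
    for pid, prob in participant_probs.items():
        if pid in malicious_list:
            flagged[pid] = prob
            malicious_list[pid] = prob  # keep the list current for the next round
    return flagged


malicious_list = {"a": 0.25}
print(identify_malicious_attackers({"a": 0.25, "b": 0.0}, malicious_list))  # {'a': 0.25}
```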
- S103. Determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and the preset monitoring intensity model, and determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and the preset penalty intensity model.
- the federated learning device can set the monitoring cost of any round, so that the alliance can provide the monitoring budget of that round according to its monitoring cost, thereby solving the problem of an insufficient monitoring budget.
- the monitoring cost of any round is calculated by the federated learning device from the monitoring intensity of that round.
- for example, the historical monitoring cost of the previous round is calculated from the historical monitoring intensity of the previous round.
- the monitoring cost of any round can be expressed as c(r) = βr (formula (2)), where β is a constant greater than 0 whose specific value can be set according to actual needs, and r is the monitoring intensity of that round.
- the federated learning device can record the historical monitoring intensity of each historical round in order to calculate the historical monitoring cost of each historical round (the specific calculation process is described in detail later). After that, the federated learning device can determine, according to the historical monitoring cost corresponding to each historical round, the success rate of the malicious attacker attacking the federated learning model in each historical round. The success rate of the malicious attacker attacking the federated learning model in each historical round can be expressed by formula (3), where r is the historical monitoring cost of each historical round, e is Euler's number (approximately 2.718281828, usually taken as e = 2.7), and θ is a constant greater than 0 whose specific value can be set according to actual needs and is not specifically limited in the embodiment of the present invention.
- the federated learning device may also record the duration for which it monitors the model parameters fed back by the participants in each historical round of alliance training. After that, the federated learning device can determine, according to the recorded durations, the loss intensity suffered by the alliance when the federated learning model was attacked by a malicious attacker in each historical round.
- the loss intensity suffered by the alliance when the federated learning model was attacked by a malicious attacker in each historical round can be expressed as v(t) = λt (formula (4)), where λ is a constant greater than 0 whose specific value can be set according to actual needs, and t is the duration for which the federated learning device monitors the model parameters fed back by each participant in each historical round of alliance training.
- after that, the federated learning device can determine the historical loss of the alliance in each historical round according to the historical monitoring cost of each historical round, the success rate of the malicious attacker attacking the federated learning model in each historical round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in each historical round, and the attack probability of each participant in each historical round. Specifically, the historical loss of the alliance in each historical round can be expressed by formula (5).
- that is, the federated learning device can calculate the historical monitoring cost of the alliance in the previous round according to the above formula (2), calculate the success rate of the malicious attacker attacking the federated learning model in the previous round according to the above formula (3), and calculate the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round according to the above formula (4).
- after that, the federated learning device can calculate the historical loss of the alliance in the previous round according to the above formula (5), using the historical monitoring cost of the alliance in the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked in the previous round, and the attack probability of the malicious attacker in the previous round.
- after determining the historical loss of the alliance in the previous round, the federated learning device can determine the target monitoring intensity of the previous round according to that historical loss and the preset monitoring intensity model. For example, in order to minimize the historical loss of the alliance, the federated learning device can differentiate the above formula (5) with respect to r and set the result to 0, which yields the preset monitoring intensity model as formula (6).
- the federated learning device can then determine the target monitoring intensity of the previous round according to the above formula (6).
- the federated learning device can set a value range for the target monitoring intensity of the previous round, that is, set upper and lower limits for it, where the lower limit of the target monitoring intensity is the first threshold and the upper limit is the second threshold.
- specifically, the first threshold can be set to 0 and the second threshold can be set to 1.
- the federated learning device can then obtain the value of the target monitoring intensity of the previous round from the above formula (6), which can be expressed by formula (7).
- that is, the federated learning device can determine the maximum monitoring intensity of the previous round according to the above formula (6), and then determine the target monitoring intensity of the previous round according to the above formula (7). For example, after determining the maximum monitoring intensity of the previous round, the federated learning device determines the target monitoring intensity of the previous round by comparing the maximum monitoring intensity with the second threshold: if the maximum monitoring intensity of the previous round is greater than or equal to the second threshold, the second threshold is used as the target monitoring intensity of the previous round; otherwise, the maximum monitoring intensity of the previous round is used as the target monitoring intensity of the previous round. For example, if the maximum monitoring intensity of the previous round is r_max = 0.6, which is less than 1, the target monitoring intensity of the previous round is 0.6; if r_max = 2, which is greater than 1, the target monitoring intensity of the previous round is 1.
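- the thresholding step of formula (7) can be sketched as a simple clamp, as below; the maximum monitoring intensity r_max produced by formula (6) is taken as an input because that formula is published only as an image, and the clamping at the lower bound is an assumption consistent with the target intensity lying between the first and second thresholds.

```python
def target_monitoring_intensity(r_max, first_threshold=0.0, second_threshold=1.0):
    """Clamp the maximum monitoring intensity from formula (6) into
    [first_threshold, second_threshold], as described for formula (7)."""
    if r_max >= second_threshold:
        return second_threshold
    if r_max <= first_threshold:
        return first_threshold
    return r_max


assert target_monitoring_intensity(0.6) == 0.6   # below the upper limit: kept as-is
assert target_monitoring_intensity(2.0) == 1.0   # above the upper limit: capped at 1
```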
- after determining the target monitoring intensity of the previous round, the federated learning device can determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and the preset penalty intensity model.
- specifically, the federated learning device can determine the duration of monitoring the model parameters fed back by each participant in this round of alliance training and then, according to the above formula (4), determine the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in this round.
- the federated learning device can also determine the success rate of the malicious attacker attacking the federated learning model in this round according to the target monitoring intensity of the previous round and the above formula (3).
- after that, the federated learning device can determine the penalty intensity of the alliance against the malicious attacker in this round according to the loss intensity suffered by the alliance when the federated learning model is attacked in this round, the success rate of the malicious attacker attacking the federated learning model in this round, and the preset penalty intensity model.
- for example, from the malicious attacker's point of view, the benefit obtained from successfully attacking the federated learning model can be expressed by formula (8), where c_a(t) is the penalty intensity of the alliance against the malicious attacker in this round, and t here is the duration of monitoring the model parameters fed back by each participant in this round of alliance training.
- in the embodiment of the present invention, in order to make it unprofitable for the malicious attacker to attack the federated learning model, the federated learning device sets the preset penalty intensity model accordingly; the preset penalty intensity model can be expressed by formula (9).
- that is, the federated learning device can determine the penalty intensity of the alliance against the malicious attacker in this round according to the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in this round, the success rate of the malicious attacker attacking the federated learning model in this round, and the above formula (9).
- after determining the penalty intensity of the alliance against the malicious attacker in this round, the federated learning device can determine the penalty loss of the malicious attacker.
- for example, the federated learning device can use the penalty intensity of the alliance against the malicious attacker in this round directly as the penalty loss of the malicious attacker, or it can use the product of that penalty intensity and a preset multiple as the penalty loss of the malicious attacker, where the preset multiple is greater than 1.
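- the two options just described for deriving the penalty loss can be sketched as follows; the function name and the example values are illustrative, and only the constraint that the preset multiple exceed 1 comes from the description.

```python
def penalty_loss(penalty_intensity, preset_multiple=None):
    """Penalty loss sent to the malicious attacker: either the round's penalty
    intensity itself, or that intensity scaled by a preset multiple (> 1)."""
    if preset_multiple is None:
        return penalty_intensity
    if preset_multiple <= 1:
        raise ValueError("the preset multiple must be greater than 1")
    return penalty_intensity * preset_multiple


print(penalty_loss(5.0))        # 5.0  (use the penalty intensity directly)
print(penalty_loss(5.0, 1.5))   # 7.5  (scaled by a preset multiple)
```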
- the federated learning device can then send the penalty loss of the malicious attacker to the malicious attacker, for example, to the terminal corresponding to the malicious attacker, in order to punish and thereby deter the malicious attacker.
- the terminal may be any device that can participate in federated learning, such as a mobile phone or a tablet, which is not limited in the embodiment of the present invention.
- the above takes, as an example, the case where the federated learning device sends the penalty loss of the malicious attacker to the malicious attacker.
- of course, the federated learning device can also publish the penalty imposed by the alliance on the malicious attacker in this round on the federated learning platform in order to deter malicious attackers; to a certain extent this can discourage malicious attackers from attacking the federated learning model, thereby achieving the purpose of preventing such attacks and reducing the success rate of malicious attackers attacking the federated learning model.
- after determining the penalty loss of the malicious attacker, the federated learning device updates the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss; the updated budget is used to monitor the model parameters fed back by the participants in the next round of alliance training, so as to ensure that such monitoring can actually be carried out.
- for example, when the federated learning device has calculated the monitoring cost of this round in the manner described above, it can update the monitoring budget of the preset dynamic monitoring mechanism by combining the penalty loss of the malicious attacker with the monitoring cost of this round, so as to guarantee that the model parameters fed back by the participants in the next round of alliance training can be monitored.
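- the description says only that the penalty loss and this round's monitoring cost are combined to update the budget, without giving the combination rule; the sketch below assumes, purely for illustration, that the collected penalty is added and the cost already spent is subtracted.

```python
def update_monitoring_budget(current_budget, penalty_loss, this_round_cost):
    """Update the preset dynamic monitoring mechanism's budget for the next round.

    Assumed combination rule (not stated in the publication): add the penalty
    collected from the malicious attacker and subtract the monitoring cost
    already spent in this round.
    """
    return current_budget + penalty_loss - this_round_cost


next_budget = update_monitoring_budget(current_budget=10.0, penalty_loss=7.5,
                                        this_round_cost=1.2)   # 16.3
```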
- the following specifically introduces the specific process of the federated learning device dynamically monitoring each participant according to the preset dynamic monitoring mechanism.
- the federated learning device may include an attack probability calculation module 200, an alliance budget management module 201, a monitoring intensity decision module 202, a penalty intensity decision module 203, and a federated learning result monitoring module 204.
- the attack probability calculation module 200 can receive, from the federated learning result monitoring module 204, the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round, and use them to calculate each participant's attack probability. After that, the attack probability calculation module 200 can identify the malicious attacker from the participants according to each participant's attack probability and update the stored preset list of malicious attackers. The attack probability calculation module 200 then sends each participant's attack probability and the identification information of the malicious attacker to the monitoring intensity decision module 202.
- the alliance budget management module 201 is used to record the monitoring budgets corresponding to the different monitoring intensities of the preset dynamic monitoring mechanism, and sends the results to the monitoring intensity decision module 202.
- for example, the alliance budget management module 201 can determine the monitoring budget of the preset dynamic monitoring mechanism for this round based on the penalty loss imposed on the malicious attacker in the previous round and the monitoring cost of the previous round, and send the monitoring budget of the preset dynamic monitoring mechanism for this round to the monitoring intensity decision module 202.
- the monitoring intensity decision module 202 provides the system administrator of the federated learning device with decision support for deploying monitoring intensities against malicious attackers within the range allowed by the monitoring budget of the preset dynamic monitoring mechanism. That is, within the range allowed by the monitoring budget of the preset dynamic monitoring mechanism, the monitoring intensity decision module 202 can provide the system administrator with the monitoring intensity against the malicious attacker for any round, monitor the model parameters fed back by each participant accordingly, and send the duration of monitoring the model parameters fed back by each participant in that round, together with the provided monitoring intensity, to the penalty intensity decision module 203 and the federated learning result monitoring module 204.
- the calculation of monitoring intensity can refer to the above content, which will not be repeated here.
- the monitoring intensity decision module 202 can also provide the system administrator of the federated learning device with the monitoring cost of any round, and send the provided monitoring cost to the federated learning result monitoring module 204.
- the penalty intensity decision module 203 can calculate the penalty intensity against the malicious attacker in this round according to the monitoring intensity sent by the monitoring intensity decision module 202 and the duration of monitoring the model parameters fed back by each participant in this round of alliance training, and send the penalty intensity against the malicious attacker in this round to the federated learning result monitoring module 204.
- the calculation of the penalty intensity can refer to the content above and is not repeated here.
- the federated learning result monitoring module 204 may invite each participant to take part in this round of federated learning training. After that, the federated learning result monitoring module 204 can monitor each participant according to the monitoring intensity sent by the monitoring intensity decision module 202, and send the identification information of the participants detected attacking the federated learning model during this round of federated learning training, together with the number of such attacks, to the attack probability calculation module 200 for updating each participant's attack probability.
- the federated learning result monitoring module 204 can also announce to the participants, during this round of federated learning training, the penalty imposed on the malicious attacker in this round, and send the penalty loss imposed on the malicious attacker in this round to the malicious attacker, in order to punish the malicious attacker.
- the federated learning result monitoring module 204 can also send the penalty loss imposed on the malicious attacker in this round and the monitoring cost of this round to the alliance budget management module 201 to update the monitoring budget of the preset dynamic monitoring mechanism for the next round.
- in the embodiment of the present invention, the federated learning device monitors, through a preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training, and determines, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, so that the malicious attacker can be identified from the participants in time according to each participant's attack probability.
- the federated learning device can also determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and the preset monitoring intensity model, so that each participant can be effectively monitored with different monitoring intensities.
- the federated learning device can further determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and the preset penalty intensity model, determine the penalty loss of the malicious attacker according to the penalty intensity, and send it to the malicious attacker, thereby deterring the malicious attacker, preventing the malicious attacker from attacking the federated learning model, and effectively reducing the success rate of such attacks.
- FIG. 3 is a schematic structural diagram of a federated learning device according to an embodiment of the present invention.
- the federated learning device 300 includes:
- the monitoring unit 301 is used to monitor the model parameters fed back by each participant in the training round of the alliance according to the preset dynamic monitoring mechanism;
- the processing unit 302 is configured to determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identify the malicious attacker from the participants according to the attack probability; determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and a preset monitoring intensity model, and determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and a preset penalty intensity model; and determine the penalty loss of the malicious attacker according to the penalty intensity;
- the sending unit 303 is configured to send the penalty loss of the malicious attacker to the malicious attacker.
- processing unit 302 is specifically configured to:
- the penalty intensity of the alliance against the malicious attacker in this round is determined according to the loss intensity, the success rate, and the preset penalty intensity model.
- processing unit 302 is specifically configured to:
- the parameters of the federated learning model are updated according to the model parameters, and the number of historical rounds in which the alliance has trained each participant is counted; the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained each participant;
- the attack probability of each participant attacking the federated learning model is determined.
- processing unit 302 is specifically configured to:
- the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model are used to determine the target monitoring intensity of the previous round.
- the target monitoring intensity is between a first threshold and a second threshold; the processing unit 302 is specifically configured to:
- determine the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model; if the maximum monitoring intensity is greater than or equal to the second threshold, use the second threshold as the target monitoring intensity; otherwise, use the maximum monitoring intensity as the target monitoring intensity.
- processing unit 302 is specifically configured to:
- the list of malicious attackers includes the correspondence between the identification information of the malicious attacker and the attack probability of the malicious attacker;
- a malicious attacker is identified from the participating parties.
- the monitoring unit 301 is further configured to: update the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, which is used to monitor the model parameters fed back by the participants in the next round of alliance training.
- the federated learning device 300 in the embodiment of the present invention and the method for responding to an attack shown in FIG. 1 are based on the same inventive concept. From the detailed description of the method for responding to an attack, those skilled in the art can clearly understand the implementation process of the federated learning device 300 in this embodiment, so it is not repeated here for brevity.
- the present invention also provides a federated learning device.
- FIG. 4 is a schematic structural diagram of a federated learning device according to an embodiment of the present invention.
- the federated learning device 400 includes: a transceiver 401, a processor 402, and a memory 403;
- the memory 403 stores one or more executable programs, which are used to configure the processor 402;
- the processor 402 is configured to monitor, according to the preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training; determine, according to the model parameters and the preset attack probability model, the attack probability of each participant attacking the federated learning model, and identify the malicious attacker from the participants according to the attack probability; determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and the preset monitoring intensity model; and determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and the preset penalty intensity model;
- the transceiver 401 is configured to determine the penalty loss of the malicious attacker according to the penalty intensity and send it to the malicious attacker.
- the processor 402 is specifically configured to: determine the duration of monitoring the model parameters fed back by each participant in this round of alliance training; determine, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in this round; determine, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in this round; and determine, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity of the alliance against the malicious attacker in this round.
- the processor 402 is specifically configured to: update the parameters of the federated learning model according to the model parameters, and count the number of historical rounds in which the alliance has trained the participants, the number of updates of the parameters of the federated learning model being equal to the number of historical rounds in which the alliance has trained the participants; obtain the recorded number of times each participant attacked the federated learning model in each historical round; determine each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round; and determine, according to each participant's historical attack probability in each historical round and the preset attack probability model, the attack probability of each participant attacking the federated learning model.
- the processor 402 is specifically configured to: determine the historical loss of the alliance in the previous round, the historical monitoring cost of the alliance in the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round; and determine the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model.
- the processor 402 is specifically configured to: determine the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model; judge whether the maximum monitoring intensity is greater than or equal to the second threshold; if the maximum monitoring intensity is greater than or equal to the second threshold, use the second threshold as the target monitoring intensity; otherwise, use the maximum monitoring intensity as the target monitoring intensity.
- the processor 402 is specifically configured to: obtain a preset list of malicious attackers; the list of malicious attackers includes identification information of the malicious attacker and the attack probability of the malicious attacker Correspondence between; according to the attack probability and the list of malicious attackers, identify malicious attackers from the participating parties.
- the processor 402 is further configured to: update the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, for monitoring the model fed back by each participant in the next round of the alliance training parameter.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Bioethics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Computer And Data Communications (AREA)
Abstract
A method for responding to attacks and a federated learning device. The method includes: monitoring, according to a preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training (S101); determining, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identifying a malicious attacker from the participants according to the attack probability (S102); determining the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and a preset monitoring intensity model, and determining the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and a preset penalty intensity model (S103); and determining, according to the penalty intensity, the penalty loss of the malicious attacker and sending it to the malicious attacker, so as to deter the malicious attacker, prevent the malicious attacker from attacking the federated learning model, and effectively reduce the success rate of the malicious attacker attacking the federated learning model (S104).
Description
Cross-reference to related applications

This application claims priority to the Chinese patent application No. 202010243325.8, filed with the Chinese Patent Office on March 31, 2020 and entitled "A method for responding to attacks and a federated learning device", the entire contents of which are incorporated herein by reference.

The present invention relates to the field of financial technology (Fintech) and the field of artificial intelligence, and in particular to a method for responding to attacks and a federated learning device.

Federated learning is a new machine learning concept that ensures user privacy data is protected to the greatest extent through distributed training and encryption technology. Under the federated learning mechanism, each participant contributes its encrypted data model to the alliance to jointly train a federated learning model, which is then made available to all participants.

However, in this process, a malicious attacker may attack the federated learning model in the hope of obtaining some special benefit. At present, there is no method for responding to attacks by malicious attackers, so the purpose of preventing malicious attackers from attacking the federated learning model cannot be achieved, and the success rate of such attacks is high.

Therefore, how to respond to attacks by malicious attackers has become an urgent problem to be solved.

Summary of the invention

The present invention provides a method for responding to attacks and a federated learning device, which are used to solve the problem that no method for responding to attacks by malicious attackers exists in the prior art, so as to prevent malicious attackers from attacking the federated learning model and thereby reduce the success rate of malicious attackers attacking the federated learning model.
To achieve the above objective, in a first aspect, the present invention provides a method for responding to attacks, including:

monitoring, according to a preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training;

determining, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identifying a malicious attacker from the participants according to the attack probability;

determining the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and a preset monitoring intensity model, and determining the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and a preset penalty intensity model;

determining, according to the penalty intensity, the penalty loss of the malicious attacker and sending it to the malicious attacker.

In a possible design, determining the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and the preset penalty intensity model includes:

determining the duration of monitoring the model parameters fed back by each participant in this round of alliance training;

determining, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in this round;

determining, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in this round;

determining, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity of the alliance against the malicious attacker in this round.

In a possible design, determining, according to the model parameters and the preset attack probability model, the attack probability of each participant attacking the federated learning model includes:

updating the parameters of the federated learning model according to the model parameters, and counting the number of historical rounds in which the alliance has trained the participants, the number of updates of the parameters of the federated learning model being equal to the number of historical rounds in which the alliance has trained the participants;

obtaining the recorded number of times each participant attacked the federated learning model in each historical round;

determining each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round;

determining, according to each participant's historical attack probability in each historical round and the preset attack probability model, the attack probability of each participant attacking the federated learning model.

In a possible design, determining the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and the preset monitoring intensity model includes:

determining the historical loss of the alliance in the previous round, the historical monitoring cost of the alliance in the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round;

determining the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model.

In a possible design, the target monitoring intensity is between a first threshold and a second threshold; determining the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model includes:

determining the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model;

judging whether the maximum monitoring intensity is greater than or equal to the second threshold;

if the maximum monitoring intensity is greater than or equal to the second threshold, using the second threshold as the target monitoring intensity;

otherwise, using the maximum monitoring intensity as the target monitoring intensity.

In a possible design, identifying the malicious attacker from the participants according to the attack probability includes:

obtaining a preset list of malicious attackers, the list of malicious attackers including the correspondence between the identification information of the malicious attacker and the attack probability of the malicious attacker;

identifying the malicious attacker from the participants according to the attack probability and the list of malicious attackers.

In a possible design, after determining, according to the penalty intensity, the penalty loss of the malicious attacker, the method further includes:

updating the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, the updated budget being used to monitor the model parameters fed back by the participants in the next round of alliance training.
In a second aspect, the present invention provides a federated learning device, the federated learning device including:

a monitoring unit, configured to monitor, according to a preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training;

a processing unit, configured to determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identify a malicious attacker from the participants according to the attack probability; determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and a preset monitoring intensity model, and determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and a preset penalty intensity model; and determine the penalty loss of the malicious attacker according to the penalty intensity;

a sending unit, configured to send the penalty loss of the malicious attacker to the malicious attacker.

In a possible design, the processing unit is specifically configured to:

determine the duration of monitoring the model parameters fed back by each participant in this round of alliance training;

determine, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in this round;

determine, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in this round;

determine, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity of the alliance against the malicious attacker in this round.

In a possible design, the processing unit is specifically configured to:

update the parameters of the federated learning model according to the model parameters, and count the number of historical rounds in which the alliance has trained the participants, the number of updates of the parameters of the federated learning model being equal to the number of historical rounds in which the alliance has trained the participants;

obtain the recorded number of times each participant attacked the federated learning model in each historical round;

determine each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round;

determine, according to each participant's historical attack probability in each historical round and the preset attack probability model, the attack probability of each participant attacking the federated learning model.

In a possible design, the processing unit is specifically configured to:

determine the historical loss of the alliance in the previous round, the historical monitoring cost of the alliance in the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round;

determine the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model.

In a possible design, the target monitoring intensity is between a first threshold and a second threshold; the processing unit is specifically configured to:

determine the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the attack probability of the malicious attacker in the previous round, and the preset monitoring intensity model;

judge whether the maximum monitoring intensity is greater than or equal to the second threshold;

if the maximum monitoring intensity is greater than or equal to the second threshold, use the second threshold as the target monitoring intensity;

otherwise, use the maximum monitoring intensity as the target monitoring intensity.

In a possible design, the processing unit is specifically configured to:

obtain a preset list of malicious attackers, the list of malicious attackers including the correspondence between the identification information of the malicious attacker and the attack probability of the malicious attacker;

identify the malicious attacker from the participants according to the attack probability and the list of malicious attackers.

In a possible design, the monitoring unit is further configured to:

update the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, the monitoring budget being greater than the target monitoring intensity and being used to monitor the model parameters fed back by the participants in the next round of alliance training.
In a third aspect, the present invention provides a federated learning device, the federated learning device including at least one processor and a memory, wherein the memory stores one or more computer programs; when the one or more computer programs stored in the memory are executed by the at least one processor, the federated learning device is enabled to perform the method of the first aspect or any possible design of the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium storing computer instructions which, when run on a computer, enable the computer to perform the method of the first aspect or any possible design of the first aspect.

The beneficial effects of the present invention are as follows:

In the technical solution provided by the present invention, the federated learning device monitors, through a preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training, and determines, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, so that the malicious attacker can be identified from the participants in time according to each participant's attack probability. The federated learning device can also determine the target monitoring intensity of the previous round according to the attack probability of the malicious attacker in the previous round and a preset monitoring intensity model, so that each participant can be effectively monitored with different monitoring intensities. The federated learning device can further determine the penalty intensity of the alliance against the malicious attacker in this round according to the target monitoring intensity of the previous round and a preset penalty intensity model, determine the penalty loss of the malicious attacker according to the penalty intensity, and send it to the malicious attacker, thereby deterring the malicious attacker, preventing the malicious attacker from attacking the federated learning model, and effectively reducing the success rate of the malicious attacker attacking the federated learning model.
In order to explain the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for responding to an attack according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a process in which a federated learning device monitors each participant according to a preset dynamic monitoring mechanism, according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a federated learning device according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a federated learning device according to an embodiment of the present invention.
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

In the embodiments of the present invention, the term "include" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.

As can be seen from the foregoing, there is currently no method for responding to attacks by malicious attackers; therefore, attacks by malicious attackers on the federated learning model cannot be prevented, and the success rate of such attacks is high. To solve this problem, an embodiment of the present invention provides a method for responding to attacks, which fills the current gap in the federated learning field regarding how to respond to attacks by malicious attackers and at the same time achieves the purpose of preventing malicious attackers from attacking the federated learning model, thereby reducing the success rate of malicious attackers attacking the federated learning model.

The specific process by which the federated learning device in the embodiment of the present invention responds to an attack by a malicious attacker is described in detail below.
For example, please refer to FIG. 1, which is a schematic flowchart of a method for responding to an attack according to an embodiment of the present invention. The method can be applied to a federated learning device. As shown in FIG. 1, the method flow includes:

S101. Monitor, according to a preset dynamic monitoring mechanism, the model parameters fed back by each participant in this round of alliance training.

Optionally, since any participant in federated learning may launch an attack on the federated learning model, an attack can occur in any round of federated learning model training. Here, a round is a basic concept in federated learning; it can be understood as a fixed period of time, and each round can have multiple participants. Therefore, in the embodiment of the present invention, the federated learning device can monitor, according to the preset dynamic monitoring mechanism, the model parameters fed back by the participants in any round of alliance training, where the preset dynamic monitoring mechanism is a mechanism that dynamically monitors, with different monitoring intensities, the model parameters fed back by each participant in any round.

For example, the federated learning device can use the monitoring intensity determined for any given round to monitor the model parameters fed back by each participant in that round of alliance training, so that the model parameters fed back by the participants can be monitored in a targeted way and the participants trained by the alliance in any round are effectively monitored. Taking this round as an example: when the monitoring intensity is 0, the federated learning device does not monitor the model parameters fed back by the participants in this round and directly aggregates the federated learning model; when the monitoring intensity is 0.5, the federated learning device monitors the model parameters fed back by half of the participants in this round, for example by randomly monitoring the model parameters fed back by half of the participants; when the monitoring intensity is 1, the federated learning device monitors the model parameters fed back by every participant in the current round of alliance training. The determination of the monitoring intensity is described in detail later.
S102. Determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identify a malicious attacker from the participants according to the attack probability.

In a specific implementation process, after receiving the model parameters fed back by the participants, the federated learning device can update the parameters of the federated learning model according to the received model parameters and count the number of historical rounds in which the alliance has trained the participants, where the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained the participants. For example, when the federated learning device detects that the parameters of the federated learning model have been updated 20 times, it can determine that the number of historical rounds in which the alliance has trained the participants is 20.

In a specific implementation process, the federated learning device can also count and record the number of times each participant has attacked the federated learning model in each historical round. For example, the federated learning device can record the number of times each participant attacked the federated learning model in any given historical round; subsequently, it can count the number of times each participant attacked the federated learning model across the historical rounds.

Optionally, the federated learning device can obtain the recorded number of times each participant attacked the federated learning model in each historical round, as well as the number of historical rounds in which the alliance has trained the participants, and determine each participant's historical attack probability in each historical round from these two quantities. For example, taking participant a as an example, if participant a attacked the federated learning model 0 times in each of historical rounds 1-10 and 16-20 and once in each of historical rounds 11-15, the federated learning device can determine that participant a's historical attack probability is 0 in rounds 1-10 and 16-20 and 0.05 in rounds 11-15.
After that, the federated learning device can determine, according to each participant's historical attack probability in each historical round and the preset attack probability model, the attack probability of each participant attacking the federated learning model. For example, the preset attack probability model can be formula (1),

where p(a) is the participant's attack probability, a is the subscript denoting an attack, N is the number of historical rounds, and P_n is the participant's attack probability in any one historical round.

Then the federated learning device can determine each participant's attack probability according to the participant's historical attack probabilities in each historical round and the above formula (1). For example, still taking participant a as an example, the federated learning device can determine, according to participant a's historical attack probability in each historical round and the above formula (1), that participant a's attack probability is p(a) = 0.25.

In the embodiment of the present invention, the federated learning device determines each participant's attack probability against the federated learning model from the participant's historical attack probability in each historical round and the preset attack probability model, so that the malicious attacker can be identified from the participants in time according to these attack probabilities, allowing the federated learning device to subsequently punish the malicious attacker with the corresponding penalty measures and thereby prevent the malicious attacker from attacking the federated learning model.

It should be noted that the above takes, as an example, the case where the federated learning device determines each participant's attack probability from the participant's historical attack probability in each historical round and the preset attack probability model. Of course, the federated learning device can also determine participant a's attack probability directly from the number of historical rounds in which the alliance has trained the participants and the total number of times the participant attacked the federated learning model across the historical rounds. For example, still taking participant a as an example, if participant a attacked the federated learning model a total of 5 times in 20 historical rounds, the federated learning device determines participant a's attack probability to be p(a) = 5/20 = 0.25.
In a specific implementation process, the federated learning device can determine the malicious attacker from the participants according to a preset list of malicious attackers and the malicious attacker's attack probability. For example, the federated learning device can record the identification information of the malicious attackers that attacked the federated learning model in each historical round and establish a correspondence between the identification information of the malicious attacker and the malicious attacker's attack probability, thereby obtaining the preset list of malicious attackers. After determining each participant's attack probability, the federated learning device can identify the malicious attacker from the participants according to the list. For example, if the attack probability p(a) of participant a is 0.25 and the attack probability p(a) recorded in the list for malicious attacker a is 0.25, the federated learning device can determine that participant a is malicious attacker a. The identification information of the malicious attacker may be any information that can uniquely identify the malicious attacker, such as the malicious attacker's account information or registration information, which is not specifically limited in the embodiment of the present invention.

In the embodiment of the present invention, the list of malicious attackers recorded by the federated learning device can be dynamically updated, that is, the attack probability of a malicious attacker currently recorded in the list can be the attack probability corresponding to that malicious attacker in the previous round; when this round ends and the next round begins, the attack probability currently recorded in the list can be the attack probability corresponding to that malicious attacker in this round.

In the embodiment of the present invention, by creating the list of malicious attackers, the federated learning device can identify malicious attackers from the participants in time according to each participant's attack probability, which in turn allows the federated learning device to subsequently deter the malicious attacker through its penalty loss and thus prevent the malicious attacker from attacking the federated learning model.
S103、根据所述上一回合恶意攻击方的攻击几率和预设监测力度模型,确定上一回合 的目标监测力度,并根据所述上一回合的目标监测力度和预设惩罚力度模型,确定本回合联盟针对所述恶意攻击方的惩罚力度。
In a specific implementation, if the monitoring budget is insufficient, the federated learning device cannot carry out monitoring of the model parameters fed back by the participants trained by the alliance in a given round. Therefore, in the embodiments of the present invention, the federated learning device may set a monitoring cost for each round, so that the alliance can allocate a monitoring budget for each round according to that cost, thereby solving the problem of an insufficient monitoring budget. The monitoring cost of any round is calculated by the federated learning device from the monitoring intensity of that round; for example, the historical monitoring cost of the previous round is calculated from the historical monitoring intensity of the previous round. Specifically, the monitoring cost of any round can be expressed as:
c(r)=βr (2)
where β is a constant greater than 0 whose specific value can be set according to actual requirements, and r is the monitoring intensity of the round in question.
In a specific implementation, the federated learning device may record the historical monitoring intensity of each historical round so that the historical monitoring cost of each historical round can be calculated (the specific calculation is described in detail later). The federated learning device may then determine, from the historical monitoring intensity and the corresponding historical monitoring cost of each historical round, the success rate of the malicious attacker attacking the federated learning model in that round. For example, denoting this success rate as s(r), it may take an exponentially decaying form such as:
s(r) = e^(-θr)    (3)
where r is the historical monitoring intensity of the round in question (from which the historical monitoring cost is obtained via formula (2)), e is Euler's number, approximately 2.718281828 and commonly taken as e = 2.7, and θ is a constant greater than 0 whose specific value can be set according to actual requirements and is not specifically limited in the embodiments of the present invention.
Optionally, the federated learning device may also record the duration for which it monitors the model parameters fed back by the participants trained by the alliance in each historical round. The federated learning device may then determine, according to this recorded duration, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in each historical round. This loss intensity can be expressed as:
v(t)=λt (4)
where λ is a constant greater than 0 whose specific value can be set according to actual requirements, and t is the duration for which the federated learning device monitors the model parameters fed back by the participants trained by the alliance in the historical round in question.
The federated learning device may then determine the alliance's historical loss in each historical round according to that round's historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in that round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in that round, and each participant's attack probability in that round. Specifically, the alliance's historical loss in each historical round can be expressed, for example, as:
L(r) = c(r) + p(a)·s(r)·v(t)    (5)
that is, the monitoring cost of the round plus the expected damage of an attack that is both launched and successful.
That is, the federated learning device may calculate the alliance's historical monitoring cost of the previous round according to formula (2), the success rate of the malicious attacker attacking the federated learning model in the previous round according to formula (3), and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round according to formula (4). The federated learning device may then calculate the alliance's historical loss of the previous round according to formula (5), using the previous round's historical monitoring cost, the previous round's attack success rate, the previous round's loss intensity, and the malicious attacker's attack probability in the previous round.
Optionally, after determining the alliance's historical loss of the previous round, the federated learning device may determine the target monitoring intensity of the previous round according to that historical loss and the preset monitoring intensity model. For example, in order to minimise the alliance's historical loss, the federated learning device may differentiate formula (5) with respect to r and set the result to 0, which with formulas (2) to (4) gives the preset monitoring intensity model in the form:
r_max = (1/θ)·ln(θ·p(a)·λt/β)    (6)
The federated learning device may then determine the target monitoring intensity of the previous round according to formula (6).
In a specific implementation, the federated learning device may set a value range for the target monitoring intensity of the previous round, i.e. set lower and upper limits for it, where the lower limit of the target monitoring intensity is a first threshold and the upper limit is a second threshold; specifically, the first threshold may be set to 0 and the second threshold may be set to 1. Based on formula (6), the target monitoring intensity of the previous round can then be expressed as:
r = r_max when 0 ≤ r_max < 1, and r = 1 when r_max ≥ 1    (7)
That is, the federated learning device may determine the maximum monitoring intensity of the previous round according to formula (6), and then determine the target monitoring intensity of the previous round according to formula (7). For example, after determining the maximum monitoring intensity of the previous round, the federated learning device determines the target monitoring intensity by comparing the maximum monitoring intensity with the second threshold: if the maximum monitoring intensity is greater than or equal to the second threshold, the second threshold is taken as the target monitoring intensity of the previous round; otherwise, the maximum monitoring intensity itself is taken as the target monitoring intensity. For instance, if the maximum monitoring intensity of the previous round is r_max = 0.6, which is less than 1, the target monitoring intensity of the previous round is 0.6; if the maximum monitoring intensity is r_max = 2, which is greater than 1, the target monitoring intensity of the previous round is 1.
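The following sketch strings formulas (2) through (7) together for the previous round; the exponential success-rate form, the closed-form expression for the maximum monitoring intensity, and all parameter values are assumptions used only for illustration.

```python
import math

BETA = 1.0    # beta  > 0: monitoring-cost coefficient (illustrative value)
THETA = 2.0   # theta > 0: success-rate decay          (illustrative value)
LAM = 0.1     # lambda > 0: loss-intensity coefficient (illustrative value)

def monitoring_cost(r):
    """Formula (2): cost of monitoring a round at intensity r."""
    return BETA * r

def attack_success_rate(r):
    """Assumed form of formula (3): attacks slip through less often
    as the monitoring intensity r increases."""
    return math.exp(-THETA * r)

def loss_intensity(t):
    """Formula (4): loss suffered by the alliance, growing with the
    monitored training duration t."""
    return LAM * t

def alliance_loss(r, t, p_attack):
    """Assumed form of formula (5): monitoring cost plus the expected
    damage of an attack that is both launched and successful."""
    return monitoring_cost(r) + p_attack * attack_success_rate(r) * loss_intensity(t)

def target_monitoring_intensity(t, p_attack, lower=0.0, upper=1.0):
    """Formulas (6)-(7): the loss-minimising intensity r_max, clamped to
    [first threshold, second threshold] = [0, 1]."""
    if p_attack <= 0:
        return lower
    r_max = (1.0 / THETA) * math.log(THETA * p_attack * LAM * t / BETA)
    return min(max(r_max, lower), upper)

# Previous round: monitored duration t = 30, attacker probability p(a) = 0.25.
r_target = target_monitoring_intensity(t=30, p_attack=0.25)
print(r_target, alliance_loss(r_target, 30, 0.25))
```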
Optionally, after determining the target monitoring intensity of the previous round, the federated learning device may determine, according to that target monitoring intensity and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round. Specifically, the federated learning device may determine the duration for which it monitors the model parameters fed back by the participants trained by the alliance in the current round, and then determine, according to formula (4), the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in the current round. The federated learning device may also determine, according to the target monitoring intensity of the previous round and formula (3), the success rate of the malicious attacker attacking the federated learning model in the current round. The federated learning device may then determine the penalty intensity imposed by the alliance on the malicious attacker in the current round according to the current round's loss intensity, the current round's attack success rate, and the preset penalty intensity model.
For example, from the malicious attacker's point of view, the payoff generated by attacking the federated learning model may be expressed in a form such as:
u(a) = p(a)·[s(r)·v(t) - (1 - s(r))·c_a(t)]    (8)
where c_a(t) is the penalty intensity imposed by the alliance on the malicious attacker in the current round, and t here is the duration of monitoring the model parameters fed back by the participants trained by the alliance in the current round.
Since differentiating formula (8) with respect to the attack probability p(a) does not yield a value of p(a) that minimises it, and the malicious attacker cannot know the target monitoring intensity r of the previous round when launching an attack in the current round, it can nevertheless be seen from formula (8) that the malicious attacker will choose to raise its attack probability p(a) as the expected per-attack gain s(r)·v(t) - (1 - s(r))·c_a(t) increases. Therefore, in this embodiment of the present invention, in order to make attacking the federated learning model unprofitable for the malicious attacker, the federated learning device sets s(r)·v(t) - (1 - s(r))·c_a(t) = 0, i.e. the preset penalty intensity model can be expressed as:
c_a(t) = s(r)·v(t)/(1 - s(r))    (9)
That is, the federated learning device may determine the penalty intensity imposed by the alliance on the malicious attacker in the current round according to the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in the current round, the success rate of the malicious attacker attacking the federated learning model in the current round, and formula (9).
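A sketch of the penalty computation under the forms given above for formulas (8) and (9); the exact algebraic forms are assumptions, and the final check only verifies that the assumed attacker payoff is driven to zero.

```python
import math

THETA = 2.0   # theta > 0: success-rate decay          (illustrative value)
LAM = 0.1     # lambda > 0: loss-intensity coefficient (illustrative value)

def attack_success_rate(r):
    """Assumed form of formula (3)."""
    return math.exp(-THETA * r)

def loss_intensity(t):
    """Formula (4)."""
    return LAM * t

def penalty_intensity(r_prev_target, t_current):
    """Assumed form of formula (9): choose the penalty c_a(t) so that the
    attacker's expected per-attack gain s*v - (1 - s)*c_a is exactly zero."""
    s = attack_success_rate(r_prev_target)
    v = loss_intensity(t_current)
    return s * v / (1.0 - s)

# Current round: previous-round target intensity 0.6, monitored duration 25.
c_a = penalty_intensity(0.6, 25)
s = attack_success_rate(0.6)
assert abs(s * loss_intensity(25) - (1 - s) * c_a) < 1e-9   # no expected gain
print(c_a)
```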
S104. Determine the malicious attacker's penalty loss according to the penalty intensity, and send it to the malicious attacker.
Optionally, after determining the penalty intensity imposed by the alliance on the malicious attacker in the current round, the federated learning device may determine the malicious attacker's penalty loss. For example, the federated learning device may take the current round's penalty intensity as the malicious attacker's penalty loss, or may take the product of the current round's penalty intensity and a preset multiple as the malicious attacker's penalty loss, where the value of the preset multiple is greater than 1.
The federated learning device may then send the malicious attacker's penalty loss to the malicious attacker, for example to a terminal corresponding to the malicious attacker, in order to punish and thereby deter the malicious attacker, achieving the purpose of preventing the malicious attacker from attacking the federated learning model and thus reducing the success rate of such attacks. The terminal may be any device that can participate in federated learning, such as a mobile phone or a tablet, which is not limited in the embodiments of the present invention.
The above description takes as an example the case in which the federated learning device sends the malicious attacker's penalty loss to the malicious attacker. Of course, the federated learning device may also publish the current round's penalty intensity against the malicious attacker on the federated learning platform in order to deter the malicious attacker, which to a certain extent can prevent the malicious attacker from attacking the federated learning model, thereby achieving the purpose of preventing such attacks and reducing their success rate.
Optionally, after determining the malicious attacker's penalty loss, the federated learning device updates the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, which is used for monitoring the model parameters fed back by the participants trained by the alliance in the next round, thereby safeguarding that monitoring. For example, when the federated learning device has calculated the current round's monitoring cost in the manner described above, it may update the monitoring budget of the preset dynamic monitoring mechanism by combining the malicious attacker's penalty loss with the current round's monitoring cost, thereby ensuring that monitoring of the model parameters fed back in the next round can be carried out.
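One possible way to roll the monitoring budget forward is sketched below; treating the collected penalty loss as income and the round's monitoring cost as expenditure is an assumption about how the two quantities are combined.

```python
def update_monitoring_budget(current_budget, penalty_loss, round_monitoring_cost):
    """Carry the monitoring budget into the next round: spend this round's
    monitoring cost and add the penalty loss collected from the attacker."""
    return current_budget - round_monitoring_cost + penalty_loss

# Illustrative values only.
next_budget = update_monitoring_budget(current_budget=10.0,
                                       penalty_loss=1.08,
                                       round_monitoring_cost=0.6)
print(next_budget)
```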
The specific process by which the federated learning device dynamically monitors the participants in accordance with the preset dynamic monitoring mechanism is described in detail below.
By way of example, as shown in Figure 2, the federated learning device may include an attack probability calculation module 200, an alliance budget management module 201, a monitoring intensity decision module 202, a penalty intensity decision module 203, and a federated learning result monitoring module 204.
In a specific implementation, the attack probability calculation module 200 may receive, from the federated learning result monitoring module 204, the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round, and use them to calculate each participant's attack probability. The attack probability calculation module 200 may then identify the malicious attacker from the participants according to their attack probabilities and update the stored preset malicious attacker list. The attack probability calculation module 200 sends each participant's attack probability and the malicious attacker's identification information to the monitoring intensity decision module 202.
In a specific implementation, the alliance budget management module 201 is configured to record the monitoring budgets of the preset dynamic monitoring mechanism corresponding to different monitoring intensities, and to send the result to the monitoring intensity decision module 202. For example, the alliance budget management module 201 may determine the current round's monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss imposed on the malicious attacker in the previous round and the previous round's monitoring cost, and send the current round's monitoring budget to the monitoring intensity decision module 202.
In a specific implementation, the monitoring intensity decision module 202 provides, within the range allowed by the monitoring budget of the preset dynamic monitoring mechanism, decision support to the system administrator of the federated learning device for deploying the monitoring intensity used to respond to attacks by the malicious attacker. That is, within that budget, the monitoring intensity decision module 202 may provide the monitoring intensity for responding to attacks in any given round, monitor the model parameters fed back by the participants, and send both the duration of monitoring the model parameters fed back by the participants trained by the alliance in that round and the provided monitoring intensity to the penalty intensity decision module 203 and the federated learning result monitoring module 204. The calculation of the monitoring intensity is described above and is not repeated here. The monitoring intensity decision module 202 may also provide the system administrator of the federated learning device with the monitoring cost of any given round, and send the provided monitoring cost to the federated learning result monitoring module 204.
In a specific implementation, the penalty intensity decision module 203 may calculate the current round's penalty intensity against the malicious attacker according to the monitoring intensity sent by the monitoring intensity decision module 202 and the duration of monitoring the model parameters fed back by the participants trained by the alliance in the current round, and send that penalty intensity to the federated learning result monitoring module 204. The calculation of the penalty intensity is described above and is not repeated here.
In a specific implementation, the federated learning result monitoring module 204 may invite the participants to take part in the current round of federated learning training. The federated learning result monitoring module 204 may then monitor the participants according to the monitoring intensity sent by the monitoring intensity decision module 202, and send the identification information and attack counts of any participants found to have attacked the federated learning model during the current round of training to the attack probability calculation module 200, for updating the participants' attack probabilities. During the current round of training, the federated learning result monitoring module 204 may also publish to the participants the current round's penalty intensity against the malicious attacker, and send the current round's penalty loss to the malicious attacker in order to punish it. The federated learning result monitoring module 204 may further send the current round's penalty loss and the current round's monitoring cost to the alliance budget management module 201, for updating the preset dynamic monitoring mechanism's monitoring budget for the next round.
As can be seen from the above description, in the technical solution provided by the embodiments of the present invention, the federated learning device monitors, in accordance with the preset dynamic monitoring mechanism, the model parameters fed back by each participant trained by the alliance in the current round, and determines, according to the model parameters and the preset attack probability model, the attack probability of each participant attacking the federated learning model, so that a malicious attacker can be identified from the participants in a timely manner according to each participant's attack probability. The federated learning device can further determine the target monitoring intensity of the previous round according to the malicious attacker's attack probability in the previous round and the preset monitoring intensity model, so that the participants can be monitored effectively under different monitoring intensities. The federated learning device can further determine, according to the target monitoring intensity of the previous round and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round, determine the malicious attacker's penalty loss according to that penalty intensity, and send it to the malicious attacker, thereby deterring the malicious attacker, achieving the purpose of preventing the malicious attacker from attacking the federated learning model, and effectively reducing the success rate of such attacks.
Based on the same inventive concept, the present invention further provides a federated learning device. Please refer to Figure 3, which is a schematic structural diagram of a federated learning device according to an embodiment of the present invention.
As shown in Figure 3, the federated learning device 300 includes:
a monitoring unit 301, configured to monitor, in accordance with a preset dynamic monitoring mechanism, the model parameters fed back by each participant trained by the alliance in the current round;
a processing unit 302, configured to determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identify a malicious attacker from the participants according to the attack probability; determine the target monitoring intensity of the previous round according to the malicious attacker's attack probability in the previous round and a preset monitoring intensity model, and determine, according to the target monitoring intensity of the previous round and a preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round; and determine the malicious attacker's penalty loss according to the penalty intensity; and
a sending unit 303, configured to send the malicious attacker's penalty loss to the malicious attacker.
In a possible design, the processing unit 302 is specifically configured to:
determine the duration of monitoring the model parameters fed back by each participant trained by the alliance in the current round;
determine, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in the current round;
determine, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in the current round; and
determine, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round.
In a possible design, the processing unit 302 is specifically configured to:
update the parameters of the federated learning model according to the model parameters, and count the number of historical rounds in which the alliance has trained the participants, where the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained the participants;
obtain the recorded number of times each participant attacked the federated learning model in each historical round;
determine each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round; and
determine each participant's attack probability against the federated learning model according to each participant's historical attack probability in each historical round and the preset attack probability model.
In a possible design, the processing unit 302 is specifically configured to:
determine the alliance's historical loss of the previous round, the alliance's historical monitoring cost of the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round; and
determine the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model.
In a possible design, the target monitoring intensity lies between a first threshold and a second threshold, and the processing unit 302 is specifically configured to:
determine the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model;
determine whether the maximum monitoring intensity is greater than or equal to the second threshold;
if the maximum monitoring intensity is greater than or equal to the second threshold, take the second threshold as the target monitoring intensity; and
otherwise, take the maximum monitoring intensity as the target monitoring intensity.
In a possible design, the processing unit 302 is specifically configured to:
obtain a preset malicious attacker list, where the malicious attacker list includes the correspondence between the malicious attacker's identification information and the malicious attacker's attack probability; and
identify the malicious attacker from the participants according to the attack probability and the malicious attacker list.
In a possible design, the monitoring unit 301 is further configured to update the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, for use in monitoring the model parameters fed back by the participants trained by the alliance in the next round.
The federated learning device 300 in this embodiment of the present invention and the method for responding to attacks shown in Figure 1 are inventions based on the same concept. From the foregoing detailed description of the method for responding to attacks, those skilled in the art can clearly understand the implementation process of the federated learning device 300 in this embodiment, so for brevity of the description it is not repeated here.
Based on the same inventive concept, the present invention further provides a federated learning device. Please refer to Figure 4, which is a schematic structural diagram of a federated learning device according to an embodiment of the present invention.
As shown in Figure 4, the federated learning device 400 includes a transceiver 401, a processor 402, and a memory 403, where:
the memory 403 stores one or more executable programs and is used to configure the processor;
the processor 402 is configured to monitor, in accordance with a preset dynamic monitoring mechanism, the model parameters fed back by each participant trained by the alliance in the current round; determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking the federated learning model, and identify a malicious attacker from the participants according to the attack probability; and determine the target monitoring intensity of the previous round according to the malicious attacker's attack probability in the previous round and a preset monitoring intensity model, and determine, according to the target monitoring intensity of the previous round and a preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round; and
the transceiver 401 is configured to determine the malicious attacker's penalty loss according to the penalty intensity and send it to the malicious attacker.
In a possible design, the processor 402 is specifically configured to: determine the duration of monitoring the model parameters fed back by each participant trained by the alliance in the current round; determine, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in the current round; determine, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in the current round; and determine, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round.
In a possible design, the processor 402 is specifically configured to: update the parameters of the federated learning model according to the model parameters, and count the number of historical rounds in which the alliance has trained the participants, where the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained the participants; obtain the recorded number of times each participant attacked the federated learning model in each historical round; determine each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round; and determine each participant's attack probability against the federated learning model according to each participant's historical attack probability in each historical round and the preset attack probability model.
In a possible design, the processor 402 is specifically configured to: determine the alliance's historical loss of the previous round, the alliance's historical monitoring cost of the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round; and determine the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model.
In a possible design, the processor 402 is specifically configured to: determine the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model; determine whether the maximum monitoring intensity is greater than or equal to the second threshold; if the maximum monitoring intensity is greater than or equal to the second threshold, take the second threshold as the target monitoring intensity; and otherwise, take the maximum monitoring intensity as the target monitoring intensity.
In a possible design, the processor 402 is specifically configured to: obtain a preset malicious attacker list, where the malicious attacker list includes the correspondence between the malicious attacker's identification information and the malicious attacker's attack probability; and identify the malicious attacker from the participants according to the attack probability and the malicious attacker list.
In a possible design, the processor 402 is further configured to update the monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, for use in monitoring the model parameters fed back by the participants trained by the alliance in the next round.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.
Claims (20)
- A method for responding to attacks, comprising: monitoring, in accordance with a preset dynamic monitoring mechanism, model parameters fed back by each participant trained by the alliance in the current round; determining, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking a federated learning model, and identifying a malicious attacker from the participants according to the attack probability; determining a target monitoring intensity of the previous round according to the malicious attacker's attack probability in the previous round and a preset monitoring intensity model, and determining, according to the target monitoring intensity of the previous round and a preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round; and determining the malicious attacker's penalty loss according to the penalty intensity and sending it to the malicious attacker.
- The method according to claim 1, wherein determining, according to the target monitoring intensity of the previous round and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round comprises: determining the duration of monitoring the model parameters fed back by each participant trained by the alliance in the current round; determining, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in the current round; determining, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in the current round; and determining, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round.
- The method according to claim 1, wherein determining, according to the model parameters and the preset attack probability model, the attack probability of each participant attacking the federated learning model comprises: updating parameters of the federated learning model according to the model parameters, and counting the number of historical rounds in which the alliance has trained the participants, wherein the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained the participants; obtaining the recorded number of times each participant attacked the federated learning model in each historical round; determining each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round; and determining each participant's attack probability against the federated learning model according to each participant's historical attack probability in each historical round and the preset attack probability model.
- The method according to claim 1, wherein determining the target monitoring intensity of the previous round according to the malicious attacker's attack probability in the previous round and the preset monitoring intensity model comprises: determining the alliance's historical loss of the previous round, the alliance's historical monitoring cost of the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round; and determining the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model.
- The method according to claim 4, wherein the target monitoring intensity lies between a first threshold and a second threshold, and determining the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model comprises: determining the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model; determining whether the maximum monitoring intensity is greater than or equal to the second threshold; if the maximum monitoring intensity is greater than or equal to the second threshold, taking the second threshold as the target monitoring intensity; and otherwise, taking the maximum monitoring intensity as the target monitoring intensity.
- The method according to any one of claims 1 to 5, wherein identifying a malicious attacker from the participants according to the attack probability comprises: obtaining a preset malicious attacker list, wherein the malicious attacker list comprises a correspondence between the malicious attacker's identification information and the malicious attacker's attack probability; and identifying the malicious attacker from the participants according to the attack probability and the malicious attacker list.
- The method according to claim 6, further comprising, after determining the malicious attacker's penalty loss according to the penalty intensity: updating a monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, for use in monitoring the model parameters fed back by the participants trained by the alliance in the next round.
- A federated learning device, comprising: a monitoring unit, configured to monitor, in accordance with a preset dynamic monitoring mechanism, model parameters fed back by each participant trained by the alliance in the current round; a processing unit, configured to determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking a federated learning model, and identify a malicious attacker from the participants according to the attack probability; determine a target monitoring intensity of the previous round according to the malicious attacker's attack probability in the previous round and a preset monitoring intensity model, and determine, according to the target monitoring intensity of the previous round and a preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round; and determine the malicious attacker's penalty loss according to the penalty intensity; and a sending unit, configured to send the malicious attacker's penalty loss to the malicious attacker.
- The device according to claim 8, wherein the processing unit is specifically configured to: determine the duration of monitoring the model parameters fed back by each participant trained by the alliance in the current round; determine, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in the current round; determine, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in the current round; and determine, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round.
- The device according to claim 8, wherein the processing unit is specifically configured to: update parameters of the federated learning model according to the model parameters, and count the number of historical rounds in which the alliance has trained the participants, wherein the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained the participants; obtain the recorded number of times each participant attacked the federated learning model in each historical round; determine each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round; and determine each participant's attack probability against the federated learning model according to each participant's historical attack probability in each historical round and the preset attack probability model.
- The device according to claim 8, wherein the processing unit is specifically configured to: determine the alliance's historical loss of the previous round, the alliance's historical monitoring cost of the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round; and determine the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model.
- The device according to claim 11, wherein the processing unit is specifically configured to: determine the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model; determine whether the maximum monitoring intensity is greater than or equal to the second threshold; if the maximum monitoring intensity is greater than or equal to the second threshold, take the second threshold as the target monitoring intensity; and otherwise, take the maximum monitoring intensity as the target monitoring intensity.
- The device according to any one of claims 8 to 12, wherein the processing unit is specifically configured to: obtain a preset malicious attacker list, wherein the malicious attacker list comprises a correspondence between the malicious attacker's identification information and the malicious attacker's attack probability; and identify the malicious attacker from the participants according to the attack probability and the malicious attacker list.
- The device according to claim 13, wherein the monitoring unit is further configured to: update a monitoring budget of the preset dynamic monitoring mechanism according to the penalty loss, for use in monitoring the model parameters fed back by the participants trained by the alliance in the next round.
- A federated learning device, comprising a transceiver, a processor, and a memory, wherein: the memory stores one or more executable programs and is used to configure the processor; the processor is configured to monitor, in accordance with a preset dynamic monitoring mechanism, model parameters fed back by each participant trained by the alliance in the current round; determine, according to the model parameters and a preset attack probability model, the attack probability of each participant attacking a federated learning model, and identify a malicious attacker from the participants according to the attack probability; and determine a target monitoring intensity of the previous round according to the malicious attacker's attack probability in the previous round and a preset monitoring intensity model, and determine, according to the target monitoring intensity of the previous round and a preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round; and the transceiver is configured to determine the malicious attacker's penalty loss according to the penalty intensity and send it to the malicious attacker.
- The device according to claim 15, wherein the processor is specifically configured to: determine the duration of monitoring the model parameters fed back by each participant trained by the alliance in the current round; determine, based on the duration, the loss intensity suffered by the alliance when the federated learning model is attacked by the malicious attacker in the current round; determine, according to the target monitoring intensity of the previous round, the success rate of the malicious attacker attacking the federated learning model in the current round; and determine, according to the loss intensity, the success rate, and the preset penalty intensity model, the penalty intensity imposed by the alliance on the malicious attacker in the current round.
- The device according to claim 15, wherein the processor is specifically configured to: update parameters of the federated learning model according to the model parameters, and count the number of historical rounds in which the alliance has trained the participants, wherein the number of updates of the parameters of the federated learning model is equal to the number of historical rounds in which the alliance has trained the participants; obtain the recorded number of times each participant attacked the federated learning model in each historical round; determine each participant's historical attack probability in each historical round according to the number of historical rounds in which the alliance has trained the participants and the number of times each participant attacked the federated learning model in each historical round; and determine each participant's attack probability against the federated learning model according to each participant's historical attack probability in each historical round and the preset attack probability model.
- The device according to claim 15, wherein the processor is specifically configured to: determine the alliance's historical loss of the previous round, the alliance's historical monitoring cost of the previous round, the success rate of the malicious attacker attacking the federated learning model in the previous round, and the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round; and determine the target monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model.
- The device according to claim 15, wherein the processor is specifically configured to: determine the maximum monitoring intensity of the previous round according to the historical loss, the historical monitoring cost, the success rate of the malicious attacker attacking the federated learning model in the previous round, the loss intensity suffered by the alliance when the federated learning model was attacked by the malicious attacker in the previous round, the malicious attacker's attack probability in the previous round, and the preset monitoring intensity model; determine whether the maximum monitoring intensity is greater than or equal to the second threshold; if the maximum monitoring intensity is greater than or equal to the second threshold, take the second threshold as the target monitoring intensity; and otherwise, take the maximum monitoring intensity as the target monitoring intensity.
- The device according to any one of claims 15 to 19, wherein the processor is specifically configured to: obtain a preset malicious attacker list, wherein the malicious attacker list comprises a correspondence between the malicious attacker's identification information and the malicious attacker's attack probability; and identify the malicious attacker from the participants according to the attack probability and the malicious attacker list.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010243325.8 | 2020-03-31 | ||
CN202010243325.8A CN111445031B (zh) | 2020-03-31 | 2020-03-31 | 一种应对攻击的方法及联邦学习装置 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021196701A1 true WO2021196701A1 (zh) | 2021-10-07 |
Family ID: 71649382
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| PCT/CN2020/134270 (WO2021196701A1, published 2021-10-07) | 一种应对攻击的方法及联邦学习装置 | 2020-03-31 | 2020-12-07 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111445031B (zh) |
WO (1) | WO2021196701A1 (zh) |
Also Published As
Publication number | Publication date |
---|---|
CN111445031B (zh) | 2021-07-27 |
CN111445031A (zh) | 2020-07-24 |