CN115292738B - Method for detecting security and correctness of federated learning model and data

Info

Publication number: CN115292738B
Authority: CN (China)
Application number: CN202211219715.7A
Other versions: CN115292738A (Chinese)
Inventors: 陈万钢, 李昆阳, 饶金涛, 杨伟
Current assignee: Haofu Cipher Detection Technology Chengdu Co ltd
Legal status: Active (granted)

Classifications

    • G06F 21/602 — Providing cryptographic facilities or services (protecting data)
    • G06F 21/57 — Certifying or maintaining trusted computer platforms, e.g. secure boots, system software checks, or assessing vulnerabilities
    • G06N 20/00 — Machine learning

Abstract

The invention relates to a method for detecting the security and correctness of a federated learning model and its data, belonging to the technical field of cryptography and data security. The qualification of each federated learning participant is detected; a third-party trusted computing module is introduced into the federated learning detection process to detect and verify the security and correctness of the whole federated learning model and its data; and the relevant links of federated learning are individually detected and jointly judged, determining whether the model has failed and whether the data remained secure during federated learning. The method remedies the defects of the traditional technology, which does not detect the qualification of the participants, the correctness of each participant's cryptographic operations, or the security and correctness of the final model, and does not verify them through a third-party trusted computing module; it can thus detect correctness and security while security itself is guaranteed.

Description

Method for detecting security and correctness of federated learning model and data
Technical Field
The invention relates to the technical field of data security, and in particular to a method for detecting the security and correctness of a federated learning model and its data.
Background
Federated learning, also called federated machine learning, joint learning or alliance learning, is one of the key technologies of privacy-preserving computation; current federated learning types include horizontal federated learning and vertical federated learning. Federated learning updates a shared model by having a central server coordinate many loosely coupled intelligent terminals. Its working principle is as follows: a client terminal downloads the existing model from the central server, trains the model with local data, and uploads the model updates to the cloud. The training model is integrated from the updates of the different terminals, so the model is optimized; the client terminals then download the updated model, and this process repeats continuously. Throughout the process the terminal data always stays local, so there is no risk of raw-data leakage. However, no technical solution for detecting the security and correctness of a federated learning model and its data has been disclosed so far.
It is noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for detecting the security and correctness of a federated learning model and its data, solving the problem that the prior art does not detect the qualification of the participants, the correctness of each participant's cryptographic operations, or the security and correctness of the final model.
The purpose of the invention is achieved by the following technical scheme: a method for detecting the security and correctness of a federated learning model and its data comprises the following steps:
S1, setting and selecting evaluation parameters and models for the federated learning participants, and detecting whether each participant is qualified and whether the samples are aligned;
S2, each participant uses a cryptographic module to generate cryptographic hash algorithm parameters and computes the hash values of its training data with a cryptographic hash algorithm;
S3, setting the model issuing mode and order for each round and then sending the model; the model received by at least one participant is compared with the model held by the model issuer to detect model consistency;
S4, setting sample data according to the characteristics and purpose of the model; the model sender transmits the sample data, weight data and parameter data, and the data received by each participant is compared with the sent data to detect the consistency of the data and the model;
S5, evaluating the data of the data sender and the participants against preset data-feature-overlap and user-overlap evaluation indexes, judging whether the federated learning type is reasonable, and outputting the federated-learning-type reasonableness result;
S6, selecting, by proportion and number, the participants whose parameters are abnormal during training, performing data detection on them, stopping participants with abnormal data from continuing in the training, and issuing an abnormality alert;
S7, encrypting the model gradients, model parameters and intermediate result data produced by training and sending them to the other participants for detection;
S8, detecting and integrating the model gradients, model parameters and intermediate result data;
S9, according to the federated learning model, each participant receives the new model gradients, model parameters and intermediate data, decrypts them and updates its model, and the deviation of at least one participant's new model is detected.
Detecting whether a participant is qualified and whether the samples are aligned comprises:
S11, detecting at least one participant that meets the participation requirements according to the evaluation parameters, importing the model and data into the third-party trusted computing module for computation and verification, and outputting the detection result;
S12, detecting at least one potential participant that does not meet the participation requirements according to the parameter requirements, importing the model and data into the third-party trusted computing module for computation and verification, and outputting the detection result;
S13, for federated learning that performs encrypted sample alignment, checking whether the data interaction during sample alignment is protected by encryption or by encoding; if the data is not encrypted, or is only encoded, an alert that a risk exists is issued;
S14, for federated learning that performs encrypted sample alignment, inputting specified plaintext data into the data interaction, capturing the encrypted or encoded data exchanged, and outputting a risk alert according to the capture result;
S15, for federated learning that requires encrypted sample alignment, importing the participants' sample data into the third-party trusted computing module, comparing the overlapping users, and checking the overlapping user names against the sample data; if they are inconsistent, sample alignment is faulty and a risk alert is output.
Detecting model consistency comprises:
S31, if the model file is encrypted, decrypting the encrypted model file with the key received by the participant; if decryption succeeds, executing step S32, and if it fails, terminating the detection and outputting the detection result;
S32, computing the hash value of the model received by the participant with a cryptographic hash algorithm and comparing it with the hash value computed by the model sender over its model; if they match, the models are judged consistent and step S4 is executed; if not, the models are judged inconsistent, the detection is terminated and the detection result is output.
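The hash comparison in S32 is a standard digest check. The patent leaves the cryptographic hash algorithm unspecified (in the Chinese commercial-cryptography context SM3 would be typical); the sketch below uses stdlib SHA-256 as an assumed stand-in:

```python
import hashlib

def model_digest(model_bytes: bytes) -> str:
    # SHA-256 stands in for the unspecified cryptographic hash
    # algorithm (e.g. SM3) mandated by the detection method.
    return hashlib.sha256(model_bytes).hexdigest()

def models_consistent(received_model: bytes, sender_digest: str) -> bool:
    """S32: hash the model the participant received and compare it
    with the digest the model sender computed over its own copy."""
    return model_digest(received_model) == sender_digest
```

A mismatch terminates the detection, per S32; any single flipped byte in transit changes the digest.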
Detecting the consistency of the data and the model comprises:
S41, if the data is encrypted, decrypting the file with the received key; if decryption succeeds, executing step S42, and if it fails, terminating the detection and outputting the detection result;
S42, computing the hash value of the data received by the participant with a cryptographic hash algorithm and comparing it with the hash value computed by the data sender over its data; if they match, the data is judged consistent and step S43 is executed; if not, the data is judged inconsistent, the detection is terminated and the detection result is output;
S43, substituting the sample data into the model for computation and comparing the result with the result the model sender obtained by running the same sample data through its model; if they match, the models are judged consistent and step S5 is executed; if not, the models are judged inconsistent, the detection is terminated and the detection result is output.
The data detection in step S6 specifically comprises the following:
S61, calculating the ratio of participants with abnormal security characteristic parameters to all participants and outputting the ratio; if the ratio exceeds the expected value, a risk alert is issued;
S62, comparing the hash value of the examined participant's data with the hash value computed at the start of detection; if they match, the data has not been modified; if not, the data has been modified and a risk alert is output;
S63, inputting the examined participant's training result, training data, training model, model parameters and the allowed data deviation into the third-party trusted computing module, retraining inside the module, and deciding from the training result whether to output a risk alert.
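The ratio test in S61 is a one-line computation plus a threshold comparison. The expected-value threshold is a configuration choice the patent leaves open; 10% below is an illustrative assumption:

```python
def abnormal_ratio(abnormal_count: int, total_count: int) -> float:
    """S61: ratio of participants with abnormal security
    characteristic parameters to all participants."""
    return abnormal_count / total_count

def ratio_risk(abnormal_count: int, total_count: int, expected: float = 0.1):
    """Return (ratio, risk_alert). The 0.1 expected value is an
    assumed default, not specified by the patent."""
    ratio = abnormal_ratio(abnormal_count, total_count)
    return ratio, ratio > expected
```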
The specific content detected in step S7 includes:
S71, checking whether the data transmitted between the participants, and between the participants and the model issuer, is protected by encryption, by encoding, or by both; if it is unprotected, an alert that a data security risk exists is issued;
S72, decrypting or decoding the protected data and comparing the decryption or decoding result with the plaintext; if they match, the protection measures are judged correct; if not, the protection measures are judged faulty and an alert that a data security risk exists is issued;
S73, inputting specified data, checking whether the result of encrypting the specified data matches the expected result data or falls within the allowed deviation range, and comparing the data obtained after decryption with the specified data to judge whether they match.
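The round-trip verification in S72/S73 can be sketched with a toy cipher. The XOR stream below is purely illustrative and cryptographically worthless; it merely stands in for the negotiated algorithm (e.g. SM4) so the encrypt-then-decrypt comparison logic is concrete:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy XOR "cipher" standing in for the real negotiated algorithm.
    # XOR is its own inverse, so the same call both encrypts and decrypts.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def protection_correct(plaintext: bytes, key: bytes) -> bool:
    """S72/S73 sketch: encrypt, then decrypt, and require that the
    ciphertext actually differs from the plaintext (protection is in
    effect) and that the round trip recovers the plaintext exactly."""
    ciphertext = xor_cipher(plaintext, key)
    return ciphertext != plaintext and xor_cipher(ciphertext, key) == plaintext
```

A degenerate key that leaves the data unchanged fails the first condition, which corresponds to the S71 "unprotected data" alert.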
The specific contents of detecting the model gradient, model parameter and intermediate result data in step S8 include:
A1, decrypting or decoding the encryption-protected or encoding-protected model gradients, model parameters and intermediate result data to obtain the corresponding data;
A2, comparing the decrypted or decoded data against the deviation set for each important datum and judging whether it lies within the allowed deviation range; if it does, executing step A3, and if not, outputting a risk alert;
A3, passing the participant's model gradients, data structure, intermediate results and a small amount of sample data into the third-party trusted computing module, performing the inverse of the computation that produced the model gradients and intermediate results to obtain a data model based on the data structure, and importing the sample data into that model for computation.
The specific content of the detection that integrates the model gradient, model parameter and intermediate result data in step S8 includes:
B1, setting an integration method and integrating the model gradients, model parameters and intermediate results according to it;
B2, setting a data deviation range, checking whether the integrated data lies within it, and outputting a risk alert if it falls outside the range.
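B1 and B2 can be sketched with plain model averaging as the integration method (one common choice in horizontal federated learning; the patent does not fix a specific method) followed by an element-wise deviation check:

```python
def federated_average(updates):
    """B1 sketch: integrate per-participant parameter vectors by
    plain model averaging."""
    n = len(updates)
    return [sum(values) / n for values in zip(*updates)]

def within_deviation(integrated, reference, max_dev):
    """B2: check every integrated value against a reference within
    the configured deviation range; outside it, a risk alert is due."""
    return all(abs(a - b) <= max_dev for a, b in zip(integrated, reference))
```

For two participants with parameter vectors `[1.0, 2.0]` and `[3.0, 4.0]`, the integrated result is `[2.0, 3.0]`.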
Detecting the deviation of at least one participant's new model comprises:
S91, the participant computes locally, or inputs the data, the new model and the previous model into the third-party trusted computing module, and obtains a new result and a previous result with the new model and the previous model respectively;
S92, comparing the new result with the previous result against a set comparison deviation range, and outputting a risk alert if the comparison exceeds that range;
S93, inputting the federated learning target data into the participant or the third-party trusted computing module, comparing the computation result of the final round of the model with the target data there, and judging that the participant's model has failed if the comparison does not meet expectations;
S94, setting a limit on the number or percentage of participants whose models have failed, counting the failed participants and calculating the percentage; if the count or percentage exceeds the limit, the whole federated model is judged to have failed.
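The whole-federation verdict in S94 reduces to a count and a fraction against configured limits. The parameter names below are illustrative assumptions:

```python
def federation_failed(failed_flags, max_count=None, max_fraction=None):
    """S94: judge whole-federation failure once the number, or the
    fraction, of participants with failed models exceeds its limit.
    Either limit may be configured; exceeding any one suffices."""
    failed = sum(1 for flag in failed_flags if flag)
    if max_count is not None and failed > max_count:
        return True
    if max_fraction is not None and failed / len(failed_flags) > max_fraction:
        return True
    return False
```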
The detection method further comprises: detecting whether the participants use cryptographic techniques for data protection and security authentication during communication.
The invention has the following advantages: a third-party trusted computing module is introduced into the federated learning detection process to detect and verify the security and correctness of the whole federated learning model and its data, and the relevant links of federated learning are individually detected and jointly judged, so that whether the model has failed and whether the data remained secure during federated learning can both be determined.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided below in connection with the appended drawings is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application. The invention is further described below with reference to the accompanying drawings.
As shown in FIG. 1, the invention relates to a method for detecting the security and correctness of a federated learning model and its data. A federated learning detection process is formed; a third-party trusted computing module is introduced into that process to detect and verify the security and correctness of the whole federated learning model and its data; and the relevant links of federated learning are individually detected and jointly judged, determining whether the model has failed and whether the data remained secure. The method specifically comprises the following steps:
step 1, setting and selecting evaluation parameters and models of federal learning participants; detecting whether the participants are qualified and whether the samples are aligned; the specific detection is as follows:
1) Detect at least one participant that meets the participation requirements according to the evaluation parameters. Input the selected data and evaluation parameters of the examined party into the third-party trusted computing module. Compute the evaluation parameter from the received data inside the module and compare it with the evaluation parameter set for the participant. If the requirement is met, "yes" is output and the selection is sound; if not, "no" is output and an alert is raised that the selection is problematic. The third-party computing module outputs no other information, which guarantees the security of the collected data.
2) Detect at least one potential participant that does not meet the participation requirements according to the parameter requirements. Input the data and evaluation parameters into the third-party trusted computing module. Compute the evaluation parameter from the received data inside the module and compare it with the evaluation parameter set for the participant. If the requirement is indeed not met, "yes" is output and the selection is sound; if the requirement turns out to be met, "no" is output and an alert is raised that the selection is problematic. The third-party computing module outputs no other information, which guarantees the security of the collected data.
3) For federated learning that requires encrypted sample alignment, check by packet capture whether the data interaction during sample alignment is encrypted or encoded. If the data is not encrypted, or is only encoded, an alert that a risk exists is issued.
4) For federated learning that requires encrypted sample alignment, input specified plaintext data into the data interaction and capture the encrypted or encoded data exchanged. Input the captured ciphertext or encoded data, the specified plaintext and the corresponding encryption, decryption, encoding and decoding methods into the third-party trusted computing module. Encrypt or encode the specified plaintext inside the module and compare the result with the captured data. If they match, "yes" is output, indicating the encryption or encoding is implemented correctly; if not, "no" is output, indicating it is implemented incorrectly. Then randomly select captured ciphertext or encoded data, decrypt or decode it inside the module, and compare the result with the specified plaintext; a match again yields "yes" (correct implementation) and a mismatch yields "no" (incorrect implementation). If this step occurs during the sample-alignment interaction, any "no" output indicates that sample alignment is faulty and a risk alert is output.
5) For federated learning that requires encrypted sample alignment, import the participants' sample data into the third-party trusted computing module, compare the overlapping users, and check the overlapping user names against the sample data. If they are inconsistent, "no" is output, indicating that sample alignment is faulty, and a risk alert is output.
Step 2, compute the hash value of the training data with a cryptographic hash algorithm. If data-source information exists, hash values should be computed over both the data-source information and the training data.
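Step 2 can be sketched as a single digest that folds in the data-source information when it exists. SHA-256 is again an assumed stand-in for the unspecified cryptographic hash:

```python
import hashlib

def training_data_digest(training_data: bytes, source_info: bytes = b"") -> str:
    """Step 2 sketch: hash the training data, prefixing the
    data-source information when present, so a change to either
    the data or its provenance changes the digest."""
    h = hashlib.sha256()
    if source_info:
        h.update(source_info)
    h.update(training_data)
    return h.hexdigest()
```

This digest is the reference later compared in step 6(2) to detect whether the training data was modified mid-training.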
Step 3, set the issuing mode and order of each round of the model and send the model. Compare the model received by at least one participant with the model held by the model sender to detect model consistency. The model consistency detection is:
1) If the model file is encrypted, decrypt the encrypted model file with the key received by the participant. If decryption succeeds, proceed to the next detection; if it fails, terminate the detection and give the detection result.
2) Compute the hash value of the model received by the participant with a cryptographic hash algorithm and compare it with the hash value computed over the model sender's model. If they match, the models are judged consistent and the next detection begins; if not, the models are judged inconsistent, the detection is terminated and the detection result is given.
Step 4, set sample data according to the characteristics and purpose of the model; the model sender transmits the sample data, weight data, parameter data and similar data, and the data received by each participant is compared with the sent data to detect the consistency of the data and the model. The consistency detection of the data and the model comprises:
1) If the data is encrypted, decrypt the file with the received key. If decryption succeeds, proceed to the next detection; if it fails, terminate the detection and give the detection result.
2) Compute the hash value of the data received by the participant with a cryptographic hash algorithm and compare it with the hash value computed over the data sender's data. If they match, the data is judged consistent and the next detection begins; if not, the data is judged inconsistent, the detection is terminated and the detection result is given.
3) Substitute the sample data into the model for computation and compare the result with the result the model sender obtained by running the same data through its model. If they match, the models are judged consistent and the next detection begins; if not, the models are judged inconsistent, the detection is terminated and the detection result is given.
Step 5, detect the federated learning type, which may be horizontal federated learning, vertical federated learning, federated transfer learning or another pre-designed model. Evaluate the data of the data sender and the participants against preset evaluation indexes such as the data-feature-overlap index and the user-overlap index, and judge whether the federated learning type is reasonable. The amount of data used for the type-reasonableness detection must not be lower than the preset data amount and number of participants. Output the federated-learning-type reasonableness result and proceed to the next detection.
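A common rule of thumb (an assumption here, not stated by the patent) is that horizontal federated learning expects high feature overlap between parties, while vertical federated learning expects high user overlap. Step 5's reasonableness judgment can then be sketched from the two overlap indexes; the 0.5 threshold is illustrative:

```python
def overlap_ratio(a, b):
    """Jaccard-style overlap between two ID or feature sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def type_reasonable(fl_type, users_a, users_b, feats_a, feats_b,
                    threshold=0.5):
    """Step 5 sketch: judge whether the declared federated-learning
    type matches the data overlap. Threshold and decision rule are
    illustrative assumptions."""
    if fl_type == "horizontal":
        return overlap_ratio(feats_a, feats_b) >= threshold
    if fl_type == "vertical":
        return overlap_ratio(users_a, users_b) >= threshold
    return False
```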
Step 6, set characteristic parameters that reflect federated learning security problems. Select, by a certain proportion or number, the participants whose characteristic parameters are abnormal during training, perform data detection on them, and promptly stop the participants who fail the detection from continuing in the training. The data is checked as follows:
1) Calculate the ratio of participants with abnormal security characteristic parameters to all participants and output the ratio.
2) Compare the hash value of the examined participant's data with the hash value computed at the start of detection. If they match, the data has not been modified; if not, the data has been modified and an alert is output.
3) Input the examined participant's training result, training data, training model, model parameters and the allowed data deviation into the third-party trusted computing module and retrain inside the module. If the result and parameters computed by the trusted computing device match those of the examined participant, or lie within the allowed deviation range, "yes" is output. If the deviation range is exceeded, "no" is output, the data hash values are compared to verify whether the training data has changed, and if it has changed, a risk alert is output.
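The decision logic of step 6(3) combines the retraining deviation with the hash comparison from step 6(2). A minimal sketch, with scalar results standing in for full training outputs and the verdict strings chosen for illustration:

```python
def retrain_check(retrained_result, reported_result, max_dev,
                  hash_at_start, hash_now):
    """Step 6(3) sketch: compare the trusted module's retrained
    result with the participant's reported result; on excessive
    deviation, fall back to the data hashes to decide whether the
    training data itself was modified."""
    if abs(retrained_result - reported_result) <= max_dev:
        return "yes"
    if hash_at_start != hash_now:
        return "risk: training data was modified"
    return "no"
```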
Step 7, each participant encrypts the gradients, model parameters, intermediate results and similar data produced by training and sends them to the other participants, including the model issuer such as an aggregation server or coordinator, for detection. The detection is as follows:
1) Using a designated transmission mode for the sent data, or packet capture, check whether the data transmitted between the participants, and between the participants and the model issuer, is protected by encryption, by encoding, or by both. If the data is unprotected, an alert that a data security risk exists is issued.
2) Decrypt or decode the protected data with the key or encoding scheme negotiated by the communicating parties and compare the decryption or decoding result with the plaintext. If they match, the corresponding protection measure is judged correctly implemented; if not, it is judged incorrectly implemented and an alert that a data security risk exists is issued.
3) For encryption protection that is not based on the ZUC, SM2, SM4 and SM9 cryptographic algorithms and technologies, input specified data, check whether the result of encrypting the specified data matches the expected result data or lies within the allowed deviation of it, and compare the data obtained after decryption with the specified data to judge whether they match.
Step 8, detect the important data transmitted by all parties, such as intermediate results, model gradients and model parameters. The detection is as follows:
1) Decrypt or decode the encryption-protected or encoding-protected intermediate results, model gradients, model parameters and similar important data with the key or decoding scheme negotiated by the communicating parties to obtain the corresponding data.
2) Compare the decrypted or decoded data against the deviation set for each important datum and judge whether it lies within the deviation range. If it does, proceed to the next detection; if not, output an alert.
3) Pass the necessary data of the participant, such as the model gradient data, data structure, intermediate results and a small amount of sample data, into the third-party trusted computing module, and perform the inverse of the computation that produced the model gradient data and intermediate results to obtain a data model based on the data structure. Import the sample data into that model for computation. If the sample data fits the reconstructed model, the trusted device outputs "no": there is a risk that the participants can infer one another's data.
And 9, carrying out integrated detection on data such as model gradient, model parameters and intermediate results. The detection is as follows:
1) Setting an integrated method, such as a gradient tie adopted by horizontal federal learning, a method of safety aggregation such as model averaging, and the like.
2) Setting a data deviation range and comparing whether the integrated data is within the deviation range. If the integrated data exceeds the deviation range, outputting a prompt.
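The integration and deviation check of step 9 can be sketched as follows, using plain model averaging as the integration method. `federated_average` and `check_aggregate` are illustrative names; a real deployment would integrate under a secure aggregation protocol rather than over plaintext vectors.

```python
def federated_average(updates, weights=None):
    """Integrate participants' parameter vectors by (optionally weighted)
    model averaging, one simple integration method for horizontal
    federated learning."""
    n = len(updates)
    weights = weights or [1.0 / n] * n
    dim = len(updates[0])
    return [sum(w * u[k] for w, u in zip(weights, updates)) for k in range(dim)]

def check_aggregate(aggregate, expected, tolerance):
    """Verify the integrated data stays within the set deviation range."""
    return all(abs(a - e) <= tolerance for a, e in zip(aggregate, expected))

avg = federated_average([[1.0, 2.0], [3.0, 4.0]])
print(avg)                                    # [2.0, 3.0]
print(check_aggregate(avg, [2.0, 3.0], 0.1))  # True
```

If `check_aggregate` returns false, the integrated data is outside the deviation range and the prompt of item 2) is output.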
And step 10, according to the federated learning type, each participant receives the corresponding new model gradient, model parameters, intermediate results or other data; after decrypting or decoding the data, the model is updated, and the deviation of at least one participant's new model is detected. The detection is as follows:
1) The participant performs the calculation internally, or inputs the data, the new model and the previous model into a third-party trusted computing module; the new model and the previous model are used for calculation respectively, yielding a new result and a previous result.
2) Comparing the new result with the result of the previous round. A deviation range is set; if the deviation exceeds the range, a prompt is output.
3) Inputting the federated learning target data into the participant or the third-party trusted computing module, computing a result with the previous round's model in the participant or the third-party trusted computing module, and comparing the result with the target data. If the result does not meet expectations, or its deviation from the target data exceeds the allowed range, the model is judged to have failed for this participant.
4) Setting a limit on the number or proportion of participants whose models have failed, counting the number of failed participants and calculating their proportion. If the number or proportion of participants whose models have failed exceeds the threshold value, the whole federated learning model is judged to have failed.
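The failure counting of item 4) can be sketched as below; the function name and the per-participant boolean representation are assumptions made for illustration.

```python
def federation_failed(participant_failures, count_limit=None, ratio_limit=None):
    """Count participants whose updated model failed the target-data
    check, and decide whether the whole federated learning model fails
    by comparing the count and the proportion against the set limits."""
    failed = sum(participant_failures)          # list of booleans, one per participant
    ratio = failed / len(participant_failures)
    if count_limit is not None and failed > count_limit:
        return True
    if ratio_limit is not None and ratio > ratio_limit:
        return True
    return False

# 3 of 5 participants failed, so a 40% ratio limit is exceeded.
print(federation_failed([True, False, True, True, False], ratio_limit=0.4))  # True
```

Either threshold alone can trip the overall failure judgment, matching the "number or proportion" wording above.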
And step 11, detecting whether the participants adopt cryptographic techniques for data protection and security authentication during the communication process.
The foregoing is illustrative of the preferred embodiments of this invention. It is to be understood that the invention is not limited to the precise form disclosed herein, and that various other combinations, modifications and environments may be resorted to within the scope of the inventive concept described above or apparent to those skilled in the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A method for detecting the security and correctness of a federated learning model and data is characterized in that: the detection method comprises the following steps:
s1, setting and selecting evaluation parameters and models of federal learning participants, and detecting whether the participants are qualified and whether samples are aligned;
s2, the participant uses a password module to generate a password hash algorithm parameter and adopts a password hash algorithm to calculate a hash value of the training data;
s3, setting a model issuing mode and sequence of each round, then sending the model, comparing the model received by at least one participant with the model of the model issuing party, and detecting the consistency of the models;
s4, setting sample data according to the characteristics and the purpose of the model, sending the sample data, the weight data and the parameter data by the model sending party, comparing the data received by the participating party with the sent data, and detecting the consistency of the data and the model;
s5, evaluating and detecting data of a data sender and a data participant according to preset data characteristic overlapping evaluation indexes and user overlapping evaluation indexes, judging whether the federal learning type is reasonable or not, and outputting a federal learning type rationality result;
s6, selecting participants with abnormal parameters in the training process according to the proportion and the quantity, carrying out data detection on the participants, stopping the participation of the participants with abnormal data in the training process, and carrying out abnormal reminding;
s7, encrypting the model gradient, the model parameters and the intermediate result data obtained by the training model and then sending the encrypted data to other participants for detection;
s8, detecting and integrating the gradient of the model, the parameters of the model and the intermediate result data;
s9, according to the federal learning model, each participant receives new model gradients, model parameters and intermediate data, decrypts the data, updates the model and detects the deviation of at least one participant new model;
the detecting whether the participant is qualified and whether the samples are aligned comprises:
s11, detecting at least one participant meeting the participation requirement according to the evaluation parameters, importing the model and the data into a third-party trusted computing module for computation and verification, and outputting a detection result;
s12, detecting at least one potential participant which does not meet the participation requirement according to the parameter requirement, importing the model and the data into a third-party trusted computing module for computation and verification, and outputting a detection result;
s13, carrying out federal learning of sample encryption alignment, checking whether encryption protection and coding protection are carried out on data interaction in the sample alignment process, and reminding that risks exist if encryption protection is not carried out on data or only coding protection is carried out on the data;
s14, carrying out federated learning of sample encryption alignment, inputting specified plaintext data in a data interaction process, acquiring encrypted data or encoded data in the data interaction process, and outputting a risk prompt according to an acquisition result;
s15, federated learning of sample encryption alignment needs to be carried out, the data of the participator sample is imported into a third-party trusted computing module to compare with overlapping users, the overlapping user name is compared with the sample data, if the data of the participator sample is not consistent with the sample data, the fact that the sample alignment is in problem is indicated, and risk reminding is output.
2. The method for detecting the security and correctness of the federated learning model and data according to claim 1, characterized in that: detecting the consistency of the models comprises:
s31, under the condition of carrying out encryption protection on the model file, decrypting the model encrypted file through the key received by the participant, if the decryption is successful, executing the step S32, if the decryption is unsuccessful, terminating the detection, and outputting a detection result;
s32, calculating the model hash value received by the participant by adopting a cryptographic hash algorithm, comparing the model hash value with the hash value obtained by the model calculation of the model sender, if the model hash value is consistent with the hash value, judging that the models are consistent, executing the step S4, if the model hash values are inconsistent, judging that the models are inconsistent, terminating the detection, and outputting a detection result.
3. The method for detecting the security and correctness of the federated learning model and the data according to claim 1, characterized in that: detecting the consistency of the data and the model comprises:
s41, under the condition of encrypting the data, decrypting the file by using the received key, if the decryption is successful, executing the step S42, if the decryption is unsuccessful, terminating the detection, and outputting a detection result;
s42, calculating a data hash value received by a participant by adopting a password hash algorithm, comparing the data hash value with a hash value obtained by calculating data of a data sender, if the data hash value is consistent with the hash value, judging that the data are consistent, executing a step S43, if the data are inconsistent, judging that the data are inconsistent, terminating the detection, and outputting a detection result;
s43, substituting the sample data into the model for calculation, comparing the obtained result with the result obtained by the model sender through calculation of the sample data in the model, if the result is consistent with the result obtained by calculation of the model sender through calculation of the sample data, judging that the models are consistent, executing the step S5, if the result is inconsistent with the result, judging that the models are inconsistent, terminating the detection, and outputting the detection result.
4. The method for detecting the security and correctness of the federated learning model and data according to claim 1, characterized in that: the data detection in step S6 specifically includes the following contents:
s61, calculating the ratio of the abnormal participants of the safety characteristic parameters to the total participants, outputting the ratio result, and if the ratio exceeds an expected value, carrying out risk reminding;
s62, comparing the hash value of the data of the detected party with the hash value calculated in the detection starting stage, if the hash value is consistent with the hash value, indicating that the data is not modified, and if the hash value is inconsistent with the hash value, indicating that the data is modified, and outputting a risk prompt;
and S63, inputting the training result, the training data, the training model, the model parameters and the allowable deviation of the data of the detected participant into a third-party trusted computing module, retraining in the third-party trusted computing module, and judging whether to output risk reminding according to the training result.
5. The method for detecting the security and correctness of the federated learning model and the data according to claim 1, characterized in that: the specific content detected in step S7 includes:
s71, checking whether data transmission of communication parties among the parties, between the parties and a sender under the model is protected by encryption, coding or both encryption and coding, and if not, reminding that data security risk exists;
s72, decrypting or decoding the protected data, comparing the decoding or decoding result with a plaintext, if the decoding or decoding result is consistent with the plaintext, judging that the protection measures are correct, and if the decoding or decoding result is inconsistent with the plaintext, judging that the protection measures are wrong, and reminding that the data safety risk exists;
and S73, inputting specified data, checking whether the result data obtained by encrypting the specified data is consistent with the expected result data or in an allowable deviation range, comparing the data obtained by decrypting the data with the specified data, and judging whether the data is consistent with the specified data.
6. The method for detecting the security and correctness of the federated learning model and the data according to claim 1, characterized in that: the specific contents of detecting the model gradient, the model parameters and the intermediate result data in the step S8 include:
a1, decrypting or decoding the model gradient, the model parameter and the intermediate result data of encryption protection or coding protection to obtain corresponding data;
a2, comparing the data obtained by decryption or decoding according to the set important data deviation, judging whether the data are in an allowable deviation range, if so, executing the step A3, and if not, outputting a risk prompt;
and A3, transmitting the model gradient, the data structure, the intermediate result and a small amount of sample data of the participant into a third-party trusted computing module, performing inverse operation according to a computing mode of obtaining the model gradient and the intermediate result to obtain a data model based on the data structure, and importing the sample data into the model for operation.
7. The method for detecting the security and correctness of the federated learning model and the data according to claim 1, characterized in that: the specific content of the detection for integrating the model gradient, the model parameters and the intermediate result data in the step S8 includes:
b1, setting an integration method, and integrating the gradient of the model, the parameters of the model and the intermediate result data according to the integration method;
and B2, setting a data deviation range, comparing whether the integrated data is in the deviation range, and outputting a risk prompt if the integrated data is out of the deviation range.
8. The method for detecting the security and correctness of the federated learning model and the data according to claim 1, characterized in that: detecting the deviation of the at least one participant's new model comprises:
s91, the participator calculates in the participator or inputs data, a new model and a previous model into a third-party trusted calculation module, and a new result and a previous result are obtained by respectively adopting the new model and the previous model;
s92, comparing the new result with the previous result, setting a comparison deviation range, and outputting a risk prompt if the comparison result exceeds the comparison deviation range;
s93, inputting the federal learning target data into a participant or a third-party trusted computing module, comparing a computing result of the last round of model with the target data in the participant or the third-party trusted computing module, and if the computing result does not accord with the comparison result, judging that the model of the participant fails;
s94, setting the number of the participants of the model failure to reach a certain number or a ratio limit value, counting the number of the participants of the model failure, calculating the ratio, and if the number of the participants of the model failure or the ratio exceeds the limit value, judging that the whole federal model fails.
9. The method for detecting the security and correctness of the federated learning model and the data according to any one of claims 1-8, characterized in that: the detection method further comprises the following steps: and detecting whether the participator adopts the cryptographic technology to carry out data protection and security authentication in the communication process.
CN202211219715.7A 2022-10-08 2022-10-08 Method for detecting security and correctness of federated learning model and data Active CN115292738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211219715.7A CN115292738B (en) 2022-10-08 2022-10-08 Method for detecting security and correctness of federated learning model and data


Publications (2)

Publication Number Publication Date
CN115292738A CN115292738A (en) 2022-11-04
CN115292738B true CN115292738B (en) 2023-01-17

Family

ID=83834965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211219715.7A Active CN115292738B (en) 2022-10-08 2022-10-08 Method for detecting security and correctness of federated learning model and data

Country Status (1)

Country Link
CN (1) CN115292738B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116015610B (en) * 2022-12-19 2023-08-22 豪符密码检测技术(成都)有限责任公司 Detection method for lightweight passwords
CN115828302B (en) * 2022-12-20 2023-07-07 华北电力大学 Micro-grid-connected control privacy protection method based on trusted privacy calculation
CN116305080B (en) * 2023-05-15 2023-07-28 豪符密码检测技术(成都)有限责任公司 Universal password detection method
CN116383856B (en) * 2023-05-24 2023-08-29 豪符密码检测技术(成都)有限责任公司 Safety and effectiveness detection method for data safety protection measures

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021004551A1 (en) * 2019-09-26 2021-01-14 深圳前海微众银行股份有限公司 Method, apparatus, and device for optimization of vertically federated learning system, and a readable storage medium
CN112580821A (en) * 2020-12-10 2021-03-30 深圳前海微众银行股份有限公司 Method, device and equipment for federated learning and storage medium
CN112949865A (en) * 2021-03-18 2021-06-11 之江实验室 Sigma protocol-based federal learning contribution degree evaluation method
CN113159327A (en) * 2021-03-25 2021-07-23 深圳前海微众银行股份有限公司 Model training method and device based on federal learning system, and electronic equipment
CN113570069A (en) * 2021-07-28 2021-10-29 神谱科技(上海)有限公司 Model evaluation method for self-adaptive starting model training based on safe federal learning
CN113591115A (en) * 2021-08-04 2021-11-02 神谱科技(上海)有限公司 Method for batch normalization in logistic regression model for safe federal learning
CN113591152A (en) * 2021-08-04 2021-11-02 神谱科技(上海)有限公司 LightGBM algorithm-based longitudinal federal modeling method
WO2021232754A1 (en) * 2020-05-22 2021-11-25 深圳前海微众银行股份有限公司 Federated learning modeling method and device, and computer-readable storage medium
CN113722987A (en) * 2021-08-16 2021-11-30 京东科技控股股份有限公司 Federal learning model training method and device, electronic equipment and storage medium
CN113779608A (en) * 2021-09-17 2021-12-10 神谱科技(上海)有限公司 Data protection method based on WOE mask in multi-party longitudinal federal learning LightGBM training
CN114091356A (en) * 2022-01-18 2022-02-25 北京邮电大学 Method and device for federated learning
CN114389824A (en) * 2022-03-24 2022-04-22 湖南天河国云科技有限公司 Verification updating method and device of trusted computing trust chain based on block chain
CN114841356A (en) * 2021-01-14 2022-08-02 新智数字科技有限公司 Internet of things-based joint learning engine overall architecture system
CN114998251A (en) * 2022-05-30 2022-09-02 天津理工大学 Air multi-vision platform ground anomaly detection method based on federal learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308240A (en) * 2020-11-02 2021-02-02 清华大学 Edge side machine cooperation and optimization system based on federal learning
CN112733967B (en) * 2021-03-30 2021-06-29 腾讯科技(深圳)有限公司 Model training method, device, equipment and storage medium for federal learning
CN113435121B (en) * 2021-06-30 2023-08-22 平安科技(深圳)有限公司 Model training verification method, device, equipment and medium based on federal learning
CN114330759B (en) * 2022-03-08 2022-08-02 富算科技(上海)有限公司 Training method and system for longitudinal federated learning model
CN115102763B (en) * 2022-06-22 2023-04-14 北京交通大学 Multi-domain DDoS attack detection method and device based on trusted federal learning


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Light-Weight Crowdsourcing Aggregation in Privacy-Preserving Federated Learning System;Ke Zhang 等;《2020 International Joint Conference on Neural Networks》;20200928;1-8 *
When Federated Learning Meets Blockchain:A New Distributed Learning Paradigm;Chuan Ma 等;《IEEE Computational Intelligence Magazine》;20220831;第17卷(第3期);26-33 *
Research on Trusted Internet-of-Vehicles Resource Allocation Methods; Zhao Ning; China Doctoral Dissertations Full-text Database, Engineering Science and Technology II; 20220215 (No. 2); C034-163 *
A Blockchain-based Privacy-preserving Trusted Federated Learning Model; Zhu Jianming et al.; Chinese Journal of Computers; 20211215; Vol. 44 (No. 12); 2464-2484 *
A Survey of Privacy-preserving Federated Recommendation Algorithms; Zhang Honglei et al.; Acta Automatica Sinica; 20220722; Vol. 48 (No. 9); 2142-2163 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant