CN114338045B - Information data safe sharing method and system based on block chain and federal learning - Google Patents

Information data safe sharing method and system based on block chain and federal learning

Publication number
CN114338045B
Authority
CN
China
Prior art keywords
gradient
user
model
committee
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210040143.XA
Other languages
Chinese (zh)
Other versions
CN114338045A (en
Inventor
郭渊博
方晨
王一丰
马佳利
李勇飞
尹安琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202210040143.XA priority Critical patent/CN114338045B/en
Publication of CN114338045A publication Critical patent/CN114338045A/en
Application granted granted Critical
Publication of CN114338045B publication Critical patent/CN114338045B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Storage Device Security (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of network security and relates in particular to a method and system for secure sharing of intelligence data based on blockchain and federated learning. Each user trains on its local data to obtain the gradient of a local intrusion detection model, encrypts the gradient by adding a mask, compresses it, and sends it together with a digital signature to an adjacent blockchain node. The blockchain nodes verify the signatures on the gradient data and place legitimate model gradients into a transaction pool. A leader elected among the blockchain nodes aggregates the legitimate gradients in the transaction pool, adds the redundant masks of abnormal users as recovered by the backup committee to obtain the global gradient, and creates a new block that is passed to the verification committee. The verification committee broadcasts each new block that passes verification to the whole network. Users download the latest block and obtain the global gradient from it to update their local intrusion detection models. A converged intelligence data sharing model is thus obtained through multi-party training and deployed locally for network anomaly detection, which improves intrusion detection and defense performance.

Description

Information data safe sharing method and system based on block chain and federal learning
Technical Field
The invention belongs to the technical field of network security and relates in particular to a method and system for verifiable, secure sharing of intelligence data based on blockchain and federated learning.
Background
With network security incidents occurring frequently, intelligence data has become an important basis for detecting cybercrime, intrusions, and other anomalies, and training artificial intelligence models on intelligence data to detect network anomalies has become an important means of building network security boundaries. The accuracy of such a model depends closely on the amount of training data. At present, however, each institutional user can obtain intelligence data only through its own collection channels or by purchasing it from third parties at considerable cost. Intelligence data may contain sensitive information about the user, and now that data is regarded as a factor of production, it is an important asset for every department. Most departments are therefore unwilling to share intelligence data with one another. This causes severe data silos, and each department struggles to build an effective intrusion detection model because it holds too little intelligence data.
As a distributed machine learning framework, federated learning turns the sharing of raw data into the sharing of model parameters, while blockchain establishes a trustless, decentralized mode of data exchange among distributed users. Combining the two reduces the risk of data privacy leakage, addresses problems such as single-point-of-failure attacks and lack of trust, and makes the whole data sharing process verifiable, traceable, and auditable. In recent years a succession of works has applied the combination of blockchain and federated learning to secure data sharing, but the following problems remain. (1) Verifiability of data sharing results: communication base stations typically act as blockchain nodes that collect the model parameters uploaded by different users. If a malicious base station tampers with the model parameters before placing them in the transaction pool (that is, before the data goes on-chain), the blockchain reaches consensus on incorrect parameters, and joint modeling ultimately yields an incorrect data sharing model. (2) Availability and privacy of shared data: to strengthen privacy protection during sharing, existing work usually relies on differential privacy or secure multi-party computation, which respectively reduce data availability and increase computational overhead. How to balance data availability and privacy at low computational cost still needs study. (3) High communication overhead: federated learning training is inherently communication-intensive, and the message broadcast mechanism of blockchain makes the combined overhead even larger, which limits use in bandwidth-constrained scenarios.
Therefore, how to ensure the verifiability of data sharing results is also a problem to be solved in data sharing methods based on blockchain and federated learning.
Disclosure of Invention
To this end, the invention provides a method and system for secure, verifiable sharing of intelligence data based on blockchain and federated learning. A converged intelligence data sharing model is obtained through multi-party user training and deployed at each user's local end as an intrusion detection model for detecting network anomalies. This addresses the confidentiality, result verifiability, and high privacy-protection overhead problems of existing data sharing, and provides effective technical support for data circulation and sharing among different institutions.
According to the design provided by the invention, a method for verifiable, secure sharing of intelligence data based on blockchain and federated learning is used for joint modeling, by multiple users, of an intrusion detection model for network security defense. The joint modeling process comprises the following:
A trusted authority generates a public/private key pair for each user and each blockchain node and delivers the key pairs over a secure channel; each user secret-shares its private key with a backup committee, which consists of several blockchain nodes;
Each user trains on its local data to obtain the gradient of a local intrusion detection model, encrypts the gradient by adding a mask, compresses the encrypted gradient, and sends the compressed, encrypted gradient together with a digital signature to its associated adjacent blockchain node;
The blockchain nodes verify the signatures on the uploaded gradient data and place the legitimate model gradients that pass signature verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legitimate gradients in the transaction pool, adds the redundant masks of abnormal users as recovered by the backup committee to obtain the global gradient, creates a new block recording the global gradient and other key parameters, and sends the new block to the verification committee;
The verification committee verifies the correctness of the global gradient and broadcasts new blocks that pass verification to the whole network to reach consensus; each user downloads the latest block, obtains the latest global gradient from it, and updates its local intrusion detection model.
In the method for verifiable, secure sharing of intelligence data based on blockchain and federated learning, further, the global gradient in joint modeling is obtained over successive iteration rounds until a model convergence condition is met, so that the users' local intrusion detection models are updated synchronously and iteratively; the convergence condition is reaching the maximum number of iteration rounds.
Further, a reputation value is set for each user and each blockchain node to incentivize participation in joint modeling of the intrusion detection model. Blockchain nodes are elected according to their reputation values to serve as the leader, the backup committee, and the verification committee, and a blacklist is used to restrict the participation rights of users and blockchain nodes whose reputation value falls below a threshold.
Further, in mask encryption of the model gradient, each user computes a shared key with every other user from its own private key and the other user's public key using the Diffie-Hellman protocol, uses the shared key as the seed of a random number generator to generate a random mask, and encrypts its local model gradient with the random masks. The user also embeds its private key in a chosen polynomial and builds a polynomial commitment using verifiable secret sharing: the polynomial is split into n secret shares, and the shares, the secret-share witnesses (used to verify that a share belongs to the committed polynomial), and the polynomial commitment are sent to the backup committee, so that a redundant random mask can be recovered through key reconstruction when a user drops offline or submits an illegitimate signature.
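The pairwise masking described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the toy Diffie-Hellman parameters `P` and `G`, the gradient modulus `Q`, the use of `random.Random` as the seeded generator, and the sign convention (add the mask when `i < j`, subtract otherwise) are all assumptions chosen so that the masks cancel when every user's upload is summed.

```python
import random

# Toy parameters (assumptions, far too small for real use).
P = 2**61 - 1   # prime modulus for the Diffie-Hellman exchange
G = 3           # base element for the exchange
Q = 2**32       # gradients are encoded as integers modulo Q


def keygen(rng):
    """Return a (private key, public key) pair."""
    sk = rng.randrange(2, P - 1)
    return sk, pow(G, sk, P)


def pairwise_mask(shared_key, length):
    """Expand a shared key into a mask vector with a seeded generator."""
    prg = random.Random(shared_key)
    return [prg.randrange(Q) for _ in range(length)]


def mask_gradient(i, grad, sk_i, public_keys):
    """Mask user i's gradient with one mask per other user."""
    masked = list(grad)
    for j, pk_j in public_keys.items():
        if j == i:
            continue
        shared = pow(pk_j, sk_i, P)      # equals pow(pk_i, sk_j, P)
        mask = pairwise_mask(shared, len(grad))
        sign = 1 if i < j else -1        # assumed convention so masks cancel
        masked = [(x + sign * m) % Q for x, m in zip(masked, mask)]
    return masked


def aggregate(vectors):
    """Sum masked gradients element-wise modulo Q."""
    return [sum(column) % Q for column in zip(*vectors)]
```

When every user uploads, each pairwise mask is added once and subtracted once, so `aggregate` returns the plain sum of the gradients. If a user drops out, its masks remain in the sum, which is the redundant mask that the backup committee helps remove.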
Further, the encrypted model gradient is compressed using the Chinese remainder theorem (CRT). The compression process is as follows: first, the user's encrypted model gradient is divided evenly into $r$ segments, where $r = \lceil l/k \rceil$, $l$ is the length of the encrypted model gradient, and $k$ is a preset segment length; then, for each segment, a system of $k$ congruence equations is solved to compress the segment into a single element, and the elements of all segments together form the compression result of the whole model gradient.
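A small Python sketch may make the CRT packing step concrete. The moduli list, the zero padding of the last segment, and the function names are illustrative assumptions; the source only states that each segment of length k is compressed by solving k congruences.

```python
from math import ceil, prod

# Assumed pairwise coprime moduli (distinct primes); here k = 4 and each
# gradient entry must be smaller than its modulus.
MODULI = [251, 257, 263, 269]


def crt_pack(values, moduli):
    """Solve x = values[j] (mod moduli[j]) for all j and return x in [0, M)."""
    M = prod(moduli)
    x = 0
    for a, m in zip(values, moduli):
        Mi = M // m
        x += a * Mi * pow(Mi, -1, m)   # pow(..., -1, m) is the inverse mod m
    return x % M


def crt_unpack(x, moduli):
    """Recover the residues that were packed into x."""
    return [x % m for m in moduli]


def compress(grad, moduli):
    """Split grad into r = ceil(l / k) segments and pack each into one integer."""
    k = len(moduli)
    r = ceil(len(grad) / k)
    padded = list(grad) + [0] * (r * k - len(grad))   # zero padding is an assumption
    return [crt_pack(padded[s * k:(s + 1) * k], moduli) for s in range(r)]


def decompress(packed, moduli, length):
    """Invert compress and drop the padding."""
    out = []
    for x in packed:
        out.extend(crt_unpack(x, moduli))
    return out[:length]
```

Because packing is linear modulo M, adding two packed elements modulo M and unpacking gives the element-wise sums modulo each modulus, which is why aggregation can operate on compressed values.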
Further, a leader is elected by lot among the blockchain nodes using a reputation-based consistent hashing protocol. The election process is as follows: a hash ring is set up and each blockchain node is allocated a portion of the ring according to its reputation value; the initial SHA-256 hash value of the current latest block is hashed and the result is mapped onto the ring; the node whose portion of the ring contains the mapped point is elected leader.
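The election can be sketched in Python as follows. The ring size, the choice to make each node's arc proportional to its reputation, the ordering of nodes by identifier, and re-hashing the block hash with SHA-256 are assumptions made for illustration only.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate

RING_SIZE = 2**32  # assumed size of the hash ring


def build_ring(reputations):
    """Give each node an arc of the ring proportional to its reputation."""
    nodes = sorted(reputations)
    total = sum(reputations.values())
    cumulative = accumulate(reputations[n] for n in nodes)
    upper_bounds = [c * RING_SIZE // total for c in cumulative]
    return nodes, upper_bounds


def elect_leader(reputations, latest_block_hash):
    """Map the hash of the latest block hash onto the ring and pick the owner."""
    nodes, upper_bounds = build_ring(reputations)
    digest = hashlib.sha256(latest_block_hash).digest()
    point = int.from_bytes(digest, "big") % RING_SIZE
    # The first arc whose upper bound lies above the point owns it.
    return nodes[bisect_right(upper_bounds, point)]
```

Every node that knows the latest block and the reputation table computes the same leader, and a node with zero reputation owns an empty arc and can never be selected.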
Further, let the set of all users be U and the set of abnormal users that dropped offline or submitted illegitimate signatures be V. The leader's aggregation of all legitimate gradient data in the transaction pool is expressed as follows:
[Equation image GDA0004241599630000031: aggregation formula, not reproduced in the text]
where CRT denotes the compression operation, [symbol image GDA0004241599630000033] is the mask-encrypted model gradient of user i in the transaction pool, and $\Delta w_i'$ is the compressed model gradient of user i.
Further, for each abnormal user in the abnormal user set, several blockchain nodes in the backup committee first submit their secret shares of that user; the correctness of the shares is verified, and the abnormal user's polynomial and private key are recovered by the interpolation theorem. The redundant random masks are then computed from the shared keys between the abnormal user and the other users, and the global gradient is recovered.
Further, in correctness verification, the additive homomorphism of the polynomial commitment is used to confirm whether the model gradients in the transaction pool have been tampered with. If they have not, the new block created by the leader is considered legitimate. Verification passes when the proportion of verification committee nodes that accept the new block as legitimate reaches a preset value; if the proportion is below the preset value, an invalid empty block is generated.
Furthermore, the invention also provides a system for verifiable, secure sharing of intelligence data based on blockchain and federated learning, used for joint modeling, by multiple users, of an intrusion detection model for network security defense. The system comprises: user nodes that perform local model training in joint modeling; blockchain nodes that run consensus on the user nodes' local training parameters; a trusted authority that distributes public/private key pairs to the user nodes and blockchain nodes; and a backup committee and a verification committee, each formed of several blockchain nodes. Each user secret-shares its private key with the backup committee so that the key can be recovered in abnormal situations, and the new block containing the aggregated global gradient is checked for correctness by the verification committee;
Each user trains on its local data to obtain the gradient of a local intrusion detection model, encrypts the gradient by adding a mask, compresses the encrypted gradient, and sends the compressed, encrypted gradient together with a digital signature to its associated adjacent blockchain node;
The blockchain nodes verify the signatures on the uploaded gradient data and place the legitimate model gradients that pass signature verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legitimate gradients in the transaction pool, adds the redundant masks of abnormal users as recovered by the backup committee to obtain the global gradient, creates a new block recording the global gradient and other key parameters, and sends the new block to the verification committee;
The verification committee verifies the correctness of the global gradient and broadcasts new blocks that pass verification to the whole network to reach consensus; each user downloads the latest block, obtains the latest global gradient from it, and updates its local intrusion detection model.
The beneficial effects of the invention are as follows:
The invention trains an artificial intelligence model through joint modeling for use in building an intrusion detection system. This reduces the risk of data privacy leakage, addresses problems such as single-point-of-failure attacks and lack of trust, makes the whole intelligence data sharing process verifiable, traceable, and auditable, and suits data sharing among multiple departments or institutions. Adding a mask encrypts the gradient quickly and resists recent privacy attacks such as model inversion and model extraction; because the masks cancel to zero during gradient aggregation, the accuracy of the federated learning model is unaffected. Gradient verification based on polynomial commitments is integrated into the consensus process of joint modeling, which resists tampering by malicious blockchain nodes. The invention thereby addresses confidentiality, result verifiability, and the high overhead of privacy protection schemes in data sharing, provides an effective technical means for data circulation and sharing among departments or institutions, improves local intrusion detection performance and network security defense, and has good application prospects.
Description of the drawings:
FIG. 1 is a flowchart of the method for verifiable, secure sharing of intelligence data based on blockchain and federated learning in an embodiment;
FIG. 2 is a schematic diagram of the architecture for verifiable, secure sharing of intelligence data in an embodiment;
FIG. 3 is a schematic diagram of one round of iterative training in verifiable, secure sharing of intelligence data in an embodiment;
FIG. 4 illustrates the gradient masking and compression process in an embodiment;
FIG. 5 illustrates the consistent hashing protocol in an embodiment;
FIG. 6 is a schematic diagram of a new block created by the leader in an embodiment;
FIG. 7 is a graph of resistance to Sybil attacks at various backup committee sizes in an embodiment.
Detailed description of the embodiments:
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and technical solutions.
In addressing the data silo problem in Industry 4.0, a cognitive computing platform can be built by combining the decentralization of blockchain with federated learning, but the users' model parameters are stored directly on the blockchain; once an attacker or a malicious data sharing participant obtains them, information about a user's raw data can be inferred through a model inversion attack. Alternatively, users' model parameters can be encrypted with the Paillier algorithm before being uploaded to the blockchain, with decryption completed cooperatively by a subset of users after the model is updated, but this consumes a great deal of computation and communication overhead. For secure data sharing in industrial Internet scenarios, local differential privacy can be used to add noise to the raw data before features are extracted and shared, which prevents privacy theft attacks but loses some data utility. Using homomorphic encryption or differential privacy to strengthen data security therefore has shortcomings, and how to balance data availability and privacy at low computation and communication cost still needs study. In addition, these approaches do not consider the verifiability of data sharing results. Base stations typically act as blockchain nodes that collect the model parameters uploaded by users in different areas; if a malicious base station tampers with the parameters before placing them in the transaction pool (that is, before the data goes on-chain), the blockchain reaches consensus on incorrect parameters and joint modeling ultimately yields an incorrect data sharing model, which undermines practical use of data sharing.
Therefore, an embodiment of the invention provides a method for verifiable, secure sharing of intelligence data based on blockchain and federated learning, used for joint modeling, by multiple users, of an intrusion detection model for network security defense. As shown in FIG. 1, the joint modeling process comprises the following:
S101, a trusted authority generates a public/private key pair for each user and each blockchain node and delivers the key pairs over a secure channel; each user secret-shares its private key with a backup committee, which consists of several blockchain nodes;
S102, each user trains on its local data to obtain the gradient of a local intrusion detection model, encrypts the gradient by adding a mask, compresses the encrypted gradient, and sends the compressed, encrypted gradient together with a digital signature to its associated adjacent blockchain node;
S103, the blockchain nodes verify the signatures on the uploaded gradient data and place the legitimate model gradients that pass signature verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legitimate gradients in the transaction pool, adds the redundant masks of abnormal users as recovered by the backup committee to obtain the global gradient, creates a new block recording the global gradient and other key parameters, and sends the new block to the verification committee, which consists of several blockchain nodes;
S104, the verification committee verifies the correctness of the global gradient and broadcasts new blocks that pass verification to the whole network to reach consensus; each user downloads the latest block, obtains the latest global gradient from it, and updates its local intrusion detection model.
In this embodiment, each user participating in intelligence data sharing converts the data sharing problem into a model gradient sharing problem through local federated training and adds a mask to the gradient for fast encryption. To reduce communication overhead, the user may compress the encrypted gradient before uploading it to the associated blockchain node. All valid gradients are then aggregated on the blockchain to obtain the global gradient, the correctness of the global gradient is verified, and a legitimate block is generated on which the whole network reaches consensus. Finally, each user downloads the new block from the blockchain, obtains the global gradient from it, and updates its local model. A user may be a department, local enterprise, organization, or similar body participating in intelligence data sharing. Users have limited intelligence data and computing capacity and wish to model jointly with other users while keeping their own intelligence data local, so as to obtain a more accurate anomaly detection model for building a network protection system. In this embodiment, users participating in data sharing are assumed to be semi-honest: they execute the protocol honestly but may use the information they hold to infer other users' intelligence data. Blockchain nodes, such as communication base stations and servers, are generally equipped with a certain amount of computing and communication resources and are responsible for verifying and aggregating parameters and reaching consensus. It is assumed that some blockchain nodes may be compromised by an attacker; such nodes may tamper with user-uploaded data before placing it in the transaction pool, and may provide false secret shares during secret reconstruction.
Transactions record the data exchanged between blockchain nodes; in this embodiment, a transaction records a model gradient and related training information. Applying the combination of blockchain and federated learning to intelligence data sharing addresses confidentiality, result verifiability, and the high overhead of privacy protection schemes in data sharing, and provides an effective technical means for circulating and sharing intelligence data among different departments.
In the method of this embodiment, further, the global gradient in joint modeling is obtained over successive iteration rounds until a model convergence condition is met, so that the users' local intrusion detection models are updated synchronously and iteratively; the convergence condition is reaching the maximum number of iteration rounds. Further, a reputation value is set for each user and each blockchain node to incentivize participation in joint modeling of the intrusion detection model. Blockchain nodes are elected according to their reputation values to serve as the leader, the backup committee, and the verification committee, and a blacklist is used to restrict the participation rights of users and blockchain nodes whose reputation value falls below a threshold.
In this embodiment, each user and blockchain node involved in data sharing is assigned an initial reputation value. A user's reputation value increases if the user stays online throughout data sharing and the signature on its uploaded gradient is verified as legitimate, and decreases otherwise. A blockchain node's reputation value increases if it provides a correct secret share, generates a legitimate new block, or participates in new block verification, and decreases if it provides a false secret share. A participant whose reputation value drops to 0 is blacklisted and no longer allowed to take part in intelligence data sharing. It is assumed that at any time at least 70% of the reputation value in the system is held by honest participants, which ensures that the blockchain consensus protocol operates correctly. Referring to FIG. 2, the architecture consists of a blockchain and m distributed users. The blockchain is maintained by several nodes equipped with a certain amount of computing and communication resources; in practice these may be base stations equipped with servers. The users may be departments, institutions, enterprises, and the like with limited computing and communication capabilities. Each holds a local intelligence data set $D_i$ ($1 \le i \le m$), maps the raw intelligence data to a model gradient through machine learning training, and uploads the gradient to its associated blockchain node over a wired or wireless network. Federated learning is completed under the coordination of the blockchain, which achieves the goal of intelligence data sharing.
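The reputation rules above can be expressed as a small Python ledger. This is a sketch under stated assumptions: the class name `ReputationLedger`, the initial value, the fixed step size, and the event names are hypothetical, since the source gives only the direction of each change and the blacklist rule at 0.

```python
from dataclasses import dataclass, field

# Hypothetical event names for the node behaviours listed in the text.
NODE_REWARDS = {"correct_share", "valid_block", "block_verification"}
NODE_PENALTIES = {"false_share"}


@dataclass
class ReputationLedger:
    initial: int = 10          # assumed initial reputation value
    step: int = 1              # assumed size of each increase or decrease
    scores: dict = field(default_factory=dict)
    blacklist: set = field(default_factory=set)

    def register(self, participant):
        self.scores[participant] = self.initial

    def _adjust(self, participant, delta):
        if participant in self.blacklist:
            return                                  # blacklisted: no further changes
        self.scores[participant] = max(0, self.scores[participant] + delta)
        if self.scores[participant] == 0:
            self.blacklist.add(participant)         # reputation reached 0

    def record_user_round(self, user, stayed_online, signature_valid):
        ok = stayed_online and signature_valid
        self._adjust(user, self.step if ok else -self.step)

    def record_node_event(self, node, event):
        if event in NODE_REWARDS:
            self._adjust(node, self.step)
        elif event in NODE_PENALTIES:
            self._adjust(node, -self.step)
        else:
            raise ValueError(f"unknown event: {event}")

    def may_participate(self, participant):
        return participant in self.scores and participant not in self.blacklist
```

A leader election or committee selection routine would read `scores` and skip any participant for which `may_participate` returns `False`.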
In the method of this embodiment, further, in mask encryption of the model gradient, each user computes a shared key with every other user from its own private key and the other user's public key using the Diffie-Hellman protocol, uses the shared key as the seed of a random number generator to generate a random mask, and encrypts its local model gradient with the random masks. The user also embeds its private key in a chosen polynomial and builds a polynomial commitment using verifiable secret sharing: the polynomial is split into n secret shares, and the shares, the secret-share witnesses, and the polynomial commitment are sent to the backup committee, so that a redundant random mask can be recovered through key reconstruction when a user drops offline or submits an illegitimate signature. A secret-share witness is used to verify that a secret share belongs to the committed polynomial. Further, the encrypted model gradient is compressed using the Chinese remainder theorem (CRT). The compression process is as follows: first, the user's encrypted model gradient is divided evenly into $r$ segments, where $r = \lceil l/k \rceil$, $l$ is the length of the encrypted model gradient, and $k$ is a preset segment length; then, for each segment, a system of $k$ congruence equations is solved to compress the segment into a single element, and the elements of all segments together form the compression result of the whole model gradient.
Cryptographic commitments are an important class of cryptographic primitives that generally involve a committer and a verifier. In the commit phase, the committer selects a message m, computes a commitment c in ciphertext form, and sends c to the verifier; from this point the committer cannot change m. In the reveal phase, the committer publishes the plaintext message m and the key, and the verifier computes the commitment c' corresponding to m in the same way; if c' = c, verification passes, otherwise it fails. A commitment protocol has the following properties: (1) hiding: the commitment value c reveals no information about the message m; (2) binding: the committer cannot open the commitment c to any message other than m and still pass verification. Given these properties, a commitment protocol can be used to ensure that private data in ciphertext form has a unique interpretation.
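The commit and reveal phases can be illustrated with a simple hash-based commitment in Python. This is a generic stand-in chosen for brevity, not the polynomial commitment the patent uses; the function names and the 32-byte random key are assumptions.

```python
import hashlib
import hmac
import secrets


def commit(message: bytes):
    """Commit phase: return (commitment c, opening key). Only c is sent."""
    key = secrets.token_bytes(32)                 # random key hides the message
    c = hashlib.sha256(key + message).digest()
    return c, key


def verify(c: bytes, message: bytes, key: bytes) -> bool:
    """Reveal phase: recompute c' from (message, key) and compare with c."""
    c_prime = hashlib.sha256(key + message).digest()
    return hmac.compare_digest(c, c_prime)
```

The committer keeps `key` private until the reveal phase. Opening the commitment to a different message would require a SHA-256 collision, which is what gives the binding property in this sketch.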
A polynomial commitment is a commitment protocol with an additive homomorphic property and is often used to construct zero-knowledge proofs, verifiable secret sharing, and the like. The process of constructing verifiable secret sharing (VSS) can be described as follows:
(1) Initialization Setup($1^{\kappa}$, t): Let $\mathbb{G}$ and $\mathbb{G}_T$ be groups of prime order p, let g be a generator of $\mathbb{G}$, and let $e: \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ be a symmetric bilinear pairing for which the t-strong Diffie-Hellman (t-SDH) assumption holds. Select $\alpha \in \mathbb{Z}_p^*$ as the private key SK; the public key is $PK = (\mathbb{G}, \mathbb{G}_T, e, g, g^{\alpha}, \ldots, g^{\alpha^t})$.
(2) Commitment Commit(PK, φ(x)): For a polynomial of degree t, $\phi(x) = \sum_{j=0}^{t} \phi_j x^j \in \mathbb{Z}_p[x]$, the commitment can be calculated as:
$COMM(\phi(x)) = g^{\phi(\alpha)} = \prod_{j=0}^{t} (g^{\alpha^j})^{\phi_j}$ (1)
(3) Commitment reveal VerifyPoly(PK, COMM(φ(x)), φ(x)): Given a polynomial $\phi(x) = \sum_{j=0}^{t} \phi_j x^j$ and a commitment value COMM, if $COMM = \prod_{j=0}^{t} (g^{\alpha^j})^{\phi_j}$, the commitment was indeed generated from the polynomial φ(x); otherwise it was not.
(4) Secret distribution CreateWitness(PK, φ(x), i): To perform (n, t)-secret sharing among n users, the secret share sent to user i (1 ≤ i ≤ n) is $\langle i, \phi(i), w_i \rangle$, which contains the index i, the value φ(i) of the polynomial φ(x) at i, and the witness $w_i = COMM(\psi_i(x))$, where $\psi_i(x) = \frac{\phi(x) - \phi(i)}{x - i}$ and $COMM(\psi_i(x))$ is calculated in the same way as in formula (1).
(5) Secret verification VerifyEval(PK, COMM(φ(x)), i, φ(i), $w_i$): If formula (2) holds, user i's secret share $\langle i, \phi(i), w_i \rangle$ comes from the committed polynomial COMM(φ(x)); otherwise it does not.
$e(COMM(\phi(x)), g) = e(w_i, g^{\alpha} / g^{i}) \cdot e(g, g)^{\phi(i)}$ (2)
(6) Secret reconstruction Recover(i, φ(i)): Any t+1 or more users present their secret shares ⟨i, φ(i)⟩; after the shares are verified, the original polynomial φ(x) is recovered using the interpolation theorem.
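The distribution and reconstruction steps can be sketched as follows. This is a minimal sketch of plain (n, t) secret sharing with Lagrange interpolation only; the witnesses and the pairing-based share verification of steps (4) and (5) are omitted, and the field modulus `P` and the function names are assumptions chosen for illustration.

```python
import random

P = 2**127 - 1  # assumed toy field modulus (a Mersenne prime)


def split(secret, n, t, rng):
    """Pick a degree-t polynomial phi with phi(0) = secret; return shares (i, phi(i))."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(t)]

    def phi(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation modulo P
            acc = (acc * x + c) % P
        return acc

    return [(i, phi(i)) for i in range(1, n + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0; needs at least t+1 distinct shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

With t = 2, any three of the five shares returned by `split(secret, 5, 2, rng)` reconstruct the secret, which mirrors the requirement of t+1 or more shares in step (6). The standard-library `random` module is used only to keep the sketch short; it is not a cryptographically secure source of randomness.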
In addition, polynomial commitments also satisfy additive homomorphism:
$COMM(\phi_1(x) + \phi_2(x)) = COMM(\phi_1(x)) \cdot COMM(\phi_2(x))$ (3)
The Chinese remainder theorem (Chinese Remainder Theorem, CRT) is a method for solving a system of linear congruence equations. Let $m_1, m_2, \ldots, m_k$ be pairwise coprime positive integers and let $M = m_1 \cdot m_2 \cdots m_k$. Then the following system of equations has exactly one solution in $\mathbb{Z}_M$:
$x \equiv a_1 \pmod{m_1}, \quad x \equiv a_2 \pmod{m_2}, \quad \ldots, \quad x \equiv a_k \pmod{m_k}$
The solution is
$x = \sum_{i=1}^{k} a_i M_i M_i^{-1} \bmod M$
where $M_i = M/m_i$ and $M_i^{-1}$ is the inverse of $M_i$ modulo $m_i$.
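As a worked illustration of the CRT solution formula, the following minimal sketch solves a small system; the function name `crt` is hypothetical.

```python
from math import prod


def crt(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli; return x in [0, M)."""
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        # pow(M_i, -1, m_i) is the inverse of M_i modulo m_i (Python 3.8+)
        x += a_i * M_i * pow(M_i, -1, m_i)
    return x % M
```

For example, `crt([2, 3, 2], [3, 5, 7])` returns 23, the unique value below 105 that leaves remainders 2, 3 and 2 when divided by 3, 5 and 7.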
Assume that all users have registered in the system and have been assigned their own public and private keys and an ordered ID. Referring to fig. 3, the steps in one round of training may be designed as follows. Before training begins, each user uses verifiable secret sharing (Verifiable Secret Sharing, VSS) to secret-share its private key with a backup committee consisting of several blockchain nodes, so that a user dropping offline during subsequent training does not disrupt the normal training process (step 0). In formal training, each user iterates locally to obtain a model gradient (step 1) and adds a mask to prevent privacy leakage (step 2). To save communication overhead, users compress the encrypted gradients with the Chinese remainder theorem (Chinese Remainder Theorem, CRT) and send them, together with the commitment value of the original gradient, to the adjacent blockchain nodes (step 3 and step 4). After verifying the data signature, a node puts valid gradients into a transaction pool; after a specified time the nodes stop receiving data and elect a leader to perform the subsequent gradient aggregation (step 5). If the gradients of all users are present in the transaction pool, the leader adds them directly to obtain the global gradient; if the gradients of some users are missing from the transaction pool (i.e., the user dropped offline or the signature was verified as invalid), the leader computes the global gradient with the help of the secret shares provided by the backup committee (step 6). The leader then creates a new block that packs the related gradient information and sends the block to the verification committee for verification and broadcasting (step 7). Finally, the user downloads the latest global gradient from the blockchain and updates the local model (step 8). If a user dropped offline or failed signature verification in this round, new private keys are assigned to those users and step 0 to step 8 are executed in the next round; otherwise step 1 to step 8 are executed.
The iteration is repeated until the model converges or the maximum number of training rounds is reached. Note that the reputation values of the users, leaders, and backup committee members identified as legitimate in each round of training increase, to motivate them to contribute more to the data sharing system.
In the initialization stage before training (step 0), the trusted authority generates public and private key pairs for all users and blockchain nodes; the other public information is stored in a genesis block (i.e., the first block in the blockchain) and is sent by the trusted authority to all participants through a secure channel so that they can perform their initialization tasks. The genesis block mainly contains the following:
a) Model initialization parameters $w_0$, learning rate η, and total number of training rounds T
b) Public key PK for generating polynomial commitments
c) k pairwise coprime positive integers $m_1, m_2, \ldots, m_k$
d) A pseudorandom generator PRG(·) which, given a uniform and random input, outputs pseudo-random numbers distributed over the space $[0, R)^l$
e) Initial random seed $seed_0$; the seed parameter $seed_i$ of the i-th training round is generated from $seed_{i-1}$ of the previous round and is mainly used to ensure the randomness of leader election
f) Initial reputation values of all users and blockchain nodes, and the reputation update function
In addition, because some users may drop offline during training, all users use VSS to split their private keys into multiple secret shares and send them to the backup committee before formal training starts.
Local training phase (step 1 to step 4): in each round of training, each user obtains a model gradient $\Delta w_i$, 1 ≤ i ≤ m, from local intelligence data, and then encrypts $\Delta w_i$ as $\widetilde{\Delta w}_i$ by adding a mask to strengthen privacy protection. To reduce communication overhead, in this embodiment the encrypted gradient $\widetilde{\Delta w}_i$ can be compressed using the Chinese remainder theorem (CRT). Assuming $|\Delta w_i| = l$, user i (1 ≤ i ≤ m) first divides $\widetilde{\Delta w}_i$ evenly into $\lceil l/k \rceil$ segments, i.e. $\widetilde{\Delta w}_i = (\widetilde{\Delta w}_{i1}, \ldots, \widetilde{\Delta w}_{i\lceil l/k \rceil})$, where the symbol $\lceil \cdot \rceil$ denotes rounding up. If l is not divisible by k, the last segment is padded with 0. Let the j-th segment be $\widetilde{\Delta w}_{ij} = (\widetilde{\Delta w}_{ij}^{(1)}, \ldots, \widetilde{\Delta w}_{ij}^{(k)})$. User i (1 ≤ i ≤ m) solves the following system of congruence equations:
$\Delta w'_{ij} \equiv \widetilde{\Delta w}_{ij}^{(1)} \pmod{m_1}, \quad \ldots, \quad \Delta w'_{ij} \equiv \widetilde{\Delta w}_{ij}^{(k)} \pmod{m_k}$
According to the Chinese remainder theorem, the above system has a unique solution $\Delta w'_{ij}$ modulo $M = m_1 \cdot m_2 \cdots m_k$. It can be seen that each gradient segment $\widetilde{\Delta w}_{ij}$ of length k is compressed by CRT into a single element $\Delta w'_{ij}$, so the whole gradient vector $\widetilde{\Delta w}_i$ can be compressed into $\Delta w'_i = (\Delta w'_{i1}, \ldots, \Delta w'_{i\lceil l/k \rceil})$, whose length is 1/k of the original. The whole gradient masking and compression process is shown in fig. 4. User i (1 ≤ i ≤ m) calculates the commitment value $COMM(\Delta w_i)$ of the original gradient $\Delta w_i$ and sends $\langle \Delta w'_i, COMM(\Delta w_i) \rangle$, together with the digital signature, to the associated blockchain node.
Aggregation stage (step 5 to step 6): after a blockchain node receives the data uploaded by a user, it first checks whether the signature is valid. If it is, the data is put into the transaction pool. After a certain time, all nodes stop receiving data and then compete to become the leader, who acquires the right to generate the new block. In this embodiment, a consistent hash protocol based on reputation values may be used as the lottery algorithm to elect the leader; the process is shown in fig. 5. Specifically, given a hash ring, its space is allocated to the blockchain nodes in proportion to their reputation values. Hash calculation is then performed repeatedly, starting from the SHA-256 hash value of the current latest block, and the hash value obtained by each calculation is mapped onto the hash ring; the blockchain node that owns the space where the mapping result falls is selected. Note that in this embodiment the leader of the current training round can be selected with a single hash calculation, whereas the backup committee and the verification committee, each composed of several nodes, require multiple hash calculations to select their blockchain node members. The lottery process is thus similar to the Algorand protocol: the probability that a blockchain node is drawn is proportional to its reputation value. Let U denote the set of all users and V the set of abnormal users who dropped offline or whose signatures are invalid. The selected leader aggregates all user gradients in the transaction pool according to equation (6):
Figure GDA0004241599630000106
For two data compressed by CRT (as in equation (5))
Figure GDA0004241599630000107
Figure GDA0004241599630000108
the following can be obtained by calculation:
Figure GDA0004241599630000109
This formula shows that CRT satisfies additive homomorphism. Based on this property, equation (6) can be transformed into:
Figure GDA00042415996300001010
the leader then computes the sum of the coefficients by the modulo operation in equation (9)
Figure GDA0004241599630000111
Decompression is +.>
Figure GDA0004241599630000112
Then calculating to obtain global gradient Deltaw g
Figure GDA0004241599630000113
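The compression, aggregation and decompression described above can be sketched end to end. This is a minimal sketch under stated assumptions: gradient entries are taken to be non-negative integers (real gradients would first need quantization, which the text does not detail), the moduli in `MODULI` are hypothetical, and every aggregated entry must stay below its modulus for decompression to return the true sum.

```python
from math import prod

MODULI = [101, 103, 107]  # assumed pairwise coprime moduli, so k = 3


def compress(vec, moduli=MODULI):
    """Pad vec with zeros to a multiple of k, then map each k-segment to one CRT value."""
    k, M = len(moduli), prod(moduli)
    padded = list(vec) + [0] * (-len(vec) % k)
    out = []
    for start in range(0, len(padded), k):
        seg = padded[start:start + k]
        x = sum(a * (M // m) * pow(M // m, -1, m) for a, m in zip(seg, moduli))
        out.append(x % M)
    return out


def aggregate(compressed_vectors, moduli=MODULI):
    """Element-wise sum modulo M; by additive homomorphism this compresses the sum."""
    M = prod(moduli)
    return [sum(col) % M for col in zip(*compressed_vectors)]


def decompress(comp, length, moduli=MODULI):
    """Recover the entries by reducing each compressed element modulo every m_q."""
    flat = [x % m for x in comp for m in moduli]
    return flat[:length]
```

A vector of length 5 is compressed into 2 elements, and decompressing the aggregate of two compressed vectors yields the element-wise sum of the originals as long as each summed entry is smaller than the corresponding modulus. In practice the moduli would therefore have to be chosen larger than the number of users times the maximum entry value.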
Further, in this embodiment, for each abnormal user in the abnormal user set, several blockchain nodes in the backup committee first submit that user's secret shares and the correctness of the shares is verified; the polynomial and the private key of the abnormal user are recovered using the interpolation theorem; the redundant random masks are then computed from the shared keys between the other users and the abnormal user, so that the global gradient can be recovered. Further, the additive homomorphism of the polynomial commitment is used to confirm whether the model gradients in the transaction pool have been tampered with. If they have not, the new block created by the leader is judged valid. Verification passes when the proportion of verifiers in the verification committee who judge the new block valid reaches a preset value; if the proportion is below the preset value, an invalid empty block is generated.
In the block generation and broadcasting phase, the leader creates a new block and broadcasts it to the verification committee for verification. As shown in FIG. 6, a block in this embodiment is composed of a block header and a block body: the block header includes the meta information of the block and a pointer (i.e., a hash value) to the previous block, and the block body contains a series of transactions. Unlike conventional blockchains, the transactions stored in this embodiment are the relevant training parameters, which may include: (1) the random seed parameter $seed_{t+1}$ for the next training round, (2) the proof generated when the leader was elected in the aggregation stage, (3) the global gradient $\Delta w_g$ of this round, and (4) the commitment values of the valid user gradients. In this way the key parameters of the whole training process are recorded in the blockchain in a tamper-proof manner, which makes the training process auditable, unlike that of a traditional federated learning algorithm.
In the prior art, the plaintext local gradients of all users are stored directly in the block; once an attacker or a semi-honest user obtains the gradients of other users, privacy attacks such as model inversion and model extraction can be launched. Therefore, in this embodiment only the commitment values of the gradients are stored in the block, which both protects the private information in the gradients and ensures the correctness of the global gradient obtained in each round of training. Specifically, after the new block generated by the leader is broadcast to the verification committee, all verifiers check whether formula (10) holds.
Figure GDA0004241599630000114
If it holds, then by the additive homomorphism of the polynomial commitment the user gradients in the transaction pool have not been tampered with, and the new block is valid. Otherwise, some blockchain nodes tampered with the collected user gradients before putting them into the transaction pool, so the global gradient was computed incorrectly and the new block is invalid. When more than 2/3 of the verifiers confirm that the new block is valid, verification passes and the new block is broadcast to the whole network to reach consensus; otherwise an invalid empty block is generated.
Block generation and broadcasting stage (step 7): the leader creates a new block and broadcasts it to the verification committee for verification. The verification committee is elected with the same lottery algorithm based on the consistent hash protocol, so the description is not repeated. If verification succeeds, the verification committee broadcasts the block to all blockchain nodes of the whole network through a gossip protocol to reach consensus; otherwise an invalid empty block is created.
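The reputation-weighted lottery used to elect the leader and the committees can be sketched as follows. This is a minimal sketch under assumptions: the ring size of 2^256, the ordering of nodes on the ring, the skipping of already chosen nodes, and the names `build_ring` and `draw` are all illustrative choices that the text does not specify.

```python
import hashlib
from bisect import bisect_right

RING = 2**256  # assumed ring size, matching the SHA-256 output range


def build_ring(reputation):
    """Give each node a ring interval whose width is proportional to its reputation."""
    total = sum(reputation.values())
    bounds, ids, acc = [], [], 0
    for node, rep in sorted(reputation.items()):
        acc += rep
        bounds.append(RING * acc // total)  # exclusive upper end of this node's interval
        ids.append(node)
    return bounds, ids


def draw(latest_block_hash, reputation, count):
    """Hash repeatedly from the latest block hash; each hash selects one ring owner."""
    eligible = sum(1 for rep in reputation.values() if rep > 0)
    if count > eligible:
        raise ValueError("not enough nodes with positive reputation")
    bounds, ids = build_ring(reputation)
    chosen, h = [], latest_block_hash
    while len(chosen) < count:
        h = hashlib.sha256(h).digest()
        point = int.from_bytes(h, "big")
        node = ids[bisect_right(bounds, point)]
        if node not in chosen:  # assumption: a node holds at most one committee seat
            chosen.append(node)
    return chosen
```

Calling `draw(block_hash, reputation, 1)` performs a single hash calculation and yields the leader, while a larger `count` yields committee members. Every node that runs the function on the same block hash and reputation table obtains the same result, so the outcome can be checked by all participants.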
Model updating stage (step 8): the user downloads the latest block from the associated blockchain node, obtains the global gradient from it, and updates the local model. If an abnormal user appeared in this round of training (i.e., dropped offline or had an invalid signature), the leader recovered that user's private key during gradient aggregation, so the trusted authority must assign a new public and private key pair to the user before the next training round, and the secret sharing step must be executed again. After each round of training, the reputation value of a user who stayed online throughout data sharing and whose gradient signature was verified as valid increases; otherwise it decreases. The reputation value also increases for blockchain nodes that generate a valid new block or take part in verifying a new block. A participant whose reputation value drops to 0 is blacklisted and is no longer allowed to take part in intelligence data sharing.
The operation calculation process of secret sharing of the private key of the user, encryption of the model gradient mask and aggregation calculation of the global gradient can be described as follows:
gradient mask: assume that each user has acquired the public key pk of the other user i If i is E U, the Diffie-Hellman protocol is run to calculate the shared key s between each user pair i,j ←KA.agree(sk i ,pk j ) And generates a random mask using the key as a seed for the random number generator. Assume that each user has now obtained a gradient aw of length l through local training i I is equal to or greater than 1 and is equal to or less than m, and for simplicity, the vector Deltaw is assumed i All elements in are in the domain
Figure GDA0004241599630000121
In (3), the gradient Deltaw i Can be encrypted as->
Figure GDA0004241599630000122
As shown in the following formula.
Figure GDA0004241599630000123
As can be seen from the formula (11), the user can encrypt only by adding a random number to the gradient, and when all users encrypt the gradient
Figure GDA0004241599630000124
After addition, the random numbers are partially offset to be 0, and the global gradient can be directly obtained
Figure GDA0004241599630000125
Compared with homomorphic encryption algorithm adopted in the prior art, the encryption mode has higher efficiency and does not lose data utility. However, once some users drop the line or the signature is verified as illegal, the random number cannot be counteracted to 0 when the residual gradients are added, and the global gradient cannot be obtained. Therefore, the private key of the user needs to be backed up in a secret way, and when the user is disconnected or the signature is illegal, the redundant random number can be calculated by using the backed-up private key, so that the global gradient is obtained. Based on this idea, the user private key is secret shared to other local users. Considering that the key reconstruction requires larger calculation and communication overhead, and the calculation and communication resources of the blockchain node are far larger than those of the local user, in the embodiment of the present disclosure, the private key of the user can be shared to the backup committee consisting of a plurality of blockchain nodes through the VSS secret.
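The pairwise masking idea can be made concrete with a minimal sketch in which the masks cancel when all users' encrypted gradients are added. The Diffie-Hellman parameters, the modulus `R`, the sign convention (the user with the smaller ID adds the mask, the other subtracts it) and the use of Python's `random.Random` as the PRG are all assumptions for illustration; they are not secure choices and are not taken from the text.

```python
import random

R = 2**32         # assumed modulus for gradient entries and masks
P = 2**127 - 1    # assumed toy Diffie-Hellman prime (not a secure choice)
G = 3             # assumed Diffie-Hellman base


def public_key(sk):
    return pow(G, sk, P)


def shared_key(sk_i, pk_j):
    """KA.agree: both users of a pair derive the same value G^(sk_i * sk_j) mod P."""
    return pow(pk_j, sk_i, P)


def prg(seed, length):
    """Stand-in PRG: expands a seed into `length` values in [0, R)."""
    rng = random.Random(seed)
    return [rng.randrange(R) for _ in range(length)]


def mask_gradient(i, grad, sk_i, pks):
    """Add PRG(s_ij) for every j > i and subtract it for every j < i, modulo R."""
    out = list(grad)
    for j, pk_j in pks.items():
        if j == i:
            continue
        m = prg(shared_key(sk_i, pk_j), len(grad))
        sign = 1 if i < j else -1
        out = [(x + sign * y) % R for x, y in zip(out, m)]
    return out
```

With three users holding gradients [1, 2], [3, 4] and [5, 6], summing the three masked vectors modulo R gives [9, 12], the plain sum, because every mask appears once with each sign.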
Private key sharing: assume that the backup committee consists of n blockchain nodes (elected by the lottery algorithm based on the consistent hash protocol). User i (1 ≤ i ≤ m) first selects a polynomial $\phi_i(x)$ and sets its private key $sk_i$ as the constant term of $\phi_i(x)$, i.e. $\phi_i(0) = sk_i$, and then calculates the commitment $COMM(\phi_i(x))$ to the polynomial. Next, using verifiable secret sharing, the polynomial $\phi_i(x)$ is split into n secret shares $\{\langle k, \phi_i(k) \rangle\}_{1 \le k \le n}$, and $\langle k, \phi_i(k), w_{i,k}, COMM(\phi_i(x)) \rangle$ is sent to blockchain node k (1 ≤ k ≤ n), where $w_{i,k} = COMM\left(\frac{\phi_i(x) - \phi_i(k)}{x - k}\right)$ is the witness of the secret share. The witness can be used to verify that the secret share does belong to the polynomial $\phi_i(x)$ committed in $COMM(\phi_i(x))$, which prevents malicious blockchain nodes from providing false shares during key reconstruction.
Gradient polymerization: assuming that the leader has been decompressed by equation (9)
Figure GDA0004241599630000132
If the gradients of all users have been uploaded to the blockchain and their signatures are valid (i.e., $V = \varnothing$ in equation (9)), the leader obtains the global gradient directly through a simple addition, as shown in the following formula:
Figure GDA0004241599630000133
If some users dropped offline or their signatures were verified as invalid (these abnormal users are recorded as the set V), more than t blockchain nodes in the backup committee submit the secret shares $\langle k, \phi_i(k), w_{i,k}, COMM(\phi_i(x)) \rangle$ of each abnormal user i ∈ V. After the correctness of the secret shares is verified, the polynomial $\phi_i(x)$ and the private key $sk_i$, i ∈ V, are recovered by the interpolation theorem. The shared keys between the other users and the abnormal users are then calculated as $s_{i,m} = KA.agree(sk_i, pk_m)$, i ∈ V, m ∈ U−V, and the global gradient is finally obtained by the following formula.
Figure GDA0004241599630000134
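The handling of abnormal users can be sketched as follows. The sketch assumes that the shared keys between each surviving user and each abnormal user have already been recomputed from the recovered private keys and are simply passed in as a dictionary; the sign convention, the modulus `R` and the function names are assumptions for illustration and may differ from the exact form of the formula above.

```python
import random

R = 2**32  # assumed modulus for gradient entries and masks


def prg(seed, length):
    """Stand-in PRG (not cryptographically secure)."""
    rng = random.Random(seed)
    return [rng.randrange(R) for _ in range(length)]


def masked_gradient(i, grad, keys, users):
    """Mask as in pairwise masking: add for partners j > i, subtract for j < i."""
    out = list(grad)
    for j in users:
        if j != i:
            sign = 1 if i < j else -1
            m = prg(keys[frozenset((i, j))], len(grad))
            out = [(x + sign * y) % R for x, y in zip(out, m)]
    return out


def aggregate_with_dropouts(masked, keys, users, dropped, length):
    """Sum surviving masked gradients, then strip the masks shared with dropped users."""
    total = [0] * length
    for i in users:
        if i in dropped:
            continue
        total = [(t + x) % R for t, x in zip(total, masked[i])]
        for d in dropped:
            sign = 1 if i < d else -1
            m = prg(keys[frozenset((i, d))], length)
            total = [(t - sign * y) % R for t, y in zip(total, m)]
    return total
```

If user 2 of three users drops out, the masks that users 1 and 3 share with each other still cancel, and the two masks they each share with user 2 are removed explicitly, leaving the plain sum of the gradients of users 1 and 3.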
In this embodiment, the user's private key is secret-shared with the backup committee in the initialization stage, a mask is added to the user's original gradient in the local training stage, and the global gradient is calculated in the aggregation stage. These three operations are closely linked and together achieve secure gradient aggregation whether or not users drop offline. The method is suitable for sharing intelligence data among multiple departments, institutions and organizations; it reduces the risk of data privacy leakage, addresses problems such as single-point-of-failure attacks and lack of trust, and improves network intrusion detection and defense performance.
Furthermore, based on the above method, the invention also provides a verifiable secure sharing system for information data based on blockchain and federated learning, used for joint modeling by multiple users of an intrusion detection model in network security defense. The system comprises: user nodes that take part in local model training in the joint modeling; blockchain nodes that perform consensus operations on the local model training parameters of the user nodes; a trusted authority that distributes public and private key pairs to the user nodes and blockchain nodes; and a backup committee and a verification committee, each formed by several blockchain nodes. Each user secret-shares its private key with the backup committee so that the user's private key information can be recovered in abnormal situations, and the new block containing the aggregated global gradient is verified for correctness by the verification committee;
The user obtains a local intrusion detection model gradient by utilizing machine training based on local data, encrypts the model gradient by adding a mask, compresses the encrypted model gradient, and sends the compressed and encrypted model gradient together with a digital signature to an associated adjacent blockchain node;
the blockchain node verifies the signature of the uploaded model gradient data and puts valid model gradients that pass signature verification into a transaction pool; the leader elected among the blockchain nodes aggregates the valid gradients in the transaction pool and, using the redundant masks of abnormal users recovered by the backup committee, obtains the global gradient; the leader creates a new block recording the global gradient and other key parameters and sends it to the verification committee;
the verification committee performs correctness verification on the global gradient, and broadcasts new blocks passing verification to the whole network to achieve consensus; the user updates the local intrusion detection model by updating the blocks and obtaining global gradients from them.
To verify the validity of this protocol, the following is further explained in connection with experimental data:
privacy analysis: if the local gradient of one user is added with a pair of uniform random masks (as shown in equation (11)) and these masks cancel each other out to 0 when all user gradients are added, then the masked user gradients can be considered uniform random, i.e., representing that the pair of masks can protect the gradient privacy of a single user.
Theorem 1: given m, l, R, U, { Δw i } i∈U Where m is the number of users and l is the user gradient Δw i U represents the set of all users. Let us assume the gradient aw of all users i I epsilon U all satisfy
Figure GDA0004241599630000141
Then->
Figure GDA0004241599630000142
Wherein the symbol "≡" indicates that both are equally distributed.
Proof: the theorem is proved by induction.
(1) When m= |u|=1,
Figure GDA0004241599630000143
since it has been assumed that
Figure GDA0004241599630000144
There is->
Figure GDA0004241599630000145
In addition, a->
Figure GDA0004241599630000146
So when m=1, formula (14) holds.
(2) When m= |u|=k, if the theorem 1 is assumed to be true, then according to equation (14), it is obtained
Figure GDA0004241599630000147
Figure GDA0004241599630000148
(3) When m=k+1, a new set U' =u { k+1} is defined, then there is
Figure GDA0004241599630000151
In the formula (16) already
Figure GDA0004241599630000152
Then formula (17) can be written as
Figure GDA0004241599630000153
Since it has been assumed that
Figure GDA00042415996300001512
Then get->
Figure GDA0004241599630000154
Thus formula (18) can be written as
Figure GDA0004241599630000155
On the other hand, in the other hand,
Figure GDA0004241599630000156
according to formula (15), there is
Figure GDA0004241599630000157
And has been assumed +.>
Figure GDA0004241599630000158
Can be inferred to
Figure GDA0004241599630000159
In addition, since it has been assumed that
Figure GDA00042415996300001513
Can be inferred to
Figure GDA00042415996300001510
Based on equations (21) and (22), equation (20) can be written as:
Figure GDA00042415996300001511
By combining equations (19) and (23), it can be inferred that theorem 1 holds when m=k+1. This completes the proof.
Resistance to Sybil attacks: in the private key sharing operation, the user's private key is secret-shared with the backup committee in order to tolerate users dropping offline. However, if an attacker can control more than t blockchain nodes in the backup committee through a Sybil attack, the attacker can generate enough false secret shares and witnesses to pass VSS verification and eventually reconstruct a false private key, disrupting the gradient aggregation process. The following analyzes the minimum value of the threshold t in VSS needed to keep the probability that an attacker breaks the secret reconstruction below a given value.
Given that the backup committee is derived by a sampling algorithm based on a consistent hash protocol, the probability of each blockchain node being elected is proportional to its reputation value. Thus, the probability p of more than t blockchain nodes in the attacker control backup committee can be calculated as:
Figure GDA0004241599630000161
where n is the number of members in the backup committee and s is the proportion of reputation controlled by the malicious attacker. Since the invention assumes that at least 70% of the reputation in the system is held by honest participants, s = 0.3. Equation (24) assumes that the number of attacker-controlled members follows a binomial distribution. However, the binomial distribution corresponds to sampling with replacement, whereas in this scheme each blockchain node can be elected only once, so p can be regarded as an upper bound on the probability that the attacker controls more than t nodes in the backup committee. By exhaustive search, the minimum value of the threshold t in VSS for which p stays below a given probability is calculated for different backup committee sizes; FIG. 7 shows the minimum threshold t when p is less than 0.05, 0.01 and 0.001. Depending on the actual training conditions, p can be bounded by different probabilities to ensure the security of the method. For example, the number of training rounds on the intelligence dataset for this scheme is typically within 100, so p should be less than 0.01. When the backup committee has 10 nodes, the minimum value of the threshold t is set to 8, which guarantees with high probability that an attacker cannot break the key reconstruction process of VSS through a Sybil attack.
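The exhaustive search for the minimum threshold can be sketched as a binomial tail computation. Because equation (24) is not reproduced here, the sketch assumes that the attack succeeds when the attacker holds at least c of the n committee seats, with each seat independently malicious with probability s; the function names are hypothetical.

```python
from math import comb


def takeover_probability(n, c, s):
    """P[X >= c] for X ~ Binomial(n, s): attacker holds at least c of n seats."""
    return sum(comb(n, i) * s**i * (1 - s) ** (n - i) for i in range(c, n + 1))


def min_threshold(n, s, bound):
    """Smallest c whose takeover probability is below `bound`, or None if there is none."""
    for c in range(n + 1):
        if takeover_probability(n, c, s) < bound:
            return c
    return None
```

Under this convention, `min_threshold(10, 0.3, 0.01)` returns 8, since the tail probability is about 0.0106 for c = 7 and about 0.0016 for c = 8, which agrees with the example of a 10-node backup committee given in the text.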
Based on the above experimental data, it can further be seen that secure aggregation based on gradient masks and verifiable secret sharing strengthens privacy when information data is shared without losing data utility; that, based on the binding and hiding properties of polynomial commitments, the new blockchain structure integrates federated learning model verification into the consensus process and can thus resist tampering attacks by malicious blockchain nodes; and that gradient compression effectively reduces communication overhead, which facilitates application in practical scenarios.
The relative steps, numerical expressions and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.
Based on the above method and/or system, the embodiment of the present invention further provides a server, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
Based on the above-described method and/or system, embodiments of the present invention also provide a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the above-described method.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that, within the technical scope disclosed by the present invention, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A safe sharing method of information data based on block chain and federal learning is used for joint modeling of multiparty users to an intrusion detection model in network security defense, and is characterized in that the joint modeling process comprises the following contents:
the trusted authority distributes public and private key pairs to each user and blockchain node and sends them to the users and blockchain nodes through a secure channel, and each user secret-shares its own private key with a backup committee so that private key data can be recovered in abnormal user situations, wherein the backup committee consists of several blockchain nodes;
the user obtains a local intrusion detection model gradient by utilizing machine training based on local data, encrypts the model gradient by adding a mask, compresses the encrypted model gradient, and sends the compressed and encrypted model gradient together with a digital signature to an associated adjacent blockchain node; in the encryption of the model gradient data mask, aiming at private keys of users and public keys of other users, a Diffie-Hellman protocol is utilized to calculate a shared key between the users and the other users, the shared key is used as a seed of a random number generator to generate a random mask, and the random mask is utilized to encrypt the local model gradient of the users; the user adds a private key into a selected polynomial by using a verifiable secret sharing technology and constructs a polynomial commitment, the polynomial is split into n secret shares, the polynomial, a secret share witness and the polynomial commitment are sent to a backup committee, and the backup committee restores a redundant random mask through key reconstruction when the user drops a line or the signature is illegal in an abnormal situation, wherein the secret share witness is used for verifying the commitment polynomial to which the secret share belongs; compressing the encrypted model gradient by using a China remainder theorem CRT, wherein the compression process comprises the following steps: firstly, the model gradient encrypted by the user is uniformly divided into r segments, wherein,
Figure FDA0004241599620000013
l is the length of the encrypted model gradient and k is a preset segment length; then, each model gradient segment is compressed into a single element corresponding to that segment by solving a system of k congruence equations, and the compression result of the whole model gradient is obtained from the elements of the segments;
the blockchain node performs signature verification on the uploaded model gradient data and places the valid model gradients that pass signature verification into a transaction pool; a leader elected from among the blockchain nodes aggregates the valid gradients in the transaction pool and, together with the redundant masks of abnormal users recovered by the backup committee, obtains a global gradient; the leader creates a new block recording the global gradient and other key parameters and sends the global gradient to a verification committee by means of the new block, wherein the verification committee consists of a plurality of blockchain nodes; the leader is elected by lot among the blockchain nodes using a reputation-based consistent hashing protocol, the election process comprising: setting up a hash ring, allocating to each blockchain node a hash ring space according to its reputation value, performing a hash calculation on the initial SHA-256 hash value of the current latest block, mapping the calculated hash value onto the hash ring, and determining the elected blockchain node leader according to the hash ring space in which the mapping result falls; with the set of all users denoted U and the set of abnormal users, namely those that have gone offline or whose signatures are invalid, denoted V, the process by which the leader aggregates the valid gradient data in the transaction pool is expressed as:
Figure FDA0004241599620000011
wherein CRT denotes the compression operation,
Figure FDA0004241599620000012
is the mask-encrypted model gradient of user i in the transaction pool, and Δw_i is the compressed model gradient of user i;
the verification committee verifies the correctness of the global gradient and broadcasts new blocks that pass verification to the whole network to reach consensus; the user obtains the latest global gradient by downloading the latest block and retrieving the global gradient from it, and updates the local intrusion detection model accordingly.
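The mask encryption step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy Diffie-Hellman parameters `P` and `G`, the integer modulus `MODULUS`, the helper names, and the sign convention (add the mask toward higher user IDs, subtract it toward lower ones, in the style of the pairwise masking of Bonawitz et al.) are all assumptions made for illustration. `random.Random` stands in for a cryptographically secure generator, and `keygen` stands in for the trusted authority.

```python
import hashlib
import random

P = 2**127 - 1      # toy prime modulus for Diffie-Hellman (assumption, not a vetted group)
G = 3               # toy base (assumption)
MODULUS = 2**32     # gradients are assumed to be quantized to integers mod 2^32


def keygen(rng):
    """Return a (private, public) key pair; stands in for the trusted authority."""
    sk = rng.randrange(2, P - 1)
    return sk, pow(G, sk, P)


def shared_seed(my_sk, their_pk):
    """Diffie-Hellman shared key, hashed so it can seed the mask generator."""
    secret = pow(their_pk, my_sk, P)
    return hashlib.sha256(str(secret).encode()).digest()


def random_mask(seed, length):
    """Expand a seed into a mask vector (random.Random is NOT cryptographically secure)."""
    prg = random.Random(seed)
    return [prg.randrange(MODULUS) for _ in range(length)]


def mask_gradient(uid, grad, my_sk, public_keys):
    """Add one pairwise mask per other user; the sign depends on the ID ordering."""
    masked = list(grad)
    for other, pk in public_keys.items():
        if other == uid:
            continue
        mask = random_mask(shared_seed(my_sk, pk), len(grad))
        sign = 1 if uid < other else -1
        masked = [(g + sign * m) % MODULUS for g, m in zip(masked, mask)]
    return masked


# Three hypothetical users with small integer gradients.
rng = random.Random(0)
keys = {u: keygen(rng) for u in (1, 2, 3)}
public_keys = {u: pair[1] for u, pair in keys.items()}
grads = {1: [5, 7, 9], 2: [1, 2, 3], 3: [10, 20, 30]}
masked = {u: mask_gradient(u, grads[u], keys[u][0], public_keys) for u in grads}

# When every user reports, the pairwise masks cancel in the sum.
aggregate = [sum(column) % MODULUS for column in zip(*masked.values())]
print(aggregate)  # [16, 29, 42]
```

If one user were missing from the sum, its pairwise masks would not cancel. Those leftover terms are the "redundant random mask" that the backup committee has to recover.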
2. The method for securely sharing information data based on blockchain and federated learning according to claim 1, wherein the global gradient in the joint modeling is obtained iteratively, round by round, a model convergence condition being set so that the user's local intrusion detection model is updated in synchronous iterations, wherein the model convergence condition is a maximum number of iteration rounds.
3. The method for securely sharing information data based on blockchain and federated learning according to claim 1 or 2, wherein a reputation value is set for each user and each blockchain node to incentivize them to participate in the joint modeling of the intrusion detection model, blockchain nodes are elected according to reputation value to form the leader, the backup committee and the verification committee, and a blacklist is used to manage and restrict the participation rights, in the joint modeling, of users and blockchain nodes whose reputation values are below a threshold.
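The reputation-based consistent hashing election described in claim 1, on which the reputation values of claim 3 act, can be sketched as follows. This is an illustrative reading, not the patented procedure: the ring size `RING_SIZE`, the proportional allocation of arcs, the node names and the helper functions are assumptions, and the total reputation is assumed to be positive.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate

RING_SIZE = 2**32  # assumed size of the hash ring


def build_ring(reputations):
    """Give each node an arc of the ring proportional to its reputation value."""
    nodes = sorted(reputations)
    total = sum(reputations[n] for n in nodes)
    cumulative = accumulate(reputations[n] for n in nodes)
    upper_bounds = [c * RING_SIZE // total for c in cumulative]
    return nodes, upper_bounds


def elect_leader(reputations, latest_block_hash_hex):
    """Hash the latest block's SHA-256 value onto the ring and return the arc owner."""
    nodes, upper_bounds = build_ring(reputations)
    digest = hashlib.sha256(bytes.fromhex(latest_block_hash_hex)).digest()
    point = int.from_bytes(digest, "big") % RING_SIZE
    # The first arc whose upper bound exceeds the point contains it.
    return nodes[bisect_right(upper_bounds, point)]


# Hypothetical reputations and a stand-in for the latest block hash.
reputations = {"node-a": 10, "node-b": 30, "node-c": 60}
block_hash = hashlib.sha256(b"latest block header").hexdigest()
leader = elect_leader(reputations, block_hash)
print(leader)
```

Because the mapped point depends only on the latest block hash, every node that holds the same chain and the same reputation table computes the same leader. A node with zero reputation owns an empty arc and cannot be selected.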
4. The method for securely sharing information data based on blockchain and federated learning according to claim 1, wherein, for a user in the abnormal situation, a plurality of blockchain nodes in the backup committee first submit their secret shares of the abnormal user and the correctness of the secret shares is verified; the polynomial and the private key of the abnormal user are recovered using the interpolation theorem; and the redundant random mask is then computed from the shared keys between the other users and the abnormal user, thereby recovering the global gradient.
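The key recovery in claim 4 relies on polynomial interpolation over secret shares. The sketch below shows plain Shamir-style sharing with Lagrange interpolation at x = 0. It omits the polynomial commitment and the share witnesses, so shares are not verified here. The field prime `PRIME`, the threshold `t`, and the function names are assumptions, and the example needs Python 3.8 or later for the modular inverse via `pow`.

```python
import random

PRIME = 2**127 - 1  # assumed field modulus (a Mersenne prime)


def split_secret(secret, n, t, rng):
    """Embed the secret as the constant term of a degree t-1 polynomial; return n shares."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def reconstruct(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# A hypothetical private key shared among 5 committee nodes with threshold 3.
private_key = 123456789
shares = split_secret(private_key, n=5, t=3, rng=random.Random(1))
print(reconstruct(shares[:3]) == private_key)  # True
```

Any t of the n shares recover the same key. With the key in hand, the committee can recompute the abnormal user's Diffie-Hellman shared keys and hence the masks that failed to cancel.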
5. The method for securely sharing information data based on blockchain and federated learning according to claim 4, wherein, in the correctness verification, whether the model gradients in the transaction pool have been tampered with is determined from the additive homomorphism of the polynomial commitments; if the model gradients have not been tampered with, the new block created by the leader is deemed valid; verification passes when the proportion of blockchain nodes in the verification committee that deem the new block valid reaches a preset value, and an empty block is generated if the proportion is below the preset value.
6. A system for securely sharing information data based on blockchain and federated learning, used for joint modeling of an intrusion detection model by multiple users in network security defense, characterized by comprising: user nodes that participate in local model training in the joint modeling, blockchain nodes that perform consensus operations on the local model training parameters of the user nodes, a trusted authority that distributes public/private key pairs to the user nodes and blockchain nodes, and a backup committee and a verification committee each formed by a plurality of blockchain nodes, wherein each user secret-shares its own private key with the backup committee so that the user's private key can be recovered in an abnormal situation, and the global gradient obtained by aggregation is verified for correctness by the verification committee;
the user trains on local data to obtain a local intrusion detection model gradient, encrypts the model gradient by adding a mask, compresses the encrypted model gradient, and sends the compressed, encrypted model gradient together with a digital signature to its adjacent blockchain node; in the mask encryption of the model gradient, the user computes a shared key with each other user from its own private key and the other user's public key using the Diffie-Hellman protocol, uses the shared key as the seed of a random number generator to generate a random mask, and encrypts its local model gradient with the random mask; using verifiable secret sharing, the user embeds its private key in a selected polynomial and constructs a polynomial commitment, the polynomial is split into n secret shares, and the polynomial, the secret share witnesses and the polynomial commitment are sent to the backup committee; in an abnormal situation, in which the user goes offline or its signature is invalid, the backup committee recovers the redundant random mask through key reconstruction, wherein a secret share witness is used to verify that a secret share belongs to the committed polynomial; the encrypted model gradient is compressed using the Chinese remainder theorem (CRT), the compression process comprising: first, the user's encrypted model gradient is evenly divided into r segments, wherein,
Figure FDA0004241599620000031
l is the length of the encrypted model gradient and k is a preset segment length; then, each model gradient segment is compressed into a single element corresponding to that segment by solving a system of k congruence equations, and the compression result of the whole model gradient is obtained from the elements of the segments;
the blockchain node performs signature verification on the uploaded model gradient data and places the valid model gradients that pass signature verification into a transaction pool; a leader elected from among the blockchain nodes aggregates the valid gradients in the transaction pool and adds the redundant masks of abnormal users recovered by the backup committee to obtain a global gradient; the leader creates a new block recording the global gradient and other key parameters and thereby sends the global gradient to the verification committee; the leader is elected by lot among the blockchain nodes using a reputation-based consistent hashing protocol, the election process comprising: setting up a hash ring, allocating to each blockchain node a hash ring space according to its reputation value, performing a hash calculation on the initial SHA-256 hash value of the current latest block, mapping the calculated hash value onto the hash ring, and determining the elected blockchain node leader according to the hash ring space in which the mapping result falls; with the set of all users denoted U and the set of abnormal users, namely those that have gone offline or whose signatures are invalid, denoted V, the process by which the leader aggregates the valid gradient data in the transaction pool is expressed as:
Figure FDA0004241599620000032
wherein CRT denotes the compression operation,
Figure FDA0004241599620000033
is the mask-encrypted model gradient of user i in the transaction pool, and Δw_i is the compressed model gradient of user i;
the verification committee verifies the correctness of the global gradient and broadcasts new blocks that pass verification to the whole network to reach consensus; the user obtains the latest global gradient by downloading the latest block and retrieving the global gradient from it, and updates the local intrusion detection model accordingly.
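The CRT compression recited in claims 1 and 6 packs each segment of k gradient values into one integer by solving k congruences. The following Python sketch shows one way this could look. The moduli in `MODULI`, the zero padding of the last segment, and the function names are assumptions. It also assumes every gradient value (and every aggregated sum) is a non-negative integer smaller than each modulus, and it needs Python 3.8 or later.

```python
from math import gcd, prod

MODULI = [251, 257, 263, 269]  # hypothetical pairwise-coprime moduli, so k = 4


def crt_compress(segment, moduli):
    """Solve x = a_i (mod m_i) for all i and return the unique x in [0, prod(moduli))."""
    big_m = prod(moduli)
    x = 0
    for a, m in zip(segment, moduli):
        partial = big_m // m
        x += a * partial * pow(partial, -1, m)  # modular inverse of partial mod m
    return x % big_m


def compress(gradient, moduli=MODULI):
    """Split the gradient into segments of k values (zero-padded) and pack each one."""
    k = len(moduli)
    padded = gradient + [0] * (-len(gradient) % k)
    return [crt_compress(padded[i:i + k], moduli) for i in range(0, len(padded), k)]


def decompress(elements, length, moduli=MODULI):
    """Recover the original values as residues of each packed element."""
    values = [x % m for x in elements for m in moduli]
    return values[:length]


# Sanity check on the assumed moduli.
assert all(gcd(a, b) == 1 for i, a in enumerate(MODULI) for b in MODULI[i + 1:])

g1 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]   # l = 10 values, packed into 3 elements
g2 = [1] * 10
c1, c2 = compress(g1), compress(g2)

# Packed elements can be added directly: the sum decodes to the element-wise sum.
big_m = prod(MODULI)
summed = [(a + b) % big_m for a, b in zip(c1, c2)]
print(len(c1), decompress(summed, len(g1)))
# 3 [4, 2, 5, 2, 6, 10, 3, 7, 6, 4]
```

The additive property in the last step is what allows a leader to aggregate compressed gradients without unpacking them first, provided the sums stay below every modulus.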
CN202210040143.XA 2022-01-14 2022-01-14 Information data safe sharing method and system based on block chain and federal learning Active CN114338045B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210040143.XA CN114338045B (en) 2022-01-14 2022-01-14 Information data safe sharing method and system based on block chain and federal learning

Publications (2)

Publication Number Publication Date
CN114338045A CN114338045A (en) 2022-04-12
CN114338045B true CN114338045B (en) 2023-06-23

Family

ID=81025878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210040143.XA Active CN114338045B (en) 2022-01-14 2022-01-14 Information data safe sharing method and system based on block chain and federal learning

Country Status (1)

Country Link
CN (1) CN114338045B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760023A (en) * 2022-04-19 2022-07-15 光大科技有限公司 Model training method and device based on federal learning and storage medium
CN115021905B (en) * 2022-05-24 2023-01-10 北京交通大学 Method for aggregating update parameters of local model for federated learning
CN114726551B (en) * 2022-06-06 2022-08-16 广州优刻谷科技有限公司 Meta-universe credit assessment method and device based on federal management
CN115049056A (en) * 2022-07-20 2022-09-13 天津科技大学 AI model training method based on block chain
CN115549901B (en) * 2022-09-29 2024-03-22 江苏大学 Batch aggregation method for federal learning in Internet of vehicles environment
CN116016610B (en) * 2023-03-21 2024-01-09 杭州海康威视数字技术股份有限公司 Block chain-based Internet of vehicles data secure sharing method, device and equipment
CN116489637B (en) * 2023-04-25 2023-11-03 北京交通大学 Mobile edge computing method oriented to meta universe and based on privacy protection
CN116402169B (en) * 2023-06-09 2023-08-15 山东浪潮科学研究院有限公司 Federal modeling verification method, federal modeling verification device, federal modeling verification equipment and storage medium
CN116828453B (en) * 2023-06-30 2024-04-16 华南理工大学 Unmanned aerial vehicle edge computing privacy protection method based on self-adaptive nonlinear function
CN116822661B (en) * 2023-08-30 2023-11-14 山东省计算中心(国家超级计算济南中心) Privacy protection verifiable federal learning method based on double-server architecture
CN116895375B (en) * 2023-09-08 2023-12-01 南通大学附属医院 Medical instrument management traceability method and system based on data sharing
CN117272389B (en) * 2023-11-14 2024-04-02 信联科技(南京)有限公司 Non-interactive verifiable joint safety modeling method
CN117521151B (en) * 2024-01-05 2024-04-09 齐鲁工业大学(山东省科学院) Block chain-based decentralization federation learning data sharing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704810A (en) * 2021-04-01 2021-11-26 华中科技大学 Federated learning oriented chain-crossing consensus method and system
CN113873534A (en) * 2021-10-15 2021-12-31 重庆邮电大学 Block chain assisted federal learning active content caching method in fog calculation

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11586743B2 (en) * 2018-03-22 2023-02-21 Via Science, Inc. Secure data processing
CN111552986B (en) * 2020-07-10 2020-11-13 鹏城实验室 Block chain-based federal modeling method, device, equipment and storage medium
CN112217626B (en) * 2020-08-24 2022-11-18 中国人民解放军战略支援部队信息工程大学 Network threat cooperative defense system and method based on intelligence sharing
CN112395640B (en) * 2020-11-16 2022-08-26 国网河北省电力有限公司信息通信分公司 Industry internet of things data light-weight credible sharing technology based on block chain
CN112434280B (en) * 2020-12-17 2024-02-13 浙江工业大学 Federal learning defense method based on blockchain
CN113095510B (en) * 2021-04-14 2024-03-01 深圳前海微众银行股份有限公司 Federal learning method and device based on block chain
CN113794675B (en) * 2021-07-14 2023-04-07 中国人民解放军战略支援部队信息工程大学 Distributed Internet of things intrusion detection method and system based on block chain and federal learning
CN113886817A (en) * 2021-10-19 2022-01-04 国网山东省电力公司济宁供电公司 Host intrusion detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114338045A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN114338045B (en) Information data safe sharing method and system based on block chain and federal learning
KR102627049B1 (en) Computer-implemented method for generating threshold vaults
KR102627039B1 (en) Threshold digital signature method and system
US10296248B2 (en) Turn-control rewritable blockchain
Bonawitz et al. Practical secure aggregation for privacy-preserving machine learning
CN115037477A (en) Block chain-based federated learning privacy protection method
CN112417489B (en) Digital signature generation method and device and server
Zhou et al. VFLF: A verifiable federated learning framework against malicious aggregators in Industrial Internet of Things
Liang et al. Auditable federated learning with byzantine robustness
CN110999207B (en) Computer-implemented method of generating a threshold library
Zhang et al. Robust and privacy-preserving federated learning with distributed additive encryption against poisoning attacks
Hussain et al. Blockchain Architecture, Components and Considerations
CN114417419A (en) Outsourcing cloud storage medical data aggregation method with security authorization and privacy protection
CN117155579A (en) Secret key share updating method, computer equipment and storage medium
CN112904067A (en) Real-time electricity stealing detection method based on user data privacy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant