CN114338045B

CN114338045B - Information data safe sharing method and system based on block chain and federal learning

Info

Publication number: CN114338045B
Application number: CN202210040143.XA
Authority: CN
Inventors: 郭渊博; 方晨; 王一丰; 马佳利; 李勇飞; 尹安琪
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2022-01-14
Filing date: 2022-01-14
Publication date: 2023-06-23
Anticipated expiration: 2042-01-14
Also published as: CN114338045A

Abstract

The invention belongs to the technical field of network security, and particularly relates to a secure information-data sharing method and system based on blockchain and federated learning. A user trains a local intrusion detection model on local data to obtain a model gradient, encrypts the gradient by adding a mask, compresses it, and sends it together with a digital signature to an adjacent blockchain node. The blockchain node verifies the signature on the gradient data and places legal model gradients into a transaction pool. A leader elected among the blockchain nodes aggregates the legal gradients in the pool and, for abnormal users, adds the redundant masks recovered by a backup committee, thereby obtaining a global gradient. The leader then creates a new block and sends it to a verification committee, which broadcasts the new block to the whole network once it passes verification. Each user downloads the latest block, extracts the global gradient, and updates its local intrusion detection model. Through training by multiple parties, the invention obtains a converged information-data sharing model that is deployed at each local end for network anomaly detection, improving intrusion detection performance.

Description

Information data safe sharing method and system based on block chain and federal learning

Technical Field

The invention belongs to the technical field of network security, and particularly relates to a verifiable secure information-data sharing method and system based on blockchain and federated learning.

Background

With the frequent occurrence of network security incidents, intelligence data has become an important basis for detecting cybercrime, intrusions, and other anomalies, and training artificial intelligence models on intelligence data to detect network anomalies has become an important means of building network security boundaries. The accuracy of such a model is closely related to the amount of training data. However, the intelligence data available to each institutional user is limited to its own collection channels, or must be purchased from third-party institutions at significant cost. Because intelligence data may contain sensitive user information, and because data is now regarded as a factor of production, intelligence data is an important asset for each department. Most departments are therefore unwilling to share their intelligence data with one another. This causes a serious data-island problem: each department finds it difficult to build an effective intrusion detection model because its own intelligence data is insufficient.

Federated learning, as a distributed machine learning framework, converts the sharing of raw data into the sharing of model parameters, while a blockchain can establish a trustless, decentralized data-transaction mode among distributed users. Combining the two reduces the risk of data privacy disclosure, addresses problems such as single-point-of-failure attacks and lack of trust, and makes the whole data sharing process verifiable, traceable, and auditable. In recent years, researchers have applied the combination of blockchain and federated learning to secure data sharing, but the following problems remain. (1) Verifiability of data sharing results. Communication base stations typically act as blockchain nodes responsible for collecting the model parameters uploaded by different users. If some malicious base stations tamper with the model parameters and put them into the transaction pool (that is, tamper with the data before it is put on chain), the blockchain will reach consensus on the wrong parameters, and joint modeling will ultimately produce an incorrect data sharing model. (2) Availability and privacy of shared data. To strengthen privacy protection during data sharing, existing work is typically based on differential privacy or secure multi-party computation, but the former reduces data availability and the latter increases computing overhead. How to balance data availability and privacy with less computational overhead still requires investigation. (3) Large communication overhead. Federated learning training inherently requires substantial communication, and the message broadcasting mechanism of the blockchain makes the combined overhead even larger, which limits applicability in bandwidth-constrained scenarios.
Therefore, ensuring the verifiability of data sharing results is among the problems that a blockchain- and federated-learning-based data sharing method must solve.

Disclosure of Invention

Therefore, the invention provides a secure and verifiable information-data sharing method and system based on blockchain and federated learning. A converged information-data sharing model is obtained through training by multiple users and deployed at each user's local end as an intrusion detection model for detecting network anomalies. This addresses the problems of confidentiality, result verifiability, and the high overhead of privacy protection schemes in existing data sharing processes, and provides effective technical support for the circulation and sharing of data among different institutions.

According to the design scheme provided by the invention, the verifiable secure information-data sharing method based on blockchain and federated learning is used by multiple users to jointly model an intrusion detection model for network security defense, and the joint modeling process comprises the following:

a trusted authority distributes a public/private key pair to each user and each blockchain node and delivers it through a secure channel; each user secret-shares its own private key with a backup committee consisting of several blockchain nodes;

each user trains on its local data to obtain a local intrusion detection model gradient, encrypts the gradient by adding a mask, compresses the encrypted gradient, and sends the compressed, encrypted gradient together with a digital signature to its associated adjacent blockchain node;

the blockchain node verifies the signature on the uploaded model gradient data and places the legal model gradients that pass verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legal gradients in the pool and adds the redundant masks, generated by abnormal users and recovered by the backup committee, to obtain the global gradient; the leader then creates a new block recording the global gradient and other key parameters and sends it to a verification committee;

the verification committee verifies the correctness of the global gradient and broadcasts the new block that passes verification to the whole network to reach consensus; each user downloads the latest block, obtains the global gradient from it, and updates its local intrusion detection model.

Further, in the method, the global gradient in joint modeling is obtained iteratively, round by round, until a model convergence condition is met, so that each user's local intrusion detection model is updated synchronously in each iteration; the convergence condition is a maximum number of iteration rounds.

Further, a reputation value is set for each user and each blockchain node to motivate them to participate in joint modeling of the intrusion detection model; blockchain nodes are elected according to reputation value to serve as the leader, the backup committee, and the verification committee; and a blacklist is used to restrict the joint-modeling participation of users and blockchain nodes whose reputation value is below a threshold.

Further, in the mask encryption of the model gradient, each user uses the Diffie-Hellman protocol, with its own private key and the public keys of the other users, to compute a shared key with each of the other users; the shared key serves as the seed of a random number generator that produces a random mask, and the user's local model gradient is encrypted with that mask. Each user also embeds its private key in a selected polynomial, constructs a polynomial commitment using verifiable secret sharing, splits the polynomial into n secret shares, and sends the secret shares, the witnesses used to verify which polynomial each share belongs to, and the polynomial commitment to the backup committee, so that the redundant random masks can be recovered through key reconstruction if the user disconnects or its signature is illegal.
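The masking step can be illustrated with a minimal sketch. It assumes a toy Diffie-Hellman group modulo the Mersenne prime $2^{127}-1$, gradients already quantized to integers in $[0, R)$, and a hypothetical SHA-256 counter-mode PRG. None of these parameters come from the patent, and they are far too weak for real use. The sign convention is also an assumption: user i adds the PRG output for partners j > i and subtracts it for j < i, which is the usual pairwise-masking arrangement and makes the masks cancel when all gradients are summed.

```python
import hashlib

# Toy parameters (ASSUMPTION): illustration only, far too weak for real use.
P = 2**127 - 1   # prime modulus of the toy Diffie-Hellman group
G = 3            # base element
R = 2**32        # masks and quantized gradients live in [0, R)


def shared_key(my_sk: int, their_pk: int) -> int:
    """Diffie-Hellman: (g^b)^a mod P == (g^a)^b mod P, so both sides agree."""
    return pow(their_pk, my_sk, P)


def prg(seed: int, length: int) -> list[int]:
    """Hypothetical PRG: SHA-256 in counter mode, each output reduced into [0, R)."""
    key = seed.to_bytes(16, "big")
    return [int.from_bytes(hashlib.sha256(key + c.to_bytes(8, "big")).digest(), "big") % R
            for c in range(length)]


def mask_gradient(uid: int, grad: list[int], sk: int, pks: dict[int, int]) -> list[int]:
    """Pairwise masking: +PRG(s_ij) toward j > i, -PRG(s_ij) toward j < i (mod R)."""
    masked = list(grad)
    for j, pk_j in pks.items():
        if j == uid:
            continue
        mask = prg(shared_key(sk, pk_j), len(grad))
        sign = 1 if j > uid else -1
        masked = [(x + sign * m) % R for x, m in zip(masked, mask)]
    return masked


sks = {1: 1234567, 2: 7654321, 3: 2468013}          # hypothetical private keys
pks = {i: pow(G, sk, P) for i, sk in sks.items()}   # matching public keys g^sk
grads = {1: [5, 10, 15], 2: [1, 2, 3], 3: [100, 200, 300]}
masked = {i: mask_gradient(i, g, sks[i], pks) for i, g in grads.items()}
total = [sum(col) % R for col in zip(*masked.values())]
print(total)  # [106, 212, 318]
```

Each masked vector looks random on its own, but in the sum every pair of masks appears once with each sign, so only the plain gradient sum survives.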

Further, the Chinese Remainder Theorem (CRT) is used to compress the encrypted model gradient, as follows. First, the user's encrypted model gradient is evenly divided into r segments, where $r = \lceil l/k \rceil$, l is the length of the encrypted model gradient, and k is a preset segment length. Then, for each segment, a system of k congruences is solved to compress the segment into a single corresponding element, and the compressed result of the whole model gradient is formed from the elements of all segments.
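A minimal sketch of the segment-wise CRT compression follows. It assumes small pairwise-coprime moduli chosen for illustration, zero padding of the last segment, and the hypothetical helper names `crt`, `compress`, and `decompress`. It also shows why compression is compatible with aggregation: adding compressed values modulo M adds the underlying residues, provided every per-slot sum stays below its modulus.

```python
from math import ceil, prod

# ASSUMPTION: illustrative pairwise-coprime moduli (here all prime); K plays the role of k.
MODULI = [1009, 1013, 1019]
K = len(MODULI)
M = prod(MODULI)


def crt(residues: list[int]) -> int:
    """Unique x in [0, M) with x = residues[j] (mod MODULI[j]) for every j."""
    x = 0
    for a, m in zip(residues, MODULI):
        Mj = M // m
        x += a * Mj * pow(Mj, -1, m)   # pow(..., -1, m) is the modular inverse
    return x % M


def compress(vec: list[int]) -> list[int]:
    """Split a length-l vector into r = ceil(l/K) segments and pack each into one integer."""
    r = ceil(len(vec) / K)
    padded = list(vec) + [0] * (r * K - len(vec))   # zero-pad the last segment (assumption)
    return [crt(padded[s * K:(s + 1) * K]) for s in range(r)]


def decompress(packed: list[int], length: int) -> list[int]:
    """Recover each segment by reducing modulo every modulus, then drop the padding."""
    return [x % m for x in packed for m in MODULI][:length]


u1 = [5, 17, 250, 3, 999]
u2 = [1, 2, 3, 4, 5]
c1, c2 = compress(u1), compress(u2)
summed = [(a + b) % M for a, b in zip(c1, c2)]   # the leader adds compressed values
print(len(c1), decompress(summed, 5))  # 2 [6, 19, 253, 7, 1004]
```

For this to hold in a deployment, each modulus would have to exceed the largest possible per-slot sum of n masked values (roughly n·R). The small moduli here only demonstrate the mechanics.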

Further, the leader is elected by lot among the blockchain nodes using a reputation-based consistent hashing protocol, as follows: a hash ring is set up and each blockchain node is allocated a portion of the ring space according to its reputation value; the initial SHA-256 hash value of the current latest block is hashed, the result is mapped onto the ring, and the node whose ring space contains the mapped point is elected as leader.
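The reputation-weighted lottery can be sketched as follows. Several details are assumptions not taken from the patent: arcs are laid out in sorted node-ID order, each arc's length is proportional to the node's reputation, and the mapped point is the SHA-256 digest of the latest block hash. The function name `elect_leader` is hypothetical.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate

RING = 2**256  # the hash ring covers the SHA-256 output space


def elect_leader(reputations: dict[str, int], latest_block_hash: bytes) -> str:
    """Reputation-weighted consistent-hash lottery (illustrative sketch).

    Requires a positive total reputation.
    """
    nodes = sorted(reputations)                   # deterministic ring layout
    total = sum(reputations.values())
    # Upper end of each node's arc; arc length is proportional to reputation.
    bounds = [c * RING // total for c in accumulate(reputations[n] for n in nodes)]
    point = int.from_bytes(hashlib.sha256(latest_block_hash).digest(), "big")
    return nodes[bisect_right(bounds, point)]     # first arc whose upper end exceeds point


reps = {"node-A": 80, "node-B": 15, "node-C": 5, "node-D": 0}
wins = {n: 0 for n in reps}
for h in range(2000):
    wins[elect_leader(reps, h.to_bytes(4, "big"))] += 1
print(wins)
```

Because the ring is partitioned deterministically, every honest node that sees the same latest block computes the same leader without extra communication. A node with zero reputation owns an empty arc and is never selected.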

Further, let U be the set of all users and V the set of abnormal users that disconnected or whose signatures are illegal. The leader's aggregation of all legal gradient data in the transaction pool is then expressed as follows:

where CRT denotes the compression operation applied to the mask-encrypted model gradient of user i in the transaction pool, and $\Delta w_i'$ is the compressed model gradient of user i.

Further, for each user in the abnormal user set, several blockchain nodes in the backup committee first submit that user's secret shares, whose correctness is then verified. The abnormal user's polynomial and private key are recovered by interpolation, and the redundant random masks are computed from the keys shared between the other users and the abnormal user, so that the global gradient can be recovered.
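The mask-recovery step can be sketched under toy assumptions: Diffie-Hellman modulo $2^{127}-1$, a hypothetical SHA-256 counter-mode PRG, integer gradients in $[0, R)$, and user i adding the PRG output for partners j > i and subtracting it for j < i. Here the dropped user's private key is simply passed in. In the patent, the backup committee would first reconstruct it from verified secret shares. All helper names are hypothetical.

```python
import hashlib

P, G, R = 2**127 - 1, 3, 2**32   # toy parameters (ASSUMPTION, insecure)


def prg(seed: int, n: int) -> list[int]:
    """Hypothetical PRG: SHA-256 in counter mode, each output reduced into [0, R)."""
    key = seed.to_bytes(16, "big")
    return [int.from_bytes(hashlib.sha256(key + c.to_bytes(8, "big")).digest(), "big") % R
            for c in range(n)]


def pair_mask(i: int, j: int, sk_i: int, pk_j: int, n: int) -> list[int]:
    """Signed mask user i applies for partner j: +PRG(s_ij) if j > i, else -PRG(s_ij)."""
    sign = 1 if j > i else -1
    return [sign * m for m in prg(pow(pk_j, sk_i, P), n)]


def mask(i: int, grad: list[int], sk_i: int, pks: dict[int, int]) -> list[int]:
    """Apply user i's masks toward every other registered user."""
    out = list(grad)
    for j, pk_j in pks.items():
        if j != i:
            out = [(x + m) % R for x, m in zip(out, pair_mask(i, j, sk_i, pk_j, len(grad)))]
    return out


def remove_redundant_masks(masked_sum, online, recovered_sks, pks):
    """For each dropped user j (with recovered key) and online user i, pair_mask(j, i)
    is the exact negation of pair_mask(i, j), so adding it cancels i's leftover mask."""
    out = list(masked_sum)
    for j, sk_j in recovered_sks.items():
        for i in online:
            m = pair_mask(j, i, sk_j, pks[i], len(out))
            out = [(x + y) % R for x, y in zip(out, m)]
    return out


sks = {1: 111111, 2: 222222, 3: 333333}           # hypothetical private keys
pks = {i: pow(G, sk, P) for i, sk in sks.items()}
grads = {1: [1, 2], 2: [3, 4], 3: [5, 6]}
online = [1, 2]                                   # user 3 drops before uploading
masked_sum = [sum(c) % R for c in zip(*(mask(i, grads[i], sks[i], pks) for i in online))]
print(remove_redundant_masks(masked_sum, online, {3: sks[3]}, pks))  # [4, 6]
```

The masks between the two online users cancel on their own. Only the masks they share with the dropped user need the recovered key.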

Further, in correctness verification, the additive homomorphism of polynomial commitments is used to confirm whether the model gradients in the transaction pool have been tampered with. If they have not, the new block created by the leader is considered legal. Verification passes when the proportion of verifiers in the verification committee that deem the new block legal reaches a preset value; if the proportion is below the preset value, an invalid empty block is generated.

Furthermore, the invention also provides a verifiable secure information-data sharing system based on blockchain and federated learning, used by multiple users to jointly model an intrusion detection model for network security defense. The system comprises: user nodes that perform local model training in joint modeling; blockchain nodes that run consensus operations on the users' local training parameters; a trusted authority that distributes public/private key pairs to the user nodes and blockchain nodes; and a backup committee and a verification committee, each formed by several blockchain nodes. Each user secret-shares its private key with the backup committee so that the user's key information can be recovered in abnormal situations, and the new block containing the aggregated global gradient is verified for correctness by the verification committee.

The beneficial effects of the invention are as follows:

The invention trains the artificial intelligence model through joint modeling for use in building an intrusion detection system. This reduces the risk of data privacy disclosure, addresses single-point-of-failure attacks and lack of trust, makes the whole intelligence-data sharing process verifiable, traceable, and auditable, and is applicable to data sharing among multiple departments or institutions. Masking encrypts gradients quickly and resists recent privacy attacks such as model inversion and model extraction. Because the masks cancel to 0 during gradient aggregation, the accuracy of the federated learning model is unaffected. Integrating polynomial-commitment-based gradient verification into the consensus process of joint modeling resists tampering by malicious blockchain nodes. The invention thus addresses confidentiality, result verifiability, and the high overhead of privacy protection schemes in data sharing, and provides an effective technical means for data circulation and sharing among departments or institutions. It effectively improves local intrusion detection performance and network security defense, and has good application prospects.

Description of the drawings:

FIG. 1 is a flowchart of the verifiable secure information-data sharing method based on blockchain and federated learning in an embodiment;

FIG. 2 is a schematic diagram of the verifiable secure information-data sharing architecture in an embodiment;

FIG. 3 is a schematic diagram of one round of iterative training in verifiable secure information-data sharing in an embodiment;

FIG. 4 illustrates the gradient masking and compression process in an embodiment;

FIG. 5 illustrates the consistent hashing protocol in an embodiment;

FIG. 6 is a schematic diagram of a new block created by the leader in an embodiment;

FIG. 7 shows the resistance to Sybil attacks at various backup committee sizes in an embodiment.

Detailed description of the embodiments:

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and technical solutions.

To address the data-island problem in Industry 4.0, one approach combines the decentralization of blockchain with federated learning to build a cognitive computing platform in which users' model parameters are stored directly on the blockchain. However, once an attacker or a malicious data sharing participant obtains these parameters, it can infer information about the user's raw data through a model inversion attack. Another approach encrypts users' model parameters with the Paillier algorithm before uploading them to the blockchain and has a subset of users cooperate to decrypt after each model update, which consumes substantial computation and communication overhead. For secure data sharing in industrial Internet scenarios, local differential privacy can add noise to the raw data before features are extracted and shared; this prevents privacy-stealing attacks but sacrifices some data utility. Enhancing data security with homomorphic encryption or differential privacy therefore has shortcomings, and how to balance data availability and privacy at lower computation and communication cost still needs study. In addition, these approaches do not consider the verifiability of data sharing results. Base stations generally act as blockchain nodes that collect model parameters uploaded by users in different areas. If some malicious base stations tamper with the model parameters and put them into the transaction pool (that is, tamper with the data before it is put on chain), the blockchain reaches consensus on the wrong parameters, joint modeling ultimately yields an incorrect data sharing model, and the practical application of data sharing suffers.
Therefore, an embodiment of the invention provides a verifiable secure information-data sharing method based on blockchain and federated learning, used by multiple users to jointly model an intrusion detection model for network security defense. As shown in FIG. 1, the joint modeling process includes the following:

S101: a trusted authority allocates a public/private key pair to each user and each blockchain node and delivers it through a secure channel; each user secret-shares its own private key with a backup committee consisting of several blockchain nodes;

S102: each user trains on its local data to obtain a local intrusion detection model gradient, encrypts the gradient by adding a mask, compresses the encrypted gradient, and sends the compressed, encrypted gradient together with a digital signature to its associated adjacent blockchain node;

S103: the blockchain node verifies the signatures on the uploaded model gradient data and puts the legal model gradients that pass verification into a transaction pool; a leader elected from the blockchain nodes aggregates the legal gradients in the pool and adds the redundant masks generated by abnormal users and recovered by the backup committee to obtain the global gradient; the leader creates a new block recording the global gradient and other key parameters and sends it to a verification committee consisting of several blockchain nodes;

S104: the verification committee verifies the correctness of the global gradient and broadcasts the new block that passes verification to the whole network to reach consensus; each user downloads the latest block, obtains the global gradient from it, and updates its local intrusion detection model.

In this embodiment, each user participating in intelligence-data sharing converts the data sharing problem into a model gradient sharing problem through federated training at its local end, and adds a mask to the gradient for fast encryption. To reduce communication overhead, the user may compress the encrypted gradient before uploading it to the associated blockchain node. The blockchain then aggregates all valid gradients into a global gradient, verifies its correctness, and generates a legal block on which the whole network reaches consensus. Finally, each user downloads the new block from the blockchain, extracts the global gradient, and updates its local model. A user may be a department, local enterprise, organization, or similar entity participating in intelligence-data sharing. It has limited intelligence data and computing capacity, and wishes to model its own intelligence data jointly with other users while keeping the data local, so as to obtain a more accurate anomaly detection model for its network protection system. In this embodiment, users participating in data sharing may be assumed to be semi-honest: they execute the protocol honestly but may use their own information to infer other users' intelligence data. Blockchain nodes, such as communication base stations and servers, are generally equipped with certain computing and communication resources and are responsible for verification, aggregation, consensus, and similar operations on parameters. Some blockchain nodes may be assumed to be captured by an attacker: such nodes may tamper with data uploaded by users before placing it in the transaction pool, and may provide false secret shares during secret reconstruction.
Transactions record the data exchanged between blockchain nodes; in this embodiment, a transaction records model gradients and related training information. Applying the combination of blockchain and federated learning to intelligence-data sharing addresses confidentiality, result verifiability, and the high overhead of privacy protection schemes in the data sharing process, and provides an effective technical means for circulating and sharing intelligence data among different departments.

In the method of this embodiment, the global gradient in joint modeling is further obtained iteratively, round by round, until the model convergence condition is met, so that each user's local intrusion detection model is updated synchronously in each iteration; the convergence condition is a maximum number of iteration rounds. Further, a reputation value is set for each user and each blockchain node to motivate them to participate in joint modeling of the intrusion detection model; blockchain nodes are elected by reputation value to serve as the leader, the backup committee, and the verification committee; and a blacklist is used to restrict the joint-modeling participation of users and blockchain nodes whose reputation value is below a threshold.

In this embodiment, each user and blockchain node involved in data sharing may be assigned an initial reputation value. A user's reputation value increases if it stays online for the whole data sharing process and the signature on its uploaded gradient is verified as legal; otherwise it decreases. A blockchain node's reputation value increases if it provides a correct secret share, generates a legal new block, or participates in new-block verification, and decreases if it provides a false secret share. A participant whose reputation value drops to 0 is blacklisted and may no longer take part in intelligence-data sharing. It may be assumed that at any time honest participants hold at least 70% of the total reputation in the system, which ensures that the blockchain consensus protocol runs correctly. Referring to FIG. 2, the architecture is assumed to consist of a blockchain and m distributed users. The blockchain is maintained by multiple nodes equipped with certain computing and communication resources; in practice, these may be base stations equipped with servers. Users may be departments, institutions, enterprises, and so on with limited computing and communication capabilities. Each user holds a local intelligence data set $D_i$ ($1 \le i \le m$), maps its raw intelligence data into a model gradient through machine learning training, uploads the gradient to its associated blockchain node over a wired or wireless network, and completes federated learning under the coordination of the blockchain, thereby achieving intelligence-data sharing.
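A minimal sketch of the reputation bookkeeping follows. The initial value, the step size, and the class and method names are assumptions for illustration. The patent specifies only the direction of each update, blacklisting at 0, and the assumption that honest parties hold at least 70% of the reputation.

```python
from dataclasses import dataclass, field

INITIAL_REPUTATION = 3   # ASSUMPTION: the patent does not fix these values
STEP = 1
HONEST_SHARE = 0.7       # honest parties hold at least 70% of total reputation


@dataclass
class ReputationLedger:
    scores: dict[str, int] = field(default_factory=dict)
    blacklist: set[str] = field(default_factory=set)

    def register(self, pid: str) -> None:
        self.scores.setdefault(pid, INITIAL_REPUTATION)

    def record(self, pid: str, behaved_well: bool) -> None:
        """+STEP for legal signatures, correct shares, or valid blocks; -STEP otherwise."""
        if pid in self.blacklist:
            return
        self.scores[pid] = max(0, self.scores[pid] + (STEP if behaved_well else -STEP))
        if self.scores[pid] == 0:
            self.blacklist.add(pid)      # reputation exhausted: excluded from sharing

    def may_participate(self, pid: str) -> bool:
        return pid not in self.blacklist

    def honest_majority(self, honest_ids: set[str]) -> bool:
        total = sum(self.scores.values())
        honest = sum(self.scores[p] for p in honest_ids)
        return total > 0 and honest / total >= HONEST_SHARE


ledger = ReputationLedger()
for pid in ["user-1", "user-2", "node-1"]:
    ledger.register(pid)
ledger.record("user-1", True)            # online all round, signature valid
for _ in range(3):
    ledger.record("node-1", False)       # supplied false secret shares
print(ledger.scores, ledger.may_participate("node-1"))
# {'user-1': 4, 'user-2': 3, 'node-1': 0} False
```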

In the method of this embodiment, further, in the mask encryption of the model gradient, each user uses the Diffie-Hellman protocol, with its own private key and the other users' public keys, to compute a shared key with each other user. The shared key seeds a random number generator that produces a random mask, and the user's local model gradient is encrypted with that mask. The user embeds its private key in a selected polynomial, constructs a polynomial commitment using verifiable secret sharing, splits the polynomial into n secret shares, and sends the secret shares, the secret-share witnesses, and the polynomial commitment to the backup committee. This allows the redundant random mask to be recovered later through key reconstruction if the user disconnects or its signature is illegal; a secret-share witness is used to verify the committed polynomial to which a secret share belongs. Further, the encrypted model gradient is compressed using the Chinese Remainder Theorem (CRT) as follows: first, the user's encrypted model gradient is evenly divided into r segments, where $r = \lceil l/k \rceil$, l being the length of the encrypted model gradient and k the preset segment length.

Cryptographic commitments are an important class of cryptographic primitives that generally involve a committer and a verifier. In the commit phase, the committer chooses a message m, computes a commitment c in ciphertext form, and sends c to the receiver; from this point the committer can no longer change m. In the reveal phase, the committer publishes the plaintext message m and the secret key. The verifier computes the commitment c' corresponding to m in the same way and accepts if c' = c; otherwise it rejects. A commitment protocol has the following properties: (1) hiding: the commitment value c reveals no information about message m; (2) binding: the committer cannot open commitment c to any message other than m and still pass verification. Because of these properties, a commitment protocol can guarantee that private data in ciphertext form has a unique interpretation.
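The commit/reveal flow described above can be illustrated with a hash-based commitment. This common construction is hiding and binding under standard assumptions about SHA-256. It is a swapped-in technique for illustration, not the polynomial commitment used later in the document. The random 32-byte opening key plays the role of the secret key published in the reveal phase.

```python
import hashlib
import hmac
import secrets


def commit(message: bytes) -> tuple[bytes, bytes]:
    """Commit phase: c = SHA-256(key || m) with a fresh random 32-byte key."""
    key = secrets.token_bytes(32)
    return hashlib.sha256(key + message).digest(), key


def verify(c: bytes, message: bytes, key: bytes) -> bool:
    """Reveal phase: recompute c' from (m, key) and accept iff c' == c."""
    return hmac.compare_digest(c, hashlib.sha256(key + message).digest())


c, key = commit(b"model gradient")        # send c now, keep (m, key) private
print(verify(c, b"model gradient", key))  # True
print(verify(c, b"tampered", key))        # False
```

The fixed-length key prefix keeps the concatenation unambiguous. The random key is what hides m, since without it a verifier could simply hash guessed messages.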

A polynomial commitment is a commitment protocol with an additive homomorphic property; it is often used to construct zero-knowledge proofs, verifiable secret sharing, and similar schemes. The construction of verifiable secret sharing (VSS) can be described as follows:

(1) Initialization Setup$(1^\kappa, t)$: let $\mathbb{G}$ and $\mathbb{G}_T$ be groups of prime order p, let g be a generator of $\mathbb{G}$, and let $e: \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ be a symmetric bilinear pairing for which the t-strong Diffie-Hellman (t-SDH) assumption holds. Select $\alpha \in \mathbb{Z}_p^*$ as the private key SK; the public key is $PK = (g, g^{\alpha}, \ldots, g^{\alpha^t})$.

(2) Commitment Commit$(PK, \phi(x))$: for a polynomial of degree t, $\phi(x) = \sum_{j=0}^{t} \phi_j x^j$, its commitment is computed as

$$\mathrm{COMM}(\phi(x)) = \prod_{j=0}^{t} \left(g^{\alpha^j}\right)^{\phi_j} = g^{\phi(\alpha)} \qquad (1)$$

(3) Commitment reveal VerifyPoly$(PK, \mathrm{COMM}(\phi(x)), \phi(x))$: given a polynomial $\phi(x)$ and a commitment value COMM, if $\mathrm{COMM} = \prod_{j=0}^{t} (g^{\alpha^j})^{\phi_j}$, this proves that the commitment was indeed generated from $\phi(x)$ and not from any other polynomial.

(4) Secret distribution CreateWitness$(PK, \phi(x), i)$: to perform (n, t) secret sharing among n users, the secret share sent to user i ($1 \le i \le n$) is $\langle i, \phi(i), w_i \rangle$. It contains the value $\phi(i)$ of $\phi(x)$ at index i and the witness $w_i = \mathrm{COMM}(\psi_i(x))$, where $\psi_i(x) = \frac{\phi(x) - \phi(i)}{x - i}$. $\mathrm{COMM}(\psi_i(x))$ is computed in the same way as formula (1).

(5) Secret verification VerifyEval$(PK, \mathrm{COMM}(\phi(x)), i, \phi(i), w_i)$: if

$$e(\mathrm{COMM}(\phi(x)), g) = e\left(w_i, g^{\alpha}/g^{i}\right) \cdot e(g, g)^{\phi(i)} \qquad (2)$$

holds, user i's secret share $\langle i, \phi(i), w_i \rangle$ comes from the committed polynomial; otherwise it does not.

(6) Secret reconstruction Recover$(i, \phi(i))$: any t + 1 or more users present their secret shares $\langle i, \phi(i) \rangle$; after the shares are verified, the original polynomial $\phi(x)$ is recovered by interpolation.
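The share-distribution and Recover steps can be sketched with plain Shamir secret sharing over a prime field. This omits the commitment and witness checks of VerifyEval, uses the prime $2^{127}-1$ as an assumed field modulus, and uses hypothetical function names. The secret placed in the constant term stands for the user's private key.

```python
import secrets

Q = 2**127 - 1  # assumed prime field modulus


def split(secret: int, t: int, n: int) -> list[tuple[int, int]]:
    """Embed the secret as phi(0) of a random degree-t polynomial; share i is (i, phi(i))."""
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t)]

    def phi(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):       # Horner evaluation mod Q
            acc = (acc * x + c) % Q
        return acc

    return [(i, phi(i)) for i in range(1, n + 1)]


def recover(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 from any t + 1 distinct shares."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        secret = (secret + yi * num * pow(den, -1, Q)) % Q
    return secret


shares = split(secret=123456789, t=2, n=5)       # private key split for 5 committee members
print(recover(shares[:3]), recover(shares[2:]))  # 123456789 123456789
```

Any three of the five shares determine the degree-2 polynomial, so any subset of size t + 1 yields the same constant term.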

In addition, polynomial commitments are additively homomorphic:

$$\mathrm{COMM}(\phi_1(x) + \phi_2(x)) = \mathrm{COMM}(\phi_1(x)) \cdot \mathrm{COMM}(\phi_2(x)) \qquad (3)$$
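Assuming the KZG-style form of the commitment, $\mathrm{COMM}(\phi(x)) = \prod_{j=0}^{t}(g^{\alpha^j})^{\phi_j} = g^{\phi(\alpha)}$, formula (3) can be checked in one line. By induction it extends to any number of polynomials, which is the property that lets a verifier compare the product of individual commitments with the commitment to an aggregate:

$$\mathrm{COMM}(\phi_1(x)+\phi_2(x)) = g^{\phi_1(\alpha)+\phi_2(\alpha)} = g^{\phi_1(\alpha)} \cdot g^{\phi_2(\alpha)} = \mathrm{COMM}(\phi_1(x)) \cdot \mathrm{COMM}(\phi_2(x))$$

$$\mathrm{COMM}\Big(\sum_{i=1}^{m}\phi_i(x)\Big) = \prod_{i=1}^{m} \mathrm{COMM}(\phi_i(x))$$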

The Chinese Remainder Theorem (CRT) is a method for solving systems of linear congruences. Let $m_1, m_2, \ldots, m_k$ be pairwise coprime positive integers and let $M = m_1 m_2 \cdots m_k$. Then the system of congruences

$$x \equiv a_1 \pmod{m_1},\quad x \equiv a_2 \pmod{m_2},\quad \ldots,\quad x \equiv a_k \pmod{m_k}$$

has exactly one solution in $\mathbb{Z}_M$, namely

$$x = \sum_{i=1}^{k} a_i M_i M_i^{-1} \bmod M$$

where $M_i = M/m_i$ and $M_i^{-1}$ is the inverse of $M_i$ modulo $m_i$.
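As a worked instance of the CRT formula, take the illustrative moduli $m_1 = 3$, $m_2 = 5$, $m_3 = 7$ (so $M = 105$) and residues $(a_1, a_2, a_3) = (2, 3, 2)$. These numbers are chosen for illustration only and are not parameters from the patent:

$$M_1 = 35,\; M_1^{-1} \equiv 2 \pmod 3;\qquad M_2 = 21,\; M_2^{-1} \equiv 1 \pmod 5;\qquad M_3 = 15,\; M_3^{-1} \equiv 1 \pmod 7$$

$$x = 2 \cdot 35 \cdot 2 + 3 \cdot 21 \cdot 1 + 2 \cdot 15 \cdot 1 = 233 \equiv 23 \pmod{105}$$

Indeed $23 \equiv 2 \pmod 3$, $23 \equiv 3 \pmod 5$, and $23 \equiv 2 \pmod 7$, so the three residues are packed into the single integer 23.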

Assume that all users have registered in the system and have been assigned their own public/private keys and an ordered ID. Referring to FIG. 3, the steps of one training round may then be designed as follows. Before training begins, each user uses verifiable secret sharing (VSS) to share its private key with a backup committee consisting of several blockchain nodes, so that a user disconnecting during subsequent training does not disrupt the normal training process (step 0). In formal training, each user iterates locally to obtain a model gradient (step 1) and adds a mask to prevent privacy leakage (step 2). To save communication overhead, each user compresses its encrypted gradient with the Chinese Remainder Theorem (CRT) and sends it, together with the commitment value of its original gradient, to its neighboring blockchain node (steps 3 and 4). After verifying the data signatures, the nodes put legal gradients into the transaction pool, stop accepting data after a specified time, and elect a leader to perform the subsequent gradient aggregation (step 5). If the gradients of all users are present in the transaction pool, the leader adds them directly to obtain the global gradient. If some users' gradients are missing (that is, those users dropped out or their signatures were verified as illegal), the leader computes the global gradient with the help of the secret shares provided by the backup committee (step 6). The leader then creates a new block packing the relevant gradient information and sends it to the verification committee for verification and broadcasting (step 7). Finally, each user downloads the latest global gradient from the blockchain and updates its local model (step 8). If any user disconnected or failed signature verification in the current round, new private keys are allocated to users in the next round and steps 0 to 8 are executed; otherwise, steps 1 to 8 are executed.
These iterations are repeated until the model converges or the maximum number of training rounds is reached. Note that in each round the reputation values of users, leaders, and backup committee members identified as legitimate increase, to motivate them to contribute more to the data sharing system.

In the initialization stage before training (step 0), the trusted authority generates public and private key pairs for all users and blockchain nodes. Other public information is stored in the genesis block (i.e., the first block of the blockchain), which the trusted authority sends to all participants through a secure channel to carry out initialization. The genesis block mainly contains the following:

a) Initial model parameters $w_0$, learning rate $\eta$, and total number of training rounds T

b) Public key PK for generating polynomial commitments

c) k pairwise coprime positive integers $m_1, m_2, \ldots, m_k$

d) A pseudo-random generator PRG(·): when its input seed is uniformly random, it outputs pseudo-random numbers distributed over the space $[0, R)^l$

e) Initial random seed $seed_0$; the seed of the i-th training round, $seed_i$, is generated from that of the previous round, $seed_{i-1}$, and is mainly used to ensure the randomness of leader election

f) Initial reputation values for all users and blockchain nodes and reputation update functions

In addition, considering the possible occurrence of partial user disconnection in the training process, all users adopt VSS to split their private keys into a plurality of secret shares and send the secret shares to a backup committee before formal training starts.

Local training phase (steps 1-4): in each round of training, each user i (1 ≤ i ≤ m) obtains a model gradient $\Delta w_i$ from its local intelligence data and then encrypts $\Delta w_i$ by adding a mask, obtaining the masked gradient $\widetilde{\Delta w}_i$, to strengthen privacy protection. To reduce communication overhead, in this embodiment the encrypted gradient $\widetilde{\Delta w}_i$ can be compressed using the Chinese Remainder Theorem (CRT).

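Assume that the masked gradient $\widetilde{\Delta w}_i$ of user i (1 ≤ i ≤ m) has length l. User i first divides $\widetilde{\Delta w}_i$ evenly into $r = \lceil l/k \rceil$ fragments, i.e., $\widetilde{\Delta w}_i = (\widetilde{\Delta w}_{i,1}, \widetilde{\Delta w}_{i,2}, \ldots, \widetilde{\Delta w}_{i,r})$, where $\lceil \cdot \rceil$ denotes rounding up. If l is not divisible by k, the last fragment is padded with 0. Let the j-th fragment be $\widetilde{\Delta w}_{i,j} = (\delta_{i,j,1}, \delta_{i,j,2}, \ldots, \delta_{i,j,k})$, where each element is smaller than the corresponding modulus $m_t$. User i solves the following system of congruences:

$$x \equiv \delta_{i,j,1} \pmod{m_1}, \quad x \equiv \delta_{i,j,2} \pmod{m_2}, \quad \ldots, \quad x \equiv \delta_{i,j,k} \pmod{m_k}$$

By the Chinese Remainder Theorem, this system has a unique solution in $\mathbb{Z}_M$, where $M = \prod_{t=1}^{k} m_t$ and $M_t = M / m_t$:

$$\Delta w'_{i,j} = \Big(\sum_{t=1}^{k} \delta_{i,j,t} \, M_t \, M_t^{-1}\Big) \bmod M$$

Thus each gradient fragment of length k is compressed by the CRT into a single element $\Delta w'_{i,j}$, and the whole gradient vector $\widetilde{\Delta w}_i$ is compressed into $\Delta w'_i = (\Delta w'_{i,1}, \Delta w'_{i,2}, \ldots, \Delta w'_{i,r})$, whose length is 1/k of the original. The entire gradient masking and compression process may be as shown in FIG. 4. User i (1 ≤ i ≤ m) computes the commitment $\mathrm{COMM}(\Delta w_i)$ of its original gradient $\Delta w_i$ and sends $\langle \Delta w'_i, \mathrm{COMM}(\Delta w_i) \rangle$, together with its digital signature, to the associated blockchain node.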
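The fragment-and-solve procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the moduli `[1009, 1013, 1019]`, the sample vector and the helper names (`crt_setup`, `crt_compress`, `crt_decompress`) are hypothetical. It assumes every masked element is smaller than its modulus $m_t$, which the CRT needs for the element to be recovered.

```python
from math import ceil, prod

def crt_setup(moduli):
    """Precompute M = prod(m_t), M_t = M / m_t and M_t^{-1} mod m_t."""
    M = prod(moduli)
    Ms = [M // m for m in moduli]
    invs = [pow(Mt, -1, m) for Mt, m in zip(Ms, moduli)]  # needs pairwise-coprime moduli
    return M, Ms, invs

def crt_compress(vec, moduli):
    """Pack a length-l vector into r = ceil(l / k) integers, one per length-k fragment."""
    k = len(moduli)
    M, Ms, invs = crt_setup(moduli)
    r = ceil(len(vec) / k)
    padded = list(vec) + [0] * (r * k - len(vec))  # zero-pad when k does not divide l
    packed = []
    for j in range(r):
        fragment = padded[j * k:(j + 1) * k]
        if any(not 0 <= a < m for a, m in zip(fragment, moduli)):
            raise ValueError("every element must lie in [0, m_t)")
        # Unique solution in Z_M of x = a_t (mod m_t), t = 1..k
        packed.append(sum(a * Mt * inv for a, Mt, inv in zip(fragment, Ms, invs)) % M)
    return packed

def crt_decompress(packed, moduli, length):
    """Recover the elements as x mod m_t and drop the zero padding."""
    vec = [x % m for x in packed for m in moduli]
    return vec[:length]

moduli = [1009, 1013, 1019]   # hypothetical pairwise-coprime moduli (k = 3)
masked = [5, 17, 300, 42]     # hypothetical masked gradient (l = 4)
packed = crt_compress(masked, moduli)
print(len(packed))                                    # 2 == ceil(4 / 3)
print(crt_decompress(packed, moduli, len(masked)))    # [5, 17, 300, 42]
```

Decompression is just one modulo operation per element, which is why the compressed form costs the leader very little to unpack.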

Aggregation phase (steps 5-6): after receiving data uploaded by a user, a blockchain node first checks whether the signature is legal; if it is, the data is put into the transaction pool. After a set time, all nodes stop receiving data and compete to become the leader, which obtains the right to generate the new block. In this embodiment, a consistent hashing protocol weighted by reputation values may be used as the lottery algorithm to elect the leader, as shown in FIG. 5. Specifically, on a hash ring, space is allocated to each blockchain node in proportion to its reputation value. Starting from the SHA-256 hash of the current latest block, the hash computation is repeated and each resulting hash value is mapped onto the ring; the blockchain node owning the space in which the mapping falls is selected. In this embodiment, a single hash computation selects the leader of the current round, while the backup committee and the verification committee, each consisting of several nodes, require multiple hash computations to select their member blockchain nodes. This lottery process is therefore similar to the Algorand protocol: the probability that a blockchain node is drawn is proportional to its reputation value. Let U denote the set of all users and V the set of abnormal users who dropped out or whose signatures are illegal. The elected leader aggregates the gradients of all users in the transaction pool according to equation (6):

$$\sum_{i \in U \setminus V} \Delta w'_i = \sum_{i \in U \setminus V} \mathrm{CRT}\big(\widetilde{\Delta w}_i\big) \tag{6}$$

where CRT denotes the compression operation, $\widetilde{\Delta w}_i$ is the masked gradient of user i, and $\Delta w'_i$ is its compressed form.

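For two vectors compressed by the CRT as in equation (5), say $x = \mathrm{CRT}(a_1, \ldots, a_k)$ and $y = \mathrm{CRT}(b_1, \ldots, b_k)$, one obtains:

$$(x + y) \bmod M = \mathrm{CRT}(a_1 + b_1, a_2 + b_2, \ldots, a_k + b_k) \tag{7}$$

This shows that CRT compression is additively homomorphic. Based on this property, and writing $\delta_{i,j,t}$ for the t-th element of the j-th fragment of the masked gradient $\widetilde{\Delta w}_i$, equation (6) can be transformed into:

$$\Big(\sum_{i \in U \setminus V} \Delta w'_{i,j}\Big) \bmod M = \mathrm{CRT}\Big(\sum_{i \in U \setminus V} \delta_{i,j,1}, \ldots, \sum_{i \in U \setminus V} \delta_{i,j,k}\Big), \quad 1 \le j \le r \tag{8}$$

The leader then recovers each coefficient sum by the modulo operation in equation (9), provided every sum is smaller than the corresponding modulus $m_t$:

$$\sum_{i \in U \setminus V} \delta_{i,j,t} = \Big(\sum_{i \in U \setminus V} \Delta w'_{i,j}\Big) \bmod m_t, \quad 1 \le t \le k \tag{9}$$

This decompresses the aggregate into $\sum_{i \in U \setminus V} \widetilde{\Delta w}_i$, from which the global gradient $\Delta w_g$ is then computed.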
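The aggregate-then-decompress steps described above can be sketched as follows, assuming every per-position coefficient sum stays below its modulus $m_t$ (in a real deployment the moduli would have to exceed the number of users times R). The moduli, the sample uploads and the function names are hypothetical; masks are omitted so that the recovered sums are easy to check.

```python
from math import prod

MODULI = [1009, 1013, 1019]  # hypothetical pairwise-coprime moduli

def compress(vec, moduli=MODULI):
    """CRT-pack each length-k fragment of vec into one integer modulo M."""
    k, M = len(moduli), prod(moduli)
    Ms = [M // m for m in moduli]
    invs = [pow(Mt, -1, m) for Mt, m in zip(Ms, moduli)]
    padded = list(vec) + [0] * (-len(vec) % k)  # zero-pad to a multiple of k
    return [sum(a * Mt * inv for a, Mt, inv in zip(padded[j:j + k], Ms, invs)) % M
            for j in range(0, len(padded), k)]

def aggregate_and_decompress(packed_vectors, length, moduli=MODULI):
    M = prod(moduli)
    # Additive homomorphism: add the compressed vectors component-wise modulo M
    summed = [sum(column) % M for column in zip(*packed_vectors)]
    # Modulo step of equation (9): reduce every aggregate modulo each m_t
    return [x % m for x in summed for m in moduli][:length]

uploads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]  # hypothetical users
packed = [compress(u) for u in uploads]
print(aggregate_and_decompress(packed, 4))  # [111, 222, 333, 444]
```

The leader never unpacks individual uploads. It adds r integers per user and decompresses only once.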

Further, in this embodiment, for each user in the abnormal user set, several blockchain nodes in the backup committee first submit that user's secret shares and the correctness of the shares is verified. The abnormal user's polynomial and private key are recovered by interpolation, and the redundant random masks are then computed from the shared keys between the other users and the abnormal user, so that the global gradient can be recovered. Further, whether the model gradients in the transaction pool have been tampered with is checked using the additive homomorphism of polynomial commitments; if they have not been tampered with, the new block created by the leader is deemed legal. Verification passes when the proportion of verification committee members that deem the new block legal reaches a preset value; if the proportion is below that value, an invalid empty block is generated.

In the block generation and broadcasting phase, the leader creates a new block and broadcasts it to the verification committee for verification. As shown in FIG. 6, a block in this embodiment consists of a block header and a block body: the block header contains the block's meta information and a pointer (i.e., a hash value) to the previous block, and the block body contains a series of transactions. Unlike conventional blockchains, the transactions stored in this embodiment are training-related parameters, which may include: (1) the random seed $seed_{t+1}$ for the next training round, (2) the proof generated when the leader was elected in the aggregation phase, (3) the global gradient $\Delta w_g$ of this round, and (4) the commitments of the legal users' gradients. All key parameters of the training process are thus recorded in the blockchain in a tamper-proof manner, so that, compared with conventional federated learning algorithms, the training process of this algorithm is auditable.

In the prior art, the plaintext local gradients of all users are stored directly in the block; once an attacker or a semi-honest user obtains other users' gradients, it can launch privacy attacks such as model inversion and model extraction. Therefore, in this embodiment only the commitments of the gradients are stored in the block, which both protects the private information in the gradients and ensures the correctness of the global gradient obtained in each round. Specifically, after the new block generated by the leader is broadcast to the verification committee, every verifier checks whether formula (10) holds.

If it holds, the additive homomorphism of polynomial commitments shows that the user gradients in the transaction pool have not been tampered with, and the new block is legal. Otherwise, some blockchain nodes have tampered with the collected user gradients before placing them in the transaction pool, so the global gradient is computed incorrectly and the new block is illegal. When more than 2/3 of the verifiers confirm that the new block is legal, it passes verification and is broadcast to the whole network to reach consensus; otherwise, an invalid empty block is generated.

Block generation and broadcasting phase (step 7): the leader creates a new block and broadcasts it to the verification committee for verification. The verification committee is elected with the same lottery algorithm based on the consistent hashing protocol described above, so the details are omitted. If verification succeeds, the verification committee broadcasts the block to all blockchain nodes in the network through a gossip protocol to reach consensus; otherwise, an invalid empty block is created.
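The reputation-weighted lottery used for the leader and the committees can be sketched as follows. This is an illustrative reading of the description, not the exact protocol. It assumes integer reputation values, maps each digest onto the ring by scaling it to the total reputation, and skips already-chosen nodes so that each node is elected at most once. The reputations, the stand-in block hash and the function name `draw` are hypothetical.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate

def draw(latest_block_hash: bytes, reputations: dict, count: int):
    """Reputation-weighted lottery on a hash ring.

    Each node owns an arc of the ring proportional to its integer reputation.
    The latest block hash is hashed repeatedly with SHA-256; each digest is
    mapped onto the ring and selects the node owning that arc. Nodes already
    chosen are skipped, so every node is elected at most once.
    """
    nodes = sorted(reputations)
    if count > sum(1 for n in nodes if reputations[n] > 0):
        raise ValueError("not enough nodes with positive reputation")
    bounds = list(accumulate(reputations[n] for n in nodes))  # cumulative arc ends
    total = bounds[-1]
    chosen, digest = [], latest_block_hash
    while len(chosen) < count:
        digest = hashlib.sha256(digest).digest()
        point = (int.from_bytes(digest, "big") * total) >> 256  # position in [0, total)
        node = nodes[bisect_right(bounds, point)]
        if node not in chosen:
            chosen.append(node)
    return chosen

reputations = {"A": 50, "B": 30, "C": 15, "D": 5}       # hypothetical reputation values
block_hash = hashlib.sha256(b"latest block").digest()  # stand-in for the latest block's hash
leader = draw(block_hash, reputations, 1)[0]           # one hash selects the leader
committee = draw(block_hash, reputations, 3)           # further hashes select committee members
```

In this simplified sketch the committee continues the leader's hash chain, so its first member coincides with the leader; how the patent separates the leader, backup committee and verification committee draws is not modeled here.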

Model update phase (step 8): each user downloads the latest block from its associated blockchain node, obtains the global gradient from it, and updates its local model. If abnormal users (i.e., users who dropped out or whose signatures were illegal) appeared in this round, the leader recovered their private keys during gradient aggregation. The trusted authority must therefore allocate new public and private keys to these users before the next round, and the secret sharing step must be re-executed. After each round, users that stayed online throughout and whose gradient signatures were verified as legal have their reputation values increased; otherwise their reputation values are decreased. Blockchain nodes that generate legal new blocks or participate in verifying new blocks also have their reputation values increased. When a reputation value drops to 0, the participant is blacklisted and is no longer allowed to participate in intelligence data sharing.

The computations for secret sharing of user private keys, mask encryption of model gradients, and aggregation of the global gradient can be described as follows:

Gradient masking: assume that each user has obtained the public keys $pk_j$ of the other users $j \in U$. Each pair of users runs the Diffie-Hellman protocol to compute a shared key $s_{i,j} \leftarrow \mathrm{KA.agree}(sk_i, pk_j)$ and uses that key as the seed of a random number generator to generate a random mask. Now assume each user i (1 ≤ i ≤ m) has obtained a gradient $\Delta w_i$ of length l through local training and, for simplicity, that all elements of $\Delta w_i$ lie in $\mathbb{Z}_R$. The gradient $\Delta w_i$ can then be encrypted into $\widetilde{\Delta w}_i$ as shown in formula (11).

As formula (11) shows, a user encrypts its gradient merely by adding random numbers to it. When the masked gradients of all users are added together, the random numbers cancel out in pairs to 0, and the global gradient is obtained directly.
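A minimal Python sketch of pairwise masking, assuming the common secure-aggregation convention that user i adds $\mathrm{PRG}(s_{i,j})$ for peers with a larger ID and subtracts it for peers with a smaller ID. Formula (11) itself is not reproduced here, so this sign convention is an assumption. The Diffie-Hellman group, the SHA-256 counter-mode PRG, R and all function names are hypothetical toy choices.

```python
import hashlib
import secrets

P, G = 2**127 - 1, 3  # toy Diffie-Hellman group; NOT secure parameters
R = 2**16             # gradient elements live in Z_R (hypothetical choice)

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def ka_agree(sk_i, pk_j):
    """Toy stand-in for KA.agree: both users of a pair derive g^(sk_i * sk_j) mod P."""
    return pow(pk_j, sk_i, P)

def prg(seed: int, length: int):
    """Hypothetical PRG: SHA-256 in counter mode, reduced into Z_R."""
    key = seed.to_bytes(16, "big")
    return [int.from_bytes(hashlib.sha256(key + c.to_bytes(4, "big")).digest(), "big") % R
            for c in range(length)]

def mask(i, grad, sk_i, public_keys):
    """Add PRG(s_ij) for every peer j > i and subtract it for every peer j < i (mod R)."""
    out = list(grad)
    for j, pk_j in public_keys.items():
        if j == i:
            continue
        sign = 1 if i < j else -1
        for t, v in enumerate(prg(ka_agree(sk_i, pk_j), len(grad))):
            out[t] = (out[t] + sign * v) % R
    return out

keys = {i: keygen() for i in (1, 2, 3)}
public_keys = {i: pk for i, (_, pk) in keys.items()}
grads = {1: [1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]}  # hypothetical gradients
masked = {i: mask(i, grads[i], keys[i][0], public_keys) for i in grads}
total = [sum(column) % R for column in zip(*masked.values())]
print(total)  # [12, 15, 18]: all pairwise masks cancel
```

Each masked vector on its own looks random, yet the sum matches the plain sum of the gradients. Masking costs only additions, with no public-key operations on the gradient itself.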

Compared with the homomorphic encryption used in the prior art, this encryption is more efficient and does not sacrifice data utility. However, once some users drop out or their signatures are verified as illegal, the random numbers no longer cancel to 0 when the remaining gradients are added, and the global gradient cannot be obtained. The users' private keys therefore need to be backed up by secret sharing, so that when a user drops out or its signature is illegal, the redundant random numbers can be computed from the backed-up private key and the global gradient obtained. A direct application of this idea would secret-share each user's private key among the other local users. However, key reconstruction incurs considerable computation and communication overhead, and blockchain nodes have far more computation and communication resources than local users. In this embodiment, the users' private keys can therefore instead be shared through VSS with a backup committee consisting of several blockchain nodes.

Private key sharing: assume that the backup committee consists of n blockchain nodes (elected by the lottery algorithm based on the consistent hashing protocol). User i (1 ≤ i ≤ m) first selects a polynomial $\phi_i(x)$ whose constant term is its private key, $\phi_i(0) = sk_i$, and then computes a commitment $\mathrm{COMM}(\phi_i(x))$ to the polynomial. Next, using verifiable secret sharing, it splits $\phi_i(x)$ into n secret shares $\{\langle k, \phi_i(k) \rangle \mid 1 \le k \le n\}$ and sends $\langle k, \phi_i(k), w_{i,k}, \mathrm{COMM}(\phi_i(x)) \rangle$ to blockchain node k (1 ≤ k ≤ n). Here $w_{i,k}$ is the witness of the secret share. It can be used to verify that the share indeed belongs to the polynomial $\phi_i(x)$ committed in $\mathrm{COMM}(\phi_i(x))$, which prevents malicious blockchain nodes from supplying forged shares during key reconstruction.
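The share-and-reconstruct mechanics can be illustrated with plain Shamir secret sharing over a prime field. This sketch deliberately omits the commitment $\mathrm{COMM}(\phi_i(x))$ and the witnesses $w_{i,k}$, so unlike VSS it cannot detect forged shares. The field size, the sample key and the function names are assumptions. Following the convention that more than t shares are submitted for reconstruction, the polynomial has degree t, so any t + 1 shares recover $\phi_i(0) = sk_i$.

```python
import secrets

Q = 2**127 - 1  # prime field for the shares (hypothetical choice)

def share(secret, n, t):
    """Split `secret` with a random degree-t polynomial phi, phi(0) = secret.

    Node k (1..n) receives <k, phi(k)>; any t + 1 shares determine phi.
    """
    coeffs = [secret % Q] + [secrets.randbelow(Q) for _ in range(t)]
    def phi(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod Q
            acc = (acc * x + c) % Q
        return acc
    return [(k, phi(k)) for k in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of phi at x = 0 from the given (k, phi(k)) pairs."""
    secret = 0
    for k, y in shares:
        num, den = 1, 1
        for k2, _ in shares:
            if k2 != k:
                num = num * (-k2) % Q
                den = den * (k - k2) % Q
        secret = (secret + y * num * pow(den, -1, Q)) % Q
    return secret

sk = 123456789                  # hypothetical private key
shares = share(sk, n=10, t=7)   # backup committee of 10 nodes
print(reconstruct(shares[:8]))  # 123456789 (any t + 1 = 8 shares suffice)
```

In the actual scheme, each node would also check its share against the commitment using the witness before submitting it.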

Gradient aggregation: assume that the leader has used equation (9) to decompress the sum $\sum_{i \in U \setminus V} \widetilde{\Delta w}_i$ of the masked gradients. If the gradients of all users have been uploaded to the blockchain with legal signatures (i.e., $V = \emptyset$ in equation (9)), all pairwise masks cancel, and the leader obtains the global gradient directly by simple addition.

If some users drop out or their signatures are verified as illegal (denote these abnormal users by the set V), more than t blockchain nodes in the backup committee submit the secret shares $\langle k, \phi_i(k), w_{i,k}, \mathrm{COMM}(\phi_i(x)) \rangle$ of each abnormal user $i \in V$. After the correctness of the shares is verified, the polynomial $\phi_i(x)$ and the private key $sk_i$, $i \in V$, are recovered by interpolation. The shared keys between the remaining users and the abnormal users, $s_{i,m} = \mathrm{KA.agree}(sk_i, pk_m)$ for $i \in V$ and $m \in U \setminus V$, are then computed. Finally, the global gradient is obtained by removing the resulting redundant masks from the aggregate.
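The dropout case can be sketched end to end, assuming the backup committee has already reconstructed the dropped user's private key (for example by interpolation, as described above). The sketch uses toy primitives (a small Diffie-Hellman group and a SHA-256 counter-mode PRG) and the assumed sign convention that user m adds $\mathrm{PRG}(s_{m,i})$ for a peer i with a larger ID and subtracts it otherwise. The leader recomputes each $s_{i,m}$ from the recovered $sk_i$ and strips exactly the masks the survivors applied for the dropped user. All names and parameters are hypothetical.

```python
import hashlib
import secrets

P, G, R = 2**127 - 1, 3, 2**16  # toy DH group and gradient modulus (assumptions, not secure)

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    return sk, pow(G, sk, P)

def ka_agree(sk_i, pk_j):
    return pow(pk_j, sk_i, P)

def prg(seed, length):
    key = seed.to_bytes(16, "big")
    return [int.from_bytes(hashlib.sha256(key + c.to_bytes(4, "big")).digest(), "big") % R
            for c in range(length)]

def pair_mask(i, j, s, length):
    """Mask user i applies for peer j: +PRG(s_ij) if i < j, otherwise -PRG(s_ij)."""
    sign = 1 if i < j else -1
    return [sign * v for v in prg(s, length)]

def mask(i, grad, sk_i, public_keys):
    out = list(grad)
    for j, pk_j in public_keys.items():
        if j != i:
            for t, v in enumerate(pair_mask(i, j, ka_agree(sk_i, pk_j), len(grad))):
                out[t] = (out[t] + v) % R
    return out

def aggregate_with_dropouts(masked, recovered_sks, public_keys, length):
    """masked: {m: masked gradient} of surviving users (U - V).
    recovered_sks: {i: sk_i} of dropped users (V), rebuilt by the backup committee."""
    total = [sum(column) % R for column in zip(*masked.values())]
    for i, sk_i in recovered_sks.items():
        for m in masked:
            s_im = ka_agree(sk_i, public_keys[m])  # equals what user m derived from pk_i
            for t, v in enumerate(pair_mask(m, i, s_im, length)):
                total[t] = (total[t] - v) % R      # strip the redundant mask
    return total

keys = {i: keygen() for i in (1, 2, 3, 4)}
public_keys = {i: pk for i, (_, pk) in keys.items()}
grads = {1: [1, 2], 2: [3, 4], 3: [5, 6], 4: [7, 8]}  # hypothetical gradients
survivors = (1, 2, 4)                                  # user 3 drops out after masking setup
masked = {m: mask(m, grads[m], keys[m][0], public_keys) for m in survivors}
print(aggregate_with_dropouts(masked, {3: keys[3][0]}, public_keys, 2))  # [11, 14]
```

Masks shared among survivors still cancel on their own. Only the masks paired with the dropped user need the recovered key, which is why that user needs a fresh key pair in the next round.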

In this embodiment, the user's private key is secret-shared with the backup committee in the initialization phase, a mask is added to the user's original gradient in the local training phase, and the global gradient is computed in the aggregation phase. These three closely related operations together achieve secure gradient aggregation both with and without user dropouts. The method is applicable to intelligence data sharing among multiple departments, institutions and organizations. It reduces the risk of data privacy leakage, addresses problems such as single-point-of-failure attacks and lack of trust, and improves network intrusion detection and defense performance.

Furthermore, based on the above method, the invention also provides a verifiable secure intelligence data sharing system based on blockchain and federated learning, used for joint modeling of an intrusion detection model by multiple users in network security defense. The system comprises: user nodes that participate in local model training in the joint modeling; blockchain nodes that perform consensus operations on the local model training parameters of the user nodes; a trusted authority that distributes public and private key pairs to the user nodes and blockchain nodes; and a backup committee and a verification committee, each formed by several blockchain nodes. Each user secret-shares its own private key with the backup committee so that its private key information can be recovered in abnormal situations, and the correctness of the new block containing the aggregated global gradient is verified by the verification committee;

the verification committee verifies the correctness of the global gradient and broadcasts the new blocks that pass verification to the whole network to reach consensus; each user downloads the latest block, obtains the global gradient from it, and updates its local intrusion detection model.

To verify the validity of this protocol, the following is further explained in connection with experimental data:

Privacy analysis: if each user's local gradient is masked with pairwise uniformly random masks (as in equation (11)) that cancel out to 0 when all user gradients are added, the masked user gradients can be regarded as uniformly random, i.e., the pairwise masks protect the gradient privacy of each individual user.

Theorem 1: Given m, l, R, U and $\{\Delta w_i\}_{i \in U}$, where m is the number of users, l is the length of the user gradient $\Delta w_i$, and U is the set of all users, assume that the gradients $\Delta w_i$ of all users $i \in U$ satisfy $\Delta w_i \in \mathbb{Z}_R^l$.

Then formula (14) holds,

where the symbol "≡" indicates that both sides are identically distributed.

Proof: the theorem is proved by induction.

(1) When m = |U| = 1,

since it has been assumed that

we have

In addition,

so when m = 1, formula (14) holds.

(2) When m = |U| = k, assuming that Theorem 1 holds, it follows from equation (14) that

(3) When m = k + 1, define a new set U' = U ∪ {k+1}; then

Given formula (16),

formula (17) can be written as

Since it has been assumed that

we obtain

Thus formula (18) can be written as

On the other hand,

according to formula (15), we have

and since it has been assumed that

it can be inferred that

In addition, since it has been assumed that

it can be inferred that

Based on equations (21) and (22), equation (20) can be written as:

Combining equations (19) and (23), Theorem 1 holds when m = k + 1. This completes the proof.

Resistance to Sybil attacks: in the private key sharing computation, each user's private key is secret-shared with the backup committee so that user dropouts can be tolerated. However, if an attacker controls more than t blockchain nodes in the backup committee through a Sybil attack, it can generate enough forged secret shares and witnesses to pass VSS verification and eventually reconstruct a false private key, thereby disrupting gradient aggregation. The following analysis determines the minimum VSS threshold t needed to keep the probability that an attacker breaks secret reconstruction below a given value.

Since the backup committee is drawn by the sampling algorithm based on the consistent hashing protocol, the probability of each blockchain node being elected is proportional to its reputation value. The probability p that an attacker controls more than t blockchain nodes in the backup committee can therefore be calculated by formula (24),

where n is the number of members in the backup committee and s is the proportion of reputation controlled by the attacker. Since the present invention assumes that honest participants hold at least 70% of the reputation in the system, s = 0.3. Equation (24) models p with a binomial distribution. However, the binomial distribution corresponds to sampling with replacement, whereas in this scheme each blockchain node can be elected only once, so p can be regarded as an upper bound on the probability that the attacker controls more than t nodes in the backup committee. An exhaustive search gives the minimum VSS threshold t for which p falls below a given probability under different backup committee sizes; FIG. 7 shows the minimum t for p < 0.01, 0.05 and 0.001. The bound on p can be chosen according to the actual training conditions to ensure the security of the method. For example, the number of training rounds of this scheme on the intelligence dataset is typically within 100, so p should be less than 0.01. When the backup committee has 10 nodes, the minimum threshold t is 8, which ensures with high probability that an attacker cannot break the VSS key reconstruction through a Sybil attack.
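The exhaustive search can be reproduced in a few lines of Python. The exact form of equation (24) is not shown here, so this sketch assumes it is the binomial tail $p = \sum_{j=t}^{n}\binom{n}{j}s^j(1-s)^{n-j}$, i.e., the probability that the attacker holds at least t of the n committee seats. This reading reproduces the value t = 8 stated above for n = 10, s = 0.3 and p < 0.01; counting strictly more than t seats would give 7 instead. The function names are hypothetical.

```python
from math import comb

def attacker_tail(n, t, s):
    """Binomial tail P[X >= t], X ~ Binomial(n, s): attacker holds at least t of n seats."""
    return sum(comb(n, j) * s**j * (1 - s)**(n - j) for j in range(t, n + 1))

def min_threshold(n, s=0.3, p_max=0.01):
    """Smallest t whose tail probability falls below p_max (exhaustive search)."""
    for t in range(1, n + 1):
        if attacker_tail(n, t, s) < p_max:
            return t
    return None  # even t = n cannot meet the bound for this committee size

print(min_threshold(10))  # 8, for n = 10, s = 0.3, p < 0.01
```

Calling `min_threshold(n, p_max=0.05)` or `p_max=0.001` over a range of n yields the kind of table summarized in FIG. 7, under the same assumption about equation (24).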

Based on the above experimental data and analysis, secure aggregation based on gradient masks and verifiable secret sharing strengthens privacy during intelligence data sharing without sacrificing data utility. By exploiting the binding and hiding properties of polynomial commitments, the new blockchain structure integrates federated learning model verification into the consensus process and can resist tampering attacks by malicious blockchain nodes. Gradient compression effectively reduces communication overhead, which facilitates application in practical scenarios.

The relative steps, numerical expressions and numerical values of the components and steps set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.

Based on the above method and/or system, the embodiment of the present invention further provides a server, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.

Based on the above-described method and/or system, embodiments of the present invention also provide a computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the above-described method.

Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

Finally, it should be noted that: the above examples are only specific embodiments of the present invention, and are not intended to limit the scope of the present invention, but it should be understood by those skilled in the art that the present invention is not limited thereto, and that the present invention is described in detail with reference to the foregoing examples: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A secure sharing method for information data based on blockchain and federated learning, used for joint modeling of an intrusion detection model by multiple users in network security defense, characterized in that the joint modeling process comprises the following:

the trusted authority distributes public and private key pairs to each user and blockchain node and sends them to the users and blockchain nodes through a secure channel; each user secret-shares its own private key with a backup committee so that its private key data can be recovered if the user becomes abnormal, wherein the backup committee consists of a plurality of blockchain nodes;

the user obtains a local intrusion detection model gradient by machine training on local data, encrypts the model gradient by adding a mask, compresses the encrypted model gradient, and sends the compressed encrypted model gradient together with a digital signature to an associated adjacent blockchain node; in the mask encryption of the model gradient data, a shared key between the user and each other user is computed with the Diffie-Hellman protocol from the user's private key and the other user's public key, the shared key is used as the seed of a random number generator to generate a random mask, and the random mask is used to encrypt the user's local model gradient; using a verifiable secret sharing technique, the user embeds its private key in a selected polynomial and constructs a polynomial commitment, splits the polynomial into n secret shares, and sends the secret shares of the polynomial, the secret share witnesses and the polynomial commitment to the backup committee; when the user is abnormal (dropped out or with an illegal signature), the backup committee recovers the redundant random mask through key reconstruction, wherein a secret share witness is used to verify the committed polynomial to which the secret share belongs; the encrypted model gradient is compressed using the Chinese Remainder Theorem (CRT), the compression process comprising: first, uniformly dividing the user's encrypted model gradient into r segments, wherein $r = \lceil l/k \rceil$, l is the length of the encrypted model gradient, and k is a preset segment length; then compressing each model gradient segment into a corresponding element by solving a system of k congruence equations, and obtaining the compression result of the whole model gradient from the elements of the segments;

the blockchain node performs signature verification on the uploaded model gradient data and puts the legal model gradients that pass signature verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legal gradients in the transaction pool, the redundant masks generated under user abnormality are recovered by the backup committee, and the global gradient is obtained; a new block recording the global gradient and other key parameters is created and sent to the verification committee, wherein the verification committee consists of a plurality of blockchain nodes; the leader is elected from the blockchain nodes by lottery using a reputation-based consistent hashing protocol, the election process specifically comprising: setting a hash ring, allocating to each blockchain node a space on the hash ring according to its reputation value, performing hash calculation on the initial SHA-256 hash value of the current latest block, mapping the calculated hash value onto the hash ring, and determining the elected leader according to the hash ring space in which the mapping result falls; with the set of all users denoted U and the set of abnormal users (dropped out or with an illegal signature) denoted V, the leader aggregates the legal gradient data in the transaction pool as follows:

$$\sum_{i \in U \setminus V} \Delta w'_i = \sum_{i \in U \setminus V} \mathrm{CRT}\big(\widetilde{\Delta w}_i\big)$$

wherein CRT denotes the compression operation, $\widetilde{\Delta w}_i$ is the mask-encrypted model gradient of user i in the transaction pool, and $\Delta w'_i$ is the compression result of user i's model gradient;

the verification committee verifies the correctness of the global gradient and broadcasts the new block that passes verification to the whole network to reach consensus; the user downloads the latest block, obtains the latest global gradient from it, and updates its local intrusion detection model.

2. The secure information data sharing method based on blockchain and federated learning according to claim 1, wherein the global gradient in the joint modeling is obtained iteratively, with a model convergence condition set on the iteration rounds, so that the users' local intrusion detection models are updated in synchronous iterations, wherein the model convergence condition is reaching a maximum number of iteration rounds.

3. The intelligence data secure sharing method based on blockchain and federated learning according to claim 1 or 2, wherein users and blockchain nodes are motivated to participate in joint modeling of the intrusion detection model by setting a reputation value for each user and blockchain node and electing blockchain nodes as the leader, the backup committee and the verification committee according to their reputation values, and a blacklist is used to restrict the participation rights in the joint modeling of users and blockchain nodes whose reputation values are below a threshold.

4. The secure sharing method for information data based on blockchain and federated learning according to claim 1, wherein, for a user in the abnormal situation, several blockchain nodes in the backup committee first submit the abnormal user's secret shares and the correctness of the shares is verified; the abnormal user's polynomial and private key are recovered by interpolation, and the redundant random mask is then computed from the shared keys between the other users and the abnormal user, thereby recovering the global gradient.

5. The method for secure sharing of information data based on blockchain and federated learning according to claim 4, wherein, in the correctness verification, whether the model gradients in the transaction pool have been tampered with is determined according to the additive homomorphism of polynomial commitments; if they have not been tampered with, the new block created by the leader is deemed legal; verification passes when the proportion of verifiers in the verification committee that deem the new block legal reaches a preset value, and an empty block is generated if the proportion is below the preset value.

6. An intelligence data secure sharing system based on blockchain and federated learning, used for joint modeling of an intrusion detection model by multiple users in network security defense, characterized by comprising: user nodes participating in local model training in the joint modeling; blockchain nodes performing consensus operations on the local model training parameters of the user nodes; a trusted authority distributing public and private key pairs to the user nodes and blockchain nodes; and a backup committee and a verification committee, each formed by a plurality of blockchain nodes, wherein each user secret-shares its own private key with the backup committee so that its private key information can be recovered in abnormal situations, and the correctness of the global gradient obtained by aggregation is verified by the verification committee;

l is the length of the encrypted model gradient, and k is a preset segment length; each model gradient segment is then compressed into a corresponding element by solving the system of k congruence equations, and the compression result of the whole model gradient is obtained from these elements;

the blockchain node performs signature verification on the uploaded model gradient data and puts the legal model gradients that pass signature verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legal gradients in the transaction pool and adds the redundant masks generated under user abnormality, as recovered by the backup committee, to obtain the global gradient, and sends the global gradient to the verification committee by creating a new block recording the global gradient and other key parameters; the leader is elected from the blockchain nodes by lottery using a reputation-based consistent hashing protocol, the election process specifically comprising: setting a hash ring, allocating to each blockchain node a space on the hash ring according to its reputation value, performing hash calculation on the initial SHA-256 hash value of the current latest block, mapping the calculated hash value onto the hash ring, and determining the elected leader according to the hash ring space in which the mapping result falls; with the set of all users denoted U and the set of abnormal users (dropped out or with an illegal signature) denoted V, the leader aggregates the legal gradient data in the transaction pool as follows:

$$\sum_{i \in U \setminus V} \Delta w'_i = \sum_{i \in U \setminus V} \mathrm{CRT}\big(\widetilde{\Delta w}_i\big)$$

Wherein CRT indicates compression operation, ">