CN114338045A

CN114338045A - Verifiable and secure intelligence data sharing method and system based on blockchain and federated learning

Info

Publication number: CN114338045A
Application number: CN202210040143.XA
Authority: CN
Inventors: 郭渊博; 方晨; 王一丰; 马佳利; 李勇飞; 尹安琪
Original assignee: Information Engineering University of PLA Strategic Support Force
Current assignee: Information Engineering University of PLA Strategic Support Force
Priority date: 2022-01-14
Filing date: 2022-01-14
Publication date: 2022-04-12
Anticipated expiration: 2042-01-14
Also published as: CN114338045B

Abstract

The invention belongs to the technical field of network security, and particularly relates to a verifiable and secure intelligence data sharing method and system based on blockchain and federated learning. A user obtains a local intrusion detection model gradient by machine learning training on local data; the gradient is encrypted with a mask, compressed, and then sent to a neighboring blockchain node together with a digital signature. The blockchain node verifies the signature on the gradient data and puts legal model gradients into a transaction pool. A leader elected among the blockchain nodes aggregates the legal gradients in the transaction pool, adds the redundant masks of abnormal users recovered by a backup committee to obtain a global gradient, and creates a new block to pass the global gradient to a verification committee. The verification committee broadcasts new blocks that pass verification to the whole network. Each user downloads the latest block and obtains the global gradient from it to update the local intrusion detection model. Through training by multiple users, the invention obtains a converged intelligence data sharing model, deploys the model locally for network anomaly detection, and improves the defense performance of intrusion detection.

Description

Verifiable and secure intelligence data sharing method and system based on blockchain and federated learning

Technical Field

The invention belongs to the technical field of network security, and particularly relates to a verifiable and secure intelligence data sharing method and system based on blockchain and federated learning.

Background

With the frequent occurrence of network security incidents, intelligence data has become an important basis for detecting cybercrime, intrusion behaviors, and other anomalies. Training an artificial intelligence model on intelligence data to detect network anomalies has become an important means of building a network security perimeter. The accuracy of an artificial intelligence model is closely related to the amount of training data. At present, each institutional user obtains intelligence data only through its own collection channels or by purchasing it from third-party institutions at high cost. Because intelligence data may contain sensitive information about users, and because data is now regarded as a factor of production, intelligence data is an important asset for every department. As a result, most departments are reluctant to share intelligence data with one another, which leads to serious data islands, and each department finds it difficult to build an effective intrusion detection model because it lacks sufficient intelligence data.

Federated learning, as a distributed machine learning framework, converts the sharing of raw data into the sharing of model parameters, while blockchain establishes a trustless, decentralized data transaction mode among distributed users. Combining the two reduces the risk of data privacy leakage, addresses problems such as single-point-of-failure attacks and lack of trust, and makes the whole data sharing process verifiable, traceable, and auditable. In recent years, researchers have combined blockchain and federated learning for secure data sharing, but the following problems remain. (1) Verifiability of the data sharing result: communication base stations are commonly used as blockchain nodes responsible for collecting the model parameters uploaded by different users. If a malicious base station tampers with the model parameters before putting them into the transaction pool (that is, before the data is put on chain), the blockchain reaches consensus on wrong model parameters, and joint modeling ultimately yields a wrong data sharing model. (2) Availability and privacy of shared data: to enhance privacy protection during data sharing, existing work typically relies on differential privacy or secure multi-party computation, which reduce data availability and increase computational overhead, respectively. How to balance data availability and privacy with less computational overhead still needs to be studied. (3) High communication overhead: the training process of federated learning inherently requires high communication overhead, and the message broadcasting mechanism of blockchain raises the overhead further when the two are combined, which limits application in bandwidth-constrained scenarios. Therefore, how to ensure the verifiability of the data sharing result is also a problem to be solved in data sharing methods based on blockchain and federated learning.

Disclosure of Invention

To this end, the invention provides a verifiable and secure intelligence data sharing method and system based on blockchain and federated learning. A converged intelligence data sharing model is obtained through training by multiple users and is deployed locally at each user as an intrusion detection model for network anomaly detection. This addresses the problems of confidentiality, result verifiability, and the high overhead of privacy protection schemes in existing data sharing processes, and provides effective technical support for data circulation and sharing among different institutions.

According to the design scheme provided by the invention, a verifiable and secure intelligence data sharing method based on blockchain and federated learning is used for joint modeling of an intrusion detection model by multiple users in network security defense, and the joint modeling process comprises the following:

a trusted authority allocates a public-private key pair to each user and each blockchain node and sends the key pairs to the users and blockchain nodes through a secure channel; each user secret-shares its own private key with a backup committee, where the backup committee consists of several blockchain nodes;

a user obtains a local intrusion detection model gradient by machine learning training on local data, encrypts the model gradient by adding a mask, compresses the encrypted model gradient, and sends the compressed encrypted model gradient together with a digital signature to its associated neighboring blockchain node;

the blockchain nodes verify the signatures on the uploaded model gradient data and put the legal model gradients that pass signature verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legal gradients in the transaction pool and adds the redundant masks of abnormal users, as recovered by the backup committee, to obtain a global gradient; the leader creates a new block that records the global gradient and other key parameters and sends the new block to a verification committee;

the verification committee verifies the correctness of the global gradient and broadcasts new blocks that pass verification to the whole network to reach consensus; each user downloads the latest block, obtains the latest global gradient from it, and updates its local intrusion detection model.

Further, in the above method, the global gradient in joint modeling is obtained iteratively by setting a model convergence condition over the iteration rounds, so that the users' local intrusion detection models are updated synchronously and iteratively, where the model convergence condition is a maximum number of iteration rounds.

Further, in the above method, a reputation value is set for each user and each blockchain node to motivate them to participate in joint modeling of the intrusion detection model; the leader, the backup committee, and the verification committee are elected from the blockchain nodes according to reputation value; and a blacklist is used to restrict the participation rights of users and blockchain nodes whose reputation values fall below a threshold.

Further, in the above method, for mask encryption of the model gradient data, a user computes a shared key with each other user from its own private key and the other user's public key using the Diffie-Hellman protocol, uses the shared key as the seed of a random number generator to generate a random mask, and encrypts its local model gradient with the random mask. Using verifiable secret sharing, the user embeds its private key in a selected polynomial and constructs a polynomial commitment, splits the polynomial into n secret shares, and sends the secret shares, the secret share witnesses, and the polynomial commitment to the backup committee, so that the redundant random masks can be recovered through key reconstruction when a user drops offline or its signature verification fails. A secret share witness is used to verify that a secret share belongs to the committed polynomial.

Further, in the above method, the encrypted model gradient is compressed using the Chinese Remainder Theorem (CRT), and the compression process comprises: first, the user's encrypted model gradient is evenly divided into r segments, where

$r = \lceil l / k \rceil$,

l is the length of the encrypted model gradient, and k is a preset segment length; then, each model gradient segment is compressed into a single element by solving the system formed by k congruence equations, and the compression result of the whole model gradient is obtained from the elements corresponding to the segments.

Further, in the above method, the leader is elected by lot among the blockchain nodes using a reputation-based consistent hashing protocol, and the election process is as follows: a hash ring is set up, and each blockchain node is allocated a portion of the hash ring according to its reputation value; a hash is computed with the SHA-256 hash value of the current latest block as input, and the resulting hash value is mapped onto the hash ring; the blockchain node whose portion of the ring contains the mapped point is selected as the leader.
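The election described above can be sketched in Python as follows. This is a minimal illustration, not the patented protocol itself: the ring size `RING_SIZE`, the allocation of arcs strictly in proportion to reputation, the optional per-round `seed`, and the function names `build_ring` and `elect_leader` are all assumptions introduced here.

```python
import hashlib

RING_SIZE = 2 ** 32  # assumed ring size; the text does not fix one


def build_ring(reputations):
    """Give each node a contiguous arc whose length is proportional to its reputation.

    Returns a list of (arc_end, node_id); the last arc ends exactly at RING_SIZE.
    """
    total = sum(reputations.values())
    ring, cumulative = [], 0
    for node_id in sorted(reputations):      # same order on every node
        cumulative += reputations[node_id]
        ring.append((cumulative * RING_SIZE // total, node_id))
    return ring


def elect_leader(ring, latest_block_hash, seed=b""):
    """Hash the latest block hash (plus an optional round seed) onto the ring."""
    digest = hashlib.sha256(latest_block_hash + seed).digest()
    point = int.from_bytes(digest[:8], "big") % RING_SIZE
    for arc_end, node_id in ring:
        if point < arc_end:                  # first arc that contains the point
            return node_id
    return ring[-1][1]                       # not reached when the last arc ends at RING_SIZE


ring = build_ring({"node_1": 50, "node_2": 30, "node_3": 20})
block_hash = hashlib.sha256(b"latest block header").digest()
leader = elect_leader(ring, block_hash)
```

Because every node derives the same point from the same block hash, all honest nodes agree on the leader without extra communication, and a node with a larger reputation owns a larger arc and is selected more often.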

Further, in the above method, let the set of all users be U and the set of abnormal users be V, where an abnormal user is one that has dropped offline or whose signature is illegal. The process by which the leader aggregates all legal gradient data in the transaction pool is expressed as follows:

where CRT denotes the compression operation,

is the mask-encrypted model gradient of user i in the transaction pool, and $\Delta w'_i$ is the compressed model gradient of user i.

Further, in the above method, for each abnormal user in the abnormal user set, several blockchain nodes in the backup committee first submit the secret shares of the abnormal user, and the correctness of the secret shares is verified; the polynomial and the private key of the abnormal user are recovered using the interpolation theorem; the redundant random masks are then computed from the shared keys between the other users and the abnormal user, so that the global gradient is recovered.

Further, in the above method, during correctness verification, whether the model gradients in the transaction pool have been tampered with is checked using the additive homomorphism of the polynomial commitments; if the model gradients have not been tampered with, the new block created by the leader is considered legal. When the proportion of blockchain nodes in the verification committee that confirm the new block as legal reaches a preset value, the block passes verification; if the proportion is below the preset value, an invalid empty block is generated.

Further, the invention also provides a verifiable and secure intelligence data sharing system based on blockchain and federated learning, which is used for joint modeling of an intrusion detection model by multiple users in network security defense, and comprises: user nodes that perform local model training in joint modeling; blockchain nodes that run consensus on the local model training parameters of the user nodes; a trusted authority that allocates public-private key pairs to the user nodes and blockchain nodes; and a backup committee and a verification committee, each consisting of several blockchain nodes, where each user secret-shares its own private key with the backup committee so that the user's private key can be recovered in abnormal situations, and the verification committee verifies the correctness of the new block containing the aggregated global gradient.

The invention has the following beneficial effects:

An artificial intelligence model is trained through joint modeling and used to build an intrusion detection system. This reduces the risk of data privacy leakage, addresses problems such as single-point-of-failure attacks and lack of trust, makes the whole intelligence data sharing process verifiable, traceable, and auditable, and suits data sharing among multiple departments or organizations. Gradients are encrypted quickly by adding masks, which resists recent privacy attacks such as model inversion and model extraction; because the masks cancel to 0 during gradient aggregation, the accuracy of the federated learning model is not affected. Gradient verification based on polynomial commitments is integrated into the consensus process of joint modeling, which resists tampering attacks by malicious blockchain nodes. The invention thereby addresses the problems of confidentiality, result verifiability, and the high overhead of privacy protection schemes in data sharing, provides an effective technical means for data circulation and sharing among different departments or institutions, effectively improves local intrusion detection performance and network security defense, and has good application prospects.

Description of the drawings:

FIG. 1 is a flow diagram of the verifiable and secure intelligence data sharing method based on blockchain and federated learning in an embodiment;

FIG. 2 is a schematic diagram of the architecture for verifiable and secure intelligence data sharing in an embodiment;

FIG. 3 is a schematic diagram of one round of training in the iterative training for verifiable and secure intelligence data sharing in an embodiment;

FIG. 4 is a schematic diagram of the gradient masking and compression process in an embodiment;

FIG. 5 is a schematic diagram of the consistent hashing protocol in an embodiment;

FIG. 6 is a schematic diagram of a new block created by the leader in an embodiment;

FIG. 7 is a schematic diagram of the swollen attack resistance at different backup committee scales in an embodiment.

Detailed description of the embodiments:

To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and technical solutions.

To solve the data island problem in Industry 4.0, existing work has combined the decentralization of blockchain and federated learning to build a cognitive computing platform; however, users' model parameters are stored directly on the blockchain, and once an attacker or a malicious data sharing participant obtains the parameters, information about a user's original data can be inferred through a model inversion attack. In another approach, users' model parameters are encrypted with the Paillier algorithm before being uploaded to the blockchain, and decryption after the model update requires the cooperation of some of the users, which consumes a large amount of computation and communication overhead. For secure data sharing in industrial Internet scenarios, local differential privacy can be applied: noise is added to the original data before feature extraction and sharing, which prevents privacy theft attacks but sacrifices part of the data's utility. Enhancing data security with homomorphic encryption or differential privacy therefore has drawbacks, and how to balance data availability and privacy with smaller computation and communication overhead still needs to be studied. In addition, these approaches do not consider the verifiability of the data sharing result. Base stations are commonly used as blockchain nodes to collect the model parameters uploaded by users in different areas; if a malicious base station tampers with the model parameters before putting them into the transaction pool (that is, before the data is put on chain), the blockchain reaches consensus on wrong model parameters, and joint modeling ultimately yields a wrong data sharing model, which hinders the practical application of data sharing.

To this end, an embodiment of the present invention provides a verifiable and secure intelligence data sharing method based on blockchain and federated learning, used for joint modeling of an intrusion detection model by multiple users in network security defense. As shown in FIG. 1, the joint modeling process comprises the following:

S101, a trusted authority allocates a public-private key pair to each user and each blockchain node and sends the key pairs to the users and blockchain nodes through a secure channel; each user secret-shares its own private key with a backup committee, where the backup committee consists of several blockchain nodes;

S102, a user obtains a local intrusion detection model gradient by machine learning training on local data, encrypts the model gradient by adding a mask, compresses the encrypted model gradient, and sends the compressed encrypted model gradient together with a digital signature to its associated neighboring blockchain node;

S103, the blockchain nodes verify the signatures on the uploaded model gradient data and put the legal model gradients that pass signature verification into a transaction pool; a leader elected among the blockchain nodes aggregates the legal gradients in the transaction pool and adds the redundant masks of abnormal users, as recovered by the backup committee, to obtain a global gradient; the leader creates a new block that records the global gradient and other key parameters and sends the new block to a verification committee, where the verification committee consists of several blockchain nodes;

S104, the verification committee verifies the correctness of the global gradient and broadcasts new blocks that pass verification to the whole network to reach consensus; each user downloads the latest block, obtains the latest global gradient from it, and updates its local intrusion detection model.

In this embodiment, each user participating in intelligence data sharing converts the data sharing problem into a model gradient sharing problem through local federated training, and adds a mask to the gradient for fast encryption. To reduce communication overhead, the user may compress the encrypted gradient before uploading it to the associated blockchain node. All valid gradients are then aggregated on the blockchain to obtain a global gradient, the correctness of the global gradient is verified, and a legal block that the whole network can agree on is generated. Finally, each user downloads the newly generated block from the blockchain, obtains the global gradient, and updates its local model. Users may be departments, local enterprises, organizations, and the like that participate in intelligence data sharing. They have limited intelligence data and computing capacity, and wish to keep their intelligence data local while modeling jointly with other users, so as to obtain a more accurate anomaly detection model for building a network protection system. In this embodiment, the users participating in data sharing may be assumed to be semi-honest: they execute the protocol honestly, but may use the information they obtain to infer the intelligence data of other users. Blockchain nodes are generally equipped with certain computing and communication resources, for example communication base stations and servers, and are responsible for operations such as parameter verification, aggregation, and consensus. It may be assumed that some blockchain nodes, after being compromised by an attacker, tamper with the data uploaded by users before putting it into the transaction pool, and may also provide false secret shares in the secret reconstruction stage.

A transaction is a data record of the interaction between blockchain nodes; in this embodiment, a transaction records the model gradient and related training information. Applying the combination of blockchain and federated learning to intelligence data sharing addresses the problems of confidentiality, result verifiability, and the high overhead of privacy protection schemes in the data sharing process, and provides an effective technical means for enabling the circulation and sharing of intelligence data among different departments.

Further, in the method of this embodiment, the global gradient in joint modeling is obtained iteratively by setting a model convergence condition over the iteration rounds, so that the users' local intrusion detection models are updated synchronously and iteratively, where the model convergence condition is a maximum number of iteration rounds. Further, a reputation value is set for each user and each blockchain node to motivate them to participate in joint modeling of the intrusion detection model; the leader, the backup committee, and the verification committee are elected from the blockchain nodes according to reputation value; and a blacklist is used to restrict the participation rights of users and blockchain nodes whose reputation values fall below a threshold.

In this embodiment, each user and each blockchain node participating in data sharing may be given an initial reputation value. For a user, the reputation value increases if the user stays online throughout data sharing and the signature on its uploaded gradient is verified as legal, and decreases otherwise. For a blockchain node, the reputation value increases if it provides a correct secret share, generates a legal new block, or participates in new block verification, and decreases if it provides a false secret share. When a participant's reputation value drops to 0, the participant is blacklisted and is no longer allowed to take part in intelligence data sharing. It may be assumed that at least 70% of the reputation in the system is held by honest participants at any time, to ensure proper operation of the blockchain consensus protocol. Referring to FIG. 2, the architecture is assumed to consist of a blockchain and m distributed users. The blockchain is maintained by several nodes equipped with certain computing and communication resources; in practical applications, the blockchain nodes may be base stations equipped with servers, and the like. The users may be departments, organizations, enterprises, or the like with limited computing and communication capabilities. Each user holds a local intelligence dataset $D_i$ ($1 \le i \le m$), maps the original intelligence data into a model gradient through machine learning training, and uploads the model gradient to its associated blockchain node over a wired or wireless network; federated learning is completed under the coordination of the blockchain, thereby achieving intelligence data sharing.
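The reputation bookkeeping described above could look roughly like the following Python sketch. The text does not specify the reputation update function, so the fixed step of 1, the initial value, and the names `ReputationLedger`, `register`, `update`, and `honest_share` are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ReputationLedger:
    """Tracks reputation values and blacklists participants whose value reaches 0."""
    step: int = 1                                    # hypothetical update size
    scores: dict = field(default_factory=dict)
    blacklist: set = field(default_factory=set)

    def register(self, participant, initial=5):      # hypothetical initial value
        self.scores[participant] = initial

    def update(self, participant, behaved_correctly):
        """Raise the value after correct behaviour, lower it otherwise."""
        if participant in self.blacklist:
            return
        delta = self.step if behaved_correctly else -self.step
        self.scores[participant] = max(0, self.scores[participant] + delta)
        if self.scores[participant] == 0:
            self.blacklist.add(participant)

    def honest_share(self, honest):
        """Fraction of the total reputation held by the given honest participants."""
        total = sum(self.scores.values())
        return sum(self.scores[p] for p in honest) / total if total else 0.0


ledger = ReputationLedger()
for name in ("user_a", "node_b", "node_c"):
    ledger.register(name, initial=2)
ledger.update("user_a", True)     # stayed online and uploaded a validly signed gradient
ledger.update("node_b", False)    # provided a false secret share
ledger.update("node_b", False)    # second offence: value reaches 0 and node_b is blacklisted
```

The `honest_share` helper corresponds to the 70% assumption stated above: a deployment could monitor this fraction, although how honesty would be established in practice is outside the scope of this sketch.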

Further, in the method of this embodiment, for mask encryption of the model gradient data, a user computes a shared key with each other user from its own private key and the other user's public key using the Diffie-Hellman protocol, uses the shared key as the seed of a random number generator to generate a random mask, and encrypts its local model gradient with the random mask. Using verifiable secret sharing, the user embeds its private key in a selected polynomial and constructs a polynomial commitment, splits the polynomial into n secret shares, and sends the secret shares, the secret share witnesses, and the polynomial commitment to the backup committee, so that the redundant random masks can be recovered through key reconstruction when a user drops offline or its signature verification fails. A secret share witness is used to verify that a secret share belongs to the committed polynomial. Further, the encrypted model gradient is compressed using the Chinese Remainder Theorem (CRT), and the compression process comprises: first, the user's encrypted model gradient is evenly divided into r segments, where $r = \lceil l / k \rceil$, l is the length of the encrypted model gradient, and k is a preset segment length; then, each segment is compressed into a single element by solving the system formed by k congruence equations, and the compression result of the whole model gradient is obtained from the elements corresponding to the segments.
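As a minimal sketch of the masking step described above, the following Python example shows pairwise masks derived from Diffie-Hellman shared keys cancelling during aggregation, and the leftover (redundant) masks being removed when one user drops offline. The toy group parameters `P` and `G`, the mask range `R`, the sign convention (add the mask for j > i, subtract it for j < i), the use of `random.Random` as a stand-in for the PRG, and the integer-quantized gradients are all assumptions made for illustration; none of them is fixed by the text, and the sketch is not cryptographically secure.

```python
import random

P = 2 ** 127 - 1   # toy prime modulus (assumption; not a vetted Diffie-Hellman group)
G = 3              # toy base (assumption)
R = 2 ** 32        # masks and quantized gradients live in [0, R)


def keygen(rng):
    """Return a (private key, public key) pair in the toy group."""
    sk = rng.randrange(2, P - 1)
    return sk, pow(G, sk, P)


def pair_mask(sk_own, pk_other, length):
    """Expand the Diffie-Hellman shared key into a mask vector of the given length."""
    shared = pow(pk_other, sk_own, P)      # identical for both parties of the pair
    prg = random.Random(shared)            # stand-in PRG, NOT cryptographically secure
    return [prg.randrange(R) for _ in range(length)]


def mask_gradient(i, sk_i, grad, pks):
    """User i adds the pair mask for every j > i and subtracts it for every j < i."""
    out = list(grad)
    for j, pk_j in pks.items():
        if j == i:
            continue
        sign = 1 if j > i else -1
        mask = pair_mask(sk_i, pk_j, len(grad))
        out = [(o + sign * m) % R for o, m in zip(out, mask)]
    return out


def redundant_mask(dropped, sk_dropped, survivors, pks, length):
    """Masks the survivors applied for a dropped user, rebuilt from that user's private key."""
    total = [0] * length
    for i in survivors:
        sign = 1 if dropped > i else -1
        mask = pair_mask(sk_dropped, pks[i], length)
        total = [(t + sign * m) % R for t, m in zip(total, mask)]
    return total


def aggregate(masked_grads):
    """Element-wise sum of masked gradients modulo R."""
    return [sum(col) % R for col in zip(*masked_grads)]


rng = random.Random(0)
keys = {i: keygen(rng) for i in (1, 2, 3)}
pks = {i: pk for i, (_, pk) in keys.items()}
grads = {1: [5, 7], 2: [1, 2], 3: [10, 20]}
masked = {i: mask_gradient(i, keys[i][0], grads[i], pks) for i in grads}

# All users present: the pairwise masks cancel and only the plain sum remains.
full = aggregate(masked.values())

# User 3 drops: remove the masks that users 1 and 2 applied for user 3.
partial = aggregate([masked[1], masked[2]])
leftover = redundant_mask(3, keys[3][0], [1, 2], pks, 2)
recovered = [(p - q) % R for p, q in zip(partial, leftover)]
```

In the scheme itself, the dropped user's private key would not be available directly as `keys[3][0]`; it would be reconstructed by the backup committee from verified secret shares before the redundant masks are computed.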

Cryptographic commitments are an important class of cryptographic primitives that generally involve a committing party and a verifying party. In the commitment generation stage, the committing party selects a message m, computes a commitment c in ciphertext form, and sends c to the verifying party; from this point the committing party can no longer change m. In the commitment opening stage, the committing party publishes the plaintext message m and a secret key, and the verifying party computes the commitment c' corresponding to m in the same way; if c' = c, verification passes, otherwise it fails. A commitment protocol has the following properties: (1) hiding: the commitment value c reveals no information about the message m; (2) binding: the committing party cannot open the commitment c to a message other than m and still pass verification. A commitment protocol can therefore be used to ensure that the ciphertext form of private data has a unique interpretation.
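The commit-then-open flow above can be illustrated with a simple hash-based commitment in Python. This is a swapped-in technique chosen for brevity: the scheme in this document uses polynomial commitments, not hash commitments, and the function names `commit` and `verify` are introduced here only for the example.

```python
import hashlib
import hmac
import secrets


def commit(message: bytes):
    """Commitment generation: return (commitment c, secret key kept by the committing party)."""
    key = secrets.token_bytes(32)                   # fresh randomness hides the message
    c = hashlib.sha256(key + message).digest()
    return c, key


def verify(c: bytes, message: bytes, key: bytes) -> bool:
    """Commitment opening: recompute c' from (message, key) and compare it with c."""
    c_prime = hashlib.sha256(key + message).digest()
    return hmac.compare_digest(c, c_prime)


c, key = commit(b"gradient digest for round 7")
opened_ok = verify(c, b"gradient digest for round 7", key)
tampered_ok = verify(c, b"a different message", key)
```

Here `opened_ok` is true and `tampered_ok` is false: the random key provides hiding, and the collision resistance of SHA-256 provides binding.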

Polynomial commitment is a commitment protocol that satisfies additive homomorphism and is often used to construct zero-knowledge proofs, verifiable secret sharing, and the like. The process of constructing verifiable secret sharing (VSS) can be described as follows:

(1) Initialization Setup($1^{\kappa}$, t): Let $\mathbb{G}$ and $\mathbb{G}_T$ be groups of prime order p, let g be a generator of $\mathbb{G}$, and let $e: \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ be a symmetric bilinear pairing that satisfies the t-strong Diffie-Hellman (t-SDH) assumption. Select $\alpha \in \mathbb{Z}_p^*$ as the private key SK; the public key is $PK = \langle \mathbb{G}, \mathbb{G}_T, g, g^{\alpha}, \ldots, g^{\alpha^t} \rangle$.

(2) Commitment generation Commit(PK, φ(x)): For a polynomial $\phi(x) = \sum_{j=0}^{\deg(\phi)} \phi_j x^j$ of degree at most t, its commitment can be calculated as:

$COMM(\phi(x)) = g^{\phi(\alpha)} = \prod_{j=0}^{\deg(\phi)} \left(g^{\alpha^j}\right)^{\phi_j}$ (1)

(3) Commitment opening VerifyPoly(PK, COMM(φ(x)), φ(x)): Given a polynomial $\phi(x) = \sum_{j=0}^{\deg(\phi)} \phi_j x^j$ and a commitment value COMM, if $COMM = \prod_{j=0}^{\deg(\phi)} \left(g^{\alpha^j}\right)^{\phi_j}$, the commitment was indeed generated by the polynomial φ(x); otherwise it was not.

(4) Secret sharing CreateWitness(PK, φ(x), i): To perform (n, t)-secret sharing among n users, the secret share $\langle i, \phi(i), w_i \rangle$ sent to user i ($1 \le i \le n$) contains the function value φ(i) of the polynomial φ(x) at index i and the witness $w_i = COMM(\psi_i(x))$, where

$\psi_i(x) = \dfrac{\phi(x) - \phi(i)}{x - i}$

and $COMM(\psi_i(x))$ is calculated in the same way as formula (1).

(5) Secret verification VerifyEval(PK, COMM(φ(x)), i, φ(i), $w_i$): If

$e(COMM(\phi(x)), g) = e(w_i, g^{\alpha} / g^{i}) \cdot e(g, g)^{\phi(i)}$ (2)

holds, the secret share $\langle i, \phi(i), w_i \rangle$ of user i comes from the committed polynomial COMM(φ(x)); otherwise it does not.

(6) Secret reconstruction Recover(i, φ(i)): Any t+1 or more users present their secret shares $\langle i, \phi(i) \rangle$ that pass verification, and the original polynomial φ(x) is then recovered using the interpolation theorem.
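The sharing and reconstruction steps can be sketched in Python with plain Shamir secret sharing and Lagrange interpolation. The sketch makes several assumptions not stated in the text: the private key is stored as the constant term φ(0), arithmetic is over the prime field with modulus `PRIME`, and the witnesses and pairing-based verification of steps (4) and (5) are omitted entirely.

```python
import random

PRIME = 2 ** 127 - 1   # assumed field modulus for this sketch


def make_shares(secret, t, n, rng):
    """Split `secret` into n shares (i, phi(i)) of a random degree-t polynomial with phi(0) = secret."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t)]

    def phi(x):
        acc = 0
        for c in reversed(coeffs):          # Horner evaluation modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(i, phi(i)) for i in range(1, n + 1)]


def recover_secret(shares):
    """Lagrange interpolation at x = 0 from at least t + 1 shares."""
    secret = 0
    for i, y_i in shares:
        num, den = 1, 1
        for j, _ in shares:
            if j != i:
                num = num * (-j) % PRIME
                den = den * (i - j) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+)
        secret = (secret + y_i * num * pow(den, -1, PRIME)) % PRIME
    return secret


private_key = 123456789
shares = make_shares(private_key, t=2, n=5, rng=random.Random(1))
from_first_three = recover_secret(shares[:3])
from_last_three = recover_secret(shares[2:])
```

Any three of the five shares recover the same private key, which is what allows the backup committee to tolerate some offline or rejected members during reconstruction.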

In addition, the polynomial commitments also satisfy additive homomorphism:

COMM(φ₁(x)+φ₂(x))＝COMM(φ₁(x))*COMM(φ₂(x)) (3)

The Chinese Remainder Theorem (CRT) is a method for solving a system of linear congruence equations. Suppose $m_1, m_2, \ldots, m_k$ are pairwise coprime positive integers, and let $M = m_1 \cdot m_2 \cdots m_k$. Then the following system of equations has exactly one solution in $\mathbb{Z}_M$:

$x \equiv a_i \pmod{m_i}, \quad i = 1, 2, \ldots, k$

The solution is

$x = \sum_{i=1}^{k} a_i M_i M_i^{-1} \bmod M$

where $M_i = M / m_i$ and $M_i^{-1}$ is the inverse of $M_i$ in $\mathbb{Z}_{m_i}$.
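A short Python sketch of CRT-based compression follows. It packs each segment of k gradient entries into one integer using the solution formula above and recovers the entries by reduction modulo each $m_i$. The function names, the zero-padding of a short final segment, and the small moduli are assumptions for illustration; each entry must be smaller than its modulus for the round trip to be exact.

```python
from math import prod


def crt_compress(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise coprime moduli; returns x in [0, M)."""
    M = prod(moduli)
    x = 0
    for a_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        x += a_i * M_i * pow(M_i, -1, m_i)   # pow(..., -1, m) is the modular inverse (Python 3.8+)
    return x % M


def crt_decompress(x, moduli):
    """Recover the residues a_i = x mod m_i."""
    return [x % m_i for m_i in moduli]


def compress_gradient(grad, moduli):
    """Split grad into r = ceil(l / k) segments and compress each segment into one integer."""
    k = len(moduli)
    segments = [grad[s:s + k] for s in range(0, len(grad), k)]
    return [crt_compress(seg + [0] * (k - len(seg)), moduli) for seg in segments]


moduli = [101, 103, 107]
g1, g2 = [5, 7, 9], [1, 2, 3]
x1, x2 = crt_compress(g1, moduli), crt_compress(g2, moduli)
summed = crt_decompress((x1 + x2) % prod(moduli), moduli)
packed = compress_gradient([2, 3, 2, 1], [3, 5, 7])
```

Because the CRT map preserves addition, `summed` equals the entry-wise sum `[6, 9, 12]` as long as each sum stays below its modulus, so compressed gradients can be aggregated without first decompressing them. For the second example, `packed` is `[23, 70]`.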

Assuming that all users have been registered in the system and assigned their respective public keys, private keys, and an ordered ID, one round of training can be designed to include the following steps, as shown in FIG. 3. Before training begins, each user shares its private key with a backup committee consisting of several blockchain nodes using verifiable secret sharing (VSS), so that a user dropping offline in subsequent training does not disrupt the normal training process (step0). In formal training, each user computes the model gradient locally through iterative training (step1) and adds a mask to prevent privacy leakage (step2). To save communication overhead, the user compresses the encrypted gradient using the Chinese Remainder Theorem (CRT) and sends it, together with the commitment value of the original gradient, to the neighboring blockchain node (step3 and step4). After verifying the data signature, the node puts legal gradients into the transaction pool; after a specified time it stops collecting data, and a leader is elected to perform the subsequent gradient aggregation (step5). If the gradients of all users are present in the transaction pool, the leader adds them directly to obtain the global gradient; if the gradients of some users are missing from the transaction pool (that is, the user dropped offline or its signature was verified as illegal), the leader computes the global gradient with the help of the secret shares provided by the backup committee (step6). The leader then creates a new block that packages the relevant gradient information and sends the block to the verification committee for verification and broadcast (step7). Finally, each user downloads the latest global gradient from the blockchain and updates its local model (step8). A user that dropped offline or failed signature verification in a training round is assigned a new private key and executes step0-step8 in the next round; otherwise the user executes step1-step8. The iteration repeats until the model converges or the maximum number of training rounds is reached. Note that the reputation values of the users, leaders, and backup committee members identified as legitimate in each round of training all increase, to encourage them to contribute more to the data sharing system.

In the initialization stage (step0) before training, the trusted authority generates public and private key pairs for all users and blockchain nodes. The other public information is stored in the genesis block (namely the first block in the blockchain), which the trusted authority sends to all participants through a secure channel so that they can execute the initialization task. The genesis block mainly comprises the following contents:

a) Model initialization parameters $w_0$, learning rate $\eta$, and total number of training rounds $T$

b) The public key PK used to generate polynomial commitments

c) $k$ pairwise coprime positive integers $m_1, m_2, \ldots, m_k$

d) A pseudo-random number generator PRG(·) which, given a uniformly random seed as input, outputs pseudo-random numbers distributed over the space $[0, R)^l$

e) An initial random seed $seed_0$, where the seed parameter $seed_i$ of the i-th round of training is generated from $seed_{i-1}$ of the previous round, mainly to ensure the randomness of leader election

f) Initial reputation values for all users and blockchain nodes, and the reputation update function

In addition, considering that some users may be disconnected during training, all users are made to use VSS to split their private keys into secret shares and send the secret shares to the backup committee before formal training begins.

Local training phase (step1-step4): in each round of training, each user obtains a model gradient $\Delta w_i$ ($1 \le i \le m$) from its local intelligence data and then encrypts $\Delta w_i$ by adding a mask, to enhance privacy protection. To reduce communication overhead, in the present embodiment the encrypted gradient may be compressed using the Chinese Remainder Theorem (CRT).

Assuming $|\Delta w_i| = l$, user i ($1 \le i \le m$) first divides the encrypted gradient evenly into $\lceil l/k \rceil$ segments of length $k$, where the symbol $\lceil \cdot \rceil$ denotes rounding up. If $l$ is not evenly divisible by $k$, 0 is used for padding. Suppose that the j-th segment is $(a_{j,1}, a_{j,2}, \ldots, a_{j,k})$. Then user i solves the following system of congruences:

$$\Delta w'_{i,j} \equiv a_{j,t} \pmod{m_t}, \quad t = 1, 2, \ldots, k$$

According to the Chinese remainder theorem, the above system has a unique solution modulo $M = m_1 m_2 \cdots m_k$.

It follows that each gradient vector segment of length $k$ is compressed into a single element $\Delta w'_{i,j}$ by CRT, so the whole encrypted gradient vector can be compressed into $\Delta w'_i = (\Delta w'_{i,1}, \ldots, \Delta w'_{i,\lceil l/k \rceil})$, whose length is 1/k of the original length. The entire gradient mask and compression process may be as shown in Fig. 4. User i ($1 \le i \le m$) calculates the commitment value $\mathrm{COMM}(\Delta w_i)$ of the original gradient $\Delta w_i$ and sends $\langle \Delta w'_i, \mathrm{COMM}(\Delta w_i) \rangle$ together with its digital signature to the associated blockchain node.
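By way of a non-limiting illustration, the segment-wise CRT compression described above may be sketched in Python as follows. The moduli, the example vector and the function names are hypothetical and do not appear in the original disclosure; the sketch also assumes that every element of a segment is smaller than the modulus at the same position, so that decompression is exact.

```python
from math import prod

def crt_compress(segment, moduli):
    """Return the unique x in [0, M) with x = segment[t] (mod moduli[t]) for every t."""
    M = prod(moduli)
    x = 0
    for a_t, m_t in zip(segment, moduli):
        M_t = M // m_t                      # M_t = M / m_t
        x += a_t * M_t * pow(M_t, -1, m_t)  # pow(M_t, -1, m_t): inverse of M_t modulo m_t
    return x % M

def compress_gradient(vector, moduli):
    """Split into segments of length k (zero-padded) and compress each into one integer."""
    k = len(moduli)
    padded = list(vector) + [0] * (-len(vector) % k)
    return [crt_compress(padded[j:j + k], moduli) for j in range(0, len(padded), k)]

def decompress_gradient(compressed, moduli, length):
    """Recover the segments by reducing each compressed value modulo every m_t."""
    flat = [x % m_t for x in compressed for m_t in moduli]
    return flat[:length]

moduli = [101, 103, 107]            # hypothetical pairwise coprime moduli, k = 3
masked = [5, 17, 42, 99, 3, 60, 8]  # hypothetical masked gradient, l = 7
packed = compress_gradient(masked, moduli)
print(len(packed))                  # number of segments, ceil(7 / 3)
print(decompress_gradient(packed, moduli, len(masked)) == masked)
```

Each compressed element is an integer below M, so the vector shrinks by a factor of k in element count while each element grows to roughly the combined bit length of the k moduli.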

Aggregation stage (step5-step6): when a blockchain node receives the data uploaded by a user, it first checks whether the signature is legal. If so, the data is put into the transaction pool. After a certain time, all nodes stop receiving data and then compete to become the leader, which obtains the right to generate a new block. In this embodiment, a consistent hash protocol based on reputation values can be used as the lottery algorithm to select the leader, as shown in Fig. 5. Specifically, given a hash ring, its space is allocated to the blockchain nodes in proportion to their reputation values. The SHA-256 hash value of the current latest block is taken as the initial value and hashed repeatedly, and the hash value obtained in each calculation is mapped onto the hash ring; the blockchain node that owns the space where the mapping result falls is selected. Note that in this embodiment the leader of a training round is selected with a single hash calculation, whereas the backup committee and the verification committee, which consist of multiple nodes, require multiple hash calculations to select their members. The lottery process is similar to the Algorand protocol: the probability that a blockchain node is drawn is proportional to its reputation value. Let U denote the set of all users and V the set of abnormal users that dropped offline or whose signatures are illegal. The selected leader aggregates all user gradients in the transaction pool according to equation (6):

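For two values compressed by CRT (as in formula (5)), it can be calculated that the sum of the two compressed values is congruent, modulo $M$, to the CRT compression of the element-wise sum of the two original segments. This indicates that CRT satisfies additive homomorphism. From this property, equation (6) can be converted into the CRT compression of the sum of the encrypted gradients of the users in $U - V$. The leader then decompresses the aggregated result by the modulo operation in equation (9) and finally calculates the global gradient $\Delta w_g$.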
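As a hedged illustration of the additive homomorphism relied on here, the following Python sketch adds the CRT-compressed segments of three hypothetical users and decompresses the sum by reducing it modulo each modulus. The moduli, segment values and function names are assumptions made for the example; the result is exact only while every element-wise sum stays below the modulus at its position.

```python
from math import prod

MODULI = [1009, 1013, 1019]   # hypothetical pairwise coprime moduli
M = prod(MODULI)

def crt(segment):
    """CRT-compress one segment of length k into a single integer in [0, M)."""
    return sum(a * (M // m) * pow(M // m, -1, m) for a, m in zip(segment, MODULI)) % M

def aggregate_and_decompress(compressed_values):
    """Add compressed values modulo M, then recover the element-wise sums."""
    total = sum(compressed_values) % M
    return [total % m for m in MODULI]

seg_u1 = [12, 400, 7]
seg_u2 = [30, 500, 9]
seg_u3 = [1, 2, 3]
result = aggregate_and_decompress([crt(seg_u1), crt(seg_u2), crt(seg_u3)])
print(result)   # [43, 902, 19], the element-wise sum of the three segments
```

The leader therefore never needs to decompress individual uploads: one addition per compressed element followed by k modular reductions is enough.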

Further, in this embodiment, for each abnormal user in the abnormal user set, multiple blockchain nodes in the backup committee first submit the secret shares of the abnormal user, and the correctness of the secret shares is verified; the polynomial and the private key of the abnormal user are recovered by the interpolation theorem; the redundant random masks are then calculated from the shared keys between the other users and the abnormal user, so that the global gradient is recovered. Further, whether the model gradients in the transaction pool have been tampered with is checked using the additive homomorphism of the polynomial commitment. If they have not been tampered with, the new block created by the leader is confirmed to be legal. Verification passes when the proportion of verifiers in the verification committee that confirm the new block to be legal reaches a preset value; if the proportion is smaller than the preset value, an invalid empty block is generated.

In the block generation and broadcast phase, the leader creates a new block and broadcasts it to the verification committee for verification. As shown in Fig. 6, a block in this embodiment is composed of a block header and a block body, wherein the block header contains the meta information of the block and a pointer (i.e., a hash value) to the previous block, and the block body contains a series of transaction information. Unlike conventional blockchains, embodiments of the present disclosure store the relevant training parameters as transactions, which may include: (1) the random seed parameter $seed_{t+1}$ for the next round of training, (2) the proof generated when electing the leader in the aggregation phase, (3) the global gradient $\Delta w_g$ of this round, and (4) the commitment values of the legal user gradients. The key parameter information of the whole training process is thus recorded in the blockchain in a tamper-proof manner, so that, compared with traditional federated learning algorithms, the training process of this algorithm is auditable.

In the prior art, the local gradient plaintexts of all users are stored directly in the block, and once an attacker or a semi-honest user obtains the gradients of other users, privacy attacks such as model inversion attacks and model extraction attacks can be launched. Therefore, in the embodiment of the invention, only the commitment values of the gradients are stored in the block, which both protects the private information in the gradients and ensures the correctness of the global gradient obtained in each training round. Specifically, after the new block generated by the leader is broadcast to the verification committee, all verifiers check whether formula (10) holds.

$$\mathrm{COMM}(\Delta w_g) = \prod_{i} \mathrm{COMM}(\Delta w_i) \tag{10}$$

If it holds, then by the additive homomorphism of the polynomial commitment the user gradients in the transaction pool can be determined not to have been tampered with, and the new block is legal. Otherwise, some blockchain nodes tampered with the user gradients they collected before putting them into the transaction pool, so that the global gradient calculation is wrong and the new block is illegal. When more than 2/3 of the verifiers determine that the new block is legal, the verification passes and the new block is broadcast to the whole network to achieve consensus; otherwise, an invalid empty block is generated.
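The check in formula (10) and the 2/3 vote may be illustrated with the following Python sketch. It is only a toy: the commitment used here is a simple product of fixed bases raised to the gradient entries, chosen because it is additively homomorphic, and it is NOT the polynomial commitment of the embodiment (it has no hiding property). The prime, the bases and all function names are assumptions introduced for the example.

```python
P = 2**61 - 1        # toy prime modulus (assumption)
BASES = [3, 5, 7]    # one hypothetical base per gradient coordinate

def comm(vec):
    """Toy additively homomorphic commitment: product of BASES[j] ** vec[j] mod P."""
    c = 1
    for g, v in zip(BASES, vec):
        c = c * pow(g, v, P) % P
    return c

def block_is_valid(global_gradient, user_commitments):
    """Check COMM(global) == product of the users' commitments, as in formula (10)."""
    expected = 1
    for c in user_commitments:
        expected = expected * c % P
    return comm(global_gradient) == expected

def committee_accepts(votes):
    """Verification passes only when strictly more than 2/3 of the verifiers approve."""
    return 3 * sum(votes) > 2 * len(votes)

user_grads = [[1, 2, 3], [4, 5, 6]]
commitments = [comm(g) for g in user_grads]
honest_ok = block_is_valid([5, 7, 9], commitments)      # correct aggregate
tampered_ok = block_is_valid([5, 7, 10], commitments)   # one coordinate altered
print(honest_ok, tampered_ok)
print(committee_accepts([True] * 7 + [False] * 3))
```

Because only commitments are compared, a verifier can detect a tampered aggregate without ever seeing an individual user's plaintext gradient.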

In the block generation and broadcast phase (step7), the leader creates a new block and broadcasts it to the verification committee for verification. The verification committee is selected by the same lottery algorithm based on the consistent hash protocol described above, so the details are not repeated here. If the verification succeeds, the verification committee broadcasts the block to all blockchain nodes of the whole network through the gossip protocol to achieve consensus. Otherwise, an invalid empty block is created.
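A minimal sketch of the reputation-weighted lottery used for the leader and the committees might look as follows in Python. The node names, integer reputation values and the function `draw` are hypothetical. The sketch assumes that the ring is represented by cumulative reputation intervals and that a node already drawn is skipped, in line with the statement that each blockchain node can be elected only once.

```python
import hashlib
from bisect import bisect_right
from itertools import accumulate

def draw(latest_block_hash, reputations, count):
    """Draw `count` distinct nodes; each draw is proportional to integer reputation."""
    nodes = sorted(reputations)
    # Node i owns the ring interval [bounds[i-1], bounds[i]).
    bounds = list(accumulate(reputations[n] for n in nodes))
    if count > sum(1 for n in nodes if reputations[n] > 0):
        raise ValueError("not enough nodes with positive reputation")
    chosen, h = [], latest_block_hash
    while len(chosen) < count:
        h = hashlib.sha256(h).digest()                 # repeated hashing of the block hash
        point = int.from_bytes(h, "big") % bounds[-1]  # position on the ring
        node = nodes[bisect_right(bounds, point)]
        if node not in chosen:                         # assumption: elected at most once
            chosen.append(node)
    return chosen

reputations = {"node-a": 50, "node-b": 30, "node-c": 15, "node-d": 5, "node-e": 0}
block_hash = hashlib.sha256(b"hypothetical latest block").digest()
leader = draw(block_hash, reputations, 1)[0]
committee = draw(block_hash, reputations, 3)
print(leader, committee)
```

Since every node derives the same hash chain from the latest block, all honest nodes reach the same selection without extra communication. In this sketch the first committee member coincides with the leader because both draws start from the same hash.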

Model updating stage (step8): each user downloads the latest block from its associated blockchain node, obtains the global gradient from it and updates the local model. If abnormal users (i.e., users that dropped offline or whose signatures are illegal) appeared in the training round, the leader has recovered their private keys during gradient aggregation, so the trusted authority needs to distribute new public and private keys to these users before the next training round, and they perform the secret sharing step again. After each round of training, the reputation value of each user who stayed online throughout, uploaded its gradient, and whose signature was verified as legal is increased; otherwise it is decreased. The reputation value of blockchain nodes that generate legal new blocks or participate in new block verification is also increased. When the reputation value of a user or node decreases to 0, it is blacklisted and is no longer allowed to participate in intelligence data sharing.

The operational calculation process of secret sharing of the user private key, encryption of the model gradient mask, and aggregate computation of the global gradient can be described as follows:

Gradient mask: it is assumed that each user has already obtained the public keys $pk_i$, $i \in U$, of the other users. Running the Diffie-Hellman protocol then yields the shared key $s_{i,j} \leftarrow \mathrm{KA.agree}(sk_i, pk_j)$ between each pair of users, and a random mask is generated by using this key as the seed of the random number generator. Suppose that each user has now obtained a gradient $\Delta w_i$ ($1 \le i \le m$) of length $l$ through local training, and assume for simplicity that all elements of the vector $\Delta w_i$ lie in the field $\mathbb{Z}_R$. The gradient $\Delta w_i$ can then be encrypted as shown in formula (11).

As formula (11) shows, a user only needs to add random numbers to the gradient to encrypt it, and when the encrypted gradients of all users are added, the random numbers cancel each other to 0 and the global gradient is obtained directly.

Compared with the homomorphic encryption algorithms adopted in the prior art, this encryption mode is more efficient and does not lose data utility. However, once some users drop offline or their signatures are verified as illegal, the random numbers in the sum of the remaining gradients no longer cancel to 0, and the global gradient cannot be obtained. Therefore, the private key of each user needs to be secretly backed up, so that when a user drops offline or its signature is illegal, the redundant random numbers can be calculated from the backed-up private key and the global gradient obtained. Following this idea, the private key of a user could be secretly shared with the other local users. However, key reconstruction consumes considerable computation and communication overhead, and the computation and communication resources of blockchain nodes are much larger than those of local users, so in this embodiment the private key of the user may be secretly shared via VSS with a backup committee consisting of several blockchain nodes.
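The cancellation of pairwise masks can be illustrated with the following Python sketch. It assumes the common pairwise-mask form in which user i adds the mask shared with user j when i < j and subtracts it when i > j; the original formula (11) is not reproduced here, so this sign convention is an assumption. The Diffie-Hellman group, the SHA-256 based generator and all names and values are likewise hypothetical toy choices, not vetted parameters.

```python
import hashlib

P, G = 2**127 - 1, 3   # toy Diffie-Hellman group (assumption, not a vetted parameter set)
R = 2**16              # masks and gradients live in Z_R

def prg(seed, length):
    """Expand a shared key into `length` pseudo-random values in [0, R)."""
    return [int.from_bytes(hashlib.sha256(f"{seed}:{c}".encode()).digest(), "big") % R
            for c in range(length)]

def mask_gradient(i, grad, sk_i, pks):
    """Add the pairwise masks of user i to its gradient, element-wise modulo R."""
    masked = list(grad)
    for j, pk_j in pks.items():
        if j == i:
            continue
        s_ij = pow(pk_j, sk_i, P)   # KA.agree(sk_i, pk_j), symmetric in i and j
        sign = 1 if i < j else -1   # assumed sign convention based on the ordered IDs
        masked = [(x + sign * r) % R for x, r in zip(masked, prg(s_ij, len(grad)))]
    return masked

sks = {1: 11, 2: 23, 3: 37}                        # hypothetical private keys
pks = {i: pow(G, sk, P) for i, sk in sks.items()}  # matching public keys
grads = {1: [3, 1, 4], 2: [1, 5, 9], 3: [2, 6, 5]}
masked = {i: mask_gradient(i, g, sks[i], pks) for i, g in grads.items()}
total = [sum(col) % R for col in zip(*masked.values())]
print(total)   # [6, 12, 18]: the masks cancel and only the gradient sum remains
```

Every mask appears once with a plus sign and once with a minus sign across the two users of a pair, which is why the sum over all users is mask-free.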

Private key sharing: assume that the backup committee consists of n blockchain nodes (elected by the lottery algorithm based on the consistent hash protocol). User i ($1 \le i \le m$) first selects a polynomial $\phi_i(x)$ whose constant term is set to its private key $sk_i$, i.e., $\phi_i(0) = sk_i$, and then makes a commitment $\mathrm{COMM}(\phi_i(x))$ to the polynomial. Next, using verifiable secret sharing, the polynomial $\phi_i(x)$ is split into n secret shares $\{\langle k, \phi_i(k) \rangle \mid 1 \le k \le n\}$, and $\langle k, \phi_i(k), w_{i,k}, \mathrm{COMM}(\phi_i(x)) \rangle$ is sent to blockchain node k ($1 \le k \le n$). Here $w_{i,k}$ is the witness of the secret share; it can be used to verify that the secret share indeed belongs to the polynomial $\phi_i(x)$ committed to in $\mathrm{COMM}(\phi_i(x))$, which prevents malicious blockchain nodes from providing false shares during key reconstruction.
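The share-splitting and interpolation steps can be sketched in Python as plain Shamir secret sharing over a prime field. This sketch deliberately omits the witness $w_{i,k}$ and the polynomial commitment, so it is not verifiable secret sharing; the field modulus, the degree-t choice (so that more than t shares are needed) and the function names are assumptions for illustration only.

```python
import random

Q = 2**127 - 1   # prime field modulus (assumption)

def split(secret, n, t, rng):
    """Split `secret` into n shares <k, phi(k)> of a random degree-t polynomial."""
    coeffs = [secret] + [rng.randrange(Q) for _ in range(t)]   # phi(0) = secret
    def phi(x):
        return sum(c * pow(x, e, Q) for e, c in enumerate(coeffs)) % Q
    return [(k, phi(k)) for k in range(1, n + 1)]

def reconstruct(shares):
    """Recover phi(0) by Lagrange interpolation from at least t + 1 shares."""
    secret = 0
    for k, y in shares:
        num, den = 1, 1
        for k2, _ in shares:
            if k2 != k:
                num = num * (-k2) % Q
                den = den * (k - k2) % Q
        secret = (secret + y * num * pow(den, -1, Q)) % Q
    return secret

sk = 123456789                                    # hypothetical private key
shares = split(sk, n=10, t=7, rng=random.Random(0))
print(reconstruct(shares[:8]) == sk)              # any 8 of the 10 shares suffice
```

In the embodiment each share would additionally be checked against the commitment before interpolation, which is what stops a malicious committee member from injecting a false share. A real deployment would also draw the coefficients from a cryptographically secure source rather than `random.Random`.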

Gradient aggregation: suppose the leader has decompressed the aggregated encrypted gradients by equation (9). If the gradients of all users were uploaded to the blockchain with legal signatures (i.e., the abnormal user set V in equation (9) is empty), the leader obtains the global gradient directly through a simple addition, because the random masks cancel.

If some users dropped offline or their signatures were verified as illegal (denote this set of abnormal users by V), more than t blockchain nodes in the backup committee first submit the secret shares $\langle k, \phi_i(k), w_{i,k}, \mathrm{COMM}(\phi_i(x)) \rangle$ of each abnormal user $i \in V$. After the correctness of the secret shares is verified, the polynomial $\phi_i(x)$ and the private key $sk_i$, $i \in V$, are recovered by the interpolation theorem. The shared keys between the other users and the abnormal users, $s_{i,m} = \mathrm{KA.agree}(sk_i, pk_m)$ with $i \in V$ and $m \in U - V$, are then calculated, and finally the global gradient is obtained by removing the corresponding redundant random masks from the sum.
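The dropout case can be illustrated with the following Python sketch, in which one of four hypothetical users never uploads its gradient. The leader is simply handed the recovered private key (in the embodiment it would come from the backup committee through VSS), recomputes the shared keys with the remaining users and strips the leftover masks. The sign convention, the toy Diffie-Hellman group, the generator and all names are assumptions, as in any such sketch.

```python
import hashlib

P, G, R = 2**127 - 1, 3, 2**16   # toy DH group and mask modulus (assumptions)

def prg(seed, length):
    return [int.from_bytes(hashlib.sha256(f"{seed}:{c}".encode()).digest(), "big") % R
            for c in range(length)]

def mask_gradient(i, grad, sk_i, pks):
    masked = list(grad)
    for j, pk_j in pks.items():
        if j != i:
            sign = 1 if i < j else -1                  # assumed sign convention
            masks = prg(pow(pk_j, sk_i, P), len(grad))
            masked = [(x + sign * r) % R for x, r in zip(masked, masks)]
    return masked

def aggregate(uploaded, recovered_sks, pks, length):
    """Sum the uploaded masked gradients, then remove masks shared with dropped users."""
    total = [sum(col) % R for col in zip(*uploaded.values())]
    for i, sk_i in recovered_sks.items():      # i in V (abnormal users)
        for m in uploaded:                     # m in U - V (users that uploaded)
            s_im = pow(pks[m], sk_i, P)        # KA.agree(sk_i, pk_m)
            sign = 1 if m < i else -1          # sign that user m applied for partner i
            total = [(x - sign * r) % R for x, r in zip(total, prg(s_im, length))]
    return total

sks = {1: 11, 2: 23, 3: 37, 4: 41}
pks = {i: pow(G, sk, P) for i, sk in sks.items()}
grads = {1: [3, 1, 4], 2: [1, 5, 9], 3: [2, 6, 5], 4: [3, 5, 8]}
dropped = {3}
uploaded = {i: mask_gradient(i, g, sks[i], pks) for i, g in grads.items() if i not in dropped}
recovered = {i: sks[i] for i in dropped}       # stands in for the VSS reconstruction
result = aggregate(uploaded, recovered, pks, 3)
print(result)   # [7, 11, 21]: sum of the gradients of users 1, 2 and 4
```

Masks between two remaining users still cancel on their own, so the leader only has to handle the pairs that involve a dropped user.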

In this embodiment, the private key of each user is secretly shared with the backup committee in the initialization stage, a mask is then added to the original user gradient in the local training stage, and finally the global gradient is calculated in the aggregation stage.

Further, based on the above method, the present invention also provides an intelligence data verifiable secure sharing system based on blockchain and federated learning, which is used for joint modeling of an intrusion detection model by multiple users in network security defense and comprises: user nodes that participate in local model training in the joint modeling; blockchain nodes that perform the consensus operation on the local model training parameters of the user nodes; a trusted authority that distributes public and private key pairs to the user nodes and the blockchain nodes; and a backup committee and a verification committee, each composed of several blockchain nodes, wherein each user secretly shares its own private key with the backup committee so that the private key information of the user can be recovered in abnormal situations, and the new block containing the aggregated global gradient is verified for correctness by the verification committee;

the verification committee verifies the correctness of the global gradient and broadcasts the new blocks that pass verification to the whole network to achieve consensus; each user updates the local intrusion detection model by downloading the latest block and obtaining the global gradient from it.

To verify the validity of the scheme, further explanation is given below with reference to test data:

Privacy analysis: if a pair of uniformly random masks is added to the local gradient of a user (as shown in equation (11)), and the masks cancel each other to 0 when all user gradients are added, then each masked user gradient can be regarded as uniformly random, i.e., the pairwise masks protect the gradient privacy of a single user.

Theorem 1: Given $m$, $l$, $R$, $U$ and $\{\Delta w_i\}_{i \in U}$, where $m$ is the number of users, $l$ is the length of the user gradient $\Delta w_i$, and $U$ denotes the set of all users. Assume that the gradients $\Delta w_i$, $i \in U$, of all users satisfy

Then

where the symbol "≡" indicates that the two distributions are identical.

Proof: the theorem is proved by induction.

(1) When $m = |U| = 1$,

since it has already been assumed that

then

In addition,

Therefore, when $m = 1$, equation (14) holds.

(2) When $m = |U| = k$, assuming that Theorem 1 holds, equation (14) gives

(3) When $m = k + 1$, define a new set $U' = U \cup \{k+1\}$; then

Since formula (16) already gives

formula (17) can be written as

Since it has already been assumed that

it follows that

Thus, formula (18) can be written as

On the other hand,

according to formula (15),

and it has been assumed that

so it can be deduced that

In addition, since it has been assumed that

it can be deduced that

Based on equations (21) and (22), equation (20) can be written as:

Combining equations (19) and (23), Theorem 1 holds when $m = k + 1$. This completes the proof.

Resistance to Sybil attacks: in the private key sharing operation, the private key of each user is secretly shared with the backup committee in order to tolerate user dropouts. However, if an attacker can control more than t blockchain nodes in the backup committee through a Sybil attack, the attacker can generate enough false secret shares and witnesses to pass VSS verification and finally reconstruct a false private key, thereby disrupting the gradient aggregation. The following analyzes the minimum value of the threshold t in VSS required to keep the probability of an attacker disrupting the secret reconstruction below a given value.

Since the backup committee is selected by the lottery algorithm based on the consistent hash protocol, the probability of each blockchain node being elected is proportional to its reputation value. Thus, the probability p of an attacker controlling more than t blockchain nodes in the backup committee can be calculated as:

where n is the number of members in the backup committee and s is the proportion of reputation controlled by the attacker. Since the present invention assumes that at least 70% of the reputation value in the system is held by honest nodes, s is 0.3. Equation (24) assumes that p follows a binomial distribution. However, the binomial distribution corresponds to sampling with replacement, whereas here each blockchain node can be elected only once, so p can be regarded as an upper bound on the probability of an attacker controlling more than t nodes in the backup committee. By exhaustive search, the minimum value of the threshold t in VSS for which p stays below a given probability is calculated; Fig. 7 shows the minimum value of t when p is less than 0.01, 0.05 and 0.001. The bound on p can be chosen according to the actual training situation to ensure the security of the method. For example, the number of training rounds of the present solution on the intelligence data set is usually within 100, so p should be less than 0.01. When the number of nodes in the backup committee is 10, the minimum value of the threshold t is 8, so that with high probability an attacker cannot disrupt the key reconstruction process of VSS through a Sybil attack.
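The exhaustive search for the threshold can be reproduced with a short Python sketch. Equation (24) itself is not reproduced in this text, so the sketch assumes a binomial tail that starts at t (i.e., the attacker controls at least t of the n committee seats); this is the convention under which the stated example of n = 10, s = 0.3 and p < 0.01 yields t = 8. The function names are hypothetical.

```python
from math import comb

def attack_probability(n, t, s):
    """Binomial tail: probability that at least t of n seats are attacker-controlled."""
    return sum(comb(n, i) * s**i * (1 - s)**(n - i) for i in range(t, n + 1))

def min_threshold(n, s, p_max):
    """Smallest t whose attack probability is below p_max, or None if there is none."""
    for t in range(1, n + 1):
        if attack_probability(n, t, s) < p_max:
            return t
    return None

print(min_threshold(10, 0.3, 0.01))          # 8, matching the example in the text
print(round(attack_probability(10, 8, 0.3), 5))
```

Running `min_threshold` over a range of committee sizes n and bounds p_max gives the kind of curve that Fig. 7 is described as showing.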

Based on the above experimental data, the secure aggregation scheme based on gradient masks and verifiable secret sharing enhances privacy security during intelligence data sharing without losing data utility; based on the binding and hiding properties of the polynomial commitment, the new blockchain structure integrates federated learning model verification into the consensus process, so that tampering attacks by malicious blockchain nodes can be resisted; and gradient compression effectively reduces the communication overhead, which facilitates application in practical scenarios.

Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present invention.

Based on the foregoing method and/or system, an embodiment of the present invention further provides a server, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.

Based on the above method and/or system, the embodiment of the invention further provides a computer readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the above method.

In all examples shown and described herein, any particular value should be construed as merely exemplary, and not as a limitation, and thus other examples of example embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. An intelligence data verifiable secure sharing method based on blockchain and federated learning, used for joint modeling of an intrusion detection model by multiple users in network security defense, characterized in that the joint modeling process comprises the following:

a trusted authority distributes a public and private key pair to each user and each blockchain node and sends the key pairs to the users and blockchain nodes through a secure channel; each user secretly shares its private key with a backup committee so that the private key data of the user can be recovered when the user is in an abnormal situation, the backup committee consisting of several blockchain nodes;

the blockchain nodes perform signature verification on the uploaded model gradient data and put legal model gradients that pass signature verification into a transaction pool; a leader selected from the blockchain nodes aggregates the legal gradients in the transaction pool and adds the redundant masks of users in abnormal situations, recovered by the backup committee, to obtain a global gradient, and sends the global gradient to a verification committee by creating a new block that records the global gradient and other key parameters, the verification committee consisting of several blockchain nodes;

the verification committee verifies the correctness of the global gradient and broadcasts the new blocks that pass verification to the whole network to achieve consensus; each user updates the local intrusion detection model by downloading the latest block and obtaining the global gradient from it.

2. The intelligence data verifiable security sharing method based on blockchain and federal learning of claim 1, wherein the global gradient in the joint modeling is iteratively obtained by setting a model convergence condition in an iteration round to synchronously iteratively update the user local intrusion detection model, wherein the model convergence condition is a maximum iteration round.

3. The intelligence data verifiable secure sharing method based on blockchain and federated learning of claim 1 or 2, characterized in that users and blockchain nodes are incentivized to participate in the joint modeling of the intrusion detection model by setting a reputation value for each user and blockchain node; blockchain nodes are elected according to their reputation values to form the leader, the backup committee and the verification committee; and a blacklist is used to restrict the participation rights, in the joint modeling, of users and blockchain nodes whose reputation values are smaller than a threshold value.

4. The intelligence data verifiable secure sharing method based on blockchain and federated learning according to claim 1, wherein in the mask encryption of the model gradient data, a shared key between the user and each other user is calculated by the Diffie-Hellman protocol from the user's own private key and the public keys of the other users, the shared key is used as the seed of a random number generator to generate a random mask, and the local model gradient of the user is encrypted with the random mask; the user sets its private key in a selected polynomial and constructs a polynomial commitment using verifiable secret sharing, splits the polynomial into n secret shares, and sends the secret shares, the secret share witnesses and the polynomial commitment to the backup committee, so that the backup committee can recover the redundant random masks through key reconstruction when the user drops offline or its signature is illegal, wherein each secret share witness is used to verify that the secret share belongs to the committed polynomial.

5. The method of claim 4, wherein the encrypted model gradient is compressed using a Chinese remainder theorem CRT, the compression process comprising: firstly, uniformly dividing the model gradient after the encryption of a user into r segments, wherein,

6. The intelligence data verifiable secure sharing method based on blockchain and federated learning according to claim 4, wherein the blockchain nodes elect the leader using a consistent hash protocol based on reputation values, and the election process specifically comprises: setting a hash ring and allocating to each blockchain node a hash ring space corresponding to its reputation value; performing a hash calculation on the SHA-256 hash value of the current latest block, taken as the initial value, and mapping the resulting hash value onto the hash ring; and determining the blockchain node drawn as leader according to the hash ring space in which the mapping result falls.

7. The intelligence data verifiable security sharing method based on block chain and federal learning of claim 6, wherein the set of all users is set as U, and the abnormal set of users in case of offline or illegal signature abnormal is set as V, then the process of the leader aggregating legal gradient data in the transaction pool is expressed as:

wherein, CRT denotes the operation of the compression, and,

8. The intelligence data verifiable secure sharing method based on blockchain and federated learning of claim 1 or 7, wherein for a user in an abnormal situation, several blockchain nodes in the backup committee first submit the secret shares of the abnormal user and the correctness of the secret shares is verified, the polynomial and the private key of the abnormal user are recovered by the interpolation theorem, and the redundant random masks are then calculated from the shared keys between the other users and the abnormal user, so that the global gradient is recovered.

9. The intelligence data verifiable secure sharing method based on blockchain and federated learning of claim 8, wherein in the correctness verification, whether the model gradients in the transaction pool have been tampered with is checked according to the additive homomorphism of the polynomial commitment; if they have not been tampered with, the new block created by the leader is determined to be legal; the verification passes when the proportion of verifiers in the verification committee that judge the new block to be legal reaches a preset value, and an invalid empty block is generated if the proportion is smaller than the preset value.

10. An intelligence data verifiable secure sharing system based on blockchain and federated learning, used for joint modeling of an intrusion detection model by multiple users in network security defense, characterized by comprising: user nodes that participate in local model training in the joint modeling; blockchain nodes that perform the consensus operation on the local model training parameters of the user nodes; a trusted authority that distributes public and private key pairs to the user nodes and the blockchain nodes; and a backup committee and a verification committee, each composed of several blockchain nodes, wherein each user secretly shares its own private key with the backup committee so that the private key information of the user can be recovered in abnormal situations, and the aggregated global gradient is verified for correctness by the verification committee;