CN114912631A - Block chain medical data sharing-based federal learning method - Google Patents
- Publication number
- CN114912631A (application number CN202210390877.0A)
- Authority
- CN
- China
- Prior art keywords
- training
- data
- block chain
- federal learning
- aggregator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Bioethics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Artificial Intelligence (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Storage Device Security (AREA)
- Medical Treatment And Welfare Office Work (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The invention relates to the field of data security, and in particular to a federated learning method based on blockchain medical data sharing. Drawing on the characteristics of blockchain and on privacy protection techniques, the method protects the privacy of federated learning while completing medical data sharing, and avoids the leakage of medical data during federated learning.
Description
Technical Field
The invention relates to the field of data security, and in particular to a federated learning method based on blockchain medical data sharing.
Background
In recent healthcare reforms, the informatization of medical institutions has been strengthened, but many problems and challenges remain in areas such as the interoperability and mutual recognition of medical data, data security, transparency, and privacy protection.
In recent years, artificial intelligence has developed rapidly in the medical field and can assist diagnosis in some scenarios. Improving the accuracy of machine learning models requires large amounts of data, so data often has to be shared across organizations.
Federated learning performs the computation by distributing the machine learning model from the model owner to the various nodes (data owners), rather than by pooling the data of the data owners. Its classification performance is comparable to that of local training, and because more data is involved it generalizes better. However, federated learning does not by itself protect privacy. Research shows that a reverse (inversion) attack can reconstruct highly faithful images from model weights and gradient updates. To protect patient privacy in this process, technical means are needed that balance data privacy against data utility.
A blockchain allows all participants to jointly maintain a reliable database in a decentralized manner, relying on cryptography rather than trust. The emergence of blockchain technology offers a new approach to data sharing in the medical industry.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a federated learning method based on blockchain medical data sharing which, drawing on the characteristics of blockchain and on privacy protection techniques, protects the privacy of federated learning while completing medical data sharing, so that the leakage of medical data during federated learning is avoided.
To achieve this purpose, the invention adopts the following technical solution.
A federated learning method based on blockchain medical data sharing comprises the following steps:
step 1, establishing a blockchain network;
step 2, the model owner uploads the machine learning model to the InterPlanetary File System (IPFS) of the blockchain network, and the federated learning process is planned by setting up an Orchester composed of a plurality of smart contracts in the contract layer of the blockchain network;
step 3, the data owner acquires the encrypted data and the breakpoint from the IPFS of the blockchain network, and then decrypts the encrypted data;
step 4, the machine learning model is trained with the decrypted data according to the federated learning process planned by the Orchester, to obtain a training gradient of the machine learning model;
step 5, the data owner adds noise to the obtained training gradient to achieve differential privacy, and then sends the training gradient to the security aggregator;
step 6, the tamper-proof distributed ledger of the blockchain network records training events in the machine learning process; the security aggregator generates a nonce for the training record, sends the nonce to the medical institution participating in the training, and records the hash value hash(K, nonce) of the training record on the blockchain;
step 7, the security aggregator encrypts the training gradients transmitted by the data owners, collects the training gradients transmitted by all data owners, and updates the machine learning model.
Compared with the prior art, the invention has the following beneficial effects: drawing on the characteristics of blockchain and on privacy protection techniques, the privacy of federated learning is protected while medical data sharing is completed, and the leakage of medical data during federated learning is avoided.
Drawings
The invention is described in further detail below with reference to the figures and specific embodiments.
FIG. 1 is a schematic overall flow chart of training a machine learning model according to the present invention;
FIG. 2 is an original melanoma photo from the training set;
FIG. 3 is a schematic diagram of the process of restoring the training set photo in test 3;
FIG. 4 is the training set photo restored after 180 iterations of the reverse attack in test 4.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to examples, but it will be understood by those skilled in the art that the following examples are only illustrative of the present invention and should not be construed as limiting the scope of the present invention.
Referring to FIG. 1, a federated learning method based on blockchain medical data sharing includes the following steps:
step 1, establishing a blockchain network;
specifically, the blockchain network is built on the FISCO BCOS blockchain underlying platform, and a plurality of medical institutions that own medical data serve as blockchain network nodes; an InterPlanetary File System (IPFS) is set up to store the medical data in link mode;
step 2, the model owner (a scientific research institution) uploads the machine learning model to the IPFS of the blockchain network, and the federated learning process is planned by setting up an Orchester composed of a plurality of smart contracts in the contract layer of the blockchain network;
wherein the Orchester contains a medical institution list, and the medical institutions in the list hold data for training the machine learning model; in the medical institution list, a medical institution can set a whitelist or blacklist of addresses (hashes of public keys) so that its medical data may be used only by certain parties or may not be used by certain parties, which gives the data owner control over the data; the medical institutions communicate through a secure communication channel;
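The whitelist and blacklist check described for the medical institution list can be sketched as follows. This is a minimal illustration in plain Python rather than a smart contract; the names `InstitutionEntry`, `address_of` and `eligible_institutions`, the use of SHA-256 as the address hash, and the rule that a blacklist entry overrides a whitelist entry are all assumptions that the source does not specify.

```python
import hashlib
from dataclasses import dataclass, field


def address_of(public_key: bytes) -> str:
    """Derive an address as the hash of a public key (SHA-256 is an assumption)."""
    return hashlib.sha256(public_key).hexdigest()


@dataclass
class InstitutionEntry:
    """One hypothetical row of the medical institution list."""
    name: str
    whitelist: set = field(default_factory=set)  # if non-empty, only these addresses may use the data
    blacklist: set = field(default_factory=set)  # these addresses may never use the data

    def allows(self, requester_address: str) -> bool:
        # Assumed precedence: a blacklisted address is always refused.
        if requester_address in self.blacklist:
            return False
        if self.whitelist:
            return requester_address in self.whitelist
        return True


def eligible_institutions(entries, requester_address):
    """Return the names of institutions whose data the requester may train on."""
    return [e.name for e in entries if e.allows(requester_address)]


researcher = address_of(b"researcher-public-key")
outsider = address_of(b"outsider-public-key")
entries = [
    InstitutionEntry("hospital_a", whitelist={researcher}),
    InstitutionEntry("hospital_b", blacklist={outsider}),
    InstitutionEntry("hospital_c"),
]
print(eligible_institutions(entries, researcher))
print(eligible_institutions(entries, outsider))
```

In an actual deployment this check would presumably live inside the contracts that make up the Orchester, so that it is enforced on chain rather than by the requester.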
step 3, the data owner acquires the encrypted data and the breakpoint from the IPFS of the blockchain network, and then decrypts the encrypted data;
step 4, the machine learning model is trained with the decrypted data according to the federated learning process planned by the Orchester, to obtain a training gradient of the machine learning model;
step 5, the data owner adds noise to the obtained training gradient to achieve differential privacy, and then sends the training gradient to the security aggregator;
specifically, differential privacy is achieved by randomly perturbing the data, which reduces the disclosure of individual information while preserving the statistical properties and inference capability of the data as a whole.
Assume the raw data is $\{x_1, x_2, \ldots, x_n\}$ and the function to be calculated is $f(x_1, x_2, \ldots, x_n)$. Noise can be added to the raw data before the calculation, so that $f(x_1 + r_1, x_2 + r_2, \ldots, x_n + r_n)$ is calculated instead. Because of the added noise, privacy is protected to a certain extent, but the result calculated after adding noise must remain close to the true result, so a trade-off has to be made between the privacy and the usability of the data.
Intuitively, if the result of a query computed over the whole database is almost unchanged after the data is modified, the privacy of the data is protected to some extent.
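The trade-off between privacy and usability can be illustrated with a small sketch that takes f to be the mean and perturbs each input with Gaussian noise. The function name `noisy_mean`, the choice of Gaussian noise and the sample values are assumptions made for illustration only; the source does not fix the noise distribution.

```python
import random
import statistics


def noisy_mean(values, noise_scale, rng):
    """Compute f(x_1 + r_1, ..., x_n + r_n) with f = mean and r_i ~ N(0, noise_scale^2)."""
    perturbed = [x + rng.gauss(0.0, noise_scale) for x in values]
    return statistics.fmean(perturbed)


rng = random.Random(0)
ages = [34, 51, 47, 62, 29, 55, 41, 38]  # made-up values, not data from the patent
true_mean = statistics.fmean(ages)

# Larger noise hides individual values better but tends to move the result further from the truth.
for scale in (0.0, 1.0, 10.0):
    error = abs(noisy_mean(ages, scale, rng) - true_mean)
    print(f"noise scale {scale:>4}: absolute error of the mean = {error:.3f}")
```

With a noise scale of zero the result equals the true mean and no individual value is hidden; as the scale grows, each record is better masked while the aggregate drifts away from the true value.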
For a randomized algorithm M (one whose output for a given input is not a fixed value but a random value following some distribution), if
$$\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(y) \in S] + \delta$$
holds for all adjacent data sets x, y and all sets of outputs S, then M is said to satisfy $(\varepsilon, \delta)$-differential privacy. Here ε is the privacy budget consumed by a single query; the smaller ε is, the better the privacy protection.
If two data sets x and y differ in only one record, i.e., $\|x - y\|_1 = 1$, they are called "adjacent data sets". Defining the guarantee over adjacent data sets means that protection is provided for each record. When the two probabilities are close, the outputs on x and y are hard to distinguish, so the record in which x and y differ is protected.
When an external observer who obtains a result computed from a data set cannot tell whether a particular individual's record was used, the algorithm satisfies differential privacy.
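Applied to step 5, one common way to add noise to a training gradient is to clip its L2 norm and then add Gaussian noise. The sketch below follows that pattern under stated assumptions: the clipping step, the Gaussian distribution, and the names `clip_by_l2_norm`, `privatize_gradient`, `max_norm` and `noise_multiplier` are not taken from the source, which only says that noise is added.

```python
import math
import random


def clip_by_l2_norm(gradient, max_norm):
    """Scale the gradient down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm or norm == 0.0:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


def privatize_gradient(gradient, max_norm, noise_multiplier, rng):
    """Clip the gradient, then add N(0, (noise_multiplier * max_norm)^2) noise per coordinate."""
    clipped = clip_by_l2_norm(gradient, max_norm)
    sigma = noise_multiplier * max_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]


rng = random.Random(42)
raw_gradient = [3.0, -4.0, 12.0]  # L2 norm is 13
clipped = clip_by_l2_norm(raw_gradient, max_norm=1.0)
noisy = privatize_gradient(raw_gradient, max_norm=1.0, noise_multiplier=0.5, rng=rng)
print("clipped:", clipped)
print("noisy:  ", noisy)
```

Clipping bounds how much any single gradient can influence the aggregate, which is what allows a noise level to be related to ε and δ; deriving the exact values is outside the scope of this sketch.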
step 6, the tamper-proof distributed ledger of the blockchain network records training events in the machine learning process; the security aggregator generates a nonce for the training record, sends the nonce to the medical institution participating in the training, and records the hash value hash(K, nonce) of the training record on the blockchain;
the training record must satisfy two properties: 1. a medical institution that participated in the training can use the record to prove its participation; 2. apart from the security aggregator, no one else can learn whether a given medical institution participated in the training.
To this end, the blockchain network sets two public parameters: a prime number q and a primitive root a of q. The private keys of a given medical institution and of the security aggregator in a given training round are $pri_1$ and $pri_2$ respectively, and both parties calculate and disclose:
Since, apart from the security aggregator, only the medical institution participating in the training holds K, only that institution can prove that the training record belongs to it. Meanwhile, others cannot deduce who participated in the training from the hash(K, nonce) of the training record.
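The values published by the two parties and the definition of K are not given in the text above. One reading that is consistent with a public prime q and primitive root a is a Diffie-Hellman key exchange, in which each party publishes a raised to its private key modulo q and K is the resulting shared secret. The sketch below works under that assumption; the toy parameters q = 23 and a = 5, the SHA-256 hash, and the way K and the nonce are concatenated are illustrative choices only.

```python
import hashlib
import secrets

Q = 23  # toy prime for illustration; a real deployment would need a large prime
A = 5   # a primitive root modulo 23


def keypair():
    """Return (private key, public value a^pri mod q)."""
    pri = secrets.randbelow(Q - 2) + 1  # private exponent in [1, q - 2]
    return pri, pow(A, pri, Q)


def shared_secret(own_private, other_public):
    """Assumed definition of K: the Diffie-Hellman shared secret."""
    return pow(other_public, own_private, Q)


def training_record(k, nonce):
    """hash(K, nonce); the byte encoding used here is an assumption."""
    return hashlib.sha256(str(k).encode() + b"|" + nonce).hexdigest()


pri1, pub1 = keypair()  # medical institution
pri2, pub2 = keypair()  # security aggregator

k_institution = shared_secret(pri1, pub2)
k_aggregator = shared_secret(pri2, pub1)

nonce = secrets.token_bytes(16)                  # generated by the aggregator
on_chain = training_record(k_aggregator, nonce)  # written to the blockchain

# The institution proves participation by recomputing the record from its own K.
proved = training_record(k_institution, nonce) == on_chain
print("secrets match:", k_institution == k_aggregator, "| proof accepted:", proved)
```

An observer who sees only the public values and the on-chain hash cannot recompute the record without K, which matches the second property required of the training record.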
step 7, the security aggregator encrypts the training gradients transmitted by the data owners, collects the training gradients transmitted by all data owners, and updates the machine learning model.
Specifically, the encryption scheme used is worker selection: the security aggregator randomly selects only a part of the collected gradients to generate the output, and a medical institution does not know whether the gradient it uploaded was selected. This prevents an attacker from obtaining the data of any particular party and, together with differential privacy, protects privacy.
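Worker selection and the subsequent model update can be sketched as below. The averaging rule, the plain gradient descent update, and the names `aggregate_with_worker_selection`, `apply_update`, `fraction` and `learning_rate` are assumptions; the source states only that a random part of the collected gradients is used to generate the output.

```python
import random


def aggregate_with_worker_selection(gradients, fraction, rng):
    """Average the gradients of a random subset of workers.

    gradients: dict mapping worker id -> list of floats (all of the same length).
    Returns (averaged gradient, list of selected worker ids). The selection is
    kept by the aggregator and is not revealed to the workers.
    """
    worker_ids = sorted(gradients)
    k = max(1, round(fraction * len(worker_ids)))
    chosen = rng.sample(worker_ids, k)
    dim = len(gradients[chosen[0]])
    averaged = [sum(gradients[w][i] for w in chosen) / k for i in range(dim)]
    return averaged, chosen


def apply_update(weights, averaged_gradient, learning_rate):
    """One gradient descent step on the global model (assumed update rule)."""
    return [w - learning_rate * g for w, g in zip(weights, averaged_gradient)]


rng = random.Random(7)
gradients = {
    "hospital_a": [0.2, -0.1],
    "hospital_b": [0.4, 0.3],
    "hospital_c": [-0.1, 0.5],
    "hospital_d": [0.0, 0.2],
}
averaged, chosen = aggregate_with_worker_selection(gradients, fraction=0.5, rng=rng)
new_weights = apply_update([1.0, 1.0], averaged, learning_rate=0.1)
print(len(chosen), "of", len(gradients), "gradients used; new weights:", new_weights)
```

Because each uploaded gradient is used only with some probability, an attacker observing the aggregate cannot be sure that a given institution's update contributed to it.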
Simulation test results
A melanoma photo set from the International Skin Imaging Collaboration (ISIC) is taken as the training set of the machine learning model; the photos in the set are divided into benign tumor photos and malignant tumor photos; the training set is stored in IPFS;
80 benign tumor photos and 80 malignant tumor photos are randomly selected from the training set, and these 160 photos are used as the test set;
Test 1: the machine learning model is trained with a general federated learning method, and the trained model is tested with the test set;
Test 2: the machine learning model is trained with the federated learning method of the invention, and the trained model is tested with the test set;
The recognition accuracy of the two federated learning methods is shown in Table 1. Accuracy is the proportion of correctly classified photos in the test set; Sensitivity is the ratio of the number of photos correctly identified as malignant melanoma to the number of all malignant melanoma photos in the test set; ROC-AUC is the area under the ROC (receiver operating characteristic) curve, and the closer it is to 1 the better; MCC is the Matthews correlation coefficient, which measures classification quality in binary classification when the two classes differ greatly in size.
TABLE 1
Test | Accuracy | Sensitivity | ROC-AUC | MCC |
---|---|---|---|---|
Test 1 | 0.92 | 0.86 | 0.92 | 0.85 |
Test 2 | 0.85 | 0.81 | 0.88 | 0.78 |
As can be seen from Table 1, although the recognition accuracy of test 2 is lower than that of test 1, it remains high; that is, the federated learning method of the invention does not greatly affect the recognition accuracy of the machine learning model, which still maintains high recognition accuracy.
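For reference, three of the metrics in Table 1 can be computed from the counts of a binary confusion matrix as sketched below. The counts used in the example are invented to show the arithmetic and are not the results of test 1 or test 2; ROC-AUC is omitted because it needs the per-photo scores rather than counts.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and Matthews correlation coefficient from confusion counts.

    tp: malignant photos classified as malignant
    fn: malignant photos classified as benign
    tn: benign photos classified as benign
    fp: benign photos classified as malignant
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "mcc": mcc}


# Hypothetical counts for a test set of 80 malignant and 80 benign photos.
metrics = binary_metrics(tp=70, fn=10, tn=76, fp=4)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```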
Test 3: a reverse attack is used against the general federated learning method to restore a photo from the training set;
Test 4: a reverse attack is used against the federated learning method of the invention to restore a photo from the training set;
Referring to FIG. 2, an original melanoma photo from the training set is shown.
Referring to FIG. 3, the process of restoring the training set photo in test 3 is shown schematically.
Comparing FIG. 2 and FIG. 3, it can be seen that the shape of the melanoma is essentially visible after 30 iterations of the reverse attack, and the original image is essentially restored after 50 iterations. This means that the image information is leaked and the privacy of the medical data cannot be guaranteed.
Referring to FIG. 4, the photo restored from the training set after 180 iterations of the reverse attack in test 4 is shown.
Comparing FIG. 2 and FIG. 4, it can be seen that the reverse attack cannot restore the photos in the training set, which indicates that the federated learning method of the invention does not leak image information and can ensure the privacy of the medical data.
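Why raw gradients can leak a training sample, and why added noise hinders recovery, can be seen in a much simpler setting than the iterative attack used in tests 3 and 4. For a linear model with a bias term trained on a single sample with squared error, the weight gradient equals the error times the input and the bias gradient equals the error, so dividing one by the other returns the input exactly. The sketch below demonstrates this toy case; it is not the attack used in the tests, and the model, values and noise level are assumptions.

```python
import random


def single_sample_gradients(weights, bias, x, y):
    """Gradients of 0.5 * (w.x + b - y)^2 with respect to the weights and the bias."""
    error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
    return [error * xi for xi in x], error


def reconstruct_input(grad_w, grad_b):
    """Recover x from the gradients: grad_w[i] / grad_b = x[i] when grad_b != 0."""
    return [g / grad_b for g in grad_w]


def add_noise(grad_w, grad_b, sigma, rng):
    """Perturb every gradient coordinate with N(0, sigma^2) noise."""
    return [g + rng.gauss(0.0, sigma) for g in grad_w], grad_b + rng.gauss(0.0, sigma)


def max_abs_error(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


x = [0.2, 0.7, 0.1, 0.9]  # stands in for the pixel values of one training photo
weights, bias, y = [0.5, -0.3, 0.8, 0.1], 0.05, 1.0

grad_w, grad_b = single_sample_gradients(weights, bias, x, y)
clean_error = max_abs_error(reconstruct_input(grad_w, grad_b), x)

noisy_w, noisy_b = add_noise(grad_w, grad_b, sigma=0.5, rng=random.Random(7))
noisy_error = max_abs_error(reconstruct_input(noisy_w, noisy_b), x)

print(f"reconstruction error without noise: {clean_error:.2e}")
print(f"reconstruction error with noise:    {noisy_error:.2e}")
```

Without noise the input is recovered up to floating-point rounding; with noise on the gradients the same division no longer yields the original values.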
Although the present invention has been described in detail in this specification with reference to specific embodiments and illustrative embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made thereto based on the present invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.
Claims (5)
1. A federated learning method based on blockchain medical data sharing, characterized by comprising the following steps:
step 1, establishing a blockchain network;
step 2, the model owner uploads the machine learning model to the InterPlanetary File System (IPFS) of the blockchain network, and the federated learning process is planned by setting up an Orchester composed of a plurality of smart contracts in the contract layer of the blockchain network;
step 3, the data owner acquires the encrypted data and the breakpoint from the IPFS of the blockchain network, and then decrypts the encrypted data;
step 4, the machine learning model is trained with the decrypted data according to the federated learning process planned by the Orchester, to obtain a training gradient of the machine learning model;
step 5, the data owner adds noise to the obtained training gradient to achieve differential privacy, and then sends the training gradient to the security aggregator;
step 6, the tamper-proof distributed ledger of the blockchain network records training events in the machine learning process; the security aggregator generates a nonce for the training record, sends the nonce to the medical institution participating in the training, and records the hash value hash(K, nonce) of the training record on the blockchain;
step 7, the security aggregator encrypts the training gradients transmitted by the data owners, collects the training gradients transmitted by all data owners, and updates the machine learning model.
2. The federated learning method based on blockchain medical data sharing according to claim 1, wherein in step 1 the blockchain network is built on the FISCO BCOS blockchain underlying platform, with the data owners serving as blockchain network nodes; and an InterPlanetary File System (IPFS) is set up to store the medical data in link mode.
3. The federated learning method based on blockchain medical data sharing according to claim 1, wherein the differential privacy in step 5 is specifically: for a randomized algorithm M, if
$$\Pr[M(x) \in S] \le e^{\varepsilon} \Pr[M(y) \in S] + \delta$$
holds for all adjacent data sets x, y and all sets of outputs S, then the algorithm M is said to satisfy $(\varepsilon, \delta)$-differential privacy, where ε is the privacy budget consumed by a single query;
when an external observer who obtains a result computed from a data set cannot tell whether a particular individual's record was used, the algorithm satisfies differential privacy.
4. The federated learning method based on blockchain medical data sharing according to claim 1, wherein K in hash(K, nonce) in step 6 is specifically obtained as follows: the blockchain network sets two public parameters, a prime number q and a primitive root a of q; the private keys of a given data owner and of the security aggregator in a given training round are $pri_1$ and $pri_2$ respectively, and both parties calculate and disclose:
5. The federated learning method based on blockchain medical data sharing according to claim 1, wherein in step 7 the security aggregator encrypts the training gradients transmitted by the data owners, specifically, the encryption uses worker selection, that is, the security aggregator randomly selects only a part of the collected gradients to generate the output, and a medical institution does not know whether the gradient it uploaded was selected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210390877.0A CN114912631A (en) | 2022-04-14 | 2022-04-14 | Block chain medical data sharing-based federal learning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210390877.0A CN114912631A (en) | 2022-04-14 | 2022-04-14 | Block chain medical data sharing-based federal learning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114912631A true CN114912631A (en) | 2022-08-16 |
Family
ID=82765705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210390877.0A Pending CN114912631A (en) | 2022-04-14 | 2022-04-14 | Block chain medical data sharing-based federal learning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114912631A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116665913A (en) * | 2023-07-13 | 2023-08-29 | 之江实验室 | Cross-institution patient matching system and method |
CN116682543A (en) * | 2023-08-03 | 2023-09-01 | 山东大学齐鲁医院 | Sharing method and system of regional rehabilitation information |
-
2022
- 2022-04-14 CN CN202210390877.0A patent/CN114912631A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116665913A (en) * | 2023-07-13 | 2023-08-29 | 之江实验室 | Cross-institution patient matching system and method |
CN116665913B (en) * | 2023-07-13 | 2023-10-13 | 之江实验室 | Cross-institution patient matching system and method |
CN116682543A (en) * | 2023-08-03 | 2023-09-01 | 山东大学齐鲁医院 | Sharing method and system of regional rehabilitation information |
CN116682543B (en) * | 2023-08-03 | 2023-11-10 | 山东大学齐鲁医院 | Sharing method and system of regional rehabilitation information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||