CN112967812A - Anti-theft attack medical diagnosis model protection method based on federal learning - Google Patents
Anti-theft attack medical diagnosis model protection method based on federal learning
- Publication number
- CN112967812A CN112967812A CN202110422407.3A CN202110422407A CN112967812A CN 112967812 A CN112967812 A CN 112967812A CN 202110422407 A CN202110422407 A CN 202110422407A CN 112967812 A CN112967812 A CN 112967812A
- Authority
- CN
- China
- Prior art keywords
- aggregated
- teachers
- federal
- teacher
- medical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 23
- 238000003745 diagnosis Methods 0.000 title claims abstract description 19
- 238000000586 desensitisation Methods 0.000 claims abstract description 10
- 230000004931 aggregating effect Effects 0.000 claims abstract description 4
- 238000003062 neural network model Methods 0.000 claims description 16
- 230000002776 aggregation Effects 0.000 claims description 3
- 238000004220 aggregation Methods 0.000 claims description 3
- 238000002372 labelling Methods 0.000 claims description 3
- 201000010099 disease Diseases 0.000 description 2
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 2
- 230000008520 organization Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Bioethics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Computational Linguistics (AREA)
- Epidemiology (AREA)
- Molecular Biology (AREA)
- Pathology (AREA)
- Primary Health Care (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a method for protecting a medical diagnosis model against stealing attacks based on federal learning, which belongs to the technical field of medical diagnosis protection and comprises the following specific steps: S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models; S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting; S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection; S4, the votes of all teachers are aggregated at the federal global server to obtain the aggregated teacher; S5, the aggregated teacher labels a desensitized public data set at the federal global server, transferring the learned knowledge; S6, the labeled public data set is used to train the student model at the federal global server; and S7, the trained student model is provided for users, reducing the risk of model leakage.
Description
Technical Field
The invention relates to the technical field of medical diagnosis protection, in particular to a method for protecting a medical diagnosis model against theft and attack based on federal learning.
Background
In the field of intelligent medical treatment, electronic health data are generally used to extract patient characteristics and identify patient groups; put simply, clinical data are used to train disease diagnosis models. These clinical data are usually sensitive data related to patient privacy, so data often cannot be exchanged between medical institutions. However, most hospitals (or data centers) hold only a limited amount of data, which is insufficient to train a good model. Only large medical AI cloud service providers can offer good model services, such as model training and recognition; these services are usually exposed externally through an interface, and a user can call them to complete operations such as medical image recognition and disease diagnosis prediction. A model stealing attack means that an attacker infers the parameters and training data information of a system model by querying the system and analyzing external information such as its inputs and outputs; to the attacker the model is a black box, and the attacker can choose input values and observe the model's prediction results.
An attacker can 'steal' the AI model by calling the interface of the medical AI cloud service many times, which causes two problems. The first is theft of intellectual property: sample collection and model training consume substantial resources, and the trained model is valuable intellectual property. The second is that an attacker can construct adversarial samples from the stolen model. Therefore, a dedicated protection method should be adopted in the model training stage to reduce the risk of model theft.
In order to train a medical diagnosis model with good performance using the sensitive medical data collected by hospitals and data institutions, while preventing the model from being stolen, a method for protecting a medical diagnosis model against stealing attacks based on federal learning is provided.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a method for protecting a medical diagnosis model against stealing attacks based on federal learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a medical diagnosis model protection method for preventing stealing attacks based on federal learning comprises the following specific steps:
S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models;
S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting (an illustrative code sketch of steps S1 and S2 is given after this list);
S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection;
S4, the prediction results of all teachers are aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, obtaining the aggregated teacher;
S5, the aggregated teacher labels the unlabeled desensitized public data set at the federal global server, obtaining a labeled desensitized public data set;
S6, a new deep neural network model, the student, is trained at the federal global server using the labeled desensitized public data set obtained in step S5;
and S7, the trained student is provided for the user to use.
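For illustration only, the following minimal Python sketch shows steps S1 and S2. It is not part of the claimed method; the function names, the use of scikit-learn's MLPClassifier as the local model, and the hyperparameters are assumptions:

```python
# Minimal sketch of steps S1-S2 (assumed names; scikit-learn's MLPClassifier
# stands in for whatever locally trained neural network model is used).
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_local_teacher(X_local, y_local):
    """S1: a medical institution trains a teacher model on its own sensitive data,
    which never leaves the institution."""
    teacher = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    teacher.fit(X_local, y_local)
    return teacher

def record_teacher_votes(teachers, X_query):
    """S2: every locally deployed teacher predicts the query samples; the integer
    class labels (votes) are recorded for later aggregation."""
    return np.stack([t.predict(X_query) for t in teachers])  # shape: (num_teachers, num_samples)
```

In this sketch each institution only shares predicted labels (votes) on query samples, never the raw medical records or the model parameters themselves.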
As a further scheme of the invention: the specific process of implementing differential privacy protection in step S3 is as follows:
SS1, the votes of the teacher models trained locally by the N medical institutions are counted at the federal global server to form an aggregation result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any specific dispersed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may reveal private information, so a differential privacy step is added before the votes form the aggregated teacher;
SS3, the aggregated teacher at the federal global server is treated as a differential privacy module: the user provides input data (i.e., samples to be predicted) and the module returns labels that protect privacy.
As a further scheme of the invention: the differential privacy step in step SS2 specifically comprises: Laplace noise is introduced to perturb the vote counts, thereby protecting privacy.
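As an illustrative sketch only (the noise scale 1.0/epsilon, the function name, and the privacy parameter are assumptions rather than values fixed by the disclosure), the Laplace perturbation of the vote counts can be written as a noisy arg-max:

```python
import numpy as np

def noisy_aggregate(votes, num_classes, epsilon=1.0, rng=None):
    """Aggregated-teacher decision for one sample (steps S3-S4): count the
    teachers' votes per class, add Laplace noise to each count, and return the
    class with the highest noisy count."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(votes, minlength=num_classes).astype(float)  # votes are integer class labels
    noisy_counts = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(noisy_counts))
```

Here `votes` is one column of the matrix returned by `record_teacher_votes` above; a smaller `epsilon` (larger noise) gives stronger differential privacy at the cost of some aggregation accuracy.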
As a further scheme of the invention: the main reasons for training the new deep neural network model, the student, in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited; if an attacker calls the aggregated teacher differential privacy module many times, the output results may still leak some private information;
(2) the deep neural network model student in step S6 is a desensitized model that can run directly on the user's device, which prevents an attacker from calling the aggregated teacher differential privacy module many times and thus reduces the risk of leakage.
As a further scheme of the invention: the deep neural network model student in step S6 is trained on the unlabeled desensitized public data set; before the student is trained, the desensitized public data set is labeled by the aggregated teacher.
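A hedged sketch of steps S5 and S6 follows; the function names are hypothetical, it reuses the `record_teacher_votes` and `noisy_aggregate` helpers sketched above, and scikit-learn's MLPClassifier again stands in for whatever deep neural network is actually used as the student:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def label_public_dataset(teachers, X_public, num_classes, epsilon=1.0):
    """S5: the aggregated teacher labels the unlabeled desensitized public data set."""
    all_votes = record_teacher_votes(teachers, X_public)   # (num_teachers, num_samples)
    return np.array([noisy_aggregate(all_votes[:, i], num_classes, epsilon)
                     for i in range(all_votes.shape[1])])

def train_student(X_public, noisy_labels):
    """S6: the student is trained only on the labeled desensitized public data,
    never on the sensitive local data sets."""
    student = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    student.fit(X_public, noisy_labels)
    return student
```

Because the student only ever sees the public data and the noisy labels, exposing it to users (step S7) does not directly expose the aggregated teacher or the institutions' sensitive data.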
Compared with the prior art, the invention has the beneficial effects that:
1. A federal learning framework is introduced, and the limited local data sets of the medical institutions are fully used to train local teacher models. The teachers are aggregated at the federal global server into an aggregated teacher model, so the knowledge in each medical institution's local data is acquired while ensuring that no institution's data privacy is leaked;
2. Before the votes form the aggregated teacher at the federal global server, an additional differential privacy step is added: Laplace noise is introduced to perturb the vote counts. The aggregated teacher can then be regarded as a differential privacy module: the user provides input data (i.e., a sample to be predicted), and the aggregated teacher returns a privacy-protecting label;
3. Before the student is trained, the desensitized public data set is labeled by the aggregated teacher and the learned knowledge is transferred; the trained student model is then provided for the user to use, which prevents the user from calling the aggregated teacher directly many times and further reduces the risk of model stealing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a flow chart of a method for protecting a medical diagnosis model against theft and attack based on federal learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
Referring to fig. 1, a medical diagnosis model protection method for preventing theft and attack based on federal learning includes the following specific steps:
S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models;
S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting;
S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection;
S4, the prediction results of all teachers are aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, obtaining the aggregated teacher;
S5, the aggregated teacher labels the unlabeled desensitized public data set at the federal global server, obtaining a labeled desensitized public data set;
S6, a new deep neural network model, the student, is trained at the federal global server using the labeled desensitized public data set obtained in step S5;
and S7, the trained student is provided for the user to use.
The specific process of implementing differential privacy protection in step S3 is as follows:
SS1, the votes of the teacher models trained locally by the N medical institutions are counted at the federal global server to form an aggregation result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any specific dispersed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may reveal private information, so a differential privacy step is added before the votes form the aggregated teacher: Laplace noise is introduced to perturb the vote counts, thereby protecting privacy;
SS3, the aggregated teacher at the federal global server is treated as a differential privacy module: the user provides input data (i.e., samples to be predicted) and the module returns labels that protect privacy.
The main reasons for training the new deep neural network model, the student, in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited; if an attacker calls the aggregated teacher differential privacy module many times, the output results may still leak some private information;
(2) the deep neural network model student in step S6 is a desensitized model that can run directly on the user's device, which prevents an attacker from calling the aggregated teacher differential privacy module many times and thus reduces the risk of leakage.
The deep neural network model student in step S6 is trained on the unlabeled desensitized public data set; before the student is trained, the desensitized public data set is labeled by the aggregated teacher.
The working principle and usage process of the invention are as follows: first, N medical institutions respectively collect N sensitive medical data sets, and each institution independently trains a different local model on its own data set, obtaining N teacher models. The trained teachers are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting. Laplace noise is introduced to perturb the vote counts and realize differential privacy protection. The prediction results of all teachers are then aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, yielding the aggregated teacher. The aggregated teacher labels the unlabeled desensitized public data set at the federal global server to obtain a labeled desensitized public data set, which is used to train a new deep neural network model, the student. The trained student is provided for the user to use. Because the public data set is labeled by the aggregated teacher before the student is trained, the learned knowledge is transferred to the student, and providing only the trained student model to users prevents them from calling the aggregated teacher directly many times, further reducing the risk of leakage.
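For completeness, a hypothetical end-to-end use of the sketches above (variable names such as `institution_datasets`, `X_public`, `NUM_CLASSES`, and `new_patient_features` are assumed placeholders, not part of the disclosure) would look like:

```python
# Hypothetical end-to-end flow, reusing the helper sketches above.
teachers = [train_local_teacher(X_i, y_i) for X_i, y_i in institution_datasets]   # S1-S2
noisy_labels = label_public_dataset(teachers, X_public, num_classes=NUM_CLASSES)  # S3-S5
student = train_student(X_public, noisy_labels)                                   # S6
prediction = student.predict(new_patient_features)  # S7: only the student is exposed to users
```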
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change made according to the technical solution and inventive concept of the present invention, which can be easily conceived by a person skilled in the art within the technical scope disclosed by the present invention, shall fall within the protection scope of the present invention.
Claims (5)
1. A medical diagnosis model protection method for preventing stealing attacks based on federal learning is characterized by comprising the following specific steps:
S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models;
S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting;
S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection;
S4, the prediction results of all teachers are aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, obtaining the aggregated teacher;
S5, the aggregated teacher labels the unlabeled desensitized public data set at the federal global server, obtaining a labeled desensitized public data set;
S6, a new deep neural network model, the student, is trained at the federal global server using the labeled desensitized public data set obtained in step S5;
and S7, the trained student is provided for the user to use.
2. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 1, characterized in that the specific process of implementing differential privacy protection in step S3 is as follows:
SS1, the votes of the teacher models trained locally by the N medical institutions are counted at the federal global server to form an aggregation result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any specific dispersed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may reveal private information, so a differential privacy step is added before the votes form the aggregated teacher;
SS3, the aggregated teacher at the federal global server is treated as a differential privacy module: the user provides input data and the module returns labels that protect privacy.
3. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 2, characterized in that the differential privacy step in step SS2 specifically comprises: Laplace noise is introduced to perturb the vote counts, thereby protecting privacy.
4. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 1, characterized in that the main reasons for training the new deep neural network model student in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited; if an attacker calls the aggregated teacher differential privacy module many times, the output results may still leak some private information;
(2) the deep neural network model student in step S6 is a desensitized model that can run directly on the user's device, which prevents an attacker from calling the aggregated teacher differential privacy module many times.
5. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 1, characterized in that the deep neural network model student in step S6 is trained on the unlabeled desensitized public data set, and before the student is trained, the desensitized public data set is labeled by the aggregated teacher at the federal global server.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110422407.3A CN112967812A (en) | 2021-04-20 | 2021-04-20 | Anti-theft attack medical diagnosis model protection method based on federal learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110422407.3A CN112967812A (en) | 2021-04-20 | 2021-04-20 | Anti-theft attack medical diagnosis model protection method based on federal learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112967812A true CN112967812A (en) | 2021-06-15 |
Family
ID=76280857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110422407.3A Pending CN112967812A (en) | 2021-04-20 | 2021-04-20 | Anti-theft attack medical diagnosis model protection method based on federal learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112967812A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079174A (en) * | 2019-11-21 | 2020-04-28 | 中国电力科学研究院有限公司 | Power consumption data desensitization method and system based on anonymization and differential privacy technology |
CN112232527A (en) * | 2020-09-21 | 2021-01-15 | 北京邮电大学 | Safe distributed federal deep learning method |
CN112613065A (en) * | 2020-12-02 | 2021-04-06 | 北京明朝万达科技股份有限公司 | Data sharing method and device based on differential privacy protection |
CN112668726A (en) * | 2020-12-25 | 2021-04-16 | 中山大学 | Personalized federal learning method with efficient communication and privacy protection |
Non-Patent Citations (3)
Title |
---|
NICOLAS PAPERNOT et al.: "SEMI-SUPERVISED KNOWLEDGE TRANSFER FOR DEEP LEARNING FROM PRIVATE TRAINING DATA", ICLR, 31 December 2017 (2017-12-31), pages 1-16 *
WENQI LI et al.: "Privacy-Preserving Federated Brain Tumour Segmentation", International Workshop on Machine Learning in Medical Imaging, pages 133-141 *
GUO Zijing et al.: "A Survey of Privacy Protection for Medical and Health Big Data" (in Chinese), Journal of Frontiers of Computer Science and Technology (计算机科学与探索), vol. 15, no. 3, pages 389-402 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114169007A (en) * | 2021-12-10 | 2022-03-11 | 西安电子科技大学 | Medical privacy data identification method based on dynamic neural network |
CN114169007B (en) * | 2021-12-10 | 2024-05-14 | 西安电子科技大学 | Medical privacy data identification method based on dynamic neural network |
CN114566289A (en) * | 2022-04-26 | 2022-05-31 | 之江实验室 | Disease prediction system based on multi-center clinical data anti-cheating analysis |
CN115881306A (en) * | 2023-02-22 | 2023-03-31 | 中国科学技术大学 | Networked ICU intelligent medical decision-making method based on federal learning and storage medium |
CN116564535A (en) * | 2023-05-11 | 2023-08-08 | 之江实验室 | Central disease prediction method and device based on local graph information exchange under privacy protection |
CN116564535B (en) * | 2023-05-11 | 2024-02-20 | 之江实验室 | Central disease prediction method and device based on local graph information exchange under privacy protection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112967812A (en) | Anti-theft attack medical diagnosis model protection method based on federal learning | |
Casey et al. | The Kodak syndrome: risks and opportunities created by decentralization of forensic capabilities | |
Arulogun et al. | RFID-based students attendance management system | |
CN108763507A (en) | Enterprise's incidence relation method for digging and device | |
CN107085812A (en) | The anti money washing system and method for block chain digital asset | |
CN109658310A (en) | One kind being used for exhibition room museum intelligent identifying system | |
CN107820210A (en) | One kind is registered method, mobile terminal and computer-readable recording medium | |
Osayande | Electronic security systems in academic libraries: A case study of three university libraries in South-West Nigeria | |
CN106980745B (en) | Method for exhibiting data and device | |
Ekere et al. | The use of ICT for security and theft prevention in two university libraries in Nigeria | |
CN101246619A (en) | Monitoring system | |
CN107507118A (en) | A kind of three-dimensional safety of person vehicle prevention and control system | |
CN110889717A (en) | Method and device for filtering advertisement content in text, electronic equipment and storage medium | |
Hull | Dirty data labeled dirt cheap: epistemic injustice in machine learning systems | |
CN109871211A (en) | Information displaying method and device | |
Kui | The stumbling balance between public health and privacy amid the pandemic in China | |
Kotoroi | Constraints Facing African Academic Libraries in Applying Electronic Security Systems to Protect Library Materials | |
Benton et al. | Using video cameras as a research tool in public spaces: addressing ethical and information governance challenges under data protection legislation | |
CN116797005A (en) | Information acquisition management system applied to campus intelligent teaching management system | |
Estivill-Castro et al. | Privacy in data mining | |
Ayofe et al. | A framework for computer aided investigation of ATM fraud in Nigeria | |
Lyon | Surveillance technologies: Trends and social implications | |
CN110309312B (en) | Associated event acquisition method and device | |
Opara et al. | Technological Methods and Security of Information Resources in Dame Patience Goodluck Jonathan Automated Library, Ignatius Ajuru University of Education | |
CN116433387A (en) | Risk prediction method, risk prediction device, computing equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |