CN112967812A - Anti-theft attack medical diagnosis model protection method based on federal learning - Google Patents

Anti-theft attack medical diagnosis model protection method based on federal learning

Info

Publication number
CN112967812A
CN112967812A (application CN202110422407.3A)
Authority
CN
China
Prior art keywords
aggregated
teachers
federal
teacher
medical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110422407.3A
Other languages
Chinese (zh)
Inventor
郑子彬 (Zheng Zibin)
李世璇 (Li Shixuan)
陈川 (Chen Chuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongai Health Technology Guangdong Co ltd
Original Assignee
Zhongai Health Technology Guangdong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongai Health Technology Guangdong Co ltd filed Critical Zhongai Health Technology Guangdong Co ltd
Priority to CN202110422407.3A priority Critical patent/CN112967812A/en
Publication of CN112967812A publication Critical patent/CN112967812A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Epidemiology (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a federal-learning-based method for protecting a medical diagnosis model against stealing attacks, belonging to the technical field of medical diagnosis protection, which comprises the following specific steps: S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set to obtain N teachers; S2, deploying the trained teachers locally at each medical institution, and recording the prediction result of each teacher for final result voting; S3, introducing Laplace noise to perturb the vote counts, thereby realizing differential privacy protection; S4, aggregating the votes of all the teachers at the federal global server to obtain an aggregated teacher; S5, labeling a desensitized public data set with the aggregated teacher at the federal global server, thereby transferring the learned knowledge; S6, training the student on the labeled public data set at the federal global server; and S7, providing the trained student model for users to use, reducing the risk of model leakage.

Description

Anti-theft attack medical diagnosis model protection method based on federal learning
Technical Field
The invention relates to the technical field of medical diagnosis protection, and in particular to a federal-learning-based method for protecting a medical diagnosis model against stealing attacks.
Background
In the field of intelligent medical treatment, electronic health data are commonly used to extract patient characteristics and identify patient groups; put simply, clinical data are used to train disease diagnosis models. These clinical data are usually sensitive and tied to patient privacy, so data generally cannot be exchanged between medical institutions. However, most hospitals (or data centers) hold only a limited amount of data, which is insufficient to train a good model. Only a large medical AI cloud service provider can offer good model services such as model training and recognition; these services are usually exposed externally through interfaces, and users call them to perform operations such as medical image recognition and disease diagnosis prediction. A model-stealing attack means that an attacker infers the parameters and training-data information of a deployed model by querying it and analyzing its inputs, outputs and other externally observable information: to the attacker the model is a black box, and the attacker can choose input values and observe the model's predictions.
An attacker can 'steal' the AI model by calling the interface of the medical AI cloud service many times, which raises two problems. First, intellectual property is stolen: collecting samples and training a model consume substantial resources, and the trained model is valuable intellectual property. Second, an attacker can construct adversarial examples from the stolen model. A dedicated protection method should therefore be adopted in the model training stage to reduce the risk of model theft.
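The attack pattern described above can be made concrete with a minimal Python sketch; the helper name cloud_predict, the random query distribution and the query budget are hypothetical assumptions for illustration only, not part of the invention. The attacker repeatedly queries the black-box prediction interface and records input-output pairs from which a surrogate model can later be trained.

```python
import numpy as np

def steal_surrogate_dataset(cloud_predict, input_dim, n_queries=10000, seed=0):
    """Repeatedly query the black-box service and record (input, prediction) pairs."""
    rng = np.random.default_rng(seed)
    xs = rng.normal(size=(n_queries, input_dim))     # attacker-chosen query inputs
    ys = np.array([cloud_predict(x) for x in xs])    # observed black-box predictions
    return xs, ys                                    # training data for a surrogate ("stolen") model
```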
In order to train a well-performing medical diagnosis model on the sensitive medical data collected by hospitals and data institutions while preventing the model from being stolen, a federal-learning-based method for protecting a medical diagnosis model against stealing attacks is provided.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a federal-learning-based method for protecting a medical diagnosis model against stealing attacks.
In order to achieve the purpose, the invention adopts the following technical scheme:
a medical diagnosis model protection method for preventing stealing attacks based on federal learning comprises the following specific steps:
s1, assuming that N medical institutions respectively collect N sensitive medical data sets, and respectively training different local models by each institution based on the data sets to obtain N teachers;
s2, deploying the trained teachers locally in each medical institution, and recording the prediction result of each teacher for final result voting;
s3, introducing Laplace noise, and disturbing the statistical condition of the ticket number to realize differential privacy protection;
s4, aggregating the prediction results of all the teachers at the federal global server, voting, and selecting the highest vote number as a final result to obtain aggregated teachers;
s5, labeling the unlabeled desensitization public data set by using an aggregated teacher at the federal global server to obtain a labeled desensitization public data set;
s6, training a new deep neural network model student by using the labeled desensitization public data set obtained in the step S5 at a federal global server;
and S7, providing the trained student for the user to use.
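As an illustration of steps S1-S2, the following is a minimal Python (PyTorch) sketch of local teacher training; the network architecture, optimizer and hyperparameters are illustrative assumptions and not prescribed by the invention.

```python
import torch
import torch.nn as nn

def train_local_teacher(data_loader, n_features, n_classes, epochs=5):
    """Steps S1-S2: one institution trains its own teacher on its sensitive data."""
    model = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                          nn.Linear(64, n_classes))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in data_loader:        # sensitive records never leave the institution
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model                        # one "teacher" per institution
```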
As a further scheme of the invention: the specific process of implementing differential privacy protection in step S3 is as follows:
SS1, tallying at the federal global server the votes of the teacher models trained locally by the N medical institutions, to form an aggregated result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any single distributed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may leak private information, so a differential privacy step is added before the votes form the aggregated teacher;
SS3, treating the aggregated teacher on the federal global server as a differential privacy module: the user provides input data (i.e., samples to be predicted) and the module returns privacy-protected labels.
As a further scheme of the invention: the differential privacy step in step SS2 specifically includes: laplace noise is introduced, the statistical condition of the number of the votes is disturbed, and privacy is protected.
As a further scheme of the invention: the main factors of the step S6 for training the new deep neural network model student are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited, and if the attacker calls the aggregated teacher differential privacy module for multiple times, some privacy information may be obtained by outputting the result;
(2) the deep neural network model student in the step S6 is a desensitized model, and can be directly run on the user equipment, so that an attacker can be prevented from directly calling the aggregated teacher differential privacy module for multiple times, and the leakage risk is reduced.
As a further scheme of the invention: the deep neural network model student in the step S6 is trained based on the unmarked desensitized common data set, and before training the deep neural network model student, the desensitized common data set needs to be labeled with an aggregated teecher.
Compared with the prior art, the invention has the beneficial effects that:
1. A federal learning framework is introduced, and the limited local data set of each medical institution is fully utilized to train a local teacher model. The teachers are aggregated at the federal global server into an aggregated teacher model, so that knowledge from each institution's local data is acquired without revealing any institution's data privacy;
2. Before the federal global server forms the aggregated teacher by voting, an additional differential privacy step is added: Laplace noise is introduced to perturb the vote counts. The aggregated teacher can then be regarded as a differential privacy module, so that when a user provides input data (i.e., a sample to be predicted), the aggregated teacher returns a privacy-protected label;
3. Before the student is trained, the desensitized public data set is labeled with the aggregated teacher, transferring the learned knowledge, and the trained student model rather than the aggregated teacher is provided to users; this prevents users from repeatedly calling the aggregated teacher directly and further reduces the risk of model stealing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a flow chart of the federal-learning-based method for protecting a medical diagnosis model against stealing attacks according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
Referring to Fig. 1, a federal-learning-based medical diagnosis model protection method for preventing stealing attacks includes the following specific steps:
s1, assuming that N medical institutions respectively collect N sensitive medical data sets, and respectively training different local models by each institution based on the data sets to obtain N teachers;
s2, deploying the trained teachers locally in each medical institution, and recording the prediction result of each teacher for final result voting;
s3, introducing Laplace noise, and disturbing the statistical condition of the ticket number to realize differential privacy protection;
s4, aggregating the prediction results of all the teachers at the federal global server, voting, and selecting the highest vote number as a final result to obtain aggregated teachers;
s5, labeling the unlabeled desensitization public data set by using an aggregated teacher at the federal global server to obtain a labeled desensitization public data set;
s6, training a new deep neural network model student by using the labeled desensitization public data set obtained in the step S5 at a federal global server;
and S7, providing the trained student for the user to use.
The specific process of implementing differential privacy protection in step S3 is as follows:
SS1, tallying at the federal global server the votes of the teacher models trained locally by the N medical institutions, to form an aggregated result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any single distributed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may leak private information, so before the votes form the aggregated teacher a differential privacy step is added: Laplace noise is introduced to perturb the vote counts, thereby protecting privacy;
SS3, treating the aggregated teacher on the federal global server as a differential privacy module: the user provides input data (i.e., samples to be predicted) and the module returns privacy-protected labels.
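A minimal sketch of the aggregated teacher viewed as a differential privacy module, as described in SS3; the class name and interface are illustrative assumptions, and it reuses the noisy_vote_aggregate sketch given earlier.

```python
import numpy as np

class AggregatedTeacherModule:
    """SS3: the aggregated teacher exposed as a differential privacy module.
    The caller supplies a sample and receives only the noise-perturbed majority label."""

    def __init__(self, teachers, n_classes, gamma=0.05):
        self.teachers = teachers    # locally trained teacher models
        self.n_classes = n_classes
        self.gamma = gamma          # smaller gamma means more Laplace noise on the vote counts

    def predict(self, x):
        votes = [int(t(x).argmax()) for t in self.teachers]              # per-teacher labels
        return noisy_vote_aggregate(votes, self.n_classes, self.gamma)   # earlier noisy-vote sketch
```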
The main reasons for training the new deep neural network model student in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited: if an attacker calls the aggregated teacher differential privacy module many times, the returned results may still leak some private information;
(2) the deep neural network model student of step S6 is a desensitized model that can run directly on user equipment, which prevents an attacker from repeatedly calling the aggregated teacher differential privacy module directly and thereby reduces the leakage risk.
The deep neural network model student of step S6 is trained on the unlabeled desensitized public data set; before the student is trained, this desensitized public data set must be labeled by the aggregated teacher.
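A minimal sketch of the student training of step S6; the student architecture, optimizer and training loop are illustrative assumptions, and the only training input is the public data set labeled by the aggregated teacher.

```python
import torch
import torch.nn as nn

def train_student(labeled_public_data, n_features, n_classes, epochs=10):
    """Step S6: train the student only on public data labeled by the aggregated teacher."""
    student = nn.Sequential(nn.Linear(n_features, 128), nn.ReLU(),
                            nn.Linear(128, n_classes))
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in labeled_public_data:    # desensitized inputs and noisy labels only
            x_t = torch.as_tensor(x, dtype=torch.float32).unsqueeze(0)
            y_t = torch.tensor([y])
            optimizer.zero_grad()
            loss = loss_fn(student(x_t), y_t)
            loss.backward()
            optimizer.step()
    return student                          # the desensitized model released to users (step S7)
```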
The working principle and usage flow of the invention are as follows: first, N medical institutions respectively collect N sensitive medical data sets, and each institution independently trains a different local model on its own data set, yielding N teachers. The trained teachers are deployed locally at each medical institution, the prediction result of each teacher is recorded for final result voting, and Laplace noise is introduced to perturb the vote counts and realize differential privacy protection. The prediction results of all the teachers are then aggregated and voted on at the federal global server, and the class with the most votes is taken as the final result to obtain the aggregated teacher. The aggregated teacher labels the unlabeled desensitized public data set at the federal global server, yielding a labeled desensitized public data set, which is used to train a new deep neural network model, the student; the trained student is then provided to users. Because the public data set is labeled by the aggregated teacher before the student is trained, the learned knowledge is transferred and only the trained student model is exposed to users, which prevents users from repeatedly calling the aggregated teacher directly and further reduces the leakage risk.
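Putting the pieces together, the flow described above can be sketched as a single orchestration function that composes the earlier illustrative sketches (train_local_teacher, label_public_dataset, train_student); this is a sketch of the flow under those same assumptions, not a definitive implementation.

```python
def build_protected_diagnosis_model(institution_loaders, public_inputs, n_features, n_classes):
    """End-to-end flow: local teachers -> noisy vote labeling -> student -> release."""
    teachers = [train_local_teacher(dl, n_features, n_classes)          # S1-S2: one teacher per institution
                for dl in institution_loaders]
    labeled = label_public_dataset(public_inputs, teachers, n_classes)  # S3-S5: noisy-vote labeling
    student = train_student(labeled, n_features, n_classes)             # S6: train the released model
    return student                                                      # S7: users receive only the student
```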
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention, based on its technical solutions and inventive concept, shall fall within the protection scope of the present invention.

Claims (5)

1. A medical diagnosis model protection method for preventing stealing attacks based on federal learning is characterized by comprising the following specific steps:
s1, assuming that N medical institutions respectively collect N sensitive medical data sets, and respectively training different local models by each institution based on the data sets to obtain N teachers;
s2, deploying the trained teachers locally in each medical institution, and recording the prediction result of each teacher for final result voting;
s3, introducing Laplace noise, and disturbing the statistical condition of the ticket number to realize differential privacy protection;
s4, aggregating the prediction results of all the teachers at the federal global server, voting, and selecting the highest vote number as a final result to obtain aggregated teachers;
s5, labeling the unlabeled desensitization public data set by using an aggregated teacher at the federal global server to obtain a labeled desensitization public data set;
s6, training a new deep neural network model student by using the labeled desensitization public data set obtained in the step S5 at a federal global server;
and S7, providing the trained student for the user to use.
2. The federal-learning-based method for protecting a medical diagnosis model against stealing attacks according to claim 1, wherein the specific process of implementing differential privacy protection in step S3 is as follows:
SS1, tallying at the federal global server the votes of the teacher models trained locally by the N medical institutions, to form an aggregated result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any single distributed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may leak private information, so a differential privacy step is added before the votes form the aggregated teacher;
SS3, treating the aggregated teacher on the federal global server as a differential privacy module: the user provides input data and the module returns privacy-protected labels.
3. The federal-learning-based method for protecting a medical diagnosis model against stealing attacks according to claim 2, wherein the differential privacy step in step SS2 specifically includes: introducing Laplace noise to perturb the vote counts, thereby protecting privacy.
4. The federal-learning-based method for protecting a medical diagnosis model against stealing attacks according to claim 1, wherein the main reasons for training the new deep neural network model student in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited: if an attacker calls the aggregated teacher differential privacy module many times, the returned results may still leak some private information;
(2) the deep neural network model student of step S6 is a desensitized model that can run directly on user equipment, which prevents an attacker from repeatedly calling the aggregated teacher differential privacy module directly.
5. The federal-learning-based method for protecting a medical diagnosis model against stealing attacks according to claim 1, wherein the deep neural network model student of step S6 is trained on the unlabeled desensitized public data set, and before the student is trained, the desensitized public data set must be labeled by the aggregated teacher on the federal global server.
CN202110422407.3A 2021-04-20 2021-04-20 Anti-theft attack medical diagnosis model protection method based on federal learning Pending CN112967812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110422407.3A CN112967812A (en) 2021-04-20 2021-04-20 Anti-theft attack medical diagnosis model protection method based on federal learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110422407.3A CN112967812A (en) 2021-04-20 2021-04-20 Anti-theft attack medical diagnosis model protection method based on federal learning

Publications (1)

Publication Number Publication Date
CN112967812A (en) 2021-06-15

Family

ID=76280857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110422407.3A Pending CN112967812A (en) 2021-04-20 2021-04-20 Anti-theft attack medical diagnosis model protection method based on federal learning

Country Status (1)

Country Link
CN (1) CN112967812A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169007A (en) * 2021-12-10 2022-03-11 西安电子科技大学 Medical privacy data identification method based on dynamic neural network
CN114566289A (en) * 2022-04-26 2022-05-31 之江实验室 Disease prediction system based on multi-center clinical data anti-cheating analysis
CN115881306A (en) * 2023-02-22 2023-03-31 中国科学技术大学 Networked ICU intelligent medical decision-making method based on federal learning and storage medium
CN116564535A (en) * 2023-05-11 2023-08-08 之江实验室 Central disease prediction method and device based on local graph information exchange under privacy protection

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079174A (en) * 2019-11-21 2020-04-28 中国电力科学研究院有限公司 Power consumption data desensitization method and system based on anonymization and differential privacy technology
CN112232527A (en) * 2020-09-21 2021-01-15 北京邮电大学 Safe distributed federal deep learning method
CN112613065A (en) * 2020-12-02 2021-04-06 北京明朝万达科技股份有限公司 Data sharing method and device based on differential privacy protection
CN112668726A (en) * 2020-12-25 2021-04-16 中山大学 Personalized federal learning method with efficient communication and privacy protection

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079174A (en) * 2019-11-21 2020-04-28 中国电力科学研究院有限公司 Power consumption data desensitization method and system based on anonymization and differential privacy technology
CN112232527A (en) * 2020-09-21 2021-01-15 北京邮电大学 Safe distributed federal deep learning method
CN112613065A (en) * 2020-12-02 2021-04-06 北京明朝万达科技股份有限公司 Data sharing method and device based on differential privacy protection
CN112668726A (en) * 2020-12-25 2021-04-16 中山大学 Personalized federal learning method with efficient communication and privacy protection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NICOLAS PAPERNOT et al.: "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data", ICLR, 31 December 2017 (2017-12-31), pages 1-16 *
WENQI LI et al.: "Privacy-Preserving Federated Brain Tumour Segmentation", International Workshop on Machine Learning in Medical Imaging, pages 133-141 *
GUO Zijing et al.: "A Survey of Privacy Protection for Medical and Health Big Data", Journal of Frontiers of Computer Science and Technology, vol. 15, no. 3, pages 389-402 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169007A (en) * 2021-12-10 2022-03-11 西安电子科技大学 Medical privacy data identification method based on dynamic neural network
CN114169007B (en) * 2021-12-10 2024-05-14 西安电子科技大学 Medical privacy data identification method based on dynamic neural network
CN114566289A (en) * 2022-04-26 2022-05-31 之江实验室 Disease prediction system based on multi-center clinical data anti-cheating analysis
CN115881306A (en) * 2023-02-22 2023-03-31 中国科学技术大学 Networked ICU intelligent medical decision-making method based on federal learning and storage medium
CN116564535A (en) * 2023-05-11 2023-08-08 之江实验室 Central disease prediction method and device based on local graph information exchange under privacy protection
CN116564535B (en) * 2023-05-11 2024-02-20 之江实验室 Central disease prediction method and device based on local graph information exchange under privacy protection

Similar Documents

Publication Publication Date Title
CN112967812A (en) Anti-theft attack medical diagnosis model protection method based on federal learning
Casey et al. The Kodak syndrome: risks and opportunities created by decentralization of forensic capabilities
Arulogun et al. RFID-based students attendance management system
CN108763507A (en) Enterprise's incidence relation method for digging and device
CN107085812A (en) The anti money washing system and method for block chain digital asset
CN109658310A (en) One kind being used for exhibition room museum intelligent identifying system
CN107820210A (en) One kind is registered method, mobile terminal and computer-readable recording medium
Osayande Electronic security systems in academic libraries: A case study of three university libraries in South-West Nigeria
CN106980745B (en) Method for exhibiting data and device
Ekere et al. The use of ICT for security and theft prevention in two university libraries in Nigeria
CN101246619A (en) Monitoring system
CN107507118A (en) A kind of three-dimensional safety of person vehicle prevention and control system
CN110889717A (en) Method and device for filtering advertisement content in text, electronic equipment and storage medium
Hull Dirty data labeled dirt cheap: epistemic injustice in machine learning systems
CN109871211A (en) Information displaying method and device
Kui The stumbling balance between public health and privacy amid the pandemic in China
Kotoroi Constraints Facing African Academic Libraries in Applying Electronic Security Systems to Protect Library Materials
Benton et al. Using video cameras as a research tool in public spaces: addressing ethical and information governance challenges under data protection legislation
CN116797005A (en) Information acquisition management system applied to campus intelligent teaching management system
Estivill-Castro et al. Privacy in data mining
Ayofe et al. A framework for computer aided investigation of ATM fraud in Nigeria
Lyon Surveillance technologies: Trends and social implications
CN110309312B (en) Associated event acquisition method and device
Opara et al. Technological Methods and Security of Information Resources in Dame Patience Goodluck Jonathan Automated Library, Ignatius Ajuru University of Education
CN116433387A (en) Risk prediction method, risk prediction device, computing equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination