CN112967812A - Anti-theft attack medical diagnosis model protection method based on federal learning - Google Patents
Anti-theft attack medical diagnosis model protection method based on federal learning
- Publication number
- CN112967812A CN112967812A CN202110422407.3A CN202110422407A CN112967812A CN 112967812 A CN112967812 A CN 112967812A CN 202110422407 A CN202110422407 A CN 202110422407A CN 112967812 A CN112967812 A CN 112967812A
- Authority
- CN
- China
- Prior art keywords
- aggregated
- teachers
- federal
- teacher
- medical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 23
- 238000003745 diagnosis Methods 0.000 title claims abstract description 19
- 238000000586 desensitisation Methods 0.000 claims abstract description 10
- 230000004931 aggregating effect Effects 0.000 claims abstract description 4
- 238000003062 neural network model Methods 0.000 claims description 16
- 230000002776 aggregation Effects 0.000 claims description 3
- 238000004220 aggregation Methods 0.000 claims description 3
- 238000002372 labelling Methods 0.000 claims description 3
- 201000010099 disease Diseases 0.000 description 2
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 2
- 230000008520 organization Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Bioethics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Computational Linguistics (AREA)
- Epidemiology (AREA)
- Molecular Biology (AREA)
- Pathology (AREA)
- Primary Health Care (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a method for protecting a medical diagnosis model against stealing attacks based on federal learning, which belongs to the technical field of medical diagnosis protection and comprises the following specific steps: S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models; S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting; S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection; S4, the votes of all teachers are aggregated at the federal global server to obtain the aggregated teacher; S5, the aggregated teacher labels a desensitized public data set at the federal global server, transferring the learned knowledge; S6, the labeled public data set is used to train the student model at the federal global server; and S7, the trained student model is provided for users, reducing the risk of model leakage.
Description
Technical Field
The invention relates to the technical field of medical diagnosis protection, in particular to a method for protecting a medical diagnosis model against theft and attack based on federal learning.
Background
In the field of intelligent medical treatment, electronic health data are generally used to extract patient characteristics and identify patient groups; put simply, clinical data are used to train disease diagnosis models. These clinical data are usually sensitive data related to patient privacy, so data often cannot be exchanged between medical institutions. However, most hospitals (or data centers) hold only a limited amount of data, which is insufficient to train a good model. Only large medical AI cloud service providers can offer good model services, such as model training and recognition; these services are usually exposed externally through an interface, and a user can call them to complete operations such as medical image recognition and disease diagnosis prediction. A model stealing attack means that an attacker infers the parameters and training data information of a system model by querying the system and analyzing external information such as its inputs and outputs; to the attacker the model is a black box, and the attacker can choose input values and observe the model's prediction results.
An attacker can 'steal' the AI model by calling the interface of the medical AI cloud service many times, which causes two problems. The first is theft of intellectual property: sample collection and model training consume substantial resources, and the trained model is valuable intellectual property. The second is that an attacker can construct adversarial samples from the stolen model. Therefore, a dedicated protection method should be adopted in the model training stage to reduce the risk of model theft.
In order to train a medical diagnosis model with good performance using the sensitive medical data collected by hospitals and data institutions, while preventing the model from being stolen, a method for protecting a medical diagnosis model against stealing attacks based on federal learning is provided.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a method for protecting a medical diagnosis model against stealing attacks based on federal learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a medical diagnosis model protection method for preventing stealing attacks based on federal learning comprises the following specific steps:
S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models;
S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting (an illustrative code sketch of steps S1 and S2 is given after this list);
S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection;
S4, the prediction results of all teachers are aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, obtaining the aggregated teacher;
S5, the aggregated teacher labels the unlabeled desensitized public data set at the federal global server, obtaining a labeled desensitized public data set;
S6, a new deep neural network model, the student, is trained at the federal global server using the labeled desensitized public data set obtained in step S5;
and S7, the trained student is provided for the user to use.
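For illustration only, the following minimal Python sketch shows steps S1 and S2. It is not part of the claimed method; the function names, the use of scikit-learn's MLPClassifier as the local model, and the hyperparameters are assumptions:

```python
# Minimal sketch of steps S1-S2 (assumed names; scikit-learn's MLPClassifier
# stands in for whatever locally trained neural network model is used).
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_local_teacher(X_local, y_local):
    """S1: a medical institution trains a teacher model on its own sensitive data,
    which never leaves the institution."""
    teacher = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
    teacher.fit(X_local, y_local)
    return teacher

def record_teacher_votes(teachers, X_query):
    """S2: every locally deployed teacher predicts the query samples; the integer
    class labels (votes) are recorded for later aggregation."""
    return np.stack([t.predict(X_query) for t in teachers])  # shape: (num_teachers, num_samples)
```

In this sketch each institution only shares predicted labels (votes) on query samples, never the raw medical records or the model parameters themselves.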
As a further scheme of the invention: the specific process of implementing differential privacy protection in step S3 is as follows:
SS1, the votes of the teacher models trained locally by the N medical institutions are counted at the federal global server to form an aggregation result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any specific dispersed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may reveal private information, so a differential privacy step is added before the votes form the aggregated teacher;
SS3, the aggregated teacher at the federal global server is treated as a differential privacy module: the user provides input data (i.e., samples to be predicted) and the module returns labels that protect privacy.
As a further scheme of the invention: the differential privacy step in step SS2 specifically comprises: Laplace noise is introduced to perturb the vote counts, thereby protecting privacy.
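As an illustrative sketch only (the noise scale 1.0/epsilon, the function name, and the privacy parameter are assumptions rather than values fixed by the disclosure), the Laplace perturbation of the vote counts can be written as a noisy arg-max:

```python
import numpy as np

def noisy_aggregate(votes, num_classes, epsilon=1.0, rng=None):
    """Aggregated-teacher decision for one sample (steps S3-S4): count the
    teachers' votes per class, add Laplace noise to each count, and return the
    class with the highest noisy count."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(votes, minlength=num_classes).astype(float)  # votes are integer class labels
    noisy_counts = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=num_classes)
    return int(np.argmax(noisy_counts))
```

Here `votes` is one column of the matrix returned by `record_teacher_votes` above; a smaller `epsilon` (larger noise) gives stronger differential privacy at the cost of some aggregation accuracy.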
As a further scheme of the invention: the main reasons for training the new deep neural network model, the student, in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited; if an attacker calls the aggregated teacher differential privacy module many times, the output results may still leak some private information;
(2) the deep neural network model student in step S6 is a desensitized model that can run directly on the user's device, which prevents an attacker from calling the aggregated teacher differential privacy module many times and thus reduces the risk of leakage.
As a further scheme of the invention: the deep neural network model student in step S6 is trained on the unlabeled desensitized public data set; before the student is trained, the desensitized public data set is labeled by the aggregated teacher.
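A hedged sketch of steps S5 and S6 follows; the function names are hypothetical, it reuses the `record_teacher_votes` and `noisy_aggregate` helpers sketched above, and scikit-learn's MLPClassifier again stands in for whatever deep neural network is actually used as the student:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def label_public_dataset(teachers, X_public, num_classes, epsilon=1.0):
    """S5: the aggregated teacher labels the unlabeled desensitized public data set."""
    all_votes = record_teacher_votes(teachers, X_public)   # (num_teachers, num_samples)
    return np.array([noisy_aggregate(all_votes[:, i], num_classes, epsilon)
                     for i in range(all_votes.shape[1])])

def train_student(X_public, noisy_labels):
    """S6: the student is trained only on the labeled desensitized public data,
    never on the sensitive local data sets."""
    student = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
    student.fit(X_public, noisy_labels)
    return student
```

Because the student only ever sees the public data and the noisy labels, exposing it to users (step S7) does not directly expose the aggregated teacher or the institutions' sensitive data.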
Compared with the prior art, the invention has the beneficial effects that:
1. A federal learning framework is introduced, and the limited local data sets of the medical institutions are fully used to train local teacher models. The teachers are aggregated at the federal global server into an aggregated teacher model, so the knowledge in each medical institution's local data is acquired while ensuring that no institution's data privacy is leaked;
2. Before the votes form the aggregated teacher at the federal global server, an additional differential privacy step is added: Laplace noise is introduced to perturb the vote counts. The aggregated teacher can then be regarded as a differential privacy module: the user provides input data (i.e., a sample to be predicted), and the aggregated teacher returns a privacy-protecting label;
3. Before the student is trained, the desensitized public data set is labeled by the aggregated teacher and the learned knowledge is transferred; the trained student model is then provided for the user to use, which prevents the user from calling the aggregated teacher directly many times and further reduces the risk of model stealing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a flow chart of a method for protecting a medical diagnosis model against theft and attack based on federal learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
Referring to fig. 1, a medical diagnosis model protection method for preventing theft and attack based on federal learning includes the following specific steps:
S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models;
S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting;
S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection;
S4, the prediction results of all teachers are aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, obtaining the aggregated teacher;
S5, the aggregated teacher labels the unlabeled desensitized public data set at the federal global server, obtaining a labeled desensitized public data set;
S6, a new deep neural network model, the student, is trained at the federal global server using the labeled desensitized public data set obtained in step S5;
and S7, the trained student is provided for the user to use.
The specific process of implementing differential privacy protection in step S3 is as follows:
SS1, the votes of the teacher models trained locally by the N medical institutions are counted at the federal global server to form an aggregation result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any specific dispersed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may reveal private information, so a differential privacy step is added before the votes form the aggregated teacher: Laplace noise is introduced to perturb the vote counts, thereby protecting privacy;
SS3, the aggregated teacher at the federal global server is treated as a differential privacy module: the user provides input data (i.e., samples to be predicted) and the module returns labels that protect privacy.
The main reasons for training the new deep neural network model, the student, in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited; if an attacker calls the aggregated teacher differential privacy module many times, the output results may still leak some private information;
(2) the deep neural network model student in step S6 is a desensitized model that can run directly on the user's device, which prevents an attacker from calling the aggregated teacher differential privacy module many times and thus reduces the risk of leakage.
The deep neural network model student in step S6 is trained on the unlabeled desensitized public data set; before the student is trained, the desensitized public data set is labeled by the aggregated teacher.
The working principle and usage process of the invention are as follows: first, N medical institutions respectively collect N sensitive medical data sets, and each institution independently trains a different local model on its own data set, obtaining N teacher models. The trained teachers are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting. Laplace noise is introduced to perturb the vote counts and realize differential privacy protection. The prediction results of all teachers are then aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, yielding the aggregated teacher. The aggregated teacher labels the unlabeled desensitized public data set at the federal global server to obtain a labeled desensitized public data set, which is used to train a new deep neural network model, the student. The trained student is provided for the user to use. Because the public data set is labeled by the aggregated teacher before the student is trained, the learned knowledge is transferred to the student, and providing only the trained student model to users prevents them from calling the aggregated teacher directly many times, further reducing the risk of leakage.
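For completeness, a hypothetical end-to-end use of the sketches above (variable names such as `institution_datasets`, `X_public`, `NUM_CLASSES`, and `new_patient_features` are assumed placeholders, not part of the disclosure) would look like:

```python
# Hypothetical end-to-end flow, reusing the helper sketches above.
teachers = [train_local_teacher(X_i, y_i) for X_i, y_i in institution_datasets]   # S1-S2
noisy_labels = label_public_dataset(teachers, X_public, num_classes=NUM_CLASSES)  # S3-S5
student = train_student(X_public, noisy_labels)                                   # S6
prediction = student.predict(new_patient_features)  # S7: only the student is exposed to users
```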
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change made according to the technical solution and inventive concept of the present invention, which can be easily conceived by a person skilled in the art within the technical scope disclosed by the present invention, shall fall within the protection scope of the present invention.
Claims (5)
1. A medical diagnosis model protection method for preventing stealing attacks based on federal learning is characterized by comprising the following specific steps:
S1, N medical institutions respectively collect N sensitive medical data sets, and each institution trains a different local model on its own data set, obtaining N teacher models;
S2, the trained teacher models are deployed locally at each medical institution, and the prediction result of each teacher is recorded for final result voting;
S3, Laplace noise is introduced to perturb the vote counts, realizing differential privacy protection;
S4, the prediction results of all teachers are aggregated and voted on at the federal global server, and the result with the highest vote count is selected as the final result, obtaining the aggregated teacher;
S5, the aggregated teacher labels the unlabeled desensitized public data set at the federal global server, obtaining a labeled desensitized public data set;
S6, a new deep neural network model, the student, is trained at the federal global server using the labeled desensitized public data set obtained in step S5;
and S7, the trained student is provided for the user to use.
2. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 1, characterized in that the specific process of implementing differential privacy protection in step S3 is as follows:
SS1, the votes of the teacher models trained locally by the N medical institutions are counted at the federal global server to form an aggregation result;
SS2, if most of the teacher models agree on a certain prediction result, that result does not depend on any specific dispersed data set, i.e. the privacy cost is very low; if two classes receive similar numbers of votes, the disagreement may reveal private information, so a differential privacy step is added before the votes form the aggregated teacher;
SS3, the aggregated teacher at the federal global server is treated as a differential privacy module: the user provides input data and the module returns labels that protect privacy.
3. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 2, characterized in that the differential privacy step in step SS2 specifically comprises: Laplace noise is introduced to perturb the vote counts, thereby protecting privacy.
4. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 1, characterized in that the main reasons for training the new deep neural network model student in step S6 are:
(1) the differential privacy protection of the aggregated teacher in step S4 is limited; if an attacker calls the aggregated teacher differential privacy module many times, the output results may still leak some private information;
(2) the deep neural network model student in step S6 is a desensitized model that can run directly on the user's device, which prevents an attacker from calling the aggregated teacher differential privacy module many times.
5. The method for protecting a medical diagnosis model against stealing attacks based on federal learning according to claim 1, characterized in that the deep neural network model student in step S6 is trained on the unlabeled desensitized public data set, and before the student is trained, the desensitized public data set is labeled by the aggregated teacher at the federal global server.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110422407.3A CN112967812A (en) | 2021-04-20 | 2021-04-20 | Anti-theft attack medical diagnosis model protection method based on federal learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110422407.3A CN112967812A (en) | 2021-04-20 | 2021-04-20 | Anti-theft attack medical diagnosis model protection method based on federal learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112967812A true CN112967812A (en) | 2021-06-15 |
Family
ID=76280857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110422407.3A Pending CN112967812A (en) | 2021-04-20 | 2021-04-20 | Anti-theft attack medical diagnosis model protection method based on federal learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112967812A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079174A (en) * | 2019-11-21 | 2020-04-28 | 中国电力科学研究院有限公司 | Power consumption data desensitization method and system based on anonymization and differential privacy technology |
CN112232527A (en) * | 2020-09-21 | 2021-01-15 | 北京邮电大学 | Safe distributed federal deep learning method |
CN112613065A (en) * | 2020-12-02 | 2021-04-06 | 北京明朝万达科技股份有限公司 | Data sharing method and device based on differential privacy protection |
CN112668726A (en) * | 2020-12-25 | 2021-04-16 | 中山大学 | Personalized federal learning method with efficient communication and privacy protection |
Non-Patent Citations (3)
Title |
---|
NICOLAS PAPERNOT et al.: "SEMI-SUPERVISED KNOWLEDGE TRANSFER FOR DEEP LEARNING FROM PRIVATE TRAINING DATA", ICLR, 31 December 2017 (2017-12-31), pages 1-16 *
WENQI LI et al.: "Privacy-Preserving Federated Brain Tumour Segmentation", International Workshop on Machine Learning in Medical Imaging, pages 133-141 *
GUO Zijing et al.: "A Survey of Privacy Protection for Medical and Health Big Data" (in Chinese), Journal of Frontiers of Computer Science and Technology (计算机科学与探索), vol. 15, no. 3, pages 389-402 *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114169007A (en) * | 2021-12-10 | 2022-03-11 | 西安电子科技大学 | Medical privacy data identification method based on dynamic neural network |
CN114169007B (en) * | 2021-12-10 | 2024-05-14 | 西安电子科技大学 | Medical privacy data identification method based on dynamic neural network |
CN114566289A (en) * | 2022-04-26 | 2022-05-31 | 之江实验室 | Disease prediction system based on multi-center clinical data anti-cheating analysis |
CN115881306A (en) * | 2023-02-22 | 2023-03-31 | 中国科学技术大学 | Networked ICU intelligent medical decision-making method based on federal learning and storage medium |
CN116564535A (en) * | 2023-05-11 | 2023-08-08 | 之江实验室 | Central disease prediction method and device based on local graph information exchange under privacy protection |
CN116564535B (en) * | 2023-05-11 | 2024-02-20 | 之江实验室 | Central disease prediction method and device based on local graph information exchange under privacy protection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112967812A (en) | Anti-theft attack medical diagnosis model protection method based on federal learning | |
Casey et al. | The Kodak syndrome: risks and opportunities created by decentralization of forensic capabilities | |
Arulogun et al. | RFID-based students attendance management system | |
CN108763507A (en) | Enterprise's incidence relation method for digging and device | |
CN107085812A (en) | The anti money washing system and method for block chain digital asset | |
CN109658310A (en) | One kind being used for exhibition room museum intelligent identifying system | |
CN107820210A (en) | One kind is registered method, mobile terminal and computer-readable recording medium | |
Osayande | Electronic security systems in academic libraries: A case study of three university libraries in South-West Nigeria | |
CN106980745B (en) | Method for exhibiting data and device | |
Ekere et al. | The use of ICT for security and theft prevention in two university libraries in Nigeria | |
CN101246619A (en) | Monitoring system | |
CN107507118A (en) | A kind of three-dimensional safety of person vehicle prevention and control system | |
CN110889717A (en) | Method and device for filtering advertisement content in text, electronic equipment and storage medium | |
Hull | Dirty data labeled dirt cheap: epistemic injustice in machine learning systems | |
CN109871211A (en) | Information displaying method and device | |
Kui | The stumbling balance between public health and privacy amid the pandemic in China | |
Kotoroi | Constraints Facing African Academic Libraries in Applying Electronic Security Systems to Protect Library Materials | |
Benton et al. | Using video cameras as a research tool in public spaces: addressing ethical and information governance challenges under data protection legislation | |
CN116797005A (en) | Information acquisition management system applied to campus intelligent teaching management system | |
Estivill-Castro et al. | Privacy in data mining | |
Ayofe et al. | A framework for computer aided investigation of ATM fraud in Nigeria | |
Lyon | Surveillance technologies: Trends and social implications | |
CN110309312B (en) | Associated event acquisition method and device | |
Opara et al. | Technological Methods and Security of Information Resources in Dame Patience Goodluck Jonathan Automated Library, Ignatius Ajuru University of Education | |
CN116433387A (en) | Risk prediction method, risk prediction device, computing equipment and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |