CN116611115A - Medical data diagnosis model, method, system and memory based on federated learning - Google Patents


Info

Publication number
CN116611115A
Authority
CN
China
Prior art keywords
model
medical data
local
training
diagnosis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310889420.9A
Other languages
Chinese (zh)
Inventor
吴艳平
马韵洁
王佐成
王飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Data Space Research Institute
Original Assignee
Data Space Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Data Space Research Institute
Priority to CN202310889420.9A
Publication of CN116611115A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the technical field of medical data diagnosis and machine learning, and in particular to a medical data diagnosis model, method, system and memory based on federated learning. The medical data diagnosis model is learned through federated aggregation: the diagnosis models of the hospitals are trained jointly through federated learning without sharing any hospital database, and noise and perturbation are added during the aggregation of local model parameters, so that the training of the diagnosis model is privacy-preserving and the risk of medical data leakage is avoided; at the same time, federated learning improves the training accuracy of the model. Noise is added to the global parameters and to the local model parameters respectively, which further reduces the possibility of obtaining the data of any medical unit by analyzing the model, and adding noise multiple times reduces the privacy budget of model training.

Description

Medical data diagnosis model, method, system and memory based on federated learning
Technical Field
The invention relates to the technical field of medical data diagnosis and machine learning, and in particular to a medical data diagnosis model, method, system and memory based on federated learning.
Background
With the popularization of machine learning, the automatic diagnosis, prediction and classification of diagnosis and treatment data offer a new direction for the development of the medical field. However, the medical databases of individual hospitals are independent of one another, and medical data are highly private, so a shared medical database cannot be established. At present, many large hospitals train medical data diagnosis models on their own medical databases, but because the data of a single hospital are limited and different hospitals specialize in different areas, a medical data diagnosis model obtained by machine learning on a single hospital's database performs poorly.
Federated learning, a form of distributed machine learning, can substantially protect the private data of clients from exposure. Nonetheless, private information can still be revealed by analyzing the parameters uploaded by a client, such as the weights trained in a deep neural network.
Disclosure of Invention
In order to overcome the defect in the prior art that medical data cannot be shared, which limits the machine learning of medical data diagnosis models, the invention provides a training method of a medical data diagnosis model based on federated learning, by which a high-accuracy medical data diagnosis model can be obtained through machine learning while privacy and security are ensured.
The invention provides a training method of a medical data diagnosis model based on federated learning, which comprises the following steps:
S1, acquire participants, each participant having a local medical database; take the part to be trained in the medical data diagnosis model of each participant as the local model; the local model performs diagnosis on the input medical data to obtain a diagnosis result; the local models of all participants have the same structure;
S2, the server assigns the global parameter w(0) to the local models of all participants; each participant trains its medical data diagnosis model locally, and after the local training is finished, the parameters of the local model of the i-th participant are recorded as w(i,0); w(0) is the initialized global parameter;
S3, at time t, each participant uploads the parameters w(i,t) of its local model to the server; the server performs federated aggregation to obtain the global parameter w(t,0), adds noise to w(t,0), and records the noise-added global parameter as w(t,1); the initial value of t is 0;
S4, each local model updates its parameters using the local medical database, the global parameter w(t,1) and a set optimization objective; the updated parameters of the local model of the i-th participant are recorded as w(i,t+1,1);
S5, determine whether t+1 is greater than or equal to a set value T; if so, the training of each local model is complete, and each local model is substituted into the medical data diagnosis model of the corresponding participant, thereby fixing the medical data diagnosis model of each participant; if not, execute step S6;
S6, add noise and perturbation to the parameters w(i,t+1,1) of each local model, and record the parameters of the local model of the i-th participant after the noise and perturbation are added as w(i,t+1); update t to t+1 and return to S3;
$w(i,t+1) = |w(i,t+1,1)| \times L(r) + n_D(i,t+1)$
where L(r) and r are intermediate terms, $r = w(i,t+1,1)/|w(i,t+1,1)|$;
the value of L(r) is determined as follows:
take a random number x in [0,1];
if $x < (e^{\varepsilon}-1)/(e^{\varepsilon}+1)$, L(r) is a random number drawn from the interval $[\,r/2-(C-1)/2,\ r/2+(C-1)/2\,]$;
if $x \ge (e^{\varepsilon}-1)/(e^{\varepsilon}+1)$, L(r) is a random number drawn from the interval $[\,-(r/2+(C-1)/2),\ (C-1)/2-r/2\,]$;
C is an intermediate term, $C = (e^{\varepsilon}+1)/(e^{\varepsilon}-1)$; ε is the set privacy budget; $n_D(i,t+1)$ is the noise added at iteration t+1 of the local model of the i-th participant; e is the base of the natural logarithm.
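The control flow of steps S2 to S6 can be sketched as follows. This is a minimal illustration rather than the patented implementation: `federated_training`, `aggregate`, `local_train` and `perturb` are hypothetical names, and the three callables stand in for the aggregation of S3, the local training of S2/S4 and the noise-and-perturbation step of S6.

```python
def federated_training(init_global, participants, T, aggregate, local_train, perturb):
    """Run the S2-S6 loop and return (local parameters, last global parameter).

    aggregate(list_of_w)            -> noised global parameter w(t,1)   (S3)
    local_train(p, w_start, w_glob) -> updated local parameters         (S2, S4)
    perturb(w)                      -> noised, perturbed parameters     (S6)
    """
    # S2: every participant starts from w(0) and trains locally, giving w(i,0).
    w_local = [local_train(p, list(init_global), list(init_global))
               for p in participants]
    t = 0
    while True:
        # S3: upload w(i,t); the server aggregates and adds noise.
        w_global = aggregate(w_local)
        # S4: local update against the noised global parameter, giving w(i,t+1,1).
        w_local = [local_train(p, w, w_global)
                   for p, w in zip(participants, w_local)]
        # S5: stop once t+1 reaches the set value T.
        if t + 1 >= T:
            return w_local, w_global
        # S6: add noise and perturbation, giving w(i,t+1), then advance t.
        w_local = [perturb(w) for w in w_local]
        t += 1
```

Passing the three steps in as callables keeps the loop independent of any particular noise mechanism or model, which is an illustrative design choice and not something the source prescribes.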
Preferably, the global parameter update formulas in S3 are:
$w(t,0) = \sum_{i=1}^{N} w(i,t)$
$w(t,1) = w(t,0) + n_D(t)$
N is the number of participants and i is the participant index; $n_D(t)$ is the set noise.
Preferably, the noise $n_D(t)$ obeys a Gaussian distribution $N(0,\sigma^2)$ with expectation 0 and variance $\sigma^2$, where $\sigma^2 = [2\times\ln(1.25/\delta)]/\varepsilon^2$; δ is the set differential privacy significance level and ε is the set privacy budget.
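As a worked illustration of the variance formula, assume δ = 0.1 (the value set in the embodiment described later) and ε = 0.5 (one of the privacy budgets tested there). The figures are rounded and serve only to show the arithmetic.

```latex
\sigma^2 = \frac{2\ln(1.25/\delta)}{\varepsilon^2}
         = \frac{2\ln(12.5)}{0.5^2}
         \approx \frac{2 \times 2.5257}{0.25}
         \approx 20.21,
\qquad
\sigma \approx 4.50
```

Because ε appears squared in the denominator, halving the privacy budget quadruples the noise variance.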
Preferably, in S1, the participants are divided into aggregation objects and receivers, and the aggregation objects are divided into J levels; w(t,1) in S3 is calculated as follows:
$w(j,t,c) = \left[\sum_{i \in Z_j} w(i,t)\right] / n(Z_j)$
$w(t,0) = \left[\sum_{j=1}^{J} p(j) \times w(j,t,c)\right] / J$
$w(t,1) = w(t,0) + n_D(t)$
w(j,t,c) represents the aggregation parameter of the j-th level, $Z_j$ represents the set of participants that are aggregation objects within the j-th level, and $n(Z_j)$ represents the number of participants in $Z_j$; c denotes hierarchical aggregation; J is the number of levels, p(j) is the set weight of the j-th level, and 1 ≤ j ≤ J; $n_D(t)$ is the set noise.
Preferably, the optimization objective in S4 is to minimize the function $F(w(i)) + \gamma \times |w(i) - w(t,1)|$; w(i) represents the parameters of the local model of the i-th participant, and F(w(i)) represents the loss of the local model of the i-th participant at the parameters w(i); γ denotes the set regularization parameter.
Preferably, $n_D(i,t+1)$ obeys a Gaussian distribution $N(0,\sigma(i)^2)$ with expectation 0 and variance $\sigma(i)^2$;
$\sigma(i)^2 = [2\times\ln(1.25/\delta)]/[\varepsilon^2 \times N \times m(i)]$
where δ is the set differential privacy significance level, N is the number of participants, and m(i) is the sensitivity of the medical data diagnosis model of the i-th participant.
Preferably, in S4, at time t, the server sends the global parameter w (t, 1) to the local model of each participant, the local model parameter is updated to the mean value of the current model parameter w (i, t) and the global parameter w (t, 1), and then the medical data diagnosis model performs local training in combination with the optimization target and the local medical database.
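The averaging step that precedes local training can be written in a few lines. The sketch below assumes the parameters are flat lists of floats; `average_with_global` is a hypothetical name, not one used by the source.

```python
def average_with_global(w_local, w_global):
    """Return the coordinate-wise mean of w(i,t) and w(t,1).

    The result is the starting point from which the participant then
    continues local training on its own medical database.
    """
    if len(w_local) != len(w_global):
        raise ValueError("local and global parameters must have the same length")
    return [(a + b) / 2.0 for a, b in zip(w_local, w_global)]
```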
The invention also provides a medical data diagnosis method based on federated learning, which performs medical data diagnosis with the above medical data diagnosis model and thereby improves the level of medical service. The method comprises the following steps:
SA1, a medical unit acts as a participant and executes the above training method of the medical data diagnosis model based on federated learning, so as to complete the training of its medical data diagnosis model;
SA2, the medical unit inputs the medical data to be diagnosed into the trained medical data diagnosis model and takes the output of the model as the diagnosis result.
The invention also provides a medical data diagnosis system and a memory based on federated learning, which provide a carrier for the diagnosis method and facilitate the application and popularization of the medical data diagnosis model. The system comprises a memory and a processor; the memory stores a computer program, the processor is connected to the memory, and the processor executes the computer program to implement the medical data diagnosis method based on federated learning.
The invention also provides a memory storing a computer program which, when executed, implements the medical data diagnosis method based on federated learning.
The invention has the advantages that:
(1) Without sharing any hospital database, the medical data diagnosis models of the hospitals are trained jointly through federated learning; noise and perturbation are added during the aggregation of local model parameters, which keeps the training of the diagnosis model private and avoids the risk of medical data leakage; at the same time, federated learning improves the training accuracy of the model.
(2) Noise is added to the global parameters and to the local model parameters respectively, which further reduces the possibility of obtaining the data of any medical unit by analyzing the model, and adding noise multiple times reduces the privacy budget of model training.
(3) Dividing the participants and aggregating them hierarchically reduces the difficulty and the amount of computation of parameter aggregation and improves computational efficiency. In addition, the division into levels makes it possible to control the data weights of different medical units, which improves the generalization of the trained global parameters and, in turn, the accuracy and generalization of the medical data diagnosis model of each participant.
(4) In a multi-user scenario, the invention reduces the accuracy loss of the model under a smaller privacy budget and improves the usability of the model. Under the premise of privacy protection, the training method provided by the invention obtains more accurate global parameters and model diagnosis results, and performs better in multi-user scenarios.
Drawings
FIG. 1 illustrates the training method of the medical data diagnosis model based on federated learning;
FIG. 2 is a statistical diagram of test accuracy according to an embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Definitions of terms:
Medical database: used to store medical test information labeled with diagnosis results; the medical test information consists of test reports, such as special test reports and physical examination reports; the diagnosis results are disease categories, such as diabetes, diabetic complications, kidney stones and hypertension.
Medical data diagnosis model: a neural network model whose input is medical test information and whose output is a diagnosis result.
First training method of the medical data diagnosis model based on federated learning
The training method of the medical data diagnosis model based on federated learning provided in this embodiment trains the medical data diagnosis model without sharing the medical data of the hospitals.
Referring to fig. 1, the training method includes the following steps S1 to S6.
S1, acquire participants, each participant having a local medical database; take the part to be trained in the medical data diagnosis model of each participant as the local model; the local model performs diagnosis on the input medical data to obtain a diagnosis result; the local models of all participants have the same structure.
Specifically, the medical data diagnosis model can take either of two forms.
A first medical data diagnosis model, whose input is preprocessed medical data and whose output is a diagnosis result; the first medical data diagnosis model is, in its entirety, the local model.
A second medical data diagnosis model, comprising a preprocessing module and a diagnosis module; the preprocessing module preprocesses the input medical data, the input of the diagnosis module is the output of the preprocessing module, and the output of the diagnosis module is the diagnosis result. In the second medical data diagnosis model, the preprocessing module is a pre-trained model and only the parameters of the diagnosis module are updated during training; that is, only the diagnosis module serves as the local model.
In this embodiment, data preprocessing includes data cleaning, data conversion and data standardization. Data cleaning includes removing null values, outliers, duplicate values and the like, which ensures the accuracy and integrity of the data; data conversion includes numericalization, encoding, feature selection and the like, which facilitates algorithm training and analysis; data standardization includes normalization, dimensionality reduction and the like. Data preprocessing helps improve the efficiency and accuracy of model training. All of these preprocessing steps serve to optimize the performance and accuracy of the model; any existing data preprocessing method, or any combination of existing methods, may be used and is not described further here.
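Two of the preprocessing steps named above, data cleaning and data standardization, might look like the following on purely numeric records. This is a minimal sketch under stated assumptions: records are lists of floats with `None` marking a null value, outlier removal and feature encoding are omitted, and the function names are hypothetical since the source leaves the concrete method open.

```python
import statistics


def clean(records):
    """Drop records containing null values and exact duplicates, keeping order."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(v is None for v in rec):
            continue  # null value: discard the whole record
        key = tuple(rec)
        if key in seen:
            continue  # repeated value: keep only the first copy
        seen.add(key)
        cleaned.append(list(rec))
    return cleaned


def standardize(records):
    """Z-score every column; a constant column is mapped to 0.0."""
    columns = list(zip(*records))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(rec, means, stds)]
        for rec in records
    ]
```

For example, `standardize(clean(raw))` yields records whose non-constant columns each have mean 0 and population standard deviation 1.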
S2, the server assigns the global parameter w(0) to the local models of all participants, and the medical data diagnosis model of each participant is trained locally; after the local training is finished, the parameters of the local model of the i-th participant are recorded as w(i,0); w(0) is the initialized global parameter.
Specifically, w(0) is either a randomly initialized parameter or a weighted average of the local models in the participants' existing medical data diagnosis models.
S3, at time t, each participant uploads the parameters w(i,t) of its local model to the server; the server performs federated aggregation to obtain the global parameter w(t,0), adds noise to w(t,0), and records the noise-added global parameter as w(t,1); the initial value of t is 0;
$w(t,0) = \sum_{i=1}^{N} w(i,t)$
$w(t,1) = w(t,0) + n_D(t)$
N is the number of participants and i is the participant index; $n_D(t)$ is the set noise.
Specifically, in this embodiment, the noise $n_D(t)$ obeys a Gaussian distribution $N(0,\sigma^2)$ with expectation 0 and variance $\sigma^2$, and:
$\sigma^2 = [2\times\ln(1.25/\delta)]/\varepsilon^2$
where δ is the set differential privacy significance level and ε is the set privacy budget.
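A minimal sketch of the server-side step S3 follows, assuming each w(i,t) is a flat list of floats and that independent Gaussian noise is drawn per coordinate (the source does not say how the noise is shaped). The function names and the `rng` argument are hypothetical. As in the formula above, the aggregation is a plain sum.

```python
import math
import random


def gaussian_sigma(epsilon, delta):
    """Standard deviation from sigma^2 = 2 * ln(1.25 / delta) / epsilon^2."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def aggregate_with_noise(local_params, epsilon, delta, rng=random):
    """Return (w(t,0), w(t,1)) for the uploaded parameters w(i,t).

    w(t,0) is the coordinate-wise sum over participants;
    w(t,1) adds N(0, sigma^2) noise to every coordinate (an assumption).
    """
    dim = len(local_params[0])
    w_t0 = [sum(w[k] for w in local_params) for k in range(dim)]
    sigma = gaussian_sigma(epsilon, delta)
    w_t1 = [v + rng.gauss(0.0, sigma) for v in w_t0]
    return w_t0, w_t1
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing privacy budgets.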
S4, each local model updates its parameters using the local medical database, the global parameter w(t,1) and the optimization objective; the updated parameters of the local model of the i-th participant are recorded as w(i,t+1,1);
the optimization objective is to minimize the function $F(w(i)) + \gamma \times |w(i) - w(t,1)|$;
the optimization objective is expressed as:
$w(i,t+1,1) \leftarrow \arg\min_{w(i)} \{ F(w(i)) + \gamma \times |w(i) - w(t,1)| \}$
w(i) represents the parameters of the local model of the i-th participant, and argmin denotes the value of w(i) at which the expression in braces attains its minimum; F(w(i)) represents the loss of the local model of the i-th participant at the parameters w(i); γ denotes the set regularization parameter.
It is worth noting that only the local model within the medical data diagnosis model needs its parameters updated, so the local training of the medical data diagnosis model is the local training of the local model, and the loss of the local model is the loss of the medical data diagnosis model.
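One way to approach the minimization in S4 is plain (sub)gradient descent on the regularized objective. The sketch below is illustrative only: it assumes |·| is the Euclidean norm, that the gradient of F is available as a callable, and that a fixed learning rate and step count are acceptable. The source does not specify the optimizer, and every name here is hypothetical.

```python
import math


def proximal_objective(w, w_global, loss_fn, gamma):
    """F(w) + gamma * |w - w_global|, with |.| taken as the Euclidean norm."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_global)))
    return loss_fn(w) + gamma * dist


def local_update(w, w_global, grad_fn, gamma, lr=0.05, steps=200):
    """Subgradient descent on F(w) + gamma * |w - w_global|."""
    w = list(w)
    for _ in range(steps):
        diff = [a - b for a, b in zip(w, w_global)]
        dist = math.sqrt(sum(d * d for d in diff))
        grad = grad_fn(w)
        for k in range(len(w)):
            # Subgradient of the norm term; taken as 0 where w equals w_global.
            penalty = gamma * diff[k] / dist if dist > 0 else 0.0
            w[k] -= lr * (grad[k] + penalty)
    return w
```

The penalty term pulls each participant's parameters toward the noised global parameter w(t,1), and γ sets how strongly local training is allowed to drift from it.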
S5, determine whether t+1 is greater than or equal to a set value T; if so, the training of each local model is complete, and each local model is substituted into the medical data diagnosis model of the corresponding participant, thereby fixing the medical data diagnosis model of each participant; if not, execute step S6;
S6, add noise and perturbation to the parameters w(i,t+1,1) of each local model, and record the parameters of the local model of the i-th participant after the noise and perturbation are added as w(i,t+1); update t to t+1 and return to S3.
Specifically, w(i,t+1) in S6 is obtained as follows:
$w(i,t+1) = |w(i,t+1,1)| \times L(r) + n_D(i,t+1)$
where L(r) and r are intermediate terms, $r = w(i,t+1,1)/|w(i,t+1,1)|$;
the value of L(r) is determined as follows:
take a random number x in [0,1];
if $x < (e^{\varepsilon}-1)/(e^{\varepsilon}+1)$, L(r) is a random number drawn from the interval $[\,r/2-(C-1)/2,\ r/2+(C-1)/2\,]$;
if $x \ge (e^{\varepsilon}-1)/(e^{\varepsilon}+1)$, L(r) is a random number drawn from the interval $[\,-(r/2+(C-1)/2),\ (C-1)/2-r/2\,]$;
C is an intermediate term, $C = (e^{\varepsilon}+1)/(e^{\varepsilon}-1)$; ε is the set privacy budget; e is the base of the natural logarithm.
$n_D(i,t+1)$ is the noise added at iteration t+1 of the local model of the i-th participant; $n_D(i,t+1)$ obeys a Gaussian distribution $N(0,\sigma(i)^2)$ with expectation 0 and variance $\sigma(i)^2$, and:
$\sigma(i)^2 = [2\times\ln(1.25/\delta)]/[\varepsilon^2 \times N \times m(i)]$
where δ is the set differential privacy significance level, N is the number of participants, and m(i) is the sensitivity of the medical data diagnosis model of the i-th participant.
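The perturbation of S6 can be sketched as below. Several points are assumptions rather than statements of the source: |·| is read as the Euclidean norm of the parameter vector, the random number x and the draw of L(r) are made independently for each coordinate, the draw within the interval is uniform, and an all-zero parameter vector (for which r is undefined) receives noise only. The function names are hypothetical.

```python
import math
import random


def local_sigma(epsilon, delta, n_participants, sensitivity):
    """Std. dev. from sigma(i)^2 = 2*ln(1.25/delta) / (epsilon^2 * N * m(i))."""
    var = 2.0 * math.log(1.25 / delta) / (epsilon ** 2 * n_participants * sensitivity)
    return math.sqrt(var)


def perturb(w, epsilon, sigma_i, rng=random):
    """Compute w(i,t+1) = |w| * L(r) + n_D coordinate by coordinate."""
    e_eps = math.exp(epsilon)
    threshold = (e_eps - 1.0) / (e_eps + 1.0)
    C = (e_eps + 1.0) / (e_eps - 1.0)
    half_width = (C - 1.0) / 2.0
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # r is undefined for a zero vector; assumption: add noise only.
        return [rng.gauss(0.0, sigma_i) for _ in w]
    out = []
    for v in w:
        r = v / norm
        x = rng.random()
        # First interval is centred at r/2, second at -r/2, same half-width.
        centre = r / 2.0 if x < threshold else -r / 2.0
        L = rng.uniform(centre - half_width, centre + half_width)
        out.append(norm * L + rng.gauss(0.0, sigma_i))
    return out
```

A smaller ε lowers the threshold and widens both intervals through C, so the perturbed value carries less information about the original direction r.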
Second training method of the medical data diagnosis model based on federated learning
The second training method improves on the first training method. Specifically, compared with the first training method, steps S1 to S3 are improved, and the improved steps are implemented as follows.
S1, acquire participants, select aggregation objects from the participants and divide the aggregation objects into J levels; the participants other than the aggregation objects serve as receivers; each participant has a local medical database and a medical data diagnosis model, the medical data diagnosis model being used to diagnose medical data to obtain a diagnosis result. The part to be trained in the medical data diagnosis model serves as the local model; the local models of all participants have the same structure.
S2, the server sends the global parameter w(0) to the local model of each participant; the medical data diagnosis model updates its model parameters to the mean of the current model parameters w(i,0) and the global parameter w(0), and is then trained locally on the local medical database; after the local training is finished, the parameters of the local model of the i-th participant are recorded as w(i,0); w(0) is the initialized global parameter.
S3, at time t, the local models of the participants within each level are aggregated within that level to obtain the parameters of each level, the level parameter of the j-th level being denoted w(j,t,c); the server performs federated aggregation on the level parameters to obtain the global parameter w(t,0), adds noise to w(t,0), and records the noise-added global parameter as w(t,1); the initial value of t is 0;
$w(j,t,c) = \left[\sum_{i \in Z_j} w(i,t)\right] / n(Z_j)$
$Z_j$ represents the set of participants within the j-th level, and $n(Z_j)$ represents the number of participants in $Z_j$; c denotes hierarchical aggregation;
$w(t,0) = \left[\sum_{j=1}^{J} p(j) \times w(j,t,c)\right] / J$
$w(t,1) = w(t,0) + n_D(t)$
J is the number of levels, p(j) is the set weight of the j-th level, and 1 ≤ j ≤ J; $n_D(t)$ is the set noise.
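The two-stage aggregation of the second method can be illustrated as follows, assuming flat lists of floats. `hierarchical_aggregate`, `levels` and `weights` are hypothetical names; the final noise addition that turns w(t,0) into w(t,1) is the same as in the first method and is left out here.

```python
def hierarchical_aggregate(levels, weights):
    """Return (level parameters w(j,t,c), global parameter w(t,0)).

    levels  : list of J lists; levels[j] holds the vectors w(i,t) of the
              aggregation objects in the j-th level.
    weights : list of J set weights p(j).
    """
    J = len(levels)
    if J == 0 or len(weights) != J:
        raise ValueError("need one weight per level")
    dim = len(levels[0][0])
    # Intra-level aggregation: mean over the participants of each level.
    level_params = [
        [sum(w[k] for w in members) / len(members) for k in range(dim)]
        for members in levels
    ]
    # Server aggregation: weighted sum over levels, divided by J.
    w_t0 = [
        sum(p * lp[k] for p, lp in zip(weights, level_params)) / J
        for k in range(dim)
    ]
    return level_params, w_t0
```

With two levels of weights 1.0 each, a level holding `[1.0, 2.0]` and `[3.0, 4.0]` averages to `[2.0, 3.0]`, and together with a single-member level `[10.0, 20.0]` the global parameter is `[6.0, 11.5]`.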
The medical data diagnosis model obtained by the first training method of the medical data diagnosis model based on federated learning is verified below with a specific embodiment.
In this embodiment, the differential privacy significance level is set to δ = 0.1.
In this embodiment, 3 tertiary hospitals and 2 secondary hospitals are selected as participants, and the local medical database of each participant is an Excel-based relational database, that is, the medical test information is presented in Excel tables.
In this embodiment, the medical data diagnosis model of each participant is the second medical data diagnosis model.
In this embodiment, the local model of each participant is trained with the first training method described above, so as to obtain the final medical data diagnosis model of the participant.
Finally, the server substitutes the local models of the participants' final medical data diagnosis models into S3 and calculates the global parameter w(t,1) as the final global parameter; a medical test model is constructed with the final global parameter and is used to perform medical data diagnosis for any primary hospital. The medical test model is a medical data diagnosis model whose local model uses the final global parameter.
In this embodiment, the medical database of each participant is divided into a training set and a test set; during federated learning, the local model learns only from the training set to update the model parameters.
In this embodiment, two evaluation indexes are used, namely training accuracy and test accuracy.
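The split and the two indexes might be computed as in the sketch below. The source does not state the split ratio, how records are assigned to each set, or how accuracy is averaged, so the shuffled split, the `test_ratio` parameter and the unweighted mean over hospitals are all assumptions, and the function names are hypothetical.

```python
import random


def split_train_test(records, test_ratio, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predictions, labels):
    """Fraction of records whose predicted diagnosis equals the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_accuracy(per_hospital):
    """Unweighted mean of the per-hospital accuracies at one hospital level."""
    return sum(per_hospital) / len(per_hospital)
```

Training accuracy is `accuracy` evaluated on a hospital's training set and test accuracy is the same function evaluated on its held-out test set.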
In this embodiment, the diagnostic accuracy of the hospitals at each level was measured at privacy budgets of 0.7, 0.6, 0.5 and 0.4, as shown in Tables 1 and 2 below.
Table 1: Accuracy of the medical data diagnosis model in hospitals at each level
In Table 1, the tertiary hospital training accuracy average is the average diagnostic accuracy of the final medical data diagnosis models of the tertiary hospitals on their corresponding training sets;
the tertiary hospital test accuracy average is the average diagnostic accuracy of the final medical data diagnosis models of the tertiary hospitals on their corresponding test sets;
the secondary hospital training accuracy average is the average diagnostic accuracy of the final medical data diagnosis models of the secondary hospitals on their corresponding training sets;
the secondary hospital test accuracy average is the average diagnostic accuracy of the final medical data diagnosis models of the secondary hospitals on their corresponding test sets;
the primary hospital test accuracy average is the average diagnostic accuracy of the medical test model on the medical databases of a plurality of primary hospitals;
none of the primary hospitals is a participant.
As can be seen from Table 2, with the training method provided by the invention, when the privacy budget is as low as 0.4, that is, when security is higher, the test accuracy of the participants' medical data diagnosis models is above 82%, and the model accuracy still reaches 78% on the primary hospitals, which are not participants. When the privacy budget is higher, that is, when security is lower, the test accuracy of the participants' medical data diagnosis models is above 88%, and the model accuracy reaches 86% on the non-participant primary hospitals. The training method provided by the invention can therefore achieve high-accuracy model training without sharing medical data, and the final global parameters obtained by the invention generalize well.
It will be understood by those skilled in the art that the present invention is not limited to the details of the foregoing exemplary embodiments, but includes other specific forms of the same or similar structures that may be embodied without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is organized by embodiments, not every embodiment contains only one independent technical solution; the specification is written this way for clarity only, the disclosure is not limited to the embodiments described in detail, and the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.
Technologies, shapes and structural parts of the invention that are not described in detail are known in the art.

Claims (10)

1. A training method for a medical data diagnosis model based on federal learning, characterized by comprising the following steps:
S1, acquiring participants, each participant having a local medical database; acquiring the part to be trained in each participant's medical data diagnosis model as a local model; the local model performs diagnosis on input medical data to obtain a diagnosis result; the local models of all participants have the same structure;
S2, the server assigns the global parameters w(0) to the local models of all participants, and each participant trains its medical data diagnosis model locally; after local training, the parameters of the local model of the i-th participant are recorded as w(i,0); w(0) denotes the initialized global parameters;
S3, at time t, each local model uploads its parameters w(i,t) to the server, which performs federal aggregation to obtain the global parameters w(t,0), adds noise to w(t,0), and records the noised global parameters as w(t,1); the initial value of t is 0;
S4, each local model updates its parameters using the local medical database, the global parameters w(t,1), and a set optimization target; the updated parameters of the local model of the i-th participant are recorded as w(i,t+1,1);
S5, judging whether t+1 is greater than or equal to a set value T; if yes, training of each local model is complete, and each local model is substituted into the corresponding participant's medical data diagnosis model so as to fix that model; if not, executing step S6;
S6, adding noise and perturbation to the parameters w(i,t+1,1) of each local model, and recording the perturbed parameters of the local model of the i-th participant as w(i,t+1); updating t to t+1 and returning to S3;
$w(i,t+1) = |w(i,t+1,1)| \times L(r) + n_D(i,t+1)$
where L(r) and r are intermediate terms, $r = w(i,t+1,1)/|w(i,t+1,1)|$;
the value of L(r) is determined as follows:
a random number x is taken in [0,1];
if $x < (e^{\varepsilon}-1)/(e^{\varepsilon}+1)$, L(r) is a random number taken from the interval $[\,r/2-(C-1)/2,\ r/2+(C-1)/2\,]$;
if $x \ge (e^{\varepsilon}-1)/(e^{\varepsilon}+1)$, L(r) is a random number taken from the interval $[\,-(r/2+(C-1)/2),\ (C-1)/2-r/2\,]$;
C is an intermediate term, $C = (e^{\varepsilon}+1)/(e^{\varepsilon}-1)$; $\varepsilon$ is the set privacy budget; $n_D(i,t+1)$ is the noise added at iteration t+1 to the local model of the i-th participant; e is the natural constant (the base of the natural logarithm).
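The perturbation in step S6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `perturb_parameters` is hypothetical, the rule for L(r) is applied independently to each component of the parameter vector, one random number x is drawn per component, all draws are uniform, |w| is taken as the Euclidean norm, and the noise standard deviation `sigma` is passed in by the caller.

```python
import math
import random


def perturb_parameters(w, epsilon, sigma, rng=random):
    """Return |w| * L(r) + n_D for a parameter vector w (list of floats).

    Assumptions (not stated in the claim): component-wise application,
    uniform sampling inside each interval, Euclidean norm for |w|.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # r is undefined for a zero vector; only the noise term is returned here.
        return [rng.gauss(0.0, sigma) for _ in w]

    e_eps = math.exp(epsilon)
    threshold = (e_eps - 1.0) / (e_eps + 1.0)   # branch probability
    c = (e_eps + 1.0) / (e_eps - 1.0)           # intermediate term C
    half_width = (c - 1.0) / 2.0

    perturbed = []
    for v in w:
        r = v / norm                             # component of the unit vector r
        x = rng.random()                         # random number x in [0, 1)
        if x < threshold:
            l_r = rng.uniform(r / 2 - half_width, r / 2 + half_width)
        else:
            l_r = rng.uniform(-(r / 2 + half_width), half_width - r / 2)
        perturbed.append(norm * l_r + rng.gauss(0.0, sigma))
    return perturbed
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing privacy budgets.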
2. The training method of a federal learning-based medical data diagnosis model according to claim 1, wherein the global parameter update formula in S3 is:
N is the number of participants and i is the participant index; $n_D(t)$ is a set noise.
3. The training method of a federal learning-based medical data diagnosis model according to claim 2, wherein the noise $n_D(t)$ follows a Gaussian distribution $N(0, \sigma^2)$ with mean 0 and variance $\sigma^2$; $\sigma^2 = [2 \times \ln(1.25/\delta)]/\varepsilon^2$, where $\delta$ is the set differential privacy significance level and $\varepsilon$ is the set privacy budget.
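As an illustration of claims 2 and 3, the sketch below computes the noise variance from ε and δ and adds Gaussian noise to an aggregated parameter vector. The update formula of claim 2 is not reproduced in this text, so the plain mean over the N participants used here is an assumption, as are the function names `gaussian_variance` and `aggregate_with_noise`.

```python
import math
import random


def gaussian_variance(epsilon, delta):
    """Variance sigma^2 = 2 * ln(1.25 / delta) / epsilon^2, as given in claim 3."""
    return 2.0 * math.log(1.25 / delta) / (epsilon ** 2)


def aggregate_with_noise(local_params, epsilon, delta, rng=random):
    """Average the uploaded w(i, t) and add Gaussian noise n_D(t).

    local_params: list of N parameter vectors (lists of floats).
    The unweighted mean is an assumed form of the aggregation step.
    """
    n = len(local_params)
    dim = len(local_params[0])
    sigma = math.sqrt(gaussian_variance(epsilon, delta))
    # w(t, 0): aggregated global parameters before noise
    w_t0 = [sum(p[k] for p in local_params) / n for k in range(dim)]
    # w(t, 1): global parameters after adding noise
    return [v + rng.gauss(0.0, sigma) for v in w_t0]
```

Note that the variance grows quickly as ε shrinks, so a small privacy budget translates directly into noisier global parameters.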
4. The training method of a federal learning-based medical data diagnosis model according to claim 1, wherein in S1 the participants are divided into aggregation objects and receivers, and the aggregation objects are divided into J levels; w(t,1) in S3 is calculated as follows:
$w(j,t,c) = \left[\sum_{i \in Z_j} w(i,t)\right] / n(Z_j)$
$w(t,0) = \left[\sum_{j=1}^{J} p(j) \times w(j,t,c)\right] / J$
$w(t,1) = w(t,0) + n_D(t)$
w(j,t,c) represents the aggregation parameter of the j-th level, $Z_j$ represents the set of participants serving as aggregation objects within the j-th level, and $n(Z_j)$ represents the number of participants in $Z_j$; c denotes hierarchical aggregation; J is the number of levels, p(j) is the set weight of the j-th level, and 1 ≤ j ≤ J; $n_D(t)$ is a set noise.
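The hierarchical aggregation of claim 4 can be sketched as below. The function name `hierarchical_aggregate` and the list-of-lists data layout are assumptions made for illustration; the noise term $n_D(t)$ is left out so that the result corresponds to w(t,0).

```python
def hierarchical_aggregate(levels, weights):
    """Compute w(t, 0) from per-level participant parameters.

    levels:  list of J levels; each level is a list of parameter vectors
             w(i, t) uploaded by the aggregation objects in Z_j.
    weights: list of J set weights p(j).
    """
    num_levels = len(levels)
    dim = len(levels[0][0])
    total = [0.0] * dim
    for members, p_j in zip(levels, weights):
        n_zj = len(members)
        # w(j, t, c): mean of the parameters within level j
        level_mean = [sum(w[k] for w in members) / n_zj for k in range(dim)]
        for k in range(dim):
            total[k] += p_j * level_mean[k]
    # w(t, 0): weighted sum of level means divided by the number of levels J
    return [v / num_levels for v in total]


# Two levels: two participants in the first, one in the second.
w_t0 = hierarchical_aggregate(
    [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]],
    [1.0, 1.0],
)
```

With both weights set to 1, the result is the average of the two level means, so a level with few participants counts as much as a large one.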
5. The training method of a federal learning-based medical data diagnosis model according to claim 1, wherein the optimization target in S4 is to minimize the function $F(w(i)) + \gamma \times |w(i) - w(t,1)|$; w(i) represents the parameters of the local model of the i-th participant, and F(w(i)) represents the loss of the local model of the i-th participant at the parameters w(i); $\gamma$ denotes the set regularization parameter.
6. The training method of a federal learning-based medical data diagnosis model according to claim 1, wherein $n_D(i,t+1)$ follows a Gaussian distribution $N(0, \sigma(i)^2)$ with mean 0 and variance $\sigma(i)^2$;
$\sigma(i)^2 = [2 \times \ln(1.25/\delta)] / [\varepsilon^2 \times N \times m(i)]$
wherein $\delta$ is the set differential privacy significance level; N is the number of participants; m(i) is the sensitivity of the medical data diagnosis model of the i-th participant.
7. The training method of a federal learning-based medical data diagnosis model according to claim 5, wherein in S4, at time t, the server transmits the global parameters w(t,1) to the local model of each participant; the local model parameters are first set to the mean of the current model parameters w(i,t) and the global parameters w(t,1), and the medical data diagnosis model is then trained locally using the optimization target and the local medical database.
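A minimal sketch of the local update in claims 5 and 7 is given below. The claims do not name an optimizer, so plain (sub)gradient descent is used here as a stand-in; the function name `local_update`, the learning rate `lr`, the step count `steps`, and the caller-supplied `grad_loss` (the gradient of F) are all assumptions.

```python
import math


def local_update(w_local, w_global, grad_loss, gamma, lr=0.1, steps=10):
    """Return updated local parameters w(i, t+1, 1).

    w_local:   current local parameters w(i, t)
    w_global:  noised global parameters w(t, 1)
    grad_loss: callable returning the gradient of the local loss F at w
    gamma:     set regularization parameter
    """
    # Claim 7: start from the mean of w(i, t) and w(t, 1).
    w = [(a + b) / 2.0 for a, b in zip(w_local, w_global)]
    for _ in range(steps):
        g = grad_loss(w)
        diff = [a - b for a, b in zip(w, w_global)]
        dist = math.sqrt(sum(d * d for d in diff))
        # Subgradient of gamma * |w - w(t, 1)|; taken as zero when w == w(t, 1).
        reg = [gamma * d / dist if dist > 0 else 0.0 for d in diff]
        w = [a - lr * (gi + ri) for a, gi, ri in zip(w, g, reg)]
    return w
```

A larger γ keeps the local parameters closer to w(t,1), trading local fit for agreement with the global model.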
8. A medical data diagnosis method based on federal learning, comprising the following steps:
SA1, a medical unit, acting as a participant, performs the training method of the federal learning-based medical data diagnosis model according to any one of claims 1 to 7 to complete training of its medical data diagnosis model;
SA2, the medical unit inputs the medical data to be diagnosed into the trained medical data diagnosis model and takes the output of the model as the diagnosis result.
9. A federal learning-based medical data diagnosis system, comprising a memory in which a computer program is stored and a processor coupled to the memory, the processor executing the computer program to implement the federal learning-based medical data diagnosis method of claim 8.
10. A memory, characterized in that a computer program is stored therein, the computer program, when executed, implementing the federal learning-based medical data diagnosis method according to claim 8.
CN202310889420.9A 2023-07-20 2023-07-20 Medical data diagnosis model, method, system and memory based on federal learning Pending CN116611115A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310889420.9A CN116611115A (en) 2023-07-20 2023-07-20 Medical data diagnosis model, method, system and memory based on federal learning

Publications (1)

Publication Number Publication Date
CN116611115A true CN116611115A (en) 2023-08-18

Family

ID=87683950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310889420.9A Pending CN116611115A (en) 2023-07-20 2023-07-20 Medical data diagnosis model, method, system and memory based on federal learning

Country Status (1)

Country Link
CN (1) CN116611115A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117411683A (en) * 2023-10-17 2024-01-16 中国人民解放军国防科技大学 Method and device for identifying low orbit satellite network attack based on distributed federal learning

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361694A (en) * 2021-06-30 2021-09-07 哈尔滨工业大学 Layered federated learning method and system applying differential privacy protection
WO2021208721A1 (en) * 2020-11-23 2021-10-21 平安科技(深圳)有限公司 Federated learning defense method, apparatus, electronic device, and storage medium
CN114169010A (en) * 2021-12-13 2022-03-11 安徽理工大学 Edge privacy protection method based on federal learning
CN114254386A (en) * 2021-12-13 2022-03-29 北京理工大学 Federated learning privacy protection system and method based on hierarchical aggregation and block chain
CN114357526A (en) * 2022-03-15 2022-04-15 中电云数智科技有限公司 Differential privacy joint training method for medical diagnosis model for resisting inference attack
CN114462090A (en) * 2022-02-18 2022-05-10 北京邮电大学 Tightening method for differential privacy budget calculation in federal learning
CN114841364A (en) * 2022-04-14 2022-08-02 北京理工大学 Federal learning method capable of meeting personalized local differential privacy requirements
CN115358487A (en) * 2022-09-21 2022-11-18 国网河北省电力有限公司信息通信分公司 Federal learning aggregation optimization system and method for power data sharing
CN115496198A (en) * 2022-08-05 2022-12-20 广州大学 Gradient compression framework for adaptive privacy budget allocation based on federal learning
CN115563650A (en) * 2022-10-14 2023-01-03 电子科技大学 Privacy protection system for realizing medical data based on federal learning
US20230047092A1 (en) * 2021-07-30 2023-02-16 Oracle International Corporation User-level Privacy Preservation for Federated Machine Learning
EP4149134A1 (en) * 2021-09-09 2023-03-15 Telefonica Digital España, S.L.U. Method and system for providing differential privacy using federated learning
CN115952533A (en) * 2022-11-18 2023-04-11 湖南科技大学 Personalized federal learning and recognition method and system based on differential privacy
CN116227631A (en) * 2022-12-07 2023-06-06 西京学院 Federal learning method and system for classification prediction of connection data of Internet of vehicles terminal
CN116363449A (en) * 2023-03-07 2023-06-30 沈阳理工大学 Image recognition method based on hierarchical federal learning

Similar Documents

Publication Publication Date Title
Zhou et al. Evidential reasoning approach with multiple kinds of attributes and entropy-based weight assignment
CN109935336B (en) Intelligent auxiliary diagnosis system for respiratory diseases of children
Wang et al. Risk assessment of coronary heart disease based on cloud-random forest
CN109636061A (en) Training method, device, equipment and the storage medium of medical insurance Fraud Prediction network
US6609118B1 (en) Methods and systems for automated property valuation
Tripathy et al. A framework for intelligent medical diagnosis using rough set with formal concept analysis
CN110085327A (en) Multichannel LSTM neural network Influenza epidemic situation prediction technique based on attention mechanism
CN114298234B (en) Brain medical image classification method and device, computer equipment and storage medium
CN116611115A (en) Medical data diagnosis model, method, system and memory based on federal learning
CN106846326A (en) Image partition method based on multinuclear local message FCM algorithms
CN105808906A (en) Method for analyzing individual characteristics of patient and apparatus therefor
CN116741411A (en) Intelligent health science popularization recommendation method and system based on medical big data analysis
Gross et al. Systemic test and evaluation of a hard+ soft information fusion framework: Challenges and current approaches
Chen et al. Hierarchical Bayesian model with inequality constraints for US county estimates
Dutta Detecting Lung Cancer Using Machine Learning Techniques.
JP7365747B1 (en) Disease treatment process abnormality identification system based on hierarchical neural network
Goutham et al. Brain tumor classification using Efficientnet-B0 model
CN111798455A (en) Thyroid nodule real-time segmentation method based on full convolution dense cavity network
CN116091412A (en) Method for segmenting tumor from PET/CT image
Borst et al. Comparative evaluation of the comparable sales method with geostatistical valuation models
Sitepu et al. Analysis of Fuzzy C-Means and Analytical Hierarchy Process (AHP) Models Using Xie-Beni Index
CN111914952A (en) AD characteristic parameter screening method and system based on deep neural network
Dinara et al. Fine-tuning the hyperparameters of pre-trained models for solving multiclass classification problems
Goodman et al. Fuzzy ARTMAP neural network compared to linear discriminant analysis prediction of the length of hospital stay in patients with pneumonia
Zhou et al. An improved U-Net for nerve fibre segmentation in confocal corneal microscopy images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination