CN116595584A - Physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning - Google Patents


Publication number
CN116595584A
Authority
CN
China
Prior art keywords
data
privacy
model
parameters
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310565639.3A
Other languages
Chinese (zh)
Inventor
王新建
武洛生
杨建设
胡婕婷
郑敏
穆鹏远
朱元利
路铭达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XI'AN INSTITUTE OF PHYSICAL EDUCATION
Original Assignee
XI'AN INSTITUTE OF PHYSICAL EDUCATION
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XI'AN INSTITUTE OF PHYSICAL EDUCATION
Priority to CN202310565639.3A
Publication of CN116595584A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/40 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Bioethics (AREA)
  • Surgery (AREA)
  • Urology & Nephrology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a privacy protection method for physical-medical data fusion (the fusion of sports and medical data) based on longitudinal (vertical) federated learning in a cloud-fog architecture. It addresses the problem that the data held by exercise guidance centers and hospitals are siloed and privacy-sensitive. The method uses differential privacy to protect the privacy of both data and models, and uses a central server to aggregate the model parameters, thereby avoiding the risk of data disclosure and of model privacy leakage. The method has three stages: (1) a data preprocessing stage, in which relevant features are selected from the data sets of the exercise guidance center and the hospital and processed; (2) a longitudinal federated learning stage, in which a specific neural network architecture is adopted, individual data are protected by a differential privacy mechanism, the processed data are sent to the central server for model training, and a shared global model is finally produced; (3) a model prediction stage, in which the jointly trained model is used for fitness and health guidance, disease prediction, and decision making, so as to improve prediction accuracy.

Description

Physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning
Technical Field
The invention belongs to the field of data privacy protection and machine learning for the integration of sports and medicine, and in particular relates to a physical-medical data fusion privacy protection method based on longitudinal federated learning.
Background
With the rapid development of big data and artificial intelligence, more and more medical institutions and sports and fitness institutions have begun to share their data for joint modeling, in order to improve the accuracy and generalization of their models. Such data sharing often involves personal privacy, however, and protecting that privacy without sacrificing model quality has become an important problem.
To address this problem, data fusion and privacy protection methods based on federated learning have been developed. Longitudinal federated learning jointly trains on and analyzes data while protecting privacy. It allows several data sources to learn together, without sharing raw data, and to produce a model with predictive capability. In longitudinal federated learning, each data source contributes only part of the information about each sample, and that information is not visible to the other data sources. Thus, even if an attacker obtains information from some data sources, that information alone does not allow the attacker to infer the information held by the others. Longitudinal federated learning can therefore help guarantee data privacy and security.
In the field of physical-medical data fusion, a data fusion privacy protection method based on longitudinal federated learning can be applied to the joint analysis of exercise data and medical data from different populations. Combined with chronic disease prevention and control, it supports building an exercise prescription library for the personalized health needs of different populations and for intervention in individual chronic diseases. Drawing on data held by different institutions, such as basic information on service recipients, health examinations, physical fitness tests, health status monitoring and assessment, intervention guidance, and plan implementation, it enables key information to be shared between Internet of Things terminals and health service systems. This gives a better understanding of the relationship between exercise and physical condition, lets users follow the changes in their physical health indicators and the effect of exercise interventions throughout an exercise cycle, and further improves people's health and athletic performance. Because personal private data are shared, data privacy and security are also critical. Differential privacy can protect personal privacy while preserving the accuracy and usefulness of the data. Applying a data fusion privacy protection method based on longitudinal federated learning to physical-medical data fusion is therefore feasible, and it provides an effective and accurate data fusion service for promoting the deep integration of national fitness and national health.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a physical-medical data fusion privacy protection method based on longitudinal federated learning in a cloud-fog architecture. While protecting the privacy of data and models, the method can improve model performance and prediction accuracy, can be applied to data fusion, collaborative learning, and privacy protection in different fields, and has practical value and application prospects.
A physical-medical data fusion privacy protection method based on longitudinal federated learning in a cloud-fog architecture, characterized by comprising the following steps:
step one, collecting and processing the raw data of each participant in federated learning, specifically as follows:
step 1.1, first, the exercise guidance center and the hospital collect the raw data of all participants in federated learning and clean it; after cleaning, the data are preprocessed to make them better suited to machine learning algorithms: the data formats, labels, and value ranges of the different participants are unified into a standardized format, and the effects of differing scales and of outliers are removed, so that the data of different participants have similar distributions and characteristics, which improves the effectiveness and reliability of federated learning;
step 1.2, random noise is added to the preprocessed data by means of differential privacy, and privacy-sensitive features are partitioned and encrypted, protecting the privacy of the raw data and preventing the disclosure of sensitive personal information; numerical and non-numerical data require different methods;
step 1.3, finally, the cleaned and preprocessed data are stored in a database or file system to form local data sets for later analysis and model training, the exercise guidance center and the hospital holding the local data sets $D_{\mathrm{center}}$ and $D_{\mathrm{hospital}}$ respectively;
step two, the exercise guidance center and the hospital each train a model on their own local data set to obtain a local model;
the specific process is as follows:
step 2.1, the exercise guidance center and the hospital each sample and select user data of the participants in federated learning and perform encrypted ID matching and alignment to obtain the list of user IDs common to both parties; the data of these users at both institutions are used for federated learning;
step 2.2, the two parties then homomorphically encrypt the user data and train local models on their local data sets using the longitudinal federated learning mechanism;
step 2.3, during local model training, noise is added in the gradient update algorithm as required by differential privacy; that is, whenever model parameters are updated, noise is added to the gradient of each participant so as to protect data privacy; the gradient update of each participant can be expressed in the following form:
$$\Delta w_t = \frac{1}{n}\sum_{i=1}^{n}\bigl(f_t(x_i; w_t) - y_i\bigr)\,x_i + \lambda w_t + \mathcal{N}(0, \sigma_t^2)$$
where $\Delta w_t$ is the update amount of the parameters, $n$ the number of samples in the training set, $y_i$ the label of sample $i$, $x_i$ the features of sample $i$, $f_t$ the prediction function of the current model, $w_t$ the parameters of the current model, $\lambda$ the regularization parameter, $\sigma_t$ the standard deviation, and $\mathcal{N}(0, \sigma_t^2)$ noise with mean 0 and variance $\sigma_t^2$;
step three, the exercise guidance center and the hospital both upload their local models to the federated server; noise is added to the local model parameters before upload, and the noised parameters are sent to the federated server for the parameter update;
step four, the federated server performs longitudinal federated learning on the uploaded local models to generate a global model, specifically as follows:
during longitudinal federated learning, the federated server aggregates the local models of all participants to update the global model parameters and thereby improve model accuracy and performance; while the global model parameters are being updated, random noise is added by means of differential privacy to protect the privacy of each participant's local model;
step five, the exercise guidance center and the hospital input their local data sets into the global model, train the global model, and then use the trained global model to make predictions on the data, obtaining prediction results;
step six, the exercise guidance center and the hospital upload the prediction results to the federated server;
before uploading, each institution applies differential privacy to its prediction results, that is, it uploads prediction results with random noise added rather than the original ones; the magnitude of the noise is controlled by the privacy parameter of differential privacy, and the Laplace mechanism is used to add random noise to each prediction result;
namely:
$$\tilde{f}(D) = f(D) + \mathrm{Lap}(0, \Delta f/\varepsilon)$$
where $\tilde{f}(D)$ is the prediction result after random noise is added, $f(D)$ is the original prediction result, $\mathrm{Lap}$ denotes the Laplace distribution, $\Delta f$ is the sensitivity of the function $f$ to changes in the data set $D$, and $\varepsilon$ is the privacy parameter. This formula adds a calibrated amount of noise to each prediction result, thereby protecting its privacy;
step seven, after receiving the noised prediction results from the different participants, the federated server aggregates the parameters by weighted averaging, encrypts the aggregated prediction result to produce the final prediction result, and returns the result to each participant;
step eight, the trained global model is tested and validated to verify its predictive performance; specifically, the data set is divided into several parts, each time one part is used as the test set and the remaining parts as the training set, the model is trained on the training set and tested on the test set, and the performance metrics of the model are computed; this is repeated several times and the metrics are averaged; the process can be repeated until the global model reaches the expected accuracy, thereby realizing scientific fitness and health guidance based on physical-medical data fusion.
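The validation procedure of step eight corresponds to what is commonly called k-fold cross-validation. The following Python sketch illustrates it under stated assumptions: the helper names (`k_fold_indices`, `cross_validate`, `train_majority`, `accuracy`) and the toy majority-label model are hypothetical stand-ins introduced for this illustration and are not part of the disclosed method.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, train_fn, score_fn):
    """Average the score over k rounds; each fold is the test set exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score_fn(model,
                               [xs[i] for i in test_idx],
                               [ys[i] for i in test_idx]))
    return mean(scores)

# Hypothetical stand-in model: always predict the majority training label.
def train_majority(xs, ys):
    return 1 if sum(ys) * 2 >= len(ys) else 0

def accuracy(model, xs, ys):
    return sum(1 for y in ys if y == model) / len(ys)

xs = list(range(10))
ys = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
avg_accuracy = cross_validate(xs, ys, 5, train_majority, accuracy)
```

In practice `train_fn` would wrap the federated training procedure and `score_fn` whichever performance metric is chosen; the averaging logic stays the same.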
Further, the data cleaning means handling missing values, outliers, and duplicate values in the raw data to ensure the accuracy and reliability of subsequent data analysis: missing values are filled by interpolation; outliers are detected and repaired using statistical methods or machine learning algorithms; duplicate values are deleted or merged.
Further, the preprocessing comprises data centering, data scaling, and data normalization.
Further, the data centering means: different participants measure patients' height and weight in different units, so the data are in different units; centering shifts the mean of the data to zero, that is, the mean of each feature over the entire data set is subtracted from each value of that feature, so that the data of different participants have similar distributions with respect to the unit of measurement.
Further, the data scaling means: different participants measure patients' blood glucose with different measuring equipment, so the data ranges differ; scaling shrinks or enlarges the data so that every feature has the same range and the data of different participants have similar distributions with respect to measurement precision.
Further, the data normalization means: different participants sample patients over different time periods, so the data are distributed differently along the time axis; normalization confines the data to a fixed range, avoiding bias, so that the data of different participants have similar distributions along the time axis.
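The three preprocessing operations above can be illustrated with a minimal Python sketch. The function names and the sample values are assumptions made for this illustration; the specification does not fix the exact scaling rule, so unit-variance scaling and min-max normalization are used here as common choices.

```python
from statistics import mean, pstdev

def center(values):
    """Data centering: subtract the feature mean so the mean becomes zero."""
    mu = mean(values)
    return [v - mu for v in values]

def scale_to_unit_variance(values):
    """Data scaling (assumed rule): divide by the population standard deviation."""
    sd = pstdev(values)
    return [v / sd for v in values] if sd > 0 else list(values)

def min_max_normalize(values, lo=0.0, hi=1.0):
    """Data normalization (assumed rule): map values linearly into [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

heights_cm = [160.0, 170.0, 180.0]        # hypothetical sample values
centered = center(heights_cm)
scaled = scale_to_unit_variance(centered)
normalized = min_max_normalize(heights_cm)
```

Centering on its own does not reconcile differing units; the sketch assumes that values have already been converted to a common unit, or that centering is followed by the scaling step.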
Further, in step one, for numerical data the Laplace mechanism is used to add random perturbation noise to the result; for a function $f$, the Laplace mechanism adds to each $f(x)$ a noise term drawn from the Laplace distribution $\mathrm{Lap}(0, \Delta f/\varepsilon)$, where $\varepsilon$ is the privacy parameter; specifically, for a query $q$ and a database $A$, the query result computed by the Laplace mechanism is:
$$q(A) = f(A) + \mathrm{Lap}(0, \Delta f/\varepsilon)$$
where $f(A)$ is the query result on the original data, $\Delta f$ is the sensitivity, i.e. how strongly the function $f$ responds to a change in the data set, and $\mathrm{Lap}(0, b)$ denotes the Laplace distribution with mean 0 and scale parameter $b$ (standard deviation $\sqrt{2}\,b$); because the added noise is symmetric about zero, the query result is unbiased and its mean absolute error stays at the constant level $\Delta f/\varepsilon$.
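A minimal Python sketch of the Laplace mechanism described above follows. The function names and the example query (a count with sensitivity 1 and ε = 0.5) are assumptions chosen for illustration. The noise is sampled as the difference of two independent exponential variables, which follows a Laplace distribution with the same scale.

```python
import random

def laplace_noise(scale, rng):
    """Draw from Lap(0, scale) as the difference of two Exp(1/scale) draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value perturbed with Lap(0, sensitivity / epsilon) noise."""
    return true_value + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)                      # fixed seed for repeatability

# Hypothetical query: a patient count of 120 with sensitivity 1, epsilon 0.5.
noisy_count = laplace_mechanism(120.0, 1.0, 0.5, rng)

# Empirical check: the mean absolute noise approaches the scale b = 1 / 0.5 = 2.
samples = [laplace_noise(2.0, rng) for _ in range(20000)]
mean_abs_error = sum(abs(s) for s in samples) / len(samples)
mean_error = sum(samples) / len(samples)
```

A smaller ε gives a larger scale Δf/ε and therefore stronger perturbation, which is the privacy and utility trade-off the privacy parameter controls.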
Further, in the second step, for non-numerical data the exponential mechanism is used: a scoring function is introduced, the scores of the discrete candidate outputs are enumerated and computed, and the scores are normalized into probability values; specifically, given a data set $D$ and a scoring function $u(D, r)$ over the candidate outputs $r$, the exponential mechanism releases each candidate $r$ with probability:
$$\Pr[r] = \frac{\exp\bigl(\varepsilon\,u(D, r)/(2\Delta f)\bigr)}{\sum_{r'}\exp\bigl(\varepsilon\,u(D, r')/(2\Delta f)\bigr)}$$
where $\varepsilon$ is the privacy parameter, which expresses the degree of privacy protection, and $\Delta f$ is the sensitivity, which expresses how strongly the scoring function responds to a change in the data set; the randomness of the exponential mechanism gives the released result a degree of uncertainty, which provides the privacy protection.
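The exponential mechanism for a categorical attribute can be sketched in Python as follows. This uses the standard formulation, in which a candidate with score s is selected with probability proportional to exp(ε·s / (2·sensitivity)). The function name, the category labels, and the counts are hypothetical values introduced only for this illustration.

```python
import math
import random

def exponential_mechanism(candidates, score_fn, sensitivity, epsilon, rng):
    """Pick one candidate with probability proportional to exp(eps*score/(2*sens))."""
    scores = [score_fn(c) for c in candidates]
    top = max(scores)                        # subtract the max for numerical stability
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    probabilities = [w / total for w in weights]
    choice = rng.choices(candidates, weights=probabilities, k=1)[0]
    return choice, probabilities

# Hypothetical example: release the most common exercise intensity category.
counts = {"low": 30, "moderate": 50, "high": 20}
candidates = list(counts)
rng = random.Random(0)
choice, probs = exponential_mechanism(candidates, counts.get, 1.0, 0.1, rng)
```

Higher-scoring categories are more likely to be released, but every category keeps a nonzero probability, so the output does not reveal the exact counts.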
Further, the specific method of step three is as follows: let the local model parameters of participant $i$ be $\theta_i$; to satisfy differential privacy, participant $i$ adds noise following the Laplace distribution to the local model parameters $\theta_i$, where the noise $\Delta\theta_i$ follows the Laplace distribution $\mathrm{Lap}(0, b)$ and is calibrated to satisfy $(\varepsilon, \delta)$-differential privacy; the noised local model parameters that participant $i$ sends to the federated server are therefore $\tilde{\theta}_i$, i.e.
$$\tilde{\theta}_i = \theta_i + \Delta\theta_i$$
where $\Delta\theta_i$ is the Laplace-distributed noise, whose probability density function is:
$$p(x) = \frac{1}{2b}\exp\!\left(-\frac{|x|}{b}\right)$$
after receiving the noised local model parameters $\tilde{\theta}_i$ sent by participant $i$, the federated server computes the global model parameters $\theta_{\mathrm{global}}$ as a weighted average, i.e.
$$\theta_{\mathrm{global}} = \frac{\sum_{i} w_i\,\tilde{\theta}_i}{\sum_{i} w_i}$$
where $w_i$ is the weight of participant $i$, determined by factors such as the amount and quality of the participant's data.
Further, the specific method of step four is as follows: assume there are $m$ participants, each of which has trained a model $M_i$ locally; the local models are uploaded to the federated server under differential privacy, and the federated server then combines them into a global model $M$ and updates the parameters of $M$ to optimize the global model;
let the gradient of the local model $M_i$ in the $t$-th iteration be $g_t^i$, the parameters of the global model $M$ before the $t$-th iteration be $\theta_{t-1}$, and the gradient of the global model $M$ in the $t$-th iteration be $g_t$; under differential privacy, random noise is added to each gradient $g_t^i$ when the global model updates its parameters, protecting the privacy of each participant while maintaining the accuracy of the model; specifically, the update of the global model can be expressed as:
$$g_t = \frac{1}{m}\sum_{i=1}^{m}\bigl(g_t^i + \mathcal{N}(0, \sigma^2 I)\bigr), \qquad \theta_t = \theta_{t-1} - \eta\,g_t$$
where $\eta$ is the learning rate, $\mathcal{N}(0, \sigma^2 I)$ is noise with mean 0 and covariance $\sigma^2 I$, and $I$ is the identity matrix; in this way, during the global parameter update, differential privacy adds noise to the gradient of each participant, protecting each participant's privacy while preserving the accuracy of the global model.
Further, the training of the global model in step five can be described as the following multi-round iterative process:
(1) Initially, the exercise guidance center and the hospital randomly initialize their local model parameters $\theta_{\mathrm{center}}^{(0)}$ and $\theta_{\mathrm{hospital}}^{(0)}$, respectively;
(2) In each iteration, the exercise guidance center and the hospital update their local model parameters and upload them to the federated server, namely:
$$\theta_i^{(t)} = \theta_i^{(t-1)} - \eta\,\nabla f\bigl(\theta_i^{(t-1)}\bigr), \qquad i \in \{\mathrm{center}, \mathrm{hospital}\}$$
where $t$ is the iteration round, $\eta$ the learning rate, $f(\cdot)$ the loss function, and $\nabla f$ the gradient of the loss function;
(3) In each iteration, the global model server adds noise to the parameters of each participant so that the released model parameters do not reveal individual private information; with $\theta_{\mathrm{center}}^{(t)}$ and $\theta_{\mathrm{hospital}}^{(t)}$ denoting the local model parameters of the exercise guidance center and the hospital respectively, the differentially private versions $\tilde{\theta}_{\mathrm{center}}^{(t)}$ and $\tilde{\theta}_{\mathrm{hospital}}^{(t)}$ may be calculated as:
$$\tilde{\theta}_i^{(t)} = \theta_i^{(t)} + \mathrm{Lap}(0, b), \qquad i \in \{\mathrm{center}, \mathrm{hospital}\}$$
(4) After receiving the differentially private local model parameters uploaded by the participants, the global model server updates the global model parameters $\theta^{(t)}$ by weighted averaging:
$$\theta^{(t)} = \frac{\sum_{i=1}^{n} w_i\,\tilde{\theta}_i^{(t)}}{\sum_{i=1}^{n} w_i}$$
where $n$ is the number of participants and $w_i$ is the weight of the $i$-th participant, which can generally be determined from the amount and quality of that participant's data.
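The multi-round process described above (local update, noise addition, weighted averaging) can be simulated with a short Python sketch. Several simplifications are assumptions of this illustration and not of the specification: the model has a single scalar parameter, each participant's loss is the toy quadratic 0.5·(θ − c)², every round starts from the current global parameter, and the noise is Laplace-distributed.

```python
import random

def laplace_noise(scale, rng):
    """Lap(0, scale) noise; a scale of 0 disables the perturbation."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def federated_rounds(targets, weights, rounds, lr, noise_scale, rng, theta=0.0):
    """Run local gradient steps, perturb the uploads, and average them by weight."""
    for _ in range(rounds):
        uploads = []
        for c in targets:
            grad = theta - c                  # gradient of the toy loss 0.5*(theta - c)^2
            local = theta - lr * grad         # local parameter update
            uploads.append(local + laplace_noise(noise_scale, rng))
        theta = sum(w * u for w, u in zip(weights, uploads)) / sum(weights)
    return theta

# Hypothetical setup: two participants with optima 0 and 4 and weights 1 and 3.
clean = federated_rounds([0.0, 4.0], [1.0, 3.0], 50, 0.5, 0.0, random.Random(0))
noisy = federated_rounds([0.0, 4.0], [1.0, 3.0], 50, 0.5, 0.1, random.Random(0))
```

Without noise the global parameter converges to the weighted mean of the participants' optima (3.0 in this setup); with noise it fluctuates around that value, which illustrates the accuracy cost of the perturbation.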
Further, the specific method of step seven is as follows: assume there are $m$ participants, the local model parameters of participant $i$ being $\theta_i$ and its weight $w_i$; the global model parameters are computed as:
$$\theta_{\mathrm{global}} = \frac{\sum_{i=1}^{m} w_i\,\theta_i}{\sum_{i=1}^{m} w_i}$$
where the weight $w_i$ expresses the weight of participant $i$ in training and can be adjusted dynamically according to the amount and quality of that participant's data; the denominator is the sum of the weights of all participants, which ensures that the global model parameters are a weighted average.
Further, the method includes testing and validating with the global model to verify predictive performance: each time one part of the data is used as the test set and the remainder as the training set, the model is trained on the training set and tested on the test set, and the performance metrics of the model are computed.
Further, the participants include medical, fitness, physical examination, and rehabilitation institutions; by joining the federated learning network, each institution can share data and knowledge with the others, improving the capacity for comprehensive data analysis and decision making.
The training data and treatment data include morphological indicators, physiological and biochemical indicators, kinematic indicators, kinetic indicators, medical records, and the like, and are subjected to privacy protection processing.
The model training process uses differential privacy to add noise to model parameters, gradients, updates, and other information exchanged during federated learning, so as to protect users' private data.
The noise level in the differential privacy technique is chosen to balance privacy protection against data utility.
The encryption algorithm is based on homomorphic encryption, which allows data to be processed in encrypted form without revealing their content, thereby protecting users' private data; because the key is retained, only authorized users can decrypt and obtain the complete model parameters.
The longitudinal federated learning model established by the exercise guidance center and the hospital is a neural network model.
The computing devices used in the model building and prediction steps include servers, PCs, sports bracelets, sports watches, mobile terminals, and the like.
The invention aims to solve the problem that the data of exercise guidance centers and hospitals are siloed and privacy-sensitive in cloud-fog architecture research. It uses differential privacy to protect the privacy of data and models, and uses the federated server to aggregate the model parameters, thereby avoiding the risk of data disclosure and of model privacy leakage.
Drawings
FIG. 1 is a schematic diagram of the longitudinal federated learning data fusion privacy protection of the present invention;
FIG. 2 is a schematic diagram of the three-party model training process among the exercise guidance center, the hospital, and the federated center of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In this embodiment, one hospital and one exercise guidance center are selected. The hospital's data set is an electronic medical record data set covering more than 1000 patients, with features such as age, sex, height, weight, and blood glucose, and sensitive attributes such as whether the patient has diabetes. The exercise guidance center's data set covers the same number of patients but holds different features: lifestyle features such as diet, exercise program, exercise duration, exercise intensity, and exercise frequency. Through longitudinal federated learning the two institutions can fuse their data into a more comprehensive data set and build a more accurate disease prediction model.
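Because the two institutions hold different features for the same patients, their records must first be aligned on a common set of patient IDs without exchanging the IDs in the clear. The Python sketch below illustrates the idea with a keyed hash (HMAC-SHA256) over a pre-shared key. This is a simplified stand-in chosen for illustration: the specification does not name the matching protocol, a keyed hash offers weaker protection than a dedicated private set intersection protocol, and the key, the ID values, and the function names are all hypothetical.

```python
import hashlib
import hmac

def blind_id(user_id, shared_key):
    """Replace a raw ID with its keyed hash so raw IDs are never exchanged."""
    return hmac.new(shared_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

def align_ids(own_ids, peer_blinded, shared_key):
    """Return own IDs whose keyed hash also appears in the peer's blinded set."""
    return sorted(uid for uid in own_ids
                  if blind_id(uid, shared_key) in peer_blinded)

shared_key = b"hypothetical-pre-shared-key"
hospital_ids = ["p001", "p002", "p003", "p005"]     # hypothetical patient IDs
center_ids = ["p002", "p003", "p004", "p005"]

# The exercise guidance center sends only blinded IDs to the hospital.
center_blinded = {blind_id(uid, shared_key) for uid in center_ids}
common = align_ids(hospital_ids, center_blinded, shared_key)
```

Only the patients found in both lists take part in the joint training; records outside the intersection stay local.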
First, the data are preprocessed: the data are grouped by feature, data with the same feature are encrypted, and noise is added to the encrypted data by means of differential privacy, protecting the privacy of the sensitive features and of the model.
The exercise guidance center and the hospital collect all the data and classify and organize it. Because the raw data may contain missing, abnormal, or duplicate entries, data cleaning is required.
Specifically, the following data cleaning operations are required:
Filling in missing data: missing entries are filled by interpolation or other methods to ensure the completeness of the data.
Processing duplicate data: for exact duplicates, only one copy is kept and the others are deleted. For duplicates that are incorrectly formatted or lack key information, the differing entries are merged and invalid data are deleted.
Repairing abnormal data: abnormal data are detected and repaired using statistical methods or machine learning algorithms. The box-plot method is used to detect outliers, which are then repaired with the mean, the median, or a similar statistic. In addition, a machine-learning clustering algorithm is used to cluster the data set; data that belong to no cluster are treated as outliers and repaired with the cluster mean.
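The three cleaning operations above can be sketched in Python as follows. The sketch makes several assumptions for illustration: missing values are marked as `None` and filled by linear interpolation, outliers are detected with the usual box-plot rule (outside 1.5 times the interquartile range) and repaired with the median of the remaining values, and records are plain dictionaries. The function names and sample values are hypothetical, and the clustering-based repair is not shown.

```python
from statistics import median, quantiles

def fill_missing_linear(values):
    """Fill None entries by linear interpolation (needs at least one known value)."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for i, v in enumerate(values):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is None:
                filled[i] = values[right]      # no left neighbor: copy the right one
            elif right is None:
                filled[i] = values[left]       # no right neighbor: copy the left one
            else:
                frac = (i - left) / (right - left)
                filled[i] = values[left] + frac * (values[right] - values[left])
    return filled

def repair_outliers_boxplot(values):
    """Replace values outside the 1.5*IQR whiskers with the median of the inliers."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    replacement = median(v for v in values if low <= v <= high)
    return [v if low <= v <= high else replacement for v in values]

def drop_exact_duplicates(records):
    """Keep the first copy of each fully identical record."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

glucose = [5.1, None, 5.5, 5.3, 25.0, 5.2, 5.4]    # hypothetical readings
filled = fill_missing_linear(glucose)
repaired = repair_outliers_boxplot(filled)
records = drop_exact_duplicates(
    [{"id": 1, "w": 70}, {"id": 1, "w": 70}, {"id": 2, "w": 65}])
```

Here the missing second reading is interpolated from its neighbors and the reading of 25.0 is flagged by the box-plot rule and replaced.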
After both the exercise guidance center and the hospital have cleaned their data, the data are preprocessed to make them better suited to machine learning algorithms, as follows:
Data centering: different institutions measure patients' height and weight in different units, so the data are in different units. Centering shifts the mean of the data to zero, so that the data of different institutions have similar distributions with respect to the unit of measurement.
Data scaling: different institutions measure patients' blood glucose with different measuring devices, so the data ranges differ. Scaling shrinks or enlarges the data so that every feature has the same range and the data of different institutions have similar distributions with respect to measurement precision.
Data normalization: different institutions sample patients over different time periods, so the data are distributed differently along the time axis. Normalization confines the data to a fixed range, avoiding bias, so that the data of different institutions have similar distributions along the time axis.
Finally, the cleaned and preprocessed data are stored in a database or file system for subsequent analysis and for training the machine learning model.
Table 1. Hospital-side data preprocessing
Table 2. Exercise guidance center data preprocessing
Second, the longitudinal federated learning model is built: the parties train the model by exchanging encrypted data and build it jointly, and the gradient update algorithm makes the parameter updates more reasonable and accurate.
The exercise guidance center and the hospital each split their own data set into a training set and a test set and train models on their respective training sets. For local training, the exercise guidance center and the hospital may use different model architectures or the same one. In the invention, the model is built using a neural-network-based longitudinal federated learning method. Specifically, each participant trains its own model locally and uploads the model parameters to a central server for aggregation, yielding a joint model. This approach can use the data of different institutions for model training and can obtain a model with higher accuracy and generalization capability while preserving privacy and security.
Table 3 Neural network structure employed in the present invention
| Layer | Input | Output | Activation function |
| --- | --- | --- | --- |
| Input layer | Feature vector | - | - |
| Hidden layer 1 | - | 64 | ReLU |
| Hidden layer 2 | 64 | 32 | ReLU |
| Output layer | 32 | Target variable | Sigmoid |
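The structure in Table 3 can be illustrated with a plain NumPy forward pass. This is a sketch under assumptions: the input width of 10 features, the single sigmoid output unit, the weight initialization, and all function names are hypothetical and not taken from the patent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_features, rng):
    """Create (weights, bias) pairs for layers n_features -> 64 -> 32 -> 1."""
    sizes = [n_features, 64, 32, 1]
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Apply ReLU on both hidden layers and a sigmoid on the output layer."""
    h = x
    for w, b in params[:-1]:
        h = relu(h @ w + b)
    w, b = params[-1]
    return sigmoid(h @ w + b)

rng = np.random.default_rng(0)
params = init_params(n_features=10, rng=rng)  # 10 features is an assumed width
batch = rng.normal(size=(4, 10))              # four hypothetical samples
probs = forward(params, batch)                # one probability per sample
```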
Each participant trains its own model locally and applies differential privacy processing to the model parameters: noise is added in the gradient update algorithm, as differential privacy requires, to protect data privacy. The encrypted model parameters are then uploaded to a central server for aggregation. The central server also applies a differential privacy method during aggregation to protect the model parameters and data privacy.
Table 4 Differential privacy parameters employed in the present invention and their values
| Parameter | Value |
| --- | --- |
| ε | 0.01 |
| δ | 1e-5 |
| Sensitivity | 0.1 |
Where ε represents the privacy budget, δ represents the leakage probability, and Sensitivity represents the Sensitivity of the query function.
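To show how the values in Table 4 could translate into gradient noise, the sketch below uses the classical Gaussian-mechanism calibration σ = Δf·sqrt(2·ln(1.25/δ))/ε, which applies for 0 < ε < 1. The patent does not state which calibration it uses, so this formula, the clipping step, the use of the sensitivity as the clipping norm, and the function names are all assumptions.

```python
import math
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    """Noise standard deviation from the classical Gaussian-mechanism bound."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def privatize_gradient(grad, clip_norm, sigma, rng):
    """Clip the gradient to clip_norm (assumed sensitivity), then add noise."""
    grad = np.asarray(grad, dtype=float)
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / norm) if norm > 0 else grad
    return clipped + rng.normal(0.0, sigma, size=grad.shape)

# Values from Table 4.
sigma = gaussian_sigma(epsilon=0.01, delta=1e-5, sensitivity=0.1)
rng = np.random.default_rng(0)
noisy_grad = privatize_gradient([3.0, 4.0], clip_norm=0.1, sigma=sigma, rng=rng)
```

Under these assumptions the Table 4 values give a noise standard deviation of roughly 48.4 per coordinate.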
Finally, the model is applied for prediction: the jointly built model is used for fitness and health guidance, disease prediction, and decision making, with the aim of improving prediction accuracy.
The jointly trained model must then be tested and validated to assess its predictive performance on new data. In the present invention, 10-fold cross-validation is used to evaluate the model. Specifically, the data set is divided into 10 parts; each time, one part serves as the test set and the remaining parts serve as the training set. The model is trained on the training set and tested on the test set, and its performance indexes (such as accuracy, recall, and F1 value) are calculated. This is repeated 10 times to obtain the average performance indexes.
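The 10-fold procedure can be sketched as follows. The helper names `k_fold_indices` and `cross_validate`, the shuffling seed, and the `fit`/`score` callables are hypothetical placeholders for whatever model and metric are actually used.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

def cross_validate(fit, score, X, y, k=10):
    """Train on k-1 folds, score on the held-out fold, and average the scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```

Here `fit(X_train, y_train)` is expected to return a trained model and `score(model, X_test, y_test)` a single number such as accuracy.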
Table 5 Performance indexes used in the present invention and their calculation formulas
| Performance index | Formula |
| --- | --- |
| Accuracy | (TP+TN)/(TP+TN+FP+FN) |
| Recall | TP/(TP+FN) |
| F1 value | 2 × (precision × recall)/(precision + recall) |
Where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.
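The formulas in Table 5 can be computed directly from the four confusion-matrix counts. In the sketch below the function name and the example counts are hypothetical, and precision is taken in its standard form TP/(TP+FP), which Table 5 uses but does not define.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for one cross-validation fold.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```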
Table 6 comparison of the performance of the invention with other methods
The joint modeling process can be repeated until the global model reaches the expected accuracy, and the result is returned to the exercise guidance center and the hospital, thereby providing scientific fitness and health guidance based on the fusion of physical and medical data. As the table shows, compared with other methods, the method of the invention improves the accuracy and generalization ability of the model on both data sets while preserving data privacy, which demonstrates its effectiveness and feasibility.
The foregoing merely illustrates the principles of the present invention, which has been described above with reference to the accompanying drawings; the invention is not limited to the specific embodiments shown.

Claims (13)

1. A physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning, characterized by comprising the following steps:
step one, collecting the original data of each participant participating in federal learning and processing them, the specific process being as follows:
step 1.1, first, the exercise guidance center and the hospital collect the original data of all participants participating in federal learning and clean them; after cleaning, the data are preprocessed to make them better suited to machine learning algorithms: the data formats, labels, and value ranges of the different participants are unified into a standardized data format, and the influence of dimensional differences and outliers is eliminated, so that the data of different participants have similar distributions and characteristics, which improves the effect and reliability of federal learning;
step 1.2, random noise is added to the preprocessed data by a differential privacy technique, and privacy features are segmented and encrypted, to protect the privacy of the original data and prevent the disclosure of personal sensitive information; different methods are used for numerical data and non-numerical data;
step 1.3, finally, the cleaned and preprocessed data are stored in a database or file system to form local data sets for later analysis and model training, the exercise guidance center and the hospital holding the local data sets $D_{center}$ and $D_{hospital}$, respectively;
step two, the exercise guidance center and the hospital each train a model on their respective local data sets to obtain local models;
the specific process is as follows:
step 2.1, the exercise guidance center and the hospital sample and select user data of the participants participating in federal learning and perform encrypted ID matching and alignment to obtain a list of user IDs common to both parties, wherein the data of the institutions are used for federal learning;
step 2.2, the two parties then homomorphically encrypt the user data and train local models on their local data sets using the longitudinal federal learning mechanism;
step 2.3, during local model training, noise is added in the gradient update algorithm, as differential privacy requires, to protect data privacy; when the model parameters are updated, corresponding noise is likewise added to the gradient of each participant, and the gradient update formula of each participant can be expressed as follows:
wherein $\Delta w_t$ denotes the update to the parameters, $n$ the number of samples in the training set, $y_i$ the label of sample $i$, $x_i$ the features of sample $i$, $f_t$ the prediction function of the current model, $w_t$ the parameters of the current model, $\lambda$ the regularization parameter, $\sigma_t$ the standard deviation, and $N(0, \sigma_t^2)$ noise drawn from a distribution with mean 0 and variance $\sigma_t^2$;
step three, the exercise guidance center and the hospital upload their local models to the federal server; during uploading, noise is added to the parameters of each local model, and the noisy parameters are sent to the federal server for parameter updating;
step four, the federal server performs longitudinal federal learning on the uploaded local models to generate a global model, the specific process being as follows:
in the longitudinal federal learning process, the federal server aggregates the local models of all participants to update the global model parameters and thereby improve the accuracy and performance of the model; while the global model parameters are updated, random noise is added by a differential privacy technique to protect the privacy of each participant's local model;
step five, the exercise guidance center and the hospital input their local data sets into the global model, train the global model, and then use the trained global model to predict on the data to obtain prediction results;
step six, the exercise guidance center and the hospital upload the prediction results to the federal server;
before uploading, each institution applies differential privacy processing to its prediction results, that is, it uploads the prediction results with random noise added rather than the original prediction results; the magnitude of the random noise is controlled by the privacy parameters of differential privacy, and a Laplace mechanism is used to add the random noise to each prediction result;
namely:
$$\tilde{f}(D) = f(D) + \mathrm{Lap}(0, \Delta f / \varepsilon)$$
wherein $\tilde{f}(D)$ denotes the prediction result after random noise is added, $\mathrm{Lap}$ denotes the Laplace distribution, $\Delta f$ denotes the sensitivity of the function $f$ to changes in the data set $D$, and $\varepsilon$ denotes the privacy parameter; this formula adds a calibrated amount of noise to each prediction result and thereby protects its privacy;
step seven, after receiving the prediction results with random noise from the different participants, the federal server aggregates the parameters by weighted averaging; the aggregated prediction result is encrypted to generate the final prediction result, which is returned to each participant;
step eight, the trained global model is tested and validated to verify its prediction performance; specifically, the data set is divided into several parts, and each time one part serves as the test set and the remaining parts serve as the training set; the model is trained on the training set and tested on the test set, its performance indexes are calculated, and this is repeated several times to obtain the average performance indexes; the whole process can be repeated until the global model reaches the expected accuracy, thereby providing scientific fitness and health guidance based on the fusion of physical and medical data.
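Step 2.1 of claim 1 calls for encrypted ID matching and alignment without saying which protocol is used. The sketch below illustrates the idea with keyed hashing (HMAC-SHA-256) under a secret shared by both parties. This is a simplified stand-in chosen for illustration, not a full private set intersection protocol, and the key, the user IDs, and the function names are hypothetical.

```python
import hashlib
import hmac

def blind_ids(user_ids, shared_key):
    """Map each keyed SHA-256 digest to its raw ID; only digests are exchanged."""
    return {
        hmac.new(shared_key, uid.encode("utf-8"), hashlib.sha256).hexdigest(): uid
        for uid in user_ids
    }

def align_ids(own_ids, peer_digests, shared_key):
    """Return the party's own IDs whose digests also appear in the peer's set."""
    blinded = blind_ids(own_ids, shared_key)
    return sorted(uid for digest, uid in blinded.items() if digest in peer_digests)

key = b"hypothetical-shared-secret"
center_ids = ["u001", "u002", "u003"]
hospital_ids = ["u002", "u003", "u004"]

# The hospital sends only digests; the center learns nothing about "u004".
hospital_digests = set(blind_ids(hospital_ids, key))
common = align_ids(center_ids, hospital_digests, key)
```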
2. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, wherein the data cleaning means: removing missing values, abnormal values, and repeated values from the original data to ensure the accuracy and reliability of subsequent data analysis; missing values are filled by interpolation; abnormal values are detected and repaired with a statistical method or a machine learning algorithm; repeated values are deleted or merged.
3. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, wherein the preprocessing comprises data centering, data scaling, and data normalization.
4. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 3, wherein the data centering means: different participants use different units of measurement for a patient's height and weight, so the data have different units; data centering shifts the mean of the data to zero, i.e. the mean of each feature over the entire data set is subtracted from every value of that feature, so that the data of different participants have similar distributions with respect to the unit of measurement.
5. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 3, wherein the data scaling means: different participants use different measuring devices to measure a patient's blood glucose level, so the data have different ranges; data scaling shrinks or enlarges the data so that every feature has the same range and the data of different participants have similar distributions with respect to measurement precision.
6. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 3, wherein the data normalization means: different participants sample patients over different time periods, so the data are distributed differently along the time axis; data normalization confines the data to a fixed range to avoid bias, so that the data of different participants have similar distributions along the time axis.
7. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, wherein in step one, for numerical data, a Laplace mechanism is used to add random dynamic perturbation noise to the data result; for a function $f$, the Laplace mechanism adds noise drawn from the Laplace distribution $\mathrm{Lap}(0, \Delta f / \varepsilon)$ to each $f(x)$, where $\varepsilon$ is the privacy parameter; specifically, for a query $q$ and a database $A$, the query result computed by the Laplace mechanism is:
$$q(A) = f(A) + \mathrm{Lap}(0, \Delta f / \varepsilon)$$
wherein $f(A)$ is the query result on the original data, $\Delta f$ is the sensitivity, i.e. the degree to which the function $f$ responds to changes in the data set, and $\mathrm{Lap}(0, b)$ denotes the Laplace distribution with mean 0 and scale parameter $b$; the noise added by the Laplace mechanism is symmetric, so the average error of a query result stays at a constant level of $\Delta f / \varepsilon$.
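The Laplace mechanism of claim 7 can be sketched with NumPy as follows. The function name and the example parameters are assumptions; `Generator.laplace` takes the location and the scale $b = \Delta f / \varepsilon$.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return f(A) + Lap(0, sensitivity / epsilon), element-wise for arrays."""
    scale = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=np.shape(true_value))
    return true_value + noise

rng = np.random.default_rng(0)
# Hypothetical query: many repeated releases of a true answer of 0,
# used here only to inspect the noise empirically.
noisy = laplace_mechanism(np.zeros(200_000), sensitivity=1.0, epsilon=0.5, rng=rng)
mean_abs_error = float(np.abs(noisy).mean())  # close to the scale b = 2.0
```

The empirical mean absolute error approaches the scale $b = \Delta f / \varepsilon$, matching the constant error level stated in the claim, while the standard deviation of the noise is $b\sqrt{2}$.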
8. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning, characterized in that, for non-numerical data, an exponential mechanism is adopted, which introduces a scoring mechanism: the output scores of the discrete categories of the non-numerical data are enumerated and calculated and, after normalization, the non-numerical data are characterized as probability values; specifically, for a function $f$ and a given data set $D$, the exponential mechanism adds random noise $N$ to the query result such that the probability of releasing each query result $f(D)$ is proportional to the probability of releasing $f(D + \Delta D)$, where $\Delta D$ is a small change of $D$ and $N$ is a random variable following an exponential distribution, whose probability density function is:
wherein $\varepsilon$ is the privacy parameter and represents the degree of privacy protection, and $\Delta f$ is the sensitivity and represents how sensitive the function is to changes in the data set; the random noise of the exponential mechanism gives the query result a degree of uncertainty, which improves privacy protection.
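The scoring-and-normalization idea of claim 8 can be sketched with the textbook form of the exponential mechanism, in which a candidate with score $u$ is released with probability proportional to $\exp(\varepsilon u / (2 \Delta u))$. The claim's own density is not reproduced in the text, so this standard form, the function name, and the example categories and scores are assumptions.

```python
import numpy as np

def exponential_mechanism(candidates, scores, sensitivity, epsilon, rng):
    """Sample one candidate with probability proportional to exp(eps*score/(2*sens))."""
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()          # subtract the maximum for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()            # normalize the scores into probabilities
    choice = rng.choice(len(candidates), p=probs)
    return candidates[choice], probs

rng = np.random.default_rng(0)
# Hypothetical categorical feature, scored by how often each category occurs.
levels = ["low", "medium", "high"]
counts = [10, 30, 5]
released, probs = exponential_mechanism(levels, counts, sensitivity=1.0,
                                        epsilon=0.5, rng=rng)
```

Higher-scoring categories are released more often, and a smaller `epsilon` flattens the probabilities toward uniform.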
9. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, wherein the specific method of step three is: let the local model parameters of participant $i$ be $\theta_i$; to provide differential privacy, participant $i$ adds noise following the Laplace distribution to its local model parameters $\theta_i$, where the noise $\Delta\theta_i$ follows the Laplace distribution $\mathrm{Lap}(0, b)$ and satisfies the requirement of differential privacy, i.e. $(\varepsilon, \delta)$-differential privacy; the noisy local model parameters that participant $i$ sends to the federal server are therefore
$$\tilde{\theta}_i = \theta_i + \Delta\theta_i$$
wherein $\Delta\theta_i$ denotes the Laplace-distributed noise, whose probability density function is:
$$p(x) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$$
after receiving the noisy local model parameters $\tilde{\theta}_i$ sent by each participant $i$, the federal server calculates the global model parameters $\theta_{global}$ as a weighted average, i.e.
$$\theta_{global} = \frac{\sum_i w_i \tilde{\theta}_i}{\sum_i w_i}$$
wherein $w_i$ is the weight of participant $i$, determined by factors such as the participant's data quantity and data quality.
10. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, wherein: assume there are $m$ participants, each of which has trained a model $M_i$ locally; the local models are uploaded to the federal server using a differential privacy technique, and the federal server then combines them into a global model $M$ and updates the parameters of the global model $M$ to optimize it;
consider the gradient of each local model $M_i$ in the $t$-th iteration, the parameters $\theta_{t-1}$ of the global model $M$ before the $t$-th iteration, and the gradient $g_t$ of the global model $M$ in the $t$-th iteration; under the differential privacy technique, when the global model updates its parameters, random noise is added to each local gradient to protect the privacy of each participant while still maintaining the accuracy of the model; specifically, the update formula of the global model can be expressed as:
wherein $N(0, \sigma^2 I)$ denotes noise with mean 0 and covariance $\sigma^2 I$, and $I$ is the identity matrix; thus, in the parameter updating process of the global model, the differential privacy technique adds noise to the gradient of each participant to protect that participant's privacy while preserving the accuracy of the global model.
11. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, wherein the training process of the global model in step five can be described as the following multi-round iterative process:
(1) initially, the exercise guidance center and the hospital each randomly initialize their local model parameters;
(2) in each iteration, the exercise guidance center and the hospital upload their local model parameters to the federal server, namely:
wherein $t$ denotes the iteration round, $\eta$ the learning rate, $f(\cdot)$ the loss function, and $\nabla f(\cdot)$ the gradient of the loss function;
(3) in each iteration, the global model server adds noise to the parameters of each participant so that the output model parameters do not reveal individual private information; the differentially private versions of the local model parameters of the exercise guidance center and of the hospital may be calculated using the following formula:
(4) after the global model server receives the differentially private local model parameters uploaded by the participants, it takes their weighted average to update the global model parameters $\theta^{(t)}$, namely:
wherein $n$ denotes the number of participants and $w_i$ the weight of the $i$-th participant, which may generally be determined from the quantity and quality of the participant's data.
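The multi-round process of items (1) to (4) can be sketched as below. The formulas of the claim are not reproduced in the text, so this is an illustration under stated assumptions rather than the claimed method: every participant holds one shared parameter vector, the loss is least squares, each round starts from the current global parameters (initialized to zero here rather than randomly), and Laplace noise is added to the uploaded parameters. All function names are hypothetical.

```python
import numpy as np

def local_step(theta, X, y, lr):
    """One gradient step on a participant's local least-squares loss."""
    grad = X.T @ (X @ theta - y) / len(y)
    return theta - lr * grad

def add_laplace_noise(theta, scale, rng):
    """Perturb parameters before upload; scale 0 leaves them unchanged."""
    return theta + rng.laplace(0.0, scale, size=theta.shape)

def weighted_average(thetas, weights):
    """Weighted average of parameter vectors, normalized by the weight sum."""
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, np.stack(thetas), axes=1) / weights.sum()

def federated_rounds(datasets, weights, n_rounds, lr, noise_scale, rng):
    """Repeat: local update, noisy upload, server-side weighted average."""
    theta_global = np.zeros(datasets[0][0].shape[1])
    for _ in range(n_rounds):
        noisy = [add_laplace_noise(local_step(theta_global, X, y, lr),
                                   noise_scale, rng)
                 for X, y in datasets]
        theta_global = weighted_average(noisy, weights)
    return theta_global

rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])        # hypothetical ground truth
datasets = []
for n in (200, 100):                           # two participants
    X = rng.normal(size=(n, 3))
    datasets.append((X, X @ true_theta))
theta = federated_rounds(datasets, weights=[200, 100], n_rounds=500,
                         lr=0.1, noise_scale=0.0, rng=rng)
```

With `noise_scale=0.0` the loop recovers the ground-truth parameters, which makes it easy to see how raising the noise scale trades accuracy for privacy.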
12. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, wherein the specific method of step seven is: assume there are $m$ participants, the local model parameters of each participant being $\theta_i$ with weight $w_i$; the global model parameters are then calculated as:
$$\theta_{global} = \frac{\sum_{i=1}^{m} w_i \theta_i}{\sum_{i=1}^{m} w_i}$$
wherein the weight $w_i$ denotes the weight of each participant in the training and can be adjusted dynamically according to the quantity and quality of its data; the denominator is the sum of the weights of all participants, which ensures that the global model parameters are a weighted average.
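As a worked illustration of this weighted average, in which the sum of $w_i \theta_i$ is divided by the sum of the weights, consider two participants with weights 3 and 1 and scalar parameters 1.0 and 2.0. The numbers are hypothetical and not taken from the patent.

```latex
\theta_{global}
  = \frac{w_1 \theta_1 + w_2 \theta_2}{w_1 + w_2}
  = \frac{3 \cdot 1.0 + 1 \cdot 2.0}{3 + 1}
  = 1.25
```

The result lies closer to the parameter of the participant with the larger weight, as expected when weights reflect data quantity and quality.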
13. The physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning according to claim 1, characterized in that the global model is tested and validated on data to verify its prediction performance: each time, one part of the data set serves as the test set and the remaining parts serve as the training set; the model is trained on the training set and tested on the test set, and the performance indexes of the model are calculated.
CN202310565639.3A 2023-05-19 2023-05-19 Physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning Pending CN116595584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310565639.3A CN116595584A (en) 2023-05-19 2023-05-19 Physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning


Publications (1)

Publication Number Publication Date
CN116595584A true CN116595584A (en) 2023-08-15

Family

ID=87589402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310565639.3A Pending CN116595584A (en) 2023-05-19 2023-05-19 Physical medicine data fusion privacy protection method based on cloud and fog architecture longitudinal federal learning

Country Status (1)

Country Link
CN (1) CN116595584A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination