CN110348241A - Multi-center collaborative prognosis prediction system under a data sharing strategy - Google Patents
Multi-center collaborative prognosis prediction system under a data sharing strategy
- Publication number
- CN110348241A
- Authority
- CN
- China
- Prior art keywords
- data
- medical institutions
- model
- center
- multicenter
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Medical Informatics (AREA)
- Bioethics (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Epidemiology (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a multi-center collaborative prognosis prediction system under a data sharing strategy. The system enables privacy-preserving data sharing across multiple medical institution centers, thereby providing sufficient data for model construction. The system is built on an ensemble learning algorithm, which can achieve better prediction results than a weak classifier. Sensitive patient-level data are processed at each center, where the sub-classifiers of the ensemble learning model are constructed at the same time; only less sensitive intermediate results are exchanged to assemble the complete ensemble learning model, so that the proposed multi-center model achieves results equal to or even better than those of a centralized model. The multi-center collaborative prognosis prediction system of the invention protects patients' personal privacy and does not require running the algorithm model on a large centralized data source, offering a reliable solution for clinical practice when a single medical institution has too few samples to build a prediction model.
Description
Technical field
The invention belongs to the fields of medicine and machine learning, and in particular relates to a multi-center collaborative prognosis prediction system under a data sharing strategy.
Background art
Prognosis prediction plays an important role in clinical research and practice. Prediction models built from the electronic health record (EHR) data of a single medical institution may lack sufficient statistical power and good generalization ability. Building prognostic prediction models through collaborative analysis of EHR data from multiple medical institution centers can therefore enlarge the patient population and coverage available for model training and enrich the prognostic features of patients, ultimately improving the accuracy and generalization ability of prognosis prediction. Ensemble learning is a very widely used class of algorithms in clinical prognosis. Unlike linear models such as logistic regression and the Cox model, ensemble learning algorithms usually achieve better accuracy, can capture nonlinear relationships between variables, and are less prone to the overfitting problem common in machine learning. Building the model with an ensemble learning algorithm therefore provides an ideal solution for constructing a multi-center collaborative prognosis prediction system. In addition, patient privacy must be protected while performing multi-center prognosis prediction. Existing privacy-preserving ensemble learning methods for the multi-center setting are mostly encryption-based, for example using additively homomorphic encryption. Aslett et al. proposed an ensemble learning model based on fully homomorphic encryption. Magkos et al. built encryption modules on a protocol framework based on homomorphic encryption to train ensemble learning classifiers. Although these encryption methods can prevent information leakage during data exchange, they significantly reduce computational and storage efficiency, scale poorly, and are unsuitable for processing large-scale clinical data across multiple centers.
Summary of the invention
In view of the shortcomings of the prior art, the object of the present invention is to provide a multi-center collaborative prognosis prediction system under a new data sharing strategy.
The object of the present invention is achieved through the following technical solution: a multi-center collaborative prognosis prediction system under a data sharing strategy, the system comprising the following four modules:
(1) Data acquisition module: at each medical institution center, collect the data of every variable required for patient prognosis prediction; these data form the source data set of that center.
(2) Data anonymization module: randomly sample a percentage p of the source data set at each medical institution center and apply an anonymization algorithm to the sampled data to generate anonymized data; the remaining data serve as the local training set of that center. The central server collects the anonymized data from all centers and combines them into an augmented data set, which is divided into two parts: an additional training set and a validation set. The additional training set is returned and distributed to the medical institution centers; the validation set is used to select the hyperparameters of the ensemble learning model.
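The data sharing strategy of module (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the dictionary record format, the hypothetical `val_fraction` ratio (the text does not say how the augmented data set is divided), and the assumption that every center receives the same additional training set are all illustrative choices. The `anonymize` hook is a placeholder for whichever anonymization algorithm is selected; its identity default performs no anonymization at all.

```python
import random
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Anonymizer = Callable[[List[Record]], List[Record]]


def split_center(records: List[Record], p: float, anonymize: Anonymizer,
                 rng: random.Random) -> Tuple[List[Record], List[Record]]:
    """Randomly sample a fraction p of one center's source data set for sharing."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_shared = round(p * len(shuffled))
    shared = anonymize(shuffled[:n_shared])   # leaves the center only after anonymization
    local_train = shuffled[n_shared:]         # never leaves the center
    return local_train, shared


def data_sharing_round(centers: Dict[str, List[Record]], p: float = 0.5,
                       val_fraction: float = 0.5,                 # hypothetical split ratio
                       anonymize: Anonymizer = lambda rows: rows,  # identity placeholder only
                       seed: int = 0):
    """Return per-center local training sets, the additional training set,
    and the validation set built from the pooled (augmented) data."""
    rng = random.Random(seed)
    local_sets: Dict[str, List[Record]] = {}
    augmented: List[Record] = []
    for name, records in centers.items():
        local_sets[name], shared = split_center(records, p, anonymize, rng)
        augmented.extend(shared)              # collected by the central server
    rng.shuffle(augmented)
    n_val = round(val_fraction * len(augmented))
    validation = augmented[:n_val]
    extra_train = augmented[n_val:]           # assumed: sent unchanged to every center
    return local_sets, extra_train, validation
```

With two centers of 10 records each and p = 0.5, each center keeps 5 records locally and the server pools 10 records, which the assumed `val_fraction = 0.5` splits into 5 additional-training and 5 validation records.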
(3) Model training module: each medical institution center locally trains sub-classifiers of the ensemble learning model. The training data consist of the center's local training set plus the additional training set returned to it by the central server. Thus the training set of each center's sub-classifiers comes not only from the center itself but also from the data of other centers, which increases the randomness of the training data and improves the overall performance of the ensemble learning model. During training, the hyperparameters of the ensemble learning model are selected using the validation set created from the augmented data set.
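A sketch of one center's local training step follows, assuming a random-forest-style ensemble. To stay dependency-free it uses bootstrap-sampled decision stumps (depth-1 trees) restricted to a random feature subset in place of full decision trees; `max_features` and `n_estimators` mirror the two hyperparameters named in the embodiment, while the function names and the (features, label) row format are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[List[float], int]    # (feature vector, binary outcome label)
Stump = Tuple[int, float, int]   # (feature index, threshold, label predicted when x[f] >= threshold)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    f, thr, pol = stump
    return pol if x[f] >= thr else 1 - pol


def fit_stump(rows: List[Row], feature_ids: Sequence[int]) -> Stump:
    """Choose the split with the fewest training errors among the allowed features."""
    best = None
    for f in feature_ids:
        for thr in sorted({x[f] for x, _ in rows}):
            for pol in (0, 1):
                err = sum(stump_predict((f, thr, pol), x) != y for x, y in rows)
                if best is None or err < best[0]:
                    best = (err, (f, thr, pol))
    return best[1]


def train_center_subclassifiers(local_train: List[Row], extra_train: List[Row],
                                n_estimators: int = 10, max_features: int = 1,
                                seed: int = 0) -> List[Stump]:
    """Train one center's sub-classifiers on its local data plus the shared extra set."""
    rng = random.Random(seed)
    data = list(local_train) + list(extra_train)
    n_features = len(data[0][0])
    stumps = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(data) for _ in data]             # sample with replacement
        features = rng.sample(range(n_features), max_features)   # random feature subset
        stumps.append(fit_stump(bootstrap, features))
    return stumps


def center_predict_proba(stumps: List[Stump], x: Sequence[float]) -> float:
    """Fraction of this center's stumps voting for outcome 1."""
    return sum(stump_predict(s, x) for s in stumps) / len(stumps)
```

Because each center bootstraps from both its own records and the shared additional records, sub-classifiers at different centers see overlapping but non-identical data, which is one way to read the added randomness described above.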
(4) sub-classifier that each medical institutions center is trained prognostic model application module: is collected by central server
Constitute complete integrated study model;New patient data is inputted into the integrated study model and executes prognosis prediction.
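Module (4) can be sketched as the server pooling every center's sub-classifiers into one ensemble and averaging their predicted probabilities (soft voting). Soft voting is an assumption here; majority voting would be an equally plausible reading of the text. Sub-classifiers are modeled as plain callables returning P(outcome = 1 | x), and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

SubClassifier = Callable[[Sequence[float]], float]   # returns P(outcome = 1 | x)


def assemble_ensemble(center_models: Dict[str, List[SubClassifier]]) -> List[SubClassifier]:
    """Central server: collect every center's sub-classifiers into one ensemble.
    Only trained models are exchanged here, never patient-level records."""
    return [clf for models in center_models.values() for clf in models]


def ensemble_predict_proba(ensemble: List[SubClassifier], x: Sequence[float]) -> float:
    """Soft voting: average the sub-classifiers' probabilities."""
    if not ensemble:
        raise ValueError("ensemble is empty")
    return sum(clf(x) for clf in ensemble) / len(ensemble)


def ensemble_predict(ensemble: List[SubClassifier], x: Sequence[float],
                     threshold: float = 0.5) -> int:
    """Binary prognosis prediction for a new patient's feature vector x."""
    return int(ensemble_predict_proba(ensemble, x) >= threshold)
```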
Further, in the data anonymization module, the random sampling percentage p of each center's source data set is 50%. Fixing the anonymized-data ratio p at 50% improves the prediction performance of the ensemble learning model; at the extremes, neither directly integrating the locally trained sub-classifiers (no data shared) nor fully anonymizing all data for centralized training achieves the optimum. The value of p is adjustable to suit complex decision-support scenarios and patient prognosis prediction in clinical practice under different scenarios.
Further, the anonymization algorithm may be k-anonymity, l-diversity, t-closeness, differential privacy, or a similar anonymization algorithm. A specific method for achieving k-anonymity is suppression, which completely hides certain information by not publishing certain data items.
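As an illustration of suppression-based k-anonymity, the sketch below withholds every record whose combination of quasi-identifier values is shared by fewer than k records, so each remaining combination appears at least k times. The choice of quasi-identifiers (a hypothetical age band and gender in the usage example) and of k are assumptions; practical anonymizers usually combine suppression with generalization, such as coarsening exact ages into bands, so that fewer records are lost.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def qi_key(record: Record, quasi_identifiers: Sequence[str]) -> Tuple[object, ...]:
    """Values of the quasi-identifiers, which define a record's equivalence class."""
    return tuple(record[q] for q in quasi_identifiers)


def is_k_anonymous(records: List[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    counts = Counter(qi_key(r, quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def suppress_to_k_anonymity(records: List[Record], quasi_identifiers: Sequence[str],
                            k: int) -> List[Record]:
    """Record-level suppression: withhold records in equivalence classes smaller than k."""
    counts = Counter(qi_key(r, quasi_identifiers) for r in records)
    return [r for r in records if counts[qi_key(r, quasi_identifiers)] >= k]
```

For example, with quasi-identifiers `["age_band", "gender"]` and k = 2, two records sharing ("60-69", "Male") are kept while a lone ("70-79", "Female") record is suppressed.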
Further, the system considers horizontally partitioned data, i.e., the source data sets of all medical institution centers contain the same kinds of variables.
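Because the scheme assumes horizontally partitioned data, a simple precondition check is that every record at every center carries the same set of variables. The sketch below, with a hypothetical function name and dictionary-based records, returns the shared variable list or raises an error naming the first mismatching center.

```python
from typing import Dict, List


def shared_schema(centers: Dict[str, List[dict]]) -> List[str]:
    """Check horizontal partitioning: all centers' records use identical variables."""
    reference = None
    for name, rows in centers.items():
        for row in rows:
            keys = frozenset(row)
            if reference is None:
                reference = keys
            elif keys != reference:
                raise ValueError(
                    f"center {name!r} has variables {sorted(keys)}, "
                    f"expected {sorted(reference)}")
    if reference is None:
        raise ValueError("no records supplied")
    return sorted(reference)
```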
The beneficial effects of the present invention are as follows: the invention proposes a novel multi-center data sharing strategy that enables privacy-preserving data sharing across multiple medical institution centers, thereby providing sufficient data for model construction. The system is built with an ensemble learning algorithm (such as the random forest algorithm), which can achieve better prediction results than a weak classifier. Sensitive patient-level data are processed at each center, where the sub-classifiers of the ensemble learning model are constructed at the same time; only less sensitive intermediate results are exchanged to assemble the complete ensemble learning model, so that the proposed multi-center model achieves results equal to or even better than those of a centralized model. The multi-center collaborative prognosis prediction system of the invention protects patients' personal privacy and does not require running the algorithm model on a large centralized data source, offering a reliable solution for clinical practice when a single medical institution has too few samples to build a prediction model.
Brief description of the drawings
Fig. 1 is a framework diagram of the multi-center collaborative prognosis prediction system under the data sharing strategy;
Fig. 2 is a schematic diagram of the data sharing strategy;
Fig. 3 is a schematic diagram of data transmission between the centers;
Fig. 4 compares the predictive ability of the multi-center collaborative prognosis prediction system under the data sharing strategy of the present invention with that of a prognosis prediction system under centralized training.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the multi-center collaborative prognosis prediction system under a new data sharing strategy provided by the invention comprises the following four modules:
(1) Data acquisition module: at each medical institution center, collect the data of every variable required for patient prognosis prediction; these data form the source data set of that center. This embodiment is validated experimentally on colorectal cancer data from 5 medical institution centers. Sample electronic health record data collected by the data acquisition module at one center are shown in Table 1; they comprise 6 variables: age, gender, tumor size, T stage, N stage, and carcinoembryonic antigen (CEA) status.
Table 1: Example of electronic health record data collected at a single center for colorectal cancer patients
No. | Age | Gender | Tumor size (mm) | T stage | N stage | Carcinoembryonic antigen (CEA) |
---|---|---|---|---|---|---|
1 | 65 | Male | 4.8 | I | III | Positive |
2 | 74 | Female | 1.5 | II | IV | Negative |
… | … | … | … | … | … | … |
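Before training, each row of Table 1 has to become a numeric feature vector. One possible encoding is sketched below; the binary coding of gender and CEA status and the ordinal coding of the Roman-numeral stages are assumptions, not something the text specifies, and the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed encodings; unknown values raise KeyError instead of being silently miscoded.
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}
GENDER = {"Male": 1.0, "Female": 0.0}
CEA = {"Positive": 1.0, "Negative": 0.0}


@dataclass
class PatientRecord:
    age: int
    gender: str            # "Male" / "Female"
    tumor_size_mm: float
    t_stage: str           # Roman numeral, as in Table 1
    n_stage: str           # Roman numeral, as in Table 1
    cea: str               # "Positive" / "Negative"

    def to_features(self) -> List[float]:
        """Encode the six Table 1 variables as a numeric feature vector."""
        return [
            float(self.age),
            GENDER[self.gender],
            float(self.tumor_size_mm),
            float(ROMAN[self.t_stage]),   # ordinal coding of stage
            float(ROMAN[self.n_stage]),
            CEA[self.cea],
        ]
```

For the first example row of Table 1 this yields `[65.0, 1.0, 4.8, 1.0, 3.0, 1.0]`. Ordinal coding keeps the ordering of stages, which tree-based sub-classifiers can split on directly.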
(2) Data anonymization module: as shown in Fig. 2, randomly sample a percentage p of the source data set at each medical institution center and apply an anonymization algorithm to the sampled data to generate anonymized data; the remaining data serve as the local training set of that center. The central server collects the anonymized data from all centers and combines them into an augmented data set, which is divided into two parts: an additional training set and a validation set. The additional training set is returned and distributed to the medical institution centers; the validation set is used to select the hyperparameters of the ensemble learning model. In the experiment, the anonymized-data ratio p is set to 50%, and the anonymization algorithm is suppression-based k-anonymity. Two hyperparameters are selected on the validation set: the maximum number of features used by a single decision tree and the number of sub-classifiers.
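The validation-set hyperparameter search can be sketched as a plain grid search over the two named hyperparameters. `train_fn` and `score_fn` are hypothetical hooks: in the multi-center setting, `train_fn` would stand for every center training its sub-classifiers with the given settings and the server assembling them, and `score_fn` for a metric such as AUC computed on the shared validation set. The grids in the usage example are illustrative only.

```python
import itertools
from typing import Callable, Iterable, Tuple, TypeVar

Model = TypeVar("Model")


def select_hyperparameters(
    train_fn: Callable[[int, int], Model],
    score_fn: Callable[[Model], float],
    max_features_grid: Iterable[int],
    n_estimators_grid: Iterable[int],
) -> Tuple[Tuple[int, int], float]:
    """Return the (max_features, n_estimators) pair with the best validation score."""
    best_params, best_score = None, float("-inf")
    for max_features, n_estimators in itertools.product(max_features_grid, n_estimators_grid):
        model = train_fn(max_features, n_estimators)
        score = score_fn(model)   # scored on the shared validation set, not on OOB error
        if score > best_score:
            best_params, best_score = (max_features, n_estimators), score
    return best_params, best_score
```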
(3) Model training module: as shown in Fig. 2, each medical institution center locally trains sub-classifiers of the ensemble learning model. The training data consist of the center's local training set plus the additional training set returned to it by the central server. Thus the training set of each center's sub-classifiers comes not only from the center itself but also from the data of other centers, which increases the randomness of the training data and improves the overall performance of the ensemble learning model. During training, the hyperparameters of the ensemble learning model are selected using the validation set created from the augmented data set. This addresses the problem that, because the multi-center scheme is not exactly the same as a standard random forest, the out-of-bag (OOB) error is no longer valid as an unbiased estimate.
(4) sub-classifier that each medical institutions center is trained prognostic model application module: is collected by central server
Constitute complete integrated study model;New patient data is inputted into the integrated study model and executes prognosis prediction.Experimental result
As shown in figure 4, the predictive ability of prognosis prediction system is measured with AUC.It can be seen that data sharing strategy proposed by the present invention
Under multicenter collaboration prognosis prediction system can obtain prediction result more preferably than the prognosis prediction system under centralized training.
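Predictive ability in Fig. 4 is reported as AUC. For reference, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. The quadratic-time sketch below computes it directly from that definition; it is meant for illustration rather than large evaluation sets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`: three of the four positive/negative pairs are ranked correctly.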
The above embodiments serve to illustrate the invention rather than to limit it. Any modification or change made to the invention within the spirit of the invention and the scope of protection of the claims falls within the scope of protection of the invention.
Claims (4)
1. A multi-center collaborative prognosis prediction system under a data sharing strategy, characterized by comprising:
(1) a data acquisition module: at each medical institution center, collect the data of every variable required for patient prognosis prediction; these data form the source data set of that center;
(2) a data anonymization module: randomly sample a percentage p of the source data set at each medical institution center and apply an anonymization algorithm to the sampled data to generate anonymized data; the remaining data serve as the local training set of that center. The central server collects the anonymized data from all centers and combines them into an augmented data set, which is divided into two parts: an additional training set and a validation set. The additional training set is returned and distributed to the medical institution centers; the validation set is used to select the hyperparameters of the ensemble learning model;
(3) a model training module: each medical institution center locally trains sub-classifiers of the ensemble learning model. The training data consist of the center's local training set plus the additional training set returned to it by the central server. Thus the training set of each center's sub-classifiers comes not only from the center itself but also from the data of other centers, which increases the randomness of the training data and improves the overall performance of the ensemble learning model. During training, the hyperparameters of the ensemble learning model are selected using the validation set created from the augmented data set;
(4) a prognostic model application module: the central server collects the sub-classifiers trained at each medical institution center to form the complete ensemble learning model; data of a new patient are input into this ensemble learning model to perform prognosis prediction.
2. The multi-center collaborative prognosis prediction system under a data sharing strategy according to claim 1, characterized in that, in the data anonymization module, the random sampling percentage p of each medical institution center's source data set is 50%.
3. The multi-center collaborative prognosis prediction system under a data sharing strategy according to claim 1, characterized in that the anonymization algorithm may be k-anonymity, l-diversity, t-closeness, differential privacy, or a similar anonymization algorithm, wherein a specific method for achieving k-anonymity is suppression, which completely hides certain information by not publishing certain data items.
4. The multi-center collaborative prognosis prediction system under a data sharing strategy according to claim 1, characterized in that the system considers horizontally partitioned data, i.e., the source data sets of all medical institution centers contain the same kinds of variables.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910629800.2A CN110348241B (en) | 2019-07-12 | 2019-07-12 | Multi-center collaborative prognosis prediction system under data sharing strategy |
PCT/CN2020/083588 WO2020233258A1 (en) | 2019-07-12 | 2020-04-07 | Data sharing strategy-based multi-center collaborative prognosis prediction system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910629800.2A CN110348241B (en) | 2019-07-12 | 2019-07-12 | Multi-center collaborative prognosis prediction system under data sharing strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110348241A true CN110348241A (en) | 2019-10-18 |
CN110348241B CN110348241B (en) | 2021-08-03 |
Family
ID=68175993
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910629800.2A Active CN110348241B (en) | 2019-07-12 | 2019-07-12 | Multi-center collaborative prognosis prediction system under data sharing strategy |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110348241B (en) |
WO (1) | WO2020233258A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222570A (en) * | 2020-01-06 | 2020-06-02 | Guangxi Normal University | Ensemble learning classification method based on differential privacy |
CN111245903A (en) * | 2019-12-31 | 2020-06-05 | Fiberhome Telecommunication Technologies Co., Ltd. | Joint learning method and system based on edge computing |
WO2020233258A1 (en) * | 2019-07-12 | 2020-11-26 | Zhejiang Lab | Data sharing strategy-based multi-center collaborative prognosis prediction system |
CN113221162A (en) * | 2021-04-28 | 2021-08-06 | Health Data (Beijing) Technology Co., Ltd. | Disease-specific big data privacy protection method and system based on blockchain |
CN117577333A (en) * | 2024-01-17 | 2024-02-20 | Zhejiang University | Multi-center clinical prognosis prediction system based on causal feature learning |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115442099B (en) * | 2022-08-28 | 2023-06-06 | North China University of Technology | Distributed GAN-based privacy protection data sharing method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130063565A (en) * | 2011-12-07 | 2013-06-17 | 조윤진 | Combination of multiple classifiers using bagging in semi-supervised learning |
CN104200417A (en) * | 2014-08-20 | 2014-12-10 | Xi'an Tangcheng Electronic Medical Equipment Research Institute | Rehabilitation training system based on cloud computing |
CN107871160A (en) * | 2016-09-26 | 2018-04-03 | Google | Communication-efficient federated learning |
CN109711556A (en) * | 2018-12-24 | 2019-05-03 | China Southern Power Grid Co., Ltd. | Machine inspection data processing method and device, grid-level server, and provincial server |
CN109977694A (en) * | 2019-03-11 | 2019-07-05 | Jinan University | Data sharing method based on collaborative deep learning |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106897545B (en) * | 2017-01-05 | 2019-04-30 | Zhejiang University | Tumor prognosis prediction system based on a deep belief network |
CN106886799B (en) * | 2017-03-17 | 2019-08-02 | Northeastern University | Online quality inspection method for continuously annealed strip steel based on hybrid ensemble learning |
CN110348241B (en) * | 2019-07-12 | 2021-08-03 | Zhejiang Lab | Multi-center collaborative prognosis prediction system under data sharing strategy |
- 2019-07-12 CN CN201910629800.2A patent/CN110348241B/en active
- 2020-04-07 WO PCT/CN2020/083588 patent/WO2020233258A1/en application filing
Non-Patent Citations (1)
Title |
---|
Chen Liping et al.: "Construction of a stroke recurrence prediction model based on big data", Internet of Things Technologies * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020233258A1 (en) * | 2019-07-12 | 2020-11-26 | Zhejiang Lab | Data sharing strategy-based multi-center collaborative prognosis prediction system |
CN111245903A (en) * | 2019-12-31 | 2020-06-05 | Fiberhome Telecommunication Technologies Co., Ltd. | Joint learning method and system based on edge computing |
CN111245903B (en) * | 2019-12-31 | 2022-07-01 | Fiberhome Telecommunication Technologies Co., Ltd. | Joint learning method and system based on edge computing |
CN111222570A (en) * | 2020-01-06 | 2020-06-02 | Guangxi Normal University | Ensemble learning classification method based on differential privacy |
CN111222570B (en) * | 2020-01-06 | 2022-08-26 | Guangxi Normal University | Ensemble learning classification method based on differential privacy |
CN113221162A (en) * | 2021-04-28 | 2021-08-06 | Health Data (Beijing) Technology Co., Ltd. | Disease-specific big data privacy protection method and system based on blockchain |
CN117577333A (en) * | 2024-01-17 | 2024-02-20 | Zhejiang University | Multi-center clinical prognosis prediction system based on causal feature learning |
CN117577333B (en) * | 2024-01-17 | 2024-04-09 | Zhejiang University | Multi-center clinical prognosis prediction system based on causal feature learning |
Also Published As
Publication number | Publication date |
---|---|
WO2020233258A1 (en) | 2020-11-26 |
CN110348241B (en) | 2021-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110348241A (en) | Multi-center collaborative prognosis prediction system under a data sharing strategy | |
Torkzadehmahani et al. | Privacy-preserving artificial intelligence techniques in biomedicine | |
Cheng et al. | Integration of machine learning and blockchain technology in the healthcare field: a literature review and implications for cancer care | |
Dettenkofer et al. | Surveillance of nosocomial infections in adult recipients of allogeneic and autologous bone marrow and peripheral blood stem-cell transplantation | |
Glenny et al. | Interventions for the treatment of oral cavity and oropharyngeal cancer: radiotherapy | |
Sruthi et al. | Cancer prediction using machine learning | |
Yang et al. | Histopathology-based diagnosis of oral squamous cell carcinoma using deep learning | |
Swaminathan et al. | Lack of active follow-up of cancer patients in Chennai, India: implications for population-based survival estimates | |
CN105069286A (en) | Logistic regression analysis system based on protection of vertically distributed private data | |
Moon et al. | Privacy-preserving federated learning in healthcare | |
Jia et al. | Breast cancer identification using machine learning | |
CN104166951B (en) | Method and system for providing data support for two-way referral between medical institutions | |
CN111261299A (en) | Multi-center collaborative cancer prognosis prediction system based on multi-source transfer learning | |
Fionda et al. | Interventional radiotherapy (brachytherapy) for nasal vestibule: novel strategies to prevent side effects | |
Xiao et al. | Methods for examining cancer symptom clusters over time | |
Franco et al. | Equity, diversity, and inclusion in radiation oncology: a bibliometric analysis and critical review | |
Tuček et al. | Is there still a place for brachytherapy in the modern treatment of early-stage oral cancer? | |
Musha et al. | Evaluation of carbon ion radiation-induced trismus in head and neck tumors using dose-volume histograms | |
Gong et al. | Privacy-preserving collaborative learning for mobile health monitoring | |
Ness et al. | Characteristics of responders to a request for a buccal cell specimen among survivors of childhood cancer and their siblings | |
Deconinck et al. | Community acquired fungemia caused by Candida pulcherrima: diagnostic contribution of MALDI-TOF mass spectrometry | |
Li et al. | Using deep learning to model the biological dose prediction on bulky lung cancer patients of partial stereotactic ablation radiotherapy | |
Lin et al. | Efficacy of postoperative unilateral neck irradiation in patients with buccal mucosa squamous carcinoma with extranodal extension: A propensity score analysis | |
Mori et al. | Outcomes of allogeneic hematopoietic cell transplantation in patients with biphenotypic acute leukemia | |
Kumar et al. | Privacy-preserving blockchain-based federated learning for brain tumor segmentation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||