CN110348241A - Multi-center collaborative prognosis prediction system under data sharing strategy - Google Patents

Multi-center collaborative prognosis prediction system under data sharing strategy

Info

Publication number
CN110348241A
Authority
CN
China
Prior art keywords
data
medical institutions
model
center
multicenter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910629800.2A
Other languages
Chinese (zh)
Other versions
CN110348241B (en)
Inventor
李劲松
李谨
田雨
吴承凯
池胜强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhijiang Laboratory
Original Assignee
Zhijiang Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhijiang Laboratory
Priority to CN201910629800.2A (CN110348241B)
Publication of CN110348241A
Priority to PCT/CN2020/083588 (WO2020233258A1)
Application granted
Publication of CN110348241B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-center collaborative prognosis prediction system under a data sharing strategy. The system enables privacy-preserving data sharing across multiple medical institution centers, providing sufficient data for model construction. The system is built on ensemble learning algorithms, which can achieve better prediction results than weak classifiers. Sensitive patient-level data are processed at each center, where the sub-classifiers of the ensemble learning model are also built; only less sensitive intermediate results are exchanged to assemble the complete ensemble learning model, so that the proposed multi-center model achieves results equal to or better than those of a centralized model. The system protects the individual privacy of patients and does not require running the algorithm on a large centralized data source. In clinical practice, it provides a reliable solution to the problem that a single medical institution has too few samples to build a prediction model.

Description

Multi-center collaborative prognosis prediction system under data sharing strategy
Technical field
The invention belongs to the fields of medicine and machine learning, and in particular relates to a multi-center collaborative prognosis prediction system under a data sharing strategy.
Background art
Prognosis prediction plays an important role in clinical research and practice. A prediction model built on electronic health record (EHR) data from a single medical institution may lack sufficient statistical power and good generalization ability. Building prognosis prediction models through collaborative analysis of EHR data from multiple medical institution centers can therefore increase the number and coverage of patients used for model training, enrich the prognostic characterization of patients, and ultimately improve the accuracy and generalization ability of the model's prognosis predictions. Ensemble learning is widely used in clinical prognosis. Unlike linear models such as logistic regression and the Cox model, ensemble learning algorithms usually achieve better accuracy, can capture nonlinear relationships between variables, and are good at avoiding the overfitting problem common in machine learning. Using ensemble learning algorithms for model construction therefore offers an ideal solution for building a collaborative prognosis prediction system across multiple centers. In addition, patient privacy must be protected while multi-center prognosis prediction is carried out. Most existing privacy-preserving ensemble learning methods for multi-center settings are based on encryption, for example additively homomorphic encryption. Aslett et al. proposed an ensemble learning model based on fully homomorphic encryption. Magkos et al. built an encryption module using a protocol framework based on homomorphic encryption to train ensemble learning classifiers. Although these encryption methods can prevent information leakage during data exchange, they significantly reduce computational and storage efficiency, scale poorly, and are unsuitable for processing large-scale clinical data across multiple centers.
Summary of the invention
In view of the above deficiencies of the prior art, the object of the present invention is to provide a novel multi-center collaborative prognosis prediction system under a data sharing strategy.
The object of the present invention is achieved through the following technical solution: a multi-center collaborative prognosis prediction system under a data sharing strategy, comprising the following four modules:
(1) Data acquisition module: at each medical institution center, collects the data of each variable required for patient prognosis prediction, forming the source data set of that center.
(2) Data anonymization module: randomly samples the source data set of each medical institution center at percentage p and applies an anonymization algorithm to the sampled data to generate anonymized data; the remaining data serve as the local training set of that center. The central server collects the anonymized data from all centers and combines them into an enhanced data set, which is divided into two parts: an additional training set and a validation set. The additional training set is returned and distributed to each medical institution center; the validation set is used to select the hyperparameters of the ensemble learning model.
(3) Model training module: each medical institution center locally trains sub-classifiers of the ensemble learning model. The training data comprise the center's local training set and the additional training set returned to it by the central server. The training set for each center's sub-classifiers thus comes not only from the center itself but also from the other centers, which increases the randomness of the data and improves the overall performance of the ensemble learning model. During training, the hyperparameters of the ensemble learning model are selected using the validation set created from the enhanced data set.
(4) Prognosis model application module: the central server collects the sub-classifiers trained at each medical institution center to form the complete ensemble learning model; new patient data are input into the ensemble learning model to perform prognosis prediction.
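The data flow of modules (2) and (3) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the toy records, and the validation fraction `val_fraction` are assumptions (the text does not state how the enhanced data set is split), and the anonymization step itself is left out.

```python
import random

def split_center(records, p, rng):
    """Randomly sample a fraction p of one center's source data set.

    Returns (sampled, local_train). In the described system the sampled
    part would be anonymized before leaving the center; that step is
    omitted here.
    """
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_shared = round(len(shuffled) * p)
    return shuffled[:n_shared], shuffled[n_shared:]

def build_enhanced_sets(shared_by_center, val_fraction, rng):
    """Central server: pool the shared records into the enhanced data set,
    then split it into an additional training set and a validation set.
    val_fraction is a hypothetical parameter."""
    pool = [r for recs in shared_by_center for r in recs]
    rng.shuffle(pool)
    n_val = round(len(pool) * val_fraction)
    return pool[n_val:], pool[:n_val]  # (additional_train, validation)

rng = random.Random(0)
# Five centers with ten toy records each; a record is (center_id, row_id).
centers = [[(c, i) for i in range(10)] for c in range(5)]
splits = [split_center(recs, 0.5, rng) for recs in centers]
shared = [s for s, _ in splits]
local = [l for _, l in splits]
additional_train, validation = build_enhanced_sets(shared, 0.2, rng)
# Each center trains on its own local set plus the returned additional set.
center_train_sets = [loc + additional_train for loc in local]
```

With p = 0.5, each center keeps half of its records locally, and every center's training set also contains anonymized records that originated at the other centers.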
Further, in the data anonymization module, the random sampling percentage p of each medical institution center's source data set is set to 50%. Fixing the anonymized data ratio p at 50% improves the prediction performance of the ensemble learning model; neither directly integrating the sub-classifiers nor fully anonymizing the data and then training on the pooled data achieves the optimal result. The value of p can be adjusted to suit complex decision support scenarios, for patient prognosis prediction in clinical practice under different scenarios.
Further, the anonymization algorithm may be chosen from k-anonymity, l-diversity, t-closeness, differential privacy, and similar anonymization algorithms. k-anonymity can in particular be implemented by suppression, which hides certain information entirely by not releasing certain data items.
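As a rough illustration of k-anonymity by suppression, the sketch below drops every record whose combination of quasi-identifier values is shared by fewer than k records. Record-level suppression is only one possible reading of "not releasing certain data items"; the function name, field names, and sample records are hypothetical.

```python
from collections import Counter

def k_anonymize_by_suppression(records, quasi_identifiers, k):
    """Suppress (drop) records whose quasi-identifier combination
    occurs fewer than k times in the data set."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]

records = [
    {"age_group": "60-69", "gender": "M", "cea": "positive"},
    {"age_group": "60-69", "gender": "M", "cea": "negative"},
    {"age_group": "70-79", "gender": "F", "cea": "negative"},
]
released = k_anonymize_by_suppression(records, ["age_group", "gender"], k=2)
```

In this toy data set the single record in the ("70-79", "F") group is suppressed, so every released record shares its quasi-identifiers with at least one other record.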
Further, the system considers horizontally partitioned data, i.e., the source data sets of all medical institution centers contain the same kinds of variables.
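A simple way to check the horizontal-partitioning assumption before pooling data is to compare the variable names across centers. The helper below is a hypothetical sketch; the variable names are illustrative only.

```python
def is_horizontally_partitioned(center_datasets):
    """Return True if every record at every center has the same variable names."""
    schemas = {frozenset(rec) for recs in center_datasets for rec in recs}
    return len(schemas) == 1

center_a = [{"age": 65, "gender": "M", "tumor_size_mm": 4.8}]
center_b = [{"age": 74, "gender": "F", "tumor_size_mm": 1.5}]
center_c = [{"age": 58, "gender": "F"}]  # missing one variable

same_schema = is_horizontally_partitioned([center_a, center_b])
mixed_schema = is_horizontally_partitioned([center_a, center_c])
```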
The beneficial effects of the present invention are as follows. The invention proposes a novel multi-center data sharing strategy that enables privacy-preserving data sharing across multiple medical institution centers, providing sufficient data for model construction. The system is built on ensemble learning algorithms (such as the random forest algorithm), which can achieve better prediction results than weak classifiers. Sensitive patient-level data are processed at each center, where the sub-classifiers of the ensemble learning model are also built; only less sensitive intermediate results are exchanged to assemble the complete ensemble learning model, so that the proposed multi-center model achieves results equal to or better than those of a centralized model. The system protects the individual privacy of patients and does not require running the algorithm on a large centralized data source. In clinical practice, it provides a reliable solution to the problem that a single medical institution has too few samples to build a prediction model.
Brief description of the drawings
Fig. 1 is a framework diagram of the multi-center collaborative prognosis prediction system under the data sharing strategy;
Fig. 2 is a schematic diagram of the data sharing strategy;
Fig. 3 is a schematic diagram of data transmission between the centers;
Fig. 4 compares the predictive ability of the multi-center collaborative prognosis prediction system under the data sharing strategy of the present invention with that of a prognosis prediction system under centralized training.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
The novel multi-center collaborative prognosis prediction system under a data sharing strategy provided by the invention, as shown in Fig. 1, comprises the following four modules:
(1) Data acquisition module: at each medical institution center, collects the data of each variable required for patient prognosis prediction, forming the source data set of that center. The present embodiment is validated experimentally on colorectal cancer data, with 5 medical institution centers. A sample of the electronic medical record data collected by each center through the data acquisition module is shown in Table 1; it contains six variables: age, gender, tumor size, T stage, N stage, and carcinoembryonic antigen index.
Table 1: Example of electronic medical record data collected at a single center for colorectal cancer patients
No.  Age  Gender  Tumor size (mm)  T stage  N stage  Carcinoembryonic antigen index
1    65   Male    4.8              I        III      Positive
2    74   Female  1.5              II       IV       Negative
(2) Data anonymization module: as shown in Fig. 2, randomly samples the source data set of each medical institution center at percentage p and applies an anonymization algorithm to the sampled data to generate anonymized data; the remaining data serve as the local training set of that center. The central server collects the anonymized data from all centers and combines them into an enhanced data set, which is divided into two parts: an additional training set and a validation set. The additional training set is returned and distributed to each medical institution center; the validation set is used to select the hyperparameters of the ensemble learning model. In the experiment, the anonymized data ratio p is set to 50%, and the anonymization algorithm is the suppression method of k-anonymity. Two hyperparameters are selected on the validation set: the maximum number of features used by a single decision tree, and the number of sub-classifiers.
(3) Model training module: as shown in Fig. 2, each medical institution center locally trains sub-classifiers of the ensemble learning model. The training data comprise the center's local training set and the additional training set returned to it by the central server. The training set for each center's sub-classifiers thus comes not only from the center itself but also from the other centers, which increases the randomness of the data and improves the overall performance of the ensemble learning model. During training, the hyperparameters of the ensemble learning model are selected using the validation set created from the enhanced data set. This addresses the problem that, in the multi-center mode, the out-of-bag (OOB) error is not computed exactly as in a standard random forest, so it no longer provides a valid unbiased estimate.
(4) Prognosis model application module: the central server collects the sub-classifiers trained at each medical institution center to form the complete ensemble learning model; new patient data are input into the ensemble learning model to perform prognosis prediction. The experimental results are shown in Fig. 4, where the predictive ability of the prognosis prediction system is measured by AUC. The multi-center collaborative prognosis prediction system under the proposed data sharing strategy achieves better prediction results than the prognosis prediction system under centralized training.
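Modules (3) and (4) can be sketched end to end in a few lines. The example below is a toy stand-in under stated assumptions, not the patented implementation: decision stumps replace the decision trees of a random forest, every center is given the same synthetic training rows for brevity, and all names (`train_stump`, `train_all_centers`, `auc`, the grid values) are hypothetical. It shows sub-classifiers trained per center on bootstrap samples, merged by a central server, and the two hyperparameters named above (number of sub-classifiers and maximum number of features) selected by AUC on a validation set rather than by OOB error.

```python
import random

def train_stump(rows, max_features, rng):
    """Fit a one-split stump on a bootstrap sample of rows.

    rows: list of (features, label) with numeric features and 0/1 labels.
    Only max_features randomly chosen features are tried for the split.
    """
    sample = [rng.choice(rows) for _ in rows]  # bootstrap sample
    n_features = len(sample[0][0])
    feats = rng.sample(range(n_features), min(max_features, n_features))
    best = None
    for f in feats:
        for x, _ in sample:
            thr = x[f]
            for sign in (1, -1):
                correct = sum(
                    (sign * (xi[f] - thr) > 0) == (y == 1) for xi, y in sample
                )
                if best is None or correct > best[0]:
                    best = (correct, f, thr, sign)
    return best[1:]  # (feature index, threshold, sign)

def score(ensemble, x):
    """Fraction of sub-classifiers voting for class 1."""
    votes = sum(1 for f, thr, sign in ensemble if sign * (x[f] - thr) > 0)
    return votes / len(ensemble)

def auc(ensemble, rows):
    """Pairwise AUC of the ensemble scores on labeled rows."""
    pos = [score(ensemble, x) for x, y in rows if y == 1]
    neg = [score(ensemble, x) for x, y in rows if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def train_all_centers(center_train_sets, n_trees, max_features, rng):
    """Each center trains n_trees stumps; the server concatenates them."""
    ensemble = []
    for rows in center_train_sets:
        ensemble += [train_stump(rows, max_features, rng) for _ in range(n_trees)]
    return ensemble

rng = random.Random(0)
# Synthetic rows: feature 0 separates the classes, feature 1 is uninformative.
center_rows = ([([0.10 + 0.03 * i, 0.0], 0) for i in range(10)]
               + [([0.60 + 0.03 * i, 0.0], 1) for i in range(10)])
center_train_sets = [list(center_rows) for _ in range(5)]
validation = [([0.0, 0.0], 0), ([0.05, 0.0], 0), ([0.95, 0.0], 1), ([1.0, 0.0], 1)]

# Select hyperparameters on the validation set.
best_auc, best_params, best_ensemble = -1.0, None, None
for n_trees in (1, 5):
    for max_features in (1, 2):
        ens = train_all_centers(center_train_sets, n_trees, max_features, rng)
        a = auc(ens, validation)
        if a > best_auc:
            best_auc, best_params, best_ensemble = a, (n_trees, max_features), ens
```

Because only the fitted sub-classifiers (here, triples of feature index, threshold, and sign) travel to the central server, the patient-level rows used to fit them stay at the centers.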
The above embodiment is intended to illustrate the present invention, not to limit it. Any modification or change made to the present invention within its spirit and the scope of protection of the claims falls within the scope of protection of the present invention.

Claims (4)

1. A multi-center collaborative prognosis prediction system under a data sharing strategy, characterized by comprising:
(1) a data acquisition module, which, at each medical institution center, collects the data of each variable required for patient prognosis prediction, forming the source data set of that center;
(2) a data anonymization module, which randomly samples the source data set of each medical institution center at percentage p and applies an anonymization algorithm to the sampled data to generate anonymized data, the remaining data serving as the local training set of that center; the central server collects the anonymized data from all centers and combines them into an enhanced data set, which is divided into two parts, namely an additional training set and a validation set; the additional training set is returned and distributed to each medical institution center; the validation set is used to select the hyperparameters of the ensemble learning model;
(3) a model training module, in which each medical institution center locally trains sub-classifiers of the ensemble learning model, the training data comprising the center's local training set and the additional training set returned to it by the central server, so that the training set for each center's sub-classifiers comes not only from the center itself but also from the other centers, which increases the randomness of the data and improves the overall performance of the ensemble learning model; during training, the hyperparameters of the ensemble learning model are selected using the validation set created from the enhanced data set;
(4) a prognosis model application module, in which the central server collects the sub-classifiers trained at each medical institution center to form the complete ensemble learning model, and new patient data are input into the ensemble learning model to perform prognosis prediction.
2. The multi-center collaborative prognosis prediction system under a data sharing strategy according to claim 1, characterized in that, in the data anonymization module, the random sampling percentage p of each medical institution center's source data set is set to 50%.
3. The multi-center collaborative prognosis prediction system under a data sharing strategy according to claim 1, characterized in that the anonymization algorithm may be chosen from k-anonymity, l-diversity, t-closeness, differential privacy, and similar anonymization algorithms; k-anonymity can in particular be implemented by suppression, which hides certain information entirely by not releasing certain data items.
4. The multi-center collaborative prognosis prediction system under a data sharing strategy according to claim 1, characterized in that the system considers horizontally partitioned data, i.e., the source data sets of all medical institution centers contain the same kinds of variables.
CN201910629800.2A 2019-07-12 2019-07-12 Multi-center collaborative prognosis prediction system under data sharing strategy Active CN110348241B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910629800.2A CN110348241B (en) 2019-07-12 2019-07-12 Multi-center collaborative prognosis prediction system under data sharing strategy
PCT/CN2020/083588 WO2020233258A1 (en) 2019-07-12 2020-04-07 Data sharing strategy-based multi-center collaborative prognosis prediction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910629800.2A CN110348241B (en) 2019-07-12 2019-07-12 Multi-center collaborative prognosis prediction system under data sharing strategy

Publications (2)

Publication Number Publication Date
CN110348241A (en) 2019-10-18
CN110348241B (en) 2021-08-03

Family

ID=68175993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910629800.2A Active CN110348241B (en) 2019-07-12 2019-07-12 Multi-center collaborative prognosis prediction system under data sharing strategy

Country Status (2)

Country Link
CN (1) CN110348241B (en)
WO (1) WO2020233258A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222570A (en) * 2020-01-06 2020-06-02 广西师范大学 Ensemble learning classification method based on difference privacy
CN111245903A (en) * 2019-12-31 2020-06-05 烽火通信科技股份有限公司 Joint learning method and system based on edge calculation
WO2020233258A1 (en) * 2019-07-12 2020-11-26 之江实验室 Data sharing strategy-based multi-center collaborative prognosis prediction system
CN113221162A (en) * 2021-04-28 2021-08-06 健康数据(北京)科技有限公司 Private disease-specific big data privacy protection method and system based on block chain
CN117577333A (en) * 2024-01-17 2024-02-20 浙江大学 Multi-center clinical prognosis prediction system based on causal feature learning

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115442099B (en) * 2022-08-28 2023-06-06 北方工业大学 Distributed GAN-based privacy protection data sharing method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130063565A (en) * 2011-12-07 2013-06-17 조윤진 Combination of multiple classifiers using bagging in semi-supervised learning
CN104200417A (en) * 2014-08-20 2014-12-10 西安唐城电子医疗设备研究所 Rehabilitation training system based on cloud computing
CN107871160A (en) * 2016-09-26 2018-04-03 谷歌公司 Communicate efficient joint study
CN109711556A (en) * 2018-12-24 2019-05-03 中国南方电网有限责任公司 Machine patrols data processing method, device, net grade server and provincial server
CN109977694A (en) * 2019-03-11 2019-07-05 暨南大学 A kind of data sharing method based on cooperation deep learning

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897545B (en) * 2017-01-05 2019-04-30 浙江大学 A kind of tumor prognosis forecasting system based on depth confidence network
CN106886799B (en) * 2017-03-17 2019-08-02 东北大学 A kind of continuous annealing band steel quality online test method based on hybrid integrated study
CN110348241B (en) * 2019-07-12 2021-08-03 之江实验室 Multi-center collaborative prognosis prediction system under data sharing strategy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
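Chen Liping et al.: "Construction of a stroke recurrence prediction model based on big data", Internet of Things Technologies (《物联网技术》) *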

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020233258A1 (en) * 2019-07-12 2020-11-26 之江实验室 Data sharing strategy-based multi-center collaborative prognosis prediction system
CN111245903A (en) * 2019-12-31 2020-06-05 烽火通信科技股份有限公司 Joint learning method and system based on edge calculation
CN111245903B (en) * 2019-12-31 2022-07-01 烽火通信科技股份有限公司 Joint learning method and system based on edge calculation
CN111222570A (en) * 2020-01-06 2020-06-02 广西师范大学 Ensemble learning classification method based on difference privacy
CN111222570B (en) * 2020-01-06 2022-08-26 广西师范大学 Ensemble learning classification method based on difference privacy
CN113221162A (en) * 2021-04-28 2021-08-06 健康数据(北京)科技有限公司 Private disease-specific big data privacy protection method and system based on block chain
CN117577333A (en) * 2024-01-17 2024-02-20 浙江大学 Multi-center clinical prognosis prediction system based on causal feature learning
CN117577333B (en) * 2024-01-17 2024-04-09 浙江大学 Multi-center clinical prognosis prediction system based on causal feature learning

Also Published As

Publication number Publication date
WO2020233258A1 (en) 2020-11-26
CN110348241B (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN110348241A (en) A kind of multicenter under data sharing strategy cooperates with prognosis prediction system
Torkzadehmahani et al. Privacy-preserving artificial intelligence techniques in biomedicine
Cheng et al. Integration of machine learning and blockchain technology in the healthcare field: a literature review and implications for cancer care
Dettenkofer et al. Surveillance of nosocomial infections in adult recipients of allogeneic and autologous bone marrow and peripheral blood stem-cell transplantation
Glenny et al. Interventions for the treatment of oral cavity and oropharyngeal cancer: radiotherapy
Sruthi et al. Cancer prediction using machine learning
Yang et al. Histopathology-based diagnosis of oral squamous cell carcinoma using deep learning
Swaminathan et al. Lack of active follow-up of cancer patients in Chennai, India: implications for population-based survival estimates
CN105069286A (en) Logistic regression analysis system based on protection of vertically distributed private data
Moon et al. Privacy-preserving federated learning in healthcare
Jia et al. Breast cancer identification using machine learning
CN104166951B (en) A kind of method and system that data supporting is provided for medical institutions' bidirectionally transfering consultation
CN111261299A (en) Multi-center collaborative cancer prognosis prediction system based on multi-source transfer learning
Fionda et al. Interventional radiotherapy (brachytherapy) for nasal vestibule: novel strategies to prevent side effects
Xiao et al. Methods for examining cancer symptom clusters over time
Franco et al. Equity, diversity, and inclusion in radiation oncology: a bibliometric analysis and critical review
Tuček et al. Is there still a place for brachytherapy in the modern treatment of early-stage oral cancer?
Musha et al. Evaluation of carbon ion radiation-induced trismus in head and neck tumors using dose-volume histograms
Gong et al. Privacy-preserving collaborative learning for mobile health monitoring
Ness et al. Characteristics of responders to a request for a buccal cell specimen among survivors of childhood cancer and their siblings
Deconinck et al. Community acquired fungemia caused by Candida pulcherrima: diagnostic contribution of MALDI-TOF mass spectrometry
Li et al. Using deep learning to model the biological dose prediction on bulky lung cancer patients of partial stereotactic ablation radiotherapy
Lin et al. Efficacy of postoperative unilateral neck irradiation in patients with buccal mucosa squamous carcinoma with extranodal extension: A propensity score analysis
Mori et al. Outcomes of allogeneic hematopoietic cell transplantation in patients with biphenotypic acute leukemia
Kumar et al. Privacy-preserving blockchain-based federated learning for brain tumor segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant