CN113674843A

CN113674843A - Method, device, system, electronic device and storage medium for medical expense prediction

Info

Publication number: CN113674843A
Application number: CN202110774423.9A
Authority: CN
Inventors: 高春蓉; 方敏; 杨啸天; 俞青; 应晶; 余小益; 张旷
Original assignee: Zhejiang Yishan Intelligent Medical Research Co ltd
Current assignee: Zhejiang Yishan Intelligent Medical Research Co ltd
Priority date: 2021-07-08
Filing date: 2021-07-08
Publication date: 2021-11-19

Abstract

The present application relates to a method, apparatus, system, electronic device and storage medium for medical expense prediction, wherein the method comprises: acquiring historical medical data corresponding to all participants, and generating a training set from the historical medical data; inputting the training set to a federated learning platform, and training, by the participants using the federated learning platform, a first prediction model based on a decision tree, wherein the federated learning platform is built from local clusters pre-deployed on each participant; and acquiring medical data to be tested corresponding to at least one participant, and inputting the medical data to be tested into the first prediction model to generate a medical expense prediction result. The application solves the problem of low privacy in medical expense prediction and achieves accurate, secure medical expense prediction based on federated learning.

Description

Method, device, system, electronic device and storage medium for medical expense prediction

Technical Field

The present application relates to the field of medical technology, and in particular, to a method, an apparatus, a system, an electronic apparatus, and a storage medium for medical expense prediction.

Background

In hospital operations, the medical expenses of the patients a hospital admits need to be predicted for expense management, so that material purchasing, medical insurance subsidies, project development and the like can be planned as a whole and a better operating plan can be made. In the related art, medical expenses are predicted with conventional models such as regression models, mixed-effect models, or gray models. However, because medical data involves personal privacy, the data is siloed: it cannot be shared directly between medical institutions, and medical expense prediction can only be performed with the data held inside each institution, which results in low privacy of medical expense prediction.

At present, no effective solution is provided for the problem of low privacy of medical expense prediction in the related art.

Disclosure of Invention

The embodiment of the application provides a method, a device, a system, an electronic device and a storage medium for medical expense prediction, so as to at least solve the problem of low privacy of medical expense prediction in the related art.

In a first aspect, an embodiment of the present application provides a method for medical expense prediction, where the method includes:

acquiring historical medical data corresponding to all participants, and generating a training set according to the historical medical data;

inputting the training set to a federated learning platform, and training, by the participants using the federated learning platform, a first prediction model based on a decision tree, wherein the federated learning platform is built from local clusters pre-deployed on each participant;

and acquiring medical data to be tested corresponding to at least one participant, and inputting the medical data to be tested into the first prediction model to generate a medical expense prediction result.

In some embodiments, inputting the training set to the federated learning platform and training, by the participants using the federated learning platform, the first prediction model based on a decision tree comprises:

inputting the training set to each of the participants, wherein the participants generate candidate split points according to the training set and traverse the candidate split points to generate a gradient histogram;

receiving, through the federated learning platform, the encrypted histogram that each participant obtains by encrypting its gradient histogram, and generating a global gradient histogram from the encrypted histograms;

obtaining an optimal split point according to the global gradient histogram, and sending the optimal split point to each participant through the federated learning platform, wherein the participants update the decision tree according to the optimal split point so as to generate the first prediction model.

In some of these embodiments, the generating a training set from the historical medical data comprises:

acquiring a preset data structure;

performing data preprocessing on the historical medical data according to the preset data structure to obtain a processed characteristic data set;

and generating the training set according to the characteristic data set.

In some embodiments, generating the training set from the historical medical data further comprises: generating the training set, a verification set and a test set according to the historical medical data.

In some embodiments, after the first prediction model based on the decision tree is obtained by training with the federated learning platform, the method further includes:

sending the verification set to all of the participants, wherein the participants perform parameter tuning on the first prediction model according to the verification set to obtain second prediction models, and an optimal prediction model is selected from all the second prediction models;

and sending the test set to the participants, receiving the test results that the participants obtain for the optimal prediction model according to the test set, and inputting the medical data to be tested into the optimal prediction model to generate the medical expense prediction result.

In some embodiments, the method for building the federated learning platform includes:

deploying an independent Kubernetes local cluster on each of the participants, wherein the Kubernetes local clusters are connected to one another;

and deploying Federated AI Technology Enabler (FATE for short) in the Kubernetes local clusters, thereby generating the federated learning platform.

In a second aspect, an embodiment of the present application provides an apparatus for medical expense prediction, where the apparatus includes an acquisition module, a training module and a generation module;

the acquisition module is used for acquiring historical medical data corresponding to all participants and generating a training set according to the historical medical data;

the training module is used for inputting the training set to a federated learning platform, on which the participants train to obtain a first prediction model based on a decision tree, wherein the federated learning platform is built from local clusters pre-deployed on each participant;

the generation module is used for acquiring medical data to be tested corresponding to at least one participant and inputting the medical data to be tested into the first prediction model so as to generate a medical expense prediction result.

In a third aspect, an embodiment of the present application provides a system for medical expense prediction, where the system includes a terminal device, a transmission device and a server device; the terminal device is connected with the server device through the transmission device;

the terminal device is used for displaying the medical expense prediction result;

the transmission device is used for transmitting the medical expense prediction result;

the server device is adapted to perform a method of medical cost prediction as described in the first aspect above.

In a fourth aspect, the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method for medical expense prediction according to the first aspect.

In a fifth aspect, the present application provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the method for medical expense prediction according to the first aspect.

Compared with the related art, the method, apparatus, system, electronic device and storage medium for medical expense prediction provided by the embodiments of the present application acquire historical medical data corresponding to all participants and generate a training set from the historical medical data; input the training set to a federated learning platform, on which the participants train a first prediction model based on a decision tree, the federated learning platform being built from local clusters pre-deployed on each participant; and acquire medical data to be tested corresponding to at least one participant and input it into the first prediction model to generate a medical expense prediction result. This solves the problem of low privacy in medical expense prediction and achieves accurate, secure medical expense prediction based on federated learning.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a schematic diagram illustrating an application scenario of a method for medical expense prediction according to an embodiment of the present application;

FIG. 2 is a flow chart of a method of medical cost prediction according to an embodiment of the present application;

FIG. 3 is a flow chart of an aggregation method according to an embodiment of the present application;

FIG. 4 is a flow chart of a histogram processing method according to an embodiment of the present application;

FIG. 5 is a flow chart of another method of medical cost prediction according to an embodiment of the present application;

FIG. 6 is a block diagram of an apparatus for medical cost prediction according to an embodiment of the present application;

FIG. 7 is a block diagram of a system for medical cost prediction according to an embodiment of the present application;

FIG. 8 is a block diagram of the internal structure of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not limit quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering of the objects.

This embodiment provides an application scenario for the method for medical expense prediction. FIG. 1 is a schematic diagram of an application scenario of the method for medical expense prediction according to an embodiment of the present application. As shown in FIG. 1, the application environment includes a terminal device 102 and a server device 104. The server device 104 obtains historical medical data corresponding to all participants and generates a training set from the historical medical data; each participant inputs the training set to the federated learning platform and trains to obtain a unified first prediction model. The server device 104 inputs the medical data to be tested into the first prediction model to obtain the medical expense prediction result, and sends the result to the terminal device 102 for display. The terminal device 102 may be, but is not limited to, a smart phone, personal computer, notebook computer or tablet computer, and the server device 104 may be implemented as an independent server or as a server cluster composed of a plurality of servers.

The present embodiment provides a method for medical expense prediction, and fig. 2 is a flowchart of a method for medical expense prediction according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:

Step S210, obtaining historical medical data corresponding to all participants, and generating a training set according to the historical medical data.

The participants are the medical institutions taking part in the medical operation process. In horizontal federated learning, every medical institution acting as a participant must use data with the same feature space structure, so the historical medical data from each participant's actual operations is acquired first, and the training set for model training is then derived by taking the data conditions of all the historical medical data into account.

Step S220, inputting the training set to a federated learning platform, on which the participants train to obtain a first prediction model based on a decision tree; the federated learning platform is built from local clusters pre-deployed on each participant.

The federated learning platform is built on each participant and on a platform server site, based on a federated learning algorithm. Because the data sets of the participants in medical expense prediction overlap heavily in user features but only slightly in users, this embodiment may employ a horizontal federated learning algorithm. After the federated learning platform is built, each participant trains with its own training set to obtain the first prediction model. In machine learning, a decision tree is a prediction model that represents a mapping between object attributes and object values; the first prediction model in this embodiment is generated based on a decision tree. Specifically, using the federated learning platform, the participants and the server jointly train on the training set to obtain a decision-tree-based first prediction model for medical expense prediction. Through step S220, every participant takes part in the training process while its data never leaves its local site, so data privacy and security in medical expense prediction are high.

Step S230, acquiring medical data to be tested corresponding to at least one of the participants, and inputting the medical data to be tested to the first prediction model to generate a medical expense prediction result.

After the first prediction model is trained, the medical data to be tested is input into the first prediction model to generate the medical expense prediction result. The server side may receive all the first prediction models and input each piece of medical data to be tested into the first prediction model to output a medical expense prediction result; or each participant may input its medical data to be tested into its corresponding first prediction model to obtain a medical expense prediction result; or the server side may upload the first prediction model to a pre-deployed or designated application website, where the user inputs the medical data to be tested into the first prediction model to obtain a medical expense prediction result. Details are not repeated here.

Through steps S210 to S230, a first prediction model is trained on the federated learning platform built by the participants, and the medical data to be tested is input into the first prediction model to output a medical expense prediction result. Through federated learning, the medical institutions can thus jointly build the medical expense prediction model without sharing data, which protects user privacy and data security, enables joint modeling among medical institutions, solves the problem of low privacy in medical expense prediction, and achieves an accurate, secure medical expense prediction method based on federated learning.

In some embodiments, the step S220 further includes the following steps:

Step S221, inputting the training set to each of the participants; the participants generate candidate split points according to the training set and traverse the candidate split points to generate a gradient histogram.

The decision tree used in this embodiment is a Gradient Boosting Decision Tree (GBDT). In the related art, a GBDT is generally trained with a conventional distributed learning method only: all participating institutions send one another their local gradient histograms, which contain the sums of first-order and second-order gradients, so the features and label contents of each institution's data may be inferred from the gradient information, causing data leakage. Training the decision tree through the federated learning platform built here avoids such leakage and effectively improves the privacy of model training.

SecureBoost is a GBDT model framework provided by FATE that is suitable for federated learning. It inherits the advantages of GBDT and adds protection for the privacy of data samples. Under this method, each participating institution sends its local gradient histogram in encrypted form to the same server side; the server aggregates them into a global gradient histogram, then searches for the optimal split point and notifies each participating institution, and the original local histogram of any institution cannot be obtained during the process. The server side cooperates with all participants throughout the machine learning process, and the participating medical institutions jointly build and share the global medical expense prediction model without revealing their local samples.

Specifically, the local cluster corresponding to each participant initializes the model, sets the local hyperparameters, loads the training set, and selects a number of candidate split points for each feature. Then, for a given leaf node, such as the root node or a left node, each participant scans all training samples in the training set that fall under the current leaf node, traverses the candidate split points according to the feature values of those samples, and accumulates the computed sample gradients at each split point into the corresponding histogram bin, thereby constructing a local gradient histogram over the candidate split points.
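As an illustration of the histogram construction described above, the following minimal Python sketch bins the samples of one leaf node by candidate split point for a single feature and accumulates their first-order and second-order gradients. The names `GradientHistogram` and `build_local_histogram`, and the convention that bin i collects values up to and including the i-th candidate split point, are assumptions made for this example; they are not taken from FATE or from the application.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GradientHistogram:
    """Per-bin sums of first-order (g) and second-order (h) gradients."""
    grad_sums: List[float]
    hess_sums: List[float]


def build_local_histogram(
    feature_values: Sequence[float],
    grads: Sequence[float],
    hessians: Sequence[float],
    split_points: Sequence[float],
) -> GradientHistogram:
    """Accumulate the gradients of the samples in one leaf into bins.

    `split_points` must be sorted in ascending order. Bin i holds samples
    with split_points[i-1] < value <= split_points[i]; the last bin holds
    values above the final candidate split point.
    """
    n_bins = len(split_points) + 1
    hist = GradientHistogram([0.0] * n_bins, [0.0] * n_bins)
    for value, g, h in zip(feature_values, grads, hessians):
        bin_index = bisect_left(split_points, value)
        hist.grad_sums[bin_index] += g
        hist.hess_sums[bin_index] += h
    return hist
```

For a squared-error loss, a participant would typically pass `g = prediction - label` and `h = 1.0` for every sample; the choice of loss is likewise an assumption, since the application does not name one.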

Step S222, receiving, through the federated learning platform, the encrypted histogram that each participant obtains by encrypting its gradient histogram, and generating a global gradient histogram from the encrypted histograms.

It should be added that, on top of GBDT, the horizontal SecureBoost model described above adds a protection measure for data sample privacy, namely secure aggregation. Secure aggregation lets multiple participants, each holding its own input, complete a summation through one server side without disclosing their individual inputs to the server side or to other participants. Specifically, taking a single split point as an example, FIG. 3 is a flowchart of an aggregation method according to an embodiment of the present application. As shown in FIG. 3, the server side generates a set of positive and negative random numbers whose sum is 0 and randomly allocates them to the participants. For example, with 4 participants in this embodiment, participant 1 to participant 4, the server side randomly generates four random numbers +a, -b, +c and -d, where a - b + c - d = 0, and distributes one to each participant. Each participant adds its allocated random number to its gradient histogram, which contains the sum of first-order gradients and the sum of second-order gradients, to complete encryption, and then sends the resulting encrypted histogram to the server side for summarization. The server side aggregates all the encrypted histograms to obtain the global gradient histogram. Because the random numbers sum to 0, they cancel out completely during summation without affecting the result, and the server side cannot learn the original input of any participant.
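The cancellation of the zero-sum random numbers can be checked with a short Python sketch. It only demonstrates the arithmetic of the masking step described above and is not a complete security protocol. The function names, the fixed-point `SCALE` (used so that integer masks cancel exactly instead of leaving floating-point residue), and the mask range are all assumptions introduced for this example.

```python
import random
from typing import List

SCALE = 10**6  # assumed fixed-point precision for gradient sums


def generate_zero_sum_masks(
    n_participants: int, n_bins: int, rng: random.Random
) -> List[List[int]]:
    """Return one integer mask vector per participant; each bin sums to 0."""
    masks = [
        [rng.randint(-10**9, 10**9) for _ in range(n_bins)]
        for _ in range(n_participants - 1)
    ]
    # The last mask is chosen so that every bin cancels across participants.
    masks.append([-sum(m[b] for m in masks) for b in range(n_bins)])
    return masks


def mask_histogram(histogram: List[float], mask: List[int]) -> List[int]:
    """Participant side: fixed-point encode the histogram, then add the mask."""
    return [round(v * SCALE) + m for v, m in zip(histogram, mask)]


def aggregate(masked_histograms: List[List[int]]) -> List[float]:
    """Server side: sum the masked histograms bin by bin and decode."""
    return [sum(column) / SCALE for column in zip(*masked_histograms)]
```

With four participants, summing the four masked histograms yields the same global histogram as summing the unmasked ones, while each individual masked histogram looks random on its own.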

Step S223, obtaining an optimal split point according to the global gradient histogram, and sending the optimal split point to each participant through the federated learning platform; the participants update the decision tree according to the optimal split point so as to generate the first prediction model.

FIG. 4 is a flowchart of a histogram processing method according to an embodiment of the present application. As shown in FIG. 4, the server side may subtract the aggregated global gradient histogram of the left node from the global gradient histogram of the parent node to obtain the global gradient histogram of the right node. The server side then searches the global gradient histograms of the left and right nodes for the globally optimal split point and returns it to each participant. After obtaining the current optimal split point, each participant updates the decision tree according to the optimal split point, redistributes the samples, and re-executes step S221 to find the nodes of the next layer. If the current decision tree has reached the maximum depth or meets a stop condition, the participant stops building the current tree, creates a new tree, and returns to step S221 to start again from a new node.

Through steps S221 to S223, the participants train the GBDT-based first prediction model on the federated learning platform, and during training the SecureBoost model carries out the exchange of information between each participant and the server side on the platform, which avoids information leakage during multi-party information exchange and effectively improves the privacy and security of medical expense prediction.
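The histogram subtraction and the search for the optimal split point can be sketched as follows. The application does not state which split criterion is used, so the regularized gain below, a form commonly used in gradient boosting libraries, is an assumption, as are the function names and the `reg_lambda` parameter.

```python
from typing import List, Tuple


def subtract_histograms(parent: List[float], left: List[float]) -> List[float]:
    """Right-child histogram = parent histogram - left-child histogram."""
    return [p - l for p, l in zip(parent, left)]


def split_gain(
    g_left: float, h_left: float, g_right: float, h_right: float,
    reg_lambda: float = 1.0,
) -> float:
    """Assumed regularized gain of splitting a node into left and right."""
    def score(g: float, h: float) -> float:
        return g * g / (h + reg_lambda)

    parent_score = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent_score)


def best_split(
    grad_hist: List[float], hess_hist: List[float], reg_lambda: float = 1.0
) -> Tuple[int, float]:
    """Scan bin boundaries of one node's global histogram.

    Returns (index of the last bin sent to the left child, gain), or
    (-1, 0.0) if no boundary has positive gain.
    """
    total_g, total_h = sum(grad_hist), sum(hess_hist)
    best_index, best_gain = -1, 0.0
    g_left = h_left = 0.0
    for i in range(len(grad_hist) - 1):
        g_left += grad_hist[i]
        h_left += hess_hist[i]
        gain = split_gain(
            g_left, h_left, total_g - g_left, total_h - h_left, reg_lambda
        )
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain
```

Subtraction means only one child per split has to be aggregated from the participants; the sibling's histogram follows from the parent's, which roughly halves the histograms that must be sent.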

In some embodiments, generating the training set according to the historical medical data further includes: acquiring a preset data structure; performing data preprocessing on the historical medical data according to the preset data structure to obtain a processed feature data set; and generating the training set from the feature data set.

In the horizontal federated learning process, each participant first performs preprocessing and feature engineering on its historical medical data, so that the historical medical data of all participants share the same feature space structure. Specifically, based on the historical medical data, a standard data structure for subsequent model training, namely the preset data structure, can be formulated manually; the standard data structure may include the feature variable names, data types and value ranges of all historical medical data involved in the modeling, together with the medical cost used as the label. According to the established standard data structure, each participant screens, preprocesses and converts its own historical medical data to generate a feature data set consistent with the standard data structure, and this feature data set is used as the training set for model training.
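One way to picture such a standard data structure is as a small schema that every participant checks its records against before training. The sketch below is illustrative only: the feature names, types and ranges are hypothetical placeholders and are not taken from Table 1 or from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class FeatureSpec:
    """One entry of the standard data structure."""
    name: str
    dtype: type
    value_range: Optional[Tuple[float, float]] = None  # inclusive bounds
    is_label: bool = False


# Hypothetical schema; real feature names and ranges would be agreed
# on by the participating institutions.
STANDARD_STRUCTURE: List[FeatureSpec] = [
    FeatureSpec("gender", int, (0, 1)),
    FeatureSpec("total_inpatient_days", int, (0, 3650)),
    FeatureSpec("total_registration_fee", float, (0.0, 1e6)),
    FeatureSpec("medical_cost", float, (0.0, 1e8), is_label=True),
]


def validate_record(
    record: Dict[str, object],
    specs: Optional[List[FeatureSpec]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    specs = STANDARD_STRUCTURE if specs is None else specs
    errors: List[str] = []
    for spec in specs:
        if spec.name not in record:
            errors.append(f"missing field: {spec.name}")
            continue
        value = record[spec.name]
        if not isinstance(value, spec.dtype):
            errors.append(
                f"wrong type for {spec.name}: expected {spec.dtype.__name__}"
            )
            continue
        if spec.value_range is not None:
            low, high = spec.value_range
            if not low <= value <= high:
                errors.append(f"out of range: {spec.name}={value}")
    return errors
```

Running the same check at every participant is one simple way to ensure that the locally built feature data sets share a feature space before horizontal training starts.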

Each participant may process its historical medical data as follows: preliminarily screen out, from its historical medical data, the variables that are the same as or related to the features in the standard data structure as candidate features; delete extreme and abnormal data below the 0.01 quantile or above the 0.99 quantile; fill missing values with the mean; and convert the candidate features into the features of the standard data structure after further processing. The processing for converting candidate features into features of the standard data structure includes: converting text-type candidate features, such as gender, blood type, region and education level, into discrete numerical values; for specific target features, such as a patient's total number of payments, total registration fee and total days of hospitalization, aggregating several related candidate features, such as the patient's individual payments, single registration fees and days per hospitalization, to obtain the target feature; applying feature binning to continuous and some discrete candidate features; and encoding general discrete candidate features with schemes such as One-Hot encoding. The feature variables, label names and other data obtained after preprocessing are shown in Table 1:

Table 1: Examples of data preprocessing results

It should be added that generating the training set from the historical medical data further includes the following step: generating the training set, a verification set, and a test set from the historical medical data. The historical medical data may be divided into a training set, a verification set and a test set according to a preset ratio, for example 6:2:2; alternatively, the historical medical data may first be preprocessed through the above steps to obtain a feature data set, and the feature data set is then divided into a training set, a verification set and a test set according to a certain ratio. Details are not repeated here.
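A 6:2:2 division such as the one mentioned above can be sketched in a few lines of Python. The function name `split_dataset`, the shuffling before the split, and the fixed `seed` are assumptions for illustration; the application only specifies the ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    records: Sequence[T],
    ratios: Tuple[int, int, int] = (6, 2, 2),
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the records and split them into train, verification, test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, validation, test
```

Each participant would apply this to its own records, so no record leaves the institution that holds it.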

Through the embodiment, all historical medical data are preprocessed, so that the processed characteristic data are more beneficial to model training, and the efficiency and the accuracy of medical expense prediction are effectively improved.
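Some of the preprocessing steps described in this embodiment (quantile trimming, mean filling, category encoding and One-Hot encoding) can be sketched in plain Python as follows. The function names, the record layout as a list of dictionaries, and the linear-interpolation quantile are assumptions for this example; feature aggregation and binning are omitted for brevity.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def quantile(sorted_values: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def trim_extremes(
    rows: List[Row], column: str, low_q: float = 0.01, high_q: float = 0.99
) -> List[Row]:
    """Drop rows outside the [low_q, high_q] quantiles; keep missing values."""
    observed = sorted(r[column] for r in rows if r[column] is not None)
    low, high = quantile(observed, low_q), quantile(observed, high_q)
    return [r for r in rows if r[column] is None or low <= r[column] <= high]


def fill_missing_with_mean(rows: List[Row], column: str) -> List[Row]:
    """Replace None in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = sum(observed) / len(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def encode_categories(
    rows: List[Row], column: str
) -> Tuple[List[Row], Dict[str, int]]:
    """Map text categories to integer codes, assigned in sorted order."""
    codes = {c: i for i, c in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Replace `column` with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded
```

In a horizontal federated setting, the category codes and One-Hot columns would have to be fixed in the shared standard data structure rather than derived from each participant's local data as done here; otherwise different institutions could assign different codes to the same category.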

In some embodiments, a method for medical expense prediction is provided, and fig. 5 is a flowchart of another method for medical expense prediction according to an embodiment of the present application, as shown in fig. 5, the flowchart includes steps S210 and S220 in fig. 2, and further includes the following steps:

Step S510, sending the verification set to all the participants; the participants perform parameter tuning on the first prediction model according to the verification set to obtain second prediction models, and an optimal prediction model is selected from all the second prediction models.

Each participant checks the obtained global first prediction model against the received verification set, and the check results are collected at the server side; the server side then modifies the hyperparameter settings according to each participant's results for the first prediction model. Finally, each participant retrains the first prediction model with the adjusted hyperparameters to obtain a tuned second prediction model, and the model with the best performance among the second prediction models is selected as the optimal prediction model.
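The selection of the optimal prediction model from the participants' verification results can be sketched as below. The application does not say how the server combines the per-participant results, so pooling the reported RMSE values weighted by sample count is an assumption, as are the function names and the candidate labels in the usage note.

```python
import math
from typing import Dict, List, Tuple

# One report per participant: (number of verification samples, local RMSE).
Report = Tuple[int, float]


def pooled_rmse(reports: List[Report]) -> float:
    """Combine per-participant RMSE values into one RMSE over all samples."""
    total_samples = sum(n for n, _ in reports)
    squared_error_sum = sum(n * rmse ** 2 for n, rmse in reports)
    return math.sqrt(squared_error_sum / total_samples)


def select_best_model(candidates: Dict[str, List[Report]]) -> Tuple[str, float]:
    """Return the candidate with the lowest pooled verification RMSE."""
    scores = {name: pooled_rmse(reports) for name, reports in candidates.items()}
    best_name = min(scores, key=scores.get)
    return best_name, scores[best_name]
```

For example, with two hypothetical hyperparameter settings labeled `"max_depth=3"` and `"max_depth=5"`, the server would call `select_best_model` on the collected reports and broadcast the winning setting; only aggregate error values, not patient records, are exchanged.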

Step S520, sending the test set to the participant, receiving a test result of the participant according to the test set and aiming at the optimal prediction model, and inputting the medical data to be tested to the optimal prediction model to generate the medical expense prediction result.

And each participant receives a test set of the server side, and uses the selected optimal prediction model for prediction of the test set to obtain a test result. The server side collects the test results of all the participants to be used as a final model prediction result; the test results are used to indicate the generalization ability of the model. And finally, the medical data to be tested can be input into an optimal prediction model to output the medical expense prediction result.

It should be noted that the effectiveness indexes for the verification set and the prediction result indexes for the test set may use the evaluation indexes of a general regression model, including the Root Mean Square Error (RMSE), the R-squared score (R²), the Mean Absolute Percentage Error (MAPE), and the like; these evaluation indexes can be used to calculate the verification result on the verification set and the prediction result on the test set.
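For reference, the three evaluation indexes named above can be computed as in the following minimal sketch, which uses their standard definitions. The function names and the sample values are illustrative assumptions, not part of the application.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any true value is zero, so zero-cost records would
    need to be excluded or handled separately.
    """
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


# Hypothetical actual and predicted expenses.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = (rmse(actual, predicted), r2_score(actual, predicted), mape(actual, predicted))
```

Lower RMSE and MAPE and a higher R² indicate a better fit.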

Through the steps S510 and S520, parameter adjustment and testing are performed on the first prediction model based on the verification set and the test set: the model is trained on the training set, evaluated on the verification set, and used for prediction on the test set. Because only the verification set serves as the basis for adjusting the model, information from the test set does not leak into model selection, which effectively improves the generalization capability of the prediction model and further improves the privacy security and accuracy of medical expense prediction.
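As an illustrative sketch only, the division of the data into the training set, the verification set and the test set could be performed as follows. The 70/15/15 proportions, the fixed random seed and the function name `split_dataset` are assumptions; the application does not specify them.

```python
import random


def split_dataset(records, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the records and split them into three disjoint subsets.

    The fractions and the seed are illustrative defaults only.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Stand-in for 100 historical medical records.
train_set, val_set, test_set = split_dataset(list(range(100)))
```

Because the three subsets are disjoint, tuning on the verification set never touches the records later used to measure generalization.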

In some embodiments, the method for building the federal learning platform includes the following steps: respectively deploying independent Kubernetes local clusters on all the participants, wherein the Kubernetes local clusters are connected with each other; and deploying FATE in the Kubernetes local clusters, thereby generating the federal learning platform.

Before model training, a federal learning platform must first be built among all the medical institutions; the FATE framework running on Kubernetes is adopted to build the federal learning platform. Kubernetes is an open-source container cluster management system from Google that supports automated deployment, scaling, and management of containerized applications, and it has a large and rapidly growing ecosystem. In Kubernetes, a plurality of containers can be created, each running an application instance; management, discovery and access of this group of application instances are then realized through a built-in load balancing strategy, without requiring operation and maintenance personnel to perform complicated manual configuration and processing.

FATE is an open-source project that provides a reliable and secure computing framework for the federal learning ecosystem. The FATE architecture uses technologies such as secure Multi-Party Computation (MPC) and Homomorphic Encryption (HE) to construct an underlying secure computing protocol that supports different types of secure machine learning computation. KubeFATE supports running FATE clusters on Kubernetes, so that FATE can be rapidly deployed in Kubernetes.

Specifically, independent Kubernetes clusters are respectively deployed at all medical institutions participating in medical expense prediction, that is, on the hosts of all participants and at the platform server site, and all the clusters are ensured to be interconnected. The kubectl command-line tool is then installed and an ingress controller is deployed; roles, namespaces, and other resources are created in each Kubernetes cluster. FATE is deployed on each Kubernetes cluster, and the FATE deployment process comprises the following steps: 1. downloading and decompressing the KubeFATE package on the master node server of the Kubernetes cluster; 2. configuring RBAC; 3. installing and deploying the KubeFATE server on Kubernetes; 4. configuring the ingress host; 5. installing the kubefate command-line tool; 6. modifying the cluster.yaml configuration file; 7. creating the namespace; 8. downloading the FATE images; 9. installing and deploying FATE using the kubefate command-line tool. Finally, the hosts file is configured for FATEBoard, the interoperability of FATE among the clusters is checked, and FATEBoard is used to test the platform, thereby completing the construction of the federal learning platform.

Through this embodiment, the federal learning platform is built with the FATE framework on Kubernetes, so that rapid deployment in Kubernetes is realized, medical expense prediction based on secure multi-party computation is realized, and the efficiency and privacy security of medical expense prediction are improved.

It should be noted that the steps illustrated in the above-described flows or in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that shown here.

The present embodiment further provides a medical expense prediction apparatus, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 6 is a block diagram of a medical expense prediction apparatus according to an embodiment of the present application, as shown in fig. 6, the apparatus including: an acquisition module 62, a training module 64, and a generation module 66.

The acquisition module 62 is configured to obtain historical medical data corresponding to all participants, and generate a training set according to the historical medical data; the training module 64 is configured to input the training set to a federal learning platform, and obtain, through training by the participants using the federal learning platform, a first prediction model based on a decision tree; the federal learning platform is built from the local clusters pre-deployed on each participant; the generating module 66 is configured to obtain medical data to be tested corresponding to at least one of the participants, and input the medical data to be tested to the first prediction model to generate a medical expense prediction result.

Through this embodiment, the training module 64 trains the first prediction model on the federal learning platform built across the participants, and the generation module 66 inputs the medical data to be tested into the first prediction model to output a medical expense prediction result. Through federal learning, the medical institutions can therefore jointly establish the medical expense prediction model without sharing data, which guarantees user privacy and data security while enabling joint modeling among the medical institutions. This solves the problem of low privacy in medical expense prediction and realizes an accurate and secure medical expense prediction apparatus based on federal learning.

In some embodiments, the training module 64 is further configured to input the training set to each of the participants; each participant generates candidate split points according to the training set and traverses the candidate split points to generate a gradient histogram; the training module 64 receives, through the federal learning platform, the encrypted histogram obtained after each participant encrypts its gradient histogram, and generates a global gradient histogram according to the encrypted histograms; the training module 64 obtains an optimal split point according to the global gradient histogram, and sends the optimal split point to each of the participants through the federal learning platform; and each participant updates the decision tree according to the optimal split point so as to generate the first prediction model.
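The histogram aggregation and split selection described above can be illustrated with the following simplified sketch for a single feature. It rests on several assumptions that are not stated in the application: all participants use the same candidate split points, the gain is computed with the regularized second-order formula commonly used in gradient-boosted trees, and encryption is omitted entirely, so the histograms are summed in plaintext. All function names are hypothetical.

```python
from bisect import bisect_left


def local_histogram(values, grads, hess, split_points):
    """Per-bin gradient and Hessian sums computed by one participant.

    Bin b holds the samples with split_points[b-1] < x <= split_points[b].
    """
    n_bins = len(split_points) + 1
    g_hist = [0.0] * n_bins
    h_hist = [0.0] * n_bins
    for x, g, h in zip(values, grads, hess):
        b = bisect_left(split_points, x)
        g_hist[b] += g
        h_hist[b] += h
    return g_hist, h_hist


def aggregate(histograms):
    """Server side: sum the participants' histograms bin by bin.

    In the described scheme the inputs would be encrypted; here they
    are plaintext for readability.
    """
    g_total = [sum(col) for col in zip(*(g for g, _ in histograms))]
    h_total = [sum(col) for col in zip(*(h for _, h in histograms))]
    return g_total, h_total


def best_split(g_hist, h_hist, split_points, lam=1.0):
    """Pick the candidate split point with the largest gain (assumed formula)."""
    g_all, h_all = sum(g_hist), sum(h_hist)
    parent = g_all ** 2 / (h_all + lam)
    best_gain, best_point = 0.0, None
    g_left = h_left = 0.0
    for k, point in enumerate(split_points):
        g_left += g_hist[k]
        h_left += h_hist[k]
        g_right, h_right = g_all - g_left, h_all - h_left
        gain = g_left ** 2 / (h_left + lam) + g_right ** 2 / (h_right + lam) - parent
        if gain > best_gain:
            best_gain, best_point = gain, point
    return best_point, best_gain


# Two hypothetical participants; squared loss with initial prediction 0,
# so each gradient is -y and each Hessian is 1.
points = [2.5, 5.0, 7.5]
hist_a = local_histogram([1, 2, 8], [-10.0, -10.0, -50.0], [1.0, 1.0, 1.0], points)
hist_b = local_histogram([3, 9], [-10.0, -50.0], [1.0, 1.0], points)
global_g, global_h = aggregate([hist_a, hist_b])
split_point, split_gain = best_split(global_g, global_h, points)
```

Only bin-level sums leave each participant, which is what makes it possible to aggregate them under an additively homomorphic encryption scheme instead of in plaintext.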

In some embodiments, the training module 64 is further configured to obtain a preset data structure; the training module 64 performs data preprocessing on the historical medical data according to the preset data structure to obtain a processed feature data set; the training module 64 generates the training set from the feature data set.

In some embodiments, the training module 64 is further configured to generate the training set, the validation set, and the test set based on the historical medical data.

In some embodiments, the device for predicting medical expenses further includes a parameter adjusting module and a testing module; the parameter adjusting module is used for sending the verification set to all the participants; and the participant carries out parameter adjustment processing on the first prediction model according to the verification set to obtain a processed second prediction model, and selects an optimal prediction model from all the second prediction models. The test module is used for sending the test set to the participant, receiving a test result of the participant according to the test set and aiming at the optimal prediction model, and inputting the medical data to be tested to the optimal prediction model so as to generate the medical expense prediction result.

In some embodiments, the device for predicting medical expenses further includes a building module; the building module is used for respectively deploying independent Kubernetes local clusters on all the participants, wherein the Kubernetes local clusters are connected with each other; the building module deploys FATE in the Kubernetes local clusters so as to generate the federal learning platform.

The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.

The present embodiment further provides a system for medical expense prediction, and fig. 7 is a block diagram of a system for medical expense prediction according to an embodiment of the present application, and as shown in fig. 7, the system includes: a terminal device 102, a transmission device 72, and a server device 104; wherein the terminal device 102 is connected to the server device 104 through the transmission device 72; the terminal device 102 is configured to display a medical expense prediction result; the transmission device 72 is used for transmitting the medical expense prediction result.

The server device 104 is configured to obtain historical medical data corresponding to all participants, and generate a training set according to the historical medical data; the server device 104 inputs the training set to a federal learning platform, and obtains a first prediction model based on a decision tree through the participant by using the federal learning platform; the federated learning platform is built and generated by each local cluster which is pre-deployed on each participant; the server device 104 obtains medical data to be tested corresponding to at least one of the participants, and inputs the medical data to be tested to the first prediction model to generate a medical expense prediction result; the server device 104 transmits the medical expense prediction result to the terminal device and displays the result.

Through this embodiment, the server device 104 trains the first prediction model on the federal learning platform built across the participants, and inputs the medical data to be tested into the first prediction model to output a medical expense prediction result. Through federal learning, the medical institutions can therefore jointly establish the medical expense prediction model without sharing data, which guarantees user privacy and data security while enabling joint modeling among the medical institutions. This solves the problem of low privacy in medical expense prediction and realizes an accurate and secure medical expense prediction system based on federal learning.

In some embodiments, the server device 104 is further configured to input the training set to each of the participants; each participant generates candidate split points according to the training set and traverses the candidate split points to generate a gradient histogram; the server device 104 receives, through the federal learning platform, the encrypted histogram obtained after each participant encrypts its gradient histogram, and generates a global gradient histogram according to the encrypted histograms; the server device 104 obtains an optimal split point according to the global gradient histogram, and sends the optimal split point to each of the participants through the federal learning platform; and each participant updates the decision tree according to the optimal split point so as to generate the first prediction model.

In some embodiments, the server device 104 is further configured to obtain a preset data structure; the server device 104 performs data preprocessing on the historical medical data according to the preset data structure to obtain a processed feature data set; the server device 104 generates the training set from the feature data set.

In some embodiments, the server device 104 is further configured to generate the training set, the validation set, and the test set based on the historical medical data.

In some embodiments, the server device 104 is further configured to send the verification set to all of the participants; each participant carries out parameter adjustment processing on the first prediction model according to the verification set to obtain a processed second prediction model, and an optimal prediction model is selected from all the second prediction models; the server device 104 sends the test set to the participants, receives from each participant a test result obtained for the optimal prediction model on the test set, and inputs the medical data to be tested to the optimal prediction model to generate the medical expense prediction result.

In some embodiments, the server device 104 is further configured to deploy independent Kubernetes local clusters on all the participants, wherein the Kubernetes local clusters are connected with each other; the server device 104 deploys FATE in the Kubernetes local clusters, thereby generating the federal learning platform.

In some embodiments, a computer device is provided, which may be a server. Fig. 8 is a diagram of the internal structure of a computer device according to an embodiment of the present application. As shown in Fig. 8, the computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store the first prediction model. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the method of medical expense prediction described above.

Those skilled in the art will appreciate that the architecture shown in Fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

S1, acquiring historical medical data corresponding to all participants, and generating a training set according to the historical medical data.

S2, inputting the training set to a federal learning platform, and obtaining, through training by the participants using the federal learning platform, a first prediction model based on a decision tree; the federal learning platform is built from the local clusters pre-deployed on each participant.

And S3, acquiring medical data to be tested corresponding to at least one of the participants, and inputting the medical data to be tested into the first prediction model to generate a medical expense prediction result.

It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.

In addition, in combination with the method for medical expense prediction in the above embodiments, the embodiments of the present application may provide a storage medium for implementing the method. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements the method of medical expense prediction of any of the above embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.

The above-mentioned embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of medical cost prediction, the method comprising:

2. The method of claim 1, wherein inputting the training set to a federated learning platform and training, by the participants, a first decision tree-based predictive model using the federated learning platform comprises:

3. The method of claim 1, wherein the generating a training set from the historical medical data comprises:

acquiring a preset data structure;

and generating the training set according to the characteristic data set.

4. The method of claim 1, wherein the generating a training set from the historical medical data comprises:

5. The method of claim 4, wherein after training with the federated learning platform to obtain a first prediction model based on a decision tree, the method further comprises:

6. The method according to any one of claims 1 to 5, wherein the method for building the federal learning platform comprises the following steps:

deploying FATE in the Kubernetes local cluster, and further generating the federal learning platform.

7. An apparatus for medical cost prediction, the apparatus comprising: the device comprises an acquisition module, a training module and a generation module;

8. A system for medical cost prediction, the system comprising: a terminal device, a transmission device and a server device; the terminal equipment is connected with the server equipment through the transmission equipment;

the server device is adapted to perform a method of medical cost prediction according to any one of claims 1 to 6.

9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is configured to execute the computer program to perform the method of medical cost prediction according to any of claims 1 to 6.

10. A storage medium, in which a computer program is stored, wherein the computer program is arranged to carry out the method of medical cost prediction of any one of claims 1 to 6 when executed.