CN112699947A - Decision tree based prediction method, apparatus, device, medium, and program product - Google Patents


Info

Publication number
CN112699947A
CN112699947A (application number CN202011642783.5A)
Authority
CN
China
Prior art keywords
decision tree
preset
target
coordinator
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011642783.5A
Other languages
Chinese (zh)
Inventor
周雨豪 (Zhou Yuhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202011642783.5A
Publication of CN112699947A
Legal status: Pending

Abstract

The invention relates to the technical field of federated learning and discloses a decision tree based prediction method, apparatus, device, medium, and program product. The method comprises the following steps: obtaining a target decision tree classification model of a client; and obtaining a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model. Compared with the prior art, in which such prediction is performed with a deep learning model, performing the prediction with a decision tree classification model reduces the model cost; in addition, the prediction process of the model is made visible by calling a preset model independent interpretation method, which improves the confidence of the model prediction result.

Description

Decision tree based prediction method, apparatus, device, medium, and program product
Technical Field
The invention relates to the technical field of federated learning, and in particular to a decision tree based prediction method, apparatus, device, medium, and program product.
Background
In recent years, due to data privacy concerns, most organizations model only with the user data they have collected themselves and rarely share the collected data with other organizations; for example, a medical institution models with its own medical data. However, a model constructed in this way generally suffers from poor generalization capability because the data volume is insufficient. How to construct a model with better generalization performance while protecting the privacy of user data is therefore a major concern in the medical field at present.
Consequently, research that applies federated learning to such scenarios has been emerging in recent years, but most of this research uses complex deep learning models. Although private data are not exposed among the participants, a large amount of intermediate information still has to be exchanged, which places high demands on communication bandwidth, so the cost of prediction with a deep learning model is high. In addition, because a deep learning model is highly complex, its prediction process is not transparent, and most users cannot know why the model outputs a given prediction result, so the confidence in the model prediction result is low.
Disclosure of Invention
The invention provides a decision tree based prediction method, apparatus, device, medium, and program product, and aims to solve the technical problems that the currently used models are costly and that the confidence in the model prediction results is low.
In order to achieve the above object, the present invention provides a decision tree based prediction method, which is applied to a client participating in horizontal federated learning, and the method includes:
obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
Preferably, before the step of obtaining the target decision tree classification model of the client, the method further includes:
constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
Preferably, the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federal learning through the preset coordinator includes:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
Preferably, the node feature information includes a partition attribute, a partition threshold, and an information gain of the current node, and the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner, so as to select an optimal intermediate result from among a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning by the preset coordinator includes:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
Preferably, before the step of sending the node characteristic information as an intermediate result to the preset coordinator through encryption, the method further includes:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
Preferably, the preset model independent interpretation method comprises a local surrogate interpretation method (LIME) and/or a Shapley value method.
Preferably, before the step of constructing the current node of the client decision tree, the method further includes:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
In addition, to achieve the above object, the present invention provides a prediction apparatus based on a decision tree, including:
the system comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a target decision tree classification model of the client, and the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
the second obtaining module is used for obtaining a target sample and inputting the target sample into the target decision tree classification model so as to obtain a prediction classification result of the target sample through the target decision tree classification model;
and the calling module is used for calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of obtaining the prediction classification result of the target sample through the target decision tree classification model, so as to obtain a feature interpretation result of the prediction classification result.
In addition, to achieve the above object, the present invention further provides a prediction device based on a decision tree, which includes a processor, a memory and a prediction program based on a decision tree stored in the memory, wherein when the prediction program based on a decision tree is executed by the processor, the steps of the prediction method based on a decision tree as described above are implemented.
In addition, to achieve the above object, the present invention further provides a computer storage medium having a prediction program based on a decision tree stored thereon, wherein the prediction program based on a decision tree realizes the steps of the prediction method based on a decision tree as described above when the prediction program based on a decision tree is executed by a processor.
Compared with the prior art, the invention provides a decision tree based prediction method in which a target decision tree classification model of the client is obtained, the target decision tree classification model being obtained by a preset coordinator coordinating horizontal federated learning among a plurality of participating clients; a target sample is obtained and input into the target decision tree classification model to obtain a prediction classification result of the target sample; and a preset model independent interpretation method is called to interpret and analyze the prediction process that produced the prediction classification result, so as to obtain a classification characteristic interpretation result of the prediction classification result. Compared with the conventional approach of performing such prediction with a deep learning model, the method performs the prediction with a decision tree classification model, thereby reducing the model cost; in addition, the prediction process of the model is made visible by calling the preset model independent interpretation method, thereby improving the confidence of the model prediction result.
Drawings
FIG. 1 is a diagram illustrating a hardware architecture of a decision tree based prediction device according to various embodiments of the present invention;
FIG. 2 is a flow chart illustrating a first embodiment of a decision tree based prediction method according to the present invention;
FIG. 3 is a diagram of a client hardware architecture participating in horizontal federated learning according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a second embodiment of the decision tree based prediction method of the present invention;
FIG. 5 is a functional block diagram of an embodiment of a decision tree based prediction apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of the hardware structure of a decision tree based prediction device according to embodiments of the present invention. In this embodiment of the present invention, the decision tree based prediction device may include a processor 1001 (e.g., a Central Processing Unit, CPU), a communication bus 1002, an input port 1003, an output port 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication among these components; the input port 1003 is used for data input; the output port 1004 is used for data output; the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration depicted in FIG. 1 does not constitute a limitation of the present invention and may include more or fewer components than those shown, combine certain components, or arrange the components differently.
With continued reference to FIG. 1, the memory 1005 in FIG. 1, as a readable storage medium, may include an operating system, a network communication module, an application program module, and a decision tree based prediction program. In the device shown in fig. 1, the output port 1004 is mainly used for connecting to a backend server and performing data communication with it; the input port 1003 is mainly used for connecting a client (user side) and performing data communication with it; and the processor 1001 may be used to invoke the decision tree based prediction program stored in the memory 1005.
In this embodiment, the prediction apparatus based on the decision tree includes: a memory 1005, a processor 1001, and a prediction program based on decision tree stored in the memory 1005 and operable on the processor 1001, wherein the processor 1001, when calling the prediction program based on decision tree stored in the memory 1005, performs the following operations:
obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
Based on the hardware structure shown in fig. 1, a first embodiment of the present invention provides a prediction method based on a decision tree.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a decision tree-based prediction method according to the present invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than that shown. Specifically, the prediction method based on the decision tree in this embodiment includes:
step S10: obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
specifically, in this embodiment, it should be noted that, the prediction method based on the decision tree proposed by the present invention is applied to a client participating in horizontal federal learning, and it is easy to understand that, in recent years, due to the problem of data privacy, in general, different clients use respectively collected user data to perform modeling, and the clients rarely share the collected data, for example, three medical institutions, i.e. medical institution a, medical institution b, and medical institution c, use respectively local medical data of their respective medical institutions to perform modeling, however, the model constructed in this case has poor generalization capability due to insufficient data volume and insufficient data characteristics, so in this embodiment, as shown in fig. 3, a client hardware architecture diagram participating in horizontal federal learning in an embodiment is provided, and a plurality of clients are adopted to perform horizontal federated learning modeling, so that a model with better generalization performance is constructed on the premise of protecting the privacy of user data.
In addition, in some embodiments a plurality of clients perform horizontal federated learning to construct a deep learning model; although private data are not exposed among the participants, a large amount of intermediate information still has to be exchanged, which places high demands on communication bandwidth, i.e. the cost of prediction with a deep learning model is high. In this embodiment, classification prediction is instead performed with a target decision tree classification model constructed through horizontal federated learning among the plurality of clients. Because the target decision tree classification model is obtained through horizontal federated learning coordinated by a preset coordinator, its generalization capability is strong, and when decision-tree-based prediction is performed with it, the communication load of the process is low and the amount of computation is small, thereby reducing the model prediction cost.
Step S20: acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
Specifically, the target sample refers to a sample in the client that is to be predicted and classified. In addition, it should be noted that in this embodiment the client preferably refers to a client holding multiple medical data sets; that is, the decision tree based prediction method of this embodiment is applied to medical image recognition and classification in the medical field.
Specifically, a medical image sample to be predicted and classified is obtained in the client, for example a brain imaging image to be classified, i.e. Rs-fMRI image data acquired with the Rs-fMRI technique. Because Rs-fMRI image data may be affected by the acquisition machine, the environment and so on, the acquired data may contain noise, so in this embodiment the Rs-fMRI image data are denoised before being input to the target decision tree classification model for classification. Preferably, in order to better preserve the detail of the original Rs-fMRI image data, wavelet denoising is used: the Rs-fMRI image data are first subjected to wavelet decomposition, the high-frequency coefficients obtained from the decomposition are then threshold-quantized, and the Rs-fMRI image data are finally reconstructed with the two-dimensional wavelet. Other denoising methods, such as a mean filter or a morphological noise filter, may also be used for image denoising, and this embodiment is not limited in this respect.
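For illustration, the following sketch shows one way the wavelet denoising step described above could be implemented with the PyWavelets library: decompose the image, soft-threshold the high-frequency (detail) coefficients, and reconstruct. The wavelet name, decomposition level and threshold rule are assumptions for the example, not values specified by this embodiment.

```python
import numpy as np
import pywt  # PyWavelets


def wavelet_denoise(image: np.ndarray, wavelet: str = "db4", level: int = 2) -> np.ndarray:
    """Denoise a 2-D image by thresholding its high-frequency wavelet coefficients."""
    # Two-dimensional wavelet decomposition: approximation + per-level detail sub-bands.
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]

    # Estimate the noise level from the finest diagonal sub-band and derive a
    # universal threshold (a common choice; an assumption here).
    sigma = np.median(np.abs(details[-1][-1])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(image.size))

    # Soft-threshold (quantize) every detail sub-band; keep the approximation as is.
    denoised_details = [
        tuple(pywt.threshold(band, threshold, mode="soft") for band in level_bands)
        for level_bands in details
    ]

    # Reconstruct the denoised image with the two-dimensional inverse wavelet transform.
    return pywt.waverec2([approx] + denoised_details, wavelet=wavelet)
```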
In addition, in order to improve the accuracy of the prediction result, in this embodiment the target sample is normalized, converted to grayscale, binarized, vectorized and so on before it is input into the target decision tree classification model for prediction and classification. This removes irrelevant information from the original target sample, restores its useful real information, enhances the detectability of the target sample and simplifies the feature data as much as possible, thereby improving the accuracy of the prediction result.
Therefore, in this embodiment, after the target sample is obtained it is preprocessed to obtain a target feature vector, and the target feature vector is finally input into the target decision tree classification model so as to obtain the prediction classification result of the target sample through the target decision tree classification model.
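A minimal sketch of the preprocessing chain described above (grayscale conversion, normalization, binarization and vectorization) followed by prediction with a decision tree classifier. The binarization threshold, the image size and the randomly fitted `trained_tree` are placeholders standing in for the federated target decision tree classification model, not part of this embodiment.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def preprocess(image: np.ndarray) -> np.ndarray:
    """Turn a raw 2-D or RGB image into a flat feature vector (target feature vector)."""
    if image.ndim == 3:                                   # RGB -> grayscale (luminance weights)
        image = image @ np.array([0.299, 0.587, 0.114])
    image = (image - image.min()) / (image.max() - image.min() + 1e-8)  # normalization
    binary = (image > 0.5).astype(np.float32)             # grayscale binarization (assumed threshold)
    return binary.ravel()                                 # vectorization

# Placeholder model: in this embodiment the real model is the target decision tree
# classification model obtained through horizontal federated learning.
rng = np.random.default_rng(0)
trained_tree = DecisionTreeClassifier(max_depth=5).fit(
    (rng.random((32, 64 * 64)) > 0.5).astype(np.float32), rng.integers(0, 2, 32)
)

target_vector = preprocess(rng.random((64, 64)))
predicted_class = trained_tree.predict(target_vector.reshape(1, -1))
```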
Step S30: and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
Specifically, the preset model independent interpretation method refers to a pre-constructed model-agnostic interpretation method (Model-Agnostic Interpretation Methods). Preferably, in this embodiment, the preset model independent interpretation method comprises the Local Interpretable Model-agnostic Explanations (LIME) method and/or the Shapley values method.
It is easy to understand that, because the process of model prediction is not transparent, most users cannot know why the model outputs a given prediction result, and the confidence in the model prediction result is therefore low. In this embodiment, after the prediction classification result of the target sample is obtained through the target decision tree classification model, in order to improve the confidence of the model prediction result, a preset model independent interpretation method is called to interpret and analyze the prediction process that produced the prediction classification result, so that the prediction process of the model is output in a visible form.
For ease of understanding, the present embodiment illustrates the above scheme:
for example, when analyzing the above model prediction process by using a Local interpretation-independent model (LIME) method, after inputting a target sample into a target decision tree classification model to obtain a prediction classification result of the target sample through the target decision tree classification model, calling a Local interpretation-independent method to process the target sample, for example, the target sample is a chest image sample, calling the Local interpretation-independent method to perform partial image feature concealment on the chest image sample, for example, dividing the chest image sample into a plurality of partial chest images, calling the Local interpretation-independent method to conceal the plurality of partial chest images one by one, specifically, retaining one partial chest image, and setting image pixels of the chest image sample except the partial chest image to gray or setting the parts of the chest image sample except the partial chest image to blank images, and inputting the adjusted chest image sample into a target decision tree classification model to obtain a prediction classification result of the adjusted chest image sample through the target decision tree classification model, so that a model prediction regression model is obtained based on a plurality of partial chest images obtained by dividing the chest image sample, and finally, a classification feature interpretation result of the prediction classification result can be obtained based on the prediction classification result of the model prediction regression model corresponding to the complete chest image sample, namely, the prediction classification result is generated based on which partial image feature in the chest image sample.
In addition, analyzing the model prediction process with the Shapley values method follows the same steps as the Local Interpretable Model-agnostic Explanations (LIME) method: the target sample is divided into a plurality of partial samples, and the Shapley values method is then called to obtain, one by one, the contribution of each partial sample to the prediction classification result of the target sample, so that it can be determined which partial sample (i.e. which sample features) in the target sample led to the prediction classification result.
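For the Shapley values analysis, one possible route (an assumption for illustration; this embodiment does not prescribe a particular library) is the `shap` package, whose TreeExplainer is designed for tree models. The model and sample below are placeholders.

```python
import numpy as np
import shap  # SHAP: Shapley-value attributions for model predictions
from sklearn.tree import DecisionTreeClassifier

# Placeholder tree and sample (standing in for the federated target decision tree
# classification model and the preprocessed target feature vector).
rng = np.random.default_rng(0)
X_train, y_train = rng.random((64, 20)), rng.integers(0, 2, 64)
tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
sample = rng.random((1, 20))

# Attribute the prediction for the sample to its input features (Shapley values).
explainer = shap.TreeExplainer(tree)
shap_values = explainer.shap_values(sample)

# For a binary classifier this is typically one attribution array per class; each
# entry is the contribution of a feature to that class score.
print("attribution array shape:", np.asarray(shap_values).shape)
```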
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and those skilled in the art can make settings based on needs in practical applications, and the settings are not listed here.
According to the above scheme, a target decision tree classification model of the client is obtained, the target decision tree classification model being obtained by a preset coordinator coordinating horizontal federated learning among a plurality of participating clients; a target sample is obtained and input into the target decision tree classification model to obtain a prediction classification result of the target sample; and a preset model independent interpretation method is called to interpret and analyze the prediction process that produced the prediction classification result, so as to obtain a classification characteristic interpretation result of the prediction classification result. Compared with the conventional approach of performing such prediction with a deep learning model, the method performs the prediction with a decision tree classification model, thereby reducing the model cost; in addition, the prediction process of the model is made visible by calling the preset model independent interpretation method, thereby improving the confidence of the model prediction result.
Further, based on the first embodiment of the decision tree based prediction method of the present invention, a second embodiment of the decision tree based prediction method of the present invention is proposed.
Referring to FIG. 4, FIG. 4 is a flowchart illustrating a second embodiment of a decision tree-based prediction method according to the present invention;
the second embodiment of the decision tree based prediction method is different from the first embodiment of the decision tree based prediction method in that, before the step of obtaining the target decision tree classification model of the client, the method further includes:
step S101: constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
step S102: and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
In this embodiment, a specific implementation scheme for constructing a target decision tree classification model of a client is provided, and specifically, a same target decision tree classification model is constructed by coordinating a plurality of clients participating in horizontal federal learning through a preset coordinator.
It should be understood that, in order to protect data privacy, the clients participating in horizontal federated learning are independent of one another, so there are certain differences in the data held by different clients. It should also be noted that, because the decision tree based prediction method provided by the present invention is applied to clients holding multiple medical data sets, and such clients have a large overlap in data features but little overlap in data samples, in this embodiment, referring to fig. 3, which shows a hardware architecture diagram of the participants in horizontal federated learning, the same target decision tree classification model is constructed among the plurality of clients by means of horizontal federated learning.
Specifically, after each client locally constructs the current node of its decision tree, the node characteristic information of the current node, such as the partition attribute, the partition threshold and the information gain of the current node, is sent to the preset coordinator as the intermediate result of the current round of horizontal federated training, so that the preset coordinator coordinates the plurality of clients participating in horizontal federated learning to construct the same target decision tree classification model.
In addition, in some embodiments, in order to improve confidentiality during data interaction and to avoid leakage of the clients' node characteristic information, the intermediate result is encrypted before the node characteristic information is sent to the preset coordinator. Therefore, in this embodiment, before the step of encrypting and sending the node characteristic information as an intermediate result to the preset coordinator, the method further includes:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
Before the preset coordinator coordinates the plurality of clients participating in horizontal federated learning to construct the same target decision tree classification model, the coordinator first generates a pair consisting of a public key and a private key, for example a private key and a matching public key generated with a homomorphic encryption algorithm, and then sends the public key to each client so that data can be encrypted when the clients and the preset coordinator exchange data.
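As one concrete possibility for the homomorphic key pair described above (an illustrative assumption; this embodiment does not name a specific scheme or library), the Paillier cryptosystem from the python-paillier (`phe`) package can be used: the coordinator generates the pair, distributes the public key, and each client encrypts its intermediate result with it.

```python
from phe import paillier  # python-paillier: additively homomorphic Paillier encryption

# Coordinator side: generate the key pair and send the public key to every client.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Client side: encrypt an intermediate result (here, the information gain of the
# current node; the numeric value is a placeholder) with the coordinator's public key.
encrypted_gain = public_key.encrypt(0.4137)

# Coordinator side: decrypt with the private key in order to compare gains across clients.
decrypted_gain = private_key.decrypt(encrypted_gain)
assert abs(decrypted_gain - 0.4137) < 1e-9
```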
In addition, in some embodiments, before the step of constructing the current node of the client decision tree, the method further includes:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
Specifically, the feature attributes of the local training data are the attributes used to classify and partition the local training data. For example, if the local training data are body feature data collected from users of different age groups under different environmental conditions, the feature attributes include age features, gender features and environmental-condition features. It is easy to understand that a decision tree represents a mapping between object attributes and object values: each node in the tree represents an object, and each branch represents a possible attribute value. Therefore, in this embodiment, the local training data are classified according to their feature attributes to construct the current node of the client decision tree. Optionally, a feature attribute is chosen at random to classify the local training data and construct the current node, or the feature attribute with the largest information gain at the node is selected for the classification; this embodiment is not limited in this respect.
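The sketch below illustrates the "largest information gain" option mentioned above for constructing the current node: it evaluates every candidate partition attribute and threshold on the local training data and returns the split with the maximum entropy-based information gain. The toy data layout and the use of observed values as threshold candidates are assumptions.

```python
import numpy as np


def entropy(labels: np.ndarray) -> float:
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def best_split(X: np.ndarray, y: np.ndarray):
    """Return (partition attribute, partition threshold, information gain) of the best local split."""
    parent_entropy, best = entropy(y), (None, None, -np.inf)
    for attr in range(X.shape[1]):
        for threshold in np.unique(X[:, attr]):            # candidate thresholds: observed values
            left, right = y[X[:, attr] <= threshold], y[X[:, attr] > threshold]
            if len(left) == 0 or len(right) == 0:
                continue
            child_entropy = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = parent_entropy - child_entropy           # information gain of this split
            if gain > best[2]:
                best = (attr, float(threshold), gain)
    return best

# Toy local training data: columns are feature attributes (e.g. age, gender, environment code).
X_local = np.array([[25, 0, 1], [62, 1, 0], [40, 0, 1], [58, 1, 1]], dtype=float)
y_local = np.array([0, 1, 0, 1])
split_attr, split_threshold, info_gain = best_split(X_local, y_local)
```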
In addition, for convenience of understanding, this embodiment specifically describes the above-mentioned embodiment in which the node feature information is sent to a preset coordinator as an intermediate result in an encrypted manner, so as to obtain a target decision tree classification model by coordinating multiple clients participating in horizontal federal learning by the preset coordinator:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
In this step, it is easy to understand that because the plurality of clients participating in horizontal federated learning are independent of one another, the current node of the decision tree constructed by each client differs. Therefore, in this embodiment, in order to construct a decision tree classification model with a better effect, each client encrypts and sends the node characteristic information of the current node of its decision tree to the preset coordinator, and the preset coordinator selects the optimal current node of the decision tree without disclosing the data privacy of any client. The above steps are executed in a loop until the decision tree of the client converges or the preset maximum number of iterations is reached, which ensures that every node used to obtain the target decision tree classification model is an optimal node and thereby guarantees the classification accuracy of the target decision tree classification model.
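A compact sketch of the client-side loop just described, under heavy simplifying assumptions: the `coordinator` object and its `select_best` call are hypothetical stand-ins for the encrypted exchange with the preset coordinator, tree nodes are plain dictionaries, the stopping rule is a fixed maximum depth, and `best_split` and `public_key` come from the earlier sketches.

```python
import numpy as np


def grow_node(X, y, coordinator, public_key, depth=0, max_depth=4):
    """Recursively build one client's copy of the shared decision tree."""
    if len(y) == 0:
        return {"leaf": True, "label": 0}                  # degenerate split (assumption)
    if depth >= max_depth or len(np.unique(y)) == 1:       # converged or iteration limit reached
        return {"leaf": True, "label": int(np.bincount(y).argmax())}

    attr, thr, gain = best_split(X, y)                     # local intermediate result
    # Send the encrypted intermediate result; receive the globally optimal split
    # chosen by the (hypothetical) coordinator object.
    attr, thr = coordinator.select_best(attr, thr, public_key.encrypt(gain))

    left, right = X[:, attr] <= thr, X[:, attr] > thr      # divide the local training data
    return {
        "leaf": False, "attr": attr, "threshold": thr,
        "left": grow_node(X[left], y[left], coordinator, public_key, depth + 1, max_depth),
        "right": grow_node(X[right], y[right], coordinator, public_key, depth + 1, max_depth),
    }
```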
For convenience of understanding, this embodiment specifically describes the above-mentioned embodiment in which the node feature information is sent to a preset coordinator as an intermediate result in an encrypted manner, so that the preset coordinator selects an optimal intermediate result from among a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
Specifically, the node characteristic information includes the partition attribute, the partition threshold and the information gain of the current node. The information gain is the difference between the entropy of the parent node and the entropy of the child nodes after partitioning; it is an important index for feature selection and measures how much information a feature brings to the classification system: the more information it brings, the more important the feature is and the larger the corresponding information gain. Therefore, in this embodiment, the optimal intermediate result represents the partition attribute and the partition threshold corresponding to the maximum information gain among the decision trees of the plurality of clients participating in horizontal federated learning. The preset coordinator selects the maximum information gain from the partition attributes, partition thresholds and information gains sent by the plurality of clients, determines the corresponding partition attribute and partition threshold, and returns them to each client as the optimal intermediate result, so that the plurality of coordinated clients construct the same target decision tree classification model.
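A minimal sketch of the coordinator's selection step described above: each client reports an encrypted (partition attribute, partition threshold, information gain) triple, the coordinator decrypts the gains, keeps the triple with the maximum gain, and returns it to every client. The client names, numeric values and the inline key pair are placeholders; in this embodiment the keys come from the key-distribution step described earlier.

```python
from phe import paillier  # python-paillier, as in the earlier sketch

public_key, private_key = paillier.generate_paillier_keypair()  # coordinator's key pair

# Reports from the clients participating in horizontal federated learning; the gain
# is encrypted under the coordinator's public key (values are placeholders).
client_reports = {
    "client_a": {"attr": 0, "threshold": 45.0, "gain": public_key.encrypt(0.31)},
    "client_b": {"attr": 2, "threshold": 1.0,  "gain": public_key.encrypt(0.52)},
    "client_c": {"attr": 0, "threshold": 50.0, "gain": public_key.encrypt(0.27)},
}

# Coordinator: decrypt each gain and keep the report with the maximum information gain.
best_client = max(client_reports,
                  key=lambda c: private_key.decrypt(client_reports[c]["gain"]))
optimal_intermediate_result = {
    "attr": client_reports[best_client]["attr"],
    "threshold": client_reports[best_client]["threshold"],
}

# The optimal partition attribute and threshold are then broadcast back to every client,
# so that all participants construct the same target decision tree classification model.
```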
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and those skilled in the art can make settings based on needs in practical applications, and the settings are not listed here.
According to the above scheme, the current node of the client decision tree is constructed and the node characteristic information of the current node is obtained; the node characteristic information is then encrypted and sent to the preset coordinator as an intermediate result, so that the preset coordinator coordinates the plurality of clients participating in horizontal federated learning to obtain the target decision tree classification model. Compared with the prior-art approach of performing such prediction with a deep learning neural network model, constructing a decision tree model of low complexity reduces the model cost.
In addition, the embodiment also provides a prediction device based on the decision tree. Referring to fig. 5, fig. 5 is a functional block diagram of an embodiment of a prediction apparatus based on a decision tree according to the present invention.
In this embodiment, the decision tree based prediction apparatus is a virtual apparatus stored in the memory 1005 of the decision tree based prediction device shown in fig. 1, so as to realize all functions of the decision tree based prediction program: it is used for obtaining a target decision tree classification model of the client, the target decision tree classification model being obtained by a preset coordinator coordinating horizontal federated learning among a plurality of participating clients; for obtaining a target sample and inputting the target sample into the target decision tree classification model, so as to obtain a prediction classification result of the target sample through the target decision tree classification model; and for calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of obtaining the prediction classification result of the target sample through the target decision tree classification model, so as to obtain a classification characteristic interpretation result of the prediction classification result.
Specifically, referring to fig. 5, the decision tree-based prediction apparatus includes:
the first obtaining module 10 is configured to obtain a target decision tree classification model of the client, where the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
a second obtaining module 20, configured to obtain a target sample, and input the target sample to the target decision tree classification model, so as to obtain a predicted classification result of the target sample through the target decision tree classification model;
the invoking module 30 is configured to invoke a preset model-independent interpretation method to perform interpretation analysis on a preset step of obtaining a predicted classification result of a target sample through the target decision tree classification model, so as to obtain a feature interpretation result of the predicted classification result.
In addition, an embodiment of the present invention further provides a computer storage medium, where a prediction program based on a decision tree is stored on the computer storage medium, and when the prediction program based on the decision tree is executed by a processor, the steps of the prediction method based on the decision tree are implemented, which are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a terminal device to execute the method according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention and is not intended to limit the scope of the present invention, and all equivalent structures or flow transformations made by the present specification and drawings, or applied directly or indirectly to other related arts, are included in the scope of the present invention.

Claims (11)

1. A prediction method based on a decision tree is applied to a client participating in horizontal federated learning, and the method comprises the following steps:
obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
2. The decision tree-based prediction method of claim 1, wherein the step of obtaining the target decision tree classification model of the client is preceded by the steps of:
constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
3. The decision tree-based prediction method according to claim 2, wherein the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federal learning through the preset coordinator comprises:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
4. The decision tree-based prediction method according to claim 3, wherein the node feature information includes a partition attribute, a partition threshold, and an information gain of a current node, and the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner so as to select an optimal intermediate result from among a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning by the preset coordinator comprises:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
5. The decision tree-based prediction method according to claim 2, wherein before the step of sending the node feature information as an intermediate result to the predetermined coordinator in an encrypted manner, the method further comprises:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
6. The decision tree based prediction method of claim 2, wherein the step of constructing the current node of the client decision tree is preceded by the step of:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
7. The decision tree based prediction method according to any one of claims 1 to 6, wherein the preset model independent interpretation method comprises a local surrogate interpretation method and/or a Shapley value method.
8. A decision tree based prediction apparatus, the decision tree based prediction apparatus comprising:
the system comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a target decision tree classification model of the client, and the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
the second obtaining module is used for obtaining a target sample and inputting the target sample into the target decision tree classification model so as to obtain a prediction classification result of the target sample through the target decision tree classification model;
and the calling module is used for calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of obtaining the predicted classification result of the target sample through the target decision tree classification model, so as to obtain a feature interpretation result of the predicted classification result.
9. Decision tree based prediction device, characterized in that it comprises a processor, a memory and a decision tree based prediction program stored in said memory, which when executed by said processor implements the steps of the decision tree based prediction method according to any of claims 1-7.
10. A computer storage medium having stored thereon a decision tree based prediction program, the decision tree based prediction program when executed by a processor implementing the steps of the decision tree based prediction method according to any one of claims 1-7.
11. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the decision tree based prediction method as recited in any one of claims 1-7.
CN202011642783.5A 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product Pending CN112699947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011642783.5A CN112699947A (en) 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011642783.5A CN112699947A (en) 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product

Publications (1)

Publication Number Publication Date
CN112699947A true CN112699947A (en) 2021-04-23

Family

ID=75514192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011642783.5A Pending CN112699947A (en) 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product

Country Status (1)

Country Link
CN (1) CN112699947A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969543A (en) * 2022-06-15 2022-08-30 北京百度网讯科技有限公司 Promotion method, promotion system, electronic device and storage medium
CN115423148A (en) * 2022-07-29 2022-12-02 江苏大学 Agricultural machinery operation performance prediction method and device based on kriging method and decision tree
CN116883175A (en) * 2023-07-10 2023-10-13 青岛闪收付信息技术有限公司 Investment and financing activity decision generation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084377A (en) * 2019-04-30 2019-08-02 京东城市(南京)科技有限公司 Method and apparatus for constructing decision tree
CN111178408A (en) * 2019-12-19 2020-05-19 中国科学院计算技术研究所 Health monitoring model construction method and system based on federal random forest learning
CN111598186A (en) * 2020-06-05 2020-08-28 腾讯科技(深圳)有限公司 Decision model training method, prediction method and device based on longitudinal federal learning
CN111695697A (en) * 2020-06-12 2020-09-22 深圳前海微众银行股份有限公司 Multi-party combined decision tree construction method and device and readable storage medium
CN111768040A (en) * 2020-07-01 2020-10-13 深圳前海微众银行股份有限公司 Model interpretation method, device, equipment and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084377A (en) * 2019-04-30 2019-08-02 京东城市(南京)科技有限公司 Method and apparatus for constructing decision tree
CN111178408A (en) * 2019-12-19 2020-05-19 中国科学院计算技术研究所 Health monitoring model construction method and system based on federal random forest learning
CN111598186A (en) * 2020-06-05 2020-08-28 腾讯科技(深圳)有限公司 Decision model training method, prediction method and device based on longitudinal federal learning
CN111695697A (en) * 2020-06-12 2020-09-22 深圳前海微众银行股份有限公司 Multi-party combined decision tree construction method and device and readable storage medium
CN111768040A (en) * 2020-07-01 2020-10-13 深圳前海微众银行股份有限公司 Model interpretation method, device, equipment and readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969543A (en) * 2022-06-15 2022-08-30 北京百度网讯科技有限公司 Promotion method, promotion system, electronic device and storage medium
CN114969543B (en) * 2022-06-15 2023-08-25 北京百度网讯科技有限公司 Popularization method, popularization system, electronic equipment and storage medium
CN115423148A (en) * 2022-07-29 2022-12-02 江苏大学 Agricultural machinery operation performance prediction method and device based on kriging method and decision tree
CN115423148B (en) * 2022-07-29 2023-10-31 江苏大学 Agricultural machinery operation performance prediction method and device based on Ke Li jin method and decision tree
CN116883175A (en) * 2023-07-10 2023-10-13 青岛闪收付信息技术有限公司 Investment and financing activity decision generation method and device

Similar Documents

Publication Publication Date Title
CN112699947A (en) Decision tree based prediction method, apparatus, device, medium, and program product
Sajjad et al. Mobile-cloud assisted framework for selective encryption of medical images with steganography for resource-constrained devices
US20190050599A1 (en) Method and device for anonymizing data stored in a database
WO2016089710A1 (en) Secure computer evaluation of decision trees
CN112232325B (en) Sample data processing method and device, storage medium and electronic equipment
US11763135B2 (en) Concept-based adversarial generation method with steerable and diverse semantics
Pentyala et al. Privacy-preserving video classification with convolutional neural networks
CN111767906A (en) Face detection model training method, face detection device and electronic equipment
WO2023168903A1 (en) Model training method and apparatus, identity anonymization method and apparatus, device, storage medium, and program product
Bi et al. Achieving lightweight and privacy-preserving object detection for connected autonomous vehicles
CN115842627A (en) Decision tree evaluation method, device, equipment and medium based on secure multi-party computation
CN111767411A (en) Knowledge graph representation learning optimization method and device and readable storage medium
Cai et al. Privacy‐preserving CNN feature extraction and retrieval over medical images
Stergiou et al. Federated learning approach decouples clients from training a local model and with the communication with the server
CN111539008B (en) Image processing method and device for protecting privacy
Jasmine et al. A privacy preserving based multi-biometric system for secure identification in cloud environment
CN111723740A (en) Data identification method, device, equipment and computer readable storage medium
Mansouri et al. PAC: Privacy-preserving arrhythmia classification with neural networks
Benkraouda et al. Image reconstruction attacks on distributed machine learning models
CN115481415A (en) Communication cost optimization method, system, device and medium based on longitudinal federal learning
CN116665261A (en) Image processing method, device and equipment
Imtiaz et al. A correlated noise-assisted decentralized differentially private estimation protocol, and its application to fMRI source separation
CN112836767A (en) Federal modeling method, apparatus, device, storage medium, and program product
Tiwari et al. Security Protection Mechanism in Cloud Computing Authorization Model Using Machine Learning Techniques
JP2021120840A (en) Learning method, device, and program

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination