CN112699947A - Decision tree based prediction method, apparatus, device, medium, and program product - Google Patents


Info

Publication number
CN112699947A
CN112699947A (application number CN202011642783.5A)
Authority
CN
China
Prior art keywords
decision tree
preset
target
coordinator
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011642783.5A
Other languages
Chinese (zh)
Inventor
周雨豪 (Zhou Yuhao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202011642783.5A
Publication of CN112699947A
Legal status: Pending

Abstract

The invention relates to the technical field of federated learning and discloses a decision tree based prediction method, apparatus, device, medium, and program product. The method comprises the following steps: obtaining a target decision tree classification model of a client; and obtaining a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model. Compared with the prior art, in which such prediction is performed with a deep learning model, performing the prediction with a decision tree classification model reduces the model cost; in addition, the prediction process of the model is made visible by calling a preset model independent interpretation method, which improves the confidence of the model prediction result.

Description

Decision tree based prediction method, apparatus, device, medium, and program product
Technical Field
The invention relates to the technical field of federated learning, and in particular to a decision tree based prediction method, apparatus, device, medium, and program product.
Background
In recent years, due to data privacy concerns, most organizations model only with the user data they have collected themselves and rarely share the collected data with other organizations; for example, a medical institution models with its own medical data. However, a model constructed in this way generally suffers from poor generalization capability because the data volume is insufficient. How to construct a model with better generalization performance while protecting the privacy of user data is therefore a major concern in the medical field at present.
Consequently, research that applies federated learning to such scenarios has been emerging in recent years, but most of this research uses complex deep learning models. Although private data are not exposed among the participants, a large amount of intermediate information still has to be exchanged, which places high demands on communication bandwidth, so the cost of prediction with a deep learning model is high. In addition, because a deep learning model is highly complex, its prediction process is not transparent, and most users cannot know why the model outputs a given prediction result, so the confidence in the model prediction result is low.
Disclosure of Invention
The invention provides a decision tree based prediction method, apparatus, device, medium, and program product, and aims to solve the technical problems that the currently used models are costly and that the confidence in the model prediction results is low.
In order to achieve the above object, the present invention provides a decision tree based prediction method, which is applied to a client participating in horizontal federated learning, and the method includes:
obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
Preferably, before the step of obtaining the target decision tree classification model of the client, the method further includes:
constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
Preferably, the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federal learning through the preset coordinator includes:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
Preferably, the node feature information includes a partition attribute, a partition threshold, and an information gain of the current node, and the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner, so as to select an optimal intermediate result from among a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning by the preset coordinator includes:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
Preferably, before the step of sending the node characteristic information as an intermediate result to the preset coordinator through encryption, the method further includes:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
Preferably, the preset model independent interpretation method comprises a local surrogate interpretation method (LIME) and/or a Shapley value method.
Preferably, before the step of constructing the current node of the client decision tree, the method further includes:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
In addition, to achieve the above object, the present invention provides a prediction apparatus based on a decision tree, including:
the system comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a target decision tree classification model of the client, and the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
the second obtaining module is used for obtaining a target sample and inputting the target sample into the target decision tree classification model so as to obtain a prediction classification result of the target sample through the target decision tree classification model;
and the calling module is used for calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of obtaining the prediction classification result of the target sample through the target decision tree classification model, so as to obtain a feature interpretation result of the prediction classification result.
In addition, to achieve the above object, the present invention further provides a prediction device based on a decision tree, which includes a processor, a memory and a prediction program based on a decision tree stored in the memory, wherein when the prediction program based on a decision tree is executed by the processor, the steps of the prediction method based on a decision tree as described above are implemented.
In addition, to achieve the above object, the present invention further provides a computer storage medium having a prediction program based on a decision tree stored thereon, wherein the prediction program based on a decision tree realizes the steps of the prediction method based on a decision tree as described above when the prediction program based on a decision tree is executed by a processor.
Compared with the prior art, the invention provides a decision tree based prediction method in which a target decision tree classification model of the client is obtained, the target decision tree classification model being obtained by a preset coordinator coordinating horizontal federated learning among a plurality of participating clients; a target sample is obtained and input into the target decision tree classification model to obtain a prediction classification result of the target sample; and a preset model independent interpretation method is called to interpret and analyze the prediction process that produced the prediction classification result, so as to obtain a classification characteristic interpretation result of the prediction classification result. Compared with the conventional approach of performing such prediction with a deep learning model, the method performs the prediction with a decision tree classification model, thereby reducing the model cost; in addition, the prediction process of the model is made visible by calling the preset model independent interpretation method, thereby improving the confidence of the model prediction result.
Drawings
FIG. 1 is a diagram illustrating a hardware architecture of a decision tree based prediction device according to various embodiments of the present invention;
FIG. 2 is a flow chart illustrating a first embodiment of a decision tree based prediction method according to the present invention;
FIG. 3 is a diagram of a client hardware architecture participating in horizontal federated learning according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a second embodiment of the decision tree based prediction method of the present invention;
FIG. 5 is a functional block diagram of an embodiment of a decision tree based prediction apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic diagram of the hardware structure of a decision tree based prediction device according to embodiments of the present invention. In this embodiment of the present invention, the decision tree based prediction device may include a processor 1001 (e.g., a Central Processing Unit, CPU), a communication bus 1002, an input port 1003, an output port 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication among these components; the input port 1003 is used for data input; the output port 1004 is used for data output; the memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will appreciate that the hardware configuration depicted in FIG. 1 does not constitute a limitation of the present invention and may include more or fewer components than those shown, combine certain components, or arrange the components differently.
With continued reference to FIG. 1, the memory 1005 in FIG. 1, as a readable storage medium, may include an operating system, a network communication module, an application program module, and a decision tree based prediction program. In the device shown in fig. 1, the output port 1004 is mainly used for connecting to a backend server and performing data communication with it; the input port 1003 is mainly used for connecting a client (user side) and performing data communication with it; and the processor 1001 may be used to invoke the decision tree based prediction program stored in the memory 1005.
In this embodiment, the prediction apparatus based on the decision tree includes: a memory 1005, a processor 1001, and a prediction program based on decision tree stored in the memory 1005 and operable on the processor 1001, wherein the processor 1001, when calling the prediction program based on decision tree stored in the memory 1005, performs the following operations:
obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
Further, processor 1001 may call a decision tree based prediction program stored in memory 1005, and also perform the following operations:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
Based on the hardware structure shown in fig. 1, a first embodiment of the present invention provides a prediction method based on a decision tree.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a decision tree-based prediction method according to the present invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than that shown. Specifically, the prediction method based on the decision tree in this embodiment includes:
step S10: obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
specifically, in this embodiment, it should be noted that, the prediction method based on the decision tree proposed by the present invention is applied to a client participating in horizontal federal learning, and it is easy to understand that, in recent years, due to the problem of data privacy, in general, different clients use respectively collected user data to perform modeling, and the clients rarely share the collected data, for example, three medical institutions, i.e. medical institution a, medical institution b, and medical institution c, use respectively local medical data of their respective medical institutions to perform modeling, however, the model constructed in this case has poor generalization capability due to insufficient data volume and insufficient data characteristics, so in this embodiment, as shown in fig. 3, a client hardware architecture diagram participating in horizontal federal learning in an embodiment is provided, and a plurality of clients are adopted to perform horizontal federated learning modeling, so that a model with better generalization performance is constructed on the premise of protecting the privacy of user data.
In addition, in some embodiments a plurality of clients perform horizontal federated learning to construct a deep learning model; although private data are not exposed among the participants, a large amount of intermediate information still has to be exchanged, which places high demands on communication bandwidth, i.e. the cost of prediction with a deep learning model is high. In this embodiment, classification prediction is instead performed with a target decision tree classification model constructed through horizontal federated learning among the plurality of clients. Because the target decision tree classification model is obtained through horizontal federated learning coordinated by a preset coordinator, its generalization capability is strong, and when decision-tree-based prediction is performed with it, the communication load of the process is low and the amount of computation is small, thereby reducing the model prediction cost.
Step S20: acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
Specifically, the target sample refers to a sample in the client that is to be predicted and classified. In addition, it should be noted that in this embodiment the client preferably refers to a client holding multiple medical data sets; that is, the decision tree based prediction method of this embodiment is applied to medical image recognition and classification in the medical field.
Specifically, a medical image sample to be predicted and classified is obtained in the client, for example a brain imaging image to be classified, i.e. Rs-fMRI image data acquired with the Rs-fMRI technique. Because Rs-fMRI image data may be affected by the acquisition machine, the environment and so on, the acquired data may contain noise, so in this embodiment the Rs-fMRI image data are denoised before being input to the target decision tree classification model for classification. Preferably, in order to better preserve the detail of the original Rs-fMRI image data, wavelet denoising is used: the Rs-fMRI image data are first subjected to wavelet decomposition, the high-frequency coefficients obtained from the decomposition are then threshold-quantized, and the Rs-fMRI image data are finally reconstructed with the two-dimensional wavelet. Other denoising methods, such as a mean filter or a morphological noise filter, may also be used for image denoising, and this embodiment is not limited in this respect.
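For illustration, the following sketch shows one way the wavelet denoising step described above could be implemented with the PyWavelets library: decompose the image, soft-threshold the high-frequency (detail) coefficients, and reconstruct. The wavelet name, decomposition level and threshold rule are assumptions for the example, not values specified by this embodiment.

```python
import numpy as np
import pywt  # PyWavelets


def wavelet_denoise(image: np.ndarray, wavelet: str = "db4", level: int = 2) -> np.ndarray:
    """Denoise a 2-D image by thresholding its high-frequency wavelet coefficients."""
    # Two-dimensional wavelet decomposition: approximation + per-level detail sub-bands.
    coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]

    # Estimate the noise level from the finest diagonal sub-band and derive a
    # universal threshold (a common choice; an assumption here).
    sigma = np.median(np.abs(details[-1][-1])) / 0.6745
    threshold = sigma * np.sqrt(2 * np.log(image.size))

    # Soft-threshold (quantize) every detail sub-band; keep the approximation as is.
    denoised_details = [
        tuple(pywt.threshold(band, threshold, mode="soft") for band in level_bands)
        for level_bands in details
    ]

    # Reconstruct the denoised image with the two-dimensional inverse wavelet transform.
    return pywt.waverec2([approx] + denoised_details, wavelet=wavelet)
```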
In addition, in order to improve the accuracy of the prediction result, in this embodiment the target sample is normalized, converted to grayscale, binarized, vectorized and so on before it is input into the target decision tree classification model for prediction and classification. This removes irrelevant information from the original target sample, restores its useful real information, enhances the detectability of the target sample and simplifies the feature data as much as possible, thereby improving the accuracy of the prediction result.
Therefore, in this embodiment, after the target sample is obtained it is preprocessed to obtain a target feature vector, and the target feature vector is finally input into the target decision tree classification model so as to obtain the prediction classification result of the target sample through the target decision tree classification model.
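A minimal sketch of the preprocessing chain described above (grayscale conversion, normalization, binarization and vectorization) followed by prediction with a decision tree classifier. The binarization threshold, the image size and the randomly fitted `trained_tree` are placeholders standing in for the federated target decision tree classification model, not part of this embodiment.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def preprocess(image: np.ndarray) -> np.ndarray:
    """Turn a raw 2-D or RGB image into a flat feature vector (target feature vector)."""
    if image.ndim == 3:                                   # RGB -> grayscale (luminance weights)
        image = image @ np.array([0.299, 0.587, 0.114])
    image = (image - image.min()) / (image.max() - image.min() + 1e-8)  # normalization
    binary = (image > 0.5).astype(np.float32)             # grayscale binarization (assumed threshold)
    return binary.ravel()                                 # vectorization

# Placeholder model: in this embodiment the real model is the target decision tree
# classification model obtained through horizontal federated learning.
rng = np.random.default_rng(0)
trained_tree = DecisionTreeClassifier(max_depth=5).fit(
    (rng.random((32, 64 * 64)) > 0.5).astype(np.float32), rng.integers(0, 2, 32)
)

target_vector = preprocess(rng.random((64, 64)))
predicted_class = trained_tree.predict(target_vector.reshape(1, -1))
```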
Step S30: and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
Specifically, the preset model independent interpretation method refers to a pre-constructed model-agnostic interpretation method (Model-Agnostic Interpretation Methods). Preferably, in this embodiment, the preset model independent interpretation method comprises the Local Interpretable Model-agnostic Explanations (LIME) method and/or the Shapley values method.
It is easy to understand that, because the process of model prediction is not transparent, most users cannot know why the model outputs a given prediction result, and the confidence in the model prediction result is therefore low. In this embodiment, after the prediction classification result of the target sample is obtained through the target decision tree classification model, in order to improve the confidence of the model prediction result, a preset model independent interpretation method is called to interpret and analyze the prediction process that produced the prediction classification result, so that the prediction process of the model is output in a visible form.
For ease of understanding, the present embodiment illustrates the above scheme:
for example, when analyzing the above model prediction process by using a Local interpretation-independent model (LIME) method, after inputting a target sample into a target decision tree classification model to obtain a prediction classification result of the target sample through the target decision tree classification model, calling a Local interpretation-independent method to process the target sample, for example, the target sample is a chest image sample, calling the Local interpretation-independent method to perform partial image feature concealment on the chest image sample, for example, dividing the chest image sample into a plurality of partial chest images, calling the Local interpretation-independent method to conceal the plurality of partial chest images one by one, specifically, retaining one partial chest image, and setting image pixels of the chest image sample except the partial chest image to gray or setting the parts of the chest image sample except the partial chest image to blank images, and inputting the adjusted chest image sample into a target decision tree classification model to obtain a prediction classification result of the adjusted chest image sample through the target decision tree classification model, so that a model prediction regression model is obtained based on a plurality of partial chest images obtained by dividing the chest image sample, and finally, a classification feature interpretation result of the prediction classification result can be obtained based on the prediction classification result of the model prediction regression model corresponding to the complete chest image sample, namely, the prediction classification result is generated based on which partial image feature in the chest image sample.
In addition, analyzing the model prediction process with the Shapley values method follows the same steps as the Local Interpretable Model-agnostic Explanations (LIME) method: the target sample is divided into a plurality of partial samples, and the Shapley values method is then called to obtain, one by one, the contribution of each partial sample to the prediction classification result of the target sample, so that it can be determined which partial sample (i.e. which sample features) in the target sample led to the prediction classification result.
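For the Shapley values analysis, one possible route (an assumption for illustration; this embodiment does not prescribe a particular library) is the `shap` package, whose TreeExplainer is designed for tree models. The model and sample below are placeholders.

```python
import numpy as np
import shap  # SHAP: Shapley-value attributions for model predictions
from sklearn.tree import DecisionTreeClassifier

# Placeholder tree and sample (standing in for the federated target decision tree
# classification model and the preprocessed target feature vector).
rng = np.random.default_rng(0)
X_train, y_train = rng.random((64, 20)), rng.integers(0, 2, 64)
tree = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)
sample = rng.random((1, 20))

# Attribute the prediction for the sample to its input features (Shapley values).
explainer = shap.TreeExplainer(tree)
shap_values = explainer.shap_values(sample)

# For a binary classifier this is typically one attribution array per class; each
# entry is the contribution of a feature to that class score.
print("attribution array shape:", np.asarray(shap_values).shape)
```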
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and those skilled in the art can make settings based on needs in practical applications, and the settings are not listed here.
According to the above scheme, a target decision tree classification model of the client is obtained, the target decision tree classification model being obtained by a preset coordinator coordinating horizontal federated learning among a plurality of participating clients; a target sample is obtained and input into the target decision tree classification model to obtain a prediction classification result of the target sample; and a preset model independent interpretation method is called to interpret and analyze the prediction process that produced the prediction classification result, so as to obtain a classification characteristic interpretation result of the prediction classification result. Compared with the conventional approach of performing such prediction with a deep learning model, the method performs the prediction with a decision tree classification model, thereby reducing the model cost; in addition, the prediction process of the model is made visible by calling the preset model independent interpretation method, thereby improving the confidence of the model prediction result.
Further, based on the first embodiment of the decision tree based prediction method of the present invention, a second embodiment of the decision tree based prediction method of the present invention is proposed.
Referring to FIG. 4, FIG. 4 is a flowchart illustrating a second embodiment of a decision tree-based prediction method according to the present invention;
the second embodiment of the decision tree based prediction method is different from the first embodiment of the decision tree based prediction method in that, before the step of obtaining the target decision tree classification model of the client, the method further includes:
step S101: constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
step S102: and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
In this embodiment, a specific implementation scheme for constructing a target decision tree classification model of a client is provided, and specifically, a same target decision tree classification model is constructed by coordinating a plurality of clients participating in horizontal federal learning through a preset coordinator.
It should be understood that, in order to protect data privacy, the clients participating in horizontal federated learning are independent of one another, so there are certain differences in the data held by different clients. It should also be noted that, because the decision tree based prediction method provided by the present invention is applied to clients holding multiple medical data sets, and such clients have a large overlap in data features but little overlap in data samples, in this embodiment, referring to fig. 3, which shows a hardware architecture diagram of the participants in horizontal federated learning, the same target decision tree classification model is constructed among the plurality of clients by means of horizontal federated learning.
Specifically, after each client locally constructs the current node of its decision tree, the node characteristic information of the current node, such as the partition attribute, the partition threshold and the information gain of the current node, is sent to the preset coordinator as the intermediate result of the current round of horizontal federated training, so that the preset coordinator coordinates the plurality of clients participating in horizontal federated learning to construct the same target decision tree classification model.
In addition, in some embodiments, in order to improve confidentiality during data interaction and to avoid leakage of the clients' node characteristic information, the intermediate result is encrypted before the node characteristic information is sent to the preset coordinator. Therefore, in this embodiment, before the step of encrypting and sending the node characteristic information as an intermediate result to the preset coordinator, the method further includes:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
Before the preset coordinator coordinates the plurality of clients participating in horizontal federated learning to construct the same target decision tree classification model, the coordinator first generates a pair consisting of a public key and a private key, for example a private key and a matching public key generated with a homomorphic encryption algorithm, and then sends the public key to each client so that data can be encrypted when the clients and the preset coordinator exchange data.
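As one concrete possibility for the homomorphic key pair described above (an illustrative assumption; this embodiment does not name a specific scheme or library), the Paillier cryptosystem from the python-paillier (`phe`) package can be used: the coordinator generates the pair, distributes the public key, and each client encrypts its intermediate result with it.

```python
from phe import paillier  # python-paillier: additively homomorphic Paillier encryption

# Coordinator side: generate the key pair and send the public key to every client.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Client side: encrypt an intermediate result (here, the information gain of the
# current node; the numeric value is a placeholder) with the coordinator's public key.
encrypted_gain = public_key.encrypt(0.4137)

# Coordinator side: decrypt with the private key in order to compare gains across clients.
decrypted_gain = private_key.decrypt(encrypted_gain)
assert abs(decrypted_gain - 0.4137) < 1e-9
```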
In addition, in some embodiments, before the step of constructing the current node of the client decision tree, the method further includes:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
Specifically, the feature attributes of the local training data are the attributes used to classify and partition the local training data. For example, if the local training data are body feature data collected from users of different age groups under different environmental conditions, the feature attributes include age features, gender features and environmental-condition features. It is easy to understand that a decision tree represents a mapping between object attributes and object values: each node in the tree represents an object, and each branch represents a possible attribute value. Therefore, in this embodiment, the local training data are classified according to their feature attributes to construct the current node of the client decision tree. Optionally, a feature attribute is chosen at random to classify the local training data and construct the current node, or the feature attribute with the largest information gain at the node is selected for the classification; this embodiment is not limited in this respect.
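The sketch below illustrates the "largest information gain" option mentioned above for constructing the current node: it evaluates every candidate partition attribute and threshold on the local training data and returns the split with the maximum entropy-based information gain. The toy data layout and the use of observed values as threshold candidates are assumptions.

```python
import numpy as np


def entropy(labels: np.ndarray) -> float:
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def best_split(X: np.ndarray, y: np.ndarray):
    """Return (partition attribute, partition threshold, information gain) of the best local split."""
    parent_entropy, best = entropy(y), (None, None, -np.inf)
    for attr in range(X.shape[1]):
        for threshold in np.unique(X[:, attr]):            # candidate thresholds: observed values
            left, right = y[X[:, attr] <= threshold], y[X[:, attr] > threshold]
            if len(left) == 0 or len(right) == 0:
                continue
            child_entropy = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = parent_entropy - child_entropy           # information gain of this split
            if gain > best[2]:
                best = (attr, float(threshold), gain)
    return best

# Toy local training data: columns are feature attributes (e.g. age, gender, environment code).
X_local = np.array([[25, 0, 1], [62, 1, 0], [40, 0, 1], [58, 1, 1]], dtype=float)
y_local = np.array([0, 1, 0, 1])
split_attr, split_threshold, info_gain = best_split(X_local, y_local)
```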
In addition, for convenience of understanding, this embodiment specifically describes the above-mentioned embodiment in which the node feature information is sent to a preset coordinator as an intermediate result in an encrypted manner, so as to obtain a target decision tree classification model by coordinating multiple clients participating in horizontal federal learning by the preset coordinator:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
In this step, it is easy to understand that because the plurality of clients participating in horizontal federated learning are independent of one another, the current node of the decision tree constructed by each client differs. Therefore, in this embodiment, in order to construct a decision tree classification model with a better effect, each client encrypts and sends the node characteristic information of the current node of its decision tree to the preset coordinator, and the preset coordinator selects the optimal current node of the decision tree without disclosing the data privacy of any client. The above steps are executed in a loop until the decision tree of the client converges or the preset maximum number of iterations is reached, which ensures that every node used to obtain the target decision tree classification model is an optimal node and thereby guarantees the classification accuracy of the target decision tree classification model.
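A compact sketch of the client-side loop just described, under heavy simplifying assumptions: the `coordinator` object and its `select_best` call are hypothetical stand-ins for the encrypted exchange with the preset coordinator, tree nodes are plain dictionaries, the stopping rule is a fixed maximum depth, and `best_split` and `public_key` come from the earlier sketches.

```python
import numpy as np


def grow_node(X, y, coordinator, public_key, depth=0, max_depth=4):
    """Recursively build one client's copy of the shared decision tree."""
    if len(y) == 0:
        return {"leaf": True, "label": 0}                  # degenerate split (assumption)
    if depth >= max_depth or len(np.unique(y)) == 1:       # converged or iteration limit reached
        return {"leaf": True, "label": int(np.bincount(y).argmax())}

    attr, thr, gain = best_split(X, y)                     # local intermediate result
    # Send the encrypted intermediate result; receive the globally optimal split
    # chosen by the (hypothetical) coordinator object.
    attr, thr = coordinator.select_best(attr, thr, public_key.encrypt(gain))

    left, right = X[:, attr] <= thr, X[:, attr] > thr      # divide the local training data
    return {
        "leaf": False, "attr": attr, "threshold": thr,
        "left": grow_node(X[left], y[left], coordinator, public_key, depth + 1, max_depth),
        "right": grow_node(X[right], y[right], coordinator, public_key, depth + 1, max_depth),
    }
```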
For convenience of understanding, this embodiment specifically describes the above-mentioned embodiment in which the node feature information is sent to a preset coordinator as an intermediate result in an encrypted manner, so that the preset coordinator selects an optimal intermediate result from among a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
Specifically, the node characteristic information includes the partition attribute, the partition threshold and the information gain of the current node. The information gain is the difference between the entropy of the parent node and the entropy of the child nodes after partitioning; it is an important index for feature selection and measures how much information a feature brings to the classification system: the more information it brings, the more important the feature is and the larger the corresponding information gain. Therefore, in this embodiment, the optimal intermediate result represents the partition attribute and the partition threshold corresponding to the maximum information gain among the decision trees of the plurality of clients participating in horizontal federated learning. The preset coordinator selects the maximum information gain from the partition attributes, partition thresholds and information gains sent by the plurality of clients, determines the corresponding partition attribute and partition threshold, and returns them to each client as the optimal intermediate result, so that the plurality of coordinated clients construct the same target decision tree classification model.
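A minimal sketch of the coordinator's selection step described above: each client reports an encrypted (partition attribute, partition threshold, information gain) triple, the coordinator decrypts the gains, keeps the triple with the maximum gain, and returns it to every client. The client names, numeric values and the inline key pair are placeholders; in this embodiment the keys come from the key-distribution step described earlier.

```python
from phe import paillier  # python-paillier, as in the earlier sketch

public_key, private_key = paillier.generate_paillier_keypair()  # coordinator's key pair

# Reports from the clients participating in horizontal federated learning; the gain
# is encrypted under the coordinator's public key (values are placeholders).
client_reports = {
    "client_a": {"attr": 0, "threshold": 45.0, "gain": public_key.encrypt(0.31)},
    "client_b": {"attr": 2, "threshold": 1.0,  "gain": public_key.encrypt(0.52)},
    "client_c": {"attr": 0, "threshold": 50.0, "gain": public_key.encrypt(0.27)},
}

# Coordinator: decrypt each gain and keep the report with the maximum information gain.
best_client = max(client_reports,
                  key=lambda c: private_key.decrypt(client_reports[c]["gain"]))
optimal_intermediate_result = {
    "attr": client_reports[best_client]["attr"],
    "threshold": client_reports[best_client]["threshold"],
}

# The optimal partition attribute and threshold are then broadcast back to every client,
# so that all participants construct the same target decision tree classification model.
```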
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and those skilled in the art can make settings based on needs in practical applications, and the settings are not listed here.
According to the above scheme, the current node of the client decision tree is constructed and the node characteristic information of the current node is obtained; the node characteristic information is then encrypted and sent to the preset coordinator as an intermediate result, so that the preset coordinator coordinates the plurality of clients participating in horizontal federated learning to obtain the target decision tree classification model. Compared with the prior-art approach of performing such prediction with a deep learning neural network model, constructing a decision tree model of low complexity reduces the model cost.
In addition, the embodiment also provides a prediction device based on the decision tree. Referring to fig. 5, fig. 5 is a functional block diagram of an embodiment of a prediction apparatus based on a decision tree according to the present invention.
In this embodiment, the decision tree based prediction apparatus is a virtual apparatus stored in the memory 1005 of the decision tree based prediction device shown in fig. 1, so as to realize all functions of the decision tree based prediction program: it is used for obtaining a target decision tree classification model of the client, the target decision tree classification model being obtained by a preset coordinator coordinating horizontal federated learning among a plurality of participating clients; for obtaining a target sample and inputting the target sample into the target decision tree classification model, so as to obtain a prediction classification result of the target sample through the target decision tree classification model; and for calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of obtaining the prediction classification result of the target sample through the target decision tree classification model, so as to obtain a classification characteristic interpretation result of the prediction classification result.
Specifically, referring to fig. 5, the decision tree-based prediction apparatus includes:
the first obtaining module 10 is configured to obtain a target decision tree classification model of the client, where the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
a second obtaining module 20, configured to obtain a target sample, and input the target sample to the target decision tree classification model, so as to obtain a predicted classification result of the target sample through the target decision tree classification model;
the invoking module 30 is configured to invoke a preset model-independent interpretation method to perform interpretation analysis on a preset step of obtaining a predicted classification result of a target sample through the target decision tree classification model, so as to obtain a feature interpretation result of the predicted classification result.
In addition, an embodiment of the present invention further provides a computer storage medium, where a prediction program based on a decision tree is stored on the computer storage medium, and when the prediction program based on the decision tree is executed by a processor, the steps of the prediction method based on the decision tree are implemented, which are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a terminal device to execute the method according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention and is not intended to limit the scope of the present invention, and all equivalent structures or flow transformations made by the present specification and drawings, or applied directly or indirectly to other related arts, are included in the scope of the present invention.

Claims (11)

1. A prediction method based on a decision tree is applied to a client participating in horizontal federated learning, and the method comprises the following steps:
obtaining a target decision tree classification model of the client, wherein the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
acquiring a target sample, inputting the target sample into the target decision tree classification model, and obtaining a prediction classification result of the target sample through the target decision tree classification model;
and calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of the prediction classification result of the target sample obtained by the target decision tree classification model so as to obtain a classification characteristic interpretation result of the prediction classification result.
2. The decision tree-based prediction method of claim 1, wherein the step of obtaining the target decision tree classification model of the client is preceded by the steps of:
constructing a current node of a client decision tree, and acquiring node characteristic information of the current node;
and encrypting and sending the node characteristic information as an intermediate result to a preset coordinator, so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federated learning through the preset coordinator.
3. The decision tree-based prediction method according to claim 2, wherein the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner so as to obtain a target decision tree classification model by coordinating a plurality of clients participating in horizontal federal learning through the preset coordinator comprises:
encrypting and sending the node characteristic information serving as an intermediate result to a preset coordinator, so as to select the optimal intermediate result from a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning through the preset coordinator;
dividing local medical training data corresponding to the current node according to the optimal intermediate result, and constructing left and right subtrees of the current node according to the divided target medical training data;
respectively taking the left subtree and the right subtree as current nodes, and returning to execute the step of acquiring the node characteristic information of the current nodes;
and continuing to execute the step of encrypting and sending the node characteristic information as an intermediate result to a preset coordinator until the decision tree of the client converges or a preset maximum number of iterations is reached, so as to coordinate, through the preset coordinator, the plurality of clients participating in horizontal federated learning to obtain a target decision tree classification model.
4. The decision tree-based prediction method according to claim 3, wherein the node feature information includes a partition attribute, a partition threshold, and an information gain of a current node, and the step of sending the node feature information as an intermediate result to a preset coordinator in an encrypted manner so as to select an optimal intermediate result from among a plurality of intermediate results respectively corresponding to a plurality of clients participating in horizontal federal learning by the preset coordinator comprises:
and encrypting and sending the partition attribute, the partition threshold and the information gain of the current node as an intermediate result to a preset coordinator, wherein the preset coordinator decrypts the encrypted intermediate results, selects the partition attribute and the partition threshold corresponding to the maximum information gain from the intermediate results of the plurality of clients participating in horizontal federated learning as the optimal intermediate result, and encrypts and sends the optimal intermediate result to the clients participating in horizontal federated learning.
5. The decision tree-based prediction method according to claim 2, wherein before the step of sending the node feature information as an intermediate result to the predetermined coordinator in an encrypted manner, the method further comprises:
and receiving a public key matched with the private key of the preset coordinator sent by the preset coordinator, so as to encrypt the intermediate result through the public key.
6. The decision tree based prediction method of claim 2, wherein the step of constructing the current node of the client decision tree is preceded by the step of:
acquiring local training data;
determining feature attributes of the local training data;
and classifying the local training data based on the characteristic attributes to construct a current node of a client decision tree.
7. The decision tree based prediction method according to any one of claims 1 to 6, wherein the preset model independent interpretation method comprises a local surrogate interpretation method and/or a Shapley value method.
8. A decision tree based prediction apparatus, the decision tree based prediction apparatus comprising:
the system comprises a first obtaining module, a second obtaining module and a third obtaining module, wherein the first obtaining module is used for obtaining a target decision tree classification model of the client, and the target decision tree classification model is obtained by performing horizontal federal learning on a plurality of clients participating in horizontal federal learning through a preset coordinator;
the second obtaining module is used for obtaining a target sample and inputting the target sample into the target decision tree classification model so as to obtain a prediction classification result of the target sample through the target decision tree classification model;
and the calling module is used for calling a preset model independent interpretation method to perform interpretation analysis on the prediction process of obtaining the predicted classification result of the target sample through the target decision tree classification model, so as to obtain a feature interpretation result of the predicted classification result.
9. Decision tree based prediction device, characterized in that it comprises a processor, a memory and a decision tree based prediction program stored in said memory, which when executed by said processor implements the steps of the decision tree based prediction method according to any of claims 1-7.
10. A computer storage medium having stored thereon a decision tree based prediction program, the decision tree based prediction program when executed by a processor implementing the steps of the decision tree based prediction method according to any one of claims 1-7.
11. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the decision tree based prediction method as recited in any one of claims 1-7.
CN202011642783.5A 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product Pending CN112699947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011642783.5A CN112699947A (en) 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011642783.5A CN112699947A (en) 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product

Publications (1)

Publication Number Publication Date
CN112699947A true CN112699947A (en) 2021-04-23

Family

ID=75514192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011642783.5A Pending CN112699947A (en) 2020-12-30 2020-12-30 Decision tree based prediction method, apparatus, device, medium, and program product

Country Status (1)

Country Link
CN (1) CN112699947A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969543A (en) * 2022-06-15 2022-08-30 北京百度网讯科技有限公司 Promotion method, promotion system, electronic device and storage medium
CN115423148A (en) * 2022-07-29 2022-12-02 江苏大学 Agricultural machinery operation performance prediction method and device based on kriging method and decision tree
CN116883175A (en) * 2023-07-10 2023-10-13 青岛闪收付信息技术有限公司 Investment and financing activity decision generation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084377A (en) * 2019-04-30 2019-08-02 京东城市(南京)科技有限公司 Method and apparatus for constructing decision tree
CN111178408A (en) * 2019-12-19 2020-05-19 中国科学院计算技术研究所 Health monitoring model construction method and system based on federal random forest learning
CN111598186A (en) * 2020-06-05 2020-08-28 腾讯科技(深圳)有限公司 Decision model training method, prediction method and device based on longitudinal federal learning
CN111695697A (en) * 2020-06-12 2020-09-22 深圳前海微众银行股份有限公司 Multi-party combined decision tree construction method and device and readable storage medium
CN111768040A (en) * 2020-07-01 2020-10-13 深圳前海微众银行股份有限公司 Model interpretation method, device, equipment and readable storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084377A (en) * 2019-04-30 2019-08-02 京东城市(南京)科技有限公司 Method and apparatus for constructing decision tree
CN111178408A (en) * 2019-12-19 2020-05-19 中国科学院计算技术研究所 Health monitoring model construction method and system based on federal random forest learning
CN111598186A (en) * 2020-06-05 2020-08-28 腾讯科技(深圳)有限公司 Decision model training method, prediction method and device based on longitudinal federal learning
CN111695697A (en) * 2020-06-12 2020-09-22 深圳前海微众银行股份有限公司 Multi-party combined decision tree construction method and device and readable storage medium
CN111768040A (en) * 2020-07-01 2020-10-13 深圳前海微众银行股份有限公司 Model interpretation method, device, equipment and readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969543A (en) * 2022-06-15 2022-08-30 北京百度网讯科技有限公司 Promotion method, promotion system, electronic device and storage medium
CN114969543B (en) * 2022-06-15 2023-08-25 北京百度网讯科技有限公司 Popularization method, popularization system, electronic equipment and storage medium
CN115423148A (en) * 2022-07-29 2022-12-02 江苏大学 Agricultural machinery operation performance prediction method and device based on kriging method and decision tree
CN115423148B (en) * 2022-07-29 2023-10-31 江苏大学 Agricultural machinery operation performance prediction method and device based on Ke Li jin method and decision tree
CN116883175A (en) * 2023-07-10 2023-10-13 青岛闪收付信息技术有限公司 Investment and financing activity decision generation method and device

Similar Documents

Publication Publication Date Title
CN112699947A (en) Decision tree based prediction method, apparatus, device, medium, and program product
Sajjad et al. Mobile-cloud assisted framework for selective encryption of medical images with steganography for resource-constrained devices
US20190050599A1 (en) Method and device for anonymizing data stored in a database
WO2016089710A1 (en) Secure computer evaluation of decision trees
CN112232325B (en) Sample data processing method and device, storage medium and electronic equipment
US11763135B2 (en) Concept-based adversarial generation method with steerable and diverse semantics
Pentyala et al. Privacy-preserving video classification with convolutional neural networks
CN111767906A (en) Face detection model training method, face detection device and electronic equipment
WO2023168903A1 (en) Model training method and apparatus, identity anonymization method and apparatus, device, storage medium, and program product
Bi et al. Achieving lightweight and privacy-preserving object detection for connected autonomous vehicles
CN115842627A (en) Decision tree evaluation method, device, equipment and medium based on secure multi-party computation
CN111767411A (en) Knowledge graph representation learning optimization method and device and readable storage medium
Cai et al. Privacy‐preserving CNN feature extraction and retrieval over medical images
Stergiou et al. Federated learning approach decouples clients from training a local model and with the communication with the server
CN111539008B (en) Image processing method and device for protecting privacy
Jasmine et al. A privacy preserving based multi-biometric system for secure identification in cloud environment
CN111723740A (en) Data identification method, device, equipment and computer readable storage medium
Mansouri et al. PAC: Privacy-preserving arrhythmia classification with neural networks
Benkraouda et al. Image reconstruction attacks on distributed machine learning models
CN115481415A (en) Communication cost optimization method, system, device and medium based on longitudinal federal learning
CN116665261A (en) Image processing method, device and equipment
Imtiaz et al. A correlated noise-assisted decentralized differentially private estimation protocol, and its application to fMRI source separation
CN112836767A (en) Federal modeling method, apparatus, device, storage medium, and program product
Tiwari et al. Security Protection Mechanism in Cloud Computing Authorization Model Using Machine Learning Techniques
JP2021120840A (en) Learning method, device, and program

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination