CN115563564A - Processing method and device of decision tree model, computer equipment and storage medium


Info

Publication number
CN115563564A
Authority
CN
China
Prior art keywords
gradient
gain value
splitting
decision tree
tree model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211533222.0A
Other languages
Chinese (zh)
Other versions
CN115563564B (en)
Inventor
陈瑞钦
蒋杰
刘煜宏
陈鹏
程勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202211533222.0A
Publication of CN115563564A
Application granted
Publication of CN115563564B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)
  • Storage Device Security (AREA)

Abstract

The application relates to a processing method and apparatus, a computer device, a storage medium and a computer program product for a decision tree model. The method can be applied in the fields of artificial intelligence and intelligent driving, and comprises the following steps: receiving a first gradient ciphertext and a second gradient ciphertext sent by a first object; in a trusted execution environment, decrypting the first gradient ciphertext and the second gradient ciphertext, determining histogram information of a target node in the decision tree model based on a training sample and the first and second gradients obtained by decryption, determining a first splitting gain value of the target node from the histogram information, and encrypting the first splitting gain value; and when the encrypted first splitting gain value is obtained, performing node splitting on the target node based on the encrypted first splitting gain value. By adopting the method, the efficiency of determining the histogram information can be improved while the data security of the first object is guaranteed, which in turn improves the tuning efficiency of federated learning.

Description

Processing method and device of decision tree model, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for processing a decision tree model.
Background
Artificial intelligence models need large amounts of data for training, but in practice the data may be distributed across different organizations, companies and departments and, owing to privacy restrictions, cannot be pooled for training. Federated learning is a form of distributed machine learning in which all participants jointly build a model with the help of the other parties' data without sharing their data resources.
In the conventional technique, an object A holding label data encrypts its gradients and sends them to an object B without label data; object B then builds a histogram from the encrypted gradients. However, computing the histogram over encrypted gradients is time-consuming and severely limits the model tuning efficiency of federated learning.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a processing method, apparatus, computer device, computer-readable storage medium and computer program product for a decision tree model that can improve the efficiency of determining histogram information and thereby improve model tuning efficiency.
In a first aspect, the present application provides a method for processing a decision tree model. The method comprises the following steps:
receiving a first gradient ciphertext and a second gradient ciphertext transmitted by a first object; in a trusted execution environment, decrypting the first gradient ciphertext and the second gradient ciphertext, determining histogram information of a target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; the first gradient and the second gradient are gradients of different orders; and when the encrypted first splitting gain value is obtained, node splitting is carried out on a target node in the decision tree model based on the encrypted first splitting gain value.
In a second aspect, the application further provides a processing method of the decision tree model. The method comprises the following steps:
sending a first gradient ciphertext and a second gradient ciphertext to a second object so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determining histogram information of a target node in a decision tree model based on a training sample, a first gradient and a second gradient obtained by decryption, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; receiving an encrypted first split gain value sent by a second object; and performing node splitting on the target node in the decision tree model based on the encrypted first splitting gain value.
In a third aspect, the present application further provides a processing apparatus for a decision tree model. The device comprises:
the gradient ciphertext receiving module is used for receiving a first gradient ciphertext and a second gradient ciphertext which are sent by a first object;
a first split gain value determination module, configured to decrypt the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determine histogram information of a target node in a decision tree model based on a training sample, a decrypted first gradient and a decrypted second gradient, determine a first split gain value of the target node in the decision tree model according to the histogram information, and encrypt the first split gain value; the first gradient and the second gradient are gradients of different orders;
and the node splitting module is used for splitting a target node in the decision tree model based on the encrypted first split gain value when the encrypted first split gain value is obtained.
In a fourth aspect, the present application further provides a processing apparatus for a decision tree model. The device comprises:
a gradient ciphertext sending module, configured to send a first gradient ciphertext and a second gradient ciphertext to a second object, so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determines histogram information of a target node in a decision tree model based on a training sample, a first gradient and a second gradient obtained through decryption, determines a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypts the first splitting gain value;
the encrypted first split gain value acquisition module is used for receiving the encrypted first split gain value sent by the second object;
and the second node splitting module is used for splitting the target node in the decision tree model based on the encrypted first splitting gain value.
In a fifth aspect, the present application further provides a computer device. The computer device comprises a memory in which a computer program is stored and a processor which, when executing the computer program, implements the processing method of the decision tree model of the first or second aspect.
In a sixth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method of processing the decision tree model of the first or second aspect described above.
In a seventh aspect, the present application further provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the method of processing the decision tree model of the first or second aspect.
According to the above aspects, the second object receives the first gradient ciphertext and the second gradient ciphertext sent by the first object, decrypts them in a trusted execution environment, determines histogram information of the target node in the trusted execution environment based on the training sample and the decrypted first and second gradients, determines a first splitting gain value from the histogram information, and performs node splitting on the target node based on the encrypted first splitting gain value. Because decryption happens inside the trusted execution environment, which the rest of the second object's system cannot access, the second object cannot obtain the plaintext first and second gradients; the label data of the first object therefore cannot be leaked, and the data security of the first object is ensured. Meanwhile, the second object determines the histogram information from the plaintext first and second gradients inside the trusted execution environment; compared with determining histogram information from encrypted gradients, this reduces the amount of computation, greatly improves the efficiency of determining the histogram information, and thereby improves the tuning efficiency of federated learning and the training efficiency of the decision tree model.
Drawings
FIG. 1 is a diagram of an application environment of a processing method of a decision tree model in one embodiment;
FIG. 2 is a flow diagram illustrating a method for processing a decision tree model in one embodiment;
FIG. 3 is a diagram illustrating splitting of a target node to obtain a left child node and a right child node in an embodiment;
FIG. 4 is a diagram illustrating a process for a second object to obtain a symmetric key according to an embodiment;
FIG. 5 is a schematic flow chart illustrating the process of determining histogram information for a target node in one embodiment;
FIG. 6 is a schematic illustration of binning a sample to be processed in one embodiment;
FIG. 7 is a diagram illustrating stacking of first-order gradients of bins to obtain a stacked value of first-order gradients corresponding to the bins in an embodiment;
FIG. 8 is a schematic flow chart illustrating the determination of a first order gradient histogram and a second order gradient histogram of a target node in one embodiment;
FIG. 9 is a schematic diagram of a first order gradient histogram in one embodiment;
FIG. 10 is a diagram of a second order gradient histogram in one embodiment;
FIG. 11 is a diagram illustrating a method for processing a decision tree model in an embodiment of an application scenario;
FIG. 12 is a schematic diagram of a processing method of a decision tree model in another embodiment of an application scenario;
FIG. 13 is a diagram illustrating a method of processing a decision tree model in an exemplary embodiment;
FIG. 14 is a flowchart illustrating a method for processing a decision tree model in accordance with another embodiment;
FIG. 15 is a block diagram showing the structure of a processing means of a decision tree model in one embodiment;
FIG. 16 is a block diagram showing the structure of a decision tree model processing apparatus according to another embodiment;
FIG. 17 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason and make decisions.
Artificial intelligence is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing and machine learning/deep learning.
The processing method of the decision tree model provided by the embodiments of the application can be applied in the application environment shown in fig. 1, in which the first terminal 102 and the second terminal 104 communicate with the server 106 through a network. A data storage system may store the data that the server 106 needs to process; it may be integrated on the server 106 or placed on a cloud or other network server.
The first object sends a first gradient ciphertext and a second gradient ciphertext to a second object through the first terminal 102, and the second object receives them through the second terminal 104. Through the second terminal 104, the second object decrypts the two ciphertexts in a trusted execution environment, determines histogram information of a target node in the decision tree model based on a training sample and the decrypted first and second gradients, determines a first splitting gain value of the target node from the histogram information, and encrypts it; when the encrypted first splitting gain value is obtained, the second terminal 104 performs node splitting on the target node in the decision tree model based on the encrypted first splitting gain value.
It should be noted that the processing method of the decision tree model provided by the present application may be applied to federated learning scenarios. In federated learning, each party's data is kept locally and the parties jointly train a model, that is, the model is trained jointly without the parties sharing their data. For example, the first object determines the first gradient ciphertext and the second gradient ciphertext based on its own data; the second object determines histogram information based on its own data and the two gradient ciphertexts, determines a first split gain value, and performs node splitting of the decision tree model based on the first split gain value, thereby training the decision tree model on the combined data of the first object and the second object.
The first terminal 102 and the second terminal 104 may each be a smart phone, tablet computer, laptop, desktop computer, smart speaker, smart watch, internet-of-things device or portable wearable device. The internet-of-things device may be a smart speaker, smart television, smart air conditioner or smart in-vehicle device, and the portable wearable device may be a smart watch, smart bracelet, head-mounted device, and the like.
The server 106 may be an independent physical server or a service node in a blockchain system; the service nodes in the blockchain system form a peer-to-peer (P2P) network, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP).
In addition, the server 106 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited by the present application.
The first terminal 102, the second terminal 104 and the server 106 may be connected through Bluetooth, USB (Universal Serial Bus), a network or other communication connections, which is not limited herein.
In some embodiments, as shown in fig. 2, a method for processing a decision tree model is provided, described here by taking its application to the second terminal in fig. 1 as an example, and comprising the following steps:
step S202, receiving a first gradient ciphertext and a second gradient ciphertext transmitted by a first object.
The first object is an object that possesses training samples and sample labels; it may be a member organization, including but not limited to an enterprise or a department. The device corresponding to the first object is a first terminal, through which the first object sends the first gradient ciphertext and the second gradient ciphertext; the device corresponding to the first object may also be a first server. It should be understood that the actions performed by the first object, including but not limited to sending, receiving, encrypting, decrypting and determining, are all performed by the first object through the first terminal or the first server; for convenience of description, the embodiments of the application describe them simply as actions performed by the first object.
The first gradient ciphertext is obtained by encrypting the first gradient by the first object, and the second gradient ciphertext is obtained by encrypting the second gradient by the first object.
In practical application, the first object makes predictions on the training samples with the previous decision tree model to obtain predicted values, determines a loss function from the sample labels and the predicted values of the training samples, and determines the first gradient and the second gradient from the loss function; that is, the first gradient and the second gradient are determined based on the loss function of the predicted values corresponding to the previous decision tree model. The first object encrypts the first gradient and the second gradient to obtain the first gradient ciphertext and the second gradient ciphertext, and sends them to the second object while the current decision tree model is being generated. When the current decision tree model is the first decision tree model, the loss function is determined using initialized predicted values, and the first and second gradients of that loss function are encrypted to obtain the first and second gradient ciphertexts.
For example, the first object predicts the training samples with the t-th decision tree model to obtain a predicted value yt, determines the loss function based on yt, and obtains a first gradient Gt and a second gradient Ht; when the (t+1)-th decision tree model is trained, the first object sends the encrypted Gt and Ht to the second object. When training the 1st decision tree model, the first object determines the first and second gradients based on an initialized predicted value yc and sends the encrypted gradients to the second object.
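The patent does not fix a particular loss function. Illustratively, for a logistic loss (an assumption of this sketch, as is the function name), the first-order and second-order gradients the first object computes from its labels take the following form:

```python
import numpy as np

def first_and_second_gradients(y_true: np.ndarray, y_pred_raw: np.ndarray):
    """First-order gradients g and second-order gradients h of the loss with
    respect to the previous decision tree model's raw predictions.
    A logistic loss is assumed here purely for illustration."""
    p = 1.0 / (1.0 + np.exp(-y_pred_raw))  # sigmoid of the raw score
    g = p - y_true                         # first gradient: dL/dy_pred
    h = p * (1.0 - p)                      # second gradient: d2L/dy_pred2
    return g, h

# The first object computes g, h from its sample labels and encrypts them
# before sending (see the symmetric-key sketch further below).
```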
The second object is an object that possesses training samples; it may be a member organization, including but not limited to an enterprise or a department. The device corresponding to the second object is a second terminal, through which the second object receives the first gradient ciphertext and the second gradient ciphertext; the device corresponding to the second object may also be a second server. It should be understood that the actions performed by the second object, including but not limited to sending, receiving, encrypting, decrypting and determining, are all performed by the second object through the second terminal or the second server.
The second object shares the same training samples with the first object but does not hold the sample labels of those training samples. In practical application, the first object holds a first initial sample set and the second object holds a second initial sample set; the intersection of the two sets is taken as the training samples. The feature data of the training samples held by the first object differs from the feature data of the training samples held by the second object.
Illustratively, the first object is enterprise A and the second object is enterprise B. Enterprise A runs a virtual-resource lending business in region a, and enterprise B runs a virtual-resource saving business in the same region. Because both enterprises operate in region a, the members participating in their businesses may overlap; the overlapping members serve as the training samples of enterprise A and enterprise B. Because the two businesses differ, the feature data each enterprise holds for these training samples also differs.
Specifically, because the first gradient and the second gradient are calculated by the first object from its sample labels and contain information about those labels, if the second object obtained the plaintext first and second gradients it could infer the sample labels of the first object, leaking them. In the embodiments of the application, the first object instead sends the first gradient ciphertext and the second gradient ciphertext; the second object receives them but cannot infer the first object's sample labels from the ciphertexts, so the data security of the first object is ensured.
Step S204, in a trusted execution environment, decrypting the first gradient ciphertext and the second gradient ciphertext, determining histogram information of a target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; the first gradient and the second gradient are gradients of different orders.
A Trusted Execution Environment (TEE) is a secure area built by software and hardware means; processing data inside the trusted execution environment guarantees its confidentiality. Illustratively, the hardware and software resources of the second terminal are divided into a trusted execution environment and an untrusted execution environment that are securely isolated from each other, each with its own internal data paths and storage space; the untrusted execution environment cannot access the trusted execution environment. When the device corresponding to the second object is a second server, the trusted and untrusted execution environments may likewise be deployed on the second server.
The training samples are the samples shared by the first object and the second object; the decision tree model is trained by combining the first object's feature data of the training samples with the second object's feature data of the training samples.
A complete decision tree model comprises a root node, intermediate nodes and leaf nodes; the target node is the root node or the intermediate node currently to be split in the decision tree model.
The histogram information of the target node comprises a first gradient histogram and a second gradient histogram for each feature corresponding to the target node. The first gradient histogram is determined from the feature values, under each feature, of the samples corresponding to the target node among the training samples and from the first gradients of those samples; the second gradient histogram is determined analogously from their second gradients.
The first split gain value is the gain value of one of the features corresponding to the target node. An information gain value reflects the importance of a feature, i.e., the degree to which it influences the predicted value of the decision tree model: the greater the gain value of a feature, the more important the feature; the smaller the gain value, the less important the feature.
In practical applications, the first gradient may be a first order gradient and the second gradient may be a second order gradient.
Specifically, the second object decrypts the first gradient ciphertext and the second gradient ciphertext in the TEE to obtain the first gradient and the second gradient. Because the first gradient and the second gradient remain inside the TEE, which the untrusted execution environment cannot access, the second object cannot obtain them; the label data of the first object therefore cannot be leaked, and the data security of the first object is guaranteed.
In the TEE, the second object determines the first gradient histogram and the second gradient histogram of each feature corresponding to the target node from the training samples, the first gradient, the second gradient and the second features of the training samples held by the second object. Because these histograms also remain inside the TEE, which the untrusted execution environment cannot access, the second object cannot obtain them and thus cannot use them to infer the sample labels of the first object, which further ensures the data security of the first object.
In the TEE, the second object determines the gain value of each feature from the first gradient histogram and the second gradient histogram of the features corresponding to the target node, determines the first split gain value among those gain values, and encrypts the first split gain value inside the TEE.
Step S206: when the encrypted first split gain value is obtained, node splitting is performed on the target node in the decision tree model based on the encrypted first split gain value.
Splitting the target node means dividing the samples corresponding to the target node between a left child node and a right child node, so that the decision tree model grows new child nodes. Illustratively, as shown in fig. 3, the samples corresponding to the target node nid1 include x1, x2, x3, x4, x5 and x6; splitting the target node divides x1, x2 and x6 into the left child node nid2-1 of the target node and x3, x4 and x5 into the right child node nid2-2.
It should be noted that after a target node in the decision tree model is split, each newly generated child node is taken in turn as the target node and the splitting process above is repeated until a target node can no longer be split; the nodes that can no longer be split become the leaf nodes of the decision tree model, completing the decision tree model.
Specifically, the second object sends the encrypted first split gain value to the first object; the first object receives and decrypts it to obtain the first split gain value. If node splitting is to be performed according to the first split gain value, the first object sends prompt information to the second object, and the second object receives it. The prompt information instructs the second object to perform node splitting on the target node based on the encrypted first split gain value.
Because the first splitting gain value is the gain value of one of the features corresponding to the target node, splitting the target node based on the encrypted first splitting gain value comprises: obtaining the first splitting feature and the first splitting feature value corresponding to the encrypted first splitting gain value; obtaining the values of the samples corresponding to the target node under the first splitting feature; and splitting the target node according to the first splitting feature value and those sample values to generate a left child node and a right child node of the target node. The left child node contains a first part of the target node's samples and the right child node contains a second part, where the values of the first part under the first splitting feature are all smaller (or all larger) than those of the second part.
Illustratively, the target node is d1, whose corresponding samples include x1, x2, x3, x4, x5 and x6, and the features of each sample held by the second object include f1, f2 and f3. Suppose the encrypted first split gain value is the information gain value of feature f1, with first splitting feature value v1. Each sample whose value under f1 is less than or equal to v1 is divided into the left child node d21 of the target node d1, and each sample whose value under f1 is greater than v1 is divided into the right child node d22. Suppose the samples of the left child node d21 are x1, x2 and x6 and the samples of the right child node are x3, x4 and x5; then the values of x1, x2 and x6 under f1 are all smaller than the values of x3, x4 and x5 under f1.
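Illustratively, the partition step described above can be sketched as follows (the helper name split_node and the array representation are assumptions of this sketch, not from the patent):

```python
import numpy as np

def split_node(sample_ids: np.ndarray, feature_values: np.ndarray,
               split_value: float):
    """Partition the samples of a target node on the first splitting feature.
    feature_values[i] is the splitting-feature value of sample_ids[i]; samples
    with value <= split_value go to the left child node and the rest go to the
    right child node, mirroring the f1 <= v1 rule in the example above."""
    mask = feature_values <= split_value
    return sample_ids[mask], sample_ids[~mask]  # (left child, right child)
```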
In this processing method of the decision tree model, the second object receives the first and second gradient ciphertexts sent by the first object, decrypts them in a trusted execution environment, determines histogram information of the target node in the trusted execution environment based on the training samples and the decrypted first and second gradients, determines a first splitting gain value from the histogram information, and performs node splitting on the target node based on the encrypted first splitting gain value. Because decryption happens inside the trusted execution environment, which the rest of the second object's system cannot access, the second object cannot obtain the plaintext first and second gradients; the label data of the first object therefore cannot be leaked, and its data security is ensured. And because the histogram information is computed from plaintext gradients inside the trusted execution environment, the amount of computation is much smaller than when computing over encrypted gradients, greatly improving the efficiency of determining the histogram information and thereby the tuning efficiency of federated learning and the training efficiency of the decision tree model.
In some embodiments, receiving the first gradient ciphertext and the second gradient ciphertext sent by the first object comprises: receiving a first gradient ciphertext and a second gradient ciphertext obtained by the first object encrypting the first gradient and the second gradient, respectively, with a symmetric key; and decrypting the first gradient ciphertext and the second gradient ciphertext comprises: decrypting them based on the symmetric key.
The symmetric key is a key generated by a symmetric encryption algorithm; a ciphertext obtained by encrypting a plaintext with the symmetric key can be decrypted with the same key to recover the plaintext. Symmetric encryption algorithms include, but are not limited to, DES (Data Encryption Standard), AES (Advanced Encryption Standard) and SM4 (a Chinese block cipher standard).
Specifically, the first object generates a symmetric key by using a symmetric encryption algorithm, encrypts the first gradient and the second gradient through the symmetric key to obtain a first gradient ciphertext and a second gradient ciphertext, and sends the first gradient ciphertext and the second gradient ciphertext to the second object.
And the second object receives the first gradient ciphertext and the second gradient ciphertext, and decrypts the first gradient ciphertext and the second gradient ciphertext by adopting a symmetric key in the TEE environment to obtain a first gradient and a second gradient. The symmetric key used by the second object to perform decryption is the same as the symmetric key used by the first object to perform encryption, and the symmetric key used by the second object to perform decryption is transmitted by the first object.
In one implementation, the first object compresses the first and second gradients and encrypts the compressed gradients with the symmetric key to obtain the first and second gradient ciphertexts; correspondingly, the second object decrypts the ciphertexts with the symmetric key to obtain the compressed gradients and decompresses them to obtain the first and second gradients. Compressing the gradients before encrypting them reduces the data volume of the two gradient ciphertexts and improves the communication efficiency of transmitting them between the first object and the second object.
In the above embodiment, the first object encrypts the first and second gradients with the symmetric key, and the second object decrypts the resulting ciphertexts with the same key inside the TEE to obtain the gradients. Compared with homomorphic encryption, symmetric encryption can process data at gigabit-per-second rates, so adopting it improves encryption efficiency; moreover, symmetrically encrypted ciphertexts are small, which improves the communication efficiency of sending the first and second gradient ciphertexts from the first object to the second object.
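Illustratively, the symmetric leg can be sketched with AES-GCM from the Python cryptography package; AES is one of the algorithms listed above, but the serialization, nonce handling and function names here are assumptions of this sketch:

```python
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_gradients(key: bytes, g: np.ndarray, h: np.ndarray):
    """First object: encrypt the serialized first and second gradients with
    AES-GCM; a fresh 12-byte nonce is prepended to each ciphertext."""
    aead = AESGCM(key)
    ciphertexts = []
    for arr in (g, h):
        nonce = os.urandom(12)
        data = arr.astype(np.float64).tobytes()
        ciphertexts.append(nonce + aead.encrypt(nonce, data, None))
    return ciphertexts  # [first gradient ciphertext, second gradient ciphertext]

def decrypt_gradients(key: bytes, ciphertexts):
    """Second object, inside the TEE: recover the plaintext gradients."""
    aead = AESGCM(key)
    return [np.frombuffer(aead.decrypt(ct[:12], ct[12:], None), dtype=np.float64)
            for ct in ciphertexts]
```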
In some embodiments, as shown in fig. 4, the processing method of the decision tree model further includes:
step S401, generating an asymmetric public key and private key in a trusted execution environment, and obtaining the public key output from the trusted execution environment;
step S402, sending the public key to the first object to instruct the first object to encrypt the symmetric key with the public key to obtain a key ciphertext;
step S403, receiving the key ciphertext sent by the first object;
step S404, in the trusted execution environment, decrypting the key ciphertext with the private key to obtain the symmetric key for decrypting the first gradient ciphertext and the second gradient ciphertext.
The public key and the private key are the key pair generated by an asymmetric encryption algorithm; a ciphertext obtained by encrypting a plaintext with the public key can be decrypted with the corresponding private key to recover the plaintext. Asymmetric encryption algorithms include, but are not limited to, SM2 (an elliptic-curve public key cryptographic algorithm) and RSA.
Specifically, the second object generates the public key and the private key inside the TEE with an asymmetric encryption algorithm; only the public key is output from the TEE and sent by the second object to the first object, while the private key stays inside the TEE.
The first object receives the public key, encrypts the symmetric key it generated with the public key to obtain a key ciphertext, and sends the key ciphertext to the second object; the second object receives it and decrypts it with the private key inside the TEE to obtain the symmetric key.
In the above embodiment, the first object encrypts its generated symmetric key with the public key to obtain the key ciphertext, and the second object decrypts the key ciphertext with the private key inside the TEE to obtain the symmetric key, enabling it to decrypt the first and second gradient ciphertexts inside the TEE. With this asymmetric scheme, only the holder of the private key can recover the symmetric key generated by the first object; since the private key and the recovered symmetric key exist only inside the TEE, the second object itself cannot obtain the symmetric key, the label data of the first object cannot be leaked, and the first object's data security is ensured.
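Illustratively, this key exchange can be sketched with RSA-OAEP from the Python cryptography package (RSA is one of the algorithms listed above; the patent equally allows SM2):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Second object, inside the TEE: generate the key pair; only the public key
# is output from the trusted execution environment.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()  # sent to the first object

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# First object: encrypt its symmetric key under the public key.
symmetric_key = os.urandom(32)  # e.g. a 256-bit AES key
key_ciphertext = public_key.encrypt(symmetric_key, oaep)

# Second object, back inside the TEE: decrypt the key ciphertext.
recovered_key = private_key.decrypt(key_ciphertext, oaep)
assert recovered_key == symmetric_key
```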
In some embodiments, the first gradient comprises a first order gradient of the training samples, the second gradient comprises a second order gradient of the training samples, and the histogram information comprises a first order gradient histogram and a second order gradient histogram; as shown in fig. 5, determining histogram information of a target node in a decision tree model based on a training sample, a decrypted first gradient and a decrypted second gradient includes:
step S501, obtaining the samples to be processed corresponding to the target node in the decision tree model;
step S502, determining the first-order gradients of the samples to be processed among the first-order gradients of the training samples, and determining their second-order gradients among the second-order gradients of the training samples;
step S503, determining the first-order gradient histogram and the second-order gradient histogram of the target node based on the first-order and second-order gradients of the samples to be processed and the second features of the samples to be processed held by the second object.
The training samples comprise a plurality of training samples shared by the first object and the second object; the first-order gradients of the training samples comprise the first-order gradient corresponding to each training sample, and likewise the second-order gradients comprise the second-order gradient corresponding to each training sample.
For example, the first object predicts the n training samples with the t-th decision tree model to obtain n predicted values yt1, yt2, …, ytn; a loss function is established from each predicted value, and the first-order and second-order gradients of the n loss functions yield the first-order gradients of the n training samples, Gt1, Gt2, …, Gtn, and their second-order gradients, Ht1, Ht2, …, Htn.
The samples to be processed are the training samples corresponding to the target node among the plurality of training samples, and there may be several of them. During generation of the decision tree model, the training samples are gradually divided among different nodes, so different nodes in the decision tree model correspond to different samples to be processed. Illustratively, when the target node is the root node of the decision tree model, its samples to be processed are all the training samples; when it is the left child node of the root node, they are a first part of the training samples; and when it is the right child node, a second part.
The second features are the features of the samples to be processed that are held by the second object; there may be a plurality of second features.
Specifically, the samples to be processed corresponding to the target node are determined among the training samples. For each second feature, the second feature values of the samples to be processed under that feature are obtained; a plurality of bin values for the feature are determined from these second feature values, and the second feature values are divided into a plurality of bins accordingly, each bin having a corresponding second-feature-value interval determined by the bin values. For each bin, the first-order gradients of the samples whose second feature values fall in the bin are superimposed to obtain the bin's first-order gradient superimposed value, and the first gradient histogram is determined from the first-order gradient superimposed values of the bins and their second-feature-value intervals. Similarly, for each bin the second-order gradients of those samples are superimposed to obtain the bin's second-order gradient superimposed value, and the second gradient histogram is determined from the second-order gradient superimposed values and the corresponding intervals.
Illustratively, as shown in fig. 6, the samples to be processed are x1, x2, …, x10. Under the second feature f1, their second feature values are k11=0.8, k12=0.5, k13=1, k14=0.4, k15=0.2, k16=0.6, k17=0.7, k18=0.1, k19=0.9 and k110=0.3, and their first-order gradients are g1=0.03, g2=0.12, g3=0.4, g4=0.2, g5=-0.3, g6=-0.1, g7=0.05, g8=-0.08, g9=0.22 and g10=-0.07.
The bin values of the second feature are determined as z1=0.35, z2=0.55 and z3=0.75, giving four bins: bin 1 with interval (0, 0.35], bin 2 with interval (0.35, 0.55], bin 3 with interval (0.55, 0.75] and bin 4 with interval (0.75, 1]. It follows that bin 1 contains k15, k18 and k110; bin 2 contains k12 and k14; bin 3 contains k16 and k17; and bin 4 contains k11, k13 and k19.
As shown in fig. 7, the second feature values in bin 1 correspond to samples x5, x8 and x10, whose first-order gradients are superimposed to give Gd1 = -0.45; bin 2 corresponds to x2 and x4, giving Gd2 = 0.32; bin 3 corresponds to x6 and x7, giving Gd3 = -0.05; and bin 4 corresponds to x1, x3 and x9, giving Gd4 = 0.65.
The first-order gradient histogram of the second feature f1 is then determined from bin 1's interval (0, 0.35] and superimposed value Gd1 = -0.45, bin 2's interval (0.35, 0.55] and Gd2 = 0.32, bin 3's interval (0.55, 0.75] and Gd3 = -0.05, and bin 4's interval (0.75, 1] and Gd4 = 0.65.
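Illustratively, the binning and superposition above can be reproduced with a short numpy sketch (illustrative only; np.searchsorted implements the half-open interval lookup, and the second-order gradient histogram is obtained the same way with the second-order gradients as weights):

```python
import numpy as np

feature_values = np.array([0.8, 0.5, 1.0, 0.4, 0.2, 0.6, 0.7, 0.1, 0.9, 0.3])  # x1..x10 under f1
g = np.array([0.03, 0.12, 0.4, 0.2, -0.3, -0.1, 0.05, -0.08, 0.22, -0.07])     # first-order gradients
bin_edges = np.array([0.35, 0.55, 0.75])  # bin values z1, z2, z3

# Bin index per sample: 0 -> (0, 0.35], 1 -> (0.35, 0.55], 2 -> (0.55, 0.75], 3 -> (0.75, 1]
bin_ids = np.searchsorted(bin_edges, feature_values)
grad_hist = np.bincount(bin_ids, weights=g, minlength=len(bin_edges) + 1)
print(grad_hist)  # [-0.45  0.32 -0.05  0.65], matching Gd1..Gd4 above
```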
In the above embodiment, the second object builds the first-order and second-order gradient histograms from plaintext first-order and second-order gradients inside the TEE; compared with building them from ciphertext gradients, this reduces the amount of computation, greatly reduces the time and computing resources required, and improves the efficiency of determining the two histograms.
In some embodiments, determining the first order gradient histogram and the second order gradient histogram of the target node based on the first order gradient and the second order gradient of the sample to be processed and the second feature of the sample to be processed provided by the second object, as shown in fig. 8, includes:
step S801, performing unilateral sampling on a sample to be processed based on a first-order gradient and a second-order gradient of the sample to be processed to obtain a target training sample;
step S802, determining a first order gradient histogram and a second order gradient histogram of the target node based on the first order gradient and the second order gradient of the target training sample and the second feature of the target training sample possessed by the second object.
Single-sided sampling refers to Gradient-based One-Side Sampling (GOSS), in which the samples to be processed are sampled according to their sampling gradients so as to reduce the sample size and improve learning efficiency.
The sampling gradient is related to the information gain: the larger a sample's sampling gradient, the higher its information gain, and the smaller the sampling gradient, the lower the information gain. A small sampling gradient indicates the sample has been trained sufficiently; a large one indicates it has not. In practice, the sampling gradient of a sample to be processed is the product of its first-order and second-order gradients.
GOSS keeps the samples to be processed with larger sampling gradients and randomly samples a portion of those with smaller sampling gradients.
Specifically, for each sample to be processed, the product of its first-order and second-order gradients gives its sampling gradient. The samples to be processed are arranged in descending order of sampling gradient to obtain a sequence; the samples in the leading first preset proportion of the sequence are selected as first samples to be processed, and a second preset proportion of the remaining samples are randomly sampled as second samples to be processed; the target training samples comprise the first and second samples to be processed. Note that the first preset proportion is the ratio of the first samples to all samples in the sequence, and the second preset proportion is the ratio of the second samples to all samples in the sequence.
Because GOSS reduces the proportion of the down-sampled samples, a target proportion is set as the target weight of the second samples to be processed for subsequent calculation. The target weight (target proportion) is the ratio of the difference between 1 and the first preset proportion to the second preset proportion. The values of the two preset proportions can be set according to actual requirements, which the embodiments of the application do not limit.
Illustratively, the samples to be processed are arranged in descending order of sampling gradient to obtain a sample sequence {XC}; the samples in the first a% of {XC} are taken as the first samples to be processed; among the remaining (1-a)% of the samples in {XC}, b% of the samples are randomly selected to obtain the second samples to be processed; and the target proportion is determined to be (1-a)/b, which is set as the target weight of the second samples to be processed.
The first-order and second-order gradients of the second samples to be processed in the target training samples are multiplied by the target weight to obtain their target first-order and target second-order gradients. The first-order and second-order gradient histograms of the target node are then determined from the first-order and second-order gradients of the first samples to be processed, the target first-order and target second-order gradients of the second samples to be processed, and the second features of the target training samples held by the second object, as shown in the sketch below.
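Illustratively, the single-sided sampling can be sketched as follows, using the a%/b% selection and the (1-a)/b re-weighting described above (the function name goss_sample and the default proportions are assumptions of this sketch):

```python
import numpy as np

def goss_sample(g, h, a=0.2, b=0.1, rng=None):
    """GOSS as described above: keep the leading a-fraction of samples ranked
    by sampling gradient (the product g*h in this embodiment), randomly draw a
    b-fraction from the remainder, and re-weight the random part by (1-a)/b."""
    rng = rng or np.random.default_rng()
    n = len(g)
    order = np.argsort(-(g * h))   # descending by sampling gradient
    top_ids = order[:int(a * n)]   # first samples to be processed
    rand_ids = rng.choice(order[int(a * n):], size=int(b * n), replace=False)

    weight = (1.0 - a) / b         # target weight of the second samples
    g_out = np.concatenate([g[top_ids], weight * g[rand_ids]])
    h_out = np.concatenate([h[top_ids], weight * h[rand_ids]])
    return np.concatenate([top_ids, rand_ids]), g_out, h_out
```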
In the embodiment, the training samples are sampled by the GOSS, so that the sample size is reduced, the calculation amount for determining the first-order gradient histogram and the second-order gradient histogram is further reduced, the efficiency for determining the first-order gradient histogram and the second-order gradient histogram is improved, and the learning efficiency of the decision tree model is further improved.
In some embodiments, the first-order gradient histogram of the target node is a first-order gradient histogram of the second feature, and the second-order gradient histogram of the target node is a second-order gradient histogram of the second feature; determining the first split gain value of the target node in the decision tree model according to the histogram information includes: determining a gain value of the second feature based on the first-order gradient histogram and the second-order gradient histogram of the second feature; selecting a gain value satisfying a gain condition from the gain values of the second features; and taking the gain value satisfying the gain condition as the first split gain value of the target node in the decision tree model.
The first-order gradient histogram of the target node comprises a first-order gradient histogram of the second object at the target node corresponding to each second feature; the second-order gradient histogram of the target node includes a second-order gradient histogram of the second object at the target node corresponding to each of the second features.
Illustratively, the target node correspondence samples include: x1, x2, x3, x4, x5, x6, the second feature of each sample possessed by the second object at the target node includes: f1, f2, f3, obtaining a first order gradient histogram and a second order gradient histogram corresponding to each second feature according to the first order gradient and the second order gradient of x1, x2, x3, x4, x5, x6 and the second feature value of x1, x2, x3, x4, x5, x6 under each second feature, including: a first-order gradient histogram GF1 and a second-order gradient histogram HF1 corresponding to f1, a first-order gradient histogram GF2 and a second-order gradient histogram HF2 corresponding to f2, and a first-order gradient histogram GF3 and a second-order gradient histogram HF3 corresponding to f3.
Specifically, for the first-order gradient histogram and the second-order gradient histogram corresponding to each second feature, each bin in the first-order gradient histogram is taken in turn as a division point. The first-order gradients of all bins from the left, up to and including the bin at the division point, are accumulated to obtain a first gradient sum, and the first-order gradients of all bins to the right of the division point are accumulated to obtain a second gradient sum; in the second-order gradient histogram, the second-order gradients of all bins up to and including the bin at the division point are accumulated to obtain a third gradient sum, and the second-order gradients of all bins to the right of the division point are accumulated to obtain a fourth gradient sum. A gain value of the second feature at the division point is determined from the first gradient sum, the second gradient sum, the third gradient sum and the fourth gradient sum using a gain value calculation formula. Each bin in the first-order gradient histogram is traversed to obtain the gain value corresponding to each bin taken as a division point, and the maximum among these gain values is taken as the gain value of the second feature.
For example, the first-order gradient histogram corresponding to the second feature f1 is shown in fig. 9, and the second-order gradient histogram corresponding to the second feature f1 is shown in fig. 10. Taking bin 4 as the division point, the first gradient sum GL4, the second gradient sum GR4, the third gradient sum HL4 and the fourth gradient sum HR4 corresponding to bin 4 are determined, and the gain value calculation formula is shown in formula (1).
gain_{f1,4} = (1/2) × [ GL4² / (HL4 + λ) + GR4² / (HR4 + λ) − G4² / (H4 + λ) ] − γ        (1)

where gain_{f1,4} is the gain value of the second feature f1 under bin 4, GL4 is the first gradient sum corresponding to bin 4, GR4 is the second gradient sum corresponding to bin 4, HL4 is the third gradient sum corresponding to bin 4, HR4 is the fourth gradient sum corresponding to bin 4, G4 = GL4 + GR4, H4 = HL4 + HR4, and λ and γ are preset gain parameters that can be set according to actual requirements.
As shown in fig. 9 and fig. 10, the second feature f1 includes 8 bins; the gain value of the second feature f1 at each of the 8 bins is determined respectively, and the largest of these gain values is taken as the gain value of the second feature f1.
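A plain reading of this traversal, under the assumption that λ and γ are the preset gain parameters of formula (1), might look as follows (a sketch, not the application's reference implementation):

```python
import numpy as np

def feature_gain(first_order_hist, second_order_hist, lam=1.0, gamma=0.0):
    """Traverse every bin as a division point per formula (1) and return the
    feature's gain value together with the best division bin."""
    G = first_order_hist.sum()
    H = second_order_hist.sum()
    gl = hl = 0.0
    best_gain, best_bin = float("-inf"), -1
    for k in range(len(first_order_hist)):
        gl += first_order_hist[k]    # first gradient sum (bins up to and incl. k)
        hl += second_order_hist[k]   # third gradient sum
        gr, hr = G - gl, H - hl      # second and fourth gradient sums
        gain = 0.5 * (gl * gl / (hl + lam)
                      + gr * gr / (hr + lam)
                      - G * G / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_bin = gain, k
    return best_gain, best_bin
```

With lam > 0 the denominators are never zero; taking the last bin as a division point leaves the right side empty and would normally be skipped in practice.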
When selecting the gain value satisfying the gain condition from the gain values of the second features, the maximum gain value among the gain values of the second features may be selected, and this maximum gain value is taken as the first split gain value of the target node.
Illustratively, the second features of each sample possessed by the second object at the target node include f1, f2 and f3, and the gain values of f1, f2 and f3 are determined to be gain_f1, gain_f2 and gain_f3, respectively. Suppose gain_f1 is the maximum gain value among gain_f1, gain_f2 and gain_f3; then gain_f1 is taken as the first split gain value.
It should be noted that, after the first split gain value is determined, the second feature corresponding to the first split gain value may be determined as the splitting feature, and the splitting feature value corresponding to the first split gain value may be determined as well.
Illustratively, gain_f1 is the first split gain value, so the second feature f1 is the splitting feature corresponding to the first split gain value. Suppose gain_f1 is the gain value of the 4th bin under the second feature f1; then the feature value corresponding to the 4th bin under the second feature f1 is the splitting feature value corresponding to the first split gain value.
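Putting the two selection steps together, a minimal sketch reusing feature_gain from above (the dictionary shapes are assumptions for illustration):

```python
def select_first_split(feature_hists, bin_values, lam=1.0, gamma=0.0):
    """Pick the first split gain value gain1, the splitting feature fid1 and
    the splitting feature value value1 across all second features.

    feature_hists: {fid: (first_order_hist, second_order_hist)}
    bin_values:    {fid: feature value represented by each bin}
    """
    gain1, fid1, value1 = float("-inf"), None, None
    for fid, (gh, hh) in feature_hists.items():
        gain, k = feature_gain(gh, hh, lam, gamma)
        if gain > gain1:
            gain1, fid1, value1 = gain, fid, bin_values[fid][k]
    return gain1, fid1, value1
```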
In the above embodiment, the second object determines, in the TEE environment, the first split gain value of the target node according to the first-order gradient histogram and the second-order gradient histogram. In the prior art, the second object sends histogram information determined from the ciphertext first-order gradient and second-order gradient to the first object, the first object determines the first split gain value, and the first object can reversely derive the feature data of the second object from the ciphertext histogram information; this embodiment avoids the leakage of the feature data of the second object and ensures the data security of the second object. Furthermore, since the first-order gradient histogram and the second-order gradient histogram are calculated from the plaintext first-order gradient and second-order gradient, determining the first split gain value from plaintext gradient histograms is more efficient than determining it from ciphertext histogram information.
In some embodiments, after receiving the first gradient cipher text and the second gradient cipher text transmitted by the first object, the method further comprises: inputting the first gradient ciphertext and the second gradient ciphertext into the trusted execution environment; after the encryption of the first split gain value is completed in the trusted execution environment, the encrypted first split gain value is output from the trusted execution environment.
Specifically, the second object receives a first gradient ciphertext and a second gradient ciphertext, wherein the first gradient ciphertext and the second gradient ciphertext exist in the untrusted execution environment, and the second object inputs the first gradient ciphertext and the second gradient ciphertext into the TEE environment; and determining an encrypted first split gain value in the TEE environment, and outputting the encrypted first split gain value by the TEE environment so that the second object acquires the encrypted first split gain value.
In some embodiments, the second object obtains the encrypted first split gain value, the split characteristic corresponding to the encrypted first split gain value, and the split characteristic value corresponding to the encrypted first split gain value.
Specifically, the TEE environment outputs an encrypted first split gain value, a first split feature corresponding to the encrypted first split gain value, and a first split feature value, so that when the second object performs node splitting on the target node based on the encrypted first split gain value, the target node may be directly subjected to node splitting according to the first split feature corresponding to the encrypted first split gain value and the first split feature value.
In the above embodiment, after the second object acquires the first gradient ciphertext and the second gradient ciphertext, it inputs them into the TEE environment; inside the TEE environment, the first gradient ciphertext and the second gradient ciphertext are decrypted to obtain the first gradient and the second gradient, the histogram information is determined based on the first gradient and the second gradient, the first split gain value is determined, and the encrypted first split gain value is output from the TEE environment. The second object can therefore only obtain the first gradient ciphertext, the second gradient ciphertext and the encrypted first split gain value, and cannot obtain the plaintext first gradient, second gradient, histogram information or first split gain value; thus the label data of the first object cannot be leaked, and the data security of the first object is ensured.
In some embodiments, encrypting the first split gain value comprises: encrypting the first split gain value based on the symmetric key; node splitting is performed on a target node in the decision tree model based on the encrypted first split gain value, and the node splitting method comprises the following steps: sending the encrypted first split gain value to the first object so that the first object decrypts the encrypted first split gain value based on the symmetric key to obtain a first split gain value, and selecting the split gain value meeting a preset condition from the first split gain value and the second split gain value; the second split gain value is determined based on the first gradient and the second gradient of the to-be-processed sample corresponding to the target node and the first characteristic of the to-be-processed sample possessed by the first object; and when the splitting gain value meeting the preset condition is the first splitting gain value, receiving prompt information sent by the first object, and performing node splitting on a target node in the decision tree model based on the encrypted first splitting gain value.
The first feature is a feature of a sample to be processed provided by the first object, and the first feature may include a plurality of first features. It should be noted that, for the target node, the first object and the second object have the same sample to be processed, and the first feature of the sample to be processed of the first object and the second feature of the sample to be processed of the second object may be different. Illustratively, the first object is department a, the business related to department a is a virtual resource exchange application, the second object is department B, the business related to department B is a social application, there are overlapping users of the virtual resource exchange application and the social application (training objects shared by the first object and the second object), and the virtual resource exchange application has different feature data than the social application.
The splitting gain value satisfying the preset condition is the one that is greater than the other gain value among the first split gain value and the second split gain value. Illustratively, if the splitting gain value satisfying the preset condition is the first split gain value, then the second split gain value is the non-selected gain value, and the first split gain value is greater than the second split gain value. That is, the larger of the first split gain value and the second split gain value is taken as the splitting gain value satisfying the preset condition.
The prompt message is used for informing the second object of possessing the splitting gain value meeting the preset condition and instructing the second object to perform node splitting on the target node of the decision tree model based on the encrypted first splitting gain value. In practical applications, the prompt message may be a preset prompt identifier.
Specifically, the second object encrypts the first split gain value with the symmetric key generated by the first object in the TEE environment to obtain an encrypted first split gain value. The second object sends the encrypted first split gain value to the first object, so that the first object obtains the encrypted first split gain value, and the first object can decrypt the encrypted first split gain value by using the generated symmetric key to obtain the first split gain value.
The first object acquires the plurality of samples to be processed corresponding to the target node. For each first feature, the first feature values of the samples to be processed under that first feature are acquired and divided into a plurality of bins; a first-order gradient histogram is determined based on the first gradients of the samples to be processed whose first feature values fall in each bin, and a second-order gradient histogram is determined based on the second gradients of the samples to be processed whose first feature values fall in each bin. A gain value under the first feature is determined from the first-order gradient histogram and the second-order gradient histogram, and the maximum among the gain values respectively corresponding to the first features is taken as the second split gain value.
The first object takes the larger of the first split gain value and the second split gain value as the splitting gain value satisfying the preset condition. When the splitting gain value satisfying the preset condition is the first split gain value, since the first split gain value is determined based on the second features of the samples to be processed possessed by the second object, the first object cannot perform node splitting using it; instead, the second object performs node splitting on the target node based on the encrypted first split gain value. The first object sends prompt information to the second object, and the second object receives the prompt information and, according to it, performs node splitting on the target node in the decision tree model based on the encrypted first split gain value.
In the above embodiment, the second object sends the encrypted first split gain value to the first object, the first object decrypts the encrypted first split gain value through the symmetric key to obtain the first split gain value, the first object does not obtain the histogram information, and the feature data of the second object cannot be reversely derived, so that the data security of the second object is ensured. In addition, the second object sends the encrypted first split gain value to the first object, compared with the histogram information of the ciphertext, the data size of the encrypted first split gain value is small, and the communication efficiency between the first object and the second object is improved.
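The application does not fix a particular symmetric cipher; purely as an illustration, the round trip of the encrypted first split gain value could look like the following sketch using the `cryptography` package's Fernet scheme (an assumption for demonstration, not the patent's stated choice):

```python
import struct
from cryptography.fernet import Fernet

sym_key = Fernet.generate_key()          # symmetric key generated by the first object
cipher = Fernet(sym_key)

# Second object, inside the TEE: encrypt the first split gain value gain1.
gain1 = 0.7312
enc_gain1 = cipher.encrypt(struct.pack("!d", gain1))

# First object: decrypt and compare with its own second split gain value gain2.
gain1_plain = struct.unpack("!d", cipher.decrypt(enc_gain1))[0]
gain2 = 0.5104
send_prompt_to_second_object = gain1_plain > gain2   # preset condition: larger gain wins
```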
In some embodiments, node splitting a target node in the decision tree model based on the encrypted first split gain value comprises: performing node splitting on a target node in a decision tree model corresponding to the second object based on the encrypted first splitting gain value to obtain a first splitting result; and sending the first split result to the first object so that the first object updates the decision tree model corresponding to the first object based on the first split result.
The first splitting result reflects that, after the second object splits the target node, one part of the samples corresponding to the target node is divided into the left child node of the target node, and the other part is divided into the right child node of the target node.
The decision tree model corresponding to the first object is the same as the samples corresponding to each node in the decision tree model corresponding to the second object, for example, the decision tree model corresponding to the first object includes a root node, a left child node of the root node, and a right child node of the root node, and in the decision tree model corresponding to the first object, the samples corresponding to the root node include: x21, x22, x23, x24, x25, x26, left child node corresponding samples include: x21, x22, x23, right child node corresponding samples include: x24, x25, x26; similarly, the decision tree model corresponding to the second object includes a root node, a left child node, and a right child node, and in the decision tree model corresponding to the second object, the root node corresponding samples include: x21, x22, x23, x24, x25, x26, left child node corresponding samples include: x21, x22, x23, right child node corresponding samples include: x24, x25, x26.
Specifically, the second object acquires the first splitting feature and the first splitting feature value corresponding to the encrypted first split gain value, acquires the feature value of each sample to be processed under the first splitting feature, and divides the samples to be processed of the target node into a first part of samples and a second part of samples according to the first splitting feature value, based on the feature values of the samples under the first splitting feature, so as to complete node splitting of the target node in the corresponding decision tree. The feature values of the first part of samples under the first splitting feature are all smaller, or all larger, than those of the second part of samples under the first splitting feature.
Illustratively, while the TEE environment outputs the encrypted first split gain value Enc (gain 1), a first split feature fid1 and a first split feature value1 corresponding to the first split gain value Enc (gain 1) are output; the second object receives the prompt information, obtains a first splitting characteristic fid1 and a first splitting characteristic value1 corresponding to the encrypted first splitting gain value Enc (gain 1), obtains a characteristic value of each sample to be processed under the fid1, and divides each sample to be processed of the target node into a first part of samples and a second part of samples according to the value1 and the characteristic value of each sample to be processed under the fid 1.
The second object associates the sample numbers of the first part of samples with the node index corresponding to the first part of samples, and associates the sample numbers of the second part of samples with the node index corresponding to the second part of samples, to obtain the first splitting result, and sends the first splitting result to the first object. The node corresponding to the first part of samples may be the left child node or the right child node of the target node; accordingly, the node corresponding to the second part of samples may be the right child node or the left child node of the target node.
And the first object acquires a first split result and updates the corresponding decision tree model according to the first split result. Updating the decision tree model corresponding to the first object, which may be updating sample node index information of the decision tree model corresponding to the first object; specifically, the first split result is added to the sample node index information of the current decision tree model to update the sample node index information of the decision tree model.
Through the first splitting result, the first object can determine the result of the node splitting performed on the target node by the second object; however, the first object can only determine that the feature values of the first part of samples under some second feature are all smaller than, or all larger than, those of the second part of samples under that second feature. It can determine neither the specific meaning of that second feature nor the feature values of the first and second parts of samples under it, which ensures the security of the feature data of the second object.
In the above embodiment, when the selected splitting gain value meeting the preset condition is the first splitting gain value, the second object performs node splitting on the target node in the decision tree model based on the encrypted first splitting gain value, and sends the first splitting result to the first object, and the first object updates the decision tree model by using the first splitting result, so that the first object and the second object share the node splitting result, and the feature data of the second object cannot be leaked, thereby ensuring the security of the feature data of the second object.
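For illustration, the partitioning that produces the first splitting result might be sketched as follows (a hypothetical helper; the names and the left/right assignment rule are assumptions):

```python
def split_node(sample_ids, fid1_values, value1, left_nid, right_nid):
    """Produce the first splitting result: sample number -> child node index.

    fid1_values: {sample_id: feature value under the first splitting feature
    fid1}, known only to the second object. The returned mapping reveals
    neither fid1 nor the underlying feature values.
    """
    return {sid: (left_nid if fid1_values[sid] <= value1 else right_nid)
            for sid in sample_ids}
```

Sending only this sample-to-node mapping lets the first object update its sample node index information without learning the second object's feature data.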
In some embodiments, the method for processing a decision tree model further comprises: when the splitting gain value meeting the preset condition is a second splitting gain value, receiving a second splitting result obtained by splitting a target node in the decision tree model corresponding to the first object by the first object based on the second splitting gain value; and updating the decision tree model corresponding to the second object based on the second split result.
The second splitting result reflects that, after the first object splits the target node, one part of the samples corresponding to the target node is divided into the left child node of the target node, and the other part is divided into the right child node of the target node.
Specifically, when the selected splitting gain value meeting the preset condition is the second splitting gain value, the second splitting gain value is determined based on the first characteristic of the to-be-processed sample possessed by the first object, the second object cannot perform node splitting through the second splitting gain value, and then the first object performs node splitting on the target node based on the second splitting gain value.
The first object determines a second splitting characteristic and a second splitting characteristic value corresponding to a second splitting gain value in the first characteristic, the first object obtains a characteristic value of the sample to be processed under the second splitting characteristic, and divides the sample to be processed of the target node into a third part sample and a fourth part sample according to the second splitting characteristic value on the basis of the characteristic value of the sample to be processed under the second splitting characteristic so as to complete node splitting of the target node in the decision tree model corresponding to the first object; and the characteristic values of the third part of samples under the second splitting characteristic are all smaller or all larger than the characteristic value of the fourth part of samples under the second splitting characteristic.
Illustratively, the first object acquires a second splitting characteristic fid2 and a second splitting characteristic value2 corresponding to the second splitting gain value gain2, acquires a characteristic value of each sample to be processed under the fid2, and divides each sample to be processed of the target node into a third partial sample and a fourth partial sample according to the value2 and the characteristic value of each sample to be processed under the fid 2.
And the first object associates the sample number of the third part sample with the node index corresponding to the third part sample, associates the sample number of the fourth part sample with the node index corresponding to the fourth part sample to obtain a second splitting result, and sends the second splitting result to the second object. The third part of samples correspond to nodes, and can be left child nodes of the target nodes or right child nodes of the target nodes; accordingly, the fourth partial sample corresponds to a node, which may be a right child node of the target node or a left child node of the target node.
The second object acquires the second splitting result and updates the decision tree model corresponding to the second object according to the second splitting result. Updating the decision tree model corresponding to the second object may be updating the sample node index information of the decision tree model corresponding to the second object; specifically, the second splitting result is added to the sample node index information of the current decision tree model to update it. The sample node index information of the decision tree model is used to reflect each node included in the decision tree model and the samples corresponding to each node.
The second object can determine the result of node splitting performed on the target node by the first object through the second splitting result, and the second object can only determine that the feature values of the third part sample under a certain first feature are both smaller than or larger than the feature values of the fourth part sample under a certain first feature, but cannot determine the specific meaning of the certain first feature, and cannot determine the feature values of the third part sample and the fourth part sample under a certain first feature, thereby ensuring the safety of the feature data of the first object.
In some embodiments, the sample node index information of the first object and the sample node index information of the second object are the same. When the first object performs node splitting on the target node in its decision tree model based on the second split gain value, the first object updates its sample node index information according to the second splitting result and sends the second splitting result to the second object, which updates its own sample node index information accordingly, so that the sample node index information of the two objects remains consistent. When the second object performs node splitting on the target node in its decision tree model based on the first split gain value, the second object updates its sample node index information according to the first splitting result and sends the first splitting result to the first object, which updates its own sample node index information accordingly, so that the sample node index information of the two objects likewise remains consistent.
In the above embodiment, when the selected splitting gain value satisfying the preset condition is the second split gain value, the first object performs node splitting on the target node in the decision tree model based on the second split gain value and sends the second splitting result to the second object, and the second object updates its decision tree model using the second splitting result. The first object and the second object thus share the node splitting result, while the feature data of the first object cannot be leaked, ensuring the security of the feature data of the first object.
In one application scenario, the first object has a sample label of a training sample, and the second object does not have a sample label of a training sample, and in federal learning, a party having a sample label may be called a Guest party, and a party not having a sample label may be called a Host party, that is, the first object may be called a Guest party, and the second object may be called a Host party. As shown in fig. 11, in the processing method of the decision tree model, through interaction between the Guest party and the Host party, a process of the Guest party acquiring the first split gain value determined based on the second feature of the Host party includes:
1. the Host party generates a public key PK and a private key SK in the TEE environment in an asymmetric encryption mode, exports the public key PK from the TEE environment, and sends the public key PK to the Guest party, as shown by path (1) in fig. 11 (a code sketch of steps 1 to 5 follows this list);
2. the Guest party generates a symmetric key sym-key in a symmetric encryption mode, and encrypts the first-order gradient Gt and the second-order gradient Ht with sym-key to obtain the first gradient ciphertext Enc(Gt) and the second gradient ciphertext Enc(Ht), as shown by path (2) in fig. 11;
3. the Guest party sends the first gradient ciphertext Enc (Gt) and the second gradient ciphertext Enc (Ht) to the Host party, and the Host party inputs the first gradient ciphertext Enc (Gt) and the second gradient ciphertext Enc (Ht) into the TEE environment; as shown by path (3) in fig. 11;
4. the method comprises the steps that a Guest party encrypts a symmetric key sym-key through a public key PK to obtain a key ciphertext PK (sym-key), the Guest party sends the key ciphertext PK (sym-key) to a Host party, the Host party inputs the key ciphertext PK (sym-key) into a TEE environment, and the Host party decrypts the key ciphertext PK (sym-key) through the private key SK in the TEE environment to obtain the symmetric key sym-key; as shown by path (4) in fig. 11;
5. the Host party decrypts the first gradient ciphertext Enc (Gt) and the second gradient ciphertext Enc (Ht) through the symmetric key sym-key in the TEE environment to obtain a first-order gradient Gt and a second-order gradient Ht; as shown by path (5) in fig. 11;
6. the Host side samples a to-be-processed sample corresponding to a target node by adopting GOSS in a TEE environment to obtain a target training sample corresponding to the target node, and establishes a first-order gradient histogram and a second-order gradient histogram corresponding to the target node based on a first-order gradient Gt, a second-order gradient Ht and a second characteristic of the target training sample possessed by the Host side; as shown by path (6) in fig. 11;
7. in the TEE environment, the Host party calculates the first split gain value gain1, and the first splitting feature fid1 and the first splitting feature value value1 corresponding to gain1, based on the first-order gradient histogram and the second-order gradient histogram, as shown by path (7) in fig. 11;
8. the Host party encrypts the first split gain value gain1 by adopting a symmetric key sym-key in a TEE environment to obtain an encrypted first split gain value Enc (gain 1); as shown by path (8) in fig. 11;
9. the Host party outputs the encrypted first split gain value Enc(gain1), the first splitting feature fid1 and the first splitting feature value value1 from the TEE environment, and sends the encrypted first split gain value Enc(gain1) to the Guest party; the Guest party decrypts Enc(gain1) with the symmetric key sym-key to obtain the first split gain value gain1, as shown by path (9) in fig. 11.
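Steps 1 to 5 amount to a standard hybrid-encryption handshake. A minimal sketch with Python's `cryptography` package, using RSA-OAEP and Fernet as stand-ins for the unspecified asymmetric and symmetric schemes (the gradient payloads are placeholders):

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# (1) Host, inside the TEE: generate PK / SK; only PK leaves the TEE.
sk = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pk = sk.public_key()

# (2)-(3) Guest: generate sym-key and encrypt the gradients Gt, Ht.
sym_key = Fernet.generate_key()
guest_cipher = Fernet(sym_key)
enc_gt = guest_cipher.encrypt(b"serialized first-order gradients Gt")
enc_ht = guest_cipher.encrypt(b"serialized second-order gradients Ht")

# (4) Guest wraps sym-key with PK; Host unwraps it with SK inside the TEE.
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
pk_sym_key = pk.encrypt(sym_key, oaep)
sym_key_in_tee = sk.decrypt(pk_sym_key, oaep)

# (5) Host, inside the TEE: recover the plaintext gradients.
tee_cipher = Fernet(sym_key_in_tee)
gt = tee_cipher.decrypt(enc_gt)
ht = tee_cipher.decrypt(enc_ht)
```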
In another application scenario, the first object is an enterprise a operating a virtual resource exchange application, and the second object is an enterprise B operating a social application, as shown in fig. 12, the processing method of the decision tree model includes:
in a TEE environment, an enterprise B generates a public key PK and a private key SK, exports the public key PK out of the TEE environment, and sends the public key PK to the enterprise A;
when a tth decision tree model is trained, the enterprise A generates a symmetric key sym-key, encrypts the symmetric key sym-key through a public key PK to obtain a key ciphertext PK (sym-key), and sends the key ciphertext PK (sym-key) to the enterprise B;
and the enterprise B inputs the key ciphertext PK (sym-key) into the TEE environment, and decrypts the key ciphertext PK (sym-key) through the private key SK in the TEE environment to obtain the symmetric key sym-key.
the enterprise A determines the predicted value of each training sample according to the first t−1 decision tree models, and determines the first-order gradient Gt and the second-order gradient Ht according to the sample label and predicted value of each training sample;
the enterprise A encrypts the first-order gradient Gt and the second-order gradient Ht through the symmetric key sym-key to obtain a first gradient ciphertext Enc (Gt) and a second gradient ciphertext Enc (Ht), and sends the first gradient ciphertext Enc (Gt) and the second gradient ciphertext Enc (Ht) to the enterprise B;
the enterprise B inputs the first gradient ciphertext Enc (Gt) and the second gradient ciphertext Enc (Ht) into the TEE environment, and decrypts the first gradient ciphertext Enc (Gt) and the second gradient ciphertext Enc (Ht) through the symmetric key sym-key in the TEE environment to obtain a first-order gradient Gt and a second-order gradient Ht;
in the TEE environment, the enterprise B samples the samples to be processed corresponding to the target node nid through GOSS to obtain the target training sample corresponding to nid, determines the first-order gradient histogram and the second-order gradient histogram through the second feature of the target training sample possessed by enterprise B, the first-order gradient Gt and the second-order gradient Ht, and determines the first split gain value gain1 together with the first splitting feature fid1 and the first splitting feature value value1 corresponding to gain1;
in the TEE environment, the enterprise B encrypts the first split gain value gain1 with the symmetric key sym-key, and outputs the encrypted first split gain value Enc(gain1), the first splitting feature fid1 and the first splitting feature value value1 from the TEE environment;
enterprise B sends the encrypted first split gain value Enc (gain 1) to enterprise a;
enterprise A receives the encrypted first split gain value Enc (gain 1), decrypts the encrypted first split gain value Enc (gain 1) through the symmetric key sym-key, and obtains a first split gain value gain1;
the enterprise A determines a second split gain value gain2, a second split characteristic fid2 corresponding to the second split gain value gain2 and a second split characteristic value2 according to the to-be-processed sample corresponding to the target node nid and the first characteristic of the to-be-processed sample possessed by the enterprise A;
when the first splitting gain value gain1 is greater than the second splitting gain value gain2, namely the first splitting gain value gain1 is the selected splitting gain value, the enterprise A sends prompt information to the enterprise B;
the enterprise B receives the prompt message, and performs node splitting on the target node nid according to the first splitting feature fid1 and the first splitting feature value value1 corresponding to the encrypted first split gain value Enc(gain1), to obtain the first splitting result;
the enterprise B sends the first split result to the enterprise A, and the enterprise A updates the sample node index information according to the first split result;
when the second splitting gain value gain2 is greater than the first splitting gain value gain1, that is, the second splitting gain value gain2 is the selected splitting gain value, the enterprise a performs node splitting on the target node nid according to the second splitting characteristic fid2 and the second splitting characteristic value2 corresponding to the second splitting gain value gain2, so as to obtain a second splitting result;
and the enterprise A sends the second split result to the enterprise B, and the enterprise B updates the sample node index information according to the second split result.
In a specific embodiment, as shown in fig. 13, the processing method of the decision tree model includes:
step S1301, the second object generates an asymmetric encrypted public key and a private key in the trusted execution environment, and obtains the public key output from the trusted execution environment;
step S1302, the second object sends a public key to the first object to indicate the first object to encrypt the symmetric key based on the public key to obtain a key ciphertext;
step S1303, the second object receives the key ciphertext sent by the first object, and decrypts the key ciphertext based on the private key in the trusted execution environment to obtain a symmetric key for decrypting the first gradient ciphertext and the second gradient ciphertext;
step S1304, the second object receives a first gradient ciphertext and a second gradient ciphertext which are obtained by encrypting the first gradient and the second gradient respectively based on the symmetric key and are sent by the first object;
step S1305, the second object inputs the first gradient ciphertext and the second gradient ciphertext into a trusted execution environment, and in the trusted execution environment, the first gradient ciphertext and the second gradient ciphertext are decrypted based on a symmetric key to obtain a first gradient and a second gradient; the first gradient comprises a first order gradient of the training samples, and the second gradient comprises a second order gradient of the training samples;
step 1306, in a trusted execution environment, a second object obtains a sample to be processed corresponding to a target node in a decision tree model, determines a first-order gradient of the sample to be processed in a first-order gradient of a training sample, and determines a second-order gradient of the sample to be processed in a second-order gradient of the training sample;
step 1307, in a trusted execution environment, the second object performs unilateral sampling on the sample to be processed based on the first-order gradient and the second-order gradient of the sample to be processed to obtain a target training sample; determining a first-order gradient histogram and a second-order gradient histogram of the target node based on the first-order gradient and the second-order gradient of the target training sample and a second feature of the target training sample possessed by the second object; the first-order gradient histogram of the target node is a first-order gradient histogram of the second characteristic, and the second-order gradient histogram of the target node is a second-order gradient histogram of the second characteristic;
step S1308, determining, by the second object in the trusted execution environment, a gain value of the second feature based on the first-order gradient histogram and the second-order gradient histogram of the second feature; selecting a gain value satisfying a gain condition from the gain values of the second feature; taking the gain value meeting the gain condition as a first splitting gain value of a target node in the decision tree model;
step 1309, encrypting the first split gain value based on the symmetric key by the second object in the trusted execution environment; after the encryption of the first split gain value is completed in the trusted execution environment, outputting the encrypted first split gain value from the trusted execution environment;
step S1310, when the second object obtains the encrypted first split gain value, sending the encrypted first split gain value to the first object, so that the first object decrypts the encrypted first split gain value based on the symmetric key to obtain the first split gain value, and determining a second split gain value based on the first-order gradient and the second-order gradient of the to-be-processed sample corresponding to the target node and the first feature of the to-be-processed sample possessed by the first object; selecting a splitting gain value meeting a preset condition from a first splitting gain value and a second splitting gain value by a first object;
step S1311, when the selected splitting gain value is the first splitting gain value, the second object receives prompt information sent by the first object, node splitting is conducted on a target node in the decision tree model corresponding to the second object based on the encrypted first splitting gain value to obtain a first splitting result, and the first splitting result is sent to the first object so that the first object updates the decision tree model corresponding to the first object based on the first splitting result;
step S1312, when the selected splitting gain value is the second splitting gain value, the second object receives the second splitting result obtained by the first object splitting the target node in the decision tree model corresponding to the first object according to the second splitting gain value, and updates the decision tree model corresponding to the second object according to the second splitting result.
It should be understood that, although the steps in the flowcharts related to the above embodiments are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
In the processing method of the decision tree model, the second object receives the first gradient ciphertext and the second gradient ciphertext sent by the first object, decrypts them in the trusted execution environment, determines the histogram information of the target node based on the training sample, determines the first split gain value based on the histogram information, and performs node splitting on the target node based on the encrypted first split gain value. Because the first gradient ciphertext and the second gradient ciphertext are decrypted inside the trusted execution environment, which the second object cannot access, the second object cannot obtain the first gradient and the second gradient, so the label data of the first object cannot be leaked and the data security of the first object is ensured. Moreover, the second object determines the histogram information in the trusted execution environment based on the plaintext first gradient and second gradient; compared with determining histogram information based on encrypted gradients, this reduces the amount of computation, greatly improves the efficiency of determining the histogram information, improves the tuning efficiency of federated learning, and further improves the training efficiency of the decision tree model.
In some embodiments, as shown in fig. 14, a method for processing a decision tree model is provided, which is described by taking the method as an example for being applied to the first terminal in fig. 1, and includes the following steps:
step S1402, sending the first gradient ciphertext and the second gradient ciphertext to a second object, so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determines histogram information of a target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient, determines a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypts the first splitting gain value;
step S1404, receiving the encrypted first split gain value sent by the second object;
in step S1406, node splitting is performed on the target node in the decision tree model based on the encrypted first splitting gain value.
The detailed process of steps S1402 to S1406 may refer to the embodiments of steps S202 to S206.
In some embodiments, sending the first gradient cipher text and the second gradient cipher text to the second object comprises: encrypting the first gradient and the second gradient respectively based on the symmetric key to obtain a first gradient ciphertext and a second gradient ciphertext; sending the first gradient ciphertext and the second gradient ciphertext to a second object; based on the encrypted first split gain value, node splitting is carried out on a target node in the decision tree model, and the node splitting method comprises the following steps: decrypting the encrypted first split gain value based on the symmetric key to obtain a first split gain value; and performing node splitting on the target node in the decision tree model based on the first splitting gain value.
In this embodiment, for the detailed process in which the first object encrypts the first gradient and the second gradient to obtain the first gradient ciphertext and the second gradient ciphertext, reference may be made to the description in the above embodiment of the process in which the second object receives the first gradient ciphertext and the second gradient ciphertext sent by the first object; for the detailed process in which the first object performs node splitting on the target node in the decision tree model based on the encrypted first split gain value, reference may be made to the description in the above embodiment of the process in which the second object performs node splitting on the target node in the decision tree model based on the encrypted first split gain value.
In some embodiments, the method for processing a decision tree model further comprises: receiving a public key sent by a second object, and encrypting the symmetric key through the public key to obtain a key ciphertext; and sending the key ciphertext to the second object to indicate the second object to decrypt the key ciphertext based on a private key corresponding to the public key to obtain a symmetric key, and encrypting the first split gain value based on the symmetric key to obtain an encrypted first split gain value.
In this embodiment, reference may be made to the embodiments corresponding to steps S401 to S404 above for a detailed process in which the first object encrypts the symmetric key through the public key to obtain a key ciphertext.
In some embodiments, node splitting a target node in the decision tree model based on the first split gain value comprises: determining a second splitting gain value based on the first gradient and the second gradient of the sample to be processed corresponding to the target node and the first characteristic of the sample to be processed possessed by the first object; selecting a splitting gain value meeting a preset condition from the first splitting gain value and the second splitting gain value; and carrying out node splitting on the target node in the decision tree model based on the selected splitting gain value.
In this embodiment, for the detailed process in which the first object performs node splitting on the target node in the decision tree model based on the first split gain value, reference may be made to the description in the above embodiment of the process in which the second object performs node splitting on the target node in the decision tree model based on the encrypted first split gain value.
In some embodiments, node splitting is performed on the target node in the decision tree model based on the selected splitting gain value, including: when the selected splitting gain value is the first splitting gain value, sending prompt information to the second object so that the second object performs node splitting on a target node in a decision tree model corresponding to the second object based on the encrypted first splitting gain value to obtain a first splitting result; and receiving a first split result sent by the second object, and updating the decision tree model corresponding to the first object according to the first split result.
In this embodiment, for the detailed process in which the first object participates in node splitting when the selected splitting gain value is the first split gain value, reference may be made to the description in the above embodiment of the process in which, when the splitting gain value satisfying the preset condition is the first split gain value, the second object receives the prompt information sent by the first object and performs node splitting on the target node in the decision tree model based on the encrypted first split gain value.
In some embodiments, node splitting is performed on the target node in the decision tree model based on the selected splitting gain value, including: when the selected splitting gain value is a second splitting gain value, node splitting is carried out on a target node in the decision tree model corresponding to the first object according to the second splitting gain value, and a second splitting result is obtained; and sending the second split result to the second object, so that the second object updates the decision tree model corresponding to the second object based on the second split result.
In this embodiment, for the detailed process in which the first object performs node splitting on the target node when the selected splitting gain value is the second split gain value, reference may be made to the description in the above embodiment of the process in which, when the splitting gain value satisfying the preset condition is the second split gain value, the second object receives the second splitting result and updates the decision tree model corresponding to the second object based on the second splitting result.
Based on the same inventive concept, the embodiment of the present application further provides a processing apparatus of a decision tree model for implementing the processing method of the decision tree model. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the method, so the specific limitations in the embodiment of the processing apparatus for one or more decision tree models provided below may refer to the limitations on the processing method for the decision tree model in the foregoing, and are not described herein again.
In some embodiments, as shown in fig. 15, there is provided a processing apparatus of a decision tree model, including: gradient ciphertext receiving module 1501, first split gain value determining module 1502, and first node splitting module 1503, wherein:
the gradient ciphertext receiving module 1501 is configured to receive a first gradient ciphertext and a second gradient ciphertext sent by a first object;
a first split gain value determining module 1502, configured to decrypt the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determine histogram information of a target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient, determine a first split gain value of the target node in the decision tree model according to the histogram information, and encrypt the first split gain value; the first gradient and the second gradient are gradients of different orders;
the first node splitting module 1503 is configured to, when the encrypted first split gain value is obtained, perform node splitting on the target node in the decision tree model based on the encrypted first split gain value.
In some embodiments, the gradient ciphertext receiving module 1501 is specifically configured to receive a first gradient ciphertext and a second gradient ciphertext, which are obtained by encrypting the first gradient and the second gradient based on a symmetric key, and are sent by the first object;
a first split gain value determination module 1502 comprising a decryption unit; the decryption unit is to: and decrypting the first gradient ciphertext and the second gradient ciphertext based on the symmetric key.
In some embodiments, the processing means of the decision tree model further comprises:
the symmetric key determining module is used for generating an asymmetric encrypted public key and a private key in the trusted execution environment and acquiring the public key output from the trusted execution environment; sending a public key to the first object to indicate the first object to encrypt the symmetric key based on the public key to obtain a key ciphertext; receiving a key ciphertext sent by a first object; and in the trusted execution environment, decrypting the key ciphertext based on the private key to obtain a symmetric key for decrypting the first gradient ciphertext and the second gradient ciphertext.
In some embodiments, the first split gain value determination module 1502 includes an encryption unit to encrypt the first split gain value based on a symmetric key;
the first node splitting module 1503 is specifically configured to send the encrypted first split gain value to the first object, so that the first object decrypts the encrypted first split gain value based on the symmetric key to obtain a first split gain value, and select a split gain value that meets a preset condition from the first split gain value and the second split gain; the second split gain value is determined based on the first gradient and the second gradient of the to-be-processed sample corresponding to the target node and the first characteristic of the to-be-processed sample possessed by the first object; and when the splitting gain value meeting the preset condition is the first splitting gain value, receiving prompt information sent by the first object, and performing node splitting on a target node in the decision tree model based on the encrypted first splitting gain value.
In some embodiments, the first node splitting module 1503 includes a first splitting unit;
a first splitting unit, configured to perform node splitting on a target node in the decision tree model based on the encrypted first splitting gain value, including: performing node splitting on a target node in the decision tree model corresponding to the second object based on the encrypted first splitting gain value to obtain a first splitting result; and sending the first split result to the first object so that the first object updates the decision tree model corresponding to the first object based on the first split result.
In some embodiments, the first node splitting module 1503 includes a second splitting unit;
the second splitting unit is configured to: when the splitting gain value meeting the preset condition is the second splitting gain value, receive a second splitting result obtained by the first object performing node splitting, based on the second splitting gain value, on the target node in the decision tree model corresponding to the first object; and update the decision tree model corresponding to the second object based on the second splitting result.
In some embodiments, the processing apparatus of the decision tree model is further configured to input the first gradient ciphertext and the second gradient ciphertext into the trusted execution environment, and, after the encryption of the first split gain value is completed in the trusted execution environment, to output the encrypted first split gain value from the trusted execution environment.
In some embodiments, the first gradient comprises a first order gradient of the training samples, the second gradient comprises a second order gradient of the training samples, and the histogram information comprises a first order gradient histogram and a second order gradient histogram;
the first split gain value determining module 1502 includes a histogram information determining unit, configured to: obtain the sample to be processed corresponding to the target node in the decision tree model; determine a first-order gradient of the sample to be processed among the first-order gradients of the training samples, and determine a second-order gradient of the sample to be processed among the second-order gradients of the training samples; and determine a first-order gradient histogram and a second-order gradient histogram of the target node based on the first-order gradient and the second-order gradient of the sample to be processed and a second feature of the sample to be processed possessed by the second object.
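For concreteness, a minimal numpy sketch of accumulating the two gradient histograms over one feature column held by the second object follows; the quantile binning and the helper name `gradient_histograms` are illustrative assumptions, since the embodiment does not fix a binning scheme:

```python
import numpy as np

def gradient_histograms(feature_values, grad1, grad2, n_bins=32):
    """Accumulate the first- and second-order gradients of the samples
    to be processed at a node into per-bin histograms over one feature."""
    # Quantile bin edges over the feature values (an illustrative choice).
    edges = np.quantile(feature_values, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(feature_values, edges)        # bin index for each sample
    g_hist = np.bincount(bins, weights=grad1, minlength=n_bins)
    h_hist = np.bincount(bins, weights=grad2, minlength=n_bins)
    return g_hist, h_hist
```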
In some embodiments, the first split gain value determining module 1502 includes a histogram information determining subunit, configured to perform single-side sampling on the sample to be processed based on the first-order gradient and the second-order gradient of the sample to be processed to obtain a target training sample, and determine a first-order gradient histogram and a second-order gradient histogram of the target node based on the first-order gradient and the second-order gradient of the target training sample and a second feature of the target training sample possessed by the second object.
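The single-side sampling referred to here can be sketched in the style of gradient-based one-side sampling: keep every sample with a large gradient, randomly keep a fraction of the remaining samples, and up-weight that fraction so the histogram totals remain approximately unbiased. The sampling rates and the sort key below are illustrative assumptions:

```python
import numpy as np

def one_side_sample(grad1, top_rate=0.2, other_rate=0.1, rng=None):
    """Return the indices of the retained samples and the weights to be
    applied to their first- and second-order gradients."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(grad1)
    order = np.argsort(-np.abs(grad1))               # largest |gradient| first
    n_top = int(top_rate * n)
    other_idx = rng.choice(order[n_top:], size=int(other_rate * n),
                           replace=False)
    weight = (1.0 - top_rate) / other_rate           # compensates the sub-sampling
    idx = np.concatenate([order[:n_top], other_idx])
    w = np.concatenate([np.ones(n_top), np.full(len(other_idx), weight)])
    return idx, w
```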
In some embodiments, the first-order gradient histogram of the target node is a first-order gradient histogram of the second feature, and the second-order gradient histogram of the target node is a second-order gradient histogram of the second feature; the first split gain value determining module 1502 includes a first split gain value determining unit, configured to: determine gain values of the second feature based on the first-order gradient histogram and the second-order gradient histogram of the second feature; select a gain value satisfying a gain condition from the gain values of the second feature; and take the gain value satisfying the gain condition as the first split gain value of the target node in the decision tree model.
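The embodiment does not fix the gain formula. A common choice, shown here purely as an assumption, is the second-order gain used by gradient-boosted trees, scanned over every candidate split point of the feature's two histograms:

```python
import numpy as np

def best_split_gain(g_hist, h_hist, lam=1.0):
    """Scan all split points of one feature's gradient histograms and
    return (best bin index, best gain), using
    gain = G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)."""
    G, H = g_hist.sum(), h_hist.sum()
    GL, HL = np.cumsum(g_hist)[:-1], np.cumsum(h_hist)[:-1]   # left totals
    GR, HR = G - GL, H - HL                                   # right totals
    gains = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
    best = int(np.argmax(gains))
    return best, float(gains[best])
```

Under this reading, the gain value satisfying the gain condition is the maximum over split points, and it becomes the first splitting gain value of the target node.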
The modules in the above processing apparatus of the decision tree model may be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the module.
Based on the same inventive concept, an embodiment of the present application further provides a processing apparatus of a decision tree model for implementing the processing method of the decision tree model mentioned above. Since the solution implemented by the apparatus is similar to that described for the method, the specific limitations in the embodiments of the processing apparatus provided below may refer to the limitations on the processing method of the decision tree model above, and are not repeated here.
In some embodiments, as shown in fig. 16, there is provided a processing apparatus of a decision tree model, including: a gradient ciphertext sending module 1601, an encrypted first split gain value obtaining module 1602, and a second node splitting module 1603, where:
a gradient ciphertext sending module 1601, configured to send a first gradient ciphertext and a second gradient ciphertext to a second object, so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determines histogram information of a target node in a decision tree model based on a training sample, a first gradient and a second gradient obtained by decryption, determines a first split gain value of the target node in the decision tree model according to the histogram information, and encrypts the first split gain value;
an encrypted first splitting gain value obtaining module 1602, configured to receive an encrypted first splitting gain value sent by a second object;
a second node splitting module 1603, configured to perform node splitting on a target node in the decision tree model based on the encrypted first splitting gain value.
In some embodiments, the gradient ciphertext sending module 1601 includes a gradient encryption unit;
the gradient encryption unit is configured to encrypt the first gradient and the second gradient respectively based on the symmetric key to obtain a first gradient ciphertext and a second gradient ciphertext, and to send the first gradient ciphertext and the second gradient ciphertext to the second object (a sketch of this step follows this embodiment);
the second node splitting module 1603 is specifically configured to decrypt the encrypted first split gain value based on the symmetric key to obtain the first split gain value, and to perform node splitting on the target node in the decision tree model based on the first split gain value.
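On the first object's side, the gradient encryption unit above might be sketched as follows; the AES-GCM cipher, per-message nonce handling, and float64 serialization are illustrative assumptions, since the embodiment only requires encryption under a symmetric key:

```python
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_gradients(symmetric_key: bytes, grad1: np.ndarray, grad2: np.ndarray):
    """Produce the first and second gradient ciphertexts to be sent to
    the second object; a fresh nonce is prepended to each ciphertext."""
    aead = AESGCM(symmetric_key)                 # e.g. a 32-byte key
    ciphertexts = []
    for grad in (grad1, grad2):
        nonce = os.urandom(12)                   # never reuse a nonce per key
        ciphertexts.append(nonce + aead.encrypt(
            nonce, grad.astype(np.float64).tobytes(), None))
    return ciphertexts
```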
In some embodiments, the processing apparatus of the decision tree model further comprises:
the key encryption module is configured to receive a public key sent by the second object and encrypt the symmetric key with the public key to obtain a key ciphertext; and send the key ciphertext to the second object to instruct the second object to decrypt the key ciphertext based on a private key corresponding to the public key to obtain the symmetric key, and to encrypt the first split gain value based on the symmetric key to obtain the encrypted first split gain value.
In some embodiments, the second node splitting module 1603 is specifically configured to: determine a second splitting gain value based on the first gradient and the second gradient of the sample to be processed corresponding to the target node and a first feature of the sample to be processed possessed by the first object; select a splitting gain value meeting a preset condition from the first splitting gain value and the second splitting gain value; and perform node splitting on the target node in the decision tree model based on the selected splitting gain value.
In some embodiments, the second node splitting module 1603 includes a third splitting unit;
the third splitting unit is configured to: when the selected splitting gain value is the first splitting gain value, send prompt information to the second object, so that the second object performs node splitting on the target node in the decision tree model corresponding to the second object based on the encrypted first splitting gain value to obtain a first splitting result; and receive the first splitting result sent by the second object and update the decision tree model corresponding to the first object according to the first splitting result.
In some embodiments, the second node splitting module 1603 includes a fourth splitting unit;
the fourth splitting unit is configured to: when the selected splitting gain value is the second splitting gain value, perform node splitting on the target node in the decision tree model corresponding to the first object according to the second splitting gain value to obtain a second splitting result; and send the second splitting result to the second object, so that the second object updates the decision tree model corresponding to the second object based on the second splitting result.
The modules in the above processing apparatus of the decision tree model may likewise be implemented in whole or in part by software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the module.
In some embodiments, a computer device is provided, which may be a second server corresponding to the second object or a second terminal corresponding to the second object, and whose internal structure may be as shown in fig. 17. The computer device comprises a processor, a memory, an Input/Output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program, when executed by the processor, implements the processing method of the decision tree model.
It will be appreciated by those skilled in the art that the structure shown in fig. 17 is a block diagram of only part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In some embodiments, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program that when executed by the processor performs the steps of:
receiving a first gradient ciphertext and a second gradient ciphertext transmitted by a first object; in a trusted execution environment, decrypting the first gradient ciphertext and the second gradient ciphertext, determining histogram information of a target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; the first gradient and the second gradient are gradients of different orders; when the encrypted first splitting gain value is obtained, performing node splitting on a target node in the decision tree model based on the encrypted first splitting gain value; or
sending a first gradient ciphertext and a second gradient ciphertext to a second object so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determining histogram information of a target node in a decision tree model based on a training sample and a first gradient and a second gradient obtained by decryption, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; receiving an encrypted first split gain value sent by a second object; and performing node splitting on the target node in the decision tree model based on the encrypted first splitting gain value.
In some embodiments, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
receiving a first gradient ciphertext and a second gradient ciphertext sent by a first object; in a trusted execution environment, decrypting the first gradient ciphertext and the second gradient ciphertext, determining histogram information of a target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; the first gradient and the second gradient are gradients of different orders; when the encrypted first split gain value is obtained, performing node splitting on a target node in the decision tree model based on the encrypted first split gain value; or
sending a first gradient ciphertext and a second gradient ciphertext to a second object so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determining histogram information of a target node in a decision tree model based on a training sample, a first gradient and a second gradient obtained by decryption, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; receiving an encrypted first split gain value sent by a second object; and performing node splitting on the target node in the decision tree model based on the encrypted first splitting gain value.
In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, performs the steps of:
receiving a first gradient ciphertext and a second gradient ciphertext transmitted by a first object; in a trusted execution environment, decrypting the first gradient ciphertext and the second gradient ciphertext, determining histogram information of a target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; the first gradient and the second gradient are gradients of different orders; when the encrypted first splitting gain value is obtained, performing node splitting on a target node in the decision tree model based on the encrypted first splitting gain value; or
sending a first gradient ciphertext and a second gradient ciphertext to a second object so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determining histogram information of a target node in a decision tree model based on a training sample and a first gradient and a second gradient obtained by decryption, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; receiving an encrypted first split gain value sent by a second object; and performing node splitting on the target node in the decision tree model based on the encrypted first splitting gain value.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant country and region.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include a Read-Only Memory (ROM), a magnetic tape, a floppy disk, a flash memory, an optical memory, a high-density embedded non-volatile memory, a Resistive Random Access Memory (ReRAM), a Magnetoresistive Random Access Memory (MRAM), a Ferroelectric Random Access Memory (FRAM), a Phase Change Memory (PCM), a graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like, without limitation.
For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between these combinations, they should all be considered within the scope of this disclosure.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (20)

1. A method of processing a decision tree model, the method comprising:
receiving a first gradient ciphertext and a second gradient ciphertext transmitted by a first object;
in a trusted execution environment, decrypting the first gradient ciphertext and the second gradient ciphertext, determining histogram information of a target node in a decision tree model based on a training sample, a first gradient and a second gradient obtained by decryption, determining a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypting the first splitting gain value; the first gradient and the second gradient are gradients of different orders;
and when the encrypted first split gain value is obtained, node splitting is carried out on a target node in the decision tree model based on the encrypted first split gain value.
2. The method of claim 1, wherein receiving the first gradient ciphertext and the second gradient ciphertext sent by the first object comprises:
receiving a first gradient ciphertext and a second gradient ciphertext which are obtained by encrypting the first gradient and the second gradient respectively based on a symmetric key and are sent by a first object;
the decrypting the first gradient ciphertext and the second gradient ciphertext comprises:
decrypting the first gradient ciphertext and the second gradient ciphertext based on the symmetric key.
3. The method of claim 2, further comprising:
generating an asymmetrically encrypted public key and private key in the trusted execution environment, and acquiring the public key output from the trusted execution environment;
sending the public key to the first object to instruct the first object to encrypt the symmetric key based on the public key to obtain a key ciphertext;
receiving a key ciphertext transmitted by the first object;
and in the trusted execution environment, decrypting the key ciphertext based on the private key to obtain the symmetric key for decrypting the first gradient ciphertext and the second gradient ciphertext.
4. The method of claim 2 or 3, wherein encrypting the first split gain value comprises:
encrypting the first split gain value based on the symmetric key;
performing node splitting on a target node in the decision tree model based on the encrypted first split gain value, including:
sending the encrypted first split gain value to the first object, so that the first object decrypts the encrypted first split gain value based on the symmetric key to obtain the first split gain value and selects a split gain value meeting a preset condition from the first split gain value and a second split gain value; the second split gain value is determined based on a first gradient and a second gradient of a sample to be processed corresponding to the target node and a first feature of the sample to be processed possessed by the first object;
and when the splitting gain value meeting the preset condition is the first splitting gain value, receiving prompt information sent by the first object, and performing node splitting on a target node in the decision tree model based on the encrypted first splitting gain value.
5. The method of claim 4, wherein performing node splitting on a target node in the decision tree model based on the encrypted first split gain value comprises:
performing node splitting on a target node in a decision tree model corresponding to a second object based on the encrypted first splitting gain value to obtain a first splitting result;
sending the first split result to the first object, so that the first object updates a decision tree model corresponding to the first object based on the first split result.
6. The method of claim 4, further comprising:
when the splitting gain value meeting the preset condition is the second splitting gain value, receiving a second splitting result obtained by the first object performing node splitting, based on the second splitting gain value, on a target node in the decision tree model corresponding to the first object;
and updating the decision tree model corresponding to the second object based on the second split result.
7. The method of claim 1, wherein after receiving the first gradient ciphertext and the second gradient ciphertext sent by the first object, the method further comprises:
inputting the first gradient ciphertext and the second gradient ciphertext into the trusted execution environment;
when the encryption of the first split gain value is completed in the trusted execution environment, outputting the encrypted first split gain value from the trusted execution environment.
8. The method of claim 1, wherein the first gradient comprises a first order gradient of the training samples, the second gradient comprises a second order gradient of the training samples, and the histogram information comprises a first order gradient histogram and a second order gradient histogram;
the determining histogram information of the target node in the decision tree model based on the training sample, the decrypted first gradient and the decrypted second gradient includes:
obtaining a sample to be processed corresponding to a target node in a decision tree model;
determining a first order gradient of the sample to be processed in the first order gradient of the training sample, and determining a second order gradient of the sample to be processed in the second order gradient of the training sample;
determining a first-order gradient histogram and a second-order gradient histogram of the target node based on the first-order gradient and the second-order gradient of the sample to be processed and a second feature of the sample to be processed possessed by a second object.
9. The method of claim 8, wherein determining a first order gradient histogram and a second order gradient histogram of the target node based on the first order gradient and the second order gradient of the sample to be processed and a second feature of the sample to be processed possessed by a second object comprises:
performing single-side sampling on the sample to be processed based on the first-order gradient and the second-order gradient of the sample to be processed to obtain a target training sample;
determining a first order gradient histogram and a second order gradient histogram of the target node based on the first order gradient and the second order gradient of the target training sample and a second feature of the target training sample possessed by a second object.
10. The method according to claim 8 or 9, wherein the first order gradient histogram of the target node is a first order gradient histogram of the second feature, and the second order gradient histogram of the target node is a second order gradient histogram of the second feature; and determining a first split gain value of a target node in the decision tree model according to the histogram information comprises:
determining a gain value for the second feature based on the histogram of first order gradients and the histogram of second order gradients for the second feature;
selecting a gain value satisfying a gain condition from the gain values of the second feature;
and taking the gain value meeting the gain condition as a first splitting gain value of a target node in the decision tree model.
11. A method of processing a decision tree model, the method comprising:
sending a first gradient ciphertext and a second gradient ciphertext to a second object, so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determines histogram information of a target node in a decision tree model based on a training sample, a first gradient and a second gradient obtained by decryption, determines a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypts the first splitting gain value;
receiving the encrypted first split gain value sent by the second object;
and performing node splitting on a target node in the decision tree model based on the encrypted first splitting gain value.
12. The method of claim 11, wherein sending the first gradient ciphertext and the second gradient ciphertext to the second object comprises:
encrypting the first gradient and the second gradient respectively based on a symmetric key to obtain the first gradient ciphertext and the second gradient ciphertext;
sending the first gradient ciphertext and the second gradient ciphertext to the second object;
performing node splitting on a target node in the decision tree model based on the encrypted first split gain value, including:
decrypting the encrypted first split gain value based on the symmetric key to obtain the first split gain value;
and performing node splitting on a target node in the decision tree model based on the first splitting gain value.
13. The method of claim 12, further comprising:
receiving a public key sent by the second object, and encrypting the symmetric key through the public key to obtain a key ciphertext;
and sending the key ciphertext to the second object to instruct the second object to decrypt the key ciphertext based on a private key corresponding to the public key to obtain the symmetric key, and encrypting the first split gain value based on the symmetric key to obtain the encrypted first split gain value.
14. The method of claim 12, wherein performing node splitting on a target node in the decision tree model based on the first split gain value comprises:
determining a second splitting gain value based on a first gradient and a second gradient of a sample to be processed corresponding to the target node and a first feature of the sample to be processed possessed by a first object;
selecting a splitting gain value meeting a preset condition from the first splitting gain value and the second splitting gain value;
and splitting nodes of the target nodes in the decision tree model based on the selected splitting gain value.
15. The method of claim 14, wherein performing node splitting on a target node in the decision tree model based on the selected splitting gain value comprises:
when the selected splitting gain value is the first splitting gain value, sending prompt information to the second object so that the second object performs node splitting on a target node in a decision tree model corresponding to the second object based on the encrypted first splitting gain value to obtain a first splitting result;
and receiving a first split result sent by the second object, and updating the decision tree model corresponding to the first object according to the first split result.
16. The method of claim 14, wherein performing node splitting on a target node in the decision tree model based on the selected splitting gain value comprises:
when the selected splitting gain value is the second splitting gain value, node splitting is carried out on a target node in the decision tree model corresponding to the first object according to the second splitting gain value, and a second splitting result is obtained;
and sending the second split result to a second object so that the second object updates a decision tree model corresponding to the second object based on the second split result.
17. An apparatus for processing a decision tree model, the apparatus comprising:
the gradient ciphertext receiving module is used for receiving a first gradient ciphertext and a second gradient ciphertext which are sent by a first object;
a first splitting gain value determining module, configured to decrypt the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determine histogram information of a target node in a decision tree model based on a training sample, a decrypted first gradient and a decrypted second gradient, determine a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypt the first splitting gain value; the first gradient and the second gradient are gradients of different orders;
and the first node splitting module is used for splitting nodes of a target node in the decision tree model based on the encrypted first splitting gain value when the encrypted first splitting gain value is obtained.
18. An apparatus for processing a decision tree model, the apparatus comprising:
a gradient ciphertext sending module, configured to send a first gradient ciphertext and a second gradient ciphertext to a second object, so that the second object decrypts the first gradient ciphertext and the second gradient ciphertext in a trusted execution environment, determine histogram information of a target node in a decision tree model based on a training sample, a first gradient obtained through decryption, and a second gradient, determine a first splitting gain value of the target node in the decision tree model according to the histogram information, and encrypt the first splitting gain value;
the encrypted first split gain value acquisition module is used for receiving the encrypted first split gain value sent by the second object;
and the second node splitting module is used for splitting the target node in the decision tree model based on the encrypted first splitting gain value.
19. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 16.
20. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 16.
CN202211533222.0A 2022-12-02 2022-12-02 Processing method and device of decision tree model, computer equipment and storage medium Active CN115563564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211533222.0A CN115563564B (en) 2022-12-02 2022-12-02 Processing method and device of decision tree model, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115563564A true CN115563564A (en) 2023-01-03
CN115563564B CN115563564B (en) 2023-03-17

Family

ID=84770165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211533222.0A Active CN115563564B (en) 2022-12-02 2022-12-02 Processing method and device of decision tree model, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115563564B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110995737A (en) * 2019-12-13 2020-04-10 支付宝(杭州)信息技术有限公司 Gradient fusion method and device for federal learning and electronic equipment
CN111935179A (en) * 2020-09-23 2020-11-13 支付宝(杭州)信息技术有限公司 Model training method and device based on trusted execution environment
CN112052954A (en) * 2019-06-06 2020-12-08 北京百度网讯科技有限公司 Gradient lifting tree modeling method and device and terminal
WO2021004551A1 (en) * 2019-09-26 2021-01-14 深圳前海微众银行股份有限公司 Method, apparatus, and device for optimization of vertically federated learning system, and a readable storage medium
CN113537333A (en) * 2021-07-09 2021-10-22 深圳市洞见智慧科技有限公司 Method for training optimization tree model and longitudinal federal learning system
CN114547684A (en) * 2022-02-15 2022-05-27 支付宝(杭州)信息技术有限公司 Method and device for protecting multi-party joint training tree model of private data
CN114925853A (en) * 2022-05-25 2022-08-19 京东科技控股股份有限公司 Construction method, device, equipment and medium of gradient lifting tree model


Also Published As

Publication number Publication date
CN115563564B (en) 2023-03-17

Similar Documents

Publication Publication Date Title
Avudaiappan et al. Medical image security using dual encryption with oppositional based optimization algorithm
US11558358B2 (en) Secure analytics using homomorphic and injective format-preserving encryption
CN107196926B (en) Cloud outsourcing privacy set comparison method and device
Shankar et al. An efficient image encryption technique based on optimized key generation in ECC using genetic algorithm
US9894043B2 (en) Cryptographically secure cross-domain information sharing
Belazi et al. Algebraic analysis of a RGB image encryption algorithm based on DNA encoding and chaotic map
Liu et al. Intelligent and secure content-based image retrieval for mobile users
CN109361644B (en) Fuzzy attribute based encryption method supporting rapid search and decryption
CN108667595A (en) A kind of compression encryption method of large data files
Abusukhon et al. New direction of cryptography: A review on text-to-image encryption algorithms based on RGB color value
Erkin et al. Privacy-preserving distributed clustering
CN107005408A (en) Public key encryption system
WO2018099577A1 (en) System and method for providing a collective decentralized authority for sharing sensitive data
Garimella et al. Structure-aware private set intersection, with applications to fuzzy matching
CN111767411A (en) Knowledge graph representation learning optimization method and device and readable storage medium
CN113630250B (en) Model training method and system based on data encryption
Geetha et al. Multiple share creation based visual cryptographic scheme using diffusion method with a combination of chaotic maps for multimedia applications
CN116170142B (en) Distributed collaborative decryption method, device and storage medium
Shankar et al. Multiple share creation with optimal hash function for image security in wsn aid of ogwo
CN115563564B (en) Processing method and device of decision tree model, computer equipment and storage medium
Mendua A new approach of colour image encryption based on Henon like chaotic map
CN114374518B (en) PSI (program specific information) intersection information acquisition method and device with intersection counting function and storage medium
Devi et al. Two fish Algorithm Implementation for lab to provide data security with predictive analysis
Shankar et al. An optimal lightweight cryptographic hash function for secure image transmission in wireless sensor networks
CN115085983A (en) Data processing method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40079072; Country of ref document: HK)