CN117251805A - Federated gradient boosting decision tree model updating system based on a breadth-first algorithm


Info

Publication number
CN117251805A
Authority
CN
China
Prior art keywords
decision tree
tree
service provider
target
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311543444.5A
Other languages
Chinese (zh)
Other versions
CN117251805B (en)
Inventor
朱明杰
陈超超
鲍力成
李岩
郑小林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Jinzhita Technology Co ltd
Original Assignee
Hangzhou Jinzhita Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jinzhita Technology Co ltd
Priority to CN202311543444.5A
Publication of CN117251805A
Application granted
Publication of CN117251805B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/098 Distributed learning, e.g. federated learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present specification provides a federated gradient boosting decision tree model updating system based on a breadth-first algorithm, comprising a first participant, a second participant, and a service provider. The first participant calculates a target gradient value based on the data distribution of a behavior data set and sends the target gradient value to the second participant; it updates a first decision tree of a corresponding tree model updating task issued by the service provider based on the behavior features of the behavior data set, traverses the updated first decision tree according to the breadth-first algorithm based on the target gradient value, obtains first traversal information, and sends it to the service provider. The second participant updates the first decision tree based on the credit features of a credit data set, traverses the updated first decision tree according to the breadth-first algorithm based on the target gradient value, obtains second traversal information, and sends it to the service provider. The service provider calculates the node weights of the tree nodes in the updated first decision tree based on the first traversal information and the second traversal information, and at least one target gradient boosting decision tree model is obtained.

Description

Federated gradient boosting decision tree model updating system based on a breadth-first algorithm
Technical Field
The present specification relates to the field of computer technology, and in particular to a federated gradient boosting decision tree model updating system based on a breadth-first algorithm.
Background
Privacy computing is a technology for performing data analysis and model training on the premise of protecting data privacy. The gradient boosting decision tree is one of the most widely applied machine learning models at present, with broad use in fields such as financial risk control and anti-fraud. A user may build a predictive model using a gradient boosting decision tree algorithm.
In the prior art, model training usually requires the training initiator to search for the optimal split point, synchronize the result to the training participants, and repeat these steps continuously. This approach has the problems that the number of iterations affects the training time, model training efficiency is low, and model training is slow. A more efficient approach is therefore needed.
Disclosure of Invention
In view of this, the embodiments of the present specification provide a federated gradient boosting decision tree model updating system based on a breadth-first algorithm. The present specification also relates to a federated gradient boosting decision tree model updating method based on a breadth-first algorithm, a computing device, and a computer-readable storage medium, to address the technical deficiencies of the prior art.
According to a first aspect of the embodiments of the present specification, there is provided a federated gradient boosting decision tree model updating system based on a breadth-first algorithm, comprising a first participant, a second participant, and a service provider:
the first participant is configured to calculate a target gradient value based on the data distribution of a behavior data set and send the target gradient value to the second participant; update a first decision tree of a corresponding tree model updating task issued by the service provider based on the behavior features of the behavior data set; traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value; and obtain first traversal information and send it to the service provider;
the second participant is configured to update the first decision tree issued by the service provider based on the credit features of a credit data set, traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtain second traversal information, and send the second traversal information to the service provider, where the behavior data set and the credit data set have a data alignment relationship;
the service provider is configured to calculate the node weights of the tree nodes in the first decision tree based on the first traversal information and the second traversal information, update the first decision tree based on the node weights, and obtain, according to the update result, at least one target gradient boosting decision tree model corresponding to the tree model updating task; parameter association relationships exist among the target gradient boosting decision tree models.
According to a second aspect of the embodiments of the present specification, there is provided a federated gradient boosting decision tree model updating method based on a breadth-first algorithm, comprising:
a first participant calculating a target gradient value based on the data distribution of a behavior data set and sending the target gradient value to a second participant; updating a first decision tree of a corresponding tree model updating task issued by a service provider based on the behavior features of the behavior data set; traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value; and obtaining first traversal information and sending it to the service provider;
the second participant updating the first decision tree issued by the service provider based on the credit features of a credit data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining second traversal information, and sending the second traversal information to the service provider, where the behavior data set and the credit data set have a data alignment relationship;
the service provider calculating the node weights of the tree nodes in the first decision tree based on the first traversal information and the second traversal information, updating the first decision tree based on the node weights, and obtaining, according to the update result, at least one target gradient boosting decision tree model corresponding to the tree model updating task; parameter association relationships exist among the target gradient boosting decision tree models.
According to a third aspect of the embodiments of the present specification, there is provided a federated gradient boosting decision tree model updating system based on a breadth-first algorithm, the system comprising a first participant, a second participant, and a service provider:
the first participant is configured to calculate a target gradient value based on the sample distribution of a first sample data set and send the target gradient value to the second participant; update a first decision tree of a corresponding tree model updating task issued by the service provider based on the first sample features of the first sample data set; traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value; and obtain first traversal information and send it to the service provider;
the second participant is configured to update the first decision tree issued by the service provider based on the second sample features of a second sample data set, traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtain second traversal information, and send the second traversal information to the service provider, where the first sample data set and the second sample data set have a data alignment relationship;
the service provider is configured to calculate the node weights of the tree nodes in the first decision tree based on the first traversal information and the second traversal information, update the first decision tree based on the node weights, and obtain, according to the update result, at least one target gradient boosting decision tree model corresponding to the tree model updating task; parameter association relationships exist among the target gradient boosting decision tree models, which are used for credit prediction for users.
According to a fourth aspect of the embodiments of the present specification, there is provided another federated gradient boosting decision tree model updating method based on a breadth-first algorithm, comprising:
a first participant calculating a target gradient value based on the sample distribution of a first sample data set and sending the target gradient value to a second participant; updating a first decision tree of a corresponding tree model updating task issued by a service provider based on the first sample features of the first sample data set; traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value; and obtaining first traversal information and sending it to the service provider;
the second participant updating the first decision tree issued by the service provider based on the second sample features of a second sample data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining second traversal information, and sending the second traversal information to the service provider, where the first sample data set and the second sample data set have a data alignment relationship;
the service provider calculating the node weights of the tree nodes in the first decision tree based on the first traversal information and the second traversal information, updating the first decision tree based on the node weights, and obtaining, according to the update result, at least one target gradient boosting decision tree model corresponding to the tree model updating task; parameter association relationships exist among the target gradient boosting decision tree models, which are used for credit prediction for users.
According to a fifth aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions which, when executed by the processor, implement the steps of the federated gradient boosting decision tree model updating method based on a breadth-first algorithm.
According to a sixth aspect of the embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the federated gradient boosting decision tree model updating method based on a breadth-first algorithm.
The federated gradient boosting decision tree model updating system based on a breadth-first algorithm provided in the present specification comprises a first participant, a second participant, and a service provider. The first participant is configured to calculate a target gradient value based on the data distribution of the behavior data set and send the target gradient value to the second participant; update a first decision tree of a corresponding tree model updating task issued by the service provider based on the behavior features of the behavior data set; traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value; and obtain first traversal information and send it to the service provider. The second participant is configured to update the first decision tree issued by the service provider based on the credit features of the credit data set, traverse the updated first decision tree according to the breadth-first algorithm based on the target gradient value, obtain second traversal information, and send it to the service provider, where the behavior data set and the credit data set have a data alignment relationship. The service provider is configured to calculate the node weights of the tree nodes in the first decision tree based on the first traversal information and the second traversal information, update the first decision tree based on the node weights, and obtain, according to the update result, at least one target gradient boosting decision tree model corresponding to the tree model updating task; parameter association relationships exist among the target gradient boosting decision tree models.
According to the embodiments of the present specification, the gradient boosting decision tree model is updated based on a breadth-first algorithm and federated learning. On the one hand, through the cooperation of the first participant, the second participant, and the service provider, no model gradients are leaked during the updating process; on the other hand, the breadth-first algorithm increases the updating speed of the model, enabling faster model updates.
Drawings
FIG. 1 is a schematic diagram of a federated gradient boosting decision tree model updating system based on a breadth-first algorithm according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a federated gradient boosting decision tree model updating system based on a breadth-first algorithm according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a federated gradient boosting decision tree model updating method based on a breadth-first algorithm according to an embodiment of the present disclosure;
FIG. 4 is an interaction diagram of a federated gradient boosting decision tree model updating method based on a breadth-first algorithm applied to gradient boosting decision tree model updating according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of another federated gradient boosting decision tree model updating system based on a breadth-first algorithm according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of another federated gradient boosting decision tree model updating method based on a breadth-first algorithm according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. The present specification may, however, be embodied in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; the present specification is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, "first" information may also be referred to as "second" information, and similarly "second" as "first", without departing from the scope of one or more embodiments of the present specification. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, terms related to one or more embodiments of the present specification will be explained.
Homomorphic encryption: a cryptographic technique based on the computational complexity theory of mathematical problems. When homomorphically encrypted data is processed to obtain an output and that output is decrypted, the result is the same as that obtained by processing the unencrypted original data in the same way.
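For illustration only, the following minimal sketch demonstrates the additive property described above using the open-source python-paillier (`phe`) library; the patent itself does not name a particular encryption scheme:

```python
# Minimal sketch of the additive homomorphic property using the open-source
# python-paillier (`phe`) library; the patent does not prescribe a scheme.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

a, b = 3.5, 1.5
enc_a = public_key.encrypt(a)        # ciphertext of a
enc_b = public_key.encrypt(b)        # ciphertext of b
enc_sum = enc_a + enc_b              # addition performed on ciphertexts only

# Decrypting the ciphertext sum gives the same result as adding plaintexts.
assert abs(private_key.decrypt(enc_sum) - (a + b)) < 1e-9
```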
Gradient boosting decision tree: GBDT (Gradient Boosting Decision Tree), also called MART (Multiple Additive Regression Tree), is an iterative decision tree algorithm consisting of multiple decision trees; the conclusions of all the trees are accumulated to form the final answer.
Federated machine learning (federated learning): also known as joint learning or alliance learning. Federated machine learning is a machine learning framework that can effectively help multiple institutions use data and build machine learning models while meeting the requirements of user privacy protection, data security, and government regulations.
Breadth-first algorithm: BFS (Breadth-First Search), also known as "breadth-first search" or "level-order search".
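As a minimal illustration of this traversal order (ours, not the patent's), BFS visits a binary tree level by level using a queue, which is what allows all tree nodes at one depth to be processed in a single round:

```python
from collections import deque

def breadth_first_nodes(root):
    """Yield tree nodes level by level; assumes nodes expose .left/.right."""
    queue = deque([root])
    while queue:
        node = queue.popleft()               # FIFO order gives level order
        yield node
        for child in (node.left, node.right):
            if child is not None:
                queue.append(child)
```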
In the present specification, a federated gradient boosting decision tree model updating system based on a breadth-first algorithm is provided. The present specification also relates to a federated gradient boosting decision tree model updating method based on a breadth-first algorithm, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments.
Referring to the schematic diagram shown in FIG. 1, the federated gradient boosting decision tree model updating system based on a breadth-first algorithm provided in the present specification includes a first participant, a second participant, and a service provider, which cooperate to complete the tree model updating task. The service provider provides the gradient boosting decision tree model to be trained and performs the calculation and discrimination tasks in the model training process: it determines a first decision tree based on the gradient boosting decision tree model to be trained and issues the first decision tree to the first participant and the second participant, respectively.
The first participant calculates a target gradient value based on the data distribution of the behavior data set and sends the target gradient value to the second participant; it updates the first decision tree of the corresponding tree model updating task issued by the service provider based on the behavior features of the behavior data set, traverses the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtains first traversal information, and sends it to the service provider. The second participant updates the first decision tree issued by the service provider based on the credit features of the credit data set, traverses the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtains second traversal information, and sends it to the service provider; the behavior data set and the credit data set have a data alignment relationship. The service provider calculates the node weights of the tree nodes in the first decision tree based on the first traversal information and the second traversal information, updates the first decision tree based on the node weights, and obtains, according to the update result, at least one target gradient boosting decision tree model corresponding to the tree model updating task; parameter association relationships exist among the target gradient boosting decision tree models.
According to the embodiments of the present specification, the gradient boosting decision tree model is updated based on a breadth-first algorithm and federated learning. On the one hand, through the cooperation of the first participant, the second participant, and the service provider, no model gradients are leaked during the updating process; on the other hand, the breadth-first algorithm increases the updating speed of the model, enabling faster model updates.
FIG. 2 illustrates a schematic diagram of a federated gradient boosting decision tree model updating system based on a breadth-first algorithm provided according to an embodiment of the present specification. The federated gradient boosting decision tree model updating system 200 based on a breadth-first algorithm includes a first participant 210, a second participant 220, and a service provider 230:
the first participant 210 is configured to calculate a target gradient value based on the data distribution of the behavior data set and send the target gradient value to the second participant 220; update a first decision tree of a corresponding tree model updating task issued by the service provider 230 based on the behavior features of the behavior data set; traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value; and obtain first traversal information and send it to the service provider 230;
the second participant 220 is configured to update the first decision tree issued by the service provider 230 based on the credit features of a credit data set, traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtain second traversal information, and send the second traversal information to the service provider 230, where the behavior data set and the credit data set have a data alignment relationship;
the service provider 230 is configured to calculate the node weights of the tree nodes in the first decision tree based on the first traversal information and the second traversal information, update the first decision tree based on the node weights, and obtain, according to the update result, at least one target gradient boosting decision tree model corresponding to the tree model updating task; parameter association relationships exist among the target gradient boosting decision tree models.
Specifically, the tree model updating task is a task for updating the gradient boosting decision tree model. The first participant 210 and the second participant 220 are the participants training the gradient boosting decision tree model: the first participant 210 holds the behavior data set of behavior data as well as the modeling label data, and the second participant 220 holds the data set of credit data. The service provider 230 provides calculation and discrimination services for the gradient boosting decision tree model during training: it calculates the gain values, function values, and weights in the training process, determines whether the traversal of the current decision tree is finished, and determines whether the training of the gradient boosting decision tree model is finished. The target gradient value refers to the encrypted first and second derivatives calculated based on the sample distribution and the modeling label data. The behavior features of the behavior data set are feature data obtained after the behavior data are converted into vector form; correspondingly, the credit features of the credit data set are feature data obtained after the credit data are converted into vector form. The behavior data include, but are not limited to, purchasing behavior in a commodity purchasing scenario and browsing and favoriting behavior for commodities; the credit data are the user's borrowing and repayment records and behavior information regarding financial products.
The gradient boosting decision tree model corresponds to at least one decision tree, and updating the model is the process of traversing that at least one decision tree. The first decision tree is the first decision tree corresponding to the gradient boosting decision tree model. Updating the first decision tree based on the behavior data means merging the behavior data into the nodes of the first decision tree on the basis of its tree structure; correspondingly, updating the first decision tree based on the credit data means merging the credit data into the tree nodes of the first decision tree on the basis of its tree structure. The parameter association relationship refers to the parameter transfer relationship between two adjacent decision trees in the gradient boosting decision tree model: the prediction result obtained after the preceding decision tree has been traversed influences the traversal of the next decision tree. The first traversal information includes the distribution information of the behavior data over the nodes of the first decision tree and the gradient value of each node, the gradient values being used subsequently to calculate the node weights; the second traversal information includes the distribution information of the credit data over the nodes of the first decision tree and the gradient value of each node, likewise used subsequently to calculate the node weights. When the first decision tree is updated based on the node weights, it contains both the distribution information of the credit data and the distribution information of the behavior data over its nodes.
Based on this, when updating the gradient boosting decision tree model, the first participant 210 calculates a target gradient value based on the data distribution of the behavior data set and the modeling label data, and sends the target gradient value to the second participant 220. It updates the first decision tree of the corresponding tree model updating task issued by the service provider 230 based on the behavior features of the behavior data set, distributing the behavior features to the nodes of the first decision tree to complete their division. It then traverses the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtains first traversal information, and sends it to the service provider 230.
The second participant 220 updates the first decision tree issued by the service provider 230 based on the credit features of the credit data set, distributing the credit features to the nodes of the first decision tree. It traverses the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtains second traversal information, and sends it to the service provider 230; the behavior data set and the credit data set have a data alignment relationship. The service provider 230 calculates the node weights of the tree nodes in the first decision tree based on the first and second traversal information, updates the first decision tree that summarizes the two sets of traversal information based on the node weights, obtains the prediction result of the first decision tree for the behavior data and the credit data, obtains a target gradient boosting decision tree model corresponding to the tree model updating task according to the update result, then determines the next decision tree of the tree model updating task issued by the service provider 230, and so on, until the gradient boosting decision tree model has been updated and at least two target gradient boosting decision tree models are obtained.
In practical applications, the model function of the gradient boosting decision tree model may be the function shown in the following formula (1):

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \tag{1}$$

where $i$ denotes the $i$-th sample distribution; $\hat{y}_i$ denotes the predicted result of the $i$-th sample distribution; $f_k$ denotes the prediction of the $k$-th decision tree; $K$ denotes the number of decision trees in the gradient boosting decision tree model; and $x_i$ denotes the input data of the $i$-th sample distribution, i.e., behavior data or credit data.
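As a sketch of formula (1) (function and variable names are illustrative, not taken from the patent), the model output is simply the accumulated output of all $K$ trees:

```python
def gbdt_predict(trees, x_i):
    """Formula (1): the model output is the sum of the K trees' outputs.

    `trees` is a list of per-tree prediction functions f_k and `x_i` is
    the input of the i-th sample (behavior data or credit data).
    """
    return sum(f_k(x_i) for f_k in trees)
```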
For example, the updating process of the gradient boosting decision tree model is a training process. The first participant and the second participant take part in training, together with a third party that provides calculation and discrimination services, i.e., the service provider. The first party holds financial-dimension information of users in the financial field, such as credit card usage information and financial product holding information; the second party holds behavior information for commodities of the same users in the e-commerce field. The service provider provides the gradient boosting decision tree model to be updated and sends the first decision tree corresponding to the model to the first participant and the second participant. The two participants traverse the first decision tree based on the information they respectively hold and send the traversal information to the service provider, which completes the update of the first decision tree based on the traversal information sent by the two participants, obtaining a target gradient boosting decision tree model; this continues until all decision trees in the gradient boosting decision tree model have been updated, yielding the updated gradient boosting decision tree model.
In summary, the embodiments of the present specification update the gradient boosting decision tree model based on a breadth-first algorithm and federated learning. On the one hand, through the cooperation of the first participant, the second participant, and the service provider, no model gradients are leaked during the updating process; on the other hand, the breadth-first algorithm increases the updating speed of the model, enabling faster model updates.
Further, considering that when the gradient boosting decision tree model is updated there is a risk that the user privacy data used for the update and the model gradients may be leaked, the first participant 210 may encrypt the initial gradient value with a homomorphic encryption algorithm before sending it to the second participant 220. In specific implementation, the first participant 210 is configured to calculate an initial gradient value based on the data distribution of the behavior data set; encrypt the initial gradient value using a homomorphic encryption algorithm based on the public key provided by the service provider 230 to obtain the target gradient value; and send the target gradient value to the second participant 220. The service provider 230 is configured to decrypt the first traversal information in ciphertext format and the second traversal information in ciphertext format based on the private key before calculating the node weights of the tree nodes in the first decision tree, the public key and the private key forming a key pair.
Specifically, the initial gradient value refers to the first and second derivatives calculated based on the sample distribution and the modeling label data; encrypting the first and second derivatives with the homomorphic encryption algorithm yields the target gradient value. The public key is used to encrypt the first and second derivatives, and the private key is used to decrypt the first and second traversal information in ciphertext format.
Based on this, the first participant 210 calculates the initial gradient value based on the data distribution of the behavior data set, encrypts it using a homomorphic encryption algorithm based on the public key provided by the service provider 230 to obtain the target gradient value, and sends the target gradient value to the second participant 220. The first participant 210 and the second participant 220 then each complete the traversal of the first decision tree based on the target gradient value. After obtaining the first traversal information in ciphertext format sent by the first participant 210 and the second traversal information in ciphertext format sent by the second participant 220, the service provider 230 can decrypt both based on the private key, the public key and the private key forming a key pair, and then calculate the node weights of the tree nodes in the first decision tree.
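A hedged sketch of this step follows. It assumes a binary-classification logistic loss, which the patent does not specify, and reuses the `phe` Paillier library as the example homomorphic scheme; all names are illustrative:

```python
import numpy as np
from phe import paillier

def encrypted_gradients(y_true, y_pred_raw, public_key):
    """Compute per-sample derivatives and encrypt them (illustrative).

    Assumes a logistic loss, so g_i = p_i - y_i and h_i = p_i * (1 - p_i);
    the patent only states that the first and second derivatives are
    computed from the sample distribution and the modeling label data.
    """
    p = 1.0 / (1.0 + np.exp(-np.asarray(y_pred_raw)))   # probabilities
    g = p - np.asarray(y_true)                          # first derivatives
    h = p * (1.0 - p)                                   # second derivatives
    enc_g = [public_key.encrypt(float(v)) for v in g]   # [G_i]
    enc_h = [public_key.encrypt(float(v)) for v in h]   # [H_i]
    return enc_g, enc_h      # together these form the target gradient value
```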
Following the above example, after the first derivative $G_i$ and the second derivative $H_i$ contained in the initial gradient value have been calculated at the second party, a homomorphic encryption algorithm is used to encrypt $G_i$ and $H_i$ respectively, obtaining the encrypted first derivative $[G_i]$ and the encrypted second derivative $[H_i]$; the target gradient value is composed of the encrypted first derivative $[G_i]$ and the encrypted second derivative $[H_i]$.
In summary, encrypting the initial gradient value with a homomorphic encryption algorithm to obtain the target gradient value allows the first decision tree to be traversed and the gradient boosting decision tree model to be updated on the basis of the target gradient value, improving the security of the updating process and, in turn, the security of users' private data.
Further, since the first participant 210 and the second participant 220 each traverse the updated first decision tree, and multiple node split modes exist, the split modes need to be enumerated and the data distribution information recorded for each so that an optimal split mode can be determined. In specific implementation, the first participant 210 is configured to determine at least one first split mode of the updated first decision tree according to the first traversal result, and determine the first data distribution information corresponding to each first split mode to form the first traversal information; correspondingly, the second participant 220 is configured to determine at least one second split mode of the updated first decision tree according to the second traversal result, and determine the second data distribution information corresponding to each second split mode to form the second traversal information.
Specifically, the first traversal result consists of the split modes obtained by traversing the updated first decision tree and the feature split information under each split mode; a first split mode can be any split of the behavior features; correspondingly, the first data distribution information is the distribution of the behavior features over the nodes obtained after splitting them with a first split mode. The second traversal result consists of the split modes obtained by traversing the updated first decision tree and the feature split information under each split mode; a second split mode can be any split of the credit features; correspondingly, the second data distribution information is the distribution of the credit features over the nodes obtained after splitting them with a second split mode.
Based on this, the first participant 210 determines at least one first split mode of the updated first decision tree according to the first traversal result, and, after the behavior features have been split, the first data distribution information corresponding to each first split mode forms the first traversal information; correspondingly, the second participant 220 determines at least one second split mode of the updated first decision tree according to the second traversal result, and the second data distribution information determined after splitting the credit features forms the second traversal information.
Following the above example, the first decision tree may be split by age: with 50 as the split point, the samples are divided into those over 50 and those 50 or under; other split points may also be selected to complete the split. The data distribution information corresponding to each split mode is then determined to form the traversal information.
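The enumeration can be pictured with the following sketch (illustrative names; one numeric feature): every cut between two consecutive sorted feature values is a candidate split point, and the per-split data distribution is summarized by the gradient sums on each side:

```python
import numpy as np

def enumerate_splits(feature_values, g, h):
    """Enumerate candidate split points for one numeric feature.

    Yields (threshold, G_L, H_L, G_R, H_R) for each cut between two
    consecutive sorted values; the gradient sums on the two sides are the
    per-split data distribution information recorded in the traversal
    information.
    """
    order = np.argsort(feature_values)
    g_sorted, h_sorted = np.asarray(g)[order], np.asarray(h)[order]
    G, H = g_sorted.sum(), h_sorted.sum()
    G_L = H_L = 0.0
    for idx in range(len(order) - 1):
        G_L += g_sorted[idx]
        H_L += h_sorted[idx]
        yield feature_values[order[idx]], G_L, H_L, G - G_L, H - H_L
```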
In summary, enumerating the multiple split modes when traversing the first decision tree facilitates generating more comprehensive traversal information.
Further, since multiple split modes exist when traversing the first decision tree, one of them must be determined as the optimal split mode so that the node weights of the tree nodes in the first decision tree can be calculated; the data distribution information corresponding to each split mode therefore needs to be recorded. In specific implementation, determining the first data distribution information corresponding to any first split mode includes: the first participant 210 is configured to determine the behavior flag of the first split mode and the gradient values of the first split nodes corresponding to the first split mode, and generate the first data distribution information based on the behavior flag and the gradient values of the first split nodes; correspondingly, determining the second data distribution information corresponding to any second split mode includes: the second participant 220 is configured to determine the credit flag of the second split mode and the gradient values of the second split nodes corresponding to the second split mode, and generate the second data distribution information based on the credit flag and the gradient values of the second split nodes.
Specifically, the behavior flag marks the data distribution of the left and right nodes under the first split mode, and the credit flag marks the data distribution of the left and right nodes under the second split mode. The gradient values of the first split nodes are their first and second derivative values, and likewise for the second split nodes.
Based on this, when determining the first data distribution information corresponding to each first split mode, the first participant 210 determines the behavior flag of the first split mode and the gradient values of the first split nodes corresponding to it, and generates the first data distribution information from them; correspondingly, when determining the second data distribution information corresponding to each second split mode, the second participant 220 determines the credit flag of the second split mode and the gradient values of the second split nodes corresponding to it, and generates the second data distribution information from them.
In practical applications, when the first participant traverses the first decision tree based on the behavior features, at least one first split mode is generated, and the data distribution information of each of the first split modes forms the first traversal information of the first decision tree. When the second participant traverses the first decision tree based on the credit features, at least one second split mode is generated, and the data distribution information of each of the second split modes forms the second traversal information of the first decision tree.
In summary, the first participant and the second participant record the data distribution information corresponding to their respective split modes, enabling the service provider to subsequently update the first decision tree and obtain the target gradient boosting decision tree model.
Further, since the service provider 230 obtains the first traversal information and the second traversal information from the first participant 210 and the second participant 220 respectively, the two must be integrated before subsequent calculation. In specific implementation, the service provider 230 is further configured to generate target traversal information based on the first traversal information and the second traversal information; determine at least one split mode based on the target traversal information; determine a target split mode among the at least one split mode based on the gain value of each split mode; and calculate the node weights of the tree nodes based on the target gain value of the target split mode.
Specifically, the target traversal information is the traversal information obtained by integrating the first and second traversal information according to the node correspondence. A split mode here refers to one of the at least one split obtained based on the feature information in the target traversal information; the target split mode is the split mode with the largest gain value, determined by comparing the gain values of the at least one split mode.
Based on this, after obtaining the first traversal information sent by the first participant 210 and the second traversal information sent by the second participant 220, the service provider 230 integrates them to generate the target traversal information. It determines at least one split mode for the first decision tree based on the target traversal information, determines the split mode with the largest gain value as the target split mode by comparing the gain values of the split modes, and calculates the node weights of the tree nodes based on the target gain value of the target split mode.
In practical applications, the first traversal information records the node distribution of the behavior features in the first decision tree obtained when the first participant traverses it, and the second traversal information records the node distribution of the credit features obtained when the second participant traverses it. The two are integrated in node order based on the node correspondence between the updated decision trees of the two participants: after the first participant traverses the first decision tree, the node distribution of the behavior features is obtained; after the second participant traverses it, the node distribution of the credit features is obtained; the distribution information of nodes at the same positions in the two participants' trees is then merged. The first decision tree held by the service provider can thus combine the behavior features' node distribution from the first participant with the credit features' node distribution from the second participant, yielding the target traversal information. When the first decision tree is split based on the target traversal information, the credit features and the behavior features can be combined: at least one split mode is determined after splitting the first decision tree based on the target traversal information, the target split mode is selected among them based on the corresponding gain values, and the node weights of the tree nodes are then calculated based on the target gain value of the target split mode.
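A minimal sketch of this selection step (names are ours; the gain computation itself follows formula (3) given further below):

```python
def choose_target_split(candidate_splits, gain_fn):
    """Select the target split mode, i.e. the candidate with the largest gain.

    `candidate_splits` is an iterable of candidates assembled from the
    integrated target traversal information; `gain_fn` scores a candidate,
    for example via formula (3) given further below.
    """
    return max(candidate_splits, key=gain_fn)
```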
In summary, the service provider integrates the first and second traversal information to generate the target traversal information and calculates the node weights of the tree nodes on that basis, so that the traversal results of both participants are fused in the node-weight calculation, improving its accuracy.
Further, since each split mode corresponds to a different way of splitting the features, and different feature splits yield different gain values, the gain value must be calculated for each split mode. In specific implementation, determining the gain value of any split mode includes: the service provider 230 is configured to calculate the function value of a target split node from the gradient value of the target split node under the node split mode and the boosting tree function, and calculate the gain value of the node split mode based on the function value of the target split node and the function value of the target tree node corresponding to the target split node.
Specifically, the boosting tree function is a preset function used to calculate a node's function value; the gain value of a node is then calculated from the function values and used as the gain value of the split mode. The target tree node is the parent node corresponding to the target split node.
Based on this, when calculating the gain value of a split mode, the service provider 230 calculates the function value of the target split node from its gradient value under the node split mode and the boosting tree function, determines the function value of the corresponding target tree node, and calculates the gain value of the node split mode from the two function values.
In practical applications, the function value of any node can be calculated by the following formula (2):

$$Obj = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j+\lambda} \tag{2}$$

where $Obj$ denotes the function value; $T$ denotes the number of leaf nodes of the current tree (first decision tree, second decision tree, etc.); $G_j$ and $H_j$ denote, for leaf node $j$, the sums of the first derivatives $g_i$ and second derivatives $h_i$ contained in the initial gradient values; and $\lambda$ denotes the L2 regularization parameter that controls the weight of the model complexity.
The gain value of any split node can be determined by the following formula (3):

$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma \tag{3}$$

where $Gain$ denotes the gain value; $G_L$ and $H_L$ denote the first and second derivatives of the left node of the split node; $G_R$ and $H_R$ denote the first and second derivatives of the right node of the split node; $\lambda$ denotes the L2 regularization parameter; and $\gamma$ denotes the parameter controlling whether post-pruning is performed.
When determining the data distribution information corresponding to a node under the target split mode, the calculation may be performed by formula (4) based on the split flag (behavior flag or credit flag) corresponding to the target split mode:

The split flag marks the feature distribution of the left and right nodes; $S_{new}$ denotes the split flag; $[G_{new}]$ denotes the updated (encrypted) first derivative in the gradient values of the target split node; $[H_{new}]$ denotes the updated (encrypted) second derivative; $[G_{old}]$ denotes the (encrypted) first derivative before the update; and $[H_{old}]$ denotes the (encrypted) second derivative before the update.
The node weight can be calculated by the following formula (5):

$$\omega = -\frac{G}{H+\lambda} \tag{5}$$

where $G$ denotes the first derivative in the target gradient value of the second decision tree; $H$ denotes the second derivative in the target gradient value of the second decision tree; $\omega$ denotes the weight of the target split node; and $\lambda$ denotes the L2 regularization parameter controlling the weight of the model complexity.
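Formula (5) in code (a one-line computation; names are illustrative), followed by a worked example:

```python
def leaf_weight(G, H, lam):
    """Formula (5): optimal node weight w = -G / (H + lam)."""
    return -G / (H + lam)

# Worked example: with G = 1.2, H = 3.0 and lam = 1.0 the weight is
# -1.2 / (3.0 + 1.0) = -0.3.
```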
In summary, the function value of the target split node is calculated from its gradient value under the node split mode and the boosting tree function, and the gain value of the node split mode is then calculated from the function values of the target split node and of the corresponding target tree node. This improves the accuracy of the gain calculation and facilitates the subsequent calculation of the node weights of the tree nodes based on the gain value of the target split node.
Further, considering that the tree depth of the first decision tree is at least one layer, when traversing the first decision tree by using the breadth first algorithm, the first participant 210 is configured to determine the first decision tree of the corresponding tree model update task issued by the service provider 230 when performing the layer-by-layer traversal; determining initial tree nodes corresponding to the initial tree depths in the first decision tree; updating the initial tree node based on the behavior characteristics of the behavior data set, traversing the updated initial tree node according to a breadth-first algorithm based on the target gradient value, and sending first traversal information to the service provider 230;
correspondingly, the second participant 220 is configured to determine a first decision tree of a corresponding tree model update task issued by the service provider 230; determining initial tree nodes corresponding to the initial tree depths in the first decision tree; updating the initial tree node based on the credit characteristics of the credit data set, traversing the updated initial tree node according to a breadth-first algorithm based on the target gradient value, and sending second traversal information to the service provider 230.
Based on this, the first participant 210 determines a first decision tree of the corresponding tree model update task issued by the service provider 230 and determines an initial tree node in the first decision tree corresponding to the initial tree depth. Updating the initial tree node based on the behavior characteristics of the behavior data set, traversing the updated initial tree node according to a breadth-first algorithm based on the target gradient value, obtaining first traversing information of the initial tree node under the initial tree depth, and sending the first traversing information to the service provider 230; accordingly, the second participant 220 determines a first decision tree of the corresponding tree model update task issued by the service provider 230, and determines an initial tree node corresponding to the initial tree depth in the first decision tree. Updating the initial tree node based on the credit characteristics of the credit data set, traversing the updated initial tree node according to a breadth-first algorithm based on the target gradient value, obtaining second traversal information of the initial tree node under the initial tree depth, and sending the second traversal information to the service provider 230.
Along the above example, the first participant may traverse the initial tree node under the initial tree depth of the first decision tree based on the behavior feature (the time of purchasing the commodity) and send the obtained first traversal information to the service provider; the second participant may traverse the initial tree node under the initial tree depth of the first decision tree based on the credit characteristic (whether repayment is made on time) and send the obtained second traversal information to the service provider.
In summary, the first participant and the second participant traverse the initial tree nodes corresponding to the initial tree depth determined in the first decision tree by adopting the breadth-first algorithm, so that the layer-by-layer traversal based on the tree depth of the first decision tree is realized, and the traversal efficiency of the first decision tree is improved.
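A minimal local sketch of this layer-by-layer traversal, assuming each tree node exposes optional `left`/`right` children; the federated exchange of traversal information is omitted:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TreeNode:
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None

def breadth_first_levels(root: TreeNode, max_depth: int):
    """Yield (tree depth, nodes at that depth), starting from the initial
    tree node and stopping once the tree depth condition is reached."""
    level, depth = [root], 0
    while level and depth <= max_depth:
        yield depth, level
        level = [c for n in level for c in (n.left, n.right) if c is not None]
        depth += 1
```

Each yielded level corresponds to one round of traversal information sent to the service provider before the next tree depth is processed.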
Further, after the traversal of the initial tree node under the initial tree depth of the first decision tree is completed, whether the traversal of the whole first decision tree is completed can be determined by judging whether the initial tree depth meets a tree depth condition. In a specific implementation, the service provider 230 is configured to judge whether the initial tree depth of the first decision tree meets the tree depth condition; if not, it determines a target tree depth in the first decision tree based on the initial tree depth, takes the target tree depth as the new initial tree depth, and executes the step of determining the initial tree node corresponding to the initial tree depth in the first decision tree; if yes, it executes the step of updating the first decision tree based on the node weights.
Specifically, the tree depth condition may be a tree depth value determined based on the first decision tree, and when the tree depth value corresponding to the initial tree depth is equal to the tree depth value of the first decision tree, the initial tree depth is indicated to be the maximum tree depth of the first decision tree; when the tree depth value corresponding to the initial tree depth is smaller than the tree depth value of the first decision tree, the initial tree depth is not the maximum tree depth of the first decision tree, and a next tree depth exists after the initial tree depth, namely the target tree depth to be traversed.
Based on this, after the initial tree node under the initial tree depth of the first decision tree has been traversed, the service provider 230 judges whether the initial tree depth meets the tree depth condition. If it does not, the initial tree depth is not the maximum tree depth of the first decision tree and a next tree depth still exists; the service provider then determines the target tree depth based on the initial tree depth, takes it as the new initial tree depth, determines the initial tree node corresponding to that depth in the first decision tree, and traverses it. If the initial tree depth does meet the tree depth condition, it is the maximum tree depth of the first decision tree, the traversal of the first decision tree is complete, and the first decision tree can be updated based on the node weights.
In practical application, the prediction result can be determined by the following formula (6); in the standard boosting form this is the previous prediction plus the weight of the leaf node onto which the sample falls:

P_new = P_old + ω (6)

where P_old represents the prediction result underlying the target gradient value of the first decision tree, and P_new represents the prediction result of the updated first decision tree, i.e. the basis of the updated target gradient value.
In summary, whether the traversal of the first decision tree is complete is determined by judging whether its initial tree depth meets the tree depth condition, so that the first decision tree is traversed layer by layer under the breadth-first algorithm and its traversal efficiency is improved.
Further, considering that the gradient lifting decision tree model corresponds to more than one decision tree, after the first decision tree has been traversed and its corresponding gradient information obtained, a second decision tree still needs to be constructed and the traversal continued. In a specific implementation, the service provider 230 is configured to obtain a first target gradient lifting decision tree model corresponding to the tree model update task according to the update result, construct a second decision tree based on the first target gradient lifting decision tree model, and send the second decision tree as the first decision tree to the first participant 210 and the second participant 220, until at least one target gradient lifting decision tree model corresponding to the tree model update task is obtained.
Specifically, the second decision tree is the next decision tree of the gradient lifting decision tree model after the first decision tree.
Based on this, the service provider 230 obtains a first target gradient lifting decision tree model corresponding to the tree model update task according to the update result, builds a second decision tree on the basis of the first target gradient lifting decision tree model, and calculates a target gradient value of the second decision tree based on the prediction result of the first target gradient lifting decision tree model. The second decision tree is sent as a first decision tree to the first participant 210 and the second participant 220 for traversal of the second decision tree until at least one target gradient-lifting decision tree model corresponding to the tree model update task is obtained.
In practical application, a gradient association relationship exists between the first decision tree and the second decision tree: the target gradient value corresponding to the second decision tree is calculated based on the prediction result of the first decision tree and is determined by the following formula (7), which in the standard form takes the derivatives of the loss function L at the current prediction:

G = ∂L(y, P_new)/∂P_new, H = ∂²L(y, P_new)/∂P_new² (7)

where G represents the first derivative in the target gradient value of the second decision tree; H represents the second derivative in the target gradient value of the second decision tree; P_new represents the prediction result of the first decision tree, namely the updated target gradient value; and y represents the modeling tag data held by the first participant.
In summary, a second decision tree is constructed based on the first target gradient lifting decision tree model so that traversal can continue on it, and the target gradient value of the second decision tree is determined from the prediction result corresponding to the first decision tree, thereby updating the gradient lifting decision tree model step by step.
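For a concrete loss function, the derivatives in formula (7) take a closed form. The sketch below assumes a logistic loss over binary labels — one common choice, not something the patent prescribes:

```python
import numpy as np

def second_tree_gradients(P_new: np.ndarray, y: np.ndarray):
    """Target gradient values for the next decision tree under a logistic
    loss: G = sigmoid(P_new) - y, H = sigmoid(P_new) * (1 - sigmoid(P_new))."""
    prob = 1.0 / (1.0 + np.exp(-P_new))   # sigmoid of the current prediction
    return prob - y, prob * (1.0 - prob)
```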
Further, after the tree model update task is completed, the gradient lifting decision tree model has been trained, so prediction services can be provided to users based on the trained model. In a specific implementation, the first participant 210 is configured to receive the model parameters issued by the service provider 230 for each target gradient lifting decision tree model and construct at least one local gradient lifting decision tree model from these parameters; when a prediction task processing request submitted for user data is received, the user data is predicted through the at least one local gradient lifting decision tree model, and the credit product information corresponding to the user is determined according to the prediction result.
Specifically, the model parameters are the parameters, obtained after the update of the gradient lifting decision tree model is completed, of all updated decision trees (such as the first decision tree) corresponding to the model, so that both the first participant 210 and the second participant 220 obtain the trained gradient lifting decision tree model. Correspondingly, the local gradient lifting decision tree model is the trained gradient lifting decision tree model held by the first participant 210. The user data is the input of the trained model, namely the user's personal information, credit information and behavior information regarding commodities. The credit product information is the product information matched to the user, predicted by the trained gradient lifting decision tree model based on the user data.
Based on this, after receiving the model parameters issued by the service provider 230 for each target gradient boost decision tree model, the first participant 210 may construct at least one local gradient boost decision tree model based on each held decision tree and the corresponding model parameters, for providing the prediction service. When a prediction task processing request submitted for user data is received, the user data is predicted through at least one local gradient lifting decision tree model, and credit product information corresponding to the user is determined according to a prediction result.
Along the above example, when the user has a product recommendation requirement, personal information such as credit information, product history behavior information and the like of the user can be provided. The first participant predicts based on the local gradient lifting decision tree model, and further provides credit product information such as the type, the number and the like of the product obtained after prediction for the user.
In summary, the user data is predicted by at least one local gradient lifting decision tree model, and credit product information corresponding to the user is determined according to the prediction result, so that product prediction service is provided for the user.
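Prediction with the local gradient lifting decision tree models then amounts to routing the user data through every tree and summing the leaf weights. A minimal sketch, assuming each tree is stored as nested dicts with `feature`, `threshold`, `left`, `right` on internal nodes and `weight` on leaves (an illustrative layout, not the patent's format):

```python
def predict_one(tree: dict, x: list) -> float:
    """Route one user-data vector x through a single decision tree."""
    node = tree
    while "weight" not in node:            # descend until a leaf is reached
        key = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[key]
    return node["weight"]

def predict(trees: list, x: list, base_score: float = 0.0) -> float:
    """Add up the prediction results of every local decision tree."""
    return base_score + sum(predict_one(t, x) for t in trees)
```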
According to this embodiment of the specification, the gradient lifting decision tree model is updated based on the breadth-first algorithm and federated learning: on the one hand, through the cooperation of the first participant, the second participant and the service provider, no model gradients are leaked during the update; on the other hand, the breadth-first traversal increases the update speed of the model.
Corresponding to the above system embodiment, the present disclosure further provides an embodiment of a federal gradient boost decision tree model updating method based on a breadth-first algorithm, and fig. 3 shows a flowchart of a federal gradient boost decision tree model updating method based on a breadth-first algorithm according to an embodiment of the present disclosure, which specifically includes the following steps:
Step S302, a first participant calculates a target gradient value based on the data distribution of the behavior data set and sends the target gradient value to a second participant; updating a first decision tree of a corresponding tree model updating task issued by a service provider based on the behavior characteristics of the behavior data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining first traversing information and transmitting the first traversing information to the service provider;
step S304, a second participant updates the first decision tree issued by the service provider based on credit characteristics of a credit data set, traverses the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtains second traversal information and sends the second traversal information to the service provider, wherein the behavior data set and the credit data set have a data alignment relationship;
step S306, the service provider calculates the node weight of the tree node in the first decision tree based on the first traversing information and the second traversing information, updates the first decision tree based on the node weight, and obtains at least one target gradient lifting decision tree model corresponding to the tree model updating task according to the updating result; and parameter association relations are arranged among all the target gradient lifting decision tree models.
Optionally, the first participant calculates an initial gradient value based on the data distribution of the behavioural dataset; encrypting the initial gradient value by adopting a homomorphic encryption algorithm based on a public key provided by the service provider to obtain a target gradient value, and transmitting the target gradient value to the second participant;
and the service provider decrypts the first traversal information in the ciphertext format and the second traversal information in the ciphertext format based on a private key, and then calculates the node weight of the tree node in the first decision tree, wherein the public key and the private key form a key pair.
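In a schematic form, this homomorphic step can be sketched with the python-paillier (`phe`) package; the key size, the example gradient values and the variable names below are illustrative assumptions, not part of the patent:

```python
from phe import paillier

# Service provider generates the key pair and publishes the public key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# First participant encrypts the initial gradient values (g, h per sample).
enc_g = [public_key.encrypt(g) for g in [0.31, -0.12, 0.08]]
enc_h = [public_key.encrypt(h) for h in [0.21, 0.11, 0.07]]

# Other participants can add ciphertexts without seeing the plaintexts,
# e.g. to aggregate the gradients of the samples on one side of a split.
enc_G_left = enc_g[0] + enc_g[1]

# Only the service provider, holding the private key, can decrypt.
G_left = private_key.decrypt(enc_G_left)
```

Because Paillier ciphertexts are additively homomorphic, the second participant can aggregate encrypted gradients for candidate splits without ever learning the plaintext values.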
Optionally, the first participant determines at least one first segmentation mode of the updated first decision tree according to the first traversal result; determining first data distribution information corresponding to each first segmentation mode to form first traversal information;
correspondingly, the second participant determines at least one second segmentation mode of the updated first decision tree according to a second traversal result; and determining second data distribution information corresponding to each second segmentation mode to form second traversal information.
Optionally, the determining of the first data distribution information corresponding to any one of the first segmentation modes includes: a first participant determines a behavior sign of a first segmentation mode and a gradient value of a first segmentation node corresponding to the first segmentation mode; generating first data distribution information based on the behavior mark of the first segmentation mode and the gradient value of the first segmentation node;
Correspondingly, the determining of the second data distribution information corresponding to any one of the second division modes comprises the following steps: the second participant determines a credit sign of a second division mode and a gradient value of a second division node corresponding to the second division mode; and generating second data distribution information based on the credit marks of the second segmentation modes and the gradient values of the second segmentation nodes.
Optionally, the service provider generates target traversal information based on the first traversal information and the second traversal information; determining at least one segmentation mode based on the target traversal information; and determining a target segmentation mode in the at least one segmentation mode based on the gain value of each segmentation mode, and calculating the node weight of the tree node based on the target gain value of the target segmentation mode.
Optionally, the determining the gain value of any of the splitting modes includes: the service provider calculates the function value of the target segmentation node according to the gradient value of the target segmentation node in the node segmentation mode and the lifting tree function; and calculating the gain value of the node segmentation mode based on the function value of the target segmentation node and the function value of the target tree node corresponding to the target segmentation node.
Optionally, the first participant determines a first decision tree of a corresponding tree model update task issued by the service provider; determining initial tree nodes corresponding to the initial tree depths in the first decision tree; updating the initial tree node based on the behavior characteristics of the behavior data set, traversing the updated initial tree node based on the target gradient value according to a breadth-first algorithm, and sending first traversal information to the service provider;
correspondingly, the second participant determines a first decision tree of a corresponding tree model updating task issued by the service provider; determining initial tree nodes corresponding to the initial tree depths in the first decision tree; updating the initial tree node based on the credit characteristics of the credit data set, traversing the updated initial tree node according to a breadth-first algorithm based on the target gradient value, and sending second traversing information to the service provider.
Optionally, the service provider judges whether the initial tree depth of the first decision tree meets a tree depth condition; if not, it determines a target tree depth in the first decision tree based on the initial tree depth, takes the target tree depth as the initial tree depth, and performs the step of determining the initial tree node corresponding to the initial tree depth in the first decision tree; if yes, it performs the step of updating the first decision tree based on the node weight.
Optionally, the service provider obtains a first target gradient lifting decision tree model corresponding to the tree model updating task according to the updating result; constructing a second decision tree based on the first target gradient lifting decision tree model, and sending the second decision tree to a first participant and a second participant as the first decision tree until at least one target gradient lifting decision tree model corresponding to the tree model updating task is obtained.
Optionally, the first participant receives model parameters issued by the service provider for each target gradient lifting decision tree model, and builds at least one local gradient lifting decision tree model according to the model parameters; and when a prediction task processing request submitted for the user data is received, predicting the user data through at least one local gradient lifting decision tree model, and determining credit product information corresponding to the user according to a prediction result.
According to this embodiment of the specification, the gradient lifting decision tree model is updated based on the breadth-first algorithm and federated learning: on the one hand, through the cooperation of the first participant, the second participant and the service provider, no model gradients are leaked during the update; on the other hand, the breadth-first traversal increases the update speed of the model.
The following further describes the federal gradient boost decision tree model updating method based on the breadth-first algorithm with reference to fig. 4, taking its application to gradient lifting decision tree model updating as an example. Fig. 4 is an interaction diagram of a federal gradient boost decision tree model updating method based on the breadth-first algorithm applied to gradient lifting decision tree model updating according to an embodiment of the present disclosure, which specifically includes the following steps:
in step S402, the service provider builds a decision tree model and determines a first decision tree in the decision tree model.
The service provider builds a decision tree model comprising at least one decision tree, and sends the sample distribution of all tree nodes under the current tree depth to the first participant and the second participant respectively.
In step S404, the service provider sends the first decision tree and the public key to the first party.
The service provider saves the public key and the private key, sends the public key to the first party, and provides a first tree model, i.e. a first decision tree, for the first party.
In step S406, the service provider sends the first decision tree to the second party.
In step S408, the first party determines a first derivative and a second derivative based on the sample distribution and encrypts them.
The first party stores the first characteristic data and the modeling tag data.
in step S410, the first party transmits the encrypted first derivative and second derivative to the second party.
In step S412, the second participant traverses the first decision tree and sends the second traversal result to the service provider.
The second party stores the second characteristic data. It traverses the first decision tree based on the second characteristic data and the encrypted first and second derivatives, completes the traversal of all split cases, and transmits the traversal result to the service provider.
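A minimal sketch of this traversal of all split cases, assuming `features` is a list of the second participant's per-sample feature rows and `enc_g`/`enc_h` are the Paillier-encrypted derivatives from the sketch above (all names illustrative):

```python
def traverse_split_cases(features, enc_g, enc_h):
    """For every (feature index, threshold) split case, return the encrypted
    sums of the first/second derivatives of the samples routed to the left
    child; Paillier ciphertexts support the + used here."""
    results = []
    n_samples, n_features = len(features), len(features[0])
    for j in range(n_features):
        for t in sorted({row[j] for row in features}):
            left = [i for i in range(n_samples) if features[i][j] <= t]
            if not left or len(left) == n_samples:
                continue                      # degenerate split, skip it
            G_left, H_left = enc_g[left[0]], enc_h[left[0]]
            for i in left[1:]:
                G_left = G_left + enc_g[i]    # homomorphic addition
                H_left = H_left + enc_h[i]
            results.append((j, t, G_left, H_left))
    return results
```

Only these encrypted aggregates leave the second participant; the plaintext derivatives are never visible to it.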
In step S414, the first participant traverses the first decision tree and sends the first traversal result to the service provider.
In step S416, the service provider combines the first traversal result with the second traversal result and decrypts them to obtain the first derivative and second derivative of the left child node and the first derivative and second derivative of the right child node.
In step S418, the service provider calculates the objective function value and the gain value of the left child node and the right child node, respectively.
In step S420, the service provider selects the node with the largest gain value as the best dividing point and calculates the node weight.
In step S422, the service provider updates the prediction result and calculates the model gradient in case that the decision tree construction condition is satisfied.
When the traversal reaches the target tree depth of the first decision tree, the decision tree construction condition is satisfied.
In step S424, if the training condition is not satisfied, the service provider determines the next decision tree, takes it as the first decision tree, and performs step S402.
If the traversal of all the decision trees contained in the decision tree model has been completed, the training condition is satisfied; conversely, if the decision tree model still contains an untraversed decision tree, the training condition is not satisfied, and the next decision tree corresponding to the decision tree model is determined and traversed in turn, until all decision trees in the decision tree model have been traversed. The decision trees are then integrated, and the prediction results of all the decision trees are added to obtain the final prediction value.
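For intuition, the control flow of steps S402 to S424 can be condensed into a minimal centralized, plaintext analogue — one process, no encryption and no separate parties — assuming a logistic loss and the standard gain and weight formulas; it is a sketch of the algorithmic skeleton, not the federated protocol itself:

```python
import numpy as np

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

def best_split(X, G, H, idx, lam):
    """Largest-gain (feature, threshold) segmentation on the samples in idx."""
    Gp, Hp = G[idx].sum(), H[idx].sum()
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[idx, j])[:-1]:         # candidate thresholds
            mask = X[idx, j] <= t
            gl, hl = G[idx][mask].sum(), H[idx][mask].sum()
            gr, hr = Gp - gl, Hp - hl
            gain = 0.5 * (gl * gl / (hl + lam) + gr * gr / (hr + lam)
                          - Gp * Gp / (Hp + lam))   # steps S416-S418
            if best is None or gain > best[0]:
                best = (gain, j, t, idx[mask], idx[~mask])
    return best

def collect_leaves(node):
    if "feature" not in node:
        return [node]
    return collect_leaves(node["left"]) + collect_leaves(node["right"])

def predict_one(tree, x):
    node = tree
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["weight"]

def train(X, y, num_trees=3, max_depth=3, lam=1.0):
    pred, trees = np.zeros(len(y)), []
    for _ in range(num_trees):                      # S424: next decision tree
        prob = sigmoid(pred)                        # S408: model gradients
        G, H = prob - y, prob * (1.0 - prob)
        root = {"idx": np.arange(len(y))}           # S402: first decision tree
        level = [root]
        for _ in range(max_depth):                  # breadth-first, S412-S420
            nxt = []
            for node in level:
                found = best_split(X, G, H, node["idx"], lam)
                if found and found[0] > 0:          # S420: best dividing point
                    _, j, t, li, ri = found
                    node.update(feature=j, threshold=t,
                                left={"idx": li}, right={"idx": ri})
                    nxt += [node["left"], node["right"]]
            level = nxt
        for leaf in collect_leaves(root):           # node weight: -G/(H+lam)
            leaf["weight"] = -G[leaf["idx"]].sum() / (H[leaf["idx"]].sum() + lam)
        pred = pred + np.array([predict_one(root, x) for x in X])   # S422
        trees.append(root)
    return trees
```

In the federated setting, the gradient sums inside `best_split` are exactly the quantities exchanged in encrypted form between the participants and the service provider.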
In summary, an embodiment of the present disclosure updates the gradient lifting decision tree model based on the breadth-first algorithm and federated learning: on the one hand, through the cooperation of the first participant, the second participant and the service provider, no model gradients are leaked during the update; on the other hand, the breadth-first traversal increases the update speed of the model.
Corresponding to the above system embodiment, the present disclosure further provides another embodiment of a federal gradient boost decision tree model updating system based on a breadth-first algorithm, and fig. 5 is a schematic structural diagram of another federal gradient boost decision tree model updating system based on a breadth-first algorithm according to an embodiment of the present disclosure; the breadth-first algorithm-based federal gradient boost decision tree model update system 500 includes a first party 510, a second party 520, and a service provider 530:
the first participant 510 is configured to calculate a target gradient value based on a sample distribution of a first sample data set and send the target gradient value to the second participant 520; updating a first decision tree of a corresponding tree model updating task issued by the service provider 530 based on the first sample characteristics of the first sample data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining first traversal information, and sending the first traversal information to the service provider 530;
the second participant 520 is configured to update the first decision tree issued by the service provider based on a second sample feature of a second sample data set, and traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtain second traversal information, and send the second traversal information to the service provider 530, where the first sample data set and the second sample data set have a data alignment relationship;
The service provider 530 is configured to calculate a node weight of a tree node in the first decision tree based on the first traversal information and the second traversal information, update the first decision tree based on the node weight, and obtain at least one target gradient lifting decision tree model corresponding to the tree model update task according to an update result; the target gradient lifting decision tree models are provided with parameter association relations, and are used for carrying out credit prediction on users.
In practical applications, the target gradient lifting decision tree model has the capability of credit prediction for users. Personal information, historical credit information, expense information and the like of the user are used as input of the target gradient lifting decision tree model, which outputs credit prediction information representing the credit risk level of the user.
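As a hypothetical usage illustration, reusing `train` and `predict_one` from the sketch above, encoded user features could be scored as follows (the feature layout and labels are invented for the example):

```python
import numpy as np

# Hypothetical encoded features per user: [age, historical_credit_score, monthly_expense]
X = np.array([[35.0, 0.82, 4200.0],
              [22.0, 0.41, 6100.0],
              [47.0, 0.67, 3100.0]])
y = np.array([1.0, 0.0, 1.0])             # 1 = repaid on time, 0 = overdue

trees = train(X, y, num_trees=2, max_depth=2)
raw = sum(predict_one(t, X[0]) for t in trees)
risk_score = 1.0 / (1.0 + np.exp(-raw))   # probability-like credit score for user 0
print(risk_score)
```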
According to this embodiment of the specification, the gradient lifting decision tree model is updated based on the breadth-first algorithm and federated learning: on the one hand, through the cooperation of the first participant, the second participant and the service provider, no model gradients are leaked during the update; on the other hand, the breadth-first traversal increases the update speed of the model.
Corresponding to the above system embodiment, the present disclosure further provides another embodiment of a method for updating a federal gradient boost decision tree model based on a breadth-first algorithm, and fig. 6 shows a flowchart of another method for updating a federal gradient boost decision tree model based on a breadth-first algorithm according to one embodiment of the present disclosure. As shown in fig. 6, the method includes:
step S602, the first participant calculates a target gradient value based on the sample distribution of the first sample data set and sends the target gradient value to the second participant; updating a first decision tree of a corresponding tree model updating task issued by a service provider based on first sample characteristics of the first sample data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining first traversing information, and sending the first traversing information to the service provider;
step S604, the second participant updates the first decision tree issued by the service provider based on the second sample feature of the second sample data set, and traverses the updated first decision tree according to a breadth-first algorithm based on the target gradient value, to obtain second traversal information and sends the second traversal information to the service provider, wherein the first sample data set and the second sample data set have a data alignment relationship;
Step S606, the service provider calculates the node weight of the tree node in the first decision tree based on the first traversing information and the second traversing information, updates the first decision tree based on the node weight, and obtains at least one target gradient lifting decision tree model corresponding to the tree model updating task according to the updating result; the target gradient lifting decision tree models are provided with parameter association relations, and are used for carrying out credit prediction on users.
According to this embodiment of the specification, the gradient lifting decision tree model is updated based on the breadth-first algorithm and federated learning: on the one hand, through the cooperation of the first participant, the second participant and the service provider, no model gradients are leaked during the update; on the other hand, the breadth-first traversal increases the update speed of the model.
The foregoing is a schematic description of another federal gradient boost decision tree model updating method based on the breadth-first algorithm of this embodiment. It should be noted that the technical solution of this method and the technical solution of the above federal gradient boost decision tree model updating system based on the breadth-first algorithm belong to the same concept; for details of the method not described in detail, reference may be made to the description of the technical solution of the system.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with an embodiment of the present specification. The components of computing device 700 include, but are not limited to, memory 710 and processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes an access device 740, which enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of wired or wireless network interface (e.g., a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 7 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 700 may also be a mobile or stationary server.
The processor 720 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the federal gradient boost decision tree model updating method based on the breadth-first algorithm.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the above-mentioned federal gradient boost decision tree model updating method based on the breadth-first algorithm belong to the same conception, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the above-mentioned federal gradient boost decision tree model updating method based on the breadth-first algorithm.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the federal gradient boost decision tree model updating method based on the breadth-first algorithm described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the above-mentioned federal gradient boost decision tree model updating method based on the breadth-first algorithm belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the above-mentioned federal gradient boost decision tree model updating method based on the breadth-first algorithm.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be increased or decreased appropriately according to the requirements of legislation and patent practice; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present description is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present description. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely intended to help clarify the present specification. The alternative embodiments are not described exhaustively, and the description is not intended to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, thereby enabling others skilled in the art to best understand and utilize the disclosure. This specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A federal gradient boost decision tree model updating system based on breadth-first algorithm, the system comprising a first party, a second party, and a service provider:
the first participant is used for calculating a target gradient value based on the data distribution of the behavior data set and sending the target gradient value to the second participant; updating a first decision tree of a corresponding tree model updating task issued by the service provider based on the behavior characteristics of the behavior data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining first traversing information and transmitting the first traversing information to the service provider;
The second party is configured to update the first decision tree issued by the service provider based on credit characteristics of a credit data set, traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtain second traversal information, and send the second traversal information to the service provider, where the behavior data set and the credit data set have a data alignment relationship;
the service provider is configured to calculate node weights of tree nodes in the first decision tree based on the first traversal information and the second traversal information, update the first decision tree based on the node weights, and obtain at least one target gradient lifting decision tree model corresponding to the tree model update task according to an update result; and parameter association relations are arranged among all the target gradient lifting decision tree models.
2. The system of claim 1, wherein the first party is configured to calculate an initial gradient value based on a data distribution of the behavioral dataset; encrypting the initial gradient value by adopting a homomorphic encryption algorithm based on a public key provided by the service provider to obtain a target gradient value, and transmitting the target gradient value to the second participant;
The service provider is configured to decrypt the first traversal information in the ciphertext format and the second traversal information in the ciphertext format based on a private key, and then calculate a node weight of a tree node in the first decision tree, where the public key and the private key form a key pair.
3. The system of claim 1, wherein the first party is configured to determine at least one first partitioning of the updated first decision tree based on a first traversal result; determining first data distribution information corresponding to each first segmentation mode to form first traversal information;
correspondingly, the second participant is configured to determine at least one second partition mode of the updated first decision tree according to a second traversal result; and determining second data distribution information corresponding to each second segmentation mode to form second traversal information.
4. The system of claim 3, wherein the determining of the first data distribution information corresponding to any one of the first partitioning modes includes: the first participant is used for determining a behavior sign of a first segmentation mode and a gradient value of a first segmentation node corresponding to the first segmentation mode; generating first data distribution information based on the behavior mark of the first segmentation mode and the gradient value of the first segmentation node;
Correspondingly, the determining of the second data distribution information corresponding to any one of the second division modes comprises the following steps: the second party is used for determining a credit sign of a second division mode and a gradient value of a second division node corresponding to the second division mode; and generating second data distribution information based on the credit marks of the second segmentation modes and the gradient values of the second segmentation nodes.
5. The system of claim 1, wherein the service provider is further configured to generate target traversal information based on the first traversal information and the second traversal information; determining at least one segmentation mode based on the target traversal information; and determining a target segmentation mode in the at least one segmentation mode based on the gain value of each segmentation mode, and calculating the node weight of the tree node based on the target gain value of the target segmentation mode.
6. The system of claim 5, wherein the determining of the gain value for any one of the splitting modes comprises: the service provider is used for calculating the function value of the target segmentation node according to the gradient value of the target segmentation node in the node segmentation mode and the lifting tree function; and calculating the gain value of the node segmentation mode based on the function value of the target segmentation node and the function value of the target tree node corresponding to the target segmentation node.
7. The system of claim 1, wherein the first party is configured to determine a first decision tree for a corresponding tree model update task issued by the service provider; determining initial tree nodes corresponding to the initial tree depths in the first decision tree; updating the initial tree node based on the behavior characteristics of the behavior data set, traversing the updated initial tree node based on the target gradient value according to a breadth-first algorithm, and sending first traversal information to the service provider;
correspondingly, the second participant is configured to determine a first decision tree of a corresponding tree model update task issued by the service provider; determining initial tree nodes corresponding to the initial tree depths in the first decision tree; updating the initial tree node based on the credit characteristics of the credit data set, traversing the updated initial tree node according to a breadth-first algorithm based on the target gradient value, and sending second traversing information to the service provider.
8. The system of claim 7, wherein the service provider is configured to judge whether the initial tree depth of the first decision tree meets a tree depth condition; if not, determine a target tree depth in the first decision tree based on the initial tree depth, take the target tree depth as the initial tree depth, and perform the step of determining the initial tree node corresponding to the initial tree depth in the first decision tree; if yes, perform the step of updating the first decision tree based on the node weight.
9. The system of claim 1, wherein the service provider is configured to obtain a first target gradient boost decision tree model corresponding to the tree model update task according to an update result; constructing a second decision tree based on the first target gradient lifting decision tree model, and sending the second decision tree to a first participant and a second participant as the first decision tree until at least one target gradient lifting decision tree model corresponding to the tree model updating task is obtained.
10. The system of claim 1, wherein the first participant is configured to receive model parameters issued by the service provider for each target gradient boost decision tree model, and construct at least one local gradient boost decision tree model according to the model parameters; and when a prediction task processing request submitted for the user data is received, predicting the user data through at least one local gradient lifting decision tree model, and determining credit product information corresponding to the user according to a prediction result.
11. A federal gradient boost decision tree model updating system based on breadth-first algorithm, the system comprising a first party, a second party, and a service provider:
The first participant is configured to calculate a target gradient value based on a sample distribution of a first sample data set and send the target gradient value to the second participant; updating a first decision tree of a corresponding tree model updating task issued by the service provider based on a first sample characteristic of the first sample data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining first traversal information, and sending the first traversal information to the service provider;
the second participant is configured to update the first decision tree issued by the service provider based on a second sample feature of a second sample data set, and traverse the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtain second traversal information, and send the second traversal information to the service provider, where the first sample data set and the second sample data set have a data alignment relationship;
the service provider is configured to calculate node weights of tree nodes in the first decision tree based on the first traversal information and the second traversal information, update the first decision tree based on the node weights, and obtain at least one target gradient lifting decision tree model corresponding to the tree model update task according to an update result; the target gradient lifting decision tree models are provided with parameter association relations, and are used for carrying out credit prediction on users.
12. A federal gradient promotion decision tree model updating method based on breadth-first algorithm is characterized by comprising the following steps:
the first participant calculates a target gradient value based on the data distribution of the behavior data set and sends the target gradient value to the second participant; updating a first decision tree of a corresponding tree model updating task issued by a service provider based on the behavior characteristics of the behavior data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining first traversing information and transmitting the first traversing information to the service provider;
updating the first decision tree issued by the service provider by a second participant based on credit characteristics of a credit data set, traversing the updated first decision tree according to a breadth-first algorithm based on the target gradient value, obtaining second traversing information and sending the second traversing information to the service provider, wherein the behavior data set and the credit data set have a data alignment relationship;
the service provider calculates the node weight of the tree node in the first decision tree based on the first traversing information and the second traversing information, updates the first decision tree based on the node weight, and obtains at least one target gradient lifting decision tree model corresponding to the tree model updating task according to an updating result; and parameter association relations are arranged among all the target gradient lifting decision tree models.
13. A computing device comprising a memory and a processor; the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions to implement the steps of the federal gradient boost decision tree model updating method based on breadth-first algorithm of claim 12.
14. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the federal gradient boost decision tree model updating method based on breadth-first algorithm of claim 12.
CN202311543444.5A 2023-11-20 2023-11-20 Federal gradient lifting decision tree model updating system based on breadth-first algorithm Active CN117251805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311543444.5A CN117251805B (en) 2023-11-20 2023-11-20 Federal gradient lifting decision tree model updating system based on breadth-first algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311543444.5A CN117251805B (en) 2023-11-20 2023-11-20 Federal gradient lifting decision tree model updating system based on breadth-first algorithm

Publications (2)

Publication Number Publication Date
CN117251805A true CN117251805A (en) 2023-12-19
CN117251805B CN117251805B (en) 2024-04-16

Family

ID=89126855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311543444.5A Active CN117251805B (en) 2023-11-20 2023-11-20 Federal gradient lifting decision tree model updating system based on breadth-first algorithm

Country Status (1)

Country Link
CN (1) CN117251805B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299728A (en) * 2018-08-10 2019-02-01 深圳前海微众银行股份有限公司 Federal learning method, system and readable storage medium storing program for executing
US20210376995A1 (en) * 2020-05-27 2021-12-02 International Business Machines Corporation Privacy-enhanced decision tree-based inference on homomorphically-encrypted data
CN115552429A (en) * 2020-06-02 2022-12-30 华为云计算技术有限公司 Method and system for horizontal federal learning using non-IID data
CN114841374A (en) * 2021-01-14 2022-08-02 新智数字科技有限公司 Method for optimizing transverse federated gradient spanning tree based on stochastic greedy algorithm
CN113408747A (en) * 2021-06-28 2021-09-17 淮安集略科技有限公司 Model parameter updating method and device, computer readable medium and electronic equipment
CN114021734A (en) * 2021-10-14 2022-02-08 深圳致星科技有限公司 Parameter calculation device, system and method for federal learning and privacy calculation
CN115796276A (en) * 2022-11-30 2023-03-14 绿盟科技集团股份有限公司 Federal learning-based decision tree construction method and device and storage medium
CN115935438A (en) * 2023-02-03 2023-04-07 杭州金智塔科技有限公司 Data privacy intersection system and method
CN116484415A (en) * 2023-02-22 2023-07-25 北京航空航天大学 Privacy decision tree reasoning method based on isomorphic encryption

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HANGYU ZHU et al.: "PIVODL: Privacy-Preserving Vertical Federated Learning Over Distributed Labels", IEEE Transactions on Artificial Intelligence, vol. 4, no. 5, pp. 988-1001, XP011949927, DOI: 10.1109/TAI.2021.3139055 *
MAGGI BANSAL et al.: "UrbanEnQoSPlace: A Deep Reinforcement Learning Model for Service Placement of Real-Time Smart City IoT Applications", IEEE Transactions on Services Computing, vol. 16, no. 4, pp. 3043-3060, XP011946652, DOI: 10.1109/TSC.2022.3218044 *
HE Wen; BAI Hanru; LI Chao: "Discussion on Enterprise Data Sharing Based on Federated Learning", Information & Computer (Theoretical Edition), no. 08, pp. 177-180 *

Also Published As

Publication number Publication date
CN117251805B (en) 2024-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant