CN113392164A - Method, main server, service platform and system for constructing longitudinal federated tree - Google Patents

Info

Publication number
CN113392164A
Authority
CN
China
Prior art keywords
vector
common
sample
tree
longitudinal federated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010174360.9A
Other languages
Chinese (zh)
Other versions
CN113392164B (en
Inventor
刘洋
杜师帅
张芳娟
张钧波
郑宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong City Beijing Digital Technology Co Ltd
Original Assignee
Jingdong City Beijing Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong City Beijing Digital Technology Co Ltd filed Critical Jingdong City Beijing Digital Technology Co Ltd
Priority to CN202010174360.9A priority Critical patent/CN113392164B/en
Publication of CN113392164A publication Critical patent/CN113392164A/en
Application granted granted Critical
Publication of CN113392164B publication Critical patent/CN113392164B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Abstract

The disclosure provides a method, a main server, a service platform and a system for constructing a longitudinal federated tree. The main server determines a first vector according to a preset parameter and the sum of the feature dimensions of the common samples across all service platforms, wherein the preset parameter represents the number of features that do not participate in node splitting, and the first vector represents the degree to which each feature participates in node splitting. The main server determines a global second vector according to the local second vectors, wherein the local and global second vectors represent the distance between the feature vector of each common sample and the corresponding feature reference vector on one service platform and on all service platforms, respectively, and the feature reference vector represents a split point in the value range of each feature. The main server issues the product value of the first vector and the global second vector to each service platform, so that each service platform splits the nodes of the longitudinal federated tree using the product value as the split value. This avoids the risk of exposing each service platform's original data and the data distribution information of the samples, and enhances the security and privacy of the data.

Description

Method, main server, service platform and system for constructing longitudinal federated tree
Technical Field
The disclosure relates to the technical field of computers, in particular to a method, a main server, a service platform and a system for constructing a longitudinal federated tree.
Background
The massive amount of information generated in the big data era drives the continuous development of artificial intelligence. A prerequisite for lawfully using big data to promote social progress is protecting the security of the data held on the service platforms of enterprises, individuals, governments and other organizations. The service platforms of some organizations need to perform cross-organization joint modeling with data from the service platforms of other organizations without sharing that data, so achieving joint modeling while protecting data security and privacy is an important problem to be solved.
For the case where the data samples of multiple service platforms overlap heavily in samples but little in sample features, several related technologies propose a longitudinal federated tree model, which is constructed by combining the data samples of service platforms that hold the same samples but not identical sample features.
In the process of constructing the longitudinal federated tree model, a service platform splits a node once at a time based on one randomly selected feature, and shares the training results of that split, such as the data distribution information of its local samples, with the other service platforms through the main server. In this way, information such as the data distribution of each service platform can be collected and shared through one main server without exposing the original data owned by the service platforms, and the service platforms are coordinated to achieve unified cross-platform modeling.
Disclosure of Invention
The inventors have found that, in the related technologies in which multiple service platforms participate in building a longitudinal federated tree model, the original data of the service platforms is not exposed but the data distribution information of the samples is, which poses a certain threat to the security and privacy of the data.
In the embodiments of the disclosure, the main server generates a first vector according to a preset parameter and the sum of the feature dimensions of the common samples collected from the service platforms, generates a global second vector according to the local second vectors collected from the service platforms, and sends the product value of the first vector and the global second vector to each service platform. Each service platform then splits the nodes of the longitudinal federated tree using the product value as the split value, and the longitudinal federated tree model is finally constructed. This avoids the risk of exposing the original data of the service platforms or the data distribution information of the samples, and enhances the security and privacy of the data.
According to some embodiments of the present disclosure, there is provided a method for constructing a longitudinal federated tree, including:
the main server determines a first vector according to a preset parameter and the sum of the dimensions of all the features of the common samples across the service platforms, wherein the preset parameter represents the number of features that do not participate in the current node split of the longitudinal federated tree, and the first vector represents the degree to which each feature participates in the current node split of the longitudinal federated tree;
the main server determines a global second vector according to the collected local second vectors of the service platforms, wherein a local second vector represents the distance between the feature vector of each common sample on one service platform and the corresponding feature reference vector, the global second vector represents the distance between the feature vector of each common sample on all the service platforms and the corresponding feature reference vectors, and a feature reference vector represents a random split point in the value range of each feature of the common samples of a service platform;
the main server calculates a product value of the first vector and the global second vector, and issues the product value to each service platform, so that each service platform splits the nodes of the longitudinal federated tree using the product value as a split value;
and repeating all the steps until a preset termination condition is met.
In some embodiments, determining the first vector comprises: generating a random vector that follows a normal distribution, the dimension of which equals the sum of the dimensions of all the features of the common samples across the service platforms; and setting to 0 as many elements of the random vector as the preset parameter indicates, to obtain the first vector.
In some embodiments, determining the global second vector comprises: determining the union of the local second vectors of all the service platforms as the global second vector.
In some embodiments, the method further comprises: the main server performs sample alignment on the original samples of the service platforms, and determines the aligned original samples as the common samples of the service platforms.
In some embodiments, the main server determines the sum of the dimensions of all the features of the common samples across the service platforms according to the feature dimensions collected from each service platform.
In some embodiments, the preset parameter is less than the sum of the dimensions of all the features of the common samples across all the service platforms.
In some embodiments, the longitudinal federated tree is used for evaluating user credit. The service platforms are a plurality of service platforms holding user samples whose credit is to be evaluated, the common samples are common user samples, i.e., user samples held by every service platform, and the depth in the longitudinal federated tree of the node where a common user sample is located during node splitting corresponds to the credit information of that common user sample.
According to other embodiments of the present disclosure, there is provided a method of constructing a longitudinal federated tree, comprising:
each service platform calculates a local second vector and reports it to a main server, so that the main server determines a global second vector according to the collected local second vectors of the service platforms, wherein a local second vector represents the distance between the feature vector of each common sample on one service platform and the corresponding feature reference vector, the global second vector represents the distance between the feature vector of each common sample on all the service platforms and the corresponding feature reference vectors, and a feature reference vector represents a random split point in the value range of each feature of the common samples of a service platform;
the service platform receives a product value of a first vector and the global second vector sent by the main server, wherein the first vector represents the degree to which each feature participates in the current node split of the longitudinal federated tree;
the service platform splits the nodes of the longitudinal federated tree using the product value of the first vector and the global second vector as the split value of each common sample, so as to construct the longitudinal federated tree;
and the service platform repeatedly executes all the steps until a preset termination condition is met.
In some embodiments, the method further comprises: the service platform receives the common samples sent by the main server, which are determined by aligning the samples of the service platforms.
In some embodiments, the service platform splitting the nodes of the longitudinal federated tree using the product value of the first vector and the global second vector as the split value of each common sample comprises: splitting a current node according to the split value corresponding to each common sample, so as to determine the child node of the current node to which each common sample belongs.
In some embodiments, splitting the current node according to the split value corresponding to each common sample comprises: if the split value corresponding to a common sample is less than 0, assigning the common sample to the right child node of the current node, and if the split value is greater than 0, assigning it to the left child node; or, if the split value corresponding to a common sample is less than 0, assigning the common sample to the left child node of the current node, and if the split value is greater than 0, assigning it to the right child node.
In some embodiments, the termination condition comprises: the depth of the longitudinal federated tree reaches a preset depth; or the number of samples in the leaf nodes of the longitudinal federated tree reaches a preset number.
In some embodiments, the method further comprises: constructing a plurality of longitudinal federated trees using the method of any embodiment, so as to generate a longitudinal federated forest.
In some embodiments, the method further comprises: the service platform initializes a root node of the longitudinal federated tree such that the root node includes all the common samples of the service platform.
In some embodiments, the longitudinal federated tree is used for evaluating user credit. The service platforms are a plurality of service platforms holding user samples whose credit is to be evaluated, the common samples are common user samples, i.e., user samples held by every service platform, and the depth in the longitudinal federated tree of the node where a common user sample is located during node splitting corresponds to the credit information of that common user sample.
According to still other embodiments of the present disclosure, there is provided a main server for building a longitudinal federated tree, including: a memory; and a processor coupled to the memory, the processor configured to execute the method of building a longitudinal federated tree of any of the embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a business platform for constructing a longitudinal federated tree, including: a memory; and a processor coupled to the memory, the processor configured to execute the method of building a longitudinal federated tree of any of the embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a system for building a longitudinal federated tree, including: the main server in any embodiment and a plurality of service platforms in any embodiment.
According to still further embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of building a longitudinal federated tree as described in any one of the embodiments.
Drawings
The drawings that will be used in the description of the embodiments or the related art will be briefly described below. The present disclosure can be understood more clearly from the following detailed description, which proceeds with reference to the accompanying drawings.
It is to be understood that the drawings in the following description are merely examples of the disclosure, and that one of ordinary skill in the art may derive other drawings from them without inventive effort.
Fig. 1 illustrates a flow diagram of a method of building a longitudinal federated tree in accordance with some exemplary embodiments of the present disclosure.
Fig. 2 shows a flow diagram of a method of building a longitudinal federated tree in accordance with further exemplary embodiments of the present disclosure.
Fig. 3 illustrates a schematic diagram of the service platforms using the first and second vectors to split the nodes of a longitudinal federated tree, according to some exemplary embodiments of the present disclosure.
Fig. 4 shows a schematic diagram of a main server for building a longitudinal federated tree, in accordance with some exemplary embodiments of the present disclosure.
Fig. 5 shows a schematic diagram of a service platform for building a longitudinal federated tree, in accordance with some exemplary embodiments of the present disclosure.
Fig. 6 shows a schematic diagram of a system for building a longitudinal federated tree, in accordance with some exemplary embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.
In the technical field in which multiple service platforms participate in building a longitudinal federated tree model, the data distribution information of the samples is exposed, which poses a certain threat to the security and privacy of the data. The present method is proposed to address this problem.
Fig. 1 illustrates a flow diagram of a method of building a longitudinal federated tree in accordance with some exemplary embodiments of the present disclosure. Each service platform is connected to the main server, and a longitudinal federated tree is jointly constructed through the main server.
As shown in FIG. 1, the method of this embodiment comprises steps 101-105.
In step 101, the main server performs sample alignment on the original samples of the service platforms and determines the aligned original samples as the common samples of the service platforms, where the common samples are the samples held by every service platform.
In some embodiments, every service platform reports the information (such as the sample names) of all of its locally held samples to the main server. The main server selects the samples held by all the service platforms according to the reports, determines them as the common samples, and issues the common samples to each service platform, so that each service platform subsequently uses the common samples to construct the longitudinal federated tree.
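The sample alignment in step 101 can be sketched as a set intersection over the reported sample names. This is a minimal illustration, not the patented implementation: the function name `align_samples` and the platform names are hypothetical, and the reports are assumed to be plain lists of sample names.

```python
def align_samples(reported_ids):
    """Return the sorted sample names that every platform reported.

    reported_ids maps a (hypothetical) platform name to the list of
    sample names that platform sent to the main server.
    """
    id_sets = [set(ids) for ids in reported_ids.values()]
    if not id_sets:
        return []
    # A sample is "common" only if it appears in every platform's report.
    return sorted(set.intersection(*id_sets))


# Hypothetical reports from two platforms.
reports = {"platform_1": ["A", "B", "C"], "platform_2": ["B", "C"]}
common_samples = align_samples(reports)
print(common_samples)  # ['B', 'C']
```

Sorting the result gives every platform the same sample order, which matters later when per-sample vectors from different platforms are combined row by row.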
In step 102, the main server determines a first vector according to the dimension sum of all the characteristics of the common sample of each service platform and a preset parameter.
The preset parameter represents the number of features that do not participate in the current node split of the longitudinal federated tree. The preset parameter in the disclosure is a hyper-parameter, i.e., a user-defined parameter that can be preset as required. The preset parameter is smaller than the sum of the dimensions of all the features of the common samples across all the service platforms, i.e., it is an integer in the interval [0, n-1], where n is the sum of the dimensions of all the features.
The first vector represents the degree to which each feature participates in the current node split of the longitudinal federated tree. Before the first vector is determined, each service platform reports the dimension of the features of its local common samples to the main server, and the main server sums the collected dimensions to obtain the sum of the dimensions of all the features of the common samples across all the service platforms. The first vector is then determined from this sum and the preset parameter. In some embodiments, determining the first vector comprises: generating a random vector that follows a normal distribution, the dimension of which equals the sum of the dimensions of all the features of the common samples across the service platforms; and setting to 0 as many elements of the random vector as the preset parameter indicates, to obtain the first vector. The number of 0 elements in the first vector is the number of features that do not participate in the current node split. A first vector determined in this way is random, so the degree to which each feature participates in the current node split is random, and every feature has a chance of being selected to participate in node splitting.
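A minimal sketch of how the first vector might be generated, assuming NumPy. The source does not say which elements are set to 0, so choosing the positions uniformly at random is an assumption here, as are the names `make_first_vector`, `total_dim` and `num_excluded`.

```python
import numpy as np


def make_first_vector(total_dim, num_excluded, rng):
    """Draw a standard-normal vector and zero out `num_excluded` entries.

    total_dim    -- sum of the feature dimensions over all platforms (n)
    num_excluded -- the preset parameter, an integer in [0, n - 1]
    """
    if not 0 <= num_excluded < total_dim:
        raise ValueError("num_excluded must lie in [0, total_dim - 1]")
    first = rng.standard_normal(total_dim)
    # Assumption: the zeroed positions are picked uniformly at random.
    zero_idx = rng.choice(total_dim, size=num_excluded, replace=False)
    first[zero_idx] = 0.0
    return first


rng = np.random.default_rng(0)
first_vector = make_first_vector(total_dim=8, num_excluded=3, rng=rng)
print(first_vector)
```

Features whose entry is 0 contribute nothing to the split value of the current node, while the remaining entries weight their features by a random amount.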
In step 103, the main server determines a global second vector according to the collected local second vectors of the service platforms.
The local second vector represents the distance between the feature vector of each common sample on one service platform and the corresponding feature reference vector. The global second vector represents the distance between the feature vector of each common sample on all the service platforms and the corresponding feature reference vectors. The feature reference vector represents a random split point in the value range of each feature of the common samples of a service platform.
Each service platform calculates its local second vector and reports it to the main server, and the main server determines the global second vector from the collected local second vectors. In some embodiments, a service platform calculates its local second vector as follows: the service platform determines the value range of each feature from the values that feature takes over the local common samples, and selects one value within each value range; if there are multiple features, the selected values together form the feature reference vector of the service platform. The service platform then calculates the distance between the feature vector of each common sample and the feature reference vector, and takes the resulting distance vector as its local second vector. In some embodiments, the main server determines the global second vector, for example, as the union of the collected local second vectors of the service platforms.
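The computation of the local and global second vectors can be sketched as follows, assuming NumPy. Two points are interpretations rather than statements from the source: the "distance" is taken to be the signed element-wise difference between a sample's features and the reference vector (a signed quantity is needed for the later comparison with 0), and the "union" is taken to be a concatenation along the feature axis with all platforms using the same sample order. The function names and the toy feature values are illustrative.

```python
import numpy as np


def local_second_vector(features, rng):
    """Platform side: features has shape (num_common_samples, num_local_features)."""
    features = np.asarray(features, dtype=float)
    low, high = features.min(axis=0), features.max(axis=0)
    # One random split point inside the value range of each local feature.
    reference = rng.uniform(low, high)
    # Assumption: "distance" is the signed per-feature difference.
    return features - reference, reference


def global_second_vector(local_vectors):
    """Server side. Assumption: "union" means concatenation per sample."""
    return np.concatenate(local_vectors, axis=1)


rng = np.random.default_rng(0)
platform_1 = [[0.3, 3.0], [0.2, 4.0]]  # two common samples, two local features
platform_2 = [[3.0], [4.0]]            # same two samples, one local feature
local_1, ref_1 = local_second_vector(platform_1, rng)
local_2, ref_2 = local_second_vector(platform_2, rng)
global_vec = global_second_vector([local_1, local_2])
print(global_vec.shape)  # (2, 3)
```

Only the difference vectors leave a platform in this sketch; the reference vector is returned here purely so the example can be inspected.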
In step 104, the main server calculates a product value of the first vector and the global second vector, and issues the product value to each service platform.
The main server calculates the product of the first vector and the global second vector of each common sample to obtain a product value for each common sample, and sends the product values to each service platform.
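One way to read the product value is as the inner product of the first vector with each common sample's row of the global second vector, which yields one scalar per sample. That reading is an assumption, and the numbers below are illustrative only.

```python
import numpy as np


def split_values(first_vector, global_second):
    """Return one scalar per common sample.

    first_vector  -- shape (n,), n = sum of feature dimensions
    global_second -- shape (num_common_samples, n)
    """
    first_vector = np.asarray(first_vector, dtype=float)
    global_second = np.asarray(global_second, dtype=float)
    if global_second.shape[1] != first_vector.shape[0]:
        raise ValueError("feature dimensions do not match")
    # Assumption: "product value" is the per-sample inner product.
    return global_second @ first_vector


first = [0.5, 0.0, -1.0]              # the second feature is excluded (weight 0)
global_second = [[0.1, -0.5, 0.2],    # first common sample
                 [-0.1, 0.5, -0.3]]   # second common sample
values = split_values(first, global_second)
print(values)
```

For these inputs the two split values work out to 0.05 - 0.2 = -0.15 and -0.05 + 0.3 = 0.25, so the two samples fall on opposite sides of 0.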
In step 105, each service platform uses the product value as a split value to split the nodes of the longitudinal federated tree, so as to construct the longitudinal federated tree.
The longitudinal federated tree is constructed, for example, as follows. First, the service platform initializes the root node of the longitudinal federated tree so that the root node includes all the common samples of the service platform. Then, the service platform splits the nodes of the longitudinal federated tree using the product value of the first vector and the global second vector as the split value of each common sample. For example, the current node is split according to the split value corresponding to each common sample, to determine the child node of the current node to which each common sample belongs. One splitting rule assigns a common sample to the right child node of the current node if its split value is less than 0, and to the left child node if its split value is greater than 0; the opposite rule assigns a common sample to the left child node if its split value is less than 0, and to the right child node if its split value is greater than 0. The splitting rule must stay the same throughout the construction of one longitudinal federated tree, but may differ between different longitudinal federated trees.
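The sign-based splitting rule can be sketched as below. The source does not say what happens when a split value is exactly 0, so sending such samples to the same child as positive values is an assumption; the function name and the `negative_goes_left` flag are hypothetical.

```python
def split_node(sample_ids, split_values, negative_goes_left=True):
    """Partition the samples of the current node by the sign of their split value.

    negative_goes_left selects one of the two rules described in the text and
    must stay fixed while a single tree is being built.
    """
    left, right = [], []
    for sample_id, value in zip(sample_ids, split_values):
        # Assumption: a value of exactly 0 is treated like a positive value.
        if (value < 0) == negative_goes_left:
            left.append(sample_id)
        else:
            right.append(sample_id)
    return left, right


print(split_node(["B", "C"], [-0.15, 0.25]))                            # (['B'], ['C'])
print(split_node(["B", "C"], [-0.15, 0.25], negative_goes_left=False))  # (['C'], ['B'])
```

Because every platform receives the same split values and applies the same rule, each platform ends up with the same partition of the common samples without exchanging feature values.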
Steps 102-105 are repeated until the preset termination condition is met. The termination condition comprises: the depth of the longitudinal federated tree reaches a preset depth; or the purity of the leaf nodes of the longitudinal federated tree, measured for example by the number of samples in a leaf node, reaches a preset number, for example the number of samples in every leaf node reaches the preset number, or the number of samples in a certain proportion of the leaf nodes reaches the preset number.
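A termination check along these lines might look as follows. The text does not specify in which direction the leaf sample count "reaches" the preset number; this sketch assumes splitting stops once every leaf holds at most that many samples. All names are hypothetical.

```python
def should_stop(depth, leaf_sizes, max_depth, leaf_size_target):
    """Return True when tree construction should end.

    depth            -- current depth of the tree
    leaf_sizes       -- number of samples in each current leaf node
    max_depth        -- the preset depth
    leaf_size_target -- the preset number of samples per leaf
    """
    if depth >= max_depth:
        return True
    # Assumption: "reaches the preset number" means no leaf exceeds it.
    return all(size <= leaf_size_target for size in leaf_sizes)


print(should_stop(depth=3, leaf_sizes=[10, 12], max_depth=3, leaf_size_target=2))  # True
print(should_stop(depth=1, leaf_sizes=[10, 1], max_depth=3, leaf_size_target=2))   # False
```

The variant in which only a certain proportion of leaves must meet the target could be obtained by replacing `all(...)` with a comparison of the fraction of qualifying leaves against a threshold.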
In addition, a plurality of longitudinal federated trees may be constructed using the method of constructing a longitudinal federated tree described above, thereby generating a longitudinal federated forest.
In this embodiment, the main server generates the first vector according to the preset parameter and the sum of the feature dimensions of the common samples collected from the service platforms, generates the global second vector according to the local second vectors collected from the service platforms, and sends the product value of the first vector and the global second vector to each service platform. Each service platform then splits the nodes of the longitudinal federated tree using the product value as the split value, and the longitudinal federated tree model is finally constructed. This avoids the risk of exposing the original data of the service platforms or the data distribution information of the samples, and enhances the security and privacy of the data.
In addition, the data transmitted between the main server and the service platforms, such as the sample names, the feature dimensions, the local second vectors and the two random vectors (the first vector and the global second vector), does not involve the privacy of the service platforms' local data and can therefore be transmitted without encryption, which saves encryption processing time and improves the efficiency of constructing the longitudinal federated tree model.
In addition, during construction of the longitudinal federated tree, all the features of the samples have a chance to participate in node splitting together. Compared with having only one feature participate in each node split, this helps reduce the time complexity of constructing the longitudinal federated tree model and improves the stability of its performance.
The method of this embodiment can be applied to scenarios that classify data sets in which multiple service platforms hold the same service samples but different features of those samples.
For example, a longitudinal federated tree may be applied to evaluating user credit. In this service scenario, multiple service platforms holding user samples whose credit is to be evaluated participate in building the longitudinal federated tree, and the user samples held by every service platform serve as the common user samples. The credit features of a common user sample may differ from one service platform to another. The depth in the longitudinal federated tree of the node where a common user sample is located during splitting corresponds to the credit information of that sample, and the correspondence can be set according to the specific service scenario. For example, when evaluating whether the credit of a user sample is abnormal or normal, the greater the depth of the node where the user sample is located, the lower the probability that its credit is abnormal and the higher the probability that it is normal.
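To illustrate how node depth could be read off a finished tree, the sketch below walks a tree stored as nested dictionaries and records the depth of the leaf that holds each user sample. The dictionary layout (`left`, `right`, `samples`) and the sample names are hypothetical and are not taken from the source.

```python
def sample_depths(node, depth=0, out=None):
    """Map each sample name to the depth of the leaf node that contains it."""
    out = {} if out is None else out
    if "samples" in node:  # leaf node
        for sample_id in node["samples"]:
            out[sample_id] = depth
    else:                  # internal node with two children
        sample_depths(node["left"], depth + 1, out)
        sample_depths(node["right"], depth + 1, out)
    return out


# Hypothetical tree: U7 is separated at the first split, the rest one level deeper.
tree = {
    "left": {"samples": ["U7"]},
    "right": {
        "left": {"samples": ["U1", "U2"]},
        "right": {"samples": ["U3"]},
    },
}
depths = sample_depths(tree)
print(depths)  # {'U7': 1, 'U1': 2, 'U2': 2, 'U3': 2}
```

Under the correspondence described above, U7, which sits at the shallowest depth, would be regarded as the sample most likely to have abnormal credit.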
For example, suppose the credit of user samples is to be evaluated. E-commerce institution A has 100 users, denoted U1-U100, and its service platform A records the online shopping records of user samples U1-U100, i.e., credit features such as online shopping amount and online shopping frequency. Financial institution B has 80 users, denoted U1-U80, and its service platform B records the financial transaction records of user samples U1-U80, i.e., credit features such as transfer amount, card swiping consumption frequency, credit card amount, credit card repayment record and payroll. Users U1-U80 are recorded on both service platform A and service platform B, so the main server can select user samples U1-U80 as the common user samples of the two platforms. The dimension of the credit features (online shopping amount and online shopping frequency) of the common user samples on service platform A is 2. The dimension of the credit features (transfer amount, card swiping consumption frequency, credit card amount, credit card repayment record, payroll line) of the common user samples on service platform B is 6. The main server determines that the sum of the dimensions of all the features of the common samples is 8 (i.e., 2 + 6 = 8).
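The numbers in this example can be checked with a few lines of Python. The variable names are illustrative; the user counts and feature dimensions are the ones stated in the text.

```python
# Users held by each institution, as stated in the example.
users_a = {f"U{i}" for i in range(1, 101)}  # U1-U100 on service platform A
users_b = {f"U{i}" for i in range(1, 81)}   # U1-U80 on service platform B

# Common user samples are those held by both platforms.
common_users = users_a & users_b

# Feature dimensions reported by each platform, as stated in the example.
feature_dims = {"A": 2, "B": 6}
total_dim = sum(feature_dims.values())

print(len(common_users), total_dim)  # 80 8
```

The value `total_dim` is the length of the first vector that the main server generates, and the preset parameter must then lie in [0, 7].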
The main server generates a first vector (representing the degree to which each credit feature participates in the current node split of the longitudinal federated tree) according to the sum of the dimensions of all the credit features of the common user samples of service platforms A and B and a preset parameter (representing the number of credit features that do not participate in the current node split of the longitudinal federated tree). The main server generates a global second vector according to the local second vectors reported by service platforms A and B, where the global second vector represents the distance between the credit feature vector of each common user sample on each of service platforms A and B and the corresponding credit feature reference vector, and the credit feature reference vector represents a random split point in the value range of each credit feature of the common user samples. The main server then issues the product value of the first vector and the global second vector to each service platform. For the specific calculation of the first vector and the global second vector, see the description of the embodiments shown in Fig. 1 and Fig. 2. Service platform A and service platform B can each construct a longitudinal federated tree for evaluating user credit by using the local common user samples and the product value sent by the main server. The depth in the longitudinal federated tree of the node where a user sample is located corresponds to credit information indicating whether the credit of the user sample is normal or abnormal: the greater the depth of the node where the user sample is located, the lower the probability that its credit is abnormal and the higher the probability that it is normal.
Fig. 2 shows a flow diagram of a method of building a longitudinal federated tree in accordance with further exemplary embodiments of the present disclosure.
As shown in fig. 2, the method of this embodiment comprises steps 201-209.
In step 201, the main server performs sample alignment on the original samples of the respective service platforms, and determines the aligned original samples as the common samples owned by the respective service platforms. For the specific method, refer to step 101.
For example, the user sample set of service platform 1 is:
User sample | Credit feature 1 (feature1) | Credit feature 2 (feature2)
A | 0.1 | 1
B | 0.3 | 3
C | 0.2 | 4
The user sample set of service platform 2 is:
User sample | Credit feature 3 (feature3)
B | 3
C | 4
Service platform 1 reports the user sample names A, B, and C to the main server, and service platform 2 reports the user sample names B and C. The main server performs sample alignment on the original samples of service platforms 1 and 2 and determines that user samples B and C are the common samples of the two service platforms.
After the common samples are determined, the sample set actually used by each service platform for constructing the longitudinal federated tree model is as follows.
The user sample set of service platform 1 is:
User sample | Credit feature 1 (feature1) | Credit feature 2 (feature2)
B | 0.3 | 3
C | 0.2 | 4
The user sample set of service platform 2 is:
User sample | Credit feature 3 (feature3)
B | 3
C | 4
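The sample alignment of step 201 can be sketched as a set intersection over the sample names reported by each platform. This is only an illustration under the assumption that names are compared in plain form, as in the example above; the function name `align_samples` and the dictionary layout are hypothetical.

```python
def align_samples(reported_ids):
    """Return the sorted sample IDs reported by every platform.

    reported_ids maps a platform name to the sample IDs it reports.
    """
    id_sets = [set(ids) for ids in reported_ids.values()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


# Original sample sets from the example (sample name -> local features).
platform_1 = {"A": [0.1, 1], "B": [0.3, 3], "C": [0.2, 4]}
platform_2 = {"B": [3], "C": [4]}

common = align_samples({
    "platform_1": platform_1.keys(),
    "platform_2": platform_2.keys(),
})

# Each platform keeps only the rows of the common samples.
aligned_1 = {sid: platform_1[sid] for sid in common}
aligned_2 = {sid: platform_2[sid] for sid in common}
```

Only sample names are exchanged in this sketch; the feature values stay on the platform that owns them.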
In step 202, each service platform initializes a root node of the longitudinal federated tree. The root node on each service platform is initialized to include all local common samples on that service platform, and node splitting then proceeds level by level with the root node as the starting node.
For example, the initialized root nodes of service platform 1 and service platform 2 both include the common user samples B and C.
In step 203, each service platform reports the feature dimension (i.e., local feature dimension) of each local common sample to the main server, and the main server determines a first vector according to a preset parameter and the sum of the local feature dimensions.
For the preset parameter, the meaning of the first vector, and the specific method of determining the first vector, refer to step 102.
For example, the local feature dimension of the common samples of service platform 1 is 2 (feature1 and feature2), and the local feature dimension of the common samples of service platform 2 is 1 (feature3), so the dimension sum of all local features of the common samples of service platforms 1 and 2 is 3 (i.e., 2 + 1 = 3). A 3-dimensional random vector [n0, n1, n2], for example [-1, 1, 0], is generated based on a normal distribution. Assuming that the preset parameter is 0, that is, all features participate in the node split, the 3-dimensional random vector is used directly as the first vector. Denoting the first vector as n, n = [-1, 1, 0].
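The generation of the first vector can be sketched as follows, using only the Python standard library. The function name `make_first_vector` is hypothetical, and the choice of which elements to zero (a uniformly random subset) is an assumption: the source states only how many elements are set to 0, not which ones.

```python
import random


def make_first_vector(local_dims, preset, rng=None):
    """Draw a normally distributed vector and zero `preset` of its elements.

    local_dims: feature dimension reported by each platform.
    preset: number of features that do not take part in this split.
    """
    rng = rng or random.Random()
    total = sum(local_dims)
    # The preset parameter must be smaller than the dimension sum.
    if not 0 <= preset < total:
        raise ValueError("preset must satisfy 0 <= preset < sum(local_dims)")
    n = [rng.gauss(0.0, 1.0) for _ in range(total)]
    # Assumption: the zeroed positions are picked uniformly at random.
    for i in rng.sample(range(total), preset):
        n[i] = 0.0
    return n


# Platforms 1 and 2 report dimensions 2 and 1; all features participate.
n = make_first_vector([2, 1], preset=0, rng=random.Random(0))
```

With `preset=0` the random vector is returned unchanged, which matches the case described above where the random vector equals the first vector.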
In step 204, each service platform calculates a local second vector and reports the local second vector to the main server, and the main server splices the local second vectors corresponding to each service platform to form a global second vector.
For the meaning of the local second vector and the global second vector, and the specific methods by which each service platform calculates its local second vector and the main server determines the global second vector, refer to step 103.
For example, since the feature dimension of the common samples of service platform 1 is 2, the feature reference vector p1 corresponding to service platform 1 is a 2-dimensional vector, and the values of its two elements are determined by the value ranges of feature1 and feature2 on service platform 1, respectively. For example, the value range of feature1 on service platform 1 is [0.2, 0.3] and the value range of feature2 is [3, 4]; the first element of p1 is a value selected from the interval [0.2, 0.3], for example 0.25, and the second element of p1 is a value selected from the interval [3, 4], for example 3.8, so the feature reference vector of service platform 1 is p1 = [0.25, 3.8]. In the same way, since the feature dimension of the common samples of service platform 2 is 1, the feature reference vector p2 corresponding to service platform 2 is a 1-dimensional vector whose single element is determined by the value range of feature3 on service platform 2; for example, the value 3.5 is selected from the value range [3, 4] of feature3, so the feature reference vector of service platform 2 is p2 = [3.5].
For common user sample B, the distance between its feature vector X_{B-1} on service platform 1 and the feature reference vector p1 corresponding to service platform 1 (i.e., the local second vector of common user sample B on service platform 1) is X_{B-1} - p1 = [0.3, 3] - [0.25, 3.8] = [0.05, -0.8]. The distance between its feature vector X_{B-2} on service platform 2 and the feature reference vector p2 corresponding to service platform 2 (i.e., the local second vector of common user sample B on service platform 2) is X_{B-2} - p2 = [3] - [3.5] = [-0.5]. The main server splices the collected local second vectors of common user sample B on service platform 1 and on service platform 2 to form the three-dimensional global second vector of common user sample B, denoted X_B - p, which is [0.05, -0.8, -0.5].
The calculation method of the global second vector of the common user sample C is the same as that of the global second vector of the common user sample B, and is not described herein again.
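The calculation of step 204 can be sketched for both common user samples. The function names are hypothetical, and the fixed platform order used when splicing is an assumption needed so that each position of the global second vector lines up with the same position of the first vector.

```python
import random


def random_reference_vector(value_ranges, rng):
    """Pick one random split point inside each feature's value range."""
    return [rng.uniform(low, high) for low, high in value_ranges]


def local_second_vector(features, reference):
    """Element-wise difference between a sample's features and the reference."""
    if len(features) != len(reference):
        raise ValueError("feature and reference vectors differ in length")
    return [x - p for x, p in zip(features, reference)]


def global_second_vector(local_vectors):
    """Splice local second vectors in a fixed platform order (assumed)."""
    return [value for local in local_vectors for value in local]


# Reference vectors fixed to the values used in the example.
p1, p2 = [0.25, 3.8], [3.5]

# Common user sample B: [0.3, 3] on platform 1 and [3] on platform 2.
b_global = global_second_vector(
    [local_second_vector([0.3, 3], p1), local_second_vector([3], p2)]
)
# Common user sample C: [0.2, 4] on platform 1 and [4] on platform 2.
c_global = global_second_vector(
    [local_second_vector([0.2, 4], p1), local_second_vector([4], p2)]
)

# In a real run each platform would draw its reference vector at random.
p1_random = random_reference_vector([(0.2, 0.3), (3, 4)], random.Random(0))
```

Up to floating-point rounding, `b_global` is [0.05, -0.8, -0.5] as in the text, and the same procedure gives approximately [-0.05, 0.2, 0.5] for sample C.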
In step 205, the primary server calculates a product value of the global second vector and the first vector corresponding to all common samples of the current node to be split, and issues the product value to each service platform, so that each service platform performs the splitting of the node by using the product value as a split value.
For example, if the three-dimensional global second vector of common user sample B is (X_B - p) = [0.05, -0.8, -0.5] and the first vector is n = [-1, 1, 0], then the product value of the global second vector and the first vector of common user sample B is (X_B - p) * n = 0.05 * (-1) + (-0.8) * 1 + (-0.5) * 0 = -0.85.
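The product value is an inner product of two vectors of equal length, which the following sketch reproduces for common user sample B. The function name `split_value` is hypothetical.

```python
def split_value(global_second, first_vector):
    """Inner product of the global second vector and the first vector."""
    if len(global_second) != len(first_vector):
        raise ValueError("vectors must have the same dimension")
    return sum(d * w for d, w in zip(global_second, first_vector))


# Values from the example for common user sample B.
value_b = split_value([0.05, -0.8, -0.5], [-1, 1, 0])
```

Here `value_b` is approximately -0.85; exact equality should not be expected because of floating-point rounding.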
In step 206, each service platform splits the node by using the product value issued by the main server, and stores the parameters of the current node, such as the first vector, the local second vector, and the current depth of the longitudinal federated tree.
For example, the product value of the three-dimensional global second vector and the first vector of common user sample B is (X_B - p) * n = 0.05 * (-1) + (-0.8) * 1 + (-0.5) * 0 = -0.85. If the splitting rule assigns a sample whose product value is less than 0 to the right child node, then, since -0.85 < 0, common user sample B is assigned to the right child node.
Fig. 3 shows a schematic diagram of the service platforms using the first vector and the second vector to split nodes of the longitudinal federated tree, according to some exemplary embodiments of the present disclosure. Suppose there are m service platforms, denoted service platform 1, service platform 2, ..., service platform m, and suppose each service platform holds one common sample whose feature vectors on the respective platforms are denoted x1, x2, ..., xm. Each service platform reports the local feature dimension of the common sample to the main server; the main server obtains the sum of the local feature dimensions of the common sample over all service platforms and determines a first vector (denoted n) according to that sum and a preset parameter. The service platforms report their respective local second vectors to the main server, that is, the distances between the feature vector of the common sample on each service platform and the corresponding local feature reference vector, denoted x1-p1, x2-p2, ..., xm-pm. The main server splices the local second vectors reported by the service platforms to form the global second vector, that is, the distance between the feature vectors of the common sample on all service platforms and the corresponding feature reference vectors, denoted x-p. The main server issues the product value (x-p) * n of the first vector and the global second vector corresponding to the common sample to each service platform. According to the splitting rule used in this example, each service platform assigns a sample whose value (x-p) * n is greater than 0 to the right child node and a sample whose value (x-p) * n is less than 0 to the left child node.
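The splitting rule can be sketched as a partition of the samples in the current node by the sign of their product values. The function name `partition` and the flag `positive_goes_right` are hypothetical. Sending a value of exactly 0 to the left child is an assumption, since the source does not say how that case is handled. The value 0.25 for sample C is computed from the example data with p1 = [0.25, 3.8], p2 = [3.5] and n = [-1, 1, 0].

```python
def partition(split_values, positive_goes_right=True):
    """Split sample IDs into (left, right) by the sign of their split value.

    split_values maps a sample ID to the product value issued by the
    main server. positive_goes_right selects which of the two mirrored
    rules is applied.
    """
    left, right = [], []
    for sample_id, value in split_values.items():
        if positive_goes_right:
            goes_right = value > 0
        else:
            goes_right = value < 0
        # Assumption: a value of exactly 0 falls into the left child.
        (right if goes_right else left).append(sample_id)
    return left, right


values = {"B": -0.85, "C": 0.25}
left, right = partition(values)  # rule: greater than 0 goes right
mirrored = partition(values, positive_goes_right=False)
```

Because every service platform receives the same product values and applies the same rule, each platform arrives at the same left and right child membership without seeing the other platforms' features.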
Therefore, in the process of constructing the longitudinal federated tree, all the characteristics of the sample have the opportunity to participate in node splitting together, and compared with the situation that only one characteristic participates in node splitting at each time, the method is beneficial to reducing the time complexity of constructing the longitudinal federated tree model and improving the performance stability of the longitudinal federated tree model.
In step 207, at the next node of the longitudinal federated tree, the service platform determines whether the termination condition is satisfied; the result can be notified to the main server explicitly or implicitly. If the termination condition is not satisfied, steps 202-206 are repeated. If the termination condition is satisfied, the method proceeds to step 208, in which the current node is stored as a leaf node of the longitudinal federated tree and splitting is terminated.
The termination condition may be, for example, that the depth of the longitudinal federated tree reaches a preset tree depth, or that the number of samples in a leaf node reaches a preset number of samples.
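The two example termination conditions can be combined into a single check, sketched below. The names `should_terminate`, `max_depth` and `min_samples` are hypothetical, and reading "reaches a preset number of samples" as "has dropped to that number or fewer" is an assumption.

```python
def should_terminate(depth, sample_count, max_depth, min_samples):
    """Return True when the current node should become a leaf.

    depth: depth of the current node (root is 0).
    sample_count: number of common samples in the current node.
    """
    reached_depth = depth >= max_depth
    # Assumption: "reaches" means the node holds min_samples or fewer.
    too_few_samples = sample_count <= min_samples
    return reached_depth or too_few_samples


keep_splitting = should_terminate(2, 10, max_depth=8, min_samples=1)
stop_by_depth = should_terminate(8, 10, max_depth=8, min_samples=1)
stop_by_count = should_terminate(2, 1, max_depth=8, min_samples=1)
```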
The implicit notification may be, for example, that the service platform continues to report to the main server the local information used for generating the first vector and the second vector; the explicit notification may be, for example, that the service platform directly notifies the main server of the judgment result (terminate or do not terminate).
If a federated forest needs to be built, then execution continues to step 209.
In step 209, steps 202-208 are repeated, and longitudinal federated trees are created iteratively on each service platform until a preset number of iterations is reached. The number of iterations is the number of longitudinal federated trees created, and the created longitudinal federated trees form a federated forest.
The federated forest integrates the weak models of the individual federated trees into a strong model: the result of the federated forest is determined jointly from the results of the individual federated trees, which improves the accuracy of the model.
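The overall tree and forest construction can be illustrated with a single-process sketch that collapses the main server and all service platforms into one program, so it shows the splitting logic only and none of the federated communication. All names are hypothetical. Averaging depths over the trees is one assumed way of combining the trees' results, and turning a node into a leaf when one child would be empty is also an assumption.

```python
import random


def _value(x, p, n):
    """Product value (x - p) * n for one sample."""
    return sum((xi - pi) * ni for xi, pi, ni in zip(x, p, n))


def build_tree(samples, depth, max_depth, min_samples, preset, rng):
    """samples maps a sample ID to its full (spliced) feature vector."""
    if depth >= max_depth or len(samples) <= min_samples:
        return {"leaf": True, "depth": depth, "ids": sorted(samples)}
    dim = len(next(iter(samples.values())))
    n = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i in rng.sample(range(dim), preset):
        n[i] = 0.0
    # Random split point inside the value range of each feature.
    columns = list(zip(*samples.values()))
    p = [rng.uniform(min(col), max(col)) for col in columns]
    left, right = {}, {}
    for sid, x in samples.items():
        (right if _value(x, p, n) > 0 else left)[sid] = x
    if not left or not right:
        # Assumption: stop if the split did not separate any sample.
        return {"leaf": True, "depth": depth, "ids": sorted(samples)}
    args = (depth + 1, max_depth, min_samples, preset, rng)
    return {"leaf": False, "n": n, "p": p,
            "left": build_tree(left, *args),
            "right": build_tree(right, *args)}


def depth_of(tree, x):
    """Depth of the leaf that the feature vector x falls into."""
    while not tree["leaf"]:
        side = "right" if _value(x, tree["p"], tree["n"]) > 0 else "left"
        tree = tree[side]
    return tree["depth"]


def build_forest(samples, n_trees, max_depth, min_samples, preset, seed=0):
    rng = random.Random(seed)
    return [build_tree(samples, 0, max_depth, min_samples, preset, rng)
            for _ in range(n_trees)]


def mean_depth(forest, x):
    """Assumed aggregation: average leaf depth over all trees."""
    return sum(depth_of(tree, x) for tree in forest) / len(forest)


samples = {"B": [0.3, 3, 3], "C": [0.2, 4, 4]}
forest = build_forest(samples, n_trees=5, max_depth=3, min_samples=1, preset=0)
score_b = mean_depth(forest, samples["B"])
```

In the federated setting the spliced feature vectors used here never exist in one place; each platform would hold only its own columns and receive the product values from the main server.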
Fig. 4 shows a schematic diagram of a main server for constructing a longitudinal federated tree, in accordance with some exemplary embodiments of the present disclosure.
As shown in fig. 4, the main server 400 for constructing a longitudinal federated tree in this embodiment includes: a memory 401 and a processor 402 coupled to the memory 401, the processor 402 being configured to execute, based on instructions stored in the memory 401, the method by which a main server constructs a longitudinal federated tree in any of the foregoing embodiments.
The method by which the main server 400 constructs a longitudinal federated tree includes, for example, the following. The main server 400 collects the dimensions of all features of the common samples of each service platform, and determines a first vector according to the sum of the collected dimensions and a preset parameter, where the preset parameter indicates the number of features that do not participate in the current node split of the longitudinal federated tree, and the first vector indicates the degree to which each feature participates in the current node split. The main server 400 collects the local second vectors of the service platforms and determines the union of the local second vectors of all the service platforms as a global second vector, where a local second vector represents the distance between the feature vector of each common sample on one service platform and the corresponding feature reference vector, the global second vector represents the distance between the feature vectors of each common sample on all the service platforms and the corresponding feature reference vectors, and a feature reference vector represents a random split point within the value range of each feature of the common samples of a service platform. The main server 400 then calculates the product value of the first vector and the global second vector and issues the product value to each service platform, so that each service platform splits the nodes of the longitudinal federated tree.
The memory 401 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The main server 400 for constructing a longitudinal federated tree may also include an input/output interface 403, a network interface 404, a storage interface 405, and the like. The interfaces 403, 404, and 405, the memory 401, and the processor 402 may be connected, for example, by a bus 406. The input/output interface 403 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 404 provides a connection interface for various networking devices. The storage interface 405 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
Fig. 5 shows a schematic diagram of a service platform for constructing a longitudinal federated tree, in accordance with further exemplary embodiments of the present disclosure.
As shown in fig. 5, the service platform 500 for constructing a longitudinal federated tree in this embodiment includes: a memory 501 and a processor 502 coupled to the memory 501, wherein the processor 502 is configured to execute, based on instructions stored in the memory 501, the method by which a service platform constructs a longitudinal federated tree in any of the foregoing embodiments.
The method by which the service platform 500 constructs a longitudinal federated tree includes, for example, the following. The service platform 500 calculates a local second vector and reports it to the main server, so that the main server determines a global second vector according to the collected local second vectors of the service platforms, where a local second vector represents the distance between the feature vector of each common sample on one service platform and the corresponding feature reference vector, and a feature reference vector represents a random split point within the value range of each feature of the common samples of that service platform. The service platform 500 then receives the product value of a first vector and the global second vector issued by the main server, where the first vector represents the degree to which each feature participates in the current node split of the longitudinal federated tree, and the global second vector represents the distance between the feature vectors of each common sample on all the service platforms and the corresponding feature reference vectors. Next, the service platform 500 splits the nodes of the longitudinal federated tree by using the product value of the first vector and the global second vector as the split value of each common sample. The service platform 500 repeats all of the above steps until a preset termination condition is met, at which point the construction of the longitudinal federated tree is complete.
The memory 501 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The service platform 500 for constructing a longitudinal federated tree may further include an input/output interface 503, a network interface 504, a storage interface 505, and the like. The interfaces 503, 504, and 505, the memory 501, and the processor 502 may be connected, for example, by a bus 506. The input/output interface 503 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 504 provides a connection interface for various networking devices. The storage interface 505 provides a connection interface for external storage devices such as an SD card and a USB flash drive.
Fig. 6 shows a schematic diagram of a system for building a longitudinal federated tree, in accordance with some exemplary embodiments of the present disclosure.
As shown in fig. 6, the system 600 for constructing a longitudinal federated tree in this embodiment includes: the aforementioned main server 400 and a plurality of the aforementioned service platforms 500.
The main server 400 is configured to receive local data (such as local sample names, local feature dimensions, or local second vectors) transmitted by each service platform 500, generate global data (such as the product value of the first vector and the global second vector), and issue the global data to each service platform 500, so that each service platform 500 locally completes the construction of the longitudinal federated tree on that platform by using the global data issued by the main server 400. The service platform 500 is configured to report the aforementioned local data to the main server 400, receive the product value of the first vector and the global second vector issued by the main server 400, and then use that product value to split all samples on the platform until the termination condition is met, thereby completing the construction of the longitudinal federated tree.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (19)

1. A method for constructing a longitudinal federated tree is characterized by comprising the following steps:
the main server determines a first vector according to the dimension sum of all the features of the common sample of each service platform and a preset parameter, wherein the preset parameter represents the number of the features which do not participate in the node splitting of the longitudinal federated tree at this time, and the first vector represents the degree of participation of each feature in the node splitting of the longitudinal federated tree at this time;
the main server determines a global second vector according to the collected local second vectors of the service platforms, wherein the local second vectors represent the distance between the feature vector of each common sample on one service platform and the corresponding feature reference vector, the global second vectors represent the distance between the feature vectors of all the service platforms and the corresponding feature reference vector, and the feature reference vector represents a random split point in a feature value range of each feature of each common sample of each service platform;
the main server calculates a product value of the first vector and the global second vector, and issues the product value to each service platform, so that each service platform uses the product value as a split value to split the nodes of the longitudinal federated tree;
and repeating all the steps until a preset termination condition is met.
2. The method of constructing a longitudinal federated tree of claim 1, wherein determining a first vector comprises:
generating a random vector which accords with normal distribution, wherein the dimensionality of the random vector is equal to the sum of the dimensionalities of all the characteristics of the common sample of each service platform;
and setting the value of the corresponding number of elements indicated by the preset parameters in the random vector to be 0 to obtain the first vector.
3. The method of constructing a vertical federated tree of claim 1, wherein determining a global second vector comprises:
and determining the union of the local second vectors of all the service platforms as a global second vector.
4. The method of constructing a longitudinal federated tree of claim 1, further comprising:
and the main server performs sample alignment on the original samples of the service platforms, and determines the aligned original samples as common samples of the service platforms.
5. The method for constructing a longitudinal federated tree according to claim 1, characterized in that, the master server determines the dimension sum of all the features of the common samples of each business platform according to the dimension of the features of the collected common samples of each business platform.
6. The method of constructing a longitudinal federated tree of claim 1, wherein the preset parameter is less than a sum of dimensions of all features that a common sample of all business platforms has.
7. A method for constructing a longitudinal federated tree according to any one of claims 1 to 6, wherein the longitudinal federated tree is used for evaluating user credits, the respective service platforms include a plurality of service platforms that possess user samples of credits to be evaluated, the common sample is a common user sample, the common user sample is a user sample that the respective service platforms commonly possess, and the longitudinal federated tree depth information of a node where the common user sample is located in a node splitting process corresponds to the credit information of the common user sample.
8. A method of constructing a longitudinal federated tree, comprising:
the service platforms calculate local second vectors, and report the local second vectors to a main server, so that the main server determines a global second vector according to the collected local second vectors of the service platforms, wherein the local second vectors represent the distance between the feature vector of each common sample on one service platform and a corresponding feature reference vector, the global second vectors represent the distance between the feature vectors of all the service platforms and the corresponding feature reference vector, and the feature reference vector represents a random split point in a feature value range of each feature of the common samples of the service platforms;
the service platform receives a product value of a first vector and a global second vector sent by a main server, wherein the first vector represents the degree of each feature participating in the node splitting of the longitudinal federated tree;
the service platform uses the product value of the first vector and the global second vector as the splitting value of each common sample to split the nodes of the longitudinal federated tree so as to construct the longitudinal federated tree;
and the service platform repeatedly executes all the steps until a preset termination condition is met.
9. The method of constructing a longitudinal federated tree of claim 8, further comprising:
and the service platform receives a common sample which is sent by the main server and is determined after the samples of the service platforms are aligned.
10. The method of constructing a longitudinal federated tree of claim 8, wherein the business platform performing a split of a node of the longitudinal federated tree using a product value of the first vector and the global second vector as a split value for each common sample comprises:
and splitting a current node according to the split value corresponding to each common sample so as to determine the child nodes of the current node to which the common samples belong.
11. The method for constructing a longitudinal federated tree according to claim 8, wherein splitting a current node according to the split value corresponding to each common sample includes:
if the splitting value corresponding to the common sample is smaller than 0, the common sample is divided into the right child node of the current node, and if the splitting value corresponding to the common sample is larger than 0, the common sample is divided into the left child node of the current node;
or if the split value corresponding to the common sample is less than 0, the common sample is divided into the left child node of the current node, and if the split value corresponding to the common sample is greater than 0, the common sample is divided into the right child node of the current node.
12. The method of constructing a longitudinal federated tree of claim 8, wherein the termination condition comprises:
the depth of the longitudinal federated tree reaches a preset depth;
or, the number of samples in a leaf node of the longitudinal federated tree reaches a preset number.
13. The method of constructing a longitudinal federated tree of claim 8, further comprising:
constructing a plurality of longitudinal federated trees to generate a longitudinal federated forest using the method of constructing a longitudinal federated tree of claim 8.
14. The method of constructing a longitudinal federated tree of claim 8, further comprising:
the business platform initializes a root node of a longitudinal federated tree such that the root node includes all common samples of the business platform.
15. A method for constructing a longitudinal federated tree according to any one of claims 8 to 14, wherein the longitudinal federated tree is used for evaluating user credits, the respective service platforms include a plurality of service platforms possessing user samples of credits to be evaluated, the common sample is a common user sample, the common user sample is a user sample commonly owned by the respective service platforms, and depth information of a longitudinal federated tree of a node where the common user sample is located in a node splitting process corresponds to credit information of the common user sample.
16. A main server for constructing a longitudinal federated tree, comprising:
a memory; and
a processor coupled to the memory, the processor configured to execute the method of building a longitudinal federated tree of any of claims 1-7 based on instructions stored in the memory.
17. A business platform for building a longitudinal federated tree, comprising:
a memory; and
a processor coupled to the memory, the processor configured to execute the method of building a longitudinal federated tree of any of claims 8-15 based on instructions stored in the memory.
18. A system for constructing a longitudinal federated tree comprising:
a main server as claimed in claim 16 and a plurality of service platforms as claimed in claim 17.
19. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of constructing a longitudinal federated tree of any one of claims 1-15.
CN202010174360.9A 2020-03-13 2020-03-13 Method for constructing longitudinal federal tree, main server, service platform and system Active CN113392164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010174360.9A CN113392164B (en) 2020-03-13 2020-03-13 Method for constructing longitudinal federal tree, main server, service platform and system

Publications (2)

Publication Number Publication Date
CN113392164A true CN113392164A (en) 2021-09-14
CN113392164B CN113392164B (en) 2024-01-12

Family

ID=77615861

Country Status (1)

Country Link
CN (1) CN113392164B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160180245A1 (en) * 2014-12-19 2016-06-23 Medidata Solutions, Inc. Method and system for linking heterogeneous data sources
CN109002861A (en) * 2018-08-10 2018-12-14 深圳前海微众银行股份有限公司 Federal modeling method, equipment and storage medium
CN109165683A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Sample predictions method, apparatus and storage medium based on federation's training
WO2020029590A1 (en) * 2018-08-10 2020-02-13 深圳前海微众银行股份有限公司 Sample prediction method and device based on federated training, and storage medium
CN110084377A (en) * 2019-04-30 2019-08-02 京东城市(南京)科技有限公司 Method and apparatus for constructing decision tree
CN110633805A (en) * 2019-09-26 2019-12-31 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
魏雅婷; 王智勇; 周舒悦; 陈为: "Federated visualization: a new privacy-preserving visualization model", Chinese Journal of Intelligent Science and Technology, no. 04 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118641A (en) * 2022-01-29 2022-03-01 华控清交信息科技(北京)有限公司 Wind power plant power prediction method, GBDT model longitudinal training method and device
CN114118641B (en) * 2022-01-29 2022-04-19 华控清交信息科技(北京)有限公司 Wind power plant power prediction method, GBDT model longitudinal training method and device

Also Published As

Publication number Publication date
CN113392164B (en) 2024-01-12

Similar Documents

Publication Publication Date Title
US11526333B2 (en) Graph outcome determination in domain-specific execution environment
US20210073282A1 (en) Graph-manipulation based domain-specific execution environment
CN109347651B (en) MSVL (modeling, simulation and verification language) -based block chain system modeling and security verification method and system
KR20200088766A (en) Distributed multi-party security model training framework for privacy protection
CN108011741B (en) Method and system for simulating and testing blockchains of distributed networks
CN110874648A (en) Federal model training method and system and electronic equipment
CN112597240B (en) Federal learning data processing method and system based on alliance chain
CN109583731A (en) A kind of Risk Identification Method, device and equipment
De Collibus et al. Heterogeneous preferential attachment in key ethereum-based cryptoassets
CN113392164B (en) Method for constructing longitudinal federal tree, main server, service platform and system
CN104050291A (en) Parallel processing method and system for account balance data
CN113392101A (en) Method, main server, service platform and system for constructing horizontal federated tree
CN111178678B (en) Network node importance evaluation method based on community influence
CN117094773A (en) Online migration learning method and system based on blockchain privacy calculation
CN112131587A (en) Intelligent contract pseudo-random number security inspection method, system, medium and device
CN109993338B (en) Link prediction method and device
WO2020201830A1 (en) Systems and methods for generating, monitoring, and analyzing event networks from event data
CN104965923B (en) A kind of cloud computing application platform construction method for generating cash flow statement
Zhang et al. Parallel option pricing with BSDEs method on MapReduce
CN111882415A (en) Training method and related device of quality detection model
CN111882416A (en) Training method and related device of risk prediction model
US20230334333A1 (en) Methods, apparatuses, and systems for training model by using multiple data owners
Marmsoler et al. On the impact of architecture design decisions on the quality of blockchain-based applications
Kaur et al. Technologies Behind Crypto-Based Decentralized Finance
CN116055049B (en) Multiparty secure computing method, device, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant