CN112700031A - XGboost prediction model training method for protecting multi-party data privacy - Google Patents
- Publication number
- CN112700031A (application CN202011452494.9A)
- Authority
- CN
- China
- Prior art keywords
- participant
- vector
- col
- order gradient
- participants
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- General Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Economics (AREA)
- Bioethics (AREA)
- Game Theory and Decision Science (AREA)
- Tourism & Hospitality (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Development Economics (AREA)
- General Business, Economics & Management (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to an XGBoost prediction model training method that protects multi-party data privacy, involving a plurality of participants and a coordinator. The participant holding the labels first computes first-order and second-order gradient vectors and an indication vector from the current model's prediction results and the label values. Through secret sharing and with the assistance of the coordinator, the remaining participants jointly compute and construct a joint decision tree model based on the XGBoost algorithm, and the participants cooperate to determine the prediction results of the training data on that tree. Finally, all participants and the coordinator iterate together to build a plurality of joint decision trees, yielding a complete multi-party prediction model. Compared with the prior art, the method trains a cross-data-source multi-party XGBoost ensemble model on the premise of protecting data privacy, improving the predictive power of the model while ensuring data security.
Description
Technical Field
The invention relates to the technical field of machine learning, and in particular to an XGBoost prediction model training method that protects multi-party data privacy.
Background
The XGBoost algorithm is an ensemble learning algorithm known for fast model construction and accurate prediction, but it was designed for the setting in which all data features reside on a single machine. It therefore cannot handle the situation in which multiple parties each hold different features of the same batch of data samples and one party holds the label information but may not transmit it to the others.
To protect each party's data privacy, the machine learning field has adopted vertical federated learning, which aims to train a model whose accuracy approaches or equals that of a model trained with all the data on a single machine. However, current vertical federated learning algorithms mainly involve only two parties: they strictly limit the number of organizations that can cooperate and cannot easily be extended to an arbitrary number of parties, or they simplify or approximate the machine learning model to make multi-party cooperation feasible, causing a loss of precision in the computed results.
Disclosure of Invention
The invention aims to provide an XGBoost prediction model training method that protects multi-party data privacy, overcoming the prior art's obstacles to data interaction and its loss of data precision during cooperation.
The purpose of the invention can be realized by the following technical scheme:
An XGBoost prediction model training method for protecting multi-party data privacy involves a plurality of participants and a coordinator. The participant holding the labels first computes first-order and second-order gradient vectors and an indication vector from the current model's prediction results and the label values; through secret sharing and with the assistance of the coordinator, the remaining participants jointly compute and construct a joint decision tree model based on the XGBoost algorithm; the participants cooperate to determine the prediction results of the training data on that tree; finally, all participants and the coordinator iterate together to build a plurality of joint decision trees, yielding a complete multi-party prediction model.
The specific steps for training the joint decision tree model are as follows:
S1: The first participant sets the initial tree number t = 1, the initial depth d = 1, the regularization parameter λ and the maximum depth d_max. For N participants in total, λ is split by secret sharing into shards {λ}_i, and all set parameters are distributed to every participant i. Each participant i owning num_i features generates num_i random, non-repeating feature number indices. The first participant, which holds the labels, uses the current model's prediction result vector ŷ and the sample label vector y to compute the first-order gradient vector G and the second-order gradient vector H, and generates an initial all-ones indication vector S. G, H and S are each split by secret sharing into N shards — first-order gradient vector shards {G}_i, second-order gradient vector shards {H}_i and indication vector shards {S}_i — and distributed to every participant i, i = 1, …, N;
S2: Each participant i, having received {G}_i, {H}_i and {S}_i, computes its i-th shard of the first-order gradient sum, {SG}_i, and of the second-order gradient sum, {SH}_i, and uses secret sharing to compute the i-th numerator shard and i-th denominator shard of the split gain for every feature and every value interval. The coordinator determines the maximum split gain, the corresponding feature and value interval, and whether to split. If the selected feature belongs to participant i′, that participant generates the post-split left-subtree indication vector SL and right-subtree indication vector SR, which mark the samples of the left and right subsets obtained by dividing the sample set at the feature and value interval of the maximum split gain; the left and right subsets correspond to the left and right subtrees. SL and SR are split by secret sharing into N shards {SL}_i and {SR}_i, i = 1, …, N, and distributed to participant i. Each participant i combines the received {SL}_i and {SR}_i with its own indication vector shard {S}_i to compute the left-subtree first-order gradient vector shard {SGL}_i and second-order gradient vector shard {SHL}_i for the samples divided into the left subtree, and the right-subtree first-order gradient vector shard {SGR}_i and second-order gradient vector shard {SHR}_i for the samples divided into the right subtree. Step S2 is then performed recursively with {SGL}_i, {SHL}_i, {SL}_i to construct the left subtree and with {SGR}_i, {SHR}_i, {SR}_i to construct the right subtree, setting the depth d = d + 1. If no split is performed or the maximum depth d_max is reached, each participant i computes the i-th shard of the weight of the current leaf node σ of the decision tree;
S3, for each data sample xpEach participant i utilizes a sample of the held partial featuresCalculating the prediction result of the current t treeAccumulate to the results of the first t-1 trees to produce t trees for data sample xpIntegrated predicted results ofWhereinRepresenting the qth tree to the pth data sample xpThe result of the prediction of (a) is,to representThe p-th element, for a total of M data samples, traversal p 1, …, M yields the complete
S4: Increase the tree number t to t + 1 and iterate steps S1 to S3 until all T decision trees are built.
Further, the secret sharing algorithm used in steps S1, S2 and S3 is a method of splitting a piece of data θ into multiple shards {θ}_i and distributing them to different participants i. The participants perform the same type of computation, in the same steps, on their respective shards to produce {θ′}_i; after the computation, the result is recovered by additive combination, θ′ = Σ_i {θ′}_i, which is equivalent to performing the same computation directly on θ. The specific operations involved are the following:
a. secret sharing splitting
For one-dimensional data θ, when participant i performs secret sharing splitting for N participants in total, it generates N − 1 random numbers and assigns them as shards {θ}_{i′}, i′ ≠ i, for the other participants i′ to use; participant i keeps its own data shard {θ}_i = θ − Σ_{i′≠i} {θ}_{i′};
b. Secret sharing addition
For one-dimensional sharded data {θ_A}_1, …, {θ_A}_N and {θ_B}_1, …, {θ_B}_N, each participant i holds {θ_A}_i and {θ_B}_i and can directly compute {θ_A}_i + {θ_B}_i = {θ′}_i using ordinary addition; for ease of description, ordinary addition is therefore used directly below;
c. secret sharing subtraction
For one-dimensional sharded data {θ_A}_1, …, {θ_A}_N and {θ_B}_1, …, {θ_B}_N, each participant i holds {θ_A}_i and {θ_B}_i and can directly compute {θ_A}_i − {θ_B}_i = {θ′}_i using ordinary subtraction; for ease of description, ordinary subtraction is therefore used directly below;
d. secret sharing multiplication
For one-dimensional sharded data {θ_A}_1, …, {θ_A}_N and {θ_B}_1, …, {θ_B}_N, each participant i holds {θ_A}_i and {θ_B}_i. First the coordinator generates one-dimensional variables a, b and c = a × b, splits them by secret sharing into {a}_1, …, {a}_N, {b}_1, …, {b}_N and {c}_1, …, {c}_N, and sends them to the participants. Each participant i receives {a}_i, {b}_i, {c}_i, computes {e}_i = {θ_A}_i − {a}_i and {f}_i = {θ_B}_i − {b}_i, and sends them to the first participant. The first participant computes e = Σ_i {e}_i and f = Σ_i {f}_i and sends them to all participants. The first participant then computes {θ′}_1 = {c}_1 + e·{b}_1 + f·{a}_1 + e·f, and every other participant i computes {θ′}_i = {c}_i + e·{b}_i + f·{a}_i. The final secret-sharing multiplication θ′ = θ_A ⊗ θ_B is expressed as θ′ = Σ_i {θ′}_i = θ_A × θ_B;
for the above steps, the method can be popularized from one-dimensional data to multi-dimensional data.
Further, the step S1 specifically includes:
S1.1: The first participant sets the initial tree number t = 1, the initial depth d = 1, the regularization parameter λ and the maximum depth d_max, generates {λ}_i by secret sharing splitting and distributes all set parameters to every participant i. For each participant i owning num_i features, the coordinator counts the total number of features num_feature = Σ_i num_i, produces the array [1, 2, …, num_feature], and randomly assigns num_i of its elements, in shuffled order, to each participant i such that the elements obtained by different participants do not overlap. Every participant establishes a one-to-one mapping map(j) from its shuffled array elements j to its own feature numbers, and records and stores the mapping locally;
S1.2: Every participant computes the maximum number of feature values k_selfmax among its own sample features and sends it to the coordinator; the coordinator determines the maximum number of feature values over all participants, k_max = max k_selfmax, and broadcasts it to all participants;
S1.3: Starting from the first participant, which holds the label data, every participant uses the same loss function l(·). The first participant uses the model prediction result vector ŷ and the label value vector y to compute the first-order gradient vector G = ∂l(y, ŷ)/∂ŷ and the second-order gradient vector H = ∂²l(y, ŷ)/∂ŷ², together with the initial all-ones indication vector S. The initial prediction result ŷ_p of each data sample x_p is 0 when t = 1; otherwise it is the accumulated prediction weight of the existing t − 1 trees, ŷ_p = Σ_{q=1..t−1} f_q(x_p). G, H and S are split by secret sharing into N first-order gradient vector shards {G}_i, second-order gradient vector shards {H}_i and indication vector shards {S}_i, i = 1, …, N, and distributed to participant i.
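Step S1.3 can be sketched in the clear for one common choice of loss. The logistic loss below is an assumption (the text fixes only that all parties use the same l(·)), and plain floats stand in for the secret shards:

```python
# Sketch of S1.3 with a logistic loss (an assumed, common XGBoost choice).
import math
import random

def gradients(y, y_pred_raw):
    """First/second-order gradients of logistic loss w.r.t. the raw prediction."""
    p = [1.0 / (1.0 + math.exp(-s)) for s in y_pred_raw]
    G = [pi - yi for pi, yi in zip(p, y)]   # g_p = p - y
    H = [pi * (1.0 - pi) for pi in p]       # h_p = p * (1 - p)
    return G, H

def split_vector(v, n):
    """Secret-share each element of v into n additive shards (over the reals here)."""
    rand = [[random.uniform(-1, 1) for _ in v] for _ in range(n - 1)]
    last = [x - sum(r[p] for r in rand) for p, x in enumerate(v)]
    return rand + [last]

y = [1, 0, 1]
G, H = gradients(y, [0.0, 0.0, 0.0])        # initial prediction is 0 when t = 1
S = [1.0] * len(y)                          # initial all-ones indication vector
G_shards = split_vector(G, 3)
# the shards recombine to G by element-wise addition
recovered = [sum(col) for col in zip(*G_shards)]
assert all(abs(a - b) < 1e-9 for a, b in zip(recovered, G))
```

At a raw prediction of 0 the sigmoid is 0.5, so G is ±0.5 and every element of H is 0.25, matching the usual XGBoost first-round gradients for this loss.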
Further, the step S2 specifically includes:
S2.1: Each participant i, having received the i-th first-order gradient vector shard {G}_i, second-order gradient vector shard {H}_i and indication vector shard {S}_i, computes its i-th shard of the first-order gradient sum, {SG}_i, and of the second-order gradient sum, {SH}_i; {SG}_i and {SH}_i are obtained by summing the vector elements of the {G}_i and {H}_i owned by participant i, respectively;
S2.2: Each participant i computes its shard of the unsplit gain numerator {gain_up}_i and of the unsplit gain denominator {gain_down}_i by the following formulas:

{gain_up}_i = {SG ⊗ SG}_i

{gain_down}_i = {SH}_i + {λ}_i

where ⊗ denotes secret-sharing multiplication and {λ}_i is the i-th shard of the hyperparameter λ;
S2.3: Each participant i uses {G}_i and {H}_i to compute the first-order gradient sum shard matrix {BG}_i and the second-order gradient sum shard matrix {BH}_i over all value intervals of all features it owns;
S2.4: Each participant i initializes the left-subtree split gain numerator shard matrix {leftgain_up}_i, left-subtree split gain denominator shard matrix {leftgain_down}_i, right-subtree split gain numerator shard matrix {rightgain_up}_i and right-subtree split gain denominator shard matrix {rightgain_down}_i;
S2.5: For feature j, each participant i initializes and records a left-subtree accumulated first-order gradient shard variable {SGL}_i, a left-subtree accumulated second-order gradient shard variable {SHL}_i, a right-subtree accumulated first-order gradient shard variable {SGR}_i and a right-subtree accumulated second-order gradient shard variable {SHR}_i, all set to 0;
S2.6: Each participant i traverses the value intervals k and updates the left-subtree accumulations as

{SGL}_i = {SGL}_i + {BG}_i[j, k]

{SHL}_i = {SHL}_i + {BH}_i[j, k]

where {BG}_i[j, k] and {BH}_i[j, k] denote the [j, k]-th elements of the shard matrices {BG}_i and {BH}_i, and then updates the right-subtree accumulations as

{SGR}_i = {SG}_i − {SGL}_i

{SHR}_i = {SH}_i − {SHL}_i
For the XGBoost model, the split gain calculation formula used is:

Gain = GL²/(HL + λ) + GR²/(HR + λ) − (GL + GR)²/(HL + HR + λ)

where GL, HL and GR, HR are the first- and second-order gradient sums of the left and right subtrees.
For each participant i, the split gain numerator and denominator shards of the left and right subtrees at the k-th value interval of the j-th feature are computed directly and written into the matrices:

{leftgain_up}_i[j, k] = {SGL ⊗ SGL}_i

{leftgain_down}_i[j, k] = {SHL}_i + {λ}_i

{rightgain_up}_i[j, k] = {SGR ⊗ SGR}_i

{rightgain_down}_i[j, k] = {SHR}_i + {λ}_i
S2.7: Each participant i uses the left- and right-subtree split gain numerator and denominator shard matrices obtained in S2.6 to compute the split gain differences between different value intervals k of different features j; through comparisons mediated by the coordinator, the feature j_best and value interval k_best corresponding to the maximum split gain are determined;
S2.8: For the maximum-split-gain feature j_best and value interval k_best, each participant i uses the left- and right-subtree numerator and denominator shards at that position together with the unsplit numerator and denominator shards {gain_up}_i, {gain_down}_i to compute its total split gain denominator shard {denominator}_i, which is sent to the coordinator, and its total split gain numerator shard {nominator}_i, which is sent to the first participant. The coordinator computes the denominator Σ_i {denominator}_i and determines its sign sign_0; the first participant computes the numerator Σ_i {nominator}_i and determines its sign sign_1. The first participant and the coordinator then jointly determine, from sign_0 and sign_1, the sign variable corresponding to the final maximum gain;
S2.9: When the sign variable is 1, for the feature j_best: the participant i′ that owns feature j_best sets up an M-dimensional vector SL recording which samples fall into the left subtree after splitting on this feature. It takes out the k_best-th value interval (left_{k_best}, right_{k_best}], sets to 1 the positions of SL for which the sample's value of feature j_best in the sample set satisfies left_{k_best} < value ≤ right_{k_best}, and sets the remaining positions to 0. It likewise sets up an M-dimensional vector SR recording which samples fall into the right subtree after splitting, i.e., SR is the negation of SL. For N participants in total, SL and SR are split by secret sharing into N shards {SL}_i and {SR}_i and distributed to every participant i, i = 1, …, i′, …, N;
S2.10: Each participant i, having received {SL}_i and {SR}_i, recomputes its own left-subtree indication vector shard {SL}_i and right-subtree indication vector shard {SR}_i:

{SL}_i = {S}_i ⊙ {SL}_i

{SR}_i = {S}_i ⊙ {SR}_i
where ⊙ performs a secret-sharing multiplication between co-located elements of two vectors, yielding a vector of the same dimension as {S}_i. It then computes its first-order gradient vector shard {GL}_i over the samples falling into the left subtree and {GR}_i over the samples falling into the right subtree:

{GL}_i = {G}_i ⊙ {SL}_i

{GR}_i = {G}_i ⊙ {SR}_i
and its second-order gradient vector shard {HL}_i over the samples falling into the left subtree and {HR}_i over the samples falling into the right subtree:

{HL}_i = {H}_i ⊙ {SL}_i

{HR}_i = {H}_i ⊙ {SR}_i
S2.11: Each participant i takes {GL}_i, {HL}_i and {SL}_i as the first-order gradient vector shard, second-order gradient vector shard and indication vector shard used to construct the left subtree, and {GR}_i, {HR}_i and {SR}_i as those used to construct the right subtree;
S2.12: When the current depth d of the tree reaches the set limit d_max, or when the sign variable is not 1, the weight value of the leaf node is computed and construction of left and right subtrees stops at the current node;
S2.13: Set d = d + 1 and recursively execute steps S2.1 to S2.12 to complete construction of the XGBoost decision tree.
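The element-wise masking of S2.10 and S2.11 can be sketched in the clear. In the real protocol every element-wise product below is a secret-sharing multiplication on shards; here plain vectors are used for illustration:

```python
# Plain (non-shared) sketch of S2.10-S2.11: given the raw left-subtree indicator
# SL_raw from the splitting participant, mask it by the current node's indication
# vector S, then mask the gradient vectors to obtain the left/right gradients.
def partition(G, H, S, SL_raw):
    SL = [s * l for s, l in zip(S, SL_raw)]        # {SL} = {S} (.) {SL}
    SR = [s * (1 - l) for s, l in zip(S, SL_raw)]  # SR negates SL within S
    GL = [g * l for g, l in zip(G, SL)]            # {GL} = {G} (.) {SL}
    GR = [g * r for g, r in zip(G, SR)]
    HL = [h * l for h, l in zip(H, SL)]            # {HL} = {H} (.) {SL}
    HR = [h * r for h, r in zip(H, SR)]
    return SL, SR, GL, GR, HL, HR

S = [1, 1, 1, 0]          # sample 3 already left this node
SL_raw = [1, 0, 1, 1]     # the chosen split sends samples 0 and 2 left
SL, SR, GL, GR, HL, HR = partition([1., 2., 3., 4.], [.1, .2, .3, .4], S, SL_raw)
assert SL == [1, 0, 1, 0] and SR == [0, 1, 0, 0]
assert GL == [1.0, 0.0, 3.0, 0.0] and GR == [0.0, 2.0, 0.0, 0.0]
```

Masking by S first ensures samples that already left the node at an earlier split contribute nothing to either subtree.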
Further, the step S2.3 specifically includes:
S2.3.1: Every participant i initializes a num_feature × k_max-dimensional matrix {BG}_i recording the interval first-order gradient sum shards, and a num_feature × k_max-dimensional matrix {BH}_i recording the interval second-order gradient sum shards;
S2.3.2: For feature j, j = 1, 2, …, num_feature, when the i′-th participant owns feature number j, it maps j to its own feature map(j) using the feature index of step S1.1, counts all the division values of that feature and records their number k_j;
S2.3.3: Participant i′ sets up a k_max × M matrix Matrix_index recording which samples fall into each division of the feature, where M is the number of samples. For the j-th feature, all feature values val_k are arranged from small to large, k = 1, …, k_j; set left_k = val_{k−1} with left_0 = −∞, and right_k = val_k. Traversing k_j, the k-th value interval (left_k, right_k] is taken out, an all-zero vector S′ of dimension M × 1 is initialized, and the positions of S′ for which the sample's value of feature map(j), value_map(j), satisfies left_k < value_map(j) ≤ right_k are set to 1; the k-th row vector is recorded as Matrix_index[k, :] = S′^T, where S′^T is the transpose of S′. After the division traversal ends, for N participants in total, participant i′ splits Matrix_index by secret sharing into N shards {Matrix_index}_i and distributes them to every participant i, i = 1, …, i′, …, N;
S2.3.4: Participant i receives {Matrix_index}_i and, for the j-th feature, traverses k up to the maximum number of value intervals k_max, computing the first-order gradient sum shard {BG}_i[j, k] and second-order gradient sum shard {BH}_i[j, k]:

{BG}_i[j, k] = sum({Matrix_index}_i[k, :] ⊙ {G}_i)

{BH}_i[j, k] = sum({Matrix_index}_i[k, :] ⊙ {H}_i)
where [k, :] selects all elements of the k-th row of the matrix and sum(v) denotes the summation of the elements of vector v;
S2.3.5: Traverse j, executing S2.3.2 to S2.3.4, so that all participants i complete the computation of {BG}_i and {BH}_i.
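The bucketing of S2.3.2 to S2.3.4 can be sketched without the sharing layer: for one feature, build the interval indicator rows of Matrix_index and accumulate the per-bucket gradient sums. Names and the plain-float representation are illustrative:

```python
# Plain (non-shared) sketch of S2.3.2-S2.3.4: bucket the samples of one feature
# by its division values and accumulate first/second-order gradient sums per
# bucket (each row of M is one row of Matrix_index in the text).
def bucket_sums(values, G, H):
    cuts = sorted(set(values))               # the k_j division values of the feature
    lefts = [float("-inf")] + cuts[:-1]      # left_k = val_{k-1}, left_0 = -inf
    M, BG, BH = [], [], []
    for lo, hi in zip(lefts, cuts):
        s = [1 if lo < v <= hi else 0 for v in values]  # indicator row S'
        M.append(s)
        BG.append(sum(si * gi for si, gi in zip(s, G)))
        BH.append(sum(si * hi_ for si, hi_ in zip(s, H)))
    return M, BG, BH

vals = [1.0, 3.0, 1.0, 2.0]
G = [0.5, -0.5, 0.5, -0.25]
H = [0.25, 0.25, 0.25, 0.25]
M, BG, BH = bucket_sums(vals, G, H)
assert BG == [1.0, -0.25, -0.5]   # buckets (-inf,1], (1,2], (2,3]
assert sum(BH) == 1.0             # every sample falls in exactly one bucket
```

Because the buckets partition the samples, the row sums of BG and BH reconstruct SG and SH, which is what the cumulative scan in S2.6 relies on.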
Further, the step S2.7 specifically includes:
S2.7.1: Each participant i sets the initial division index list vector currently participating in the comparison, col = [1, 2, …, k_max], records its length R_col, and initializes the per-feature selected division index list vector col_selected;
S2.7.2: Per the XGBoost algorithm, for feature positions [j, col[r]] and [j, col[r+1]], where col[r] denotes the r-th element of the index list col and [j, col[r]] the col[r]-th element of the j-th row of a matrix, traverse r = 1, …, ⌊R_col/2⌋ and compute the split gain differences between the feature positions as follows. Let:

nominator1_col = leftgain_up[j, col[r]] · leftgain_down[j, col[r+1]] − leftgain_up[j, col[r+1]] · leftgain_down[j, col[r]]

nominator2_col = rightgain_up[j, col[r]] · rightgain_down[j, col[r+1]] − rightgain_up[j, col[r+1]] · rightgain_down[j, col[r]]

denominator1_col = leftgain_down[j, col[r]] · leftgain_down[j, col[r+1]]

denominator2_col = rightgain_down[j, col[r]] · rightgain_down[j, col[r+1]]

Then the sign of nominator1_col/denominator1_col + nominator2_col/denominator2_col gives the sign of the gain difference between the two positions. Using these formulas on the feature-j rows of the left- and right-subtree split gain numerator and denominator shard matrices from S2.6, every participant i computes shards of the difference results for all division position pairs;
S2.7.3: Each participant i sends its co-located computation shard result vectors {nominator1_col}_i, {nominator2_col}_i, {denominator1_col}_i, {denominator2_col}_i to the coordinator. The coordinator collects them and computes the vector col_shared_value = Σ_i{nominator1_col}_i \ Σ_i{denominator1_col}_i + Σ_i{nominator2_col}_i \ Σ_i{denominator2_col}_i, where the operation Σ_i{v}_i sums the co-located elements of all shards {v}_i into one vector, and the operation v_1 \ v_2 on vectors v_1 and v_2 divides co-located elements, i.e., (v_1 \ v_2)[r] = v_1[r] / v_2[r];
S2.7.4: Initialize an empty list new_col and judge the r-th element of col_shared_value in turn: if it is non-negative, the first index of the r-th compared pair, col[2r], is added to new_col; otherwise the second, col[2r + 1], is added. After the traversal ends, if the length of col is odd, the last element of col is added to new_col. The coordinator then broadcasts new_col to all participants, and the participants set col = new_col;
S2.7.5: While the length of col is greater than 1, iterate steps S2.7.2 to S2.7.4 until the length of col becomes 1; take out the only element col[0] and record col_selected[j] = col[0];
S2.7.6: Traverse all features j, iterating steps S2.7.1 to S2.7.5, to obtain the selected division position of each feature and fill the complete per-feature division index list vector col_selected; set the initial feature index list vector currently participating in the comparison, row = [1, 2, …, num_feature], and record its length R_row;
S2.7.7: Per the XGBoost algorithm, for feature positions [row[r], col_selected[row[r]]] and [row[r+1], col_selected[row[r+1]]], where row[r] denotes the r-th element of the index list row and col_selected[row[r]] the element of col_selected at index position row[r], traverse r = 1, …, ⌊R_row/2⌋ and compute the split gain differences between the feature positions as follows. Let:

nominator1_row = leftgain_up[row[r], col_selected[row[r]]] · leftgain_down[row[r+1], col_selected[row[r+1]]] − leftgain_up[row[r+1], col_selected[row[r+1]]] · leftgain_down[row[r], col_selected[row[r]]]

nominator2_row = rightgain_up[row[r], col_selected[row[r]]] · rightgain_down[row[r+1], col_selected[row[r+1]]] − rightgain_up[row[r+1], col_selected[row[r+1]]] · rightgain_down[row[r], col_selected[row[r]]]

denominator1_row = leftgain_down[row[r], col_selected[row[r]]] · leftgain_down[row[r+1], col_selected[row[r+1]]]

denominator2_row = rightgain_down[row[r], col_selected[row[r]]] · rightgain_down[row[r+1], col_selected[row[r+1]]]

Using these formulas on the left- and right-subtree split gain numerator and denominator shard matrices from S2.6, every participant i computes shards of the difference results between the best division positions of all features [1, 2, …, num_feature];
S2.7.8: Each participant i sends its interval computation result vectors {nominator1_row}_i, {nominator2_row}_i, {denominator1_row}_i, {denominator2_row}_i to the coordinator, which collects them and computes the vector row_shared_value = Σ_i{nominator1_row}_i \ Σ_i{denominator1_row}_i + Σ_i{nominator2_row}_i \ Σ_i{denominator2_row}_i;
S2.7.9: Initialize an empty list new_row and traverse row_shared_value: if the r-th element is non-negative, the first index of the r-th compared pair, row[2r], is added to new_row; otherwise the second, row[2r + 1], is added. After the traversal ends, if the length of row is odd, the last element of row is added to new_row. The coordinator then broadcasts new_row to all participants, and the participants set row = new_row;
S2.7.10: While the length of row is greater than 1, iterate steps S2.7.7 to S2.7.9 until the length of row becomes 1; take out the only element row[0], record j_best = row[0] and k_best = col_selected[j_best], and broadcast them to all participants, thereby determining the selected best feature number j_best and that feature's best division position k_best.
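The pairwise-elimination tournament of S2.7 can be sketched in the clear. The comparison shape is the one used for the gain fractions: compare num[a]/den[a] against num[b]/den[b] (with positive denominators) by the sign of the cross product, never by computing the fractions' values. Shares and the coordinator round-trips are omitted:

```python
# Sketch of the S2.7 tournament: find the index of the largest fraction
# num[i]/den[i] (den > 0) using only signs of cross-product differences.
def tournament_argmax(num, den):
    col = list(range(len(num)))                # the index list "col"/"row"
    while len(col) > 1:
        new_col = []
        for r in range(len(col) // 2):
            a, b = col[2 * r], col[2 * r + 1]
            # num[a]/den[a] >= num[b]/den[b]  <=>  num[a]*den[b] - num[b]*den[a] >= 0
            diff = num[a] * den[b] - num[b] * den[a]
            new_col.append(a if diff >= 0 else b)
        if len(col) % 2 == 1:
            new_col.append(col[-1])            # odd leftover advances automatically
        col = new_col
    return col[0]

gains_num = [3.0, 10.0, 4.0, 9.0, 2.0]
gains_den = [2.0, 4.0, 1.0, 2.0, 1.0]          # fractions: 1.5, 2.5, 4.0, 4.5, 2.0
assert tournament_argmax(gains_num, gains_den) == 3
```

Each round halves the candidate list, so only O(log k_max) comparison rounds (and coordinator broadcasts) are needed per feature.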
Further, the step S2.8 specifically includes:
S2.8.1: For the given maximum-split-gain feature j_best and value interval k_best, the split gain expression in the XGBoost algorithm is:

Gain = GL²/(HL + λ) + GR²/(HR + λ) − (GL + GR)²/(HL + HR + λ)

where GL, HL (GR, HR) are the first- and second-order gradient sums of the left (right) subtree; note GL + GR = SG and HL + HR = SH.
Each participant i computes its own split gain numerator shard {nominator}_i from the left- and right-subtree shards at [j_best, k_best] and the unsplit shards, using secret-sharing multiplication:

{nominator}_i = {leftgain_up ⊗ rightgain_down ⊗ gain_down}_i + {rightgain_up ⊗ leftgain_down ⊗ gain_down}_i − {gain_up ⊗ leftgain_down ⊗ rightgain_down}_i

and its own split gain denominator shard {denominator}_i:

{denominator}_i = {leftgain_down ⊗ rightgain_down ⊗ gain_down}_i
S2.8.2: The remaining participants send their split gain numerator shards {nominator}_2, …, {nominator}_N to the first participant, which collects them and computes nominator = Σ_i {nominator}_i. The first participant sets its sign variable sign_1 by judging the sign: sign_1 = 1 if nominator > 0, and sign_1 = −1 otherwise;
S2.8.3: Each participant i sends its split gain denominator shard {denominator}_i to the coordinator, which collects {denominator}_i, i = 1, …, N, and computes denominator = Σ_i {denominator}_i. The coordinator sets its sign variable sign_0 by judging the sign: sign_0 = 1 if denominator > 0, and sign_0 = −1 otherwise;
S2.8.4: The first participant sends sign_1 to the coordinator, which receives it, computes sign = sign_1 × sign_0 and broadcasts sign to all participants; all participants take this value as the sign variable of the current split.
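The sign combination of S2.8.2 to S2.8.4 can be sketched as follows. The point of the split is that the first participant sees only the numerator's sign and the coordinator only the denominator's sign; their product reveals just whether the best gain is positive, never the gain value itself (plain floats stand in for the shards):

```python
# Sketch of S2.8.2-S2.8.4: decide whether to split from two independently
# observed signs, without either observer learning the gain's magnitude.
def split_sign(nominator_shards, denominator_shards):
    sign1 = 1 if sum(nominator_shards) > 0 else -1    # first participant's view
    sign0 = 1 if sum(denominator_shards) > 0 else -1  # coordinator's view
    return sign1 * sign0                              # broadcast to everyone

assert split_sign([2.5, -1.0, 0.75], [0.5, 0.25, 0.5]) == 1    # gain > 0: split
assert split_sign([-3.0, 1.0, 0.5], [0.5, 0.25, 0.5]) == -1    # gain <= 0: stop
```

Multiplying the two signs is valid because sign(n/d) = sign(n) · sign(d) whenever d ≠ 0.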
further, the step S2.12 specifically includes:
S2.12.1: Each participant i computes half of the sum of its second-order gradient shard and regularization term shard,

{h′}_i = ({SH}_i + {λ}_i) / 2

and its own first-order gradient shard sum,

{g′}_i = {SG}_i
S2.12.2: Each participant i determines the order of magnitude μ_i of {h′}_i, i.e., an integer such that |{h′}_i| ≤ 2^{μ_i};
S2.12.3: All participants send their corresponding magnitude digits; the coordinator receives them and selects the maximum magnitude as μ_m, determines from it the iteration step size τ (of the order 2^{−μ_m}, small enough for the iteration below to converge) and the iteration number iter, and sends them to all participants;
S2.12.4: Each participant i sets a random initial value {w}_i^{(0)} and an auxiliary variable with initial value 0 and, starting from κ = 1, iterates according to

{w}_i^{(κ)} = {w}_i^{(κ−1)} − τ · (2 · {h′ ⊗ w^{(κ−1)}}_i + {g′}_i)

i.e., a shard of a gradient descent step on f(w) = h′w² + g′w, whose minimizer is the leaf weight w* = −g′/(2h′) = −SG/(SH + λ). After each iteration κ is set to κ + 1; the iteration terminates when κ = iter, and at the end of the computation participant i records its weight shard {w_σ}_i of the current leaf node σ.
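The point of S2.12 is to obtain the XGBoost leaf weight −SG/(SH + λ) without a division on shared values. A plain, non-shared sketch of the underlying iteration (step size and iteration count are assumptions, chosen here for convergence):

```python
# Sketch of S2.12: recover w* = -SG/(SH + lam) by gradient descent on
# f(w) = h'*w^2 + g'*w with h' = (SH + lam)/2 and g' = SG, avoiding division.
def leaf_weight_iterative(SG, SH, lam, tau=0.05, iters=2000):
    h_prime = (SH + lam) / 2.0
    g_prime = SG
    w = 0.0
    for _ in range(iters):
        w -= tau * (2.0 * h_prime * w + g_prime)   # gradient of f at w
    return w

w = leaf_weight_iterative(SG=3.0, SH=5.0, lam=1.0)
assert abs(w - (-3.0 / 6.0)) < 1e-6   # closed form: -SG/(SH + lam) = -0.5
```

The update is linear in w, so each party can apply it to its own shard (with the h′·w product done by secret-sharing multiplication) and the shards converge to shards of w*; convergence requires τ · 2h′ < 1, which is why the coordinator scales τ by the largest magnitude μ_m.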
Further, the step S3 specifically includes:
S3.1: For the t-th tree f_t, each participant i, for data sample x_p, uses the features of the part it holds to perform leaf node prediction according to its local tree model. For each tree node: if the node's division information belongs to participant i's own features, the prediction follows the left (right) subtree according to the feature and its value and continues, and the flag bits of all leaf nodes of the branch not entered are set to 0; if the division information does not belong to its own features, the prediction proceeds along both left and right subtrees of that tree node. This continues until leaf nodes are reached, where the flag bit of each attributed leaf node is set to 1. Finally, each participant i obtains its prediction for the tree: the flag bits of all leaf nodes σ, σ = 1, 2, …, are assembled into a mark vector index_i, and the leaf weight shards {w_σ}_i are simultaneously spliced in the same order into a result vector {v_w}_i;
S3.2: Each participant i splits index_i by secret sharing into shards {index_i}_{i′} and sends them to all participants i′, i′ = 1, …, i, …, N;
S3.3: Each participant i′ receives the mark vector shards {index_i}_{i′} sent by the participants i and computes the bitwise cumulative product of all vector shards, {index}_{i′} = {index_1}_{i′} ⊙ {index_2}_{i′} ⊙ … ⊙ {index_N}_{i′}, then computes the bitwise product of the mark vector shard with its own weight shard, {v_result}_{i′} = {index}_{i′} ⊙ {v_w}_{i′};
S3.4: Each participant i′ sums the elements of {v_result}_{i′} to obtain {weight_p}_{i′} = sum({v_result}_{i′}) and sends the result to the first participant, which receives them and computes f_t(x_p) = Σ_{i′} {weight_p}_{i′}, then computes ŷ_p^(t) = ŷ_p^(t−1) + f_t(x_p), which becomes the prediction result of sample x_p after round t ends;
S3.5, traversing all p, the vector formed by combining the t-th-round prediction results of all data samples x_p is calculated.
Compared with the prior art, the invention has the following beneficial effects:
According to the invention, the participant holding the labels calculates the first-order and second-order gradient vectors and the indication vector using the current model's prediction results and the label values; each participant, assisted by secret sharing and the coordinator, constructs a decision tree model based on the XGBoost algorithm; the participants jointly determine the prediction results of the data to be trained; and the construction of multiple decision tree models is completed through iteration to obtain a complete, lossless, secure multi-party prediction model. Cross-data-source multi-party XGBoost ensemble model training is thus performed on the premise of protecting data privacy, improving the prediction capability of the model while ensuring data security.
Drawings
FIG. 1 is a schematic diagram of the interaction of participants and a coordinator in accordance with the present invention;
FIG. 2 is a schematic flow chart of a model training process of the present invention;
FIG. 3 is a communication flow diagram of the model training process of the present invention;
FIG. 4 is a diagram illustrating a multi-party tree model and its corresponding equivalent model according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Example one
As shown in fig. 1, an XGBoost prediction model training method for protecting multi-party data privacy involves multiple participants and a coordinator. The participant holding labels first calculates first-order and second-order gradient vectors and an indication vector using the current model prediction result and the label values; the remaining participants, assisted by the coordinator through secret sharing, jointly calculate and construct a joint decision tree model based on the XGBoost algorithm; the participants cooperate to determine the prediction result of the data to be trained under the joint decision tree model; and finally all participants and the coordinator iterate together to complete the construction of multiple joint decision tree models, obtaining the complete multi-party prediction model.
In this embodiment, as shown in fig. 4, the application scenario is that several different types of institutions each hold the same group of samples, but their data features do not overlap. By combining the data of the different institutions, a more complete model can be trained jointly. To simulate this effect, a parallel computing framework is used locally with 4 computing nodes, numbered 0, 1, 2 and 3, corresponding to 3 computing participants and 1 coordinator: computing node 0 is the coordinator, computing node 1 is the first participant, which holds the labels, and computing nodes 2 and 3 represent participants 2 and 3. The embodiment uses the Iris data set from the UCI Machine Learning Repository, selecting 100 pieces of data in the two categories with class labels 0 and 1, covering the four features sepal length, sepal width, petal length and petal width; of these four features, sepal length and petal length are assigned to the first participant, sepal width to participant 2, and petal width to participant 3, and all participants treat 80% of the data samples as the training set and the remaining 20% as the test set. The specific flow is shown in figs. 2 and 3.
S1, setting t to 1, generating initial tree building parameters and feature indexes, and calculating to generate gradient vectors and indicator vector segments, which specifically includes:
s1.1, setting initial tree building parameters and feature indexes:
The first participant sets the initial number of the tree being built t = 1, the initial depth d = 1, the regularization parameter λ and the maximum depth d_max; in this embodiment λ = 1 and d_max = 4. For a total of 3 participants, it calculates {λ}_i = λ/3 (equivalent to secret-sharing λ into three shards) and distributes the shards to all participants i. The coordinator counts the total number of features num_feature = Σ_i num_i over all participants i holding num_i features each, generates the array [1, 2, …, num_feature], shuffles it, and randomly assigns num_i of its elements to each participant i, with no overlap between the array elements obtained by different participants. Each participant establishes a one-to-one mapping map(j) from the shuffled array element j to its own feature number and records and stores it locally. For example, the first participant, holding the first feature sepal length and the third feature petal length, accesses these two features locally through numbers 0 and 1; being assigned indexes 2 and 0, it establishes the mappings 0 = map(2) and 1 = map(0), so that for feature index number 2 in subsequent iterations the first participant recognizes that it owns the feature and converts the index into the corresponding local feature number 0 via the mapping in order to access the feature;
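The index-assignment step of S1.1 can be sketched in Python; the helper names (`assign_feature_indices`, `build_map`) and the 0-based indexing are illustrative assumptions, not part of the patent:

```python
import random

def assign_feature_indices(feature_counts, seed=0):
    """Coordinator side (hypothetical helper): shuffle the global feature
    index array and hand each participant a disjoint block of indices."""
    total = sum(feature_counts)            # num_feature = sum of num_i
    idx = list(range(total))               # 0-based here for simplicity
    random.Random(seed).shuffle(idx)
    blocks, pos = [], 0
    for n in feature_counts:
        blocks.append(idx[pos:pos + n])
        pos += n
    return blocks

def build_map(assigned):
    """Participant side: one-to-one map from shuffled index j to the
    local feature number, i.e. local = map(j)."""
    return {j: local for local, j in enumerate(assigned)}

# 3 participants holding 2, 1 and 1 features, as in the Iris scenario
blocks = assign_feature_indices([2, 1, 1])
maps = [build_map(b) for b in blocks]
```

Because the shuffled indices are disjoint across participants, a feature index seen in a later iteration identifies exactly one owner, who converts it back to a local feature number via its map.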
s1.2, determining the maximum characteristic value quantity:
All participants calculate the maximum number of feature partition values k_selfmax among their own sample features and send it to the coordinator; the coordinator determines the maximum over all participants k_max = max k_selfmax and broadcasts it to all participants;
s1.3, calculating to generate gradient vectors and indication vector fragments:
Starting from the first participant, which holds the labeled data, every participant uses the same loss function l(·); in this embodiment it is the squared loss (MSE). The first participant uses the model prediction result vector ŷ^(t−1) and the label value vector y to calculate the first-order gradient vector G and the second-order gradient vector H, together with the initial all-ones indication vector S; the initial prediction result of each data sample x_p is 0 when t = 1, and otherwise is the sum of the prediction weights of the existing t − 1 trees. For a total of N participants, G, H and S are split via secret sharing into N first-order gradient vector shards {G}_i, second-order gradient vector shards {H}_i and indication vector shards {S}_i, i = 1, …, N, and distributed to participant i.
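The gradient computation and secret splitting of S1.3 can be sketched as follows; the shard-generation scheme (uniform random masks) and the MSE gradient convention g = ŷ − y, h = 1 (from the ½-scaled squared loss) are illustrative assumptions:

```python
import random

def share(vec, n, rng):
    """Additive secret sharing: return n shards whose elementwise sum
    equals vec; any n-1 shards alone reveal nothing about vec."""
    shards = [[rng.uniform(-10, 10) for _ in vec] for _ in range(n - 1)]
    last = [v - sum(s[k] for s in shards) for k, v in enumerate(vec)]
    return shards + [last]

def reconstruct(shards):
    """Sum co-located elements of all shards to recover the vector."""
    return [sum(col) for col in zip(*shards)]

rng = random.Random(7)
y_hat = [0.2, 0.7, 0.4]                  # current model predictions
y     = [0.0, 1.0, 0.0]                  # labels held by the first participant
G = [p - t for p, t in zip(y_hat, y)]    # first-order gradient of squared loss
H = [1.0] * len(y)                       # second-order gradient of squared loss
S = [1.0] * len(y)                       # initial all-ones indication vector
G_shards = share(G, 3, rng)              # {G}_1, {G}_2, {G}_3
```

Addition and subtraction of shared values are then local (shard-by-shard) operations; only multiplication requires interaction.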
S2, multiple parties construct the tth decision tree based on the XGboost algorithm, and the decision tree specifically comprises the following steps:
S2.1, each participant receives the i-th shard {G}_i of the first-order gradient vector, the i-th shard {H}_i of the second-order gradient vector, and the i-th indication shard {S}_i, and calculates its own shards of the first-order and second-order gradient sums;
After receiving {G}_i, {H}_i and {S}_i, each participant i calculates the i-th shard {SG}_i of the first-order gradient sum and the i-th shard {SH}_i of the second-order gradient sum; {SG}_i and {SH}_i are obtained by summing the vector elements of the {G}_i and {H}_i owned by participant i, respectively;
s2.2, calculating the non-split gain numerator segment and the denominator segment by each participant:
For the XGBoost algorithm, at a given tree node, for the first-order gradient sum SG and second-order gradient sum SH of all data belonging to that node and the regularization term λ, the non-split gain is expressed as gain = SG² / (SH + λ).
In the secret sharing scenario, each participant must separately compute its numerator shard {gain_up}_i and denominator shard {gain_down}_i of this quantity:
{gain_up}_i = {SG ⊗ SG}_i
{gain_down}_i = {SH}_i + {λ}_i
where ⊗ denotes secret-sharing multiplication and {λ}_i is the i-th shard of the hyperparameter λ;
S2.3, each participant i uses {G}_i and {H}_i to calculate the first-order gradient sum shard matrix {BG}_i and the second-order gradient sum shard matrix {BH}_i over all value intervals of all features it owns;
S2.4, each participant i initializes the left-subtree split-gain numerator shard matrix {leftgain_up}_i, the left-subtree split-gain denominator shard matrix {leftgain_down}_i, the right-subtree split-gain numerator shard matrix {rightgain_up}_i, and the right-subtree split-gain denominator shard matrix {rightgain_down}_i;
In this embodiment, these matrices need to be explicitly initialized at each participant i to avoid execution errors;
S2.5, for feature j, each participant i initializes and records the left-subtree cumulative first-order gradient shard variable, the left-subtree cumulative second-order gradient shard variable, the right-subtree cumulative first-order gradient shard variable and the right-subtree cumulative second-order gradient shard variable, all set to 0;
In this embodiment, the above variables need to be explicitly initialized at each participant i to avoid execution errors;
S2.6, all participants i traverse the value interval k, update the cumulative gradient shard variables, and update the split-gain numerator and denominator shard matrices of the left and right subtrees at the k-th value interval position of the j-th feature.
Each participant i traverses the value interval k and updates the cumulative shards: the left-subtree cumulative first-order and second-order gradient shards are increased by {BG}_i[j, k] and {BH}_i[j, k] respectively, and the right-subtree cumulative first-order and second-order gradient shards are obtained as {SG}_i and {SH}_i minus the corresponding left-subtree cumulative shards, where {BG}_i[j, k] and {BH}_i[j, k] denote the [j, k]-th elements of the shard matrices {BG}_i and {BH}_i;
For the XGBoost model, the split gain calculation formula used is:
Gain = GL² / (HL + λ) + GR² / (HR + λ) - SG² / (SH + λ)
where GL, HL and GR, HR are the cumulative first-order and second-order gradients of the left and right subtrees. Accordingly, each participant i directly calculates the split-gain numerator and denominator shards of the left and right subtrees at the k-th value interval position of the j-th feature and updates them into the matrices, the numerator shards being the secret-shared squares of the cumulative first-order gradient shards and the denominator shards being the cumulative second-order gradient shards plus {λ}_i;
S2.7, each participant i uses the split-gain numerator and denominator shard matrices of the left and right subtrees obtained in S2.6 to calculate split-gain differences between different value intervals k of different features j, and through comparisons performed by the coordinator determines the feature j_best and value interval k_best corresponding to the maximum split gain;
S2.8, determining a symbol variable of the maximum gain;
S2.9, for the feature and partition position meeting the partition criterion, the indication vector shards divided by the partition position are determined;
When the sign variable is 1, for the feature j_best: the i′-th participant that owns feature j_best sets an M-dimensional vector SL recording which samples fall into the left subtree after partitioning by this feature; it takes out the k_best-th value interval, sets to 1 the positions of SL where the sample's value of feature j_best in the sample set satisfies the interval condition, and sets the remaining positions to 0; it also sets an M-dimensional vector SR recording which samples fall into the right subtree after partitioning, namely the complement of SL. For a total of N participants, SL and SR are split by secret sharing into N shards {SL}_i and {SR}_i and distributed to all participants i, i = 1, …, i′, …, N;
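A plaintext sketch of the SL/SR construction before secret sharing; the interval test left < v ≤ right mirrors the bucketing of S2.3.3 and is an assumption here, as are the helper names:

```python
def left_indicator(values, left, right):
    """SL: flag samples whose j_best value lies in the chosen interval
    (left, right], i.e. samples routed to the left subtree."""
    return [1.0 if left < v <= right else 0.0 for v in values]

def complement(sl):
    """SR: the complement of SL -- samples routed to the right subtree."""
    return [1.0 - b for b in sl]

vals = [0.3, 1.2, 0.8, 2.0]      # hypothetical j_best column of the owner
SL = left_indicator(vals, float("-inf"), 1.0)
SR = complement(SL)
```

In the protocol, SL and SR would then be split with the additive `share` scheme so that no other participant learns the actual partition of the samples.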
S2.10, each participant updates its first-order and second-order gradient vector shards and indication vector shards;
Each participant i receives {SL}_i and {SR}_i and recalculates the left-subtree indication vector shard {SL}_i and right-subtree indication vector shard {SR}_i it owns:
{SL}i={S}i⊙{SL}i
{SR}i={S}i⊙{SR}i
wherein ⊙ performs secret-sharing multiplication between co-located elements and yields a vector of the same dimension as {S}_i; each participant then calculates its own first-order gradient vector shard {GL}_i for samples falling into the left subtree and first-order gradient vector shard {GR}_i for samples falling into the right subtree:
{GL}i={G}i⊙{SL}i
{GR}i={G}i⊙{SR}i
and calculates its own second-order gradient vector shard {HL}_i for samples falling into the left subtree and second-order gradient vector shard {HR}_i for samples falling into the right subtree:
{HL}i={H}i⊙{SL}i
{HR}i={H}i⊙{SR}i
S2.11, constructing variables of left and right subtrees and specifying:
For each participant i, {GL}_i, {HL}_i and {SL}_i are set as the first-order gradient vector shard, second-order gradient vector shard and indication vector shard used for constructing the left subtree, and {GR}_i, {HR}_i and {SR}_i as the first-order gradient vector shard, second-order gradient vector shard and indication vector shard used for constructing the right subtree;
S2.12, when the current depth d of the tree reaches the set limit d_max, or when the sign variable is not 1, the weight value of the leaf node is calculated and the construction of left and right subtrees for the current node stops;
S2.13, increasing the tree depth, and recursively constructing the decision tree:
setting d = d + 1, S2.1 to S2.12 are executed recursively, completing the construction of one XGBoost joint decision tree.
S3, generating a predicted result of the data sample by using the t-th tree, and merging the predicted result with the previous t-1 tree results, including:
s3.1, local result prediction:
For the t-th tree, each participant i, for a data sample x_p, performs leaf-node prediction using the features of the part it holds, according to its local tree model: for each tree node, if participant i holds the partition information (feature and partition value), it descends into the left or right subtree according to that feature and value and continues prediction, setting the flag bits of all leaf nodes of the subtree it does not enter to 0; if it does not hold the partition information, prediction continues along both the left and right subtrees of that node until leaf nodes are reached, and the flag bits of the attributed leaf nodes are set to 1. Finally each participant i obtains the flag bits of all δ leaf nodes σ, σ = 1, 2, …, δ, of the tree, concatenated as the flag vector index_i, and concatenates the δ leaf weight shards in the same order into the result vector {v_w}_i. For example, as shown in fig. 4, for a data sample, the three participants each determine their corresponding flag vectors 1-3 locally, and each participant holds a result vector shard, where the first participant holds the feature-partition pairs (j_1, k_1) and (j_4, k_4), participant 2 holds the feature-partition pair (j_2, k_2), and participant 3 holds the feature-partition pair (j_3, k_3). The three local trees are together equivalent to a decision tree containing the complete partition information that would be obtained by training with all data stored on a single machine. The first through third participants each route the sample according to their own known information: when a node's partition information is held, the left or right subtree is selected; otherwise both subtrees are searched. Finally they respectively obtain the flag vectors (1, 1, 1, 0, 0), (0, 0, 1, 1, 1) and (0, 1, 1, 0, 0) indicating the leaves to which the data sample may belong;
s3.2, sign vector splitting and propagation:
Each participant i performs secret-sharing splitting of its flag vector index_i into shards {index_i}_{i′} and sends them to all participants i′, i′ = 1, …, N;
s3.3, all the participants calculate respective prediction result fragments:
Each participant i′ receives the flag vector shards {index_i}_{i′} sent by the participants i, calculates the bitwise cumulative product of all vector shards {index}_{i′} = {index_1}_{i′} ⊙ {index_2}_{i′} ⊙ … ⊙ {index_N}_{i′}, and calculates the bitwise product of the flag vector shard and its own weight shard {v_result}_{i′} = {index}_{i′} ⊙ {v_w}_{i′};
S3.4, merging prediction result fragments:
Each participant i′ sums the elements of {v_result}_{i′} to obtain {weight_p}_{i′} = sum({v_result}_{i′}) and sends the result to the first participant, which receives the shards, sums them to obtain the t-th tree's prediction weight_p, and accumulates it into the results of the previous rounds, which becomes the prediction result of sample x_p after the end of round t;
s3.5, calculating prediction results of all samples:
Traversing all p, the vector formed by combining the t-th-round prediction results of all data samples x_p is calculated.
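The joint prediction logic of S3.1-S3.4 can be illustrated in plaintext (in the protocol the products and sums run over secret shares); the leaf weights below are hypothetical:

```python
def predict_sample(flag_vectors, leaf_weights):
    """Bitwise product of all participants' flag vectors isolates the one
    leaf every local tree agrees on; its weight is the tree's prediction."""
    agg = [1.0] * len(leaf_weights)
    for fv in flag_vectors:
        agg = [a * b for a, b in zip(agg, fv)]
    return sum(a * w for a, w in zip(agg, leaf_weights))

# flag vectors from the Fig. 4 walk-through (five leaves, three participants)
index1 = [1, 1, 1, 0, 0]
index2 = [0, 0, 1, 1, 1]
index3 = [0, 1, 1, 0, 0]
weights = [0.1, -0.2, 0.3, 0.05, -0.4]   # hypothetical leaf weights
pred = predict_sample([index1, index2, index3], weights)
```

Here the cumulative product is (0, 0, 1, 0, 0): only the third leaf is flagged by all three participants, so its weight becomes the tree's prediction.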
S4, iteratively increasing training rounds, and completing the construction of all decision trees:
The number of trees t is increased to t + 1, and steps S1 to S3 are iterated until t = T, completing the construction of all T decision trees.
Step S2.3 specifically includes:
S2.3.1: all participants i initialize the num_feature × k_max dimension matrix {BG}_i recording interval first-order gradient sum shards and the num_feature × k_max dimension matrix {BH}_i recording interval second-order gradient sum shards;
S2.3.2: for feature j, j = 1, 2, …, num_feature, when the i′-th participant owns feature number j, it maps j to its own local feature map(j) using the feature index of step S1.1, counts all partition values taken by the feature, and records the number of values k_j;
S2.3.3: participant i′ sets a k_max × M matrix Matrix_index recording which samples fall into each feature partition, where M is the number of samples. For the j-th feature, all its values val_k are arranged from small to large, k = 1, …, k_j, setting left_k = val_{k−1} with left_0 = −∞, and right_k = val_k. Traversing the k_j intervals, the k-th value interval (left_k, right_k] is taken out, an all-zero vector S′ of dimension M × 1 is initialized, and participant i′ sets to 1 the positions of S′ where the sample's value value_map(j) of feature map(j) in the sample set satisfies left_k < value_map(j) ≤ right_k, recording the k-th row vector Matrix_index[k, :] = S′^T, where S′^T is the transpose of S′. After the partition traversal finishes, for a total of N participants, participant i′ splits Matrix_index by secret sharing into N shards {Matrix_index}_i and distributes them to all participants i, i = 1, …, i′, …, N;
S2.3.4: participant i receives {Matrix_index}_i; for the j-th feature, it traverses k up to the maximum number of value intervals k_max, calculating the first-order gradient sum shard {BG}_i[j, k] and the second-order gradient sum shard {BH}_i[j, k]:
{BG}_i[j, k] = sum({Matrix_index}_i[k, :] ⊙ {G}_i)
{BH}_i[j, k] = sum({Matrix_index}_i[k, :] ⊙ {H}_i)
where [k, :] denotes all elements of the k-th row of the matrix and sum(v) denotes the summation of the elements of vector v;
s2.3.5: go through j, execute S2.3.2-S2.3.4, makeAll participants i complete { BG }iAnd { BH }iAnd (4) calculating.
Step S2.7 specifically includes:
S2.7.1, each participant i sets the initial partition index list vector col = [1, 2, …, k_max] currently participating in the comparison, records its length R_col, and sets the initial per-feature partition index list vector col_selected;
S2.7.2, for the XGBoost algorithm, for the feature positions [j, col[r]] and [j, col[r+1]], where col[r] denotes the r-th element of the index list col, [j, col[r]] denotes the col[r]-th element of the j-th row of the matrix, and ⌊R_col/2⌋ denotes rounding down of R_col/2, r is traversed and the split-gain difference between the feature positions is calculated as follows:
Let:
nominator1_col = leftgain_up[j, col[r]] * leftgain_down[j, col[r+1]] - leftgain_up[j, col[r+1]] * leftgain_down[j, col[r]]
nominator2_col = rightgain_up[j, col[r]] * rightgain_down[j, col[r+1]] - rightgain_up[j, col[r+1]] * rightgain_down[j, col[r]]
denominator1_col = leftgain_down[j, col[r]] * leftgain_down[j, col[r+1]]
denominator2_col = rightgain_down[j, col[r]] * rightgain_down[j, col[r+1]]
Then, for the ⌊R_col/2⌋ element pairs, each participant i uses the above formulas to calculate all partition-position difference result shards under feature j of the left- and right-subtree split-gain numerator and denominator shard matrices from S2.6;
S2.7.3, each participant i sends its own pairwise comparison shard result vectors {nominator1_col}_i, {nominator2_col}_i, {denominator1_col}_i and {denominator2_col}_i to the coordinator; the coordinator collects them, sums the shards into the vectors nominator1_col, nominator2_col, denominator1_col and denominator2_col, and calculates col_shared_value = nominator1_col \ denominator1_col + nominator2_col \ denominator2_col, where the summation over shard vectors {v}_i adds the co-located elements of all shards into a single vector, and the operation v_1 \ v_2 on two vectors v_1 and v_2 denotes division of co-located elements;
S2.7.4, an empty list new_col is initialized and the r-th element of col_shared_value is judged sequentially: if it is not negative, col[2r] is added to new_col, otherwise col[2r+1] is added; after the traversal finishes, if the length of col is odd the last element of col is added to new_col; the coordinator then broadcasts new_col to all participants, and each participant sets col = new_col;
S2.7.5, while the length of col is greater than 1, steps S2.7.2 to S2.7.4 are iterated until the length of col becomes 1; the only element col[0] of col is then taken out, recording col_selected[j] = col[0];
S2.7.6, traversing all features j, steps S2.7.1 to S2.7.5 are iterated to obtain the selected partition position of each feature, combined into the complete feature partition index list vector col_selected; the initial feature index list vector currently participating in the comparison is then set to row = [1, 2, …, num_feature], with recorded length R_row;
S2.7.7, for the XGBoost algorithm, for the feature positions [row[r], col_selected[row[r]]] and [row[r+1], col_selected[row[r+1]]], where row[r] denotes the r-th element of the index list row, col_selected[row[r]] denotes the element of col_selected at index position row[r], and ⌊R_row/2⌋ denotes rounding down of R_row/2, r is traversed and the split-gain difference between the feature positions is calculated as follows:
Let:
nominator1_row = leftgain_up[row[r], col_selected[row[r]]] * leftgain_down[row[r+1], col_selected[row[r+1]]] - leftgain_up[row[r+1], col_selected[row[r+1]]] * leftgain_down[row[r], col_selected[row[r]]]
nominator2_row = rightgain_up[row[r], col_selected[row[r]]] * rightgain_down[row[r+1], col_selected[row[r+1]]] - rightgain_up[row[r+1], col_selected[row[r+1]]] * rightgain_down[row[r], col_selected[row[r]]]
denominator1_row = leftgain_down[row[r], col_selected[row[r]]] * leftgain_down[row[r+1], col_selected[row[r+1]]]
denominator2_row = rightgain_down[row[r], col_selected[row[r]]] * rightgain_down[row[r+1], col_selected[row[r+1]]]
Then, for the ⌊R_row/2⌋ element pairs, each participant i uses the above formulas to calculate the difference result shards between each pair of preferred partition positions over all features [1, 2, …, num_feature] of the left- and right-subtree split-gain numerator and denominator shard matrices from S2.6;
S2.7.8, each participant i sends its own pairwise calculation result vectors {nominator1_row}_i, {nominator2_row}_i, {denominator1_row}_i and {denominator2_row}_i to the coordinator; the coordinator collects them, sums the shards into the vectors nominator1_row, nominator2_row, denominator1_row and denominator2_row, and calculates row_shared_value = nominator1_row \ denominator1_row + nominator2_row \ denominator2_row;
S2.7.9, an empty list new_row is initialized and row_shared_value is traversed: for its r-th element, if it is not negative, row[2r] is added to new_row, otherwise row[2r+1] is added; after the traversal finishes, if the length of row is odd the last element of row is added to new_row; the coordinator then broadcasts new_row to all participants, and each participant sets row = new_row;
S2.7.10, while the length of row is greater than 1, steps S2.7.7 to S2.7.9 are iterated until the length of row becomes 1; the unique element row[0] is taken out and recorded as j_best = row[0], k_best = col_selected[j_best] is obtained, and both are broadcast to all participants, determining the selected best feature number j_best and the best partition position k_best of that feature.
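The tournament reduction of S2.7 can be sketched in plaintext, with explicit gains standing in for the reconstructed col_shared_value comparisons; in the protocol only the sign of each pairwise difference is revealed to the coordinator:

```python
def tournament_argmax(gains, idx):
    """Halve the candidate index list by pairwise comparison until one
    survivor remains (mirrors S2.7's col / new_col loop)."""
    col = list(idx)
    while len(col) > 1:
        new_col = []
        for r in range(len(col) // 2):
            a, b = col[2 * r], col[2 * r + 1]
            # only the sign of the gain difference decides the winner
            new_col.append(a if gains[a] - gains[b] >= 0 else b)
        if len(col) % 2 == 1:
            new_col.append(col[-1])   # odd length: last element advances
        col = new_col
    return col[0]

best = tournament_argmax([0.1, 0.9, 0.4, 0.7, 0.8], list(range(5)))
```

The same reduction is run twice: first over value intervals within each feature (producing col_selected), then over the features themselves.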
Step S2.8 specifically includes:
S2.8.1, for the given maximum-split-gain feature j_best and value interval k_best, the split gain in the XGBoost algorithm is expressed as Gain = GL²/(HL + λ) + GR²/(HR + λ) - SG²/(SH + λ). Each participant i calculates its own split-gain numerator shard {nominator}_i and its own split-gain denominator shard {denominator}_i;
S2.8.2, the remaining participants send their split-gain numerator shards {nominator}_2, …, {nominator}_N to the first participant; the first participant collects {nominator}_2, …, {nominator}_N, sums them together with its own shard, sets the first-participant sign variable sign_1, and judges the sign of the numerator sum, letting sign_1 = 1 if the sum is not negative and sign_1 = -1 otherwise;
S2.8.3, each participant i sends its split-gain denominator shard {denominator}_i to the coordinator; the coordinator collects {denominator}_i, i = 1, …, N, sums them, sets the coordinator sign variable sign_0, and judges the sign of the denominator sum, letting sign_0 = 1 if the sum is not negative and sign_0 = -1 otherwise;
S2.8.4, the first participant sends sign_1 to the coordinator; the coordinator receives it, calculates sign = sign_1 * sign_0, and broadcasts sign to all participants, which receive this value as the sign variable of the current construction step.
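The sign-splitting trick of S2.8 can be illustrated as follows; only the two sign bits are ever exchanged, never the shard magnitudes, and the sign convention (non-negative maps to 1) is an assumption:

```python
def sign(x):
    """Assumed convention: non-negative -> 1, negative -> -1."""
    return 1 if x >= 0 else -1

def gain_sign(numerator_shards, denominator_shards):
    """sign(num/den) = sign(num) * sign(den): the first participant only
    learns the numerator's sign, the coordinator only the denominator's."""
    sign1 = sign(sum(numerator_shards))    # first participant's side
    sign0 = sign(sum(denominator_shards))  # coordinator's side
    return sign1 * sign0

s_pos = gain_sign([0.5, -0.2, 0.4], [1.2, 0.3])   # num 0.7, den 1.5
s_neg = gain_sign([-1.0, 0.2, 0.1], [0.5, 0.4])   # num -0.7, den 0.9
```

Since the gain is a fraction, its sign is the product of the numerator's and denominator's signs, which is exactly what sign = sign_1 * sign_0 computes without revealing the gain itself.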
step S2.12 specifically includes:
S2.12.1, each participant i calculates {h′}_i, half of the sum of its second-order gradient sum shard and regularization term shard, {h′}_i = ({SH}_i + {λ}_i) / 2, and computes its own first-order gradient shard sum:
{g′}_i = {SG}_i
S2.12.2, each participant i determines the order of magnitude μ_i of its {h′}_i;
S2.12.3, all participants send their corresponding magnitude exponents; the coordinator receives them, selects the maximum magnitude as μ_m, determines the iteration step size, the process parameter τ and the iteration count iter, and sends them to all participants;
S2.12.4, each participant i sets a random initial value and a variable with an initial value of 0; starting from κ = 1, the iteration proceeds according to the prescribed update formula, setting κ = κ + 1 after each iteration and terminating when κ = iter; after the computation of participant i finishes, its weight shard is recorded.
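The iteration formula of S2.12.4 is not reproduced in this text; a plausible reconstruction, given the magnitude parameter μ and the step/iteration counts of S2.12.2-S2.12.3, is a Newton-Raphson reciprocal loop, sketched here in plaintext purely as an assumption:

```python
def leaf_weight_iterative(sg, sh, lam, iters=30):
    """Hypothetical reconstruction of the omitted S2.12.4 iteration: a
    Newton-Raphson reciprocal loop for d = SH + lam, giving the XGBoost
    leaf weight w = -SG / (SH + lam) without an explicit division."""
    d = sh + lam
    # power-of-two initial scale, playing the role of the magnitude mu
    mu = max(int(abs(d)).bit_length(), 1)
    x = 1.0 / (2 ** mu)                 # ensures 0 < d*x < 2 for d > 0
    for _ in range(iters):
        x = x * (2.0 - d * x)           # quadratic convergence to 1/d
    return -sg * x
```

Division-free reciprocal iterations of this kind are a standard way to evaluate fractions over secret shares, since each update uses only additions and multiplications.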
In addition, it should be noted that the specific implementation examples described in this specification may have different names, and the above contents of this specification are only illustrations of the structure of the present invention. All equivalent or simple changes made according to the structure, characteristics and principles of the invention are included in the protection scope of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or adopt similar methods without departing from the scope of the invention as defined in the appended claims.
Claims (10)
1. An XGBoost prediction model training method for protecting multi-party data privacy, characterized by comprising a plurality of participants and a coordinator, wherein the participant holding labels first calculates first-order and second-order gradient vectors and an indication vector using the current model prediction result and the label values; the remaining participants, assisted by the coordinator through secret sharing, jointly calculate and construct a joint decision tree model based on the XGBoost algorithm; the participants cooperate to determine the prediction result of the data to be trained under the joint decision tree model; and finally all participants and the coordinator iterate together to complete the construction of a plurality of joint decision tree models, obtaining a complete multi-party prediction model.
2. The XGboost prediction model training method for protecting multi-party data privacy as claimed in claim 1, wherein the specific steps of training the joint decision tree model are as follows:
s1, the participants with labels are used as first participants, the initial number, the initial depth, the regularization parameters and the maximum depth of the building tree are set, the regularization parameters are divided secretly, all the set parameters are sent to all the participants, random non-repetitive feature number indexes are generated for each participant with corresponding number of features, the first participant with labels calculates to obtain a first-order gradient vector and a second-order gradient vector by using the current model prediction result vector and the sample label vector, initial full 1 indicating vectors are generated, secret sharing and division are respectively carried out, and the first-order gradient vector fragments, the second-order gradient vector fragments and the indicating vector fragments with corresponding numbers are divided for the participants with the corresponding numbers and are respectively distributed to all the participants;
S2, after each participant i receives the first-order gradient vector shard, the second-order gradient vector shard and the indication vector shard, it calculates the shard of its own first-order gradient sum and the shard of the second-order gradient sum, and calculates, using a secret sharing method, the numerator shard and denominator shard of the split gain corresponding to each group under each feature; with the coordinator's assistance, the maximum split gain, its feature and grouping, and whether to divide are determined; when division is performed, if the selected feature belongs to a specific participant, that participant generates the divided left-subtree indication vector and right-subtree indication vector, which respectively indicate the samples in the left and right subsets obtained by dividing the sample set according to the feature and grouping corresponding to the maximum split gain, the left and right subsets corresponding respectively to the left subtree and the right subtree; the left-subtree indication vector and right-subtree indication vector are split into multiple shards through secret sharing and distributed to the participants; each participant uses the received shards and its own indication vector shard to calculate the left-subtree first-order and second-order gradient vector shards after the sample set is divided into the left subtree, and the right-subtree first-order and second-order gradient vector shards after the sample set is divided into the right subtree; the left subtree is constructed recursively using the left-subtree first-order gradient vector shard, second-order gradient vector shard and left-subtree indication vector, and the right subtree is constructed recursively using the right-subtree first-order gradient vector shard, second-order gradient vector shard and right-subtree indication vector, with a depth-increase condition set for the loop; if division is not performed or the preset maximum depth is reached, each participant calculates the corresponding shard of the weight of the current leaf node of the decision tree;
s3, for each data sample, each participant utilizes the sample of the held partial characteristics to calculate the prediction result of the current combined decision tree model, and accumulates the prediction results into the results of the previous tree models with corresponding quantity to generate the comprehensive prediction result of the multiple tree models for the data sample;
and S4, increasing the number of the tree models, and iterating the steps S1-S3 until the target number of combined decision tree models are constructed.
3. The XGboost predictive model training method for protecting privacy of multi-party data according to claim 2, wherein the secret sharing algorithms in steps S1, S2 and S3 comprise secret sharing splitting, secret sharing addition, secret sharing subtraction and secret sharing multiplication.
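The four secret sharing primitives listed in claim 3 can be sketched as follows. This is a minimal illustration, not the patent's exact protocol: it assumes additive sharing over a prime field and simulates the multiplication-triple dealer locally; all function names and the modulus are assumptions.

```python
# Illustrative additive secret sharing: split, reconstruct, add, subtract,
# and Beaver-triple multiplication (dealer simulated locally).
import random

P = 2**61 - 1  # a Mersenne prime modulus (an assumption)

def split(x, n):
    """Split secret x into n additive shares that sum to x mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

def share_add(a_shares, b_shares):
    """Each party adds its local shares; no communication needed."""
    return [(a + b) % P for a, b in zip(a_shares, b_shares)]

def share_sub(a_shares, b_shares):
    return [(a - b) % P for a, b in zip(a_shares, b_shares)]

def share_mul(a_shares, b_shares, n):
    """Beaver-triple multiplication of two shared values."""
    a_, b_ = random.randrange(P), random.randrange(P)
    c_ = (a_ * b_) % P
    ta, tb, tc = split(a_, n), split(b_, n), split(c_, n)
    # Parties jointly open d = a - a_ and e = b - b_ (these leak nothing).
    d = reconstruct(share_sub(a_shares, ta))
    e = reconstruct(share_sub(b_shares, tb))
    # Shares of a*b = c_ + d*b_ + e*a_ + d*e (constant term added once).
    out = [(tc[i] + d * tb[i] + e * ta[i]) % P for i in range(n)]
    out[0] = (out[0] + d * e) % P
    return out

g = split(17, 3)
h = split(5, 3)
print(reconstruct(share_add(g, h)))      # 22
print(reconstruct(share_mul(g, h, 3)))   # 85
```

Addition and subtraction are purely local on fragments, which is why the claims can sum gradient fragments freely; only multiplications (the ⊗/⊙ operations below) require interaction.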
4. The XGboost predictive model training method for protecting multi-party data privacy as claimed in claim 2, wherein the step S1 specifically comprises:
s1.1, a first participant sets the initial number of trees, the initial depth, the regularization parameter and the maximum depth of the constructed tree, splits the regularization parameter through secret sharing, and distributes all set parameters to all participants; for the participants, each owning a corresponding number of characteristics, the coordinator counts the total number of characteristics over all participants, generates an array with that number of elements, shuffles it, and randomly assigns to each participant a number of array elements equal to its characteristic count, the array elements assigned to different participants not overlapping; each participant establishes a one-to-one mapping map(j) from the shuffled array elements to its own characteristic numbers, and records and stores the mapping locally;
s1.2, all participants calculate the maximum number of characteristic values among their own sample characteristics and send it to the coordinator; the coordinator determines the maximum number of characteristic values over all participants and broadcasts it to all participants;
s1.3, starting from the first participant holding the label data, all participants use the same loss function; the first participant calculates a first-order gradient vector, a second-order gradient vector and an initial all-1 indication vector using the model prediction result vector (holding the initial prediction result of each piece of data) and the label value vector, splits them through a secret sharing algorithm into first-order gradient vector fragments, second-order gradient vector fragments and indication vector fragments for each participant, and distributes them to the corresponding participants.
5. The XGboost predictive model training method for protecting multi-party data privacy as claimed in claim 2, wherein the step S2 specifically comprises:
s2.1, after each participant i receives its corresponding first-order gradient vector fragment, second-order gradient vector fragment and indication vector fragment, it calculates the i-th fragment {SG}_i of the first-order gradient sum and the i-th fragment {SH}_i of the second-order gradient sum, where {SG}_i and {SH}_i are obtained by summing the elements of the first-order gradient vector fragment and the second-order gradient vector fragment owned by participant i, respectively;
s2.2, for the XGboost algorithm, at a given tree node, with the first-order gradient sum SG and second-order gradient sum SH of all data at the node and the regular term λ, the non-splitting gain is expressed as:

gain = SG² / (SH + λ)

each participant calculates its own non-splitting gain numerator fragment and non-splitting gain denominator fragment according to the above formula, specifically:

{gain_up}_i = {SG ⊗ SG}_i

{gain_down}_i = {SH}_i + {λ}_i

wherein {gain_up}_i is the non-splitting gain numerator fragment, {gain_down}_i is the non-splitting gain denominator fragment, ⊗ denotes secret sharing multiplication, and {λ}_i is the i-th fragment of the hyperparameter λ;
s2.3, each participant i uses its corresponding first-order gradient vector fragment and second-order gradient vector fragment to calculate the first-order gradient sum fragment matrix {BG}_i and the second-order gradient sum fragment matrix {BH}_i over all value intervals of all its own characteristics;
s2.4, each participant i initializes the left sub-tree splitting gain numerator fragment matrix {leftgain_up}_i, the left sub-tree splitting gain denominator fragment matrix {leftgain_down}_i, the right sub-tree splitting gain numerator fragment matrix {rightgain_up}_i and the right sub-tree splitting gain denominator fragment matrix {rightgain_down}_i;
s2.5, for each characteristic j, each participant initializes and records a left sub-tree accumulated first-order gradient fragment variable {gl}_i, a left sub-tree accumulated second-order gradient fragment variable {hl}_i, a right sub-tree accumulated first-order gradient fragment variable {gr}_i and a right sub-tree accumulated second-order gradient fragment variable {hr}_i, all set to 0;
s2.6, each participant i traverses the value intervals k, updating the left sub-tree accumulated first-order gradient fragment variable {gl}_i and the left sub-tree accumulated second-order gradient fragment variable {hl}_i as follows:

{gl}_i = {gl}_i + {BG}_i[j,k]

{hl}_i = {hl}_i + {BH}_i[j,k]

wherein {BG}_i[j,k] and {BH}_i[j,k] respectively denote the [j,k]-th element of the fragment matrices {BG}_i and {BH}_i;

the right sub-tree accumulated first-order gradient fragment variable {gr}_i and second-order gradient fragment variable {hr}_i are then updated by subtraction:

{gr}_i = {SG}_i − {gl}_i

{hr}_i = {SH}_i − {hl}_i
for the XGboost model, the splitting gain calculation formula used is as follows:

gain_split = GL²/(HL + λ) + GR²/(HR + λ)

wherein λ is the regularization parameter, GL and HL are the first-order and second-order gradient sums of the left sub-tree, and GR and HR those of the right sub-tree;

each participant directly calculates the splitting gain numerator and denominator fragments of the left and right sub-trees at the element positions of the corresponding value intervals of the corresponding characteristics, with {gl}_i, {hl}_i, {gr}_i and {hr}_i the accumulated gradient fragment variables of this step, and updates them into the matrices:

{leftgain_up}_i[j,k] = {gl ⊗ gl}_i

{leftgain_down}_i[j,k] = {hl}_i + {λ}_i

{rightgain_up}_i[j,k] = {gr ⊗ gr}_i

{rightgain_down}_i[j,k] = {hr}_i + {λ}_i

wherein j is the characteristic number and k is the value interval number;
s2.7, each participant uses the splitting gain numerator and denominator fragment matrices of the left and right sub-trees to calculate, by traversal, the splitting gain differences between the value intervals of every characteristic j, and the characteristic j_best and value interval k_best corresponding to the maximum splitting gain are determined through comparison by the coordinator;
s2.8, for the characteristic and value interval of the maximum splitting gain, each participant uses the left and right sub-tree splitting gain numerator and denominator fragments and the non-splitting gain numerator and denominator fragments at that position to calculate a total splitting gain denominator fragment, sent to the coordinator, and a total splitting gain numerator fragment, sent to the first participant; the coordinator sums the denominator fragments and determines the denominator's sign, the first participant sums the numerator fragments and determines the numerator's sign, and from these two signs the sign variable corresponding to the final maximum gain is determined;
s2.9, when the sign variable is 1, the participant owning the maximum splitting gain characteristic j_best sets an all-0 column vector SL recording the samples falling into the left sub-tree after division by this characteristic, takes out the value interval (left_kbest, right_kbest] corresponding to the maximum splitting gain, and for every sample in the sample set whose characteristic j_best value satisfies left_kbest < value ≤ right_kbest sets the SL element at that position to 1; it then sets the M-dimensional vector SR recording the samples falling into the right sub-tree after division, namely SL negated; for N total participants, the left sub-tree indication vector and the right sub-tree indication vector are split into N fragments through secret sharing and distributed to all participants;
s2.10, each participant receives its fragments of the left and right sub-tree indication vectors and recalculates its own left sub-tree indication vector fragment {SL}_i and right sub-tree indication vector fragment {SR}_i:

{SL}_i = {S}_i ⊙ {SL}_i

{SR}_i = {S}_i ⊙ {SR}_i

wherein ⊙ denotes secret sharing multiplication between co-located elements, yielding a vector of the same dimension as {S}_i; each participant then calculates its own first-order gradient vector fragment {GL}_i of the samples falling into the left sub-tree and {GR}_i of the samples falling into the right sub-tree:

{GL}_i = {G}_i ⊙ {SL}_i

{GR}_i = {G}_i ⊙ {SR}_i

and its own second-order gradient vector fragment {HL}_i of the samples falling into the left sub-tree and {HR}_i of the samples falling into the right sub-tree:

{HL}_i = {H}_i ⊙ {SL}_i

{HR}_i = {H}_i ⊙ {SR}_i
s2.11, each participant i sets {GL}_i, {HL}_i and {SL}_i as the first-order gradient vector fragment, second-order gradient vector fragment and indication vector fragment used for constructing the left sub-tree, and sets {GR}_i, {HR}_i and {SR}_i as the first-order gradient vector fragment, second-order gradient vector fragment and indication vector fragment used for constructing the right sub-tree;
s2.12, when the current depth of the tree reaches the set limit or the sign variable is not 1, the weight value of the leaf node is calculated, and construction of left and right sub-trees below the current node stops;
s2.13, with a depth increment condition set, steps S2.1 to S2.12 are executed recursively to construct the XGboost joint decision tree model.
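In plaintext, ignoring the fragment arithmetic, the split search that claim 5 distributes across participants reduces to the standard XGBoost exact greedy algorithm. A minimal sketch of S2.2/S2.5/S2.6 with illustrative bucket sums:

```python
# Per-interval gradient sums are accumulated left to right, the right side
# follows by subtraction, and the best split's gain is compared against the
# non-splitting gain SG^2/(SH+λ). All numbers are made up.
BG = [4.0, -3.0, 2.0]   # first-order gradient sums per value interval
BH = [0.5, 0.5, 1.0]    # second-order gradient sums per value interval
lam = 1.0               # regularization parameter λ
SG, SH = sum(BG), sum(BH)
no_split = SG * SG / (SH + lam)          # non-splitting gain (S2.2)

best_k, best_gain = None, float("-inf")
gl = hl = 0.0
for k in range(len(BG) - 1):             # last interval leaves right empty
    gl += BG[k]; hl += BH[k]             # left accumulation (S2.6)
    gr, hr = SG - gl, SH - hl            # right side by subtraction
    gain = gl * gl / (hl + lam) + gr * gr / (hr + lam)
    if gain > best_gain:
        best_k, best_gain = k, gain

split = best_gain - no_split > 0         # sign decided in S2.8
```

In the protocol each participant holds only fragments of BG, BH, SG and SH, the squares are secret-sharing multiplications, and the comparisons of S2.7/S2.8 are done on reconstructed signs rather than on the gains themselves.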
6. The XGboost predictive model training method for protecting multi-party data privacy as claimed in claim 5, wherein the step S2.3 specifically comprises:
s2.3.1, all participants initialize a matrix {BG}_i of corresponding dimensions recording the interval first-order gradient sum fragments and a matrix {BH}_i of corresponding dimensions recording the interval second-order gradient sum fragments;
s2.3.2, for each owned characteristic number j = 1, 2, …, num_feature, the participant maps the characteristic number through the mapping of S1.1, namely maps j to its own characteristic map(j), counts all division values owned by that characteristic and records their number;
s2.3.3, the participant sets a multi-dimensional matrix Matrix_index recording the samples falling into each division of the characteristic; for the characteristic j, all its values val_k are arranged from small to large, k = 1, …, k_j; set left_k = val_{k−1} with left_1 = −∞, and right_k = val_k; traversing k, take out the k-th value interval (left_k, right_k], initialize an all-0 column vector S′, set to 1 the elements of S′ whose sample characteristic value value_map(j) in the participant's sample set satisfies left_k < value_map(j) ≤ right_k, and record Matrix_index[k, :] = S′ᵀ; after the division traversal is finished, Matrix_index is split through secret sharing into N fragments {Matrix_index}_i and distributed to the respective participants;
s2.3.4, participant i receives {Matrix_index}_i and, for the j-th characteristic, traverses k up to the maximum value interval number, calculating the first-order and second-order gradient sum fragments:

{BG}_i[j,k] = sum({Matrix_index}_i[k,:] ⊙ {G}_i)

{BH}_i[j,k] = sum({Matrix_index}_i[k,:] ⊙ {H}_i)

wherein [k,:] denotes selecting all elements of the k-th row of the matrix, and sum(v) denotes summing the elements of the vector v;
s2.3.5, traversing all the characteristics, steps S2.3.2 to S2.3.4 are executed so that all participants complete the calculation of the first-order and second-order gradient sum fragments for all value intervals.
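The interval-membership matrix of s2.3.3 can be sketched in plaintext as follows; the sample values are assumptions, and in the protocol the resulting 0/1 matrix is immediately secret-shared rather than revealed.

```python
# Build the 0/1 matrix Matrix_index for one feature using half-open
# intervals (left_k, right_k], with left_1 = -inf and right_k = val_k.
values = [2.0, 5.0, 2.0, 9.0]          # feature map(j) over the sample set
cuts = sorted(set(values))             # val_1 < ... < val_kj
lefts = [float("-inf")] + cuts[:-1]    # left_k = val_{k-1}

matrix_index = []
for left, right in zip(lefts, cuts):
    row = [1 if left < v <= right else 0 for v in values]
    matrix_index.append(row)

# every sample falls into exactly one value interval:
assert all(sum(col) == 1 for col in zip(*matrix_index))
```

Row k of this matrix, multiplied element-wise with the gradient vector and summed, yields the bucket sums {BG}[j,k] and {BH}[j,k] of s2.3.4.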
7. The XGboost predictive model training method for protecting multi-party data privacy as claimed in claim 5, wherein the step S2.7 specifically comprises:
s2.7.1, each participant sets the initial division index list vector col = [1, 2, …, k_max] currently participating in comparison, k_max being the maximum number of characteristic divisions, records the length of col as R_col, and sets an initial per-characteristic division index list vector col_selected;
s2.7.2, for the XGboost algorithm, for characteristic positions [j, col[2r]] and [j, col[2r+1]], where col[r] denotes the r-th element (0-indexed) in the index list col and [j, col[r]] the col[r]-th element of the j-th row of the matrix, r traversing 0, 1, …, ⌊R_col/2⌋ − 1 (⌊·⌋ denoting rounding down), the splitting gain difference between the two characteristic positions is calculated as follows:

let:

nominator1_col = leftgain_up[j,col[2r]] * leftgain_down[j,col[2r+1]] − leftgain_up[j,col[2r+1]] * leftgain_down[j,col[2r]]

denominator1_col = leftgain_down[j,col[2r]] * leftgain_down[j,col[2r+1]]

nominator2_col = rightgain_up[j,col[2r]] * rightgain_down[j,col[2r+1]] − rightgain_up[j,col[2r+1]] * rightgain_down[j,col[2r]]

denominator2_col = rightgain_down[j,col[2r]] * rightgain_down[j,col[2r+1]]

then the gain difference is:

nominator1_col / denominator1_col + nominator2_col / denominator2_col

for each such pair of positions, each participant i uses the above formulas, with the products computed by secret sharing multiplication, to calculate the difference result fragments at all division positions of characteristic j of the left and right sub-tree splitting gain numerator and denominator fragment matrices of S2.6;
s2.7.3, each participant i sends its own pairwise comparison fragment result vectors {nominator1_col}_i, {nominator2_col}_i, {denominator1_col}_i and {denominator2_col}_i to the coordinator; the coordinator collects them and calculates:

col_shared_value = (Σ_i {nominator1_col}_i) \ (Σ_i {denominator1_col}_i) + (Σ_i {nominator2_col}_i) \ (Σ_i {denominator2_col}_i)

wherein the operation Σ_i {v}_i on fragment vectors {v}_i denotes summing the co-located elements of all fragments into one vector, and the operation v1\v2 on vectors v1 and v2 denotes division of co-located elements of the two vectors, i.e. (v1\v2)[r] = v1[r] / v2[r];
s2.7.4, an empty list new_col is initialized; for the r-th element of col_shared_value, if it is non-negative, col[2r] is added to new_col, otherwise col[2r+1] is added to new_col; after the traversal, if the length of col is odd, the last element of col is appended to new_col; the coordinator then broadcasts new_col to all participants, and each participant sets col = new_col;
s2.7.5, while the length of col is greater than 1, steps S2.7.2 to S2.7.4 are iterated until the length of col becomes 1; the only remaining element col[0] is taken out and recorded as col_selected[j] = col[0];
s2.7.6, traversing all characteristics j, steps S2.7.1 to S2.7.5 are iterated to obtain the selected division position of each characteristic, which are combined into a complete characteristic division index list vector; the initial characteristic index list vector currently participating in comparison is set to row = [1, 2, …, num_feature], its length recorded as R_row, num_feature being the number of all characteristics;
s2.7.7, for the XGboost algorithm, for characteristic positions p = [row[2r], col_selected[row[2r]]] and q = [row[2r+1], col_selected[row[2r+1]]], where row[r] denotes the r-th element in the index list row and col_selected[row[r]] the element of col_selected at index position row[r], r traversing 0, 1, …, ⌊R_row/2⌋ − 1 (⌊·⌋ denoting rounding down on R_row/2), the splitting gain difference between the two characteristic positions is calculated as follows:

let:

nominator1_row = leftgain_up[p] * leftgain_down[q] − leftgain_up[q] * leftgain_down[p]

denominator1_row = leftgain_down[p] * leftgain_down[q]

nominator2_row = rightgain_up[p] * rightgain_down[q] − rightgain_up[q] * rightgain_down[p]

denominator2_row = rightgain_down[p] * rightgain_down[q]

then the gain difference is:

nominator1_row / denominator1_row + nominator2_row / denominator2_row

for each such pair of positions, each participant i uses the above formulas to calculate the difference result fragments of the left and right sub-tree splitting gain numerator and denominator fragment matrices of S2.6 between the best division positions corresponding to all the characteristics [1, 2, …, num_feature];
s2.7.8, each participant i sends its own comparison fragment result vectors {nominator1_row}_i, {nominator2_row}_i, {denominator1_row}_i and {denominator2_row}_i to the coordinator; the coordinator collects them and calculates row_shared_value in the same manner as S2.7.3;
s2.7.9, an empty list new_row is initialized; traversing row_shared_value, for its r-th element, if it is non-negative, row[2r] is added to new_row, otherwise row[2r+1] is added to new_row; after the traversal, if the length of row is odd, the last element of row is appended to new_row; the coordinator then broadcasts new_row to all participants, and each participant sets row = new_row;
s2.7.10, while the length of row is greater than 1, steps S2.7.7 to S2.7.9 are iterated until the length of row becomes 1; the only remaining element row[0] is taken out and recorded as j_best = row[0], and k_best = col_selected[j_best] is obtained and broadcast to all participants, determining the selected best characteristic number j_best and the best division position k_best of that characteristic.
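The pairwise elimination of claim 7 is a tournament reduction: compare candidates in pairs, keep the larger of each pair, and halve the list each round. A plaintext sketch (in the protocol the comparison is made on the reconstructed sign of the fraction difference, never on the gains themselves; the gain values here are illustrative):

```python
# Tournament argmax over a list of gains, mirroring S2.7.2-S2.7.5.
def tournament_argmax(gains):
    col = list(range(len(gains)))          # initial index list
    while len(col) > 1:
        new_col = []
        for r in range(len(col) // 2):
            a, b = col[2 * r], col[2 * r + 1]
            # protocol: keep a if the gain difference is non-negative
            new_col.append(a if gains[a] - gains[b] >= 0 else b)
        if len(col) % 2 == 1:              # odd length: carry the last over
            new_col.append(col[-1])
        col = new_col
    return col[0]

print(tournament_argmax([0.2, 1.5, 0.9, 1.1, 0.4]))  # 1
```

Each round reveals to the coordinator only the signs of pairwise differences, which is why the claims compare cross-multiplied numerators and denominators instead of dividing.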
8. The XGboost predictive model training method for protecting multi-party data privacy as claimed in claim 5, wherein the step S2.8 specifically comprises:
s2.8.1, the splitting gain at the maximum splitting gain characteristic j_best and value interval k_best is calculated, specifically:

each participant i calculates its own total splitting gain numerator fragment, all matrix entries taken at position [j_best, k_best] and ⊗ denoting secret sharing multiplication:

{up_total}_i = {leftgain_up ⊗ rightgain_down ⊗ gain_down}_i + {rightgain_up ⊗ leftgain_down ⊗ gain_down}_i − {gain_up ⊗ leftgain_down ⊗ rightgain_down}_i

and its own total splitting gain denominator fragment:

{down_total}_i = {leftgain_down ⊗ rightgain_down ⊗ gain_down}_i

so that up_total / down_total equals the total gain GL²/(HL+λ) + GR²/(HR+λ) − SG²/(SH+λ);
s2.8.2, the remaining participants respectively send their total splitting gain numerator fragments to the first participant; the first participant collects and sums them, and sets its sign by judging the sign of the sum:

sign_1 = 1 if the sum of all numerator fragments is positive, otherwise sign_1 = −1

wherein sign_1 is the first participant's sign;

s2.8.3, all participants send their total splitting gain denominator fragments to the coordinator; the coordinator collects and sums them, and sets its sign by judging the sign of the sum:

sign_0 = 1 if the sum of all denominator fragments is positive, otherwise sign_0 = −1

wherein sign_0 is the coordinator's sign;

s2.8.4, the first participant sends sign_1 to the coordinator; the coordinator receives it, calculates the total sign sign_0 · sign_1 and broadcasts it to all participants, and all participants take the received value as the currently established sign variable.
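The point of claim 8 is that the sign of a shared fraction num/den can be recovered from the separately reconstructed signs of the numerator (by the first participant) and the denominator (by the coordinator), so neither side ever learns the gain's magnitude. A sketch with illustrative fragment values:

```python
# Sign-only reconstruction: the first participant sums numerator fragments,
# the coordinator sums denominator fragments, and only the product of the
# two signs is broadcast.
num_shares = [5.0, -1.5, 2.5]   # total-gain numerator fragments (sum = 6)
den_shares = [1.0, 2.0, 1.5]    # total-gain denominator fragments (sum = 4.5)

sign1 = 1 if sum(num_shares) > 0 else -1   # first participant's sign
sign0 = 1 if sum(den_shares) > 0 else -1   # coordinator's sign
symbol = 1 if sign0 * sign1 == 1 else 0    # 1 -> positive total gain -> split

print(symbol)  # 1
```

A positive total gain (sign variable 1) triggers the division of s2.9; otherwise the node becomes a leaf per s2.12.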
9. The XGboost predictive model training method for protecting multi-party data privacy as claimed in claim 5, wherein the step S2.12 specifically comprises:
s2.12.1, each participant calculates half of the sum of its second-order gradient fragment and regular term fragment:

{h′}_i = ({SH}_i + {λ}_i) / 2

wherein {SH}_i is the second-order gradient sum fragment and {λ}_i the regular term fragment;

each participant also takes its own first-order gradient fragment sum:

{g′}_i = {SG}_i

wherein {SG}_i is the first-order gradient sum fragment;

s2.12.2, each participant determines the order of magnitude of {h′}_i, namely the digit μ_i such that:

10^(μ_i − 1) ≤ |{h′}_i| < 10^(μ_i)

wherein μ_i is the order-of-magnitude digit;

s2.12.3, all participants send their order-of-magnitude digits; the coordinator receives them, selects the largest as μ_m, determines from it the iteration step size τ and the iteration number iter, and sends them to all participants;

s2.12.4, each participant i sets a random initial value {w}_i and a variable with initial value 0, and starting from κ = 1 iterates according to the following formula:

{w}_i = {w}_i − τ · (2 · {h′ ⊗ w}_i + {g′}_i)

after iter iterations, {w}_i is participant i's fragment of the leaf node weight w = −SG/(SH + λ).
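The iteration of claim 9 avoids a secret-shared division: with h′ = (SH+λ)/2 and g′ = SG, gradient descent on f(w) = h′w² + g′w converges to the minimizer w* = −g′/(2h′) = −SG/(SH+λ), which is exactly the XGBoost leaf weight. A plaintext sketch; the specific step-size rule and iteration count are assumptions in the spirit of the magnitude-based choice of S2.12.3:

```python
# Division-free leaf weight: iterate w <- w - τ(2h'w + g') until it
# converges to -SG/(SH+λ). Values are illustrative.
SG, SH, lam = 4.0, 3.0, 1.0
g_, h_ = SG, (SH + lam) / 2.0

mu = len(str(int(abs(h_))))   # order-of-magnitude digit of h' (h_ >= 1 here)
tau = 10.0 ** (-mu)           # step size from the largest magnitude digit
w = 0.3                       # random initial value
for _ in range(200):          # iteration number 'iter'
    w -= tau * (2.0 * h_ * w + g_)

print(abs(w - (-SG / (SH + lam))) < 1e-6)  # True
```

Since h′ < 10^μ, the choice τ = 10^(−μ) keeps the contraction factor |1 − 2τh′| below 1, which is why taking the maximum magnitude digit across participants guarantees convergence; in the shared version each update costs one secret-sharing multiplication {h′ ⊗ w}_i.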
10. The XGboost predictive model training method for protecting multi-party data privacy as claimed in claim 2, wherein the step S3 specifically comprises:
s3.1, for a data sample, each participant uses the partial characteristics it holds to predict leaf nodes according to its local tree model: for each tree node whose division information concerns a characteristic the participant owns, prediction proceeds according to that division information, and the flag bits of all leaf nodes of the branch sub-tree not entered are set to 0; if the division information is not a characteristic the participant owns, prediction proceeds along both the left and right sub-trees of that node; this continues until the leaf nodes reachable under the characteristics the participant can determine are found, and their flag bits are set to 1; finally, each participant splices the flag bits of all leaf nodes generated by its prediction into a flag vector in the order of the leaf nodes in the joint decision tree structure, and splices the leaf weights into a result vector in the same order;
s3.2, each participant splits its flag vector through secret sharing and sends the fragments to all participants;
s3.3, each participant receives the flag vector fragments sent by the other participants, calculates the element-wise product fragment of all the flag vector fragments, and calculates the element-wise product of that result with its own weight fragment;
s3.4, each participant sums the elements of the element-wise product result and sends the sum to the first participant, and the first participant receives the sums and calculates the prediction result;
and s3.5, all data samples are traversed, and the corresponding prediction results are combined into a prediction result vector.
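The prediction aggregation of claim 10 can be sketched in plaintext with two participants: each holds a 0/1 leaf flag vector, their element-wise product selects the single leaf that all participants' characteristics allow, and its dot product with the leaf-weight vector is the tree's prediction. The vectors below are illustrative.

```python
# Flag-vector intersection and weighted sum, mirroring S3.3/S3.4.
flags_a = [1, 1, 0, 0]      # participant A can rule out leaves 3 and 4
flags_b = [0, 1, 1, 0]      # participant B can rule out leaves 1 and 4
weights = [0.5, -0.2, 0.8, 0.1]   # leaf weights in tree order

joint = [fa * fb for fa, fb in zip(flags_a, flags_b)]    # [0, 1, 0, 0]
prediction = sum(j * w for j, w in zip(joint, weights))  # -0.2
```

In the protocol both the flag multiplications and the weight products are done on secret-shared fragments, so no participant learns which leaf the sample reached or the other parties' leaf weights; only the first participant reconstructs the final scalar.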
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011452494.9A CN112700031B (en) | 2020-12-12 | 2020-12-12 | XGboost prediction model training method for protecting multi-party data privacy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112700031A true CN112700031A (en) | 2021-04-23 |
CN112700031B CN112700031B (en) | 2023-03-31 |
Family
ID=75508776
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011452494.9A Active CN112700031B (en) | 2020-12-12 | 2020-12-12 | XGboost prediction model training method for protecting multi-party data privacy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112700031B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113098687A (en) * | 2021-04-27 | 2021-07-09 | 支付宝(杭州)信息技术有限公司 | Method and device for generating data tuple of secure computing protocol |
CN113506163A (en) * | 2021-09-07 | 2021-10-15 | 百融云创科技股份有限公司 | Isolated forest training and predicting method and system based on longitudinal federation |
CN113723477A (en) * | 2021-08-16 | 2021-11-30 | 同盾科技有限公司 | Cross-feature federal abnormal data detection method based on isolated forest |
CN114327371A (en) * | 2022-03-04 | 2022-04-12 | 支付宝(杭州)信息技术有限公司 | Secret sharing-based multi-key sorting method and system |
CN114841016A (en) * | 2022-05-26 | 2022-08-02 | 北京交通大学 | Multi-model federal learning method, system and storage medium |
CN115396100A (en) * | 2022-10-26 | 2022-11-25 | 华控清交信息科技(北京)有限公司 | Careless random disordering method and system based on secret sharing |
CN115630711A (en) * | 2022-12-19 | 2023-01-20 | 华控清交信息科技(北京)有限公司 | XGboost model training method and multi-party security computing platform |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222527A (en) * | 2019-05-22 | 2019-09-10 | 暨南大学 | A kind of method for secret protection |
CN110795603A (en) * | 2019-10-29 | 2020-02-14 | 支付宝(杭州)信息技术有限公司 | Prediction method and device based on tree model |
CN111144576A (en) * | 2019-12-13 | 2020-05-12 | 支付宝(杭州)信息技术有限公司 | Model training method and device and electronic equipment |
CN111626886A (en) * | 2020-07-30 | 2020-09-04 | 工保科技(浙江)有限公司 | Multi-party cooperation-based engineering performance guarantee insurance risk identification method and platform |
CN111695697A (en) * | 2020-06-12 | 2020-09-22 | 深圳前海微众银行股份有限公司 | Multi-party combined decision tree construction method and device and readable storage medium |
CN111724174A (en) * | 2020-06-19 | 2020-09-29 | 安徽迪科数金科技有限公司 | Citizen credit point evaluation method applying Xgboost modeling |
CN111738360A (en) * | 2020-07-24 | 2020-10-02 | 支付宝(杭州)信息技术有限公司 | Two-party decision tree training method and system |
CN111782550A (en) * | 2020-07-31 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Method and device for training index prediction model based on user privacy protection |
Also Published As
Publication number | Publication date |
---|---|
CN112700031B (en) | 2023-03-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||