WO2024051456A1 - Multi-party collaborative model training method, apparatus, device and medium - Google Patents

Multi-party collaborative model training method, apparatus, device and medium

Info

Publication number
WO2024051456A1
WO2024051456A1 (PCT/CN2023/113287)
Authority
WO
WIPO (PCT)
Prior art keywords
model
feature vector
training
noise
gradient
Prior art date
Application number
PCT/CN2023/113287
Other languages
English (en)
French (fr)
Inventor
鲁云飞
郑会钿
刘洋
王聪
吴烨
Original Assignee
北京火山引擎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京火山引擎科技有限公司
Publication of WO2024051456A1 publication Critical patent/WO2024051456A1/zh

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • the present disclosure relates to the field of neural network technology, and in particular to a multi-party collaborative model training method, device, equipment and medium.
  • federated learning proposes multi-party joint modeling to solve the problem of data islands, that is, jointly establishing a joint model.
  • however, the multi-party joint modeling proposed by federated learning requires a trusted third party to encrypt and decrypt intermediate data.
  • in real life it is difficult to find a third-party collaborator trusted by both cooperating parties, and the third party must also have the technology, computing power and human resources to support federated learning, which raises the cost of joint modeling.
  • embodiments of the present disclosure provide a multi-party collaborative model training method, including:
  • the first party participating in model training builds a first model.
  • the first model and the second model are stacked in series to generate a joint model.
  • the first model is located below the second model.
  • the second model is built by the second party participating in model training.
  • the output of the first model is connected to the input of the second model, and the input of the first model is the input of the joint model, and the output of the second model is the output of the joint model.
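  • as a minimal illustration of this series stacking (a hedged sketch, not the patent's implementation; the layer sizes and all names are hypothetical), the joint model can be viewed as two sub-networks whose forward passes are chained, with the cut between them marking the party boundary:

```python
import torch
import torch.nn as nn

# Hypothetical bottom model held by the first party: maps the first
# sample data to the first feature vector.
first_model = nn.Sequential(nn.Linear(3, 16), nn.ReLU())

# Hypothetical top model held by the second party: maps the received
# feature vector to the joint model's output.
second_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))

x = torch.randn(4, 3)      # first sample data: the input of the joint model
h = first_model(x)         # first feature vector: output of the first model
y_pred = second_model(h)   # output of the second model = output of the joint model
```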
  • the method further includes:
  • the target feature vector gradient is segmented according to the sizes of the first feature vector and the second feature vector and then back-transferred to the first model and the third model respectively.
  • predicting the second sample data through the third model to obtain a second feature vector, and combining the second feature vector and the first feature vector to obtain a target feature vector including:
  • the target feature vector is obtained by splicing and combining the feature vectors located in the same row among the first feature vector and the second feature vector.
  • predicting the first sample data through the first model to obtain a first feature vector, and forwardly transferring the first feature vector to the second model includes:
  • the first feature vector and the first disturbance noise are added and then passed to the second model.
  • determining the disturbance noise according to the first feature vector and a preset noise function includes: selecting the maximum value and the minimum value of each group of feature vectors from the first feature vector to form a maximum value array and a minimum value array respectively; determining the sensitivity of the preset noise function according to the maximum value array and the minimum value array; and determining the disturbance noise based on the sensitivity and the preset privacy budget.
  • optionally, the method further includes: before back-transferring the corresponding first feature vector gradient to the first model, obtaining the predicted sample label data output by the second model; and determining a loss function value based on the relationship between the predicted sample label data and the training sample label data corresponding to the first sample data.
  • receiving the noise-added first feature vector gradient, tuning and optimizing the parameters of the first model based on it, and iteratively training until the training end conditions of the joint model are met includes:
  • when the loss function value is greater than a preset threshold, the corresponding first feature vector gradient and the disturbance noise are added together, back-transferred to the first model, and input to the first model as the first sample feature data so that the first model is trained again by forward propagation;
  • when the loss function value is less than or equal to the preset threshold, the parameters of the target joint model are determined to be the initial parameters of the first model built by the first party and the second model built by the second party.
  • back-transferring the corresponding first feature vector gradient to the first model includes: determining second disturbance noise according to the first feature vector gradient and a second preset noise function; the first feature vector gradient and the second disturbance noise are added together and then back-transferred to the first model.
  • the noise perturbation processing includes at least one of processing based on Laplacian noise perturbation or Gaussian noise perturbation.
  • embodiments of the present disclosure provide a multi-party collaborative model training device, including:
  • a model building module used by the first party participating in model training to build a first model.
  • the first model and the second model are stacked in series to generate a joint model.
  • the first model is located below the second model.
  • the second model is built by the second party participating in model training;
  • a training module used to predict the first sample data through the first model to obtain a first feature vector, forward the first feature vector to the second model, instruct the second model to perform forward-propagation training based on the received first feature vector, and back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model is transmitted after noise perturbation processing based on the preset privacy budget;
  • a parameter optimization module configured to receive and perform parameter adjustment optimization on the first model based on the first feature vector gradient of the additional noise, and iteratively train until the training end conditions of the joint model are met.
  • an electronic device including:
  • one or more processors;
  • a storage device for storing one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any one of the first aspects.
  • embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method as described in any one of the first aspects is implemented.
  • an embodiment of the present disclosure provides a computer program, including instructions that, when executed by a processor, cause the processor to perform any of the foregoing methods.
  • embodiments of the present disclosure provide a computer program product, including instructions that, when executed by a processor, cause the processor to perform any of the foregoing methods.
  • Figure 1 is a schematic flowchart of a multi-party collaborative model training method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic flow chart of another multi-party collaborative model training method provided by an embodiment of the present disclosure
  • Figure 3 is a schematic structural diagram of a multi-party collaborative model training device provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • embodiments of the present disclosure provide a multi-party collaborative model training method.
  • the first party participating in model training builds a first model, and the first model and the second model are stacked in series to generate a joint model.
  • the first model is located below the second model, and the second model is built by the second party participating in model training; the first model predicts the first sample data to obtain the first feature vector and forwards the first feature vector to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model is transmitted after noise perturbation processing based on a preset privacy budget; the noise-added first feature vector gradient is received and used to tune and optimize the parameters of the first model, and training is iterated until the training end condition of the joint model is met.
  • the joint model includes a first model and a second model.
  • the first model and the second model are stacked in series to generate a joint model.
  • when the first sample feature data is input to the first model,
  • the first feature vector corresponding to the first sample feature data is obtained.
  • forwarding the first feature vector to the second model instructs the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model.
  • because the first feature vector and the first feature vector gradient undergo noise perturbation processing based on the preset privacy budget, the privacy of the data transferred between the first model and the second model is ensured.
  • in contrast, a joint model established in the prior art requires a trusted third party to encrypt and decrypt the intermediate data.
  • the multi-party collaborative model training method proposed in the embodiment of the present disclosure only processes the intermediate data of the joint model based on the preset noise algorithm, ensuring the privacy of the data and reducing the cost of joint modeling.
  • Figure 1 is a schematic flowchart of a multi-party collaborative model training method provided by an embodiment of the present disclosure.
  • the method of this embodiment can be executed by a multi-party collaborative model training device, which can be implemented in hardware and/or software and can be configured in electronic equipment.
  • the multi-party collaborative model training method described in any embodiment of this application can be implemented. As shown in Figure 1, the method specifically includes S10 to S30.
  • the first party participating in model training builds the first model.
  • the first model and the second model are stacked in series to generate a joint model.
  • the first model is located below the second model.
  • the second model is built by the second party participating in model training.
  • the joint model provided by the embodiment of the present disclosure includes a first model and a second model.
  • the first model and the second model are connected in series.
  • the training sample data includes the first sample data and the sample label data corresponding to the first sample data.
  • the joint model includes a first model and a second model.
  • the first model and the second model are connected in series, that is, the output of the first model is connected to the input of the second model, and the first sample data is input to the first model.
  • the input of the first model is the input of the joint model
  • the first feature vector output by the first model is used as the input of the second model,
  • and the output of the second model is the training sample label data corresponding to the first sample data, that is, the output of the second model is the output of the joint model.
  • by setting the joint model to include a first model and a second model, when the first sample data is input to the joint model, the first model of the joint model first processes the input first sample data to obtain the first feature vector corresponding to the first sample data.
  • the first feature vector and/or the first feature vector gradient transferred between the first model and the second model is transmitted after noise perturbation processing based on the preset privacy budget.
  • noise perturbation processing examples include Laplacian-based noise perturbation and Gaussian-based noise perturbation.
  • in the multi-party collaborative model training method provided by the embodiment of the present disclosure, after the first sample data is predicted through the first model to obtain the first feature vector, noise perturbation processing is performed on the first feature vector and the first feature vector gradient based on the preset privacy budget; the noise-perturbed first feature vector is then passed upward to the second model, and the noise-perturbed first feature vector gradient is back-transferred to the first model. That is, noise perturbation processing based on the preset privacy budget guarantees the privacy of the data transferred between the first model and the second model.
  • the degree of privacy protection and data availability are the most important measurement indicators.
  • some researchers have proposed differential privacy technology.
  • as a privacy protection model, it strictly defines the intensity of privacy protection, that is, the addition or deletion of any record will not affect the final query result.
  • traditional differential privacy technology concentrates the original data into a data center and then publishes relevant statistical information satisfying differential privacy; this is called Centralized Differential Privacy (CDP) technology.
  • CDP always rests on the premise of a trusted third-party data collector, which cannot be satisfied in many scenarios; on this basis, Local Differential Privacy (LDP) was developed, in which each user first privatizes their own data before sending it to the data collector, so that statistics can be computed without leaking individual private information.
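  • formally, as stated in the description below, an algorithm M satisfies ε-local differential privacy if, for any two records t and t′ in its domain and any output t* in its range,

$$\Pr[M(t) = t^{*}] \;\le\; e^{\varepsilon} \times \Pr[M(t') = t^{*}] + \delta$$

  • intuitively, the outputs of M on any two records are so similar that, from a given output, one can hardly infer which record was the input.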
  • the Laplacian algorithm (or Gaussian algorithm, etc.) is used to perturb the training sample data input to the joint model.
  • the training sample data input to the joint model includes several batches.
  • for one training batch, the training sample data forms a set X: (X1, X2, ..., Xm), where each element is a vector Xi: (xi1, xi2, ..., xim), so the set X forms a matrix.
  • computing the column-wise maximum minus minimum of X gives ΔF: (f1, f2, ..., fm), where fj = max(x1j, ..., xmj) - min(x1j, ..., xmj).
  • the Laplacian noise is then calculated based on the calculated ΔF and the privacy budget ε; finally, the calculated Laplacian noise is appended to each training sample datum to achieve privacy of the data input into the joint model.
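  • a minimal numpy sketch of this batch-level perturbation (an illustration under the standard Laplace mechanism with scale ΔF/ε; the function name is hypothetical, and the patent leaves the exact noise formula to the preset noise function):

```python
import numpy as np

def laplace_perturb(X: np.ndarray, epsilon: float) -> np.ndarray:
    """Add Laplacian noise to one training batch X (one sample per row).

    The sensitivity is taken column-wise as max - min, matching the
    Delta-F computation described above.
    """
    delta_f = X.max(axis=0) - X.min(axis=0)   # ΔF: (f1, ..., fm)
    scale = delta_f / epsilon                 # standard Laplace scale ΔF/ε
    noise = np.random.laplace(loc=0.0, scale=scale, size=X.shape)
    return X + noise
```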
  • therefore, after the first feature vector corresponding to the first sample data output by the first model and the first feature vector gradient are obtained, before the first feature vector is input to the second model and before the first feature vector gradient is back-transferred to the first model,
  • noise perturbation processing can be performed based on the preset privacy budget to add noise perturbation to the first feature vector and the first feature vector gradient, that is, the calculated noise is added to each first feature vector, ensuring the privacy of the data passed between the first model and the second model.
  • after the first feature vector output by the first model and the first feature vector gradient are noise-perturbed based on the preset privacy budget, the processed first feature vector is forward-transferred to the second model; the second model performs feature extraction and feature analysis on the first feature vector and outputs the predicted sample label data corresponding to the first feature vector, and a loss function value is then determined based on the relationship between the predicted sample label data and the training sample label data corresponding to the first sample data.
  • when the loss function value is greater than the preset threshold, the corresponding first feature vector gradient is back-transferred to the first model and input to the first model as the first sample feature data for forward-propagation training again; when the loss function value is less than or equal to the preset threshold,
  • the parameters of the target joint model are determined to be the initial parameters of the first model built by the first party and the second model built by the second party.
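  • a compact sketch of this threshold-controlled loop, reusing the hypothetical first_model and second_model from the earlier sketch (hedged: the optimizer, the threshold value, the loss function, the toy data_loader and the sample_noise helper are all illustrative assumptions; the patent only specifies comparing the loss value against a preset threshold):

```python
import torch
import torch.nn as nn

def sample_noise(h: torch.Tensor) -> torch.Tensor:
    # Hypothetical Laplace perturbation with unit scale (illustrative only);
    # the patent derives the scale from the column-wise sensitivity and ε.
    return torch.distributions.Laplace(0.0, 1.0).sample(h.shape)

# Toy stand-in for a batch iterator of (first sample data, labels).
data_loader = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(100)]

loss_fn = nn.MSELoss()
opt_first = torch.optim.SGD(first_model.parameters(), lr=0.01)
opt_second = torch.optim.SGD(second_model.parameters(), lr=0.01)
threshold = 1e-3   # preset threshold (illustrative value)

for x, y in data_loader:
    h = first_model(x)                   # first feature vector
    y_pred = second_model(h + sample_noise(h))
    loss = loss_fn(y_pred, y)
    if loss.item() <= threshold:         # training end condition of the joint model
        break
    opt_first.zero_grad(); opt_second.zero_grad()
    loss.backward()                      # the gradient at h is what is back-transferred
    opt_second.step(); opt_first.step()
```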
  • Embodiments of the present disclosure provide a multi-party collaborative model training method.
  • the first party participating in model training builds a first model.
  • the first model and the second model are stacked in series to generate a joint model.
  • the first model is located below the second model.
  • the second model is constructed by a second party participating in model training; the first feature vector is obtained by predicting the first sample data through the first model, and the first feature vector is forward-transferred to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model.
  • the first feature vector and/or the first feature vector gradient transferred between the first model and the second model is transferred after noise perturbation processing based on the preset privacy budget; the noise-added first feature vector gradient is received and used for parameter adjustment and optimization of the first model, with iterative training until the training end conditions of the joint model are met. That is, the joint model includes a first model and a second model stacked in series, and when the first sample feature data is input to the first model, the first feature vector corresponding to the first sample feature data is obtained.
  • in contrast, a joint model established in the prior art requires a trusted third party to encrypt and decrypt the intermediate data.
  • the multi-party collaborative model training method proposed by the embodiment of the present disclosure only processes the intermediate data of the joint model based on the preset noise algorithm, ensuring the privacy of the data and reducing the cost of joint modeling.
  • the joint model includes a first model and a second model, and the first model and the second model are connected in series.
  • the joint model can also have other structures.
  • for example, the first model may include a first sub-model and a second sub-model connected in series, with the second sub-model connected in series to the second model; or the first model may include a first sub-model and a second sub-model connected in parallel, with the second model connected in series to the first sub-model and the second sub-model, and so on.
  • the number of sub-models included in the first model is not specifically limited by the embodiments of the present disclosure.
  • determining the disturbance noise according to the first feature vector and the preset noise function includes: selecting the maximum value and the minimum value of each group of feature vectors from the first feature vector to form a maximum value array and a minimum value array respectively; determining the sensitivity of the preset noise function according to the maximum value array and the minimum value array; and determining the disturbance noise based on the sensitivity and the preset privacy budget.
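  • read as a procedure, the three steps above might look like the following sketch (hedged: the function name is hypothetical, and the choice of the Laplace distribution is an assumption consistent with the examples below):

```python
import numpy as np

def disturbance_noise(Y: np.ndarray, epsilon: float) -> np.ndarray:
    """Determine perturbation noise from the first feature vectors Y.

    Y holds one feature vector per row; the maximum and minimum arrays
    are taken column-wise, their difference is the sensitivity of the
    preset noise function, and the noise scale is sensitivity / epsilon.
    """
    max_arr = Y.max(axis=0)            # maximum value array
    min_arr = Y.min(axis=0)            # minimum value array
    sensitivity = max_arr - min_arr    # ΔF of the preset noise function
    return np.random.laplace(0.0, sensitivity / epsilon, size=Y.shape)
```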
  • when only the first model is located below the second model in the joint model, the training sample data includes N rows of first sample data.
  • the first sample data is input into the first model to obtain the corresponding first feature vector.
  • the first feature vector and the first disturbance noise are added to obtain the first target feature vector, which is then passed to the second model.
  • the first model obtains a first feature vector corresponding to the first sample data by performing deep learning and feature extraction on the first sample data, and the first feature vector represents the characteristics of the first sample data.
  • the first sample data is:
  • the set X composed of the first sample data is (X1, X2, X3), where X1 is (male, 15, high school), X2 is (female, 21, undergraduate), and X3 is (female, 6, primary school). First, the first sample data composed of X1, X2 and X3 is input into the first model; the first model processes the first sample data and maps it to a multi-dimensional space to obtain the corresponding spatial vector representation in that space, i.e., the feature vector. Then the column-wise maximum minus minimum is calculated for the first feature vector output by the first model, obtaining ΔF: (f1, f2, f3).
  • ⁇ F the sensitivity of the preset noise function
  • the calculated Laplacian noise is added to each feature vector and then input to the second model, that is, by applying the first model output to the third
  • the first feature vector is added with noise and then input to the second model to ensure the privacy of data between the first model and the second model.
  • the first preset noise function is exemplarily a Laplacian function.
  • the first preset noise function may be a Gaussian function, which is not specifically limited in the embodiments of the present disclosure.
  • the second model processes the first feature vector with added noise to obtain a predicted sample label feature vector, which characterizes the sample label predicted after the first sample data passes through the joint model.
  • the above example shows first sample data consisting of three groups of training sample data.
  • in specific implementations, the training sample data may include any number of groups, which is not specifically limited by the embodiments of the present disclosure.
  • FIG 2 is a schematic flow chart of another multi-party collaborative model training method provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure is based on the above embodiment.
  • the specific implementation of step S20 further includes the following steps: S201 to S203.
  • when the joint model includes a first model, a second model and a third model, with the first model stacked in parallel with the third model and then stacked in series with the second model, the training sample feature data includes N rows of first sample data and N rows of second sample data, and the features of the column vectors corresponding to the first sample data and the column vectors of the second sample data do not intersect.
  • the first sample data is input into the first model to obtain the first feature vector corresponding to the first sample data,
  • and the second sample data is input into the third model to obtain the second feature vector corresponding to the second sample data.
  • the first model obtains the first feature vector corresponding to the first sample data by performing deep learning and feature extraction on the first sample data.
  • the third model obtains the second feature vector corresponding to the second sample data by performing deep learning and feature extraction on the second sample data.
  • the first feature vector represents the characteristics of the first sample data, and the second feature vector represents the characteristics of the second sample data.
  • the features of the first sample data are gender, age and education, respectively,
  • and the features of the second sample data are the type of product browsed, the time period, and the number of views of the same product,
  • that is, the features corresponding to the first sample data and the second sample data do not intersect.
  • the first sample data consists of a set X1: (X11, X12, X13), where X11 is (male, 15, high school), X12 is (female, 21, undergraduate), and X13 is (female, 6, primary school); the second sample data consists of a set X2: (X21, X22, X23), where X21 is (15, 8:30-9:30, 5), X22 is (2, 10:30-11:30, 1), and X23 is (2, 10:30-11:30, 1). The first sample data composed of X11, X12 and X13 is input into the first model,
  • the first model processes the first sample data and maps it to the multi-dimensional space to obtain the spatial vector representation corresponding to the first sample data in the multi-dimensional space, that is, the first feature vector.
  • the second sample data is input to the third model, and the third model processes the second sample data and maps it to the multidimensional space to obtain the spatial vector representation corresponding to the second sample data in the multidimensional space, that is, the second feature vector.
  • the first feature vector and the second feature vector need to be processed based on the preset noise function; that is, after the first model outputs the first feature vector, the first feature vector is processed through the preset noise function.
  • for example, the first sample data X1: (X11, X12, X13) is predicted by the first model to obtain the first feature vector Y1: (Y11, Y12, Y13),
  • and the first target feature vector obtained by processing the first feature vector is Y1′: (Y11′, Y12′, Y13′).
  • the second feature vector is processed through a preset noise algorithm.
  • the second sample data X2: (X21, X22, X23) is predicted by the third model to obtain the second feature vector Y2: (Y21, Y22, Y23).
  • the second target feature vector obtained by processing the second feature vector with the preset noise algorithm is Y2′: (Y21′, Y22′, Y23′).
  • after the first target feature vector and the second target feature vector are spliced, the target feature vector is obtained.
  • the preset noise function used to process the first feature vector and the preset noise function used to process the second feature vector may be the same noise function or different noise functions, which is not specifically limited by the embodiments of the present disclosure.
  • merging the second feature vector and the first feature vector to obtain the target feature vector includes: splicing and combining, row by row, the feature vectors located in the same row of the first feature vector and the second feature vector to obtain the target feature vector.
  • for example, the first target feature vector obtained after performing noise perturbation processing on the first feature vector Y1: (Y11, Y12, Y13) based on the preset privacy budget is Y1′: (Y11′, Y12′, Y13′),
  • and the second target feature vector obtained by performing noise perturbation processing on the second feature vector Y2: (Y21, Y22, Y23) based on the preset privacy budget is Y2′: (Y21′, Y22′, Y23′).
  • the process of splicing the first target feature vector and the second target feature vector is: splice Y11′ in the first target feature vector with Y21′ in the second target feature vector, splice Y12′ with Y22′, and splice Y13′ with Y23′, obtaining after splicing the target feature vector Y′: (Y11′+Y21′, Y12′+Y22′, Y13′+Y23′).
  • in this row-wise splicing, Y21′ in the second target feature vector is spliced onto the end of Y11′ in the first target feature vector, and likewise for the remaining rows; a sketch follows.
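  • in code, this row-wise splicing is simply a horizontal concatenation of the two noised feature matrices (a hedged sketch; the shapes and placeholder values are assumptions):

```python
import numpy as np

# Y1p, Y2p: noised first and second target feature vectors, one row per sample.
Y1p = np.random.randn(3, 4)   # placeholder for Y1′: (Y11′, Y12′, Y13′)
Y2p = np.random.randn(3, 4)   # placeholder for Y2′: (Y21′, Y22′, Y23′)

# Row i of the target feature vector is Y1i′ followed by Y2i′,
# i.e. Y2i′ is spliced onto the end of Y1i′.
Y_target = np.concatenate([Y1p, Y2p], axis=1)   # shape (3, 8)
```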
  • after the first feature vector is obtained by predicting the first sample data through the first model and the second feature vector is obtained by predicting the second sample data through the third model, the first feature vector and the second feature vector are each noise-perturbed based on the preset privacy budget.
  • the noise-perturbed first feature vector and second feature vector are then merged into the target feature vector and passed upward to the second model, and the noise-perturbed first feature vector gradient and second feature vector gradient
  • are merged into the target feature vector gradient and then back-transferred to the first model and the third model. That is, noise perturbation processing is performed based on the preset privacy budget to ensure the privacy of the data transferred between the first model, the third model and the second model.
  • the number of models included in the joint model needs to be determined before the target feature vector gradient is back-transferred to the first model.
  • when the first model and the second model are stacked in series to generate the joint model, the target feature vector gradient can be directly back-transferred to the first model as the input of the joint model.
  • when the first model is stacked in parallel with the third model and then stacked in series with the second model, the target feature vector gradient must be divided according to the number of models located below the second model in the joint model.
  • the process of dividing the target feature vector gradient according to the number of models included below the second model in the joint model is as follows: after the first target feature vector Y1′: (Y11′, Y12′, Y13′) is obtained by processing the first feature vector Y1: (Y11, Y12, Y13),
  • and the second target feature vector Y2′: (Y21′, Y22′, Y23′) is obtained by processing the second feature vector Y2: (Y21, Y22, Y23),
  • the two target feature vectors are spliced row by row as described above,
  • giving the target feature vector Y′: (Y11′+Y21′, Y12′+Y22′, Y13′+Y23′).
  • the gradient of the spliced target feature vector is G, whose components are the partial derivatives of the loss value Loss with respect to the predicted value of the dimension indicated by each subscript.
  • dividing the target feature vector gradient value G guarantees that the data back-transferred to the first model and the third model correspond to the training sample data: in the above example, each sample in the first sample data and each sample in the second sample data includes three features, so G is divided into two parts, each corresponding to three features; the part corresponding to the first model's features is returned to the first model and the part corresponding to the third model's features is returned to the third model.
  • finally, the divided target feature vector gradient values are passed backward to the first model and the third model respectively.
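  • a matching sketch of the backward split (hedged; the split point is the width of each party's feature block, as the text describes, and the function name is hypothetical):

```python
import numpy as np

def split_gradient(G: np.ndarray, first_width: int):
    """Split the target feature vector gradient G column-wise.

    The first first_width columns correspond to the features produced by
    the first model; the remaining columns to those of the third model.
    """
    g_first = G[:, :first_width]   # back-transferred to the first model
    g_third = G[:, first_width:]   # back-transferred to the third model
    return g_first, g_third

# With the 3 + 3 feature example above: split_gradient(G, 3)
```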
  • in this case, the specific implementation corresponding to step S30 is S301: receiving the noise-added target feature vector gradient, performing parameter adjustment and optimization on the first model and the third model based on it, and iteratively training until the training end conditions of the joint model are met.
  • when the joint model includes a first model, a second model and a third model,
  • the first model and the third model are stacked in parallel and then stacked in series with the second model to generate the joint model.
  • first, the first sample data is input to the first model to obtain the first feature vector corresponding to the first sample data,
  • and the second sample data is input to the third model to obtain the second feature vector corresponding to the second sample data.
  • the first feature vector output by the first model and the second feature vector output by the third model are processed with the preset noise function and then spliced.
  • this ensures, on the one hand, the privacy of the sample data output to the second model and, on the other hand, the privacy of the data between the first model and the third model.
  • back-transferring the corresponding first feature vector gradient to the first model
  • includes: determining the second disturbance noise according to the first feature vector gradient and the second preset noise function; adding the first feature vector gradient and the second disturbance noise and then back-transferring the sum to the first model.
  • for example, the first sample data consists of a set X1: (X11, X12, X13), where X11 is (male, 15, high school), X12 is (female, 21, undergraduate), and X13 is (female, 6, primary school), and the second sample data consists of a set X2: (X21, X22, X23), where X21 is (15, 8:30-9:30, 5), X22 is (2, 10:30-11:30, 1), and X23 is (2, 10:30-11:30, 1). The first sample data X1: (X11, X12, X13) is predicted by the first model
  • to obtain the first feature vector Y1: (Y11, Y12, Y13), and the second sample data X2: (X21, X22, X23) is predicted by the third model to obtain the second feature vector Y2: (Y21, Y22, Y23); the column-wise maximum minus minimum is then calculated for the first feature vector Y1 and the second feature vector Y2, obtaining ΔFG: (fG1, fG2, fG3), from which the Laplacian noise is computed with the preset privacy budget ε and added to each feature vector to obtain the first and second target feature vectors.
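  • the same mechanism applied to the backward pass might be sketched as follows (hedged; the gradient values are placeholders, and ε = 1.0 is an illustrative choice of privacy budget):

```python
import numpy as np

# grad_Y1: gradient of the loss w.r.t. the first feature vector
# (placeholder values; in training this comes from the second model).
grad_Y1 = np.random.randn(3, 4)

sens = grad_Y1.max(axis=0) - grad_Y1.min(axis=0)      # ΔFG, column-wise
second_noise = np.random.laplace(0.0, sens / 1.0, grad_Y1.shape)
grad_to_first = grad_Y1 + second_noise                # back-transferred to the first model
```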
  • FIG 3 is a schematic structural diagram of a multi-party collaborative model training device provided by an embodiment of the present disclosure. As shown in Figure 3, the multi-party collaborative model training device includes:
  • the model building module 310 is used by the first party participating in model training to build a first model.
  • the first model and the second model are stacked in series to generate a joint model.
  • the first model is located below the second model.
  • the second model is built by the second party participating in model training;
  • the training module 320 is used to predict the first sample data through the first model to obtain the first feature vector, forward the first feature vector to the second model, instruct the second model to perform forward-propagation training based on the received first feature vector, and back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model is transmitted after noise perturbation processing based on the preset privacy budget;
  • the parameter optimization module 330 is configured to receive and perform parameter adjustment optimization on the first model based on the first feature vector gradient of the additional noise, and iteratively train until the training end conditions of the joint model are met.
  • in the multi-party collaborative model training device provided by the embodiment of the present disclosure, the first party participating in model training builds a first model, and the first model and the second model are stacked in series to generate a joint model.
  • the first model is located below the second model, and the second model is built by the second party participating in model training; the first sample data is predicted through the first model to obtain the first feature vector, and the first feature vector is forwarded to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model is transmitted after noise perturbation processing based on the preset privacy budget;
  • the noise-added first feature vector gradient is received and used for parameter adjustment and optimization of the first model, with iterative training until the training end conditions of the joint model are met.
  • the joint model includes a first model and a second model.
  • the first model and the second model are stacked in series to generate a joint model.
  • when the first sample feature data is input to the first model,
  • the first feature vector corresponding to the first sample feature data is obtained.
  • forwarding the first feature vector to the second model instructs the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model.
  • compared with a joint model that requires a trusted third party to encrypt and decrypt intermediate data, the multi-party collaborative model training method proposed in the embodiment of the present disclosure only processes the intermediate data of the joint model based on the preset noise algorithm, ensuring the privacy of the data and reducing the cost of joint modeling.
  • the device provided by the embodiment of the present invention can execute the method provided by any embodiment of the present invention, and has corresponding functional modules and beneficial effects for executing the method.
  • the present disclosure also provides an electronic device, including: a processor, the processor is configured to execute a computer program stored in a memory, and when the computer program is executed by the processor, the steps of the above method embodiments are implemented.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by the present disclosure.
  • FIG. 4 shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present invention.
  • the electronic device shown in FIG. 4 is only an example and should not impose any restrictions on the functions and scope of use of the embodiments of the present invention.
  • electronic device 800 is embodied in the form of a general computing device.
  • the components of electronic device 800 may include, but are not limited to: one or more processors 810, system memory 820, and a bus 830 connecting the different system components (including the system memory 820 and the processors).
  • bus 830 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
  • these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 800 typically includes a variety of computer system readable media. These media can be any media that can be accessed by electronic device 800, including volatile and nonvolatile media, removable and non-removable media.
  • System memory 820 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 840 and/or cache memory 850 .
  • Electronic device 800 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 860 may be used to read and write to non-removable, non-volatile magnetic media (commonly referred to as "hard drives").
  • a disk drive may be provided for reading from and writing to a removable non-volatile disk (e.g., a "floppy disk"), and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media).
  • in these cases, each drive may be connected to bus 830 through one or more data media interfaces.
  • System memory 820 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of various embodiments of the present invention.
  • a program/utility 880 having a set (at least one) of program modules 870 may be stored, for example, in system memory 820; such program modules 870 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each or some combination of these examples may include an implementation of a network environment.
  • Program modules 870 generally perform functions and/or methods in the described embodiments of the present invention.
  • the processor 810 executes at least one program among multiple programs stored in the system memory 820 to execute various functional applications and information processing, for example, to implement the method embodiments provided by the embodiments of the present invention.
  • the present disclosure also provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the steps of the above method embodiments are implemented.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or device, or any combination thereof. More specific examples (non-exhaustive list) of computer readable storage media include: electrical connections having one or more conductors, portable computer disks, hard drives, random access memory (RAM), read only memory (ROM), Erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the foregoing.
  • computer program code for performing the operations of the present invention may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the present disclosure also provides a computer program product.
  • when the computer program product is run on a computer, it causes the computer to execute the steps implementing the above method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to a multi-party collaborative model training method, apparatus, electronic device and storage medium, including: a first party participating in model training builds a first model, and the first model and a second model are stacked in series to generate a joint model; first sample data are predicted by the first model to obtain a first feature vector, and the first feature vector is forward-transferred to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on a preset privacy budget; the noise-added first feature vector gradient is received and used to tune and optimize the parameters of the first model, and training is iterated until the training end condition of the joint model is met.

Description

Multi-party collaborative model training method, apparatus, device and medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on, and claims priority to, Chinese application No. 202211079219.6 filed on September 5, 2022, the disclosure of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of neural network technology, and in particular to a multi-party collaborative model training method, apparatus, device and medium.
BACKGROUND
Because data often exist in the form of islands, federated learning proposes multi-party joint modeling, that is, jointly establishing a joint model, to solve the data-island problem.
However, the multi-party joint modeling proposed by federated learning requires a trusted third party to encrypt and decrypt intermediate data. In real life, finding a third-party collaborator trusted by both cooperating parties is difficult, and the third party must also have the technology, computing power and human resources to support federated learning, which raises the cost of joint modeling.
SUMMARY
In a first aspect, embodiments of the present disclosure provide a multi-party collaborative model training method, including:
a first party participating in model training builds a first model, the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training;
first sample data are predicted by the first model to obtain a first feature vector, and the first feature vector is forward-transferred to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on a preset privacy budget;
the noise-added first feature vector gradient is received and used to tune and optimize the parameters of the first model, and training is iterated until the training end condition of the joint model is met.
Optionally, the output of the first model is connected to the input of the second model, the input of the first model is the input of the joint model, and the output of the second model is the output of the joint model.
Optionally, the first model is stacked in parallel with a third model and then stacked in series with the second model to generate the joint model; the method then further includes:
predicting second sample data by the third model to obtain a second feature vector, and merging the second feature vector and the first feature vector to obtain a target feature vector;
forward-transferring the target feature vector to the second model, instructing the second model to perform forward-propagation training based on the received target feature vector, and determining a target feature vector gradient;
segmenting the target feature vector gradient according to the sizes of the first feature vector and the second feature vector, and back-transferring the segments to the first model and the third model respectively.
Optionally, predicting second sample data by the third model to obtain a second feature vector and merging the second feature vector and the first feature vector to obtain a target feature vector includes:
predicting second sample data by the third model to obtain a second feature vector;
splicing and combining, row by row, the feature vectors located in the same row of the first feature vector and the second feature vector to obtain the target feature vector.
Optionally, predicting first sample data by the first model to obtain a first feature vector and forward-transferring the first feature vector to the second model includes:
predicting first sample data by the first model to obtain a first feature vector;
determining first disturbance noise according to the first feature vector and a first preset noise function;
adding the first feature vector and the first disturbance noise and transferring the sum to the second model.
Optionally, determining the disturbance noise according to the first feature vector and the preset noise function includes:
selecting the maximum value and the minimum value of each group of feature vectors from the first feature vector to form a maximum value array and a minimum value array respectively;
determining the sensitivity of the preset noise function according to the maximum value array and the minimum value array;
determining the disturbance noise based on the sensitivity and the preset privacy budget.
Optionally, the method further includes:
before back-transferring the corresponding first feature vector gradient to the first model, obtaining predicted sample label data output by the second model;
determining a loss function value based on the relationship between the predicted sample label data and the training sample label data corresponding to the first sample data.
Optionally, receiving the noise-added first feature vector gradient and using it to tune and optimize the parameters of the first model, iterating training until the training end condition of the joint model is met, includes:
when the loss function value is greater than a preset threshold, adding the corresponding first feature vector gradient and the disturbance noise, back-transferring the sum to the first model, and inputting it to the first model as the first sample feature data for forward-propagation training again;
when the loss function value is less than or equal to the preset threshold, determining the parameters of the target joint model to be the initial parameters of the first model built by the first party and the second model built by the second party.
Optionally, back-transferring the corresponding first feature vector gradient to the first model includes:
determining second disturbance noise according to the first feature vector gradient and a second preset noise function;
adding the first feature vector gradient and the second disturbance noise and back-transferring the sum to the first model.
Optionally, the noise perturbation processing includes at least one of processing based on Laplacian noise perturbation or processing based on Gaussian noise perturbation.
In a second aspect, embodiments of the present disclosure provide a multi-party collaborative model training apparatus, including:
a model building module, used for a first party participating in model training to build a first model, where the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training;
a training module, used to predict first sample data by the first model to obtain a first feature vector, forward-transfer the first feature vector to the second model, instruct the second model to perform forward-propagation training based on the received first feature vector, and back-transfer the corresponding first feature vector gradient to the first model, where the first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on a preset privacy budget;
a parameter optimization module, used to receive the noise-added first feature vector gradient, tune and optimize the parameters of the first model based on it, and iterate training until the training end condition of the joint model is met.
In a third aspect, embodiments of the present disclosure provide an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method described in any one of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method described in any one of the first aspect is implemented.
In a fifth aspect, embodiments of the present disclosure provide a computer program, including:
instructions that, when executed by a processor, cause the processor to perform any one of the foregoing methods.
In a sixth aspect, embodiments of the present disclosure provide a computer program product including instructions that, when executed by a processor, cause the processor to perform any one of the foregoing methods.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
In order to explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Figure 1 is a schematic flowchart of a multi-party collaborative model training method provided by an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of another multi-party collaborative model training method provided by an embodiment of the present disclosure;
Figure 3 is a schematic structural diagram of a multi-party collaborative model training apparatus provided by an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
DETAILED DESCRIPTION
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, the solutions of the present disclosure are further described below. It should be noted that, in the absence of conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure, but the present disclosure may also be implemented in ways other than those described here; obviously, the embodiments in the specification are only some, rather than all, of the embodiments of the present disclosure.
To solve the problems that exist when a third party is used for federated learning, embodiments of the present disclosure provide a multi-party collaborative model training method: a first party participating in model training builds a first model, the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training; first sample data are predicted by the first model to obtain a first feature vector, and the first feature vector is forward-transferred to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on a preset privacy budget; the noise-added first feature vector gradient is received and used to tune and optimize the parameters of the first model, and training is iterated until the training end condition of the joint model is met. That is, the joint model includes the first model and the second model, stacked in series; after the first sample feature data is input to the first model to obtain the corresponding first feature vector, the first feature vector is forward-transferred to the second model, the second model performs forward-propagation training based on the received first feature vector, and the corresponding first feature vector gradient is back-transferred to the first model. Because the first feature vector and the first feature vector gradient undergo noise perturbation processing based on the preset privacy budget, the privacy of the data transferred between the first model and the second model is guaranteed. Compared with a joint model that requires a trusted third party to encrypt and decrypt intermediate data, the multi-party collaborative model training method proposed in the embodiments of the present disclosure only processes the intermediate data of the joint model based on a preset noise algorithm, which guarantees the privacy of the data and reduces the cost of joint modeling.
The solutions of the present disclosure are described below with reference to the accompanying drawings.
Figure 1 is a schematic flowchart of a multi-party collaborative model training method provided by an embodiment of the present disclosure. The method of this embodiment may be executed by a multi-party collaborative model training apparatus, which may be implemented in hardware and/or software, may be configured in an electronic device, and can implement the multi-party collaborative model training method described in any embodiment of this application. As shown in Figure 1, the method specifically includes S10 to S30.
S10: a first party participating in model training builds a first model, the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training.
The joint model provided by the embodiment of the present disclosure includes a first model and a second model connected in series; the training sample data includes first sample data and sample label data corresponding to the first sample data.
Illustratively, the joint model includes the first model and the second model connected in series, that is, the output of the first model is connected to the input of the second model; the first sample data is input to the first model, the input of the first model is the input of the joint model, the first feature vector output by the first model serves as the input of the second model, and the output of the second model is the training sample label data corresponding to the first sample data, that is, the output of the second model is the output of the joint model.
By configuring the joint model to include the first model and the second model, when the first sample data is input to the joint model, the first model of the joint model first processes the input first sample data to obtain the first feature vector corresponding to the first sample data.
S20: predict first sample data by the first model to obtain a first feature vector, forward-transfer the first feature vector to the second model, instruct the second model to perform forward-propagation training based on the received first feature vector, and back-transfer the corresponding first feature vector gradient to the first model.
The first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on the preset privacy budget.
Noise perturbation processing illustratively includes Laplacian-based noise perturbation, Gaussian-based noise perturbation, and the like.
In the multi-party collaborative model training method provided by the embodiment of the present disclosure, after the first sample data is predicted by the first model to obtain the first feature vector, noise perturbation processing is performed on the first feature vector and the first feature vector gradient based on the preset privacy budget; the noise-perturbed first feature vector is then passed up to the second model, and the noise-perturbed first feature vector gradient is back-transferred to the first model. That is, noise perturbation processing based on the preset privacy budget guarantees the privacy of the data transferred between the first model and the second model.
Specifically, the degree of privacy protection and data availability are the most important measurement indicators. To balance them, a formal definition is needed to quantify privacy; following this trend, researchers proposed differential privacy technology. As a privacy protection model, it strictly defines the strength of privacy protection, that is, the addition or deletion of any single record will not affect the final query result. However, traditional differential privacy gathers the original data into a data center and then publishes relevant statistics satisfying differential privacy, which is called Centralized Differential Privacy (CDP). CDP's protection of sensitive information always rests on one premise, a trusted third-party data collector, which cannot be satisfied in many scenarios. On the basis of CDP, Local Differential Privacy (LDP) was therefore developed; in this model, each user first privatizes their own data and then sends the processed data to the data collector, so that statistical analysis can be performed on the data while guaranteeing that individual private information is not leaked.
The formal definition of local differential privacy is as follows:
Given n users, each corresponding to one record, and given a privacy algorithm M with domain Dom(M) and range Ran(M): if the same output result t* obtained by the algorithm M on any two records t and t′ (t, t′ ∈ Dom(M)) satisfies the following inequality (1), then M satisfies ε-local differential privacy.

$$\Pr[M(t) = t^{*}] \le e^{\varepsilon} \times \Pr[M(t') = t^{*}] + \delta \qquad (1)$$

As can be seen from the above definition, local differential privacy ensures that the algorithm M satisfies local differential privacy by controlling the similarity of the output results on any two records. In short, from a given output of the privacy algorithm M, it is almost impossible to infer which record was its input.
In a specific implementation, the Laplace algorithm (or the Gaussian algorithm, etc.) is used to perturb the training sample data input into the joint model.
The Laplace algorithm is shown in formulas (2) to (4):

$$\Delta f = \max\big(f(t) - f(t')\big) \qquad (2)$$

$$M(t) = f(t) + Y \qquad (3)$$

$$Y \sim \mathrm{Lap}(\Delta f / \varepsilon) \qquad (4)$$

where f is the function process being protected, for example the output of the model's forward propagation or back propagation; t and t′ are two data sets; M(t) denotes the output after perturbation; ε is the privacy budget, i.e., the measure of privacy leakage; and Y denotes Laplace-distributed noise, which can satisfy (ε, 0)-differential privacy.
The steps for introducing the differential privacy mechanism in this application are as follows:
The training sample data input to the joint model comprises several batches. For one training batch, the training sample data forms a set X: (X1, X2, ..., Xm), where each element, indexed by subscript, is a vector Xi: (xi1, xi2, ..., xim), so that the set X forms a matrix. The column-wise maximum minus minimum of X is computed to obtain ΔF: (f1, f2, ..., fm), where fj = max(x1j, ..., xmj) - min(x1j, ..., xmj).
The Laplacian noise is then computed from the calculated ΔF according to formula (4); finally, the computed Laplacian noise is added to each training sample datum, achieving privacy of the data input into the joint model.
Therefore, after the first feature vector corresponding to the first sample data output by the first model and the first feature vector gradient are obtained, before the first feature vector is input to the second model and before the first feature vector gradient is back-transferred to the first model, noise perturbation processing can be performed based on the preset privacy budget to add noise perturbation to the first feature vector and the first feature vector gradient; that is, the computed noise is added to each first feature vector, guaranteeing the privacy of the data transferred between the first model and the second model.
S30: receive the noise-added first feature vector gradient, tune and optimize the parameters of the first model based on it, and iterate training until the training end condition of the joint model is met.
After noise perturbation processing is performed, based on the preset privacy budget, on the first feature vector output by the first model and on the first feature vector gradient, the processed first feature vector is forward-transferred to the second model; the second model performs feature extraction and feature analysis on the first feature vector and outputs the predicted sample label data corresponding to the first feature vector; a loss function value is then determined based on the relationship between the predicted sample label data and the training sample label data corresponding to the first sample data. When the loss function value is greater than a preset threshold, the corresponding first feature vector gradient is back-transferred to the first model and input to the first model as the first sample feature data for forward-propagation training again; when the loss function value is less than or equal to the preset threshold, the parameters of the target joint model are determined to be the initial parameters of the first model built by the first party and the second model built by the second party.
Embodiments of the present disclosure provide a multi-party collaborative model training method: a first party participating in model training builds a first model, the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training; first sample data are predicted by the first model to obtain a first feature vector, and the first feature vector is forward-transferred to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on a preset privacy budget; the noise-added first feature vector gradient is received and used to tune and optimize the parameters of the first model, and training is iterated until the training end condition of the joint model is met. That is, the joint model includes the first model and the second model, stacked in series; after the first sample feature data is input to the first model to obtain the corresponding first feature vector, the first feature vector is forward-transferred to the second model, the second model performs forward-propagation training based on the received first feature vector, and the corresponding first feature vector gradient is back-transferred to the first model. Because the first feature vector and the first feature vector gradient undergo noise perturbation processing based on the preset privacy budget, the privacy of the data transferred between the first model and the second model is guaranteed. Compared with a joint model that requires a trusted third party to encrypt and decrypt intermediate data, the multi-party collaborative model training method proposed in the embodiments of the present disclosure only processes the intermediate data of the joint model based on a preset noise algorithm, which guarantees the privacy of the data and reduces the cost of joint modeling.
It should be noted that the above-disclosed embodiments illustratively describe a joint model including a first model and a second model connected in series. In specific implementations, the joint model may also have other structures. Illustratively, the first model may include a first sub-model and a second sub-model connected in series, with the second sub-model connected in series to the second model; or the first model may include a first sub-model and a second sub-model connected in parallel, with the second model connected in series to the first sub-model and the second sub-model; and so on. Moreover, the number of sub-models included in the first model is not specifically limited by the embodiments of the present disclosure.
As an implementable embodiment, optionally, predicting first sample data by the first model to obtain a first feature vector and passing the first feature vector up to the second model includes:
predicting first sample data by the first model to obtain a first feature vector; determining first disturbance noise according to the first feature vector and a first preset noise function; and adding the first feature vector and the first disturbance noise before transferring the sum to the second model.
Determining the disturbance noise according to the first feature vector and the preset noise function includes: selecting the maximum value and the minimum value of each group of feature vectors from the first feature vector to form a maximum value array and a minimum value array respectively; determining the sensitivity of the preset noise function according to the maximum value array and the minimum value array; and determining the disturbance noise based on the sensitivity and the preset privacy budget.
When only the first model is located below the second model in the joint model, the training sample data includes N rows of first sample data. The first sample data is input to the first model to obtain the corresponding first feature vector, and the first feature vector and the first disturbance noise are added to obtain the first target feature vector, which is then transferred to the second model.
The first model obtains the first feature vector corresponding to the first sample data by performing deep learning and feature extraction on the first sample data; the first feature vector characterizes the features of the first sample data.
Illustratively, as shown in Table 1 below, the first sample data is:

Table 1
        Gender    Age    Education
X1      male      15     high school
X2      female    21     undergraduate
X3      female    6      primary school

The first sample data forms a set X: (X1, X2, X3), where X1 is (male, 15, high school), X2 is (female, 21, undergraduate), and X3 is (female, 6, primary school). First, the first sample data composed of X1, X2 and X3 is input to the first model; the first model processes the first sample data and maps it to a multi-dimensional space to obtain the spatial vector representation of the first sample data in that space, i.e., the feature vector. The column-wise maximum minus minimum is then computed for the first feature vector output by the first model to obtain ΔF: (f1, f2, f3), and finally the Laplacian noise is computed from the calculated ΔF according to formula (4), where ε is the preset privacy budget and ΔF is the sensitivity of the preset noise function. The computed Laplacian noise is added to each feature vector before input to the second model; that is, by adding noise to the first feature vector output by the first model before inputting it to the second model, the privacy of the data between the first model and the second model is guaranteed.
In the above embodiment, the first preset noise function is illustratively a Laplacian function; in other implementations, the first preset noise function may be a Gaussian function, which is not specifically limited by the embodiments of the present disclosure.
After noise is added to the first feature vector output by the first model and the result is input to the second model, the second model processes the noise-added first feature vector to obtain a predicted sample label feature vector, which characterizes the features of the sample label predicted after the first sample data passes through the joint model.
It should be noted that the above embodiment illustratively shows first sample data including three groups of training sample data; in specific implementations, the training sample data may include any number of groups, which is not specifically limited by the embodiments of the present disclosure.
Figure 2 is a schematic flowchart of another multi-party collaborative model training method provided by an embodiment of the present disclosure. The embodiment of the present disclosure builds on the above embodiment; as shown in Figure 2, the specific implementation of step S20 further includes steps S201 to S203.
S201: predict first sample data by the first model to obtain a first feature vector, predict second sample data by the third model to obtain a second feature vector, and merge the second feature vector and the first feature vector to obtain a target feature vector.
As an implementable embodiment, when the joint model includes a first model, a second model and a third model, with the first model stacked in parallel with the third model and then stacked in series with the second model to generate the joint model, the training sample feature data includes N rows of first sample data and N rows of second sample data, and the features of the column vectors corresponding to the first sample data and of the column vectors of the second sample data do not intersect. The first sample data is input to the first model to obtain the first feature vector corresponding to the first sample data, and the second sample data is input to the third model to obtain the second feature vector corresponding to the second sample data. The first model obtains the first feature vector corresponding to the first sample data by performing deep learning and feature extraction on the first sample data, and the third model obtains the second feature vector corresponding to the second sample data by performing deep learning and feature extraction on the second sample data; the first feature vector characterizes the features of the first sample data, and the second feature vector characterizes the features of the second sample data.
Illustratively, as shown in Table 2, the first sample data is:

Table 2
        Gender    Age    Education
X11     male      15     high school
X12     female    21     undergraduate
X13     female    6      primary school

Illustratively, as shown in Table 3 below, the second sample data is:

Table 3
        Browsed product type    Time period     Views of the same product
X21     15                      8:30-9:30       5
X22     2                       10:30-11:30     1
X23     2                       10:30-11:30     1

In the above tables, the features of the first sample data are gender, age and education, and the features of the second sample data are the type of product browsed, the time period, and the number of views of the same product; that is, the features corresponding to the first sample data and the second sample data do not intersect. By configuring the first model and the third model with non-intersecting features, the precision of the predicted sample label data output by the joint model is guaranteed.
Illustratively, the first sample data forms a set X1: (X11, X12, X13), where X11 is (male, 15, high school), X12 is (female, 21, undergraduate), and X13 is (female, 6, primary school), and the second sample data forms a set X2: (X21, X22, X23), where X21 is (15, 8:30-9:30, 5), X22 is (2, 10:30-11:30, 1), and X23 is (2, 10:30-11:30, 1). The first sample data composed of X11, X12 and X13 is input to the first model, which processes it and maps it to a multi-dimensional space to obtain the corresponding spatial vector representation, i.e., the first feature vector; the second sample data composed of X21, X22 and X23 is input to the third model, which processes it and maps it to the multi-dimensional space to obtain the corresponding spatial vector representation, i.e., the second feature vector.
In a specific implementation, the first feature vector and the second feature vector need to be processed based on the preset noise function; that is, after the first model outputs the first feature vector, the first feature vector is processed by the preset noise function. Illustratively, the first sample data X1: (X11, X12, X13) is predicted by the first model to obtain the first feature vector Y1: (Y11, Y12, Y13), and the first target feature vector obtained by processing the first feature vector with the preset noise function is Y1′: (Y11′, Y12′, Y13′). After the third model outputs the second feature vector, the second feature vector is processed by the preset noise algorithm. Illustratively, the second sample data X2: (X21, X22, X23) is predicted by the third model to obtain the second feature vector Y2: (Y21, Y22, Y23), and the second target feature vector obtained by processing the second feature vector with the preset noise algorithm is Y2′: (Y21′, Y22′, Y23′).
After the first feature vector is processed based on the preset noise algorithm to obtain the first target feature vector and the second feature vector is processed to obtain the second target feature vector, the target feature vector is obtained by splicing the first target feature vector and the second target feature vector.
It should be noted that, in the above embodiment, in the course of processing the first feature vector and the second feature vector based on preset noise functions, the preset noise function used to process the first feature vector and the preset noise function used to process the second feature vector may be the same noise function or different noise functions, which is not specifically limited by the embodiments of the present disclosure.
Optionally, in a specific implementation, merging the second feature vector and the first feature vector to obtain the target feature vector includes: splicing and combining, row by row, the feature vectors located in the same row of the first feature vector and the second feature vector to obtain the target feature vector.
Illustratively, after the first target feature vector Y1′: (Y11′, Y12′, Y13′) is obtained by performing noise perturbation processing on the first feature vector Y1: (Y11, Y12, Y13) based on the preset privacy budget, and the second target feature vector Y2′: (Y21′, Y22′, Y23′) is obtained by performing noise perturbation processing on the second feature vector Y2: (Y21, Y22, Y23) based on the preset privacy budget, the process of splicing the first target feature vector and the second target feature vector is: splice Y11′ in the first target feature vector with Y21′ in the second target feature vector, splice Y12′ with Y22′, and splice Y13′ with Y23′, obtaining after splicing the target feature vector Y′: (Y11′+Y21′, Y12′+Y22′, Y13′+Y23′).
It should be noted that, in a specific implementation, in the process of splicing the first target feature vector and the second target feature vector located in the same row, Y21′ in the second target feature vector is spliced behind Y11′ in the first target feature vector, Y22′ is spliced behind Y12′, and Y23′ is spliced behind Y13′.
S202: forward-transfer the target feature vector to the second model, instruct the second model to perform forward-propagation training based on the received target feature vector, and determine the target feature vector gradient.
After the first feature vector is obtained by predicting the first sample data with the first model and the second feature vector is obtained by predicting the second sample data with the third model, noise perturbation processing is performed on the first feature vector and the second feature vector respectively based on the preset privacy budget; the noise-perturbed first feature vector and second feature vector are then merged into the target feature vector and passed up to the second model, and the noise-perturbed first feature vector gradient and second feature vector gradient are merged into the target feature vector gradient and back-transferred to the first model and the third model. That is, noise perturbation processing based on the preset privacy budget guarantees the privacy of the data transferred among the first model, the third model and the second model.
S203: segment the target feature vector gradient according to the sizes of the first feature vector and the second feature vector, and back-transfer the segments to the first model and the third model respectively.
In a specific implementation, before the target feature vector gradient is back-transferred to the first model, the number of models included in the joint model needs to be determined. When the first model and the second model are stacked in series to generate the joint model, the target feature vector gradient can simply be back-transferred directly to the first model as the input of the joint model. When the first model is stacked in parallel with the third model and then stacked in series with the second model to generate the joint model, the target feature vector gradient needs to be segmented according to the number of models located below the second model in the joint model.
Illustratively, the process of segmenting the target feature vector gradient according to the number of models located below the second model in the joint model is as follows. After the first target feature vector Y1′: (Y11′, Y12′, Y13′) is obtained by processing the first feature vector Y1: (Y11, Y12, Y13) and the second target feature vector Y2′: (Y21′, Y22′, Y23′) is obtained by processing the second feature vector Y2: (Y21, Y22, Y23), the two target feature vectors are spliced row by row as described above to obtain the target feature vector Y′: (Y11′+Y21′, Y12′+Y22′, Y13′+Y23′). The gradient of the spliced target feature vector Y′ is G, whose components are the partial derivatives of the loss value Loss with respect to the predicted value of the dimension indicated by each subscript. By segmenting the target feature vector gradient value G, it is guaranteed that the data back-transferred to the first model and to the third model correspond to the training sample data. For example, in the above embodiment, every sample in the first sample data input to the first model includes three features, and every sample in the second sample data input to the third model also includes three features; therefore the target feature vector gradient value G is first split into two parts, each corresponding to three features, with the part corresponding to the first model's features input to the first model and the part corresponding to the third model's features input to the third model. Finally, the segmented target feature vector gradient values are back-transferred to the first model and the third model respectively.
In this case, the specific implementation corresponding to step S30 is S301: receive the noise-added target feature vector gradient, tune and optimize the parameters of the first model and the third model based on it, and iterate training until the training end condition of the joint model is met.
In the multi-party collaborative model training method provided by the embodiment of the present disclosure, when the joint model includes a first model, a second model and a third model, with the first model stacked in parallel with the third model and then stacked in series with the second model to generate the joint model, the first sample data is first input to the first model to obtain the corresponding first feature vector, and the second sample data is input to the third model to obtain the corresponding second feature vector. Because the features of the column vectors of the first sample data input to the first model and those of the second sample data input to the third model do not intersect, the training precision of the joint model can be guaranteed. In addition, processing the first feature vector output by the first model and the second feature vector output by the third model with the preset noise function before splicing them guarantees, on the one hand, the privacy of the sample data output to the second model and, on the other hand, the privacy of the data between the first model and the third model.
Optionally, as an implementable embodiment, back-transferring the corresponding first feature vector gradient to the first model includes: determining second disturbance noise according to the first feature vector gradient and a second preset noise function; and adding the first feature vector gradient and the second disturbance noise before back-transferring the sum to the first model.
Illustratively, with the first sample data set X1: (X11, X12, X13), where X11 is (male, 15, high school), X12 is (female, 21, undergraduate), and X13 is (female, 6, primary school), and the second sample data set X2: (X21, X22, X23), where X21 is (15, 8:30-9:30, 5), X22 is (2, 10:30-11:30, 1), and X23 is (2, 10:30-11:30, 1), the first sample data X1: (X11, X12, X13) is predicted by the first model to obtain the first feature vector Y1: (Y11, Y12, Y13), and the second sample data X2: (X21, X22, X23) is predicted by the third model to obtain the second feature vector Y2: (Y21, Y22, Y23). The column-wise maximum minus minimum is then computed for the first feature vector Y1 and the second feature vector Y2 respectively, obtaining ΔFG: (fG1, fG2, fG3); finally, the Laplacian noise is computed from the calculated ΔFG, where ε is the preset privacy budget and ΔFG is the sensitivity of the preset noise function, and the computed Laplacian noise is added to each feature vector to obtain the first target feature vector and the second target feature vector.
Figure 3 is a schematic structural diagram of a multi-party collaborative model training apparatus provided by an embodiment of the present disclosure. As shown in Figure 3, the multi-party collaborative model training apparatus includes:
a model building module 310, used for a first party participating in model training to build a first model, where the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training;
a training module 320, used to predict first sample data by the first model to obtain a first feature vector, forward-transfer the first feature vector to the second model, instruct the second model to perform forward-propagation training based on the received first feature vector, and back-transfer the corresponding first feature vector gradient to the first model, where the first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on a preset privacy budget;
a parameter optimization module 330, used to receive the noise-added first feature vector gradient, tune and optimize the parameters of the first model based on it, and iterate training until the training end condition of the joint model is met.
In the multi-party collaborative model training apparatus provided by the embodiment of the present disclosure, a first party participating in model training builds a first model, the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training; first sample data are predicted by the first model to obtain a first feature vector, and the first feature vector is forward-transferred to the second model, instructing the second model to perform forward-propagation training based on the received first feature vector and to back-transfer the corresponding first feature vector gradient to the first model; the first feature vector and/or the first feature vector gradient transferred between the first model and the second model are transferred after noise perturbation processing based on the preset privacy budget; the noise-added first feature vector gradient is received and used to tune and optimize the parameters of the first model, and training is iterated until the training end condition of the joint model is met. That is, the joint model includes the first model and the second model, stacked in series; after the first sample feature data is input to the first model to obtain the corresponding first feature vector, the first feature vector is forward-transferred to the second model, the second model performs forward-propagation training based on the received first feature vector, and the corresponding first feature vector gradient is back-transferred to the first model. Because the first feature vector and the first feature vector gradient undergo noise perturbation processing based on the preset privacy budget, the privacy of the data transferred between the first model and the second model is guaranteed. Compared with the joint model established in the prior art, which requires a trusted third party to encrypt and decrypt intermediate data, the multi-party collaborative model training method proposed in the embodiments of the present disclosure only processes the intermediate data of the joint model based on the preset noise algorithm, which guarantees the privacy of the data and reduces the cost of joint modeling.
The apparatus provided by the embodiment of the present invention can execute the method provided by any embodiment of the present invention and has functional modules and beneficial effects corresponding to executing the method.
It is worth noting that, in the above apparatus embodiment, the units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of the present invention.
The present disclosure also provides an electronic device, including: a processor configured to execute a computer program stored in a memory, where the computer program, when executed by the processor, implements the steps of the above method embodiments.
Figure 4 is a schematic structural diagram of an electronic device provided by the present disclosure; it shows a block diagram of an exemplary electronic device suitable for implementing embodiments of the present invention. The electronic device shown in Figure 4 is only an example and should not impose any restrictions on the functions and scope of use of the embodiments of the present invention.
As shown in Figure 4, the electronic device 800 is embodied in the form of a general-purpose computing device. The components of the electronic device 800 may include, but are not limited to: one or more processors 810, a system memory 820, and a bus 830 connecting the different system components (including the system memory 820 and the processors).
The bus 830 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The electronic device 800 typically includes a variety of computer-system-readable media. These media can be any media that can be accessed by the electronic device 800, including volatile and non-volatile media, removable and non-removable media.
The system memory 820 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 840 and/or a cache memory 850. The electronic device 800 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 860 may be used to read from and write to non-removable, non-volatile magnetic media (commonly referred to as a "hard drive"). A disk drive for reading from and writing to a removable non-volatile disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 830 through one or more data media interfaces. The system memory 820 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 880 having a set (at least one) of program modules 870 may be stored, for example, in the system memory 820; such program modules 870 include, but are not limited to, an operating system, one or more application programs, other program modules and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 870 generally perform the functions and/or methods of the embodiments described in the embodiments of the present invention.
The processor 810 executes at least one of the plurality of programs stored in the system memory 820, thereby performing various functional applications and information processing, for example implementing the method embodiments provided by the embodiments of the present invention.
The present disclosure further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the above method embodiments.
Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The present disclosure further provides a computer program product that, when run on a computer, causes the computer to execute the steps implementing the above method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are merely used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The above are only specific embodiments of the present disclosure, enabling those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (15)

  1. A multi-party collaborative model training method, comprising:
    a first party participating in model training building a first model, wherein the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training;
    predicting first sample data through the first model to obtain a first feature vector, passing the first feature vector forward to the second model, instructing the second model to perform forward propagation training based on the received first feature vector, and passing a corresponding first feature vector gradient backward to the first model, wherein the first feature vector and/or the first feature vector gradient passed between the first model and the second model are passed after noise perturbation processing based on a preset privacy budget; and
    receiving the noise-added first feature vector gradient and, based on it, tuning and optimizing parameters of the first model, and iterating the training until a training end condition of the joint model is met.
  2. The method according to claim 1, wherein an output of the first model is connected to an input of the second model, the input of the first model is the input of the joint model, and the output of the second model is the output of the joint model.
  3. The method according to claim 1 or 2, wherein the first model and a third model are stacked in parallel and then stacked in series with the second model to generate the joint model, and the method further comprises:
    predicting second sample data through the third model to obtain a second feature vector, and combining the second feature vector and the first feature vector to obtain a target feature vector;
    passing the target feature vector forward to the second model, instructing the second model to perform forward propagation training based on the received target feature vector, and determining a target feature vector gradient; and
    splitting the target feature vector gradient according to the magnitudes of the first feature vector and the second feature vector, and passing the split parts backward to the first model and the third model respectively.
  4. The method according to claim 3, wherein the predicting second sample data through the third model to obtain a second feature vector, and combining the second feature vector and the first feature vector to obtain a target feature vector comprises:
    predicting the second sample data through the third model to obtain the second feature vector; and
    concatenating, in sequence, the feature vectors located in the same row of the first feature vector and the second feature vector to obtain the target feature vector.
  5. The method according to any one of claims 1 to 4, wherein the predicting first sample data through the first model to obtain a first feature vector, and passing the first feature vector forward to the second model comprises:
    predicting the first sample data through the first model to obtain the first feature vector;
    determining a first perturbation noise according to the first feature vector and a first preset noise function; and
    adding the first feature vector and the first perturbation noise together and then passing the result to the second model.
  6. The method according to claim 5, wherein the determining the first perturbation noise according to the first feature vector and the first preset noise function comprises:
    selecting the maximum value and the minimum value of each group of feature vectors from the first feature vector to form a maximum value array and a minimum value array respectively;
    determining a sensitivity of the preset noise function according to the maximum value array and the minimum value array; and
    determining the perturbation noise based on the sensitivity and the preset privacy budget.
  7. The method according to any one of claims 1 to 6, further comprising:
    before the passing the corresponding first feature vector gradient backward to the first model, obtaining predicted sample label data output by the second model; and
    determining a loss function value based on the relationship between the predicted sample label data and training sample label data corresponding to the first sample data.
  8. The method according to claim 7, wherein the receiving the noise-added first feature vector gradient and, based on it, tuning and optimizing the parameters of the first model, and iterating the training until the training end condition of the joint model is met comprises:
    when the loss function value is greater than a preset threshold, adding the corresponding first feature vector gradient and the perturbation noise together, passing the result backward to the first model, and performing forward propagation training again with the first sample feature data input to the first model; and
    when the loss function value is less than or equal to the preset threshold, determining the parameters of the target joint model as the initial parameters of the first model built by the first party and the second model built by the second party.
  9. The method according to any one of claims 1 to 8, wherein the passing the corresponding first feature vector gradient backward to the first model comprises:
    determining a second perturbation noise according to the first feature vector gradient and a second preset noise function; and
    adding the first feature vector gradient and the second perturbation noise together and then passing the result backward to the first model.
  10. The method according to any one of claims 1 to 9, wherein the noise perturbation processing comprises at least one of Laplace-noise-based perturbation processing or Gaussian-noise-based perturbation processing.
  11. A multi-party collaborative model training apparatus, comprising:
    a model building module, used for a first party participating in model training to build a first model, wherein the first model and a second model are stacked in series to generate a joint model, the first model is located below the second model, and the second model is built by a second party participating in model training;
    a training module, used for predicting first sample data through the first model to obtain a first feature vector, passing the first feature vector forward to the second model, instructing the second model to perform forward propagation training based on the received first feature vector, and passing a corresponding first feature vector gradient backward to the first model, wherein the first feature vector and/or the first feature vector gradient passed between the first model and the second model are passed after noise perturbation processing based on a preset privacy budget; and
    a parameter optimization module, used for receiving the noise-added first feature vector gradient and, based on it, tuning and optimizing parameters of the first model, and iterating the training until a training end condition of the joint model is met.
  12. An electronic device, comprising:
    one or more processors; and
    a storage means for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 10.
  13. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
  14. A computer program, comprising:
    instructions that, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 10.
  15. A computer program product, comprising:
    instructions that, when executed by a processor, cause the processor to execute the method according to any one of claims 1 to 10.
PCT/CN2023/113287 2022-09-05 2023-08-16 Multi-party collaborative model training method, apparatus, device and medium WO2024051456A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211079219.6 2022-09-05
CN202211079219.6A CN115640517A (zh) Multi-party collaborative model training method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
WO2024051456A1 true WO2024051456A1 (zh) 2024-03-14

Family

ID=84940964

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/113287 WO2024051456A1 (zh) 2022-09-05 2023-08-16 多方协同模型训练方法、装置、设备和介质

Country Status (2)

Country Link
CN (1) CN115640517A (zh)
WO (1) WO2024051456A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640517A (zh) * 2022-09-05 2023-01-24 北京火山引擎科技有限公司 Multi-party collaborative model training method, apparatus, device and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668044A (zh) * 2020-12-21 2021-04-16 中国科学院信息工程研究所 Privacy protection method and apparatus for federated learning
CN113011587A (zh) * 2021-03-24 2021-06-22 支付宝(杭州)信息技术有限公司 Privacy-preserving model training method and system
CN113127931A (zh) * 2021-06-18 2021-07-16 国网浙江省电力有限公司信息通信分公司 Federated learning differential privacy protection method that adds noise based on Rényi divergence
CN114091617A (zh) * 2021-11-29 2022-02-25 深圳前海微众银行股份有限公司 Federated learning modeling optimization method, electronic device, storage medium and program product
CN114239860A (zh) * 2021-12-07 2022-03-25 支付宝(杭州)信息技术有限公司 Model training method and apparatus based on privacy protection
CN115640517A (zh) * 2022-09-05 2023-01-24 北京火山引擎科技有限公司 Multi-party collaborative model training method, apparatus, device and medium

Also Published As

Publication number Publication date
CN115640517A (zh) 2023-01-24

Similar Documents

Publication Publication Date Title
US20210004718A1 (en) Method and device for training a model based on federated learning
WO2021114911A1 (zh) 用户风险评估方法及装置、电子设备、存储介质
WO2020182122A1 (zh) 用于生成文本匹配模型的方法和装置
WO2022089256A1 (zh) 联邦神经网络模型的训练方法、装置、设备、计算机程序产品及计算机可读存储介质
WO2022016964A1 (zh) 纵向联邦建模优化方法、设备及可读存储介质
WO2024051456A1 (zh) Multi-party collaborative model training method, apparatus, device and medium
JP7229148B2 (ja) プライバシーを保護しながら分散された顧客データ上で機械学習すること
WO2020207174A1 (zh) 用于生成量化神经网络的方法和装置
CN112149174B (zh) 模型训练方法、装置、设备和介质
CN112149706B (zh) 模型训练方法、装置、设备和介质
EP3863003B1 (en) Hidden sigmoid function calculation system, hidden logistic regression calculation system, hidden sigmoid function calculation device, hidden logistic regression calculation device, hidden sigmoid function calculation method, hidden logistic regression calculation method, and program
WO2021184769A1 (zh) 神经网络文本翻译模型的运行方法、装置、设备、及介质
WO2021196935A1 (zh) 数据校验方法、装置、电子设备和存储介质
WO2023174018A1 (zh) 一种纵向联邦学习方法、装置、系统、设备及存储介质
EP4086808A2 (en) Text checking method and apparatus based on knowledge graph, electronic device, and medium
US20230161899A1 (en) Data processing for release while protecting individual privacy
Yuan et al. Two new PRP conjugate gradient algorithms for minimization optimization models
WO2023045137A1 (zh) 数据融合方法、装置、电子设备和计算机可读存储介质
Hu et al. Delay-dependent stability of Runge–Kutta methods for linear delay differential–algebraic equations
US20170155571A1 (en) System and method for discovering ad-hoc communities over large-scale implicit networks by wave relaxation
CN112149834B (zh) 模型训练方法、装置、设备和介质
US20220044109A1 (en) Quantization-aware training of quantized neural networks
CN112149141B (zh) 模型训练方法、装置、设备和介质
WO2023185125A1 (zh) 产品资源的数据处理方法及装置、电子设备、存储介质
WO2024066143A1 (zh) 分子碰撞截面的预测方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23862153

Country of ref document: EP

Kind code of ref document: A1