WO2022016964A1 - Vertical federated modeling optimization method and device, and readable storage medium - Google Patents
- Publication number
- WO2022016964A1 (PCT/CN2021/093407)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- data
- search
- gradient
- output
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
Definitions
- the present application relates to the technical field of artificial intelligence, and in particular, to a vertical federation modeling optimization method, device and readable storage medium.
- In vertical federated learning, when the participants' data features overlap little but their users overlap heavily, the portion of users and data with the same users but different data features is taken out to jointly train a machine learning model. For example, consider two participants A and B in the same region, where participant A is a bank and participant B is an e-commerce platform. A and B share many of the same users in that region, but their businesses differ, so the user data features they record are different; in particular, the features recorded by A and B may be complementary. In such a scenario, vertical federated learning can help A and B build a joint machine learning prediction model and provide better services to their customers.
- However, participants in vertical federated learning need to design their own model structures in advance when using vertical federated technology, and even slight differences in the designed model structure may greatly affect the performance of the overall vertical federated learning system.
- the participation threshold of vertical federated learning is relatively high, which limits the application scope of vertical federated learning in specific task areas.
- The main purpose of this application is to provide a vertical federated modeling optimization method, device, and readable storage medium, which aims to address the problem that current vertical federated learning participants must design their own model structures in advance when using vertical federated technology, resulting in a high threshold for participation in vertical federated learning.
- the present application provides an optimization method for vertical federation modeling.
- the method is applied to a label party participating in the vertical federation modeling, and the label party is connected to each data party participating in the vertical federation modeling.
- Each data party is respectively deployed with a first data set and a first search network constructed based on its own data features, and the method includes the following steps:
- The present application also provides a vertical federated modeling optimization method; the method is applied to the data parties participating in the vertical federated modeling, and each data party is respectively deployed with a first data set and a first search network constructed based on its own data features. The method includes the following steps:
- calculating, by the label party according to the second network output and the label data of the label party, the first gradient of the loss function relative to each of the first network outputs, and returning each first gradient to the corresponding data party;
- updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
- The present application also provides a vertical federated modeling optimization device, which includes: a memory, a processor, and a vertical federated modeling optimization program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the vertical federated modeling optimization method described above.
- The present application also proposes a computer-readable storage medium on which a vertical federated modeling optimization program is stored; when the program is executed by a processor, it implements the steps of the vertical federated modeling optimization method described above.
- In this application, the label party receives the first network output sent by each data party, where the first network output is obtained by the data party inputting its first data set into its first search network; the label party fuses each first network output to obtain the second network output, and calculates the first gradient of the loss function relative to each first network output according to the second network output and the label data of the local end; each first gradient is subjected to differential privacy encryption processing to obtain each first encrypted gradient, and each first encrypted gradient is sent to the corresponding data party, so that the data party can update the search structure parameters and/or model parameters in its first search network according to the first encrypted gradient.
- Since the gradient sent by the label party to each data party is processed with differential privacy encryption, the data party cannot learn the original gradient, which prevents the data party from deriving the label party's label data and feature data from the gradient. This avoids leaking the label party's private data to the data parties and improves the data security of the label party during the vertical federated modeling process.
- The present application realizes that, in the vertical federated modeling process, each data party only needs to set up its own search network; the connections between the network units in the search network, that is, the model structure, are determined automatically by optimizing and updating the search structure parameters during vertical federated modeling. This realizes automatic vertical federated learning without spending large amounts of manpower and material resources on pre-setting the model structure, lowers the threshold for participating in vertical federated learning, enables vertical federated learning to be applied to a wider range of specific task fields, and broadens the application scope of vertical federated learning.
- Moreover, the data sent to the label party is only the output of the search network, and what the label party sends to the data party is the gradient after differential privacy processing, so the data security and model information security of each participant are guaranteed to a certain extent.
- FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution according to an embodiment of the present application
- FIG. 2 is a schematic flowchart of the first embodiment of the vertical federated modeling optimization method of the present application
- FIG. 3 is a framework diagram of automatic vertical federated learning with differential-privacy-encrypted communication information involved in an embodiment of the application.
- FIG. 1 is a schematic diagram of a device structure of a hardware operating environment involved in the solution of the embodiment of the present application.
- The vertical federated modeling optimization device in this embodiment of the present application may be a device such as a smart phone, a personal computer, or a server, which is not specifically limited herein.
- the vertical federated modeling optimization device may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 .
- the communication bus 1002 is used to realize the connection and communication between these components.
- the user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
- the network interface 1004 may include a standard wired interface and a wireless interface (eg, a WI-FI interface).
- the memory 1005 may be high-speed RAM memory, or may be non-volatile memory, such as disk memory.
- the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
- The structure shown in FIG. 1 does not constitute a limitation on the vertical federated modeling optimization device, which may include more or fewer components than shown, combine some components, or use a different component arrangement.
- the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a vertical federation modeling optimization program.
- the operating system is a program that manages and controls the hardware and software resources of the device, and supports the operation of the vertical federation modeling optimization program and other software or programs.
- The user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used for establishing a communication connection with each data party participating in the vertical federated modeling, and each data party is respectively deployed with a first data set and a first search network constructed based on its own data features;
- the processor 1001 can be used to call the vertical federation modeling optimization program stored in the memory 1005, and do the following:
- The step of receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network, includes:
- the label side is deployed with an output network and a second data set and a second search network constructed based on the data features of the label side,
- the step of fusing each of the first network outputs to obtain the second network output includes:
- The processor 1001 may also be used to call the vertical federated modeling optimization program stored in the memory 1005 and perform the following operations:
- calculating a second gradient of the loss function relative to the target parameters in the second search network according to the second network output and the label data, and updating the target parameters according to the second gradient, wherein the target parameters are search structure parameters and/or model parameters in the second search network.
- differential privacy encryption processing includes clipping processing and Gaussian noise addition processing
- The step of performing differential privacy encryption processing on each of the first gradients to obtain each first encrypted gradient includes:
- each element in the noise array corresponds to each element in the first gradient one-to-one;
- the first encrypted gradient is obtained by adding Gaussian noise to the first gradient by using the noise array.
- the method further includes:
- the second preset threshold is set according to the privacy level and the modeling progress.
- The user interface 1003 is mainly used for data communication with the client; the network interface 1004 is mainly used for establishing a communication connection with the label party of the model, and each data party is respectively deployed with a first data set and a first search network constructed based on its own data features;
- the processor 1001 can be used to call the vertical federation modeling optimization program stored in the memory 1005, and do the following:
- calculating, by the label party according to the second network output and the label data of the label party, the first gradient of the loss function relative to each of the first network outputs, and returning each first gradient to the corresponding data party;
- updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
- The search structure parameters in the first search network of the data party include the weights corresponding to the connection operations between network units in the first search network, and the weights are updated according to the first gradient received from the label party.
- the processor 1001 can also be used to call the vertical federation modeling optimization program stored in the memory 1005, and perform the following operations:
- a reserved operation is selected from the connection operations, and the model formed by each reserved operation and the network units connected by the reserved operations is taken as the target model.
- FIG. 2 is a schematic flowchart of the first embodiment of the vertical federation modeling optimization method of the present application. It should be noted that although a logical order is shown in the flowcharts, in some cases, the steps shown or described may be performed in an order different from that herein.
- the vertical federated modeling optimization method of the present application is applied to the labeling party participating in the vertical federated learning.
- the labeling party is connected to each data party participating in the vertical federated learning.
- The data parties and the label party can be devices such as smart phones, personal computers, and servers.
- the vertical federated modeling optimization method includes:
- Step S10 receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network;
- the participants in the vertical federated learning are divided into two categories, one is the labeling party with label data, and the other is the data party that has no label data but has feature data.
- There is one label party, and there are one or more data parties.
- Each data party can deploy a dataset and a search network based on its own data features; if the tag party also has feature data, the tag party can also act as a data party and deploy the data set and search network based on its data features.
- In that case, the label party performs both the tasks of the label party and the tasks of the data party.
- To distinguish them, the data set and the search network of a data party are called the first data set and the first search network, and the data set and the search network of the label party are called the second data set and the second search network.
- the sample dimensions of the data sets of each participant are aligned, that is, the sample IDs of each data set are the same, but the data characteristics of each participant may be different.
- Each participant may use the encrypted sample alignment method in advance to construct a sample dimension-aligned data set, which will not be described in detail here.
- The search network refers to a network used for neural architecture search (NAS).
- The search network of each participant may be designed using the DARTS (Differentiable Architecture Search) method.
- The search network includes multiple units, each unit corresponds to a network layer, and connection operations are set between some units. Taking two units as an example, N types of connection operations can be preset between these two units, and a weight is defined for each connection operation; the weights are the search structure parameters of the search network, and the network layer parameters within the units are the model parameters of the search network.
- a network structure search is required to optimize and update the search structure parameters and model parameters. Based on the final updated search structure parameters, the final network structure can be determined, that is, which connection operation or operations to retain. Since the structure of the network is determined after a network search, each participant does not need to set the network structure of the model like designing a traditional vertical federated learning model, thus reducing the difficulty of designing the model.
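The weighted connection operations described above can be sketched as follows. This is an illustrative toy, not the patent's disclosed implementation: the three candidate operations, the choice of N = 3, and the softmax weighting over scalar activations are all assumptions in the style of differentiable architecture search.

```python
import math

# Toy sketch of a mixed connection between two network units: N candidate
# connection operations, each with a learnable weight alpha (the "search
# structure parameters"). The connection's output is the softmax-weighted
# sum of the candidate operations' outputs.

def softmax(alphas):
    exps = [math.exp(a) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Three hypothetical candidate operations on a scalar activation
# (stand-ins for e.g. identity / a parameterized layer / no connection).
ops = [
    lambda x: x,        # identity connection
    lambda x: 2.0 * x,  # stand-in for a layer with model parameters
    lambda x: 0.0,      # "zero" operation, i.e. no connection
]

def mixed_op(x, alphas):
    """Continuous relaxation: weighted sum over all candidate operations."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

print(mixed_op(1.0, [0.0, 0.0, 0.0]))  # equal weights: (1 + 2 + 0) / 3 = 1.0
```

Updating the alphas by gradient descent shifts weight toward the better connection operations; the final network structure keeps the operation(s) with the greatest weight.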
- the search network combination of each participant constitutes a task model, and the network output of each search network is fused to obtain the final output of the task model.
- The task model may further include an output network for fusing the network outputs of each search network; the output network is connected after the search networks of the participants, takes the output data of each search network as input data, and its output is used as the final output of the task model.
- the output network can be deployed on the label side, and the output network can use a fully connected layer or other complex neural network structure, which can vary according to the model prediction task; the output form of the output network can also be set according to the specific model prediction task. For example, when the model prediction task is image classification, the output of the output network is the class to which the input image belongs.
- When a data party wants to update the parameters in its search network, it needs the label data held by the label party to calculate the loss function and gradients; likewise, the label party needs the data sets and search networks in the data parties.
- The data parties and the label party can exchange the intermediate results needed to update the model parameters and search structure parameters in their respective search networks, and update their respective search networks based on the received intermediate results, thereby completing the update of the task model.
- The intermediate result can be the gradient of the parameters or the output data of the search network. When the participant is a data party, the intermediate result sent to the label party may be the output data of its search network; when the participant is the label party, the intermediate result sent to a data party may be the gradient it calculated.
- Each participant can jointly update parameters in multiple rounds.
- That is, in each round, each data party sends a network output to the label party, and the label party sends a gradient back to each data party.
- Each participant may update only the model structure parameters in its search network, only the model parameters, or both simultaneously; that is, the model structure parameters and/or model parameters are updated.
- Through multiple rounds of joint updates, the model structure parameters and model parameters in each participant's search network are updated many times. Specifically, which parameters each participant updates in each round of joint parameter updating can be uniformly set in advance.
- Each data party inputs its first data set into its first search network, obtains an output result after processing by the first search network, and obtains the first network output based on the output result.
- The data parties send their respective first network outputs to the label party.
- the data party can directly use the output result of the first search network (also referred to as the original output of the network) as the first network output, or encrypt the output result with an encryption algorithm, and use the encrypted result as the first network output.
- For example, the homomorphic encryption method or the differential privacy encryption method may be used for encryption.
- The label party receives the first network output sent by each data party.
- The participants may use different data sets in each round of joint parameter updating. Specifically, the participants can divide the total data set into multiple small training sets (also referred to as data batches) and use one small data set per round, or each participant can, before each round of joint parameter updating, sample a batch of data with replacement from the total data set to participate in that round.
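The two batching strategies just described can be sketched as follows; the function names, batch size, and toy sample IDs are illustrative and not taken from the patent.

```python
import random

# Minimal sketch of the two batching strategies: splitting the total data
# set into fixed small training sets (data batches), or sampling a batch
# with replacement before each round of joint parameter updates.

def split_into_batches(dataset, batch_size):
    """Partition the total data set into small training sets (data batches)."""
    return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

def sample_batch_with_replacement(dataset, batch_size, seed=None):
    """Sample one batch with replacement for the current round."""
    rng = random.Random(seed)
    return [rng.choice(dataset) for _ in range(batch_size)]

samples = list(range(10))              # toy sample IDs (aligned across parties)
print(split_into_batches(samples, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```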
- Step S20 fusing each of the first network outputs to obtain a second network output, and calculating the first gradient of the loss function relative to each of the first network outputs according to the second network output and the label data of the local end;
- the label side fuses each first network output to obtain the second network output.
- The label party can average the first network outputs to obtain the second network output; or, when the label party deploys an output network, the first network outputs can be spliced and input into the output network, and the second network output is obtained after processing by the output network.
- the method of splicing may be vector splicing.
- The label party calculates a loss function according to the second network output and the label data of the label party. The loss function can be the mean square error for a regression problem or the cross-entropy loss for a classification problem; the label party then calculates the first gradient of the loss function relative to each first network output.
- the method of calculating the gradient according to the loss function can refer to the chain rule and the gradient descent algorithm, and will not be described in detail here.
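As a toy illustration of step S20 under the averaging fusion mentioned above: scalar outputs and a squared-error loss are assumptions made for brevity (real search networks output vectors), and the function names are hypothetical.

```python
# Sketch of step S20 with averaging fusion: the label party averages the
# first network outputs into the second network output, computes a
# squared-error loss against its label data, and derives the first gradient
# of the loss with respect to each first network output by the chain rule.

def fuse_by_average(first_outputs):
    return sum(first_outputs) / len(first_outputs)

def first_gradients(first_outputs, label):
    """d(loss)/d(output_i) for loss = (fused - label)^2 with average fusion."""
    fused = fuse_by_average(first_outputs)  # the second network output
    dloss_dfused = 2.0 * (fused - label)    # outer gradient of the loss
    dfused_dout = 1.0 / len(first_outputs)  # averaging contributes 1/n
    return [dloss_dfused * dfused_dout for _ in first_outputs]

outs = [0.2, 0.6, 1.0]  # first network outputs from three data parties
grads = first_gradients(outs, label=1.0)
print(grads)            # each gradient equals 2 * (0.6 - 1.0) / 3
```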
- Step S30: performing differential privacy encryption processing on each of the first gradients to obtain each first encrypted gradient, and sending each of the first encrypted gradients to the corresponding data party, so that the data party can update the search structure parameters and/or model parameters in its first search network according to the first encrypted gradient.
- Differential privacy is a technique in cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying individual records. A common implementation of differential privacy is to add random noise to the data.
- the differential privacy encryption processing method may adopt the existing differential privacy encryption processing method, which will not be described in detail here.
- The label party sends each first encrypted gradient to the corresponding data party; that is, each first encrypted gradient is returned to the data party that sent the first network output from which that gradient was calculated.
- The data party updates the search structure parameters and/or model parameters in its first search network according to the first encrypted gradient. Specifically, following the chain rule and the gradient descent algorithm, the data party calculates, from the first encrypted gradient, the gradient of the loss function relative to the search structure parameters and/or model parameters in its search network, and updates the corresponding parameters according to that gradient.
- There are three cases. In the first, the gradient of the loss function relative to the search structure parameters is calculated according to the first encrypted gradient, and the search structure parameters are updated according to that gradient. In the second, the gradient of the loss function relative to the model parameters is calculated according to the first encrypted gradient, and the model parameters are updated according to that gradient. In the third, both gradients are calculated according to the first encrypted gradient, and both the search structure parameters and the model parameters are updated. At this point, a round of the joint parameter updating process is completed.
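A minimal sketch of how a data party might propagate the received gradient into its own parameters via the chain rule; the toy model out = w * x, the learning rate, and all values are assumptions for illustration only.

```python
# Chain-rule sketch of the data-party update: given the received gradient
# g = dL/d(out) and the toy first search network out = w * x,
# dL/dw = dL/d(out) * d(out)/dw = g * x, followed by one gradient-descent step.

def update_parameter(w, x, g, lr=0.1):
    """One gradient-descent step on w using the received gradient g = dL/dout."""
    dloss_dw = g * x  # chain rule through out = w * x
    return w - lr * dloss_dw

w = 0.5
print(update_parameter(w, x=2.0, g=-0.8))  # 0.5 - 0.1 * (-0.8 * 2.0) = 0.66
```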
- the participants can obtain the target model according to the search network after updating the parameters.
- The search structure parameters in the search network of a participant may include the weights corresponding to the connection operations between network units in the search network. That is, connection operations are set between network units, and each connection operation corresponds to a weight; note that connection operations are not necessarily set between every two network units. A participant can select reserved operations from the connection operations according to the search structure parameters in its updated search network. Specifically, for every two network units that have connection operations between them, there are multiple connection operations, and the one or more connection operations with the greatest weight may be selected from them as the reserved operations. After the reserved operations are determined, the model formed by the reserved operations and the network units they connect is used as the participant's target model. The participants can then use their respective target models to jointly complete specific model prediction tasks.
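The structure-derivation step above can be sketched as follows; the unit and operation names are hypothetical, and the `keep` parameter (how many operations are reserved per connection) is an assumption.

```python
# Sketch of deriving the target model structure from the updated search
# structure parameters: for each pair of connected units, keep the candidate
# connection operation(s) with the greatest weight.

def select_reserved_ops(edge_weights, keep=1):
    """edge_weights: {edge: {op_name: weight}}; keep the top-`keep` ops per edge."""
    reserved = {}
    for edge, weights in edge_weights.items():
        ranked = sorted(weights, key=weights.get, reverse=True)
        reserved[edge] = ranked[:keep]
    return reserved

alphas = {
    ("unit1", "unit2"): {"identity": 0.7, "conv3x3": 1.4, "zero": -0.5},
    ("unit2", "unit3"): {"identity": 0.9, "conv3x3": 0.1, "zero": 0.2},
}
print(select_reserved_ops(alphas))
# {('unit1', 'unit2'): ['conv3x3'], ('unit2', 'unit3'): ['identity']}
```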
- In this embodiment, the label party receives the first network output sent by each data party, where the first network output is obtained by the data party inputting its first data set into its first search network; the label party fuses the first network outputs to obtain the second network output, and calculates the first gradient of the loss function relative to each first network output according to the second network output and the label data of the local end; each first gradient is subjected to differential privacy encryption processing to obtain each first encrypted gradient, and each first encrypted gradient is sent to the corresponding data party, so that the data party can update the search structure parameters and/or model parameters in its first search network according to the first encrypted gradient.
- Since the gradient sent by the label party to each data party is processed with differential privacy encryption, the data party cannot learn the original gradient, which prevents the data party from deriving the label party's label data and feature data from the gradient. This avoids leaking the label party's private data to the data parties and improves the data security of the label party during the vertical federated modeling process.
- This embodiment realizes that, in the vertical federated modeling process, each data party only needs to set up its own search network; the connections between the network units in the search network, that is, the model structure, are determined automatically by optimizing and updating the search structure parameters during vertical federated modeling. This realizes automatic vertical federated learning without spending large amounts of manpower and material resources on pre-setting the model structure, lowers the threshold for participating in vertical federated learning, enables vertical federated learning to be applied to a wider range of specific task fields, and broadens the application scope of vertical federated learning.
- Moreover, the data sent to the label party is only the output of the search network, and what the label party sends to the data party is the gradient after differential privacy processing, so the data security and model information security of each participant are guaranteed to a certain extent.
- the step S10 may include:
- Step S101: receiving the first network output sent by the data party, where the first network output is obtained by the data party inputting the first data set into the first search network to obtain an original network output, and then performing differential privacy encryption processing on the original network output.
- the data party can input its first data set into its first search network for processing to obtain the original output of the network, and the original output of the network is the result directly output by the first search network.
- The data party performs differential privacy encryption processing on the original network output to obtain the first network output, and then sends the first network output to the label party. That is, the first network output received by the label party from each data party is the result of the data party's differential privacy encryption processing, not the original network output of the first search network.
- As a result, the label party cannot learn the original network output from the first network output, which prevents the label party from deriving the data party's feature data from the original network output, further prevents the data party's private data from leaking to the label party, and improves the data security of the data party.
- In an embodiment, the label party deploys an output network and a second data set and a second search network constructed based on the data features of the label party, and the step of fusing each of the first network outputs to obtain the second network output in step S20 includes:
- Step S201: inputting the second data set into the second search network to obtain a third network output;
- When the label party owns feature data, the label party can deploy the second data set and the second search network constructed based on its data features.
- the tag side may also deploy an output network for fusing the network outputs of the various search networks.
- the tag side can input the second data set into the second search network, and obtain the output of the third network after processing by the second search network.
- Step S202: splicing the third network output and each of the first network outputs, and then inputting the spliced result into the output network to obtain the second network output;
- each network output can be regarded as a vector form, and a common vector splicing method can be used for splicing each network output.
- After step S20, the method further includes:
- Step S40: calculating the second gradient of the loss function relative to the target parameters in the second search network according to the second network output and the label data, and updating the target parameters according to the second gradient, wherein the target parameters are search structure parameters and/or model parameters in the second search network.
- The label party can also calculate the second gradient of the loss function relative to the target parameters in the second search network, and update the target parameters according to the second gradient.
- the target parameters may be search structure parameters and/or model parameters in the second search network.
- In the first case, the gradient of the loss function relative to the search structure parameters is calculated, and the search structure parameters are updated according to that gradient; in the second case, the gradient of the loss function relative to the model parameters is calculated, and the model parameters are updated according to that gradient; in the third case, both gradients are calculated, and both the search structure parameters and the model parameters are updated.
- the step of performing differential privacy encryption processing on each of the first gradients in step S30 to obtain each of the first encrypted gradients includes:
- Step S301 performing clipping processing on the first gradient to obtain a first clipping gradient, wherein the second-order norm of the first clipping gradient is less than or equal to a first preset threshold;
- the differential privacy encryption processing may include two steps of clipping processing and adding Gaussian noise processing.
- the label side performs clipping processing on the first gradient to obtain the first clipping gradient, and the second-order norm of the first clipping gradient obtained after clipping is less than or equal to the first preset threshold.
- the first preset threshold is a threshold preset by the tag side.
- the first gradient is clipped to the first clipping gradient whose second-order norm is less than or equal to the first preset threshold, so that the variation of the calculated first clipping gradient is limited to a certain range, and the data side cannot deduce the original data of the label side from the first clipping gradient.
- the label side can adopt any clipping processing method that achieves this purpose.
- one clipping processing method is as follows: for each first gradient, the label side calculates the ratio of the second-order norm of the first gradient to the first preset threshold, takes the larger of this ratio and 1, and divides the first gradient by that larger value to obtain the first clipping gradient.
- the second-order norm of the first clipping gradient calculated according to the method is less than or equal to the first preset threshold.
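The ratio-based clipping just described can be sketched as:

```python
import numpy as np

def clip_gradient(grad, threshold):
    """Clip so the second-order (L2) norm is at most `threshold`.

    Ratio rule from the text: divide the gradient by max(1, ||grad||_2 / threshold).
    """
    ratio = np.linalg.norm(grad) / threshold
    return grad / max(1.0, ratio)

g = np.array([3.0, 4.0])         # ||g||_2 = 5
clipped = clip_gradient(g, 2.0)  # norm shrinks exactly to the threshold
```

A gradient whose norm is already within the threshold passes through unchanged, since the ratio is then at most 1.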
- the label party can set different first preset thresholds for different data parties: if the privacy level is higher, a smaller first preset threshold can be set, and if the privacy level is lower, a larger first preset threshold can be set.
- Step S302 generating a noise array that obeys the target Gaussian distribution, wherein the mean value of the target Gaussian distribution is 0, the mean square error is a second preset threshold, and each element in the noise array corresponds to an element in the first gradient.
- Step S303 using the noise array to add Gaussian noise to the first gradient to obtain a first encrypted gradient.
- the label side can generate a noise array that obeys the target Gaussian distribution, where the mean of the target Gaussian distribution is 0, the mean square error is the second preset threshold, and each element in the noise array corresponds to an element in the first gradient.
- the second preset threshold may be set according to specific needs, and the second preset threshold may be the square of the first preset threshold multiplied by the square of a coefficient. If the first gradient is in matrix form, the generated noise array is also in matrix form, and the matrix size of the noise array is the same as that of the first gradient.
- the label side uses the noise array to add Gaussian noise to each first gradient to obtain each first encrypted gradient. Specifically, for each first gradient, the label side adds the noise array to the first gradient, that is, adds each element in the first gradient to the element at the corresponding position in the noise array. Since the first encrypted gradient is the result obtained after clipping and adding noise, the data side cannot recover the original first gradient from the first encrypted gradient, and therefore cannot deduce the original data of the label side, thereby improving the data privacy of the label side.
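Both steps together can be sketched as follows; here `sigma` stands for the noise scale derived from the second preset threshold (a sketch under these assumptions, not the application's exact implementation):

```python
import numpy as np

def dp_encrypt_gradient(grad, clip_threshold, sigma, rng=None):
    """Clip to an L2 norm of at most `clip_threshold`, then add Gaussian noise.

    The noise array has mean 0, scale `sigma`, and the same shape as the gradient.
    """
    if rng is None:
        rng = np.random.default_rng()
    clipped = grad / max(1.0, np.linalg.norm(grad) / clip_threshold)
    noise = rng.normal(loc=0.0, scale=sigma, size=grad.shape)
    return clipped + noise

g = np.array([[3.0, 4.0], [0.0, 1.0]])   # gradient in matrix form
encrypted = dp_encrypt_gradient(g, clip_threshold=2.0, sigma=0.5)
```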
- the method also includes:
- Step S50 obtaining the privacy level and modeling progress of this vertical federation modeling
- Step S60 setting the second preset threshold according to the privacy level and the modeling progress.
- the tag side can set the second preset threshold during the vertical federation modeling process, that is, the tag side can use a different second preset threshold when parameters are jointly updated in each round.
- the label side can obtain the privacy level of this vertical federation modeling and the current modeling progress.
- benchmark thresholds corresponding to different privacy levels can be preset, and threshold change ranges corresponding to different modeling progress can be preset, wherein the threshold change range can be negative or positive, and the modeling progress can be the convergence speed of the loss function, or the rounds or duration of jointly updating parameters.
- the label side can determine the corresponding benchmark threshold according to the privacy level, determine the threshold change range according to the current modeling progress, and add the threshold change range to the benchmark threshold to obtain the second preset threshold.
- the correspondence between the privacy level and the reference threshold may be that the higher the level, the larger the reference threshold, and the lower the level, the smaller the reference threshold, so that the higher the privacy level, the more noise is added, and the lower the privacy level, the less noise is added. The noise size is thus set flexibly according to the privacy level, avoiding data distortion caused by excessive noise that would affect the prediction accuracy of the model.
- the relationship between the convergence speed and the threshold change range can be that the faster the convergence speed, the larger the threshold change range, and the slower the convergence speed, the smaller the threshold change range, so that when the convergence speed is slow and convergence is difficult, the second preset threshold can be reduced by a smaller (possibly negative) threshold change range, thereby reducing the noise, so as to promote the convergence of the loss function and ensure the prediction accuracy of the model.
- the relationship between the rounds and the threshold change range may be that the larger the round number, the smaller the threshold change range, and the smaller the round number, the larger the threshold change range, so that as the rounds of jointly updating parameters increase, the second preset threshold becomes smaller and smaller and the noise is gradually reduced, so as to promote the convergence of the loss function and ensure the prediction accuracy of the model.
- the relationship between the duration and the threshold change range may be that the longer the duration, the smaller the threshold change range, and the shorter the duration, the larger the threshold change range, so that as the duration of jointly updating parameters grows, the second preset threshold becomes smaller and smaller and the noise is gradually reduced, so as to promote the convergence of the loss function and ensure the prediction accuracy of the model.
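The scheduling described above can be sketched as follows; the level names, base thresholds, and decay values are illustrative assumptions, not values from this application:

```python
# Benchmark thresholds per privacy level (assumed values): higher level -> more noise.
BASE_THRESHOLD = {"low": 0.2, "medium": 0.5, "high": 1.0}

def second_preset_threshold(privacy_level, round_index, decay=0.05, initial_delta=0.3):
    """Benchmark threshold plus a round-dependent threshold change range.

    The change range shrinks (and may turn negative) as the number of
    joint-update rounds grows, so the noise is gradually reduced.
    """
    delta = initial_delta - decay * round_index
    return max(0.0, BASE_THRESHOLD[privacy_level] + delta)
```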
- A represents the label side
- B represents the data side
- i represents the index of a data side
- N is the number of the data side.
- A has feature data X_A and corresponding label data Y_A
- B_1, ..., B_N have feature data X_1, ..., X_N respectively.
- the feature data X_A, X_1, ..., X_N have data features of different distributions.
- Each participant has a search network, namely Net_A, Net_1, ..., Net_N, and the corresponding model parameters and search structure parameters are W_A, W_1, ..., W_N and α_A, α_1, ..., α_N.
- A also deploys an output network Net_out for computing Y_out.
- the clip(x) in the lower right corner of the figure indicates that x is clipped, and +N(0, σ²) indicates that Gaussian noise is added to the clipping result.
- a fourth embodiment of the vertical federated modeling optimization method of the present application is proposed.
- the method is applied to data parties participating in vertical federation modeling, and each data party is respectively deployed with a first data set and a first search network constructed based on its respective data features, and the method includes the following steps:
- Step A10 inputting the first data set into the first search network to obtain the original output of the network
- the participants in the vertical federated learning are divided into two categories, one is the labeling party with label data, and the other is the data party that has no label data but has feature data.
- the labeling party has one
- the data side has one or more.
- Each data party can deploy a dataset and a search network based on its own data features; if the tag party also has feature data, the tag party can also act as a data party and deploy the data set and search network based on its data features.
- in this case, the label party performs both the tasks of the label side and the tasks of the data side.
- for distinction, the data set and the search network of the data side are called the first data set and the first search network,
- and the data set and the search network of the label side are called the second data set and the second search network.
- the sample dimensions of the data sets of each participant are aligned, that is, the sample IDs of each data set are the same, but the data characteristics of each participant may be different.
- Each participant may use the encrypted sample alignment method in advance to construct a sample dimension-aligned data set, which will not be described in detail here.
- the search network refers to a network used for neural architecture search (NAS).
- the search network of each participant may be designed using a differentiable architecture search (Differentiable Architecture Search, micro-structure search) method.
- the search network includes multiple units, each unit corresponds to a network layer, and connection operations are provided between some units. Taking two units as an example, N types of connection operations can be preset between these two units, and a weight is defined for each connection operation; the weights are the search structure parameters of the search network, and the network layer parameters in the units are the model parameters of the search network.
- a network structure search is required to optimize and update the search structure parameters and model parameters. Based on the final updated search structure parameters, the final network structure can be determined, that is, which connection operation or operations to retain. Since the structure of the network is determined after a network search, each participant does not need to set the network structure of the model like designing a traditional vertical federated learning model, thus reducing the difficulty of designing the model.
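In the spirit of differentiable architecture search, the weighted connection operations between two units can be sketched as a softmax-weighted mixture; the candidate operations below are stand-ins for illustration, not the operations actually used by the embodiments:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

# Candidate connection operations between two units (illustrative stand-ins).
ops = [
    lambda x: x,                  # identity / skip connection
    lambda x: np.zeros_like(x),   # "none" (no connection)
    lambda x: 2.0 * x,            # stand-in for a parameterized layer
]

alpha = np.array([1.0, 0.1, 0.5])  # search structure parameters: one weight per op

def mixed_op(x):
    """Output between the two units: weighted sum of all candidate operations."""
    weights = softmax(alpha)
    return sum(w * op(x) for w, op in zip(weights, ops))
```

During the search, `alpha` is updated by gradient descent like any other parameter; once the search finishes, only the operation(s) with the largest weights are retained to fix the network structure.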
- the search network combination of each participant constitutes a task model, and the network output of each search network is fused to obtain the final output of the task model.
- the task model may further include an output network for fusing the network outputs of each search network, the output network is set after the search network connected to each participant, and the output data of each search network is used as input data, The output of the output network is used as the final output of the task model.
- the output network can be deployed on the label side, and the output network can use a fully connected layer or other complex neural network structure, which can vary according to the model prediction task; the output form of the output network can also be set according to the specific model prediction task. For example, when the model prediction task is image classification, the output of the output network is the class to which the input image belongs.
- when the data side wants to update the parameters in its search network, it needs the label data on the label side to calculate the loss function and gradients, and the label side in turn needs the outputs of the data sets and search networks on the data side.
- the data side and the label side can interact with each other to update the intermediate results of the model parameters and search structure parameters in the respective search networks, and update the model parameters and the search network based on the received intermediate results.
- the search structure parameters are used to update the respective search networks, thereby completing the update of the task model.
- the intermediate result can be the gradient of the parameters or the output data of the search network.
- when the participant is a data side, the intermediate result sent to the label side may be the output data of the search network at its local end; when the participant is the label side, the intermediate result sent to a data side may be the gradient calculated for that data side.
- Each participant can jointly update parameters in multiple rounds.
- the data side sends a network output to the label side, and the label side sends a gradient to the data side.
- Each participant can update only the model structure parameters in its respective search network, or only the model parameters,
- or update the model structure parameters and model parameters simultaneously, that is, update the model structure parameters and/or model parameters.
- the model structure parameters and model parameters in the search network of each participant are updated multiple times. Specifically, in each round of joint parameter update, which parameter each participant will update can be set uniformly in advance.
- each data party inputs its respective first data set into its respective first search network, and after processing by the first search network obtains an output result, which is the original network output.
- the participants may use different data sets in each round of joint update parameters. Specifically, the participants can divide the total data set into multiple small training sets (also referred to as data batches), and each round uses a small data set to participate in the joint update of parameters, or the participants can also jointly update the parameters in each round Before parameter update, a batch of data is sampled with replacement from the total data set to participate in the joint parameter update of this round.
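Sampling a data batch with replacement, as described above, can be sketched as follows (toy data; in vertical federated learning all participants would additionally need to agree on the same sample IDs each round, e.g. via a shared seed — an assumption not spelled out in the code):

```python
import numpy as np

rng = np.random.default_rng(0)
total_dataset = np.arange(30).reshape(10, 3)  # toy aligned dataset: 10 samples, 3 features

def sample_batch(data, batch_size, rng):
    """Sample a data batch with replacement from the total data set."""
    idx = rng.integers(0, len(data), size=batch_size)
    return data[idx]

batch = sample_batch(total_dataset, batch_size=4, rng=rng)
```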
- Step A20 performing differential privacy encryption processing on the original network output to obtain a first network output
- Step A30 sending the first network output to the label party participating in the vertical federation modeling, so that the label party fuses the first network output received from each data party to obtain the second network output, Calculate the first gradient of the loss function relative to each of the first network outputs according to the second network output and the label data of the label side, and return the first gradient to the corresponding data side;
- the data party performs differential privacy encryption processing on the original network output to obtain the first network output, and then sends the first network output to the label party.
- the differential privacy encryption processing method in this embodiment may adopt the existing differential privacy encryption processing method. That is, the first network output sent by each data party received by the tag party is the result of the differential privacy encryption process performed by the data party, not the original network output of the first search network.
- the label party cannot know the original network output based on the first network output,
- which prevents the label party from deriving the feature data of the data party from the original network output, further prevents the private data of the data party from leaking to the label party, and improves the data security of the data party.
- the method of performing differential privacy processing on the output of the first network by the data party may refer to the method of performing differential privacy processing on the first gradient by the tag side in the third embodiment.
- i represents the index of each data side, O_i represents the first network output of data side i, and the clipping result is O_i' = O_i / max(1, ||O_i||_2 / C), where C denotes the preset clipping threshold.
- the tag side receives the first network output sent by each data side, and fuses each first network output to obtain the second network output. Specifically, the tag side can average the outputs of each first network to obtain the second network output, or when the tag side deploys an output network, the output of each first network can be spliced and input into the output network, and processed by the output network. Get the second network output.
- the method of splicing may be vector splicing.
- the label side calculates a loss function according to the output of the second network and the label data of the label side.
- the loss function can be the mean square error of the regression problem or the cross entropy loss of the classification problem, etc., and calculates the loss function relative to the first network output. a gradient.
- the method of calculating the gradient according to the loss function can refer to the chain rule and the gradient descent algorithm, and will not be described in detail here.
- After calculating the first gradient corresponding to each first network output, the label side can send each first gradient to the corresponding data side, that is, each first gradient is returned to the data side that sent the first network output to which it corresponds.
- Step A40 Update search structure parameters and/or model parameters in the first search network according to the first gradient received from the tag side.
- After receiving the first gradient, the data party updates the search structure parameters and/or model parameters in its first search network according to the first gradient. Specifically, according to the chain rule and the gradient descent algorithm, the data party calculates from the first gradient the gradient of the loss function relative to the search structure parameters and/or model parameters in its search network, and updates the corresponding search structure parameters and/or model parameters according to that gradient.
- the first: calculate the gradient of the loss function relative to the search structure parameters according to the first gradient, and update the search structure parameters according to the gradient;
- the second: calculate the gradient of the loss function relative to the model parameters according to the first gradient, and update the model parameters according to the gradient;
- the third: calculate the gradient of the loss function relative to the search structure parameters according to the first gradient and update the search structure parameters according to that gradient, and also calculate the gradient of the loss function relative to the model parameters according to the first gradient
- and update the model parameters according to that gradient.
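As a toy illustration of the chain rule in step A40 (a real search network is far more complex), suppose the data side's network were a single linear layer O = X·W; the first gradient dL/dO received from the label side is then turned into a parameter gradient as follows:

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0]])  # local first data set (2 samples, 2 features)
W = np.array([[0.1], [0.2]])            # local model parameters

O = X @ W                                # network output previously sent to the label side
dL_dO = np.array([[0.5], [-0.5]])       # first gradient received from the label side

# Chain rule: dL/dW = X^T @ dL/dO, followed by a plain gradient-descent step.
dL_dW = X.T @ dL_dO
learning_rate = 0.1
W_updated = W - learning_rate * dL_dW
```

The same pattern applies to search structure parameters: backpropagate the received gradient through the local network to each parameter and take a gradient step.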
- the participants can obtain the target model according to the search network after updating the parameters.
- Each participant can use their own target models to jointly complete specific model prediction tasks.
- the data party inputs the first data set into the first search network to obtain the original network output, performs differential privacy encryption processing on the original network output to obtain the first network output, and sends the first network output to the label party, so that the label side fuses the first network outputs received from each data side to obtain the second network output, and calculates the first gradient of the loss function relative to each first network output according to the second network output and the label side's label data.
- the data side updates the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label side.
- the data side cannot know the original gradient, thus preventing the data side from deriving the label data and feature data of the label side according to the gradient, which further avoids the The private data in the label side is leaked to the data side, which improves the data security of the label side.
- this embodiment realizes that, in the vertical federated modeling process, each data party only needs to set up its own search network.
- the model structure, that is, the connections between the network units in the search network, is automatically determined by optimizing and updating the search structure parameters during vertical federation modeling, which realizes automatic vertical federated learning without spending
- a large amount of human and material resources to pre-set the model structure. This lowers the threshold for participating in vertical federated learning, so that vertical federated learning can be applied to a wider range of specific task fields, and the application scope of vertical federated learning is improved.
- the data sent to the tag side is the output of the search network, and the tag side sent to the data side is the gradient after differential privacy processing. To a certain extent, the data security and model information security of each participant are guaranteed.
- the search structure parameter in the first search network of the data party includes the weight corresponding to the connection operation between the network units in the first search network, and after the step A40, the method further includes:
- Step A50 according to the search structure parameters in the first search network after updating the parameters, select a reservation operation from each connection operation;
- step A60 a model formed by each of the reservation operations and the network units connected to each of the reservation operations is used as a target model.
- the search structure parameters in the search network of the data party may include weights corresponding to connection operations between network elements in the search network. That is, connection operations are set between network units, and each connection operation corresponds to a weight. It should be noted that a connection operation is not set between any two network units.
- the data side can select the retention operation from each connection operation according to the search structure parameters in the updated search network. Specifically, for every two network units that have connection operations, there are multiple connection operations between them, and one or more connection operations with a greater weight may be selected from the multiple connection operations as the reserved operation. After the reservation operation is determined, the model formed by each reservation operation and the network units connected by each reservation operation is used as the target model of the participant.
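Selecting the retained operations from the updated search structure parameters can be sketched as follows; the operation names and weights are illustrative assumptions:

```python
import numpy as np

op_names = ["skip", "conv3x3", "maxpool", "none"]  # illustrative candidate operations
alpha = np.array([0.2, 1.5, 0.3, -0.4])            # updated search structure parameters

def select_retained_ops(names, weights, k=1):
    """Keep the k connection operations with the greatest weights."""
    order = np.argsort(weights)[::-1][:k]          # indices sorted by descending weight
    return [names[i] for i in order]

retained = select_retained_ops(op_names, alpha)    # keeps the top-weighted operation
```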
- each participant may be a device deployed in a bank or other financial institution, and the participant stores user data recorded by each institution during business processing.
- Each institution can build a data set based on its own data characteristics, use its own data set to jointly conduct vertical federated learning, and enrich the feature dimensions through joint modeling to improve the prediction performance of the model.
- each participant can jointly build a user risk prediction model, which is used to predict the user's risk level in business scenarios such as credit business and insurance business.
- the data characteristics of each participant can select the risk characteristics related to the user's risk prediction according to actual experience, such as the user's deposit amount, the user's default times, and so on.
- Each participant uses their own data sets to jointly perform vertical federation modeling according to the method in the above-mentioned embodiment to obtain their own target models.
- each participant can jointly carry out risk prediction for users.
- the data party inputs the user data corresponding to the second risk feature of the target user at the local end into the target model of the local end, and obtains the output of the first model after processing by the target model.
- the data party sends the first model output to the label party.
- the label side receives the first model output sent by each data side.
- the labeler inputs the user data corresponding to the first risk feature of the target user at its local end into the target model of its local end, and after processing by the target model, the second model output is obtained.
- the tag side splices the output of each first model and the output of the second model, and inputs the splicing result into the output network of the tag side's local end. After processing by the output network, the output obtains the risk prediction result of the target user.
- the tag party can send the target user's risk prediction result to the data party, so that the data party can perform subsequent business processing according to the target user's risk prediction result, for example, Determine whether to lend to the target user according to the risk prediction result.
- each participant only needs to set up its own search network and does not need to spend a lot of manpower and material resources carefully designing a model structure, thereby lowering the threshold for participating in vertical federated learning and making it more convenient for banks and other financial institutions to carry out joint modeling through vertical federated learning, and then complete the risk prediction task through the risk prediction model obtained by joint modeling. Moreover, in the process of vertical federation modeling and of using the models for risk prediction after modeling, the participants do not need to directly exchange their own data sets and models, thus ensuring the security of user privacy data at each participant.
- the data party performs differential privacy encryption on the network output of the search network before sending it to the label party, which prevents the label party from deriving the original user data in the data party according to the network output, thus further improving the data security of the data party.
- the label side encrypts the gradient corresponding to the network output before sending it to the data side, which prevents the data side from deriving the original user data in the label side according to the gradient of the network output, thus further improving the data security of the label side.
- an embodiment of the present application also proposes a vertical federation modeling optimization device.
- the device is deployed on a label party participating in the vertical federation modeling.
- the label party is in communication connection with each data party participating in the vertical federation modeling.
- a first data set and a first search network constructed based on respective data characteristics are respectively deployed on each data party, and the device includes:
- a receiving module configured to receive the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network;
- a computing module configured to fuse each of the first network outputs to obtain a second network output, and calculate the first gradient of the loss function relative to each of the first network outputs according to the second network output and the label data of the local end;
- a first differential privacy processing module configured to perform differential privacy encryption processing on each of the first gradients to obtain each first encrypted gradient, and send each of the first encrypted gradients to the corresponding data party, for the data party to
- update search structure parameters and/or model parameters in the first search network according to the first encrypted gradient.
- the receiving module is also used for:
- the tag side is deployed with an output network, a second data set and a second search network constructed based on the data characteristics of the tag side, and the computing module includes:
- an input unit configured to input the second data set into the second search network to obtain a third network output
- a splicing unit for splicing the third network output and each of the first network outputs and then inputting the output network into the output network to obtain the second network output;
- the device also includes:
- a first update module configured to calculate the second gradient of the loss function relative to the target parameter in the second search network according to the second network output and the label data, and update the target parameter according to the second gradient , wherein the target parameter is a search structure parameter and/or a model parameter in the second search network.
- differential privacy encryption processing includes clipping processing and adding Gaussian noise processing
- first differential privacy processing module includes:
- a clipping processing unit configured to perform clipping processing on the first gradient to obtain a first clipping gradient, wherein the second-order norm of the first clipping gradient is less than or equal to a first preset threshold;
- a generating unit configured to generate a noise array that obeys a target Gaussian distribution, wherein the mean value of the target Gaussian distribution is 0, the mean square error is a second preset threshold, and each element in the noise array corresponds to an element in the first gradient.
- the noise adding unit is used for adding Gaussian noise to the first gradient by using the noise array to obtain a first encrypted gradient.
- the device also includes:
- the acquisition module is used to acquire the privacy level and modeling progress of this vertical federation modeling
- a setting module configured to set the second preset threshold according to the privacy level and the modeling progress.
- an embodiment of the present application also proposes a vertical federation modeling optimization device.
- the device is deployed on data parties participating in vertical federation modeling, and each data party is respectively deployed with a first data set and a first search network constructed based on its respective data characteristics, the apparatus includes:
- an input module configured to input the first data set into the first search network to obtain the original output of the network
- a second differential privacy processing module configured to perform differential privacy encryption processing on the original network output to obtain the first network output
- a sending module configured to send the first network output to the label party participating in the vertical federation modeling, so that the label party fuses the first network outputs received from each data party to obtain the second network output, then calculates the first gradient of the loss function relative to each of the first network outputs according to the second network output and the label data of the label side, and returns the first gradient to the corresponding data side;
- a second update module configured to update the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
- the search structure parameters in the data party's first search network include weights corresponding to the connection operations between network units in the first search network, and the device further includes:
- a selection module configured to select a reserved operation from the connection operations according to the search structure parameters of the first search network after the parameter update;
- a determination module configured to take the model formed by each reserved operation and the network units connected by each reserved operation as a target model.
- the specific implementation of the vertical federated modeling optimization apparatus of the present application is basically the same as that of the above-mentioned embodiments of the vertical federated modeling optimization method, and will not be repeated here.
- an embodiment of the present application further provides a computer-readable storage medium on which a vertical federated modeling optimization program is stored; when the program is executed by a processor, the steps of the vertical federated modeling optimization method described above are implemented.
Abstract
A vertical federated modeling optimization method and device, and a readable storage medium. The method comprises: receiving first network outputs sent by each data party, the first network outputs being obtained by the data party inputting its first data set into its first search network (S10); fusing the first network outputs to obtain a second network output, and calculating, according to the second network output and the label data of the home terminal, a first gradient of a loss function with respect to each first network output (S20); and performing differential privacy encryption processing on each first gradient to obtain first encrypted gradients, and sending each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters in the first search network according to the first encrypted gradient (S30).
Description
This application claims priority to Chinese patent application No. 202010719397.5, filed on July 23, 2020, the entire contents of which are incorporated herein by reference.
The present application relates to the technical field of artificial intelligence, and in particular to a vertical federated modeling optimization method, device and readable storage medium.
With the development of artificial intelligence, the concept of "federated learning" was proposed to solve the problem of data silos, so that the parties to a federation can train a model and obtain model parameters without disclosing their own data, thereby avoiding the leakage of private data.
Vertical federated learning applies when the data features of the participants overlap little while their users overlap substantially: the users shared by the participants, together with their differing feature data, are taken out to jointly train a machine learning model. For example, consider two participants A and B in the same region, where participant A is a bank and participant B is an e-commerce platform. A and B share many users in that region, but because their businesses differ, the user data features they record are different and may well be complementary. In such a scenario, vertical federated learning can be used to help A and B build a joint machine learning prediction model and thereby provide better services to their customers.
However, at present, participants in vertical federated learning must design their respective model structures in advance, and even a slight difference in the designed structure can greatly affect the performance of the overall vertical federated learning system. This makes the threshold for participation high and limits the range of concrete tasks to which vertical federated learning can be applied.
The main purpose of this application is to provide a vertical federated modeling optimization method, device and readable storage medium, aiming to solve the problem that participants in vertical federated learning currently need to design their model structures in advance, which results in a high threshold for participation.
To achieve the above purpose, the present application provides a vertical federated modeling optimization method. The method is applied to a label party participating in vertical federated modeling, the label party is communicatively connected to each data party participating in the modeling, and each data party is respectively deployed with a first data set and a first search network constructed based on its own data features. The method includes the following steps:
receiving a first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network;
fusing each first network output to obtain a second network output, and calculating, according to the second network output and the label data of the local end, a first gradient of the loss function with respect to each first network output;
performing differential privacy encryption processing on each first gradient to obtain first encrypted gradients, and sending each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters in its first search network according to the first encrypted gradient.
To achieve the above purpose, the present application further provides a vertical federated modeling optimization method applied to the data parties participating in vertical federated modeling, wherein each data party is respectively deployed with a first data set and a first search network constructed based on its own data features. The method includes the following steps:
inputting the first data set into the first search network to obtain a raw network output;
performing differential privacy encryption processing on the raw network output to obtain a first network output;
sending the first network output to the label party participating in the vertical federated modeling, so that the label party fuses the first network outputs received from the data parties to obtain a second network output, calculates, according to the second network output and the label party's label data, a first gradient of the loss function with respect to each first network output, and returns the first gradient to the corresponding data party;
updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
To achieve the above purpose, the present application further provides a vertical federated modeling optimization device, which includes a memory, a processor, and a vertical federated modeling optimization program stored in the memory and runnable on the processor; when executed by the processor, the program implements the steps of the vertical federated modeling optimization method described above.
In addition, to achieve the above purpose, the present application further proposes a computer-readable storage medium storing a vertical federated modeling optimization program; when executed by a processor, the program implements the steps of the vertical federated modeling optimization method described above.
In the present application, the label party receives the first network output sent by each data party, the first network output being obtained by the data party inputting its first data set into its first search network; the label party fuses the first network outputs to obtain a second network output, calculates, according to the second network output and the local label data, a first gradient of the loss function with respect to each first network output, performs differential privacy encryption processing on each first gradient to obtain first encrypted gradients, and sends each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters in its first search network. Because the gradient sent by the label party to the data party has undergone differential privacy encryption, the data party cannot learn the original gradient and therefore cannot derive the label party's label data and feature data from it; private data on the label side is thus prevented from leaking to the data side, improving the label party's data security during vertical federated modeling.
Moreover, compared with existing vertical federated learning, in which each participant must spend considerable manpower and resources designing the model structure in advance, the present application only requires each data party to set up its own search network: the connections between network units in the search network, that is, the model structure, are determined automatically during vertical federated modeling by optimizing and updating the search structure parameters. This realizes automatic vertical federated learning without the cost of pre-designing model structures, lowers the threshold for participating in vertical federated learning, and allows it to be applied to a wider range of concrete task fields. Furthermore, during modeling the data party sends the label party only the output of its search network, and the label party sends the data party only differentially private gradients; the participants never directly exchange data sets or the models themselves, which protects, to a certain extent, the data security and model information security of every participant.
FIG. 1 is a schematic structural diagram of the hardware operating environment involved in the solution of an embodiment of the present application;
FIG. 2 is a schematic flowchart of the first embodiment of the vertical federated modeling optimization method of the present application;
FIG. 3 is a framework diagram of automatic vertical federated learning with differentially private communication involved in an embodiment of the application.
It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
As shown in FIG. 1, FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
It should be noted that the vertical federated modeling optimization device in this embodiment may be a smart phone, a personal computer, a server, or similar equipment, which is not specifically limited herein.
As shown in FIG. 1, the vertical federated modeling optimization device may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to realize connection and communication among these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be high-speed RAM memory or non-volatile memory, such as disk memory; optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art can understand that the device structure shown in FIG. 1 does not constitute a limitation on the vertical federated modeling optimization device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a vertical federated modeling optimization program. The operating system is a program that manages and controls the hardware and software resources of the device and supports the operation of the vertical federated modeling optimization program and other software or programs.
When the device shown in FIG. 1 is the label party participating in vertical federated modeling, the user interface 1003 is mainly used for data communication with the client, and the network interface 1004 is mainly used to establish communication connections with the data parties participating in the modeling, each of which is respectively deployed with a first data set and a first search network constructed based on its own data features. The processor 1001 may be used to call the vertical federated modeling optimization program stored in the memory 1005 and perform the following operations:
receiving a first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network;
fusing each first network output to obtain a second network output, and calculating, according to the second network output and the label data of the local end, a first gradient of the loss function with respect to each first network output;
performing differential privacy encryption processing on each first gradient to obtain first encrypted gradients, and sending each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters in its first search network according to the first encrypted gradient.
Further, the step of receiving the first network output sent by the data party includes:
receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network to produce a raw network output and then performing differential privacy encryption processing on the raw network output.
Further, the label party is deployed with an output network, as well as a second data set and a second search network constructed based on the label party's data features,
and the step of fusing each first network output to obtain the second network output includes:
inputting the second data set into the second search network to obtain a third network output;
splicing the third network output with each first network output and inputting the result into the output network to obtain the second network output.
After the step of calculating the first gradient of the loss function with respect to each first network output according to the second network output and the local label data, the processor 1001 may also call the vertical federated modeling optimization program stored in the memory 1005 and perform the following operation:
calculating, according to the second network output and the label data, a second gradient of the loss function with respect to a target parameter in the second search network, and updating the target parameter according to the second gradient, wherein the target parameter is a search structure parameter and/or a model parameter of the second search network.
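The fusion step above can be sketched as follows: the per-party first network outputs (and the label party's own third output, if any) are concatenated along the feature axis and passed through an output network. The single fully connected layer with softmax below is a hypothetical stand-in, since the application leaves the output network's exact structure task-dependent; all names are illustrative.

```python
import numpy as np

def fuse_outputs(first_outputs, third_output=None):
    """Concatenate the per-party network outputs along the feature axis;
    the label party's own third output, if present, is appended last."""
    parts = list(first_outputs)
    if third_output is not None:
        parts.append(third_output)
    return np.concatenate(parts, axis=1)

def output_network(fused, weights, bias):
    """Minimal stand-in for the output network: one fully connected layer
    followed by softmax, as suggested for classification tasks."""
    logits = fused @ weights + bias
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)
```

For an image-classification task, each row of the softmax output would be the predicted class distribution forming the second network output.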
Further, the differential privacy encryption processing includes clipping processing and Gaussian noise addition, and the step of performing differential privacy encryption processing on each first gradient to obtain each first encrypted gradient includes:
clipping the first gradient to obtain a first clipped gradient, wherein the L2 norm of the first clipped gradient is less than or equal to a first preset threshold;
generating a noise array that obeys a target Gaussian distribution, wherein the mean of the target Gaussian distribution is 0, its standard deviation is a second preset threshold, and the elements of the noise array correspond one-to-one to the elements of the first gradient;
adding Gaussian noise to the first gradient using the noise array to obtain the first encrypted gradient.
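A minimal sketch of the clipping and noise steps, assuming the "second preset threshold" acts as the noise standard deviation and that, following the usual DP-SGD convention, the noise is added to the clipped gradient; function and parameter names are illustrative, not taken from the application.

```python
import numpy as np

def dp_encrypt_gradient(grad, clip_c, sigma, rng=None):
    """Clip grad so its L2 norm is at most clip_c (first preset threshold),
    then add element-wise Gaussian noise with mean 0 and standard deviation
    sigma (second preset threshold). The noise array has the same shape as
    the gradient, matching it element for element."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_c / max(norm, 1e-12))
    noise = rng.normal(loc=0.0, scale=sigma, size=grad.shape)
    return clipped + noise
```

With sigma set to 0 the function reduces to pure norm clipping, which makes the clipping bound easy to verify in isolation.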
Further, before the step of generating the noise array that obeys the target Gaussian distribution, the method further includes:
acquiring the privacy level and modeling progress of the current vertical federated modeling;
setting the second preset threshold according to the privacy level and the modeling progress.
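The application does not fix the rule for deriving the second preset threshold from these two quantities. One plausible schedule, stated purely as an assumption, scales the noise standard deviation up with the privacy level and relaxes it as modeling progress advances, on the reasoning that late-stage gradients typically reveal less about the raw data:

```python
def noise_scale(privacy_level, progress, base=1.0):
    """Hypothetical schedule for the second preset threshold.
    privacy_level: positive number, higher means stronger privacy.
    progress: fraction of training completed, in [0, 1].
    The exact functional form is an assumption, not from the source."""
    return base * privacy_level * (1.0 - 0.5 * progress)
```

Any monotone rule in both arguments would serve the same purpose; the point is only that the threshold is recomputed per round rather than fixed.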
When the device shown in FIG. 1 is a data party participating in vertical federated modeling, the user interface 1003 is mainly used for data communication with the client, and the network interface 1004 is mainly used to establish a communication connection with the label party participating in the modeling; each data party is respectively deployed with a first data set and a first search network constructed based on its own data features. The processor 1001 may be used to call the vertical federated modeling optimization program stored in the memory 1005 and perform the following operations:
inputting the first data set into the first search network to obtain a raw network output;
performing differential privacy encryption processing on the raw network output to obtain a first network output;
sending the first network output to the label party participating in the vertical federated modeling, so that the label party fuses the first network outputs received from the data parties to obtain a second network output, calculates, according to the second network output and the label party's label data, a first gradient of the loss function with respect to each first network output, and returns the first gradient to the corresponding data party;
updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
Further, the search structure parameters in the data party's first search network include the weights corresponding to the connection operations between network units in the first search network. After the step of updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party, the processor 1001 may also call the vertical federated modeling optimization program stored in the memory 1005 and perform the following operations:
selecting a reserved operation from the connection operations according to the search structure parameters of the first search network after the parameter update;
taking the model formed by each reserved operation and the network units connected by each reserved operation as the target model.
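Discretizing the searched structure as described, keeping for each edge the connection operation with the largest learned weight, can be sketched as a DARTS-style argmax; the array layout and operation names here are illustrative assumptions.

```python
import numpy as np

def select_reserved_ops(alpha, op_names):
    """For each edge between network units, keep the connection operation
    with the largest search structure parameter (weight).
    alpha: array of shape (num_edges, num_ops) of learned weights.
    op_names: the num_ops candidate connection operations."""
    keep = np.argmax(alpha, axis=1)
    return [op_names[i] for i in keep]
```

The retained operations, together with the network units they connect, then constitute the target model.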
Based on the above structure, various embodiments of the vertical federated modeling optimization method are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the vertical federated modeling optimization method of the present application. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order. The vertical federated modeling optimization method of the present application is applied to the label party participating in vertical federated learning; the label party is communicatively connected to each data party participating in the learning, and each data party is respectively deployed with a first data set and a first search network constructed based on its own data features. The data parties and the label party may be devices such as smart phones, personal computers and servers. In this embodiment, the vertical federated modeling optimization method includes:
Step S10, receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network;
In this embodiment, the participants in vertical federated learning fall into two categories: the label party, which owns the label data, and the data parties, which own feature data but no label data. In general there is one label party and one or more data parties. Each data party may deploy a data set and a search network constructed based on its own data features; if the label party also owns feature data, it may additionally act as a data party, deploying a data set and a search network built from its features and performing both the label party's tasks and a data party's tasks. To avoid unclear references, when the data party and the label party are described separately, the data party's data set and search network are called the first data set and first search network, and the label party's are called the second data set and second search network. The sample dimensions of the participants' data sets are aligned, that is, the sample IDs of the data sets are the same, but each participant's data features may differ. The participants may construct sample-aligned data sets in advance by encrypted sample alignment, which is not detailed here. A search network is a network used for neural architecture search (NAS); in this embodiment, each participant's search network may be a network designed in advance according to the DARTS (Differentiable Architecture Search) method.
The search network includes multiple units, each corresponding to a network layer, with connection operations set between some of the units. Taking two units as an example, the connection between them may be any of N preset connection operations, each of which is assigned a weight; these weights are the search structure parameters of the search network, while the network layer parameters inside the units are its model parameters. During model training, a network structure search is performed to optimize and update the search structure parameters and model parameters, and the final network structure, that is, which connection operation or operations to retain, is determined from the finally updated search structure parameters. Because the structure of the network is determined only after this search, the participants do not need to specify the model's network structure as they would when designing a traditional vertical federated learning model, which reduces the difficulty of designing the model.
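The weighted connection operations between two units can be illustrated with a DARTS-style mixed edge, in which the candidate operations are combined by a softmax over their search structure parameters. This is a sketch under the assumption that the operations are simple numpy callables; in practice they would be neural network layers.

```python
import numpy as np

def mixed_operation(x, alphas, ops):
    """Mixed edge between two units: the output is a softmax-weighted sum
    of all N candidate connection operations. The alphas are the search
    structure parameters that get updated during federated training."""
    w = np.exp(alphas - alphas.max())
    w = w / w.sum()
    return sum(wi * op(x) for wi, op in zip(w, ops))
```

As training shifts weight onto one operation, the mixed edge approaches the single retained operation chosen at discretization time.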
The combination of the participants' search networks constitutes a task model, and the network outputs of the search networks are fused to obtain the task model's final output. Further, the task model may also include an output network for fusing the network outputs of the search networks; the output network is connected after the participants' search networks, takes their output data as its input data, and its output serves as the final output of the task model. The output network may be deployed on the label party and may be a fully connected layer or another, more complex neural network structure, depending on the model's prediction task; the form of its output may likewise be set according to the specific prediction task. For example, when the prediction task is image classification, the output of the output network is the class to which the input image belongs.
The participants need to jointly optimize and update the task model, that is, to jointly update the model structure parameters and model parameters in their respective search networks, finally obtaining a task model that meets the prediction accuracy requirement. Specifically, during joint parameter updating, for a data party to update the parameters of its search network it needs the label data held by the label party to compute the loss function and gradients, while for the label party to compute the loss function and gradients it needs the data sets and search networks of the data parties. Because the data sets of the participants and the label data of the label party may all be private (for example, in joint modeling between banks the data often comes from users handling bank-related business), directly exchanging the data in the data sets, the model structures and the model parameters would leak data privacy between the participants. Therefore, in this embodiment, the data parties and the label party exchange only intermediate results used to update the model parameters and search structure parameters of their respective search networks, and each side updates its own search network based on the intermediate results it receives, thereby completing the update of the task model.
An intermediate result may be a parameter gradient or the output data of a search network. Specifically, when the participant is a data party, the intermediate result sent to the label party may be the output data of that party's search network; when the participant is the label party, the intermediate result sent to a data party may be the computed gradient corresponding to the output data that the data party sent.
The participants may perform multiple rounds of joint parameter updating. Within one round, a data party sends one network output to the label party, and the label party sends one gradient back to the data party. In a given round, the participants may update only the model structure parameters of their respective search networks, only the model parameters, or both at the same time, i.e., the model structure parameters and/or the model parameters. After multiple rounds of joint parameter updating, both the model structure parameters and the model parameters in each participant's search network have been updated many times. Which kind of parameter each participant updates in each round may be agreed uniformly in advance.
Specifically, in one round of joint parameter updating, each data party inputs its first data set into its first search network, obtains an output result through the processing of the first search network, and derives a first network output from that result. Each data party then sends its first network output to the label party. A data party may use the output result of the first search network (also called the raw network output) directly as the first network output, or may encrypt the output result with an encryption algorithm, for example homomorphic encryption or a differential privacy mechanism, and use the encrypted result as the first network output. The label party receives the first network outputs sent by the data parties.
It should be noted that a participant may use different data sets in different rounds of joint parameter updating. Specifically, a participant may divide its total data set into multiple small training sets (also called data batches) and use one batch per round, or may sample a batch of data with replacement from its total data set before each round and use that batch for the round's joint parameter update.
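The two batching strategies above can be sketched as follows. This is a minimal illustration in Python (NumPy), not part of the patented method itself; the assumption that all participants seed their generators identically, so that each round operates on the same aligned sample IDs, is ours.

```python
import numpy as np

def make_batches(num_samples, batch_size, rng):
    """Strategy 1: partition the aligned sample indices into data batches,
    one batch consumed per round of joint parameter updating."""
    idx = rng.permutation(num_samples)
    return [idx[i:i + batch_size] for i in range(0, num_samples, batch_size)]

def sample_with_replacement(num_samples, batch_size, rng):
    """Strategy 2: draw one batch with replacement before each round."""
    return rng.choice(num_samples, size=batch_size, replace=True)

# Assumption of this sketch: every participant uses the same seed so the
# sampled IDs agree across parties (the data sets are sample-aligned).
rng = np.random.default_rng(42)
batches = make_batches(10, 4, rng)
```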
Step S20, fusing the first network outputs to obtain a second network output, and computing, according to the second network output and the label data at the local end, a first gradient of the loss function with respect to each first network output;
The label party fuses the first network outputs to obtain the second network output. Specifically, the label party may average the first network outputs to obtain the second network output, or, when an output network is deployed at the label party, concatenate the first network outputs and feed the concatenation into the output network, whose processing yields the second network output. The concatenation may be vector concatenation. The label party then computes a loss function from the second network output and its label data; the loss function may be, for example, the mean squared error for a regression problem or the cross-entropy loss for a classification problem. It further computes the first gradient of the loss function with respect to each first network output. The gradients are computed following the chain rule and the gradient descent algorithm, which are not detailed here.
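As a concrete illustration of the averaging variant, the following sketch fuses the first network outputs by averaging, computes a cross-entropy loss against one-hot label data, and returns the first gradient of the loss with respect to each first network output. The shapes and the choice of cross-entropy are illustrative assumptions, not mandated by the method.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fuse_and_grad(first_outputs, labels_onehot):
    """Label-party side: average the data parties' first network outputs into
    the second network output, compute a cross-entropy loss against the label
    data, and return dL/d(output_i) for each first network output."""
    n = len(first_outputs)
    fused = sum(first_outputs) / n                 # second network output (logits)
    probs = softmax(fused)
    batch = labels_onehot.shape[0]
    loss = -np.mean(np.sum(labels_onehot * np.log(probs + 1e-12), axis=1))
    grad_fused = (probs - labels_onehot) / batch   # dL/d(fused)
    # Chain rule for the averaging fusion: dL/d(output_i) = (1/n) * dL/d(fused).
    first_grads = [grad_fused / n for _ in range(n)]
    return loss, first_grads
```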
Step S30, performing differential privacy encryption on each first gradient to obtain first encrypted gradients, and sending each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters of its first search network according to the first encrypted gradient.
After computing the first gradient corresponding to each first network output, the label party applies differential privacy encryption to each first gradient to obtain the first encrypted gradients. Differential privacy is a technique from cryptography that aims to maximize the accuracy of queries against a statistical database while minimizing the chance of identifying the records in it; a common differential privacy method is to add randomized noise to the data. In this embodiment, any existing differential privacy mechanism may be used, and the details are not repeated here.
The label party sends each first encrypted gradient to the corresponding data party, that is, to the data party that sent the first network output from which that first encrypted gradient was computed. On receiving the first encrypted gradient, the data party updates the search structure parameters and/or model parameters of its first search network according to it. Specifically, following the chain rule and the gradient descent algorithm, the data party computes from the first encrypted gradient the gradient of the loss function with respect to the search structure parameters and/or model parameters of its search network, and updates those parameters accordingly. Three cases arise. First: compute from the first encrypted gradient the gradient of the loss function with respect to the search structure parameters and update the search structure parameters. Second: compute the gradient of the loss function with respect to the model parameters and update the model parameters. Third: do both, updating the search structure parameters and the model parameters. This completes one round of joint parameter updating.
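The data-party side of this chain-rule step can be sketched as follows, standing in a single linear layer out = X · W for the local search network; this simplification, and the learning rate, are illustrative assumptions only.

```python
import numpy as np

def local_update(X, W, received_grad, lr=0.1):
    """Data-party side: given the gradient G = dL/d(output) received from the
    label party, back-propagate through the local network and take one
    gradient-descent step. A single linear layer out = X @ W stands in for
    the first search network (an illustrative simplification)."""
    grad_W = X.T @ received_grad    # chain rule: dL/dW = X^T · dL/d(out)
    return W - lr * grad_W
```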
After multiple rounds of joint parameter updating, a participant can derive its target model from its updated search network. Specifically, the search structure parameters of a participant's search network may include weights corresponding to the connection operations between network units in the search network. That is, connection operations are defined between network units, each connection operation having an associated weight. It should be noted that connection operations are not defined between every pair of network units. A participant selects retained operations from the candidate connection operations according to the search structure parameters of its updated search network. Specifically, for each pair of network units between which connection operations exist, there are multiple candidate connection operations, and the one or more connection operations with the largest weights may be selected from them as the retained operations. Once the retained operations are determined, the model formed by the retained operations and the network units they connect is taken as the participant's target model. The participants may then use their respective target models to jointly perform the concrete model prediction task.
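The retained-operation selection above amounts to a per-edge top-k over the learned weights. A minimal sketch, with a hypothetical two-edge example:

```python
import numpy as np

def select_retained_ops(edge_weights, keep=1):
    """For each pair of connected units (an 'edge'), keep the `keep` candidate
    connection operations with the largest search structure weights."""
    retained = {}
    for edge, weights in edge_weights.items():
        order = np.argsort(weights)[::-1]          # indices by descending weight
        retained[edge] = sorted(order[:keep].tolist())
    return retained

# Hypothetical example: two edges, three candidate operations each.
arch = select_retained_ops({(0, 1): [0.1, 0.7, 0.2], (1, 2): [0.5, 0.3, 0.2]})
```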
In this embodiment, the label party receives the first network output sent by a data party, the first network output being obtained by the data party inputting the first data set into the first search network; the label party fuses the first network outputs to obtain the second network output, computes, from the second network output and its own label data, the first gradient of the loss function with respect to each first network output, applies differential privacy encryption to each first gradient to obtain the first encrypted gradients, and sends each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters of its first search network according to it. In this application, because the gradients the label party sends to the data parties have undergone differential privacy encryption, a data party cannot learn the original gradients and therefore cannot derive the label party's label data and feature data from them; leakage of the label party's private data to the data parties is avoided, and the data security of the label party during vertical federated modeling is improved. Moreover, compared with existing vertical federated learning, in which the participants must spend considerable manpower and resources designing the model structure in advance, this embodiment only requires each data party to set up its own search network: the connections between the network units of the search network, i.e., the model structure, are determined automatically during vertical federated modeling by optimizing and updating the search structure parameters. This realizes automatic vertical federated learning without costly manual design of the model structure, lowers the threshold for participating in vertical federated learning, allows vertical federated learning to be applied to a wider range of concrete task domains, and thus broadens its scope of application. Furthermore, during modeling, a data party sends the label party only the output of its search network, and the label party sends the data parties only differentially private gradients; the participants never directly exchange their data sets or the models themselves, which to a certain extent guarantees the data security and model information security of every participant.
Further, to further improve the data security of the data party, step S10 may include:
Step S101, receiving the first network output sent by the data party, where the first network output is obtained by the data party inputting the first data set into the first search network to obtain a raw network output, and applying differential privacy encryption to the raw network output.
The data party inputs its first data set into its first search network to obtain the raw network output, i.e., the result output directly by the first search network. The data party applies differential privacy encryption to the raw network output to obtain the first network output, and then sends the first network output to the label party. In other words, the first network outputs the label party receives from the data parties are the results of the data parties' differential privacy encryption, not the raw network outputs of the first search networks. The label party cannot recover the raw network outputs from the first network outputs, and therefore cannot derive the data parties' feature data from them; this further prevents the data parties' private data from leaking to the label party and improves the data security of the data parties.
Further, based on the first embodiment above, a second embodiment of the vertical federated modeling optimization method of the present application is proposed. In this embodiment, the label party deploys an output network as well as a second data set and a second search network constructed from the label party's data features, and the step of fusing the first network outputs to obtain the second network output in step S20 includes:
Step S201, inputting the second data set into the second search network to obtain a third network output;
In this embodiment, when the label party also holds feature data, it may deploy a second data set and a second search network constructed from its own data features, as well as an output network for fusing the network outputs of the search networks. During one round of joint parameter updating, the label party inputs the second data set into the second search network and obtains the third network output through the processing of the second search network.
Step S202, concatenating the third network output with the first network outputs and inputting the result into the output network to obtain the second network output;
When fusing the first network outputs, the label party concatenates the third network output with the first network outputs, that is, concatenates all of the network outputs, and feeds the concatenation into the output network, whose processing yields the second network output. Each network output can be regarded as a vector, and a common vector concatenation method may be used.
After step S20, the method further includes:
Step S40, computing, according to the second network output and the label data, a second gradient of the loss function with respect to target parameters of the second search network, and updating the target parameters according to the second gradient, where the target parameters are the search structure parameters and/or model parameters of the second search network.
After computing the second network output and obtaining the loss function from the second network output and the label data, the label party may also compute the second gradient of the loss function with respect to target parameters of the second search network, and update the target parameters according to the second gradient. The target parameters may be the search structure parameters and/or the model parameters of the second search network. Three cases arise. First: compute the gradient of the loss function with respect to the search structure parameters and update the search structure parameters. Second: compute the gradient of the loss function with respect to the model parameters and update the model parameters. Third: do both, updating the search structure parameters and the model parameters according to their respective gradients.
Further, based on the first and/or second embodiments above, a third embodiment of the vertical federated modeling optimization method of the present application is proposed. In this embodiment, the step of performing differential privacy encryption on each first gradient in step S30 to obtain the first encrypted gradients includes:
Step S301, clipping the first gradient to obtain a first clipped gradient, where the second-order (L2) norm of the first clipped gradient is less than or equal to a first preset threshold;
Further, in this embodiment, the differential privacy encryption may comprise two steps: clipping and adding Gaussian noise. Specifically, the label party clips the first gradient to obtain a first clipped gradient whose second-order (L2) norm is less than or equal to a first preset threshold, the first preset threshold being a threshold preset by the label party. Clipping the first gradient so that its second-order norm does not exceed the first preset threshold ensures that even when the first gradients computed across rounds of joint parameter updating vary widely, the variation of the first clipped gradients is confined to a bounded range, so that the data party cannot infer the label party's original data from them. The label party may adopt any clipping method that achieves this purpose.
Further, one clipping method is as follows: for each first gradient, the label party computes the ratio of the second-order norm of that first gradient to the first preset threshold, takes the larger of this ratio and 1, and divides the first gradient by that larger value to obtain the first clipped gradient. The second-order norm of a first clipped gradient computed in this way is less than or equal to the first preset threshold. Under this method, if i denotes the index of a data party and G_i the first gradient corresponding to data party i, the first clipped gradient is G_i' = G_i / max(1, ||G_i||_2 / D_A), where D_A is the first preset threshold set by the label party. It should be noted that, according to the privacy level the label party assigns to each data party, the label party may set different first preset thresholds for different data parties: a higher privacy level warrants a smaller first preset threshold, and a lower privacy level a larger one.
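The clipping rule G_i' = G_i / max(1, ||G_i||_2 / D_A) translates directly into code. A minimal sketch in Python (NumPy), treating the gradient as an array:

```python
import numpy as np

def clip_gradient(G, D_A):
    """Clip a first gradient G so that its second-order (L2) norm does not
    exceed the first preset threshold D_A:
        G' = G / max(1, ||G||_2 / D_A)
    Gradients already within the threshold pass through unchanged."""
    return G / max(1.0, np.linalg.norm(G) / D_A)
```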
Step S302, generating a noise array obeying a target Gaussian distribution, where the mean of the target Gaussian distribution is 0, its variance is a second preset threshold, and the elements of the noise array correspond one-to-one to the elements of the first gradient;
Step S303, adding Gaussian noise to the first gradient using the noise array, to obtain the first encrypted gradient.
The label party generates a noise array obeying the target Gaussian distribution, whose mean is 0 and whose variance is the second preset threshold, with the elements of the noise array corresponding one-to-one to the elements of the first gradient. The second preset threshold may be set according to specific needs; for example, it may be the square of the first preset threshold multiplied by the square of a coefficient. Since the first gradient is in matrix form, the generated noise array is also in matrix form, with the same matrix size as the first gradient.
The label party adds Gaussian noise to each first gradient using a noise array, obtaining the first encrypted gradients. Specifically, for each first gradient, the label party adds the noise array to the first gradient, i.e., each element of the first gradient is added to the element at the corresponding position of the noise array. Since the first encrypted gradient is the result of clipping followed by noise addition, the data party cannot recover the original first gradient from it, and therefore cannot derive the label party's original data, which improves the label party's data privacy.
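The noise-addition step can be sketched as follows; reading the second preset threshold as the variance of the target Gaussian distribution is our interpretation of the description above.

```python
import numpy as np

def add_gaussian_noise(G_clipped, sigma2, rng):
    """Add element-wise Gaussian noise N(0, sigma2) to a clipped first
    gradient. sigma2 is the second preset threshold, read here as the variance
    of the target Gaussian distribution; the noise array has the same shape as
    the gradient, one noise element per gradient element."""
    noise = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=G_clipped.shape)
    return G_clipped + noise
```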
Further, the method also includes:
Step S50, obtaining the privacy level and modeling progress of the current vertical federated modeling;
Step S60, setting the second preset threshold according to the privacy level and the modeling progress.
The label party may set the second preset threshold during the vertical federated modeling process, i.e., it may use a different second preset threshold in each round of joint parameter updating. Specifically, the label party obtains the privacy level of the current vertical federated modeling and the current modeling progress. Baseline thresholds corresponding to the different privacy levels, and threshold adjustments corresponding to the different stages of modeling progress, may be set in advance; the threshold adjustment may be negative or positive, and the modeling progress may be the convergence speed of the loss function, or the number or the duration of rounds of joint parameter updating.
After obtaining the privacy level of the current vertical federated modeling, the label party determines the corresponding baseline threshold according to the privacy level, determines the threshold adjustment according to the current modeling progress, and adds the adjustment to the baseline threshold to obtain the second preset threshold. The correspondence between privacy level and baseline threshold may be such that the higher the level, the larger the baseline threshold, and the lower the level, the smaller the baseline threshold, so that more noise is added at higher privacy levels and less at lower ones. The noise magnitude is thus set flexibly according to the privacy level, avoiding excessive noise that would distort the data and harm the model's prediction accuracy.
When the modeling progress is the convergence speed, the relationship between convergence speed and threshold adjustment may be that the faster the convergence, the larger the adjustment, and the slower the convergence, the smaller the adjustment, so that when convergence is slow and difficult, a smaller (possibly negative) adjustment reduces the second preset threshold and hence the noise, promoting convergence of the loss function and preserving the model's prediction accuracy. When the modeling progress is the number of rounds of joint parameter updating, the relationship between round count and threshold adjustment may be that the larger the round count, the smaller the adjustment, and the smaller the round count, the larger the adjustment, so that as the rounds accumulate and the loss function approaches convergence, the second preset threshold becomes smaller and smaller and the noise gradually decreases, promoting convergence of the loss function and preserving the model's prediction accuracy. When the modeling progress is the elapsed duration of joint parameter updating, the relationship between duration and threshold adjustment may likewise be that the longer the duration, the smaller the adjustment, and the shorter the duration, the larger the adjustment, so that as training runs longer and the loss function approaches convergence, the second preset threshold becomes smaller and smaller and the noise gradually decreases, promoting convergence of the loss function and preserving the model's prediction accuracy.
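One possible schedule implementing this baseline-plus-adjustment rule is sketched below, using the round count as the modeling progress; all concrete numbers (baselines, decay rate, floor) are hypothetical.

```python
def second_threshold(privacy_level, round_idx, decay=0.1, floor=0.1):
    """Illustrative schedule for the second preset threshold: a baseline chosen
    by privacy level (higher level -> larger baseline, i.e., more noise) plus a
    round-dependent adjustment that shrinks the threshold as rounds accumulate,
    so that noise decreases as the loss function approaches convergence."""
    baselines = {"low": 1.0, "medium": 2.0, "high": 4.0}
    adjustment = -decay * round_idx    # becomes smaller (more negative) each round
    return max(floor, baselines[privacy_level] + adjustment)
```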
Further, as shown in Fig. 3, an automatic vertical federated learning framework with differentially private communication is provided. A denotes the label party, B the data parties, i the index of a data party, and N the number of data parties. A holds feature data X_A and the corresponding label data Y_A, while B_1, ..., B_N hold feature data X_1, ..., X_N respectively. The feature data X_A, X_1, ..., X_N have differently distributed data features. Each participant has a search network, namely Net_A, Net_1, ..., Net_N, whose model parameters and search structure parameters are W_A, W_1, ..., W_N and α_A, α_1, ..., α_N respectively. A additionally deploys an output network Net_out for computing Y_out. In the lower-right corner of the figure, clip(x) denotes clipping x, and +N(0, σ²) denotes adding Gaussian noise to the clipped result.
Further, based on the first, second and/or third embodiments above, a fourth embodiment of the vertical federated modeling optimization method of the present application is proposed. In this embodiment, the method is applied to the data parties participating in vertical federated modeling, each data party deploying a first data set and a first search network constructed from its own data features, and the method includes the following steps:
Step A10, inputting the first data set into the first search network to obtain a raw network output;
In this embodiment, the participants in vertical federated learning fall into two categories: the label party, which holds label data, and the data parties, which hold feature data but no label data. In general, there is one label party and one or more data parties. Each data party deploys a data set and a search network constructed from its own data features; if the label party also holds feature data, it may additionally act as a data party, deploying a data set and a search network constructed from its data features and performing both the label party's and a data party's tasks. To avoid ambiguity in what follows, when the data party and the label party are described separately, the data party's data set and search network are called the first data set and the first search network, and the label party's data set and search network are called the second data set and the second search network. The sample dimensions of the participants' data sets are aligned, i.e., the sample IDs of the data sets are the same, while the data features of the participants may differ. The participants may construct such sample-aligned data sets in advance using encrypted sample alignment, which is not detailed here. A search network is a network used for neural architecture search (NAS); in this embodiment, each participant's search network may be designed in advance according to the DARTS (Differentiable Architecture Search) method.
The search network includes multiple units, each corresponding to one network layer, with connection operations set between some pairs of units. Taking two such units as an example, the connections between them may be N preset candidate connection operations, each with an associated weight; these weights are the search structure parameters of the search network, while the network-layer parameters inside the units are its model parameters. During model training, a network architecture search is performed to optimize and update both the search structure parameters and the model parameters; the final network structure, that is, which connection operation or operations to retain, is determined from the finally updated search structure parameters. Because the network structure is determined only after this search, the participants do not need to hand-design a model structure as in conventional vertical federated learning, which lowers the difficulty of model design.
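For illustration, the mixed connection operation described above (a weighted combination of N candidate operations between two network units, with the weights serving as search structure parameters) can be sketched in Python; the candidate operations and function names here are hypothetical placeholders, not part of the patent:

```python
import numpy as np

def softmax(x):
    """Normalize search structure parameters into operation weights."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical candidate connection operations between two network units.
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double":   lambda x: 2.0 * x,
    "zero":     lambda x: np.zeros_like(x),
}

def mixed_op(x, alphas):
    """DARTS-style mixed operation: the edge output is the weighted sum of
    all candidate operations, weighted by softmax(alphas)."""
    w = softmax(alphas)
    return sum(wi * op(x) for wi, op in zip(w, CANDIDATE_OPS.values()))
```

With equal search structure parameters, each candidate operation contributes equally; as training updates the parameters, the strongest operation dominates and is ultimately retained.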
The participants' search networks together constitute a task model: the network outputs of the individual search networks are fused to produce the task model's final output. Further, the task model may include an output network for fusing the outputs of the search networks; this output network is connected after the participants' search networks, takes their output data as input, and its result serves as the final output of the task model. The output network may be deployed at the label party and may be a fully connected layer or another, more complex neural network structure, depending on the model prediction task; the form of its output may likewise be set according to the task. For example, when the prediction task is image classification, the output network's result is the category to which the input image belongs.
The participants must jointly optimize and update the task model, that is, jointly update the model structure parameters and model parameters in their respective search networks, ultimately obtaining a task model that meets the required prediction accuracy. Specifically, during joint parameter updating, a data party needs the label party's label data to compute the loss function and gradients for its own search network, while the label party needs the data parties' data sets and search networks to compute the loss function and gradients. Because both the data sets of the participants and the label data of the label party may be private data (for example, in joint modeling between banks the data typically comes from customers handling banking business), directly exchanging the data sets, model structures, and model parameters would leak data privacy between the participants. Therefore, in this embodiment, the data parties and the label party exchange only intermediate results used to update the model parameters and search structure parameters of their respective search networks, and each side updates its own search network based on the intermediate results it receives, thereby completing the update of the task model. An intermediate result may be the gradient of a parameter or the output data of a search network. Specifically, when a participant is a data party, the intermediate result it sends to the label party may be the output data of its search network; when the participant is the label party, the intermediate result it sends to a data party may be the computed gradient corresponding to the output data that the data party sent.
The participants may carry out multiple rounds of joint parameter updating. In one round, a data party sends a network output to the label party once and the label party sends a gradient back once; in each round the participants may update only the model structure parameters of their search networks, only the model parameters, or both simultaneously, that is, the model structure parameters and/or model parameters. After multiple rounds, both the model structure parameters and the model parameters in every participant's search network have been updated many times. Which parameters each participant updates in each round may be agreed uniformly in advance.
Specifically, in one round of joint parameter updating, each data party inputs its first data set into its first search network and obtains an output result through the processing of that network; this output result is the original network output.
It should be noted that a participant may use different data sets in different rounds of joint parameter updating. Specifically, a participant may divide its full data set into multiple small training sets (also called data batches) and use one of them in each round; alternatively, before each round, it may sample a batch of data with replacement from the full data set to participate in that round's joint parameter update.
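The two batching strategies described above can be sketched as follows; the function names are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_into_batches(dataset, batch_size):
    """Strategy 1: partition the full data set into fixed-size batches,
    using one batch per round of joint parameter updating."""
    return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

def sample_with_replacement(dataset, batch_size):
    """Strategy 2: draw a fresh batch with replacement before each round."""
    idx = rng.integers(0, len(dataset), size=batch_size)
    return dataset[idx]
```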
Step A20: performing differential privacy encryption processing on the original network output to obtain a first network output.
Step A30: sending the first network output to the label party participating in the vertical federated modeling, so that after the label party fuses the first network outputs received from the data parties to obtain a second network output, it computes the first gradient of the loss function with respect to each first network output according to the second network output and its label data, and returns each first gradient to the corresponding data party.
The data party applies differential privacy encryption processing to the original network output to obtain the first network output, and then sends the first network output to the label party. The differential privacy encryption in this embodiment may use an existing differential privacy mechanism. In other words, the first network output the label party receives from each data party is the result of the data party's differential privacy processing, not the original output of the first search network, and the label party cannot recover the original output from the first network output. This prevents the label party from deriving the data party's feature data from the original network output, further prevents the data party's private data from leaking to the label party, and improves the data party's data security.
Further, in one implementation, the manner in which the data party applies differential privacy processing to the first network output may follow the manner in which the label party applies differential privacy processing to the first gradient in the third embodiment above. Under this method, if i denotes the index of a data party and O_i denotes that data party's first network output, then clipping the first network output yields O_i' = O_i / max(1, ||O_i||_2 / C_i), where C_i is the clipping threshold set by data party i.
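A direct transcription of the clipping formula above, assuming the network output is a numeric vector (illustrative sketch):

```python
import numpy as np

def clip_output(O_i, C_i):
    """Scale the raw network output so its L2 norm is at most C_i:
    O_i' = O_i / max(1, ||O_i||_2 / C_i)."""
    return O_i / max(1.0, np.linalg.norm(O_i) / C_i)
```

An output whose norm is already within the threshold passes through unchanged; a larger output is scaled down to norm C_i exactly.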
The label party receives the first network outputs sent by the data parties and fuses them to obtain the second network output. Specifically, the label party may average the first network outputs, or, when an output network is deployed at the label party, concatenate the first network outputs (for example, by vector concatenation) and feed the result into the output network, whose processing yields the second network output. The label party then computes a loss function from the second network output and its label data; this loss may be the mean squared error for a regression problem, the cross-entropy loss for a classification problem, and so on. It then computes the first gradient of the loss function with respect to each first network output. Computing gradients from the loss function may follow the chain rule and the gradient descent algorithm, which are not detailed here.
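As a simplified illustration of the label party's fusion and first-gradient computation, the following sketch uses averaging fusion and a mean-squared-error loss (one of the loss choices mentioned above); all names are illustrative, not from the patent:

```python
import numpy as np

def fuse_and_grad(first_outputs, labels):
    """Fuse the data parties' first network outputs by averaging, compute
    the MSE loss against the label data, and return the gradient of the
    loss with respect to each party's first network output."""
    second = np.mean(first_outputs, axis=0)   # second network output
    residual = second - labels
    loss = np.mean(residual ** 2)
    # Chain rule: dL/d(second) = (2/n) * residual; d(second)/d(O_i) = 1/k.
    n, k = residual.size, len(first_outputs)
    grad_each = (2.0 / n) * residual / k
    return loss, [grad_each for _ in first_outputs]
```

Each returned gradient would then be sent back (after any differential privacy processing) to the data party that produced the corresponding first network output.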
After computing the first gradient corresponding to each first network output, the label party sends each first gradient to the corresponding data party, that is, each first gradient is returned to the data party that sent the first network output from which it was computed.
Step A40: updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
After receiving the first gradient, the data party updates the search structure parameters and/or model parameters of its first search network accordingly. Specifically, following the chain rule and the gradient descent algorithm, the data party computes from the first gradient the gradient of the loss function with respect to the search structure parameters and/or model parameters of its search network, and updates them correspondingly. There are thus three cases: first, computing the gradient of the loss function with respect to the search structure parameters from the first gradient and updating the search structure parameters; second, computing the gradient of the loss function with respect to the model parameters from the first gradient and updating the model parameters; third, doing both. This completes one round of joint parameter updating.
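For a concrete, simplified illustration of this chain-rule step, assume the data party's search network is a single linear layer O = XW; the gradient of the loss with respect to W then follows directly from the first gradient dL/dO received from the label party (this one-layer form is an assumption for illustration only):

```python
import numpy as np

def local_update(X, W, grad_from_label_side, lr=0.1):
    """Chain-rule update for a one-layer sketch of the search network.
    Forward: O = X @ W.  Given dL/dO from the label party,
    dL/dW = X.T @ (dL/dO), and W is updated by gradient descent."""
    grad_W = X.T @ grad_from_label_side
    return W - lr * grad_W
```

In the actual method, the same backward pass is applied through all layers, and the search structure parameters are updated in the same fashion.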
After multiple rounds of joint parameter updating, each participant can derive its target model from its updated search network. The participants can then use their respective target models to jointly perform a specific model prediction task.
In this embodiment, the data party inputs the first data set into the first search network to obtain the original network output, applies differential privacy encryption processing to the original network output to obtain the first network output, and sends the first network output to the label party; after fusing the first network outputs received from the data parties into a second network output, the label party computes the first gradient of the loss function with respect to each first network output from the second network output and its label data, and returns each first gradient to the corresponding data party; the data party updates the search structure parameters and/or model parameters of the first search network according to the first gradient received from the label party. Because the gradient the label party sends to the data party has undergone differential privacy encryption processing, the data party cannot learn the original gradient, which prevents the data party from deriving the label party's label data and feature data from the gradient, further prevents the label party's private data from leaking to the data party, and improves the label party's data security. Moreover, compared with existing vertical federated learning, in which participants must spend considerable manpower and material resources designing model structures in advance, this embodiment only requires each data party to set up its own search network: the connections between the network units of the search network, that is, the model structure, are determined automatically during vertical federated modeling by optimizing and updating the search structure parameters. This realizes automated vertical federated learning, removes the need to spend large amounts of manpower and material resources pre-designing model structures, lowers the barrier to participating in vertical federated learning, and allows vertical federated learning to be applied to a wider range of specific task fields, broadening its scope of application. Furthermore, during modeling the data party sends the label party only the search network's output, and the label party sends the data party only the differential-privacy-processed gradient; the participants never directly exchange their data sets or the models themselves, which to a certain extent protects each participant's data security and model information security.
Further, the search structure parameters of the data party's first search network include the weights corresponding to the connection operations between network units of the first search network. After step A40, the method further includes:
Step A50: selecting retained operations from the connection operations according to the search structure parameters of the first search network after parameter updating;
Step A60: taking the model formed by the retained operations and the network units they connect as the target model.
The search structure parameters of the data party's search network may include the weights corresponding to the connection operations between network units; that is, connection operations are set between network units, and each connection operation has an associated weight. It should be noted that connection operations are not set between every pair of network units. After multiple rounds of joint parameter updating, the data party can select retained operations from the connection operations according to the updated search structure parameters of its search network. Specifically, for each pair of network units joined by connection operations, there may be several candidate connection operations between them, and the one or more with the largest weights can be selected as the retained operations. Once the retained operations are determined, the model formed by the retained operations and the network units they connect serves as the participant's target model.
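The retained-operation selection can be sketched as follows, assuming each edge between two connected network units carries a vector of weights, one per candidate connection operation (edge and operation names are illustrative):

```python
import numpy as np

def select_retained_ops(alphas_per_edge, op_names, keep=1):
    """For each edge (pair of connected network units), keep the `keep`
    candidate connection operations with the largest search structure
    weights; the result defines the final (target) model architecture."""
    retained = {}
    for edge, alphas in alphas_per_edge.items():
        order = np.argsort(alphas)[::-1][:keep]   # indices of largest weights
        retained[edge] = [op_names[i] for i in order]
    return retained
```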
Further, in one implementation, the participants may be devices deployed at banks or other financial institutions, storing user data recorded by each institution in the course of business processing. Because different institutions handle different specific businesses, the features of their user data may differ; each institution can construct a data set based on its own data features, and the institutions can use these data sets to jointly conduct vertical federated learning, improving the model's prediction performance by enriching its feature space. Specifically, the participants may jointly build a user risk prediction model for predicting a user's level of risk in business scenarios such as credit and insurance. Each participant's data features may be risk features selected from practical experience as relevant to user risk prediction, for example, a user's deposit amount, number of defaults, and so on.
The participants use their respective data sets to jointly perform vertical federated modeling in the manner of the above embodiments and obtain their respective target models.
After obtaining their target models, the participants can jointly perform risk prediction for users. A data party inputs the user data corresponding to the target user's second risk features at its end into its local target model and, through the model's processing, obtains a first model output, which it sends to the label party. The label party receives the first model outputs sent by the data parties.
The label party inputs the user data corresponding to the target user's first risk features at its end into its local target model and, through the model's processing, obtains a second model output. The label party then concatenates the first model outputs with the second model output and feeds the result into its local output network, which, after processing, outputs the target user's risk prediction result.
Further, when the target user's risk prediction task is initiated by a data party, the label party may send the target user's risk prediction result to that data party for subsequent business processing, for example, deciding whether to grant the target user a loan based on the risk prediction result.
In this embodiment, each participant only needs to set up its own search network rather than spending considerable manpower and material resources carefully designing a model structure, which lowers the barrier to participating in vertical federated learning and makes it easier for banks and other financial institutions to perform joint modeling through vertical federated learning and then complete risk prediction tasks with the resulting risk prediction model. Moreover, during vertical federated modeling and the subsequent use of the models for risk prediction, the participants need not directly exchange their data sets or the models themselves, protecting the privacy of each participant's user data. Furthermore, the data party applies differential privacy encryption processing to its search network's output before sending it to the label party, preventing the label party from deriving the data party's original user data from the network output and further improving the data party's data security. Likewise, the label party applies differential privacy encryption processing to the gradient corresponding to the network output before sending it to the data party, preventing the data party from deriving the label party's original user data from that gradient and further improving the label party's data security.
In addition, an embodiment of the present application further provides a vertical federated modeling optimization apparatus. The apparatus is deployed at the label party participating in vertical federated modeling; the label party is communicatively connected to the data parties participating in vertical federated modeling, and each data party is deployed with a first data set and a first search network constructed from its own data features. The apparatus includes:
a receiving module, configured to receive a first network output sent by a data party, where the first network output is obtained by the data party inputting the first data set into the first search network;
a computing module, configured to fuse the first network outputs to obtain a second network output, and to compute, according to the second network output and the local label data, the first gradient of the loss function with respect to each first network output;
a first differential privacy processing module, configured to apply differential privacy encryption processing to each first gradient to obtain a first encrypted gradient, and to send each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters of its first search network according to the first encrypted gradient.
Further, the receiving module is also configured to:
receive the first network output sent by the data party, where the first network output is obtained by the data party inputting the first data set into the first search network to produce an original network output and applying differential privacy encryption processing to the original network output.
Further, the label party is deployed with an output network and with a second data set and a second search network constructed from the label party's data features, and the computing module includes:
an input unit, configured to input the second data set into the second search network to obtain a third network output;
a concatenation unit, configured to concatenate the third network output with the first network outputs and input the result into the output network to obtain the second network output.
The apparatus further includes:
a first update module, configured to compute, according to the second network output and the label data, a second gradient of the loss function with respect to a target parameter of the second search network, and to update the target parameter according to the second gradient, where the target parameter is a search structure parameter and/or a model parameter of the second search network.
Further, the differential privacy encryption processing includes clipping processing and Gaussian-noise addition processing, and the first differential privacy processing module includes:
a clipping processing unit, configured to clip the first gradient to obtain a first clipped gradient, where the L2 norm of the first clipped gradient is less than or equal to a first preset threshold;
a generating unit, configured to generate a noise array obeying a target Gaussian distribution, where the mean of the target Gaussian distribution is 0 and its standard deviation is a second preset threshold, and the elements of the noise array correspond one-to-one to the elements of the first gradient;
a noise addition unit, configured to add Gaussian noise to the first gradient using the noise array to obtain the first encrypted gradient.
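Combining the clipping unit and the noise addition unit above, a minimal sketch of the differential privacy encryption of the first gradient, assuming the first preset threshold is the clipping bound and the second preset threshold is the noise standard deviation:

```python
import numpy as np

rng = np.random.default_rng(42)

def dp_encrypt_gradient(grad, clip_threshold, noise_std):
    """Differential-privacy-style processing of the first gradient:
    (1) clip so the L2 norm is at most `clip_threshold` (first preset
        threshold);
    (2) add element-wise Gaussian noise with mean 0 and standard
        deviation `noise_std` (second preset threshold)."""
    clipped = grad / max(1.0, np.linalg.norm(grad) / clip_threshold)
    noise = rng.normal(loc=0.0, scale=noise_std, size=grad.shape)
    return clipped + noise
```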
Further, the apparatus also includes:
an acquisition module, configured to acquire the privacy level and modeling progress of the current vertical federated modeling;
a setting module, configured to set the second preset threshold according to the privacy level and the modeling progress.
In addition, an embodiment of the present application further provides a vertical federated modeling optimization apparatus. The apparatus is deployed at a data party participating in vertical federated modeling, and each data party is deployed with a first data set and a first search network constructed from its own data features. The apparatus includes:
an input module, configured to input the first data set into the first search network to obtain an original network output;
a second differential privacy processing module, configured to apply differential privacy encryption processing to the original network output to obtain a first network output;
a sending module, configured to send the first network output to the label party participating in vertical federated modeling, so that after the label party fuses the first network outputs received from the data parties to obtain a second network output, it computes the first gradient of the loss function with respect to each first network output according to the second network output and its label data, and returns each first gradient to the corresponding data party;
a second update module, configured to update the search structure parameters and/or model parameters of the first search network according to the first gradient received from the label party.
Further, the search structure parameters of the data party's first search network include weights corresponding to the connection operations between network units of the first search network, and the apparatus further includes:
a selection module, configured to select retained operations from the connection operations according to the search structure parameters of the first search network after parameter updating;
a determination module, configured to take the model formed by the retained operations and the network units they connect as the target model.
The extended details of specific implementations of the vertical federated modeling optimization apparatus of the present application are substantially the same as those of the embodiments of the vertical federated modeling optimization method described above and are not repeated here.
In addition, an embodiment of the present application further provides a computer-readable storage medium on which a vertical federated modeling optimization program is stored; when the vertical federated modeling optimization program is executed by a processor, the steps of the vertical federated modeling optimization method described above are implemented.
本申请纵向联邦建模优化设备和计算机可读存储介质的各实施例,均可参照本申请纵向联邦建模优化方法各实施例,此处不再赘述。For the embodiments of the vertical federation modeling and optimization device and the computer-readable storage medium of the present application, reference may be made to the embodiments of the vertical federated modeling and optimization method of the present application, which will not be repeated here.
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments.
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course hardware can also be used, but in many cases the former is better implementation. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product in essence or the part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the embodiments of this application.
以上仅为本申请的优选实施例,并非因此限制本申请的专利范围,凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本申请的专利保护范围内。The above are only the preferred embodiments of the present application, and are not intended to limit the scope of the patent of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present application, or directly or indirectly applied in other related technical fields , are similarly included within the scope of patent protection of this application.
Claims (20)
- A vertical federated modeling optimization method, wherein the method is applied to a label party participating in vertical federated modeling, the label party is communicatively connected to each data party participating in the vertical federated modeling, and each data party deploys a first data set and a first search network constructed based on its own data features, the method comprising the following steps:
receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network;
fusing the first network outputs to obtain a second network output, and computing the first gradient of the loss function with respect to each first network output according to the second network output and the local label data;
performing differential privacy encryption on each first gradient to obtain each first encrypted gradient, and sending each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters in the first search network according to the first encrypted gradient.
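The label-party steps of claim 1 can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the summation fusion rule, the squared-error loss, and all function names are assumptions introduced for the example, and `dp_encrypt` stands in for the differential-privacy step detailed in claim 5.

```python
import numpy as np

def label_party_step(first_outputs, labels, dp_encrypt):
    """One label-party round: fuse outputs, compute per-party gradients, encrypt.

    first_outputs: list of arrays, one per data party (shape: [batch, dim]).
    labels: label array of shape [batch, dim].
    dp_encrypt: callable standing in for the differential-privacy step.
    """
    # Fuse the first network outputs into the second network output
    # (simple summation is assumed here purely for illustration).
    second_output = np.sum(first_outputs, axis=0)

    # Assumed squared-error loss L = 0.5 * ||second_output - labels||^2.
    # With summation fusion, dL/d(first_output_i) = second_output - labels
    # for every party i.
    residual = second_output - labels
    first_gradients = [residual.copy() for _ in first_outputs]

    # Differential-privacy encryption before returning the gradients.
    return [dp_encrypt(g) for g in first_gradients]
```

With an identity `dp_encrypt`, two parties each sending all-ones outputs against zero labels receive identical gradients equal to the fused residual.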
- The vertical federated modeling optimization method according to claim 1, wherein the step of receiving the first network output sent by the data party, the first network output being obtained by the data party inputting the first data set into the first search network, comprises:
receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network for processing to obtain a raw network output, and then performing differential privacy encryption on the raw network output.
- The vertical federated modeling optimization method according to claim 1, wherein the label party deploys an output network as well as a second data set and a second search network constructed based on the data features of the label party.
- The vertical federated modeling optimization method according to claim 3, wherein the step of fusing the first network outputs to obtain the second network output comprises:
inputting the second data set into the second search network to obtain a third network output;
concatenating the third network output with the first network outputs and inputting the result into the output network to obtain the second network output;
and wherein, after the step of computing the first gradient of the loss function with respect to each first network output according to the second network output and the local label data, the method further comprises:
computing a second gradient of the loss function with respect to a target parameter in the second search network according to the second network output and the label data, and updating the target parameter according to the second gradient, wherein the target parameter is a search structure parameter and/or a model parameter in the second search network.
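The concatenation-based fusion of claim 4 can be illustrated with a single linear layer playing the role of the output network. The weight matrix `W` and bias `b` are assumed stand-ins; the claim does not specify the output network's architecture.

```python
import numpy as np

def fuse_with_output_network(third_output, first_outputs, W, b):
    """Concatenate the label party's third network output with each data
    party's first network output along the feature axis, then pass the
    result through an assumed linear output network to obtain the second
    network output."""
    z = np.concatenate([third_output] + list(first_outputs), axis=1)
    return z @ W + b
```

The concatenated width is the sum of the parties' output widths, so `W` must have that many rows.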
- The vertical federated modeling optimization method according to any one of claims 1 to 4, wherein the differential privacy encryption comprises clipping and adding Gaussian noise, and the step of performing differential privacy encryption on each first gradient to obtain each first encrypted gradient comprises:
clipping the first gradient to obtain a first clipped gradient, wherein the L2 norm of the first clipped gradient is less than or equal to a first preset threshold;
generating a noise array obeying a target Gaussian distribution, wherein the mean of the target Gaussian distribution is 0, its standard deviation is a second preset threshold, and the elements of the noise array correspond one-to-one to the elements of the first gradient;
adding Gaussian noise to the first gradient using the noise array to obtain the first encrypted gradient.
- The vertical federated modeling optimization method according to claim 5, wherein before the step of generating the noise array obeying the target Gaussian distribution, the method further comprises:
acquiring the privacy level and modeling progress of the current vertical federated modeling;
setting the second preset threshold according to the privacy level and the modeling progress.
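Claim 6 only states that the noise scale depends on the privacy level and the modeling progress; the particular mapping below is an invented heuristic for illustration (more noise for stricter privacy, decaying as training progresses), not the claimed rule.

```python
def set_noise_std(privacy_level, progress, base=1.0):
    """Assumed heuristic: the second preset threshold (noise standard
    deviation) grows with the privacy level and decays with modeling
    progress. `privacy_level` is a positive number, `progress` >= 0."""
    return base * privacy_level / (1.0 + progress)
```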
- The vertical federated modeling optimization method according to claim 1, wherein if the label party also owns feature data, the label party also acts as a data party, deploys a data set and a search network constructed based on its data features, and performs both the tasks of the label party and the tasks of a data party.
- The vertical federated modeling optimization method according to claim 7, wherein the data set and search network of a data party are the first data set and the first search network, and the data set and search network of the label party are the second data set and the second search network.
- A vertical federated modeling optimization method, wherein the method is applied to the data parties participating in vertical federated modeling, each data party deploying a first data set and a first search network constructed based on its own data features, the method comprising the following steps:
inputting the first data set into the first search network to obtain a raw network output;
performing differential privacy encryption on the raw network output to obtain a first network output;
sending the first network output to the label party participating in the vertical federated modeling, so that the label party fuses the first network outputs received from the data parties to obtain a second network output, computes the first gradient of the loss function with respect to each first network output according to the second network output and the label party's label data, and returns each first gradient to the corresponding data party;
updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
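Claim 9's data-party loop can be sketched with a single linear layer standing in for the first search network. The transport stub `send_to_label_party`, the learning rate, and the linear model are all assumptions for illustration; only the model parameters are updated here, whereas the claim also allows updating search structure parameters.

```python
import numpy as np

def data_party_round(X, W, send_to_label_party, lr=0.1, noise_std=0.01, rng=None):
    """One data-party round: forward pass, differential-privacy noise on the
    raw output, exchange with the label party, local parameter update.

    send_to_label_party: assumed stub that transmits the first network output
    and returns the first gradient dL/d(first_output) computed remotely.
    """
    rng = np.random.default_rng() if rng is None else rng
    raw_output = X @ W  # first search network forward pass (linear stand-in)
    # Differential-privacy step on the raw output before it leaves the party.
    first_output = raw_output + rng.normal(0.0, noise_std, raw_output.shape)
    first_grad = send_to_label_party(first_output)
    # Chain rule through the linear layer: dL/dW = X^T @ dL/d(output).
    W -= lr * (X.T @ first_grad)
    return W
```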
- The vertical federated modeling optimization method according to claim 9, wherein the search structure parameters in the data party's first search network include the weights corresponding to the connection operations between network units in the first search network, and after the step of updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party, the method further comprises:
selecting retained operations from the connection operations according to the search structure parameters of the first search network after the parameters are updated;
taking the model formed by the retained operations and the network units connected by the retained operations as the target model.
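The selection in claim 10 mirrors the usual DARTS-style discretization: on each connection between network units, keep the candidate operation with the largest architecture weight. Representing the edges as a dictionary of weight arrays is an assumption made for this sketch.

```python
import numpy as np

def select_retained_ops(edge_weights):
    """For each connection (i, j) between network units, return the index of
    the candidate operation with the largest search-structure weight; these
    retained operations and the units they connect form the target model."""
    return {edge: int(np.argmax(w)) for edge, w in edge_weights.items()}
```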
- A vertical federated modeling optimization device, wherein the device comprises: a memory, a processor, and a vertical federated modeling optimization program stored in the memory and executable on the processor, the program, when executed by the processor, implementing the following steps:
receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network;
fusing the first network outputs to obtain a second network output, and computing the first gradient of the loss function with respect to each first network output according to the second network output and the local label data;
performing differential privacy encryption on each first gradient to obtain each first encrypted gradient, and sending each first encrypted gradient to the corresponding data party, so that the data party updates the search structure parameters and/or model parameters in the first search network according to the first encrypted gradient.
- The vertical federated modeling optimization device according to claim 11, wherein the step of receiving the first network output sent by the data party, the first network output being obtained by the data party inputting the first data set into the first search network, comprises:
receiving the first network output sent by the data party, wherein the first network output is obtained by the data party inputting the first data set into the first search network for processing to obtain a raw network output, and then performing differential privacy encryption on the raw network output.
- The vertical federated modeling optimization device according to claim 11, wherein the label party deploys an output network as well as a second data set and a second search network constructed based on the data features of the label party,
and the step of fusing the first network outputs to obtain the second network output comprises:
inputting the second data set into the second search network to obtain a third network output;
concatenating the third network output with the first network outputs and inputting the result into the output network to obtain the second network output;
and wherein, after the step of computing the first gradient of the loss function with respect to each first network output according to the second network output and the local label data, the following is further implemented:
computing a second gradient of the loss function with respect to a target parameter in the second search network according to the second network output and the label data, and updating the target parameter according to the second gradient, wherein the target parameter is a search structure parameter and/or a model parameter in the second search network.
- The vertical federated modeling optimization device according to any one of claims 11 to 13, wherein the differential privacy encryption comprises clipping and adding Gaussian noise, and the step of performing differential privacy encryption on each first gradient to obtain each first encrypted gradient comprises:
clipping the first gradient to obtain a first clipped gradient, wherein the L2 norm of the first clipped gradient is less than or equal to a first preset threshold;
generating a noise array obeying a target Gaussian distribution, wherein the mean of the target Gaussian distribution is 0, its standard deviation is a second preset threshold, and the elements of the noise array correspond one-to-one to the elements of the first gradient;
adding Gaussian noise to the first gradient using the noise array to obtain the first encrypted gradient.
- The vertical federated modeling optimization device according to claim 14, wherein before the step of generating the noise array obeying the target Gaussian distribution, the following is further implemented:
acquiring the privacy level and modeling progress of the current vertical federated modeling;
setting the second preset threshold according to the privacy level and the modeling progress.
- The vertical federated modeling optimization device according to claim 11, wherein if the label party also owns feature data, the label party also acts as a data party, deploys a data set and a search network constructed based on its data features, and performs both the tasks of the label party and the tasks of a data party.
- The vertical federated modeling optimization device according to claim 16, wherein the data set and search network of a data party are the first data set and the first search network, and the data set and search network of the label party are the second data set and the second search network.
- A vertical federated modeling optimization device, wherein the device comprises: a memory, a processor, and a vertical federated modeling optimization program stored in the memory and executable on the processor, the program, when executed by the processor, implementing the following steps:
inputting the first data set into the first search network to obtain a raw network output;
performing differential privacy encryption on the raw network output to obtain a first network output;
sending the first network output to the label party participating in the vertical federated modeling, so that the label party fuses the first network outputs received from the data parties to obtain a second network output, computes the first gradient of the loss function with respect to each first network output according to the second network output and the label party's label data, and returns each first gradient to the corresponding data party;
updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party.
- The vertical federated modeling optimization device according to claim 18, wherein the search structure parameters in the data party's first search network include the weights corresponding to the connection operations between network units in the first search network, and after the step of updating the search structure parameters and/or model parameters in the first search network according to the first gradient received from the label party, the following is further implemented:
selecting retained operations from the connection operations according to the search structure parameters of the first search network after the parameters are updated;
taking the model formed by the retained operations and the network units connected by the retained operations as the target model.
- A computer-readable storage medium, wherein a vertical federated modeling optimization program is stored on the computer-readable storage medium, and when the vertical federated modeling optimization program is executed by a processor, the steps of the vertical federated modeling optimization method according to any one of claims 1 to 10 are implemented.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010719397.5A CN111860864A (en) | 2020-07-23 | 2020-07-23 | Longitudinal federal modeling optimization method, device and readable storage medium |
CN202010719397.5 | 2020-07-23 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022016964A1 true WO2022016964A1 (en) | 2022-01-27 |
Family
ID=72949871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/093407 WO2022016964A1 (en) | 2020-07-23 | 2021-05-12 | Vertical federated modeling optimization method and device, and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111860864A (en) |
WO (1) | WO2022016964A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114595835A (en) * | 2022-05-07 | 2022-06-07 | 腾讯科技(深圳)有限公司 | Model training method and device based on federal learning, equipment and storage medium |
CN114611008A (en) * | 2022-05-09 | 2022-06-10 | 北京淇瑀信息科技有限公司 | User service strategy determination method and device based on federal learning and electronic equipment |
CN114662705A (en) * | 2022-03-18 | 2022-06-24 | 腾讯科技(深圳)有限公司 | Federal learning method, device, electronic equipment and computer readable storage medium |
CN114742239A (en) * | 2022-03-09 | 2022-07-12 | 大连理工大学 | Financial insurance claim risk model training method and device based on federal learning |
CN114841145A (en) * | 2022-05-10 | 2022-08-02 | 平安科技(深圳)有限公司 | Text abstract model training method and device, computer equipment and storage medium |
CN114841373A (en) * | 2022-05-24 | 2022-08-02 | 中国电信股份有限公司 | Parameter processing method, device, system and product applied to mixed federal scene |
CN114881247A (en) * | 2022-06-10 | 2022-08-09 | 杭州博盾习言科技有限公司 | Longitudinal federal feature derivation method, device and medium based on privacy computation |
CN115346668A (en) * | 2022-07-29 | 2022-11-15 | 京东城市(北京)数字科技有限公司 | Training method and device of health risk grade evaluation model |
CN117454185A (en) * | 2023-12-22 | 2024-01-26 | 深圳市移卡科技有限公司 | Federal model training method, federal model training device, federal model training computer device, and federal model training storage medium |
CN116304644B (en) * | 2023-05-18 | 2024-06-11 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and medium based on federal learning |
TWI852148B (en) | 2022-10-28 | 2024-08-11 | 財團法人工業技術研究院 | Data privacy protection method, server device and client device for federated learning |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860864A (en) * | 2020-07-23 | 2020-10-30 | 深圳前海微众银行股份有限公司 | Longitudinal federal modeling optimization method, device and readable storage medium |
CN112347476B (en) * | 2020-11-13 | 2024-02-02 | 脸萌有限公司 | Data protection method, device, medium and equipment |
CN112132270B (en) * | 2020-11-24 | 2021-03-23 | 支付宝(杭州)信息技术有限公司 | Neural network model training method, device and system based on privacy protection |
CN112700003A (en) * | 2020-12-25 | 2021-04-23 | 深圳前海微众银行股份有限公司 | Network structure search method, device, equipment, storage medium and program product |
CN112700013A (en) * | 2020-12-30 | 2021-04-23 | 深圳前海微众银行股份有限公司 | Parameter configuration method, device, equipment and storage medium based on federal learning |
CN112347500B (en) * | 2021-01-11 | 2021-04-09 | 腾讯科技(深圳)有限公司 | Machine learning method, device, system, equipment and storage medium of distributed system |
CN113011603A (en) * | 2021-03-17 | 2021-06-22 | 深圳前海微众银行股份有限公司 | Model parameter updating method, device, equipment, storage medium and program product |
CN113051239A (en) * | 2021-03-26 | 2021-06-29 | 北京沃东天骏信息技术有限公司 | Data sharing method, use method of model applying data sharing method and related equipment |
CN112799708B (en) * | 2021-04-07 | 2021-07-13 | 支付宝(杭州)信息技术有限公司 | Method and system for jointly updating business model |
CN115965093B (en) * | 2021-10-09 | 2024-10-11 | 抖音视界有限公司 | Model training method and device, storage medium and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110633805A (en) * | 2019-09-26 | 2019-12-31 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning system optimization method, device, equipment and readable storage medium |
CN110633806A (en) * | 2019-10-21 | 2019-12-31 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning system optimization method, device, equipment and readable storage medium |
CN111210003A (en) * | 2019-12-30 | 2020-05-29 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning system optimization method, device, equipment and readable storage medium |
CN111860864A (en) * | 2020-07-23 | 2020-10-30 | 深圳前海微众银行股份有限公司 | Longitudinal federal modeling optimization method, device and readable storage medium |
- 2020-07-23: Application CN202010719397.5A filed in China; published as CN111860864A (status: pending)
- 2021-05-12: International application PCT/CN2021/093407 filed; published as WO2022016964A1 (status: application filing)
Non-Patent Citations (1)
Title |
---|
CHAOYANG HE; MURALI ANNAVARAM; SALMAN AVESTIMEHR: "FedNAS: Federated Deep Learning via Neural Architecture Search", arXiv.org, Cornell University Library, 18 April 2020 (2020-04-18), Ithaca, NY 14853, XP081647715 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114742239A (en) * | 2022-03-09 | 2022-07-12 | 大连理工大学 | Financial insurance claim risk model training method and device based on federal learning |
CN114662705A (en) * | 2022-03-18 | 2022-06-24 | 腾讯科技(深圳)有限公司 | Federal learning method, device, electronic equipment and computer readable storage medium |
CN114595835B (en) * | 2022-05-07 | 2022-07-22 | 腾讯科技(深圳)有限公司 | Model training method and device based on federal learning, equipment and storage medium |
CN114595835A (en) * | 2022-05-07 | 2022-06-07 | 腾讯科技(深圳)有限公司 | Model training method and device based on federal learning, equipment and storage medium |
CN114611008A (en) * | 2022-05-09 | 2022-06-10 | 北京淇瑀信息科技有限公司 | User service strategy determination method and device based on federal learning and electronic equipment |
CN114611008B (en) * | 2022-05-09 | 2022-07-22 | 北京淇瑀信息科技有限公司 | User service strategy determination method and device based on federal learning and electronic equipment |
CN114841145B (en) * | 2022-05-10 | 2023-07-11 | 平安科技(深圳)有限公司 | Text abstract model training method, device, computer equipment and storage medium |
CN114841145A (en) * | 2022-05-10 | 2022-08-02 | 平安科技(深圳)有限公司 | Text abstract model training method and device, computer equipment and storage medium |
CN114841373A (en) * | 2022-05-24 | 2022-08-02 | 中国电信股份有限公司 | Parameter processing method, device, system and product applied to mixed federal scene |
CN114841373B (en) * | 2022-05-24 | 2024-05-10 | 中国电信股份有限公司 | Parameter processing method, device, system and product applied to mixed federal scene |
CN114881247A (en) * | 2022-06-10 | 2022-08-09 | 杭州博盾习言科技有限公司 | Longitudinal federal feature derivation method, device and medium based on privacy computation |
CN115346668A (en) * | 2022-07-29 | 2022-11-15 | 京东城市(北京)数字科技有限公司 | Training method and device of health risk grade evaluation model |
TWI852148B (en) | 2022-10-28 | 2024-08-11 | 財團法人工業技術研究院 | Data privacy protection method, server device and client device for federated learning |
CN116304644B (en) * | 2023-05-18 | 2024-06-11 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and medium based on federal learning |
CN117454185A (en) * | 2023-12-22 | 2024-01-26 | 深圳市移卡科技有限公司 | Federal model training method, federal model training device, federal model training computer device, and federal model training storage medium |
CN117454185B (en) * | 2023-12-22 | 2024-03-12 | 深圳市移卡科技有限公司 | Federal model training method, federal model training device, federal model training computer device, and federal model training storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111860864A (en) | 2020-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022016964A1 (en) | Vertical federated modeling optimization method and device, and readable storage medium | |
WO2022007321A1 (en) | Longitudinal federal modeling optimization method, apparatus and device, and readable storage medium | |
CN113609521B (en) | Federated learning privacy protection method and system based on countermeasure training | |
CN110245510B (en) | Method and apparatus for predicting information | |
CN110633806B (en) | Longitudinal federal learning system optimization method, device, equipment and readable storage medium | |
WO2021120676A1 (en) | Model training method for federated learning network, and related device | |
WO2021022707A1 (en) | Hybrid federated learning method and architecture | |
WO2021159798A1 (en) | Method for optimizing longitudinal federated learning system, device and readable storage medium | |
WO2021083276A1 (en) | Method, device, and apparatus for combining horizontal federation and vertical federation, and medium | |
WO2022193432A1 (en) | Model parameter updating method, apparatus and device, storage medium, and program product | |
WO2021174877A1 (en) | Processing method for smart decision-based target detection model, and related device | |
WO2022156594A1 (en) | Federated model training method and apparatus, electronic device, computer program product, and computer-readable storage medium | |
WO2020119540A1 (en) | Group profile picture generation method and device | |
US20240176906A1 (en) | Methods, apparatuses, and systems for collaboratively updating model by multiple parties for implementing privacy protection | |
CN113627085A (en) | Method, apparatus, medium, and program product for optimizing horizontal federated learning modeling | |
CN111767411B (en) | Knowledge graph representation learning optimization method, device and readable storage medium | |
WO2022048195A1 (en) | Longitudinal federation modeling method, apparatus, and device, and computer readable storage medium | |
US20170228349A1 (en) | Combined predictions methodology | |
JP2023521120A (en) | Method and Apparatus for Evaluating Collaborative Training Models | |
CN105337841A (en) | Information processing method and system, client, and server | |
EP4386636A1 (en) | User data processing system, method and apparatus | |
US20240275828A1 (en) | Account registration session management operations using concurrent preliminary risk scoring | |
CN111177653B (en) | Credit evaluation method and device | |
CN113626866A (en) | Localized differential privacy protection method and system for federal learning, computer equipment and storage medium | |
CN112016698A (en) | Factorization machine model construction method and device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21845772; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21845772; Country of ref document: EP; Kind code of ref document: A1 |