CN111340247A - Longitudinal federated learning system optimization method, device and readable storage medium - Google Patents
- Publication number: CN111340247A
- Application number: CN202010089045.6A
- Authority
- CN
- China
- Prior art keywords: intermediate result, encrypted, data, equipment, learning system
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention discloses a vertical federated learning system optimization method, a device, and a readable storage medium. The method includes: receiving an encrypted reduced intermediate result of a second device sent by the second device; performing data completion on the encrypted reduced intermediate result of the second device to obtain an encrypted completed intermediate result of the second device; calculating, from the encrypted completed intermediate result of the second device, the encrypted first gradient value corresponding to the model parameters of the first device; and updating the model parameters of the first device based on the encrypted first gradient value, iterating in a loop until a preset stop condition is detected, at which point the trained target model parameters of the first device are obtained. In vertical federated training, reducing the number of data items contained in each participating device's intermediate results reduces the amount of data that must be encrypted and transmitted, lowering encryption and communication costs and greatly shortening vertical federated modeling time.
Description
Technical Field
The present invention relates to the technical field of machine learning, and in particular to a vertical federated learning system optimization method, device, and readable storage medium.
Background
With the development of artificial intelligence, the concept of "federated learning" was proposed to solve the problem of data silos, allowing the parties to a federation to jointly train a model and obtain model parameters without handing over their own data, thereby avoiding leakage of private data.
Vertical federated learning applies when participants' data features overlap little but their users overlap substantially: the users common to the participants, together with their differing data features, are extracted to jointly train a machine learning model. For example, consider two participants A and B in the same region, where participant A is a bank and participant B is an e-commerce platform. A and B share many of the same users in that region, but because their businesses differ, the user data features they record differ; in particular, the features recorded by A and B may be complementary. In such a scenario, vertical federated learning can be used to help A and B build a joint machine learning prediction model and provide better services to their customers.
During vertical federated modeling, participants exchange, in encrypted form, the intermediate results used to compute gradients and the loss function. Every round of model training requires encrypting and exchanging each data item in these intermediate results, and the number of items equals the number of data records a participant holds. The volume of encrypted and exchanged data is therefore large, encryption and communication costs are high, and vertical federated modeling time increases accordingly.
Summary of the Invention
The main purpose of the present invention is to provide a vertical federated learning system optimization method, apparatus, device, and readable storage medium, aiming to reduce encryption and communication costs during vertical federated learning training and to shorten modeling time.
To achieve the above object, the present invention provides a vertical federated learning system optimization method applied to a first device participating in vertical federated learning, the first device being communicatively connected to a second device. The vertical federated learning system optimization method includes the following steps:
receiving the encrypted reduced intermediate result of the second device sent by the second device, where the second device is configured to sample the computed original intermediate results corresponding to each piece of its sample data to obtain a reduced intermediate result corresponding to part of its sample data, and to encrypt the reduced intermediate result of the second device to obtain the encrypted reduced intermediate result of the second device;
performing data completion on the encrypted reduced intermediate result of the second device to obtain an encrypted completed intermediate result of the second device, where the number of data items in the encrypted completed intermediate result is the same as the number of data items in the original intermediate result;
calculating, from the encrypted completed intermediate result of the second device, the encrypted first gradient value corresponding to the model parameters of the first device, updating the model parameters of the first device based on the encrypted first gradient value, and iterating in a loop until a preset stop condition is detected, thereby obtaining the trained target model parameters of the first device.
Optionally, the step of performing data completion on the encrypted reduced intermediate result of the second device to obtain the encrypted completed intermediate result includes:
obtaining a sampling lookup table of the second device, and determining, based on the sampling lookup table of the second device, the padding data and the padding positions corresponding to the padding data in the encrypted reduced intermediate result of the second device;
inserting the padding data into the encrypted reduced intermediate result of the second device at the padding positions to obtain the encrypted completed intermediate result of the second device.
Optionally, the step of calculating the encrypted first gradient value corresponding to the model parameters of the first device from the encrypted completed intermediate result of the second device includes:
computing an encrypted reduced intermediate result of the first device used for calculating the gradient value;
calculating an encrypted intermediate result of the first device from the encrypted completed intermediate result of the second device and the encrypted reduced intermediate result of the first device;
calculating, from the encrypted intermediate result of the first device, the encrypted first gradient value corresponding to the model parameters of the first device.
Optionally, the step of computing the encrypted reduced intermediate result of the first device used for calculating the gradient value includes:
sampling, by the first device, the computed original intermediate results corresponding to each piece of sample data of the first device to obtain a reduced intermediate result corresponding to part of the sample data of the first device;
encrypting the reduced intermediate result of the first device to obtain the encrypted reduced intermediate result of the first device.
Optionally, the step of calculating the encrypted intermediate result of the first device from the encrypted completed intermediate result of the second device and the encrypted reduced intermediate result of the first device includes:
performing data completion on the encrypted reduced intermediate result of the first device to obtain an encrypted completed intermediate result of the first device;
calculating the encrypted intermediate result of the first device from the encrypted completed intermediate result of the first device and the encrypted completed intermediate result of the second device.
To achieve the above object, the present invention further provides a vertical federated learning system optimization method applied to a second device participating in vertical federated learning, including the following steps:
sampling the computed original intermediate results corresponding to each piece of sample data of the second device to obtain a reduced intermediate result corresponding to part of the sample data of the second device;
encrypting the reduced intermediate result of the second device to obtain the encrypted reduced intermediate result of the second device and sending it to the first device, so that the first device feeds back an encrypted intermediate result of the first device based on the encrypted reduced intermediate result of the second device, where the first device is configured to perform data completion on the received encrypted reduced intermediate result of the second device to obtain the encrypted completed intermediate result of the second device and to calculate the encrypted intermediate result of the first device from it;
calculating, from the encrypted intermediate result of the first device, the encrypted second gradient value corresponding to the model parameters of the second device, updating the model parameters of the second device based on the encrypted second gradient value, and iterating in a loop until a preset stop condition is detected, thereby obtaining the trained target model parameters of the second device.
Optionally, the step of sampling the computed original intermediate results corresponding to each piece of sample data of the second device to obtain the reduced intermediate result corresponding to part of the sample data of the second device includes:
computing the original intermediate result of the second device by taking, with the model parameters of the second device, a weighted sum over each piece of sample data of the second device;
splitting the original intermediate result of the second device based on a threshold to obtain a first sub original intermediate result and a second sub original intermediate result, where each data item in the first sub original intermediate result is less than or equal to the threshold and each data item in the second sub original intermediate result is greater than the threshold;
grouping all the data items of the first sub original intermediate result, determining representative data for each group, and forming a third sub original intermediate result from the representative data of the groups;
obtaining the reduced intermediate result of the second device from the third sub original intermediate result and the second sub original intermediate result.
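As a concrete illustration, the optional sampling steps above can be sketched as follows. This is a minimal sketch with hypothetical helper names and toy data; the threshold, group size, and choice of representative value are design parameters not fixed by this description, and the table recorded here is what the receiving device later uses for data completion:

```python
# Sketch of the optional sampling (reduction) steps: split by threshold,
# group the small values and keep one representative per group, retain
# large values individually, and record which original positions each
# retained value stands for. The mean-as-representative choice is an
# illustrative assumption.

def reduce_intermediate_result(u, threshold, group_size):
    """Return (reduced values, lookup table mapping each reduced item to
    the original positions it represents)."""
    small = [(i, v) for i, v in enumerate(u) if v <= threshold]
    large = [(i, v) for i, v in enumerate(u) if v > threshold]

    reduced, lookup = [], []
    # First sub-result: values <= threshold, grouped; one representative each.
    for start in range(0, len(small), group_size):
        group = small[start:start + group_size]
        reduced.append(sum(v for _, v in group) / len(group))  # representative
        lookup.append([i for i, _ in group])
    # Second sub-result: values > threshold are retained individually.
    for i, v in large:
        reduced.append(v)
        lookup.append([i])
    return reduced, lookup

u = [0.1, 0.2, 5.0, 0.3, 0.4, 7.0]   # original intermediate result
reduced, table = reduce_intermediate_result(u, threshold=1.0, group_size=2)
# Only 4 values now need encryption and transmission instead of 6.
```

Here the communication saving grows with the fraction of values that fall below the threshold and with the group size.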
To achieve the above object, the present invention further provides a vertical federated learning system optimization device, including a memory, a processor, and a vertical federated learning system optimization program stored on the memory and executable on the processor, where the program, when executed by the processor, implements the steps of the vertical federated learning system optimization method described above.
In addition, to achieve the above object, the present invention further provides a readable storage medium on which a vertical federated learning system optimization program is stored, where the program, when executed by a processor, implements the steps of the vertical federated learning system optimization method described above.
In the present invention, the encrypted reduced intermediate result of the second device sent by the second device is received, where the second device samples the computed original intermediate results corresponding to each piece of its sample data to obtain a reduced intermediate result corresponding to part of its sample data, and encrypts it to obtain the encrypted reduced intermediate result of the second device. Data completion is then performed on the encrypted reduced intermediate result of the second device to obtain its encrypted completed intermediate result, where the number of data items in the encrypted completed intermediate result equals that of the original intermediate result. Next, the encrypted first gradient value corresponding to the model parameters of the first device is calculated from the encrypted completed intermediate result of the second device, the model parameters of the first device are updated based on the encrypted first gradient value, and the process iterates in a loop until a preset stop condition is detected, yielding the trained target model parameters of the first device. By reducing the number of data items contained in the intermediate results of the participating devices, the amount of data that must be encrypted and transmitted is reduced, encryption and communication costs are lowered, and vertical federated modeling time is greatly shortened.
Description of the Drawings
Figure 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the present invention;
Figure 2 is a schematic flowchart of the first embodiment of the vertical federated learning system optimization method of the present invention;
Figure 3 is a schematic diagram of sample data involved in an embodiment of the present invention.
The realization of the objects, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are intended only to explain the present invention, not to limit it.
As shown in Figure 1, Figure 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the present invention.
It should be noted that Figure 1 is the schematic structural diagram of the hardware operating environment of the vertical federated learning system optimization device. In embodiments of the present invention, the device may be a PC, or a terminal device with a display function such as a smartphone, smart TV, tablet computer, or portable computer.
As shown in Figure 1, the vertical federated learning system optimization device may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or stable non-volatile memory such as disk storage, and may optionally be a storage device independent of the processor 1001.
Those skilled in the art will understand that the system structure shown in Figure 1 does not limit the terminal system, which may include more or fewer components than shown, combine certain components, or arrange components differently.
As shown in Figure 1, the memory 1005, as a readable storage medium, may include an operating system, a network communication module, a user interface module, and a vertical federated learning system optimization program.
In the system shown in Figure 1, the network interface 1004 is mainly used to connect to a backend server and exchange data with it; the user interface 1003 is mainly used to connect to a client and exchange data with it; and the processor 1001 may be used to invoke the vertical federated learning system optimization program stored in the memory 1005.
In this embodiment, the terminal system includes a memory 1005, a processor 1001, and a vertical federated learning system optimization program stored on the memory 1005 and executable on the processor 1001. When the processor 1001 invokes the program stored in the memory 1005, it executes the steps of the vertical federated learning system optimization method provided by the embodiments of this application.
Based on the above structure, various embodiments of the vertical federated learning system optimization method are proposed.
Embodiments of the present invention provide embodiments of the vertical federated learning system optimization method. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from that given here. The first device and the second device involved in the embodiments may be participating devices in vertical federated learning, such as smartphones, personal computers, and servers.
Referring to Figure 2, Figure 2 is a schematic flowchart of the first embodiment of the vertical federated learning system optimization method of the present invention. In this embodiment, the vertical federated learning system optimization method includes:
Step S10: receiving the encrypted reduced intermediate result of the second device sent by the second device, where the second device is configured to sample the computed original intermediate results corresponding to each piece of its sample data to obtain a reduced intermediate result corresponding to part of its sample data, and to encrypt the reduced intermediate result of the second device to obtain the encrypted reduced intermediate result of the second device.
In this embodiment, the first device and the second device establish a communication connection in advance. The local data of the two devices overlap in the user dimension but differ (possibly entirely) in data features. The first device and the second device perform sample alignment on their respective local data to determine their shared users and their differing data features. The first device takes the data of the shared users in its local data as training data; the second device takes, from the data of the shared users in its local data, the data whose features differ from those of the first device as training data. That is, the users in the finally determined first sample data and second sample data are the same, while the data features differ. Sample alignment between the two devices may use existing sample alignment techniques, which are not described in detail here. For example, Figure 3 is a schematic diagram of sample data in the first device and the second device: the local data of the first device includes three users {U1, U2, U3} with data features {X1, X2, X3}, and the local data of the second device includes three users {U1, U2, U4} with data features {X4, X5}. After sample alignment, the training data determined by the first device is the data of users U1 and U2 under features X1, X2, and X3, and the training data determined by the second device is the data of users U1 and U2 under features X4 and X5.
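The alignment step in the Figure 3 example can be sketched as follows. This is a minimal illustration with hypothetical values; real systems compute the intersection with a privacy-preserving set intersection (PSI) protocol so that neither party learns the other's non-shared users:

```python
# Hypothetical sample data mirroring the Figure 3 example: the first device
# (e.g. the bank) records features X1..X3; the second device (e.g. the
# e-commerce platform) records features X4..X5. Feature values are made up.
first_device = {
    "U1": [0.5, 1.2, 3.0],   # X1, X2, X3
    "U2": [0.7, 0.9, 2.1],
    "U3": [0.2, 1.5, 0.8],
}
second_device = {
    "U1": [4.2, 0.3],        # X4, X5
    "U2": [3.9, 0.6],
    "U4": [1.1, 0.9],
}

# Shared users, in a deterministic order both parties can agree on.
shared_users = sorted(first_device.keys() & second_device.keys())
train_first = [first_device[u] for u in shared_users]    # U1, U2 under X1..X3
train_second = [second_device[u] for u in shared_users]  # U1, U2 under X4, X5
```

After this step the two training sets are row-aligned: row i on each side describes the same user under different feature columns.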
Specifically, during one round of model parameter updates in vertical federated linear model training, the first device and the second device exchange, in encrypted form, the intermediate results used to compute the gradient and the loss function. Encryption uses a homomorphic encryption algorithm: a third-party coordinator trusted by both the first and second devices generates a public key and a private key, and sends the public key to the two devices for encryption. The first and second devices send their encrypted gradient values and encrypted loss function values to the coordinator for decryption, and then update the local model of the first device and the local model of the second device according to the decrypted gradient values.
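The description above only requires an additively homomorphic scheme; which scheme is used is an implementation choice. As a self-contained illustration of the property relied upon, here is a toy textbook Paillier sketch (the parameters are far too small for real use; a production system would use a vetted cryptographic library with large keys):

```python
# Toy textbook Paillier: additively homomorphic, so ciphertexts can be
# combined (added, or scaled by plaintext constants) without the private
# key, which only the trusted coordinator holds. Tiny parameters, for
# exposition only.
import math
import random

def keygen(p=1009, q=1013):                    # small primes; NOT secure
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                       # valid because g = n + 1
    return (n,), (n, lam, mu)                  # (public key, private key)

def encrypt(pub, m):
    (n,) = pub
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                 # r must be coprime to n
        r = random.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(priv, c):
    n, lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def he_add(pub, c1, c2):                       # [[a]] * [[b]] -> [[a + b]]
    (n,) = pub
    return (c1 * c2) % (n * n)

def he_scale(pub, c, k):                       # [[a]] ** k -> [[k * a]]
    (n,) = pub
    return pow(c, k, n * n)

pub, priv = keygen()
assert decrypt(priv, he_add(pub, encrypt(pub, 3), encrypt(pub, 4))) == 7
assert decrypt(priv, he_scale(pub, encrypt(pub, 5), 6)) == 30
```

This additive property is what lets a party accumulate quantities such as sums of ciphertexts weighted by its own plaintext features without ever seeing the plaintext values.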
The linear models contemplated by this application include but are not limited to linear, weight-based model algorithms such as logistic regression, linear regression, and Poisson regression. For ease of description, the present invention takes the training of a vertical logistic regression model as an example: the first device and the second device participating in vertical federated learning jointly build a logistic regression model. The second device owns data x_i^A in D_A, where D_A denotes the data set of the second device; the first device owns data and labels (x_i^B, y_i) in D_B, where D_B denotes the data set of the first device. x_i^A and x_i^B are multi-dimensional vectors, while y_i is a scalar (for example, a scalar taking the value 0 or 1, indicating yes or no). Define u_i^A = w_A · x_i^A and u_i^B = w_B · x_i^B, where w_A and w_B are the machine learning model parameters corresponding to x_i^A and x_i^B respectively, and let u_i = u_i^A + u_i^B.
Using the second-order Taylor expansion of the logistic loss, the loss function loss (also called the cost function) is:
loss = Σ_i [ log 2 − (1/2) y_i u_i + (1/8) u_i² ]
The loss is computed at the first device. From the definition of the loss function, and since u_i² = (u_i^A)² + 2 u_i^A u_i^B + (u_i^B)², the second device must send the intermediate results u_i^A and (u_i^A)² to the first device so that the first device can compute the loss value. During federated training the intermediate results must be encrypted to avoid leaking private data, so the second device sends the encrypted intermediate results [[u_i^A]] and [[(u_i^A)²]] to the first device, where [[·]] denotes homomorphic encryption.
Further, define d_i = (1/4) u_i − (1/2) y_i.
After homomorphic encryption, [[d_i]] = (1/4) [[u_i^A]] + [[(1/4) u_i^B − (1/2) y_i]].
The gradient function G is G = Σ_i d_i x_i, so [[G]] = Σ_i [[d_i]] x_i. The first device computes [[d]] from the received encrypted intermediate result [[u_i^A]] of the second device together with its own u_i^B, then further computes the encrypted gradient value [[G_B]] of its local model from [[d]]. At the same time, the first device sends [[d]] to the second device so that the second device can compute the encrypted gradient value [[G_A]] of its local model.
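In plaintext, the quantities involved can be sketched as follows. This is a simplified single-party illustration with made-up data: u here plays the role of u_i = u_i^A + u_i^B, and the second-order Taylor approximation of the logistic loss is assumed, under which d_i = (1/4)u_i − (1/2)y_i:

```python
# Plaintext sketch of the loss and gradient quantities that the protocol
# computes under encryption. Data, labels, and weights are hypothetical.
import math

def forward(w, X):
    """u_i = w . x_i for each sample (a party's intermediate result)."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in X]

def taylor_loss(u, y):
    # loss = sum_i [ log 2 - (1/2) y_i u_i + (1/8) u_i^2 ]
    return sum(math.log(2) - 0.5 * yi * ui + 0.125 * ui * ui
               for ui, yi in zip(u, y))

def gradient(u, y, X):
    # d_i = (1/4) u_i - (1/2) y_i ;  G_j = sum_i d_i * x_ij
    d = [0.25 * ui - 0.5 * yi for ui, yi in zip(u, y)]
    return [sum(di * x[j] for di, x in zip(d, X))
            for j in range(len(X[0]))]

X = [[1.0, 2.0], [0.5, -1.0]]   # two samples, two features
y = [1, 0]
w = [0.1, 0.2]
u = forward(w, X)               # approximately [0.5, -0.15]
G = gradient(u, y, X)
```

In the federated setting, d_i is never available in plaintext to either party; it exists only as the ciphertext [[d_i]], and the sums above are carried out homomorphically.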
Specifically, the numbers of items in u_A and (u_A)² each equal the number of samples the second device holds, and the number of samples is usually very large. The second device must encrypt u_A and (u_A)² and send them to the first device for interaction, so the entire encryption process is very time-consuming and the communication volume is large. The second device therefore samples the original intermediate results u_A and (u_A)² to obtain reduced intermediate results u'_A and (u'_A)². Going from an original intermediate result to a reduced one decreases the amount of data along the data dimension; that is, the number of items in the reduced intermediate result is smaller than in the original intermediate result, which reduces the amount of data that must be encrypted and transmitted and lowers encryption and communication costs. The second device then encrypts the reduced intermediate result to obtain its encrypted reduced intermediate result and sends it to the first device. The original intermediate results u_A and (u_A)² are processed similarly; in this embodiment, u_A is used as an example.
Step S20: perform data padding on the encrypted reduced intermediate result of the second device to obtain the encrypted padded intermediate result of the second device, wherein the number of data items in the encrypted padded intermediate result is the same as the number of data items in the original intermediate result;
In this embodiment, the intermediate results of the first device and the second device must be aligned before the related calculations can be performed. Therefore, after the first device receives the encrypted reduced intermediate result of the second device, it must perform data padding to obtain the encrypted padded intermediate result, ensuring that the number of data items in the encrypted padded intermediate result equals that of the original intermediate result.
Specifically, step S20 includes:
Step S21: obtain the sampling comparison table of the second device, and based on it, determine the padding data in the encrypted reduced intermediate result of the second device as well as the padding positions corresponding to that padding data;
Step S22: insert the padding data at the padding positions in the encrypted reduced intermediate result of the second device to obtain the encrypted padded intermediate result of the second device.
In this embodiment, the sampling comparison table is generated when the second device samples the original intermediate result. It records the substitution relationship between each data item in the reduced intermediate result and the data items in the original intermediate result. For example, if data item a in the reduced intermediate result is the substitute for data items 1, 2 and 3 of the original intermediate result, then data items 1, 2 and 3 can all be restored from data item a. Because a homomorphic encryption algorithm is applied to the reduced intermediate result and encryption does not affect the order of the data, the encrypted reduced intermediate result can be padded according to the sampling comparison table, ensuring that the encrypted padded intermediate result of the second device is aligned with the corresponding data held by the first device.
Specifically, the sampling comparison table of the second device, which the second device sends to the first device, is obtained; according to this table, the padding data is selected from the encrypted reduced intermediate result of the second device, and it is determined which data items that padding data substitutes for. For example, if the padding data is data item a, which substitutes for data items 1, 2 and 3, then data items 1, 2 and 3 are not present in the encrypted reduced intermediate result; the sampling comparison table merely records that a substitution relationship exists between a and items 1, 2 and 3, so during padding, data item a must be filled into the positions of data items 1, 2 and 3. The padding positions corresponding to the padding data are then determined, and the padding data is inserted at those positions in the encrypted reduced intermediate result of the second device, yielding the encrypted padded intermediate result of the second device.
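As a concrete illustration, the padding step can be sketched as follows. This is a minimal sketch under assumed data structures (the patent does not fix a format for the sampling comparison table); here the table is taken to be a list that maps each original position to the index of the reduced value substituting for it, and plain string values stand in for ciphertexts, since the padding logic is independent of the encryption scheme:

```python
def pad_reduced_result(encrypted_reduced, comparison_table):
    """Expand an encrypted reduced intermediate result back to the
    original length.

    encrypted_reduced: one (stand-in) ciphertext per reduced entry.
    comparison_table : list of length n_original; entry i is the index
        into encrypted_reduced whose value substitutes for original
        position i.  Positions that were kept unsampled simply map to
        their own retained entry.
    """
    return [encrypted_reduced[j] for j in comparison_table]

# Original result had 6 entries; positions 0, 2 and 4 were replaced by
# one representative stored at reduced index 0, the rest were kept.
reduced = ["[[a]]", "[[v1]]", "[[v3]]", "[[v5]]"]
table = [0, 1, 0, 2, 0, 3]
padded = pad_reduced_result(reduced, table)
# padded == ["[[a]]", "[[v1]]", "[[a]]", "[[v3]]", "[[a]]", "[[v5]]"]
```

After padding, the result has the same number of entries as the original intermediate result, so it can be aligned entry-by-entry with the other party's data.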
Step S30: calculate the encrypted first gradient value corresponding to the model parameters of the first device using the encrypted padded intermediate result of the second device, update the model parameters of the first device based on the encrypted first gradient value, and iterate until a preset stop condition is detected, at which point the trained target model parameters of the first device are obtained.
In this embodiment, after the encrypted padded intermediate result of the second device has been obtained, it is combined with the encrypted padded intermediate result of the first device to calculate the encrypted first gradient value corresponding to the model parameters of the first device. The encrypted first gradient value is sent to the coordinator for decryption; the coordinator sends the decrypted first gradient value back to the first device, which uses it to update its local model parameters. At the same time, the encrypted loss function is calculated from the encrypted padded intermediate results of the first and second devices and sent to the coordinator, which decrypts it and checks whether the preset stop condition is satisfied; if not, the next round of iterative training continues.
Specifically, step S30 includes:
Step S31: calculate the encrypted reduced intermediate result of the first device, which is used for calculating the gradient value;
Specifically, step S31 includes:
Step a: the first device samples the calculated original intermediate results corresponding to its individual samples, obtaining a reduced intermediate result corresponding to part of its sample data;
Step b: encrypt the reduced intermediate result of the first device to obtain the encrypted reduced intermediate result of the first device.
In this embodiment, the original intermediate result of the first device must be encrypted before it can be combined with the encrypted padded intermediate result of the second device in calculation. To reduce encryption time and speed up model training, the original intermediate result of the first device is likewise sampled, which reduces the amount of data to encrypt and saves encryption cost and training time.
From the model parameters of the first device and the data it holds, the original intermediate result u_B = w_B · x_i^B corresponding to each sample of the first device is calculated, where w_B denotes the model parameters of the first device and x_i^B the data held by the first device. The original intermediate result of the first device is then sampled to obtain the reduced intermediate result of the first device.
To avoid a loss of accuracy in training the vertical logistic regression model, only u_i values with small absolute values are replaced; larger u_i values keep their original values. The sampling therefore proceeds as follows: the original intermediate result of the first device is split by a threshold into two subsets, where every data item in the first subset is less than or equal to the threshold and every data item in the second subset is greater than the threshold (the threshold is chosen according to the actual situation), and only the data in the first subset is sampled. The data in the first subset is divided into groups and a representative data item is determined for each group; the representative data items form a third subset, and the data of the third subset together with the second subset constitutes the reduced intermediate result of the first device. This reduced intermediate result is then encrypted with a homomorphic encryption algorithm, yielding the encrypted reduced intermediate result of the first device.
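A minimal sketch of this reduction step (threshold split, grouping of the small-magnitude values, and generation of the comparison table later used for padding); the grouping-by-sorted-means strategy and all names are illustrative assumptions, and encryption is omitted because it is applied only after the reduction:

```python
def reduce_intermediate(u, threshold, n_groups):
    """Split u by |value| <= threshold, replace each group of small
    values by its mean, keep large values as-is.  Returns the reduced
    list and a comparison table mapping each original position to the
    index of its substitute in the reduced list."""
    small = sorted((i for i in range(len(u)) if abs(u[i]) <= threshold),
                   key=lambda i: u[i])
    reduced, table = [], [None] * len(u)
    # Large values are retained unchanged, one reduced entry each.
    for i in range(len(u)):
        if abs(u[i]) > threshold:
            table[i] = len(reduced)
            reduced.append(u[i])
    # Small values: partition into n_groups, represent each by its mean.
    group_size = max(1, -(-len(small) // n_groups))  # ceiling division
    for g in range(0, len(small), group_size):
        members = small[g:g + group_size]
        rep = sum(u[i] for i in members) / len(members)
        for i in members:
            table[i] = len(reduced)
        reduced.append(rep)
    return reduced, table

u = [0.05, 2.0, -0.04, 0.06, -1.5, -0.05]
reduced, table = reduce_intermediate(u, threshold=0.1, n_groups=2)
# Only 2.0 and -1.5 survive individually; the four small values
# collapse into two group means, so len(reduced) == 4 < len(u) == 6.
```

Only the 4 entries of `reduced` would then be encrypted and transmitted, while `table` lets the receiver expand them back to 6 aligned positions.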
Step S32: calculate the encrypted intermediate result of the first device using the encrypted padded intermediate result of the second device and the encrypted reduced intermediate result of the first device;
Specifically, step S32 includes:
Step c: perform data padding on the encrypted reduced intermediate result of the first device to obtain the encrypted padded intermediate result of the first device;
Step d: calculate the encrypted intermediate result of the first device using the encrypted padded intermediate result of the first device and the encrypted padded intermediate result of the second device.
In this embodiment, data padding is applied to the encrypted reduced intermediate result of the first device to obtain its encrypted padded intermediate result. The specific procedure is: obtain the sampling comparison table of the first device; based on it, determine the padding data in the encrypted reduced intermediate result of the first device and the padding positions corresponding to that padding data; and insert the padding data at those positions in the encrypted reduced intermediate result, yielding the encrypted padded intermediate result of the first device, whose number of data items equals that of the original intermediate result of the first device.
Further, the encrypted padded intermediate results of the first and second devices are aligned with each other, and from these two results the encrypted intermediate result [[d]] of the first device is calculated.
Step S33: calculate the encrypted first gradient value corresponding to the model parameters of the first device using the encrypted intermediate result of the first device.
In this embodiment, given the encrypted intermediate result [[d]] of the first device, the encrypted first gradient value corresponding to the model parameters of the first device is [[G_B]] = Σ [[d_i]] x_i^B; that is, the encrypted first gradient value is calculated from the encrypted intermediate result of the first device and the data held by the first device.
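In plain (unencrypted) form, this gradient step amounts to the arithmetic below. The sketch drops homomorphic encryption and works on plain numbers, since an additively homomorphic scheme such as Paillier supports exactly these additions and ciphertext-scalar multiplications; function and variable names are illustrative:

```python
def first_gradient(d, x_B):
    """G_B = sum_i d_i * x_i^B, computed per feature of the first
    device's data.  In the real protocol each d_i is a ciphertext and
    the products/sums below are ciphertext-scalar operations."""
    n_features = len(x_B[0])
    return [sum(d[i] * x_B[i][k] for i in range(len(d)))
            for k in range(n_features)]

d = [1.0, -0.5]                    # intermediate result d_i per sample
x_B = [[2.0, 4.0], [6.0, 8.0]]     # first device's features per sample
G_B = first_gradient(d, x_B)
# G_B[0] = 1.0*2.0 + (-0.5)*6.0 = -1.0
# G_B[1] = 1.0*4.0 + (-0.5)*8.0 =  0.0
```

The same formula with the second device's features x_i^A yields [[G_A]] on the second device's side.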
The vertical federated learning system optimization method proposed in this embodiment receives the encrypted reduced intermediate result sent by the second device, performs data padding on it to obtain the encrypted padded intermediate result of the second device, calculates from that result the encrypted first gradient value corresponding to the model parameters of the first device, updates the model parameters of the first device based on the encrypted first gradient value, and iterates until a preset stop condition is detected, at which point the trained target model parameters of the first device are obtained. In vertical federated training, reducing the number of data items contained in the participating devices' intermediate results reduces the amount of data that must be encrypted and transmitted, lowers encryption and communication costs, and greatly shortens vertical federated modeling time.
Further, based on the first embodiment, a second embodiment of the vertical federated learning system optimization method of the present invention provides a method applied to a second device, which may be a smartphone, a personal computer or a similar device. The method includes:
Step A10: sample the calculated original intermediate results corresponding to the individual samples of the second device to obtain a reduced intermediate result corresponding to part of the second device's sample data;
In this embodiment, during one round of model parameter updating in vertical federated learning, the first device and the second device exchange, in encrypted form, the intermediate results used to calculate the gradients and the loss function. Encryption uses a homomorphic encryption algorithm: a third-party coordinator trusted by both devices generates a public key and a private key and sends the public key to the first and second devices for encryption. The two devices send the encrypted gradient values and the encrypted loss function to the coordinator for decryption, and then update their local models according to the decrypted gradient values.
The linear models involved in this application include, but are not limited to, logistic regression, linear regression, Poisson regression and other linear, weight-based model algorithms. For convenience of description, the present invention takes vertical logistic regression model training as an example: the first device and the second device participating in vertical federated learning jointly build a logistic regression model. The second device holds the data x_i^A ∈ D_A, where D_A denotes the data set of the second device; the first device holds the data and labels x_i^B, y_i ∈ D_B, where D_B denotes the data set of the first device. Both x_i^A and x_i^B are multidimensional vectors, while y_i is a scalar (for example, a scalar taking the value 0 or 1, indicating yes or no). Define u_A = w_A · x_i^A and u_B = w_B · x_i^B, where w_A and w_B are the machine learning model parameters corresponding to x_i^A and x_i^B respectively. Then:
The loss function loss (also called the cost function) is, under the second-order Taylor approximation: loss = Σ_i [ log 2 − (1/2) y_i (u_A + u_B) + (1/8) (u_A + u_B)² ].
The loss is calculated at the first device. From the definition of the loss function it follows that the second device must send the intermediate results u_A and u_A² to the first device, so that the first device can calculate the loss value. During federated training the intermediate results must be encrypted to prevent leakage of private data, so the second device sends the encrypted intermediate results [[u_A]] and [[u_A²]] to the first device, where [[·]] denotes homomorphic encryption.
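Because (u_A + u_B)² expands to u_A² + 2·u_A·u_B + u_B², the first device needs only [[u_A]] and [[u_A²]] from the second device, plus its own plaintext u_B and labels, and all operations are additions and scalar multiplications that an additively homomorphic scheme supports. The sketch below shows the corresponding plaintext arithmetic; the function name and data layout are from this illustration, not from the patent:

```python
import math

def taylor_loss(u_A, u_A_sq, u_B, y):
    """Summed per-sample Taylor-approximated logistic loss.
    u_A and u_A_sq come (encrypted, in the real protocol) from the
    second device; u_B and labels y are local to the first device.
    (u_A + u_B)^2 is expanded so that only u_A and u_A^2 are needed
    from the other party."""
    total = 0.0
    for a, a2, b, yi in zip(u_A, u_A_sq, u_B, y):
        total += (math.log(2)
                  - 0.5 * yi * (a + b)
                  + 0.125 * (a2 + 2 * a * b + b * b))
    return total

# One sample with y = 0 and all predictors zero leaves only log 2.
L = taylor_loss([0.0], [0.0], [0.0], [0])
```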
Define d_i = (1/4)(u_A + u_B) − (1/2) y_i.
After homomorphic encryption, the gradient function G = dx = Σ d_i x_i becomes [[G]] = [[dx]] = Σ [[d_i]] x_i. The first device computes [[d]] from the encrypted intermediate result [[u_A]] received from the second device together with its own u_B, and from [[d]] further computes the encrypted gradient value [[G_B]] of its local model; at the same time, the first device sends [[d]] to the second device so that the second device can compute the encrypted gradient value [[G_A]] of its own local model.
Specifically, the number of values in u_A and in u_A² is in each case equal to the number of samples. The number of samples is usually very large, and the second device must encrypt u_A and u_A² before sending them to the first device, so the encryption step is very time-consuming and the communication volume is large. The second device therefore samples the original intermediate results u_A and u_A² to obtain the reduced intermediate results u'_A and u'_A². Going from the original intermediate result to the reduced intermediate result decreases the amount of data along the data dimension: the number of data items in the reduced intermediate result is smaller than in the original intermediate result, which reduces the amount of data that must be encrypted and transmitted and thus lowers encryption and communication costs. The second device then encrypts the reduced intermediate result to obtain its encrypted reduced intermediate result and sends it to the first device. Since u_A and u_A² are processed in the same way, this embodiment uses u_A as the example.
Specifically, step A10 includes:
Step A12: split the original intermediate result of the second device based on a threshold to obtain a first sub-original intermediate result and a second sub-original intermediate result, wherein each data item in the first sub-original intermediate result is less than or equal to the threshold and each data item in the second sub-original intermediate result is greater than the threshold;
Step A13: group all the data of the first sub-original intermediate result and determine a representative data item for each group; the representative data items of the groups form a third sub-original intermediate result;
Step A14: obtain the reduced intermediate result of the second device from the third sub-original intermediate result and the second sub-original intermediate result.
In this embodiment, to avoid a loss of accuracy in training the vertical logistic regression model, only u_i values with small absolute values are replaced; larger u_i values keep their original values. The sampling therefore proceeds as follows: the original intermediate result of the second device is split by a threshold into two subsets, the first sub-original intermediate result and the second sub-original intermediate result, where each data item in the first is less than or equal to the threshold and each data item in the second is greater than the threshold (the threshold is chosen according to the actual situation), and only the data in the first sub-original intermediate result is sampled. That data is grouped and a representative data item is determined for each group; the representative data items form the third sub-original intermediate result, and the data of the third and second sub-original intermediate results together constitutes the reduced intermediate result of the second device. The concrete method for grouping the data of the first sub-original intermediate result and determining the representatives can be chosen according to the actual situation: for example, sort the data in descending order, divide it evenly into N groups, and take each group's mean as its representative data; alternatively, set N initial cluster centers manually, run k-means to obtain the final cluster centers, and use those centers as the representative data of each group.
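The k-means variant of representative selection can be sketched as a plain one-dimensional Lloyd iteration; the manually chosen initial centers, the fixed iteration count and all names are illustrative assumptions:

```python
def kmeans_1d(values, centers, n_iter=20):
    """Plain 1-D k-means: assign each value to its nearest center,
    recompute each center as the mean of its members, repeat.  The
    final centers serve as the representative data of each group."""
    centers = list(centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep a center unchanged if its group happens to be empty.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

vals = [0.01, 0.02, 0.03, 0.20, 0.21, 0.22]
reps = kmeans_1d(vals, centers=[0.0, 0.3])
# converges to the two group means, approximately [0.02, 0.21]
```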
Step A20: encrypt the reduced intermediate result of the second device to obtain the encrypted reduced intermediate result corresponding to part of the second device's sample data and send it to the first device, so that the first device, based on the encrypted reduced intermediate result of the second device, feeds back the encrypted intermediate result of the first device; here the first device performs data padding on the received encrypted reduced intermediate result of the second device to obtain the encrypted padded intermediate result of the second device, and uses that result to calculate the encrypted intermediate result of the first device;
In this embodiment, the reduced intermediate result of the second device is further encrypted with a homomorphic encryption algorithm to obtain the encrypted reduced intermediate result of the second device, which is sent to the first device. Upon receiving it, the first device performs data padding to obtain the encrypted padded intermediate result of the second device, then calculates the encrypted intermediate result [[d]] of the first device from the encrypted padded intermediate results of the two devices, which are aligned with each other, and finally sends [[d]] to the second device.
Step A30: calculate the encrypted second gradient value corresponding to the model parameters of the second device using the encrypted intermediate result of the first device, update the model parameters of the second device based on the encrypted second gradient value, and iterate until a preset stop condition is detected, at which point the trained target model parameters of the second device are obtained.
In this embodiment, given the encrypted intermediate result [[d]] of the first device, the encrypted second gradient value corresponding to the model parameters of the second device is [[G_A]] = Σ [[d_i]] x_i^A; that is, it is calculated from the encrypted intermediate result of the first device and the data held by the second device. The encrypted second gradient value is sent to the coordinator for decryption; the coordinator sends the decrypted second gradient value back to the second device, which uses it to update its local model parameters. At the same time, the coordinator checks whether the preset stop condition is satisfied; if not, the next round of iterative training continues.
The vertical federated learning system optimization method proposed in this embodiment samples the calculated original intermediate results corresponding to the individual samples of the second device to obtain a reduced intermediate result corresponding to part of the second device's sample data, encrypts that reduced intermediate result and sends it to the first device so that the first device, based on the encrypted reduced intermediate result of the second device, feeds back the encrypted intermediate result of the first device; the encrypted second gradient value corresponding to the model parameters of the second device is then calculated from the encrypted intermediate result of the first device, the model parameters of the second device are updated based on it, and iteration continues until a preset stop condition is detected, at which point the trained target model parameters of the second device are obtained. In vertical federated training, reducing the number of data items contained in the participating devices' intermediate results reduces the amount of data that must be encrypted and transmitted, lowers encryption and communication costs, and greatly shortens vertical federated modeling time.
In addition, an embodiment of the present invention further provides a readable storage medium on which a vertical federated learning system optimization program is stored; when executed by a processor, the program implements the steps of the vertical federated learning system optimization method described above.
For the embodiments of the vertical federated learning system optimization device and the readable storage medium of the present invention, reference may be made to the embodiments of the vertical federated learning system optimization method of the present invention, which are not repeated here.
It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes it.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the superiority or inferiority of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored on a storage medium (such as ROM/RAM, magnetic disk or optical disc) and including several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the various embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the patent scope of the present invention. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present invention.
Claims (10)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010089045.6A CN111340247B (en) | 2020-02-12 | 2020-02-12 | Longitudinal federated learning system optimization method, device and readable storage medium |
PCT/CN2020/129255 WO2021159798A1 (en) | 2020-02-12 | 2020-11-17 | Method for optimizing longitudinal federated learning system, device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010089045.6A CN111340247B (en) | 2020-02-12 | 2020-02-12 | Longitudinal federated learning system optimization method, device and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111340247A true CN111340247A (en) | 2020-06-26 |
CN111340247B CN111340247B (en) | 2024-10-15 |
Family
ID=71183882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010089045.6A Active CN111340247B (en) | 2020-02-12 | 2020-02-12 | Longitudinal federated learning system optimization method, device and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111340247B (en) |
WO (1) | WO2021159798A1 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700010A (en) * | 2020-12-30 | 2021-04-23 | 深圳前海微众银行股份有限公司 | Feature completion method, device, equipment and storage medium based on federated learning |
CN112949866A (en) * | 2021-03-25 | 2021-06-11 | 深圳前海微众银行股份有限公司 | Poisson regression model training method and device, electronic equipment and storage medium |
CN113011603A (en) * | 2021-03-17 | 2021-06-22 | 深圳前海微众银行股份有限公司 | Model parameter updating method, device, equipment, storage medium and program product |
CN113240127A (en) * | 2021-04-07 | 2021-08-10 | 睿蜂群(北京)科技有限公司 | Federal learning-based training method and device, electronic equipment and storage medium |
WO2021159798A1 (en) * | 2020-02-12 | 2021-08-19 | 深圳前海微众银行股份有限公司 | Method for optimizing longitudinal federated learning system, device and readable storage medium |
CN113704776A (en) * | 2021-07-15 | 2021-11-26 | 杭州医康慧联科技股份有限公司 | Machine learning method based on federated learning |
CN113762531A (en) * | 2021-03-19 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | A federated learning system |
CN113806759A (en) * | 2020-12-28 | 2021-12-17 | 京东科技控股股份有限公司 | Federated learning model training method and device, electronic equipment and storage medium |
CN114077901A (en) * | 2021-11-23 | 2022-02-22 | 山东大学 | A User Location Prediction Framework Based on Clustering Graph Federated Learning |
CN114140144A (en) * | 2020-09-04 | 2022-03-04 | 京东科技控股股份有限公司 | Information delivery method and device, and information delivery decision model acquisition method and device |
CN114330759A (en) * | 2022-03-08 | 2022-04-12 | 富算科技(上海)有限公司 | Training method and system for longitudinal federated learning model |
CN114692859A (en) * | 2020-12-29 | 2022-07-01 | 阿里巴巴集团控股有限公司 | Data processing method and device, computing equipment and test reduction equipment |
WO2022142366A1 (en) * | 2020-12-31 | 2022-07-07 | 华为技术有限公司 | Method and apparatus for updating machine learning model |
CN115358826A (en) * | 2022-08-29 | 2022-11-18 | 中国银行股份有限公司 | Product recommendation method and device, storage medium and electronic equipment |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118644423B (en) * | 2024-04-30 | 2025-03-21 | 中国科学院自动化研究所 | Data recovery method and device based on federated learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101375284A (en) * | 2004-10-25 | 2009-02-25 | Rick L. Orsini | Secure data analysis method and system |
US20120198241A1 (en) * | 2011-01-27 | 2012-08-02 | Security First Corp. | Systems and methods for securing data |
US20160344428A1 (en) * | 2014-01-30 | 2016-11-24 | Hewlett Packard Enterprise Development Lp | Joint encryption and error correction encoding |
CN110288094A (en) * | 2019-06-10 | 2019-09-27 | 深圳前海微众银行股份有限公司 | Model parameter training method and device based on federated learning |
CN110633806A (en) * | 2019-10-21 | 2019-12-31 | 深圳前海微众银行股份有限公司 | Vertical federated learning system optimization method, device, equipment and readable storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492420B (en) * | 2018-12-28 | 2021-07-20 | 深圳前海微众银行股份有限公司 | Model parameter training method, terminal, system and medium based on federated learning |
CN110245510B (en) * | 2019-06-19 | 2021-12-07 | 北京百度网讯科技有限公司 | Method and apparatus for predicting information |
CN110428058B (en) * | 2019-08-08 | 2024-04-26 | 深圳前海微众银行股份有限公司 | Federal learning model training method, device, terminal equipment and storage medium |
KR102781104B1 (en) * | 2019-08-15 | 2025-03-18 | 엘지전자 주식회사 | Method and apparatus for recognizing a business card using federated learning |
CN111340247B (en) * | 2020-02-12 | 2024-10-15 | 深圳前海微众银行股份有限公司 | Longitudinal federated learning system optimization method, device and readable storage medium |
2020
- 2020-02-12 CN CN202010089045.6A patent/CN111340247B/en active Active
- 2020-11-17 WO PCT/CN2020/129255 patent/WO2021159798A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101375284A (en) * | 2004-10-25 | 2009-02-25 | Rick L. Orsini | Secure data analysis method and system |
US20120198241A1 (en) * | 2011-01-27 | 2012-08-02 | Security First Corp. | Systems and methods for securing data |
US20160344428A1 (en) * | 2014-01-30 | 2016-11-24 | Hewlett Packard Enterprise Development Lp | Joint encryption and error correction encoding |
CN110288094A (en) * | 2019-06-10 | 2019-09-27 | 深圳前海微众银行股份有限公司 | Model parameter training method and device based on federated learning |
CN110633806A (en) * | 2019-10-21 | 2019-12-31 | 深圳前海微众银行股份有限公司 | Vertical federated learning system optimization method, device, equipment and readable storage medium |
Non-Patent Citations (2)
Title |
---|
Farzin Haddadpour et al., "On the Convergence of Local Descent Methods in Federated Learning", arXiv, 31 October 2019 (2019-10-31), pages 1-49 *
Wei Yating (魏雅婷) et al., "联邦可视化:一种隐私保护的可视化新模型" [Federated Visualization: A New Privacy-Preserving Visualization Model], Chinese Journal of Intelligent Science and Technology (《智能科学与技术学报》), vol. 1, no. 4, 31 December 2019 (2019-12-31), pages 415-420 *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021159798A1 (en) * | 2020-02-12 | 2021-08-19 | 深圳前海微众银行股份有限公司 | Method for optimizing longitudinal federated learning system, device and readable storage medium |
CN114140144B (en) * | 2020-09-04 | 2025-03-21 | 京东科技控股股份有限公司 | Information delivery method and device, information delivery decision model acquisition method and device |
CN114140144A (en) * | 2020-09-04 | 2022-03-04 | 京东科技控股股份有限公司 | Information delivery method and device, and information delivery decision model acquisition method and device |
CN113806759A (en) * | 2020-12-28 | 2021-12-17 | 京东科技控股股份有限公司 | Federated learning model training method and device, electronic equipment and storage medium |
CN114692859A (en) * | 2020-12-29 | 2022-07-01 | 阿里巴巴集团控股有限公司 | Data processing method and device, computing equipment and test reduction equipment |
CN112700010A (en) * | 2020-12-30 | 2021-04-23 | 深圳前海微众银行股份有限公司 | Feature completion method, device, equipment and storage medium based on federal learning |
WO2022142366A1 (en) * | 2020-12-31 | 2022-07-07 | 华为技术有限公司 | Method and apparatus for updating machine learning model |
CN113011603A (en) * | 2021-03-17 | 2021-06-22 | 深圳前海微众银行股份有限公司 | Model parameter updating method, device, equipment, storage medium and program product |
CN113762531A (en) * | 2021-03-19 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | A federated learning system |
CN112949866A (en) * | 2021-03-25 | 2021-06-11 | 深圳前海微众银行股份有限公司 | Poisson regression model training method and device, electronic equipment and storage medium |
CN113240127A (en) * | 2021-04-07 | 2021-08-10 | 睿蜂群(北京)科技有限公司 | Federated learning-based training method and device, electronic equipment and storage medium |
CN113704776A (en) * | 2021-07-15 | 2021-11-26 | 杭州医康慧联科技股份有限公司 | Machine learning method based on federated learning |
CN114077901A (en) * | 2021-11-23 | 2022-02-22 | 山东大学 | A User Location Prediction Framework Based on Clustering Graph Federated Learning |
CN114077901B (en) * | 2021-11-23 | 2024-05-24 | 山东大学 | User position prediction method based on clustering graph federated learning |
CN114330759A (en) * | 2022-03-08 | 2022-04-12 | 富算科技(上海)有限公司 | Training method and system for longitudinal federated learning model |
CN114330759B (en) * | 2022-03-08 | 2022-08-02 | 富算科技(上海)有限公司 | Training method and system for longitudinal federated learning model |
CN115358826A (en) * | 2022-08-29 | 2022-11-18 | 中国银行股份有限公司 | Product recommendation method and device, storage medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111340247B (en) | 2024-10-15 |
WO2021159798A1 (en) | 2021-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111340247A (en) | Longitudinal federated learning system optimization method, device and readable storage medium | |
CN109165725B (en) | Neural network federal modeling method, equipment and storage medium based on transfer learning | |
CN110704860B (en) | Longitudinal federal learning method, equipment, system and storage medium for improving safety | |
US20230109352A1 (en) | Node group-based data processing method and system, device, and medium | |
US12254121B2 (en) | Data processing method and apparatus, device, and computer-readable storage medium | |
CN110851786B (en) | Inter-enterprise data interaction method, device, equipment and storage medium based on longitudinal federal learning | |
CN110633806A (en) | Vertical federated learning system optimization method, device, equipment and readable storage medium | |
CN107147720B (en) | Traceable effective public auditing method and traceable effective public auditing system in cloud storage data sharing | |
CN107145792B (en) | Multi-user privacy protection data clustering method and system based on ciphertext data | |
CN111598254A (en) | Federated learning modeling method, device and readable storage medium | |
CN113542228B (en) | Federated learning-based data transmission method, device and readable storage medium | |
CN112347500B (en) | Machine learning method, device, system, equipment and storage medium of distributed system | |
US20160020904A1 (en) | Method and system for privacy-preserving recommendation based on matrix factorization and ridge regression | |
WO2022142366A1 (en) | Method and apparatus for updating machine learning model | |
CN112000987B (en) | Method, device and readable storage medium for building factor decomposition machine classification model | |
CN114696990B (en) | Multi-party computing method, system and related equipment based on fully homomorphic encryption | |
CN111291273A (en) | Recommendation system optimization method, device, equipment and readable storage medium | |
CN116502732B (en) | Federal learning method and system based on trusted execution environment | |
CN111325352A (en) | Model update method, device, equipment and medium based on vertical federated learning | |
CN112000988B (en) | Factor decomposition machine regression model construction method, device and readable storage medium | |
CN112016698A (en) | Factorization machine model construction method and device and readable storage medium | |
CN111324812A (en) | Federal recommendation method, device, equipment and medium based on transfer learning | |
CN112199697A (en) | Information processing method, device, equipment and medium based on shared root key | |
CN111343265A (en) | Information pushing method, device, equipment and readable storage medium | |
CN111368196A (en) | Method, apparatus, device and readable storage medium for updating model parameters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||