WO2021083276A1 - Method, device, and apparatus for combining horizontal federation and vertical federation, and medium - Google Patents


Info

Publication number
WO2021083276A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
federation
model
reinforcement learning
vertical
Prior art date
Application number
PCT/CN2020/124846
Other languages
French (fr)
Chinese (zh)
Inventor
梁新乐
刘洋
陈天健
董苗波
Original Assignee
深圳前海微众银行股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021083276A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method, device, and apparatus for combining a horizontal federation and a vertical federation, and a medium. The method for combining a horizontal federation and a vertical federation comprises: acquiring available public information, and inputting the available public information into a preset vertical federation server to obtain vector information (S10); training, on the basis of the vector information, a vertical federation model of the preset vertical federation server, and updating network weights of respective preset reinforcement learning models (S20); and regularly inputting each of the updated preset reinforcement learning models into a preset horizontal federation server, and iteratively updating each of the updated preset reinforcement learning models (S30). The method solves the technical problem in the prior art in which reinforcement learning models consume a considerable amount of computing system resources.

Description

Method, device, and apparatus for combining horizontal federation and vertical federation, and medium
This application claims priority to Chinese patent application No. 201911035368.0, filed with the Chinese Patent Office on October 29, 2019 and entitled "Method, device, and apparatus for combining horizontal federation and vertical federation, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of machine learning in financial technology (Fintech), and in particular to a method, device, apparatus, and medium for combining horizontal federation and vertical federation.
Background
With the continuous development of financial technology, especially Internet finance, more and more technologies (such as distributed computing, blockchain, and artificial intelligence) are being applied in the financial field. At the same time, the financial industry places higher demands on these technologies, for example, higher requirements for the distribution of its pending tasks.
With the gradual development of artificial intelligence, the use of reinforcement learning for optimal control in industry has been widely studied. In the prior art, a reinforcement learning model typically learns, optimizes, and controls using data it collects itself, but self-collected data often has limitations. For example, the radar of an unmanned vehicle cannot see through occlusions, and the limited mounting height of its image sensor prevents the vehicle from obtaining comprehensive data (such as the distribution and operating states of surrounding vehicles). This leads to low sample-processing efficiency and poor control performance of the reinforcement learning model. Furthermore, under these conditions, obtaining good optimal-control results through the reinforcement learning model learning, optimizing, and controlling entirely on its own consumes a large amount of computing system resources. The prior art therefore suffers from the technical problem that reinforcement learning models consume a large amount of computing system resources.
Technical Solution
The main purpose of the present invention is to provide a method, device, apparatus, and medium for combining horizontal federation and vertical federation, aiming to solve the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
To achieve the above objective, an embodiment of the present invention provides a method for combining horizontal federation and vertical federation. The method is applied to an apparatus for combining horizontal federation and vertical federation and comprises:
acquiring available public information, and inputting the available public information into a preset vertical federation server to obtain vector information;
training a vertical federation model of the preset vertical federation server based on the vector information, and updating the network weights of each preset reinforcement learning model; and
regularly inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
In addition, to achieve the above objective, the present invention further provides a device for combining horizontal federation and vertical federation. The device is applied to an apparatus for combining horizontal federation and vertical federation and comprises:
an input module, configured to acquire available public information and input the available public information into a preset vertical federation server to obtain vector information;
a first update module, configured to train a vertical federation model of the preset vertical federation server based on the vector information and update the network weights of each preset reinforcement learning model; and
a second update module, configured to regularly input each updated preset reinforcement learning model into a preset horizontal federation server and iteratively update each updated preset reinforcement learning model.
In addition, to achieve the above objective, the present invention further provides an apparatus for combining horizontal federation and vertical federation, comprising a memory, a processor, and a program for the method for combining horizontal federation and vertical federation that is stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for combining horizontal federation and vertical federation described above.
In addition, to achieve the above objective, the present invention further provides a medium, which is a computer-readable storage medium storing a program that implements the method for combining horizontal federation and vertical federation, wherein the program, when executed by a processor, implements the steps of the method for combining horizontal federation and vertical federation described above.
In this application, available public information is acquired and input into a preset vertical federation server to obtain vector information; based on the vector information, the vertical federation model of the preset vertical federation server is trained and the network weights of each preset reinforcement learning model are updated; further, each updated preset reinforcement learning model is regularly input into a preset horizontal federation server and iteratively updated.
That is, this application first acquires available public information, inputs it into the preset vertical federation server to obtain vector information, then trains the vertical federation model based on the vector information to update the network weights of each preset reinforcement learning model, and finally inputs each updated preset reinforcement learning model into the preset horizontal federation server at regular intervals for iterative updating. By inputting the available public information into the preset vertical federation model and performing vertical federated learning on it before updating each preset reinforcement learning model, the training data used for model training becomes more comprehensive and broad, so the control performance of the models is improved and the models become more robust, avoiding training a model on a single source of local data. Further, by regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server and performing horizontal federated learning to iteratively update them, the effective training data of each preset reinforcement learning model is increased, which reduces training passes with poor training effect and, in turn, reduces the computing system resources consumed by each individual preset reinforcement learning model. This solves the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
Brief Description of the Drawings
The drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the present invention.
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a first embodiment of the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 2 is a schematic diagram of the reinforcement learning architecture based on hybrid horizontal and vertical federation in the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 3 is a schematic flowchart of another embodiment of the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 4 is a schematic flowchart of a second embodiment of the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 5 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present invention.
The realization of the objectives, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the Present Invention
It should be understood that the specific embodiments described herein are intended only to explain the present invention, not to limit it.
The present invention provides a method for combining horizontal federation and vertical federation, applied to an apparatus for combining horizontal federation and vertical federation. In a first embodiment of the method of this application, referring to FIG. 1, the method comprises:
Step S10: acquiring available public information, and inputting the available public information into a preset vertical federation server to obtain vector information.
In this embodiment, it should be noted that the vector information refers to the gradient information generated during the training of a preset reinforcement learning model. The gradient is a vector obtained by taking the partial derivatives of a preset loss function; its negative direction is the direction in which the current function value approaches a minimum, that is, the direction in which the loss function value decreases fastest, and the step size along the gradient corresponds to the maximum rate of change of the loss function value. The preset vertical federation server is a pre-configured server that can join different preset reinforcement learning models for vertical federated learning. Vertical federated learning applies when the participants' data features overlap little while their users overlap heavily: the users that the participants share, together with their differing user data features, are taken out for joint machine learning training. For example, suppose two participants A and B belong to the same region, where A is a bank and B is an e-commerce platform. A and B share many users in that region, but because their businesses differ, the user data features they record differ and may in fact be complementary. In such a scenario, vertical federated learning can be used to help A and B build a joint machine learning prediction model and provide better services to their customers.
Specifically, a preset reinforcement learning model sends a message request to the preset vertical federation server, where the message request includes identification information. Based on the identification information, the public-information federation party obtains the corresponding available public information from a preset public data source and inputs it into its vertical federation model to obtain the vector information. For example, suppose the vertical federation model is trained by batch gradient descent: the available public information is input into the vertical federation model as a batch of training values to obtain the model's output values, and the degree of difference between the output values and the true values corresponding to the training values, that is, the current error value of this training pass, is computed. Partial derivatives of the preset loss function, a quadratic function of the model weights and the model error, are then taken with respect to the model error and the model weights of the vertical federation model, yielding the partial-derivative values jointly corresponding to the current weight value and the current error value, that is, the gradient vector value, which is the vector information.
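For illustration only (the patent does not prescribe a concrete model or loss), the following Python sketch shows how a batch of available public information could be turned into a gradient, i.e. the "vector information", for a hypothetical linear model under a mean-squared-error loss; all function and variable names here are assumptions.

```python
import numpy as np

def batch_gradient(weights, inputs, targets):
    """Batch gradient of an MSE loss for a toy linear model y = X @ w.

    The linear model and MSE loss are illustrative assumptions; the
    patent only states that partial derivatives of a preset loss
    function are taken with respect to the model weights and error.
    """
    predictions = inputs @ weights           # current output values
    errors = predictions - targets           # current error values
    # Gradient of (1/2n) * sum(errors^2) with respect to w:
    return inputs.T @ errors / len(targets)  # the "vector information"

# Hypothetical usage with one batch of available public information:
X = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 0.5]])  # batch of inputs
y = np.array([3.0, 2.0, 2.5])                        # true values
w = np.zeros(2)                                      # current weights
grad = batch_gradient(w, X, y)
w -= 0.1 * grad  # one descent step along the negative gradient
```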
In step S10, the step of acquiring available public information includes:
Step S11: receiving a message request from a preset reinforcement learning model, and obtaining the identification information in the message request through the preset vertical federation party.
In this embodiment, specifically, the message request of each preset reinforcement learning model is sent to the public-information federation party, which then extracts the identification information from the message request. The message request includes identification information such as geographic coordinates and license plate numbers, and the identification information can be extracted by methods such as tag matching and keyword matching.
Step S12: based on the identification information, matching, through the preset vertical federation party, the available public information corresponding to the identification information in a preset public data source.
In this embodiment, it should be noted that the public data source includes the model training information of numerous reinforcement learning models, where the model training information includes both available public information and unavailable public information.
Specifically, the identification information includes identification tags, identification keywords, identification strings, and the like. The preset vertical federation party compares the model training information in the public data source item by item and selects the model training information containing the identification information, thereby obtaining the available public information.
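A minimal sketch of the item-by-item matching described above, assuming a hypothetical record layout (a set of tags plus a payload) that the patent does not specify:

```python
def match_available_public_info(public_data_source, identification_info):
    """Select the model training information whose tags contain all of
    the identification information (tag/keyword matching)."""
    matches = []
    for record in public_data_source:
        # Keep records carrying every identifier, e.g. location
        # coordinates or a license plate number.
        if identification_info <= record["tags"]:
            matches.append(record["payload"])
    return matches

# Hypothetical usage:
source = [
    {"tags": {"plate:ABC123", "region:shenzhen"}, "payload": "..."},
    {"tags": {"region:beijing"}, "payload": "..."},
]
available = match_available_public_info(source, {"plate:ABC123"})
```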
In step S10, the preset vertical federation server includes a vertical federation model, and the vertical federation model includes a current weight value.
The step of inputting the available public information into the preset vertical federation server to obtain vector information includes:
Step S13: inputting the available public information into the vertical federation model as a current input value to obtain a current output value.
In this embodiment, it should be noted that the vertical federation model includes a neural network model, and each current input value corresponds to one current output value.
Specifically, the available public information is input into the vertical federation model as the current input value and processed by preset data-processing methods, which include convolution, pooling, full connection, and the like. Assuming the current input value is an image, convolution refers to the element-wise multiplication and summation of the image matrix with a convolution kernel to obtain image feature values, where the convolution kernel is the weight matrix corresponding to the image features; pooling refers to integrating the feature values obtained by convolution into new feature values; and full connection can be viewed as a special convolution whose result is a one-dimensional vector corresponding to the image. The current output value is thereby obtained, and it may be an image, a vector, a classification result, a feature value, or the like.
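The convolution, pooling, and full-connection operations described above can be sketched as follows; this is an illustrative NumPy rendering with assumed shapes, not the patented implementation.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: element-wise multiplication and summation
    of the kernel (the weight matrix) over each image patch."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Integrate convolved feature values into new feature values by
    taking the maximum over non-overlapping size x size windows."""
    h = feature_map.shape[0] - feature_map.shape[0] % size
    w = feature_map.shape[1] - feature_map.shape[1] % size
    return (feature_map[:h, :w]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

def fully_connected(feature_map, weight_matrix):
    """Flatten to a one-dimensional vector and apply a weight matrix."""
    return weight_matrix @ feature_map.reshape(-1)

# Hypothetical shapes for illustration:
image = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)                 # convolution kernel
features = max_pool(conv2d(image, kernel))    # (4, 4) -> (2, 2)
output = fully_connected(features, np.random.rand(3, 4))  # length 3
```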
Step S14: comparing the current output value with a preset current true value to obtain a current error value.
In this embodiment, it should be noted that each current input value corresponds to a current true value, which is the theoretical output value of the model.
Specifically, for example, if the current output value is X and the preset current true value is Y, then the difference between the output value and the true value is X-Y, and the current error value is (X-Y)/X.
Step S15: based on the current weight value and the current error value, taking partial derivatives of the preset loss function to obtain the vector information jointly corresponding to the current weight value and the current error value.
In this embodiment, it should be noted that the preset loss function is a quadratic function of the model weights and the model error.
Specifically, the partial derivatives of the preset loss function with respect to the model weights and the model error are computed. Since the current weight value and the current error value locate a specific point of the preset loss function, the partial derivatives at that point are obtained, yielding the vector information jointly corresponding to the current weight value and the current error value. For example, if the preset loss function is f(x, y), with model weight x and model error y, then the gradient vector, that is, the vector of partial derivatives, is (∂f(x,y)/∂x, ∂f(x,y)/∂y); if the current weight value is 0.5 and the current error value is 0.1, the vector information is the gradient vector value at x = 0.5, y = 0.1.
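For concreteness, take a hypothetical quadratic loss f(x, y) = x² + 2y² (the patent leaves f unspecified); the gradient at the current weight value 0.5 and current error value 0.1 then evaluates to:

```latex
\nabla f(x, y) = \left( \frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y} \right) = (2x,\; 4y),
\qquad
\nabla f(0.5,\, 0.1) = (1.0,\; 0.4).
```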
Step S20: based on the vector information, training the vertical federation model of the preset vertical federation server and updating the network weights of each preset reinforcement learning model.
In this embodiment, it should be noted that the vector information includes a gradient vector.
Specifically, based on the vector information, the vertical federation model of the preset vertical federation server is trained to obtain sample information; then, based on the sample information, each preset reinforcement learning model is trained and its network weights are updated.
Step S30: regularly inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
In this embodiment, it should be noted that the preset horizontal federation server is a pre-configured server that can join different preset reinforcement learning models for horizontal federated learning. Horizontal federated learning applies when the participants' data features overlap heavily while their users overlap little: the data whose features are the same across participants but whose users are not identical is taken out for joint machine learning. For example, suppose two participating banks are located in different regions; their user groups come from their respective regions and intersect very little, but their businesses are very similar, so most of the recorded user data features are the same. Horizontal federated learning can then be used to help the two banks build a joint model to predict their customers' behavior. In addition, all information exchanges in this embodiment can optionally be encrypted, with the user deciding whether encryption is applied.
Specifically, the updated model parameters of each preset reinforcement learning model are regularly input into the preset horizontal federation server, where they are fused into global model parameters; the model parameters include gradient information, weight information, and the like. The global model parameters are then distributed to each preset reinforcement learning model, which uses them either as the starting point for local model training or as the latest parameters of the local model, in order to start or continue training the preset reinforcement learning model. FIG. 2 shows the reinforcement learning architecture based on hybrid horizontal and vertical federation, in which reinforcement learning Agent1 and reinforcement learning Agent2 are different reinforcement learning models, the data store is a repository holding sample information, the data source receives the sensor data sent by each preset reinforcement learning model, and the controller performs the operation corresponding to the control information.
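One round of this periodic fusion and redistribution might look like the following sketch, assuming plain or weighted averaging as the preset fusion rule; the agent interface (get_parameters / set_parameters) is a hypothetical stand-in.

```python
import numpy as np

def fuse_model_parameters(local_parameters, proportions=None):
    """Fuse the model parameters uploaded by each preset reinforcement
    learning model into global model parameters, by uniform averaging
    or by a user-set weighted average (both are named in the text)."""
    stacked = np.stack(local_parameters)
    if proportions is None:
        return stacked.mean(axis=0)
    proportions = np.asarray(proportions, dtype=float)
    return (proportions / proportions.sum()) @ stacked

def federation_round(agents):
    """Collect, fuse, and redistribute parameters; each agent then
    continues local training from the global model parameters."""
    global_params = fuse_model_parameters(
        [agent.get_parameters() for agent in agents])
    for agent in agents:
        agent.set_parameters(global_params)  # local training start point
```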
In this embodiment, available public information is acquired and input into the preset vertical federation server to obtain vector information; based on the vector information, the vertical federation model of the preset vertical federation server is trained and the network weights of each preset reinforcement learning model are updated; further, each updated preset reinforcement learning model is regularly input into the preset horizontal federation server and iteratively updated. By inputting the available public information into the preset vertical federation model and performing vertical federated learning before updating each preset reinforcement learning model, the training data used for model training becomes more comprehensive and broad, so the control performance of the models is improved and the models become more robust. Further, regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server for horizontal federated learning and iterative updating further improves the control performance and robustness of the models, increases the effective training data of each preset reinforcement learning model, reduces training passes with poor training effect, and in turn reduces the computing system resources consumed by each individual preset reinforcement learning model. This solves the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
Further, referring to FIG. 3, based on the first embodiment of this application, in another embodiment of the method for combining horizontal federation and vertical federation, the step of training each preset reinforcement learning model based on the vector information to update each preset reinforcement learning model includes:
Step S21: receiving the sensor data sent by each preset reinforcement learning model, and generating control information through the vertical federation model based on the sensor data and the vector information.
In this embodiment, it should be noted that, based on the control information, a preset controller can control the preset reinforcement learning model. For example, if the vertical federation model corresponds to an unmanned vehicle, the control information can control the vehicle's travel speed and direction.
Specifically, the sensor data is obtained from the local data source corresponding to each preset reinforcement learning model and sent to the preset public federation party. The sensor data includes distance sensor data, pressure sensor data, speed sensor data, and the like; that is, the sensor data indicates the state information of the vertical federation model at the current time step. Control information is then generated through the vertical federation model based on the sensor data and the vector information. The direction of the gradient vector corresponding to the vector information is the direction in which the vertical federation model needs to be trained, so that the model is trained toward the state information of the next time step, and the control information can steer the vertical federation model toward that next-time-step state.
Step S22: training the vertical federation model in the training environment corresponding to the control information, to obtain reward information and next-time-step state information.
In this embodiment, it should be noted that the reward information is computed by a preset reward function, which introduces non-linear factors into the vertical federation model. The next-time-step state information is the model state information of the vertical federation model after its network weights have been updated following training. Before the vertical federation model is updated, that is, before the next-time-step state information is obtained, it is judged whether the update helps reduce the model error: the update is applied only if it reduces the model error and is skipped otherwise.
Specifically, in the training environment corresponding to the control information, the vertical federation model is trained, and the reward information and network weights of each neuron of the neural network in the vertical federation model are obtained, that is, the reward information and the next-time-step state information are obtained, where the neural network includes convolutional layers, pooling layers, fully connected layers, and the like.
Step S23: storing the reward information, the next-time-step state information, and the control information as sample information, and updating the network weights of each preset reinforcement learning model based on the sample information.
In this embodiment, specifically, the reward information, the next-time-step state information, and the control information are merged into sample information and stored in the data store corresponding to each preset reinforcement learning model. Each preset reinforcement learning model can then draw sample information from its corresponding data store for training and update its network weights according to the training results.
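The per-agent data store described above behaves like an experience replay buffer; a minimal sketch, with the capacity and sampling strategy as illustrative assumptions:

```python
import random
from collections import deque

class SampleStore:
    """Data store holding (control, reward, next-state) sample
    information for one preset reinforcement learning model."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, control_info, reward_info, next_state_info):
        # Merge the three pieces of information into one sample.
        self.buffer.append((control_info, reward_info, next_state_info))

    def sample(self, batch_size):
        # Draw a training batch for the reinforcement learning model.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```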
In step S23, the step of updating the network weights of each preset reinforcement learning model based on the sample information includes:
Step S231: inputting the sample information as training data into the preset reinforcement learning model to train the preset reinforcement learning model and obtain a training output value.
In this embodiment, specifically, the sample information is input into the preset reinforcement learning model as training data, and the training data undergoes data processing including convolution, pooling, full connection, and the like, to obtain the training output value, which may be an image, a vector, a numeric value, or the like.
Step S232: comparing the training output value with the true output value corresponding to the training data to obtain a model error value.
In this embodiment, the training output value is compared with the true output value corresponding to the training data to obtain the model error value. Specifically, for example, if the training output value is X and the true output value is Y, the difference between them is X-Y, and the model error value is (X-Y)/X.
Step S233: comparing the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, completing the training of the preset reinforcement learning model.
In this embodiment, it should be noted that the model error value being less than the preset error threshold is one of the optional completion conditions for the training of the preset reinforcement learning model. The completion conditions also include convergence of the loss function, convergence of the model parameters, reaching the maximum number of iterations, reaching the maximum training time, and the like, where the model parameters include the model error value.
Step S234: if the model error value is greater than or equal to the preset error threshold, updating the network weights of the preset reinforcement learning model based on the model error value and retraining the preset reinforcement learning model.
In this embodiment, it should be noted that the network weights are the convolution kernels or weight matrices.
Specifically, if the model error value is greater than or equal to the preset error threshold, the corresponding gradient vector value is obtained based on the model error value, the network weights of the preset reinforcement learning model are updated based on that gradient vector value, and the preset reinforcement learning model is retrained until the preset training completion condition is reached.
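Steps S231 through S234 amount to the following training loop, sketched with a hypothetical model interface (forward / error / gradient / update_weights) that the patent does not prescribe:

```python
def train_until_threshold(model, samples, error_threshold, max_iterations=1000):
    """Train on sample information until the model error value falls
    below the preset error threshold (or iterations run out)."""
    for _ in range(max_iterations):
        training_output = model.forward(samples.inputs)              # S231
        error_value = model.error(training_output, samples.targets)  # S232
        if error_value < error_threshold:                            # S233
            return model                       # training is complete
        gradient = model.gradient(error_value)                       # S234
        model.update_weights(gradient)         # update and retrain
    return model
```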
In this embodiment, the sensor data sent by each preset reinforcement learning model is received; control information is generated through the vertical federation model based on the sensor data and the vector information; the vertical federation model is trained in the training environment corresponding to the control information to obtain reward information and next-time-step state information; and the reward information, next-time-step state information, and control information are stored as sample information, based on which the network weights of each preset reinforcement learning model are updated. By converting the available public information corresponding to each preset reinforcement learning model into sample information, this embodiment achieves the goal of jointly using the data of multiple preset reinforcement learning models to train and update each of them, which greatly enhances the control performance and robustness of each preset reinforcement learning model and reduces the model training time and training workload of each individual model, thereby reducing its computing system resource consumption. This lays a foundation for solving the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
Further, referring to FIG. 4, based on the first and second embodiments of this application, in another embodiment of the method for combining horizontal federation and vertical federation, the step of regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server and iteratively updating each updated preset reinforcement learning model includes:
Step S31: regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server, to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model.
In this embodiment, it should be noted that the preset horizontal federation server is a pre-configured server for horizontal federated learning, and the interval of the regular updates can be set by the user. For example, if the interval is set to 10 minutes, each updated preset reinforcement learning model is sent to the preset horizontal federation server every 10 minutes.
Specifically, each updated preset reinforcement learning model is regularly input into the preset horizontal federation server, so that the model parameters of each preset reinforcement learning model are sent to the horizontal federation server, where they are fused into global model parameters; each preset reinforcement learning model is then updated based on the global model parameters to obtain the horizontal federation model.
Each updated preset reinforcement learning model includes updated model parameters.
The step of regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model includes:
Step S311: regularly inputting each set of updated model parameters into the preset horizontal federation server, to fuse the updated model parameters and obtain global model parameters.
In this embodiment, specifically, each set of updated model parameters is input into the preset horizontal federation server and processed according to preset rules, which include averaging, weighted averaging, and the like, to obtain the global model parameters, where the weight proportion of each set of updated model parameters participating in the weighted average is set by the user.
Step S312: distributing the global model parameters to each updated preset reinforcement learning model, to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model.
In this embodiment, specifically, the global model parameters are distributed to each updated preset reinforcement learning model, either as the starting point for local model training or to directly replace the local model parameters of each preset reinforcement learning model; the updated preset reinforcement learning models are then trained to obtain the horizontal federation model.
Step S32: iteratively updating each updated preset reinforcement learning model based on the horizontal federation model.
In this embodiment, specifically, based on the global model parameters in the horizontal federation model, the global model parameters are used as the training starting point of each preset reinforcement learning model or directly replace its local model parameters; the updated preset reinforcement learning model is then trained, and it is judged whether the trained model has reached a training completion condition. If so, the training of the preset reinforcement learning model is complete; if not, the network weights of the preset reinforcement learning model are updated and the model is retrained until the training completion condition is reached. The training completion conditions include convergence of the loss function, convergence of the model parameters, reaching the maximum number of iterations, reaching the maximum training time, and the like.
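The alternative training completion conditions listed above could be checked jointly, as in this hypothetical helper (the thresholds and the fields of `state` are illustrative assumptions):

```python
import time

def training_complete(state, *, loss_tol=1e-6, param_tol=1e-6,
                      max_iterations=10000, max_seconds=3600.0):
    """True when any preset completion condition holds: loss
    convergence, parameter convergence, maximum iteration count,
    or maximum training time."""
    return (
        abs(state.loss - state.previous_loss) < loss_tol       # loss converged
        or state.parameter_delta < param_tol                   # params converged
        or state.iteration >= max_iterations                   # max iterations
        or time.monotonic() - state.start_time >= max_seconds  # max time
    )
```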
In this embodiment, each updated preset reinforcement learning model is regularly input into the preset horizontal federation server to perform horizontal federation based on preset federation rules and obtain a horizontal federation model, and each updated preset reinforcement learning model is then iteratively updated based on the horizontal federation model. This embodiment thus provides a method of performing horizontal federation: by regularly inputting the updated preset reinforcement learning models into the preset horizontal federation server, jointly learning across them, obtaining the horizontal federation model they jointly correspond to, and iteratively updating them based on it, the control performance and robustness of the models are further improved, and the model training time and training workload of each individual preset reinforcement learning model are reduced, in turn reducing its computing system resource consumption. This lays a foundation for solving the technical problems of poor control performance and low robustness of reinforcement learning models in the prior art.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present invention.
As shown in FIG. 5, the apparatus for combining horizontal federation and vertical federation may include a processor 1001 such as a CPU, a memory 1005, and a communication bus 1002, where the communication bus 1002 implements connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the apparatus for combining horizontal federation and vertical federation may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and the like. The rectangular user interface may include a display (Display) and an input sub-module such as a keyboard (Keyboard); optionally, it may also include standard wired and wireless interfaces. The network interface may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
Those skilled in the art will understand that the structure of the apparatus for combining horizontal federation and vertical federation shown in FIG. 5 does not constitute a limitation on the apparatus; the apparatus may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 5, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, and a program for combining horizontal federation and vertical federation. The operating system is a program that manages and controls the hardware and software resources of the apparatus and supports the running of the program for combining horizontal federation and vertical federation as well as other software and/or programs. The network communication module implements communication among the components within the memory 1005 and with other hardware and software in the system for combining horizontal federation and vertical federation.
In the apparatus for combining horizontal federation and vertical federation shown in FIG. 5, the processor 1001 is configured to execute the program for combining horizontal federation and vertical federation stored in the memory 1005, implementing the steps of any one of the methods for combining horizontal federation and vertical federation described above.
The specific implementation of the apparatus for combining horizontal federation and vertical federation of the present invention is substantially the same as the embodiments of the method for combining horizontal federation and vertical federation described above, and is not repeated here.
The present invention further provides a horizontal federation and vertical federation combined apparatus, the horizontal federation and vertical federation combined apparatus including:

an input module, configured to acquire available public information, and input the available public information into a preset vertical federation service party to obtain vector information;

a first update module, configured to train a vertical federation model of the preset vertical federation service party based on the vector information, and update network weights of each preset reinforcement learning model;

a second update module, configured to periodically input each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively update each updated preset reinforcement learning model.
Optionally, the first update module includes:

an acquisition unit, configured to receive sensor data sent by each preset reinforcement learning model, and generate control information through the vertical federation model based on the sensor data and the vector information;

a first training unit, configured to train the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;

a first update unit, configured to store the reward information, the next-time-step state information, and the control information as sample information, and update the network weights of each preset reinforcement learning model based on the sample information (an illustrative sketch follows this list).
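As an illustrative, non-limiting sketch of the acquisition unit, first training unit, and first update unit cooperating, the following Python fragment collects (reward, next-state, control) tuples as sample information and uses them to update a model's network weights. The `act`, `step`, and `train_on_samples` interfaces, and the replay-buffer design, are assumptions made for illustration only; the embodiment does not fix them.

```python
import random
from collections import deque

class FirstUpdateModuleSketch:
    """Collect sample information and update network weights (illustrative)."""

    def __init__(self, vertical_model, environment, buffer_size=10_000):
        self.vertical_model = vertical_model   # vertical federation model (assumed API)
        self.environment = environment         # training environment (assumed API)
        self.samples = deque(maxlen=buffer_size)

    def collect_step(self, sensor_data, vector_info):
        # Acquisition unit: generate control information from the sensor data
        # and vector information through the vertical federation model.
        control_info = self.vertical_model.act(sensor_data, vector_info)
        # First training unit: train in the environment corresponding to the
        # control information; obtain reward and next-time-step state.
        next_state, reward = self.environment.step(control_info)
        # First update unit: store the tuple as sample information.
        self.samples.append((reward, next_state, control_info))
        return next_state

    def update_weights(self, rl_model, batch_size=32):
        # Update a preset reinforcement learning model from stored samples.
        batch = random.sample(list(self.samples),
                              min(batch_size, len(self.samples)))
        rl_model.train_on_samples(batch)       # assumed training interface
```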
Optionally, the first update unit includes:

a first training subunit, configured to input the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;

a comparison subunit, configured to compare the training output value with a real output value corresponding to the training data, to obtain a model error value;

a first judgment subunit, configured to compare the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, complete the training of the preset reinforcement learning model;

a second judgment subunit, configured to, if the model error value is greater than or equal to the preset error threshold, update the network weights of the preset reinforcement learning model based on the model error value and retrain the preset reinforcement learning model (a minimal training-loop sketch follows this list).
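A minimal sketch of the train, compare, and judge loop just described, assuming placeholder `forward`, `loss`, and `backward_and_update` methods (the embodiment does not fix the model form, the loss, or the update rule):

```python
def train_until_below_threshold(model, training_data, real_outputs,
                                error_threshold, max_rounds=1000):
    """Train, compare against the real output value, and either finish
    (error below the preset threshold) or update weights and retrain."""
    for _ in range(max_rounds):
        training_output = model.forward(training_data)           # training output value
        model_error = model.loss(training_output, real_outputs)  # model error value
        if model_error < error_threshold:    # first judgment subunit: done
            return model
        model.backward_and_update(model_error)  # second judgment subunit:
                                                # update weights and retrain
    return model
```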
Optionally, the second update module includes:

a periodic sending unit, configured to periodically input each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;

a second update unit, configured to iteratively update each updated preset reinforcement learning model based on the horizontal federation model.
Optionally, the periodic sending unit includes:

a fusion subunit, configured to periodically input each updated model parameter into the preset horizontal federation server, so as to fuse each updated model parameter and obtain global model parameters;

a second training subunit, configured to distribute the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model (a fusion sketch follows this list).
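A sketch of the fusion and distribution steps, assuming the preset federation rule is sample-size-weighted parameter averaging (a FedAvg-style rule; the embodiment leaves the rule itself unspecified) and that each model's parameters are a list of NumPy arrays, one per layer:

```python
import numpy as np

def fuse_model_parameters(client_params, client_sizes):
    """Fusion subunit sketch: fuse each client's updated model parameters
    into global model parameters by sample-size-weighted averaging."""
    total = float(sum(client_sizes))
    num_layers = len(client_params[0])
    return [
        sum(params[layer] * (size / total)
            for params, size in zip(client_params, client_sizes))
        for layer in range(num_layers)
    ]

def distribute(global_params, clients):
    """Second training subunit sketch: send the global model parameters back
    so each updated preset reinforcement learning model trains from them."""
    for client in clients:
        client.set_weights(global_params)   # assumed client interface

# Example usage with two clients and single-layer "models":
params_a = [np.array([1.0, 2.0])]
params_b = [np.array([3.0, 4.0])]
global_params = fuse_model_parameters([params_a, params_b], [100, 300])
# -> [array([2.5, 3.5])]: the larger client contributes 3/4 of the weight.
```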
Optionally, the input module includes:

an input unit, configured to input the available public information into the vertical federation model as a current input value, to obtain a current output value;

a comparison unit, configured to compare the current output value with a preset current true value, to obtain a current error value;

a partial-derivative unit, configured to take a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value (a worked sketch follows this list).
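To make the input, compare, and differentiate chain concrete, the following sketch assumes, purely for illustration, a linear vertical federation model and a squared-error preset loss function; under those assumptions the partial derivative with respect to the current weight value is the familiar gradient expression, which plays the role of the vector information. Neither the model form nor the loss function is fixed by the embodiment.

```python
import numpy as np

def compute_vector_info(current_weights, public_info, preset_true_value):
    """Input unit, comparison unit, and partial-derivative unit in sequence
    (illustrative: linear model, squared-error loss)."""
    # Input unit: current output value from the available public information.
    current_output = public_info @ current_weights
    # Comparison unit: current error value against the preset current true value.
    current_error = current_output - preset_true_value
    # Partial-derivative unit: for L(w) = 0.5 * ||X @ w - y||^2,
    # dL/dw = X.T @ (X @ w - y); this gradient serves as the vector information.
    return public_info.T @ current_error

# Example: 3 records of available public information with 2 features each.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([0.5, -0.5])
y = np.array([1.0, 0.0, 0.5])
print(compute_vector_info(w, X, y))   # vector information for this weight value
```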
Optionally, the input module includes:

a receiving unit, configured to receive a message request of a preset reinforcement learning model, and acquire identification information in the message request through a preset vertical federation party;

a matching unit, configured to match, based on the identification information, the available public information corresponding to the identification information in a preset public data source through the preset vertical federation party (a lookup sketch follows this list).
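A minimal sketch of the receive-then-match flow, assuming the message request is a dictionary carrying an "id" field and the preset public data source is a key-value mapping; both representations are assumptions for illustration only:

```python
def acquire_available_public_info(message_request, public_data_source):
    """Receiving unit: read the identification information from the message
    request. Matching unit: match the corresponding available public
    information in the preset public data source."""
    identification = message_request["id"]          # assumed field name
    return public_data_source.get(identification)   # None if no match

# Example usage:
source = {"user-42": {"region": "south", "segment": "retail"}}
print(acquire_available_public_info({"id": "user-42"}, source))
```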
The specific implementation of the horizontal federation and vertical federation combined apparatus of the present invention is substantially the same as the foregoing embodiments of the horizontal federation and vertical federation combined method, and details are not repeated here.

The present invention provides a medium, and the medium is a computer-readable storage medium. The medium stores one or more programs, and the one or more programs may further be executed by one or more processors to implement the steps of the horizontal federation and vertical federation combined method described in any one of the above.

The specific implementation of the medium of the present invention is substantially the same as the foregoing embodiments of the horizontal federation and vertical federation combined method, and details are not repeated here.
The above are only preferred embodiments of the present invention and are not intended to limit the patent scope of the present invention. Any equivalent structural or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (20)

  1. A horizontal federation and vertical federation combined method, wherein the horizontal federation and vertical federation combined method comprises:
    acquiring available public information, and inputting the available public information into a preset vertical federation service party to obtain vector information;
    training a vertical federation model of the preset vertical federation service party based on the vector information, and updating network weights of each preset reinforcement learning model;
    periodically inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
  2. The horizontal federation and vertical federation combined method according to claim 1, wherein the step of training the vertical federation model of the preset vertical federation service party based on the vector information to update the network weights of each preset reinforcement learning model comprises:
    receiving sensor data sent by each preset reinforcement learning model, and generating control information through the vertical federation model based on the sensor data and the vector information;
    training the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;
    storing the reward information, the next-time-step state information, and the control information as sample information, and updating the network weights of each preset reinforcement learning model based on the sample information.
  3. The horizontal federation and vertical federation combined method according to claim 2, wherein the step of updating the network weights of each preset reinforcement learning model based on the sample information comprises:
    inputting the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;
    comparing the training output value with a real output value corresponding to the training data, to obtain a model error value;
    comparing the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, completing the training of the preset reinforcement learning model;
    if the model error value is greater than or equal to the preset error threshold, updating the network weights of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model.
  4. The horizontal federation and vertical federation combined method according to claim 1, wherein the step of periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server and iteratively updating each updated preset reinforcement learning model comprises:
    periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;
    iteratively updating each updated preset reinforcement learning model based on the horizontal federation model.
  5. The horizontal federation and vertical federation combined method according to claim 4, wherein each updated preset reinforcement learning model comprises updated model parameters, and
    the step of periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on the preset federation rules and obtain the horizontal federation model, comprises:
    periodically inputting each updated model parameter into the preset horizontal federation server, so as to fuse each updated model parameter and obtain global model parameters;
    distributing the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model.
  6. The horizontal federation and vertical federation combined method according to claim 1, wherein the preset vertical federation service party comprises a vertical federation model, the vertical federation model comprises a current weight value, and
    the step of inputting the available public information into the preset vertical federation service party to obtain the vector information comprises:
    inputting the available public information into the vertical federation model as a current input value, to obtain a current output value;
    comparing the current output value with a preset current true value, to obtain a current error value;
    taking a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value.
  7. The horizontal federation and vertical federation combined method according to claim 1, wherein the step of acquiring the available public information comprises:
    receiving a message request of a preset reinforcement learning model, and acquiring identification information in the message request through a preset vertical federation party;
    matching, based on the identification information, the available public information corresponding to the identification information in a preset public data source through the preset vertical federation party.
  8. A horizontal federation and vertical federation combined apparatus, wherein the horizontal federation and vertical federation combined apparatus is applied to a horizontal federation and vertical federation combined device, and the horizontal federation and vertical federation combined apparatus comprises:
    an input module, configured to acquire available public information, and input the available public information into a preset vertical federation service party to obtain vector information;
    a first update module, configured to train a vertical federation model of the preset vertical federation service party based on the vector information, and update network weights of each preset reinforcement learning model;
    a second update module, configured to periodically input each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively update each updated preset reinforcement learning model.
  9. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the first update module comprises:
    an acquisition unit, configured to receive sensor data sent by each preset reinforcement learning model, and generate control information through the vertical federation model based on the sensor data and the vector information;
    a first training unit, configured to train the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;
    a first update unit, configured to store the reward information, the next-time-step state information, and the control information as sample information, and update the network weights of each preset reinforcement learning model based on the sample information.
  10. The horizontal federation and vertical federation combined apparatus according to claim 9, wherein the first update unit comprises:
    a first training subunit, configured to input the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;
    a comparison subunit, configured to compare the training output value with a real output value corresponding to the training data, to obtain a model error value;
    a first judgment subunit, configured to compare the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, complete the training of the preset reinforcement learning model;
    a second judgment subunit, configured to, if the model error value is greater than or equal to the preset error threshold, update the network weights of the preset reinforcement learning model based on the model error value and retrain the preset reinforcement learning model.
  11. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the second update module comprises:
    a periodic sending unit, configured to periodically input each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;
    a second update unit, configured to iteratively update each updated preset reinforcement learning model based on the horizontal federation model.
  12. The horizontal federation and vertical federation combined apparatus according to claim 11, wherein the periodic sending unit comprises:
    a fusion subunit, configured to periodically input each updated model parameter into the preset horizontal federation server, so as to fuse each updated model parameter and obtain global model parameters;
    a second training subunit, configured to distribute the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model.
  13. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the input module comprises:
    an input unit, configured to input the available public information into the vertical federation model as a current input value, to obtain a current output value;
    a comparison unit, configured to compare the current output value with a preset current true value, to obtain a current error value;
    a partial-derivative unit, configured to take a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value.
  14. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the input module comprises:
    a receiving unit, configured to receive a message request of a preset reinforcement learning model, and acquire identification information in the message request through a preset vertical federation party;
    a matching unit, configured to match, based on the identification information, the available public information corresponding to the identification information in a preset public data source through the preset vertical federation party.
  15. A horizontal federation and vertical federation combined device, wherein the horizontal federation and vertical federation combined device comprises a memory, a processor, and a horizontal federation and vertical federation combined program stored on the memory and executable on the processor, and the horizontal federation and vertical federation combined program, when executed by the processor, implements the following steps:
    acquiring available public information, and inputting the available public information into a preset vertical federation service party to obtain vector information;
    training a vertical federation model of the preset vertical federation service party based on the vector information, and updating network weights of each preset reinforcement learning model;
    periodically inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
  16. The horizontal federation and vertical federation combined device according to claim 15, wherein the step of training the vertical federation model of the preset vertical federation service party based on the vector information to update the network weights of each preset reinforcement learning model comprises:
    receiving sensor data sent by each preset reinforcement learning model, and generating control information through the vertical federation model based on the sensor data and the vector information;
    training the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;
    storing the reward information, the next-time-step state information, and the control information as sample information, and updating the network weights of each preset reinforcement learning model based on the sample information.
  17. The horizontal federation and vertical federation combined device according to claim 16, wherein the step of updating the network weights of each preset reinforcement learning model based on the sample information comprises:
    inputting the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;
    comparing the training output value with a real output value corresponding to the training data, to obtain a model error value;
    comparing the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, completing the training of the preset reinforcement learning model;
    if the model error value is greater than or equal to the preset error threshold, updating the network weights of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model.
  18. The horizontal federation and vertical federation combined device according to claim 15, wherein the step of periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server and iteratively updating each updated preset reinforcement learning model comprises:
    periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;
    iteratively updating each updated preset reinforcement learning model based on the horizontal federation model.
  19. The horizontal federation and vertical federation combined device according to claim 15, wherein the preset vertical federation service party comprises a vertical federation model, the vertical federation model comprises a current weight value, and
    the step of inputting the available public information into the preset vertical federation service party to obtain the vector information comprises:
    inputting the available public information into the vertical federation model as a current input value, to obtain a current output value;
    comparing the current output value with a preset current true value, to obtain a current error value;
    taking a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value.
  20. A medium, wherein a program implementing a horizontal federation and vertical federation combined method is stored on the medium, and the program implementing the horizontal federation and vertical federation combined method is executed by a processor to implement the steps of the horizontal federation and vertical federation combined method according to any one of claims 1 to 7.
PCT/CN2020/124846 2019-10-29 2020-10-29 Method, device, and apparatus for combining horizontal federation and vertical federation, and medium WO2021083276A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911035368.0 2019-10-29
CN201911035368.0A CN110782042B (en) 2019-10-29 2019-10-29 Method, device, equipment and medium for combining horizontal federation and vertical federation

Publications (1)

Publication Number Publication Date
WO2021083276A1 (en)

Family

ID=69387208

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124846 WO2021083276A1 (en) 2019-10-29 2020-10-29 Method, device, and apparatus for combining horizontal federation and vertical federation, and medium

Country Status (2)

Country Link
CN (1) CN110782042B (en)
WO (1) WO2021083276A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782042B (en) * 2019-10-29 2022-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation
CN111353167A (en) * 2020-02-26 2020-06-30 深圳前海微众银行股份有限公司 Data discrimination method, device, equipment and storage medium based on multiple providers
CN111369042B (en) * 2020-02-27 2021-09-24 山东大学 Wireless service flow prediction method based on weighted federal learning
CN111383094A (en) * 2020-03-06 2020-07-07 深圳前海微众银行股份有限公司 Product service full-chain driving method, equipment and readable storage medium
CN111401552B (en) * 2020-03-11 2023-04-07 浙江大学 Federal learning method and system based on batch size adjustment and gradient compression rate adjustment
CN113392101A (en) * 2020-03-13 2021-09-14 京东城市(北京)数字科技有限公司 Method, main server, service platform and system for constructing horizontal federated tree
CN113554476B (en) * 2020-04-23 2024-04-19 京东科技控股股份有限公司 Training method and system of credit prediction model, electronic equipment and storage medium
CN112001500B (en) * 2020-08-13 2021-08-03 星环信息科技(上海)股份有限公司 Model training method, device and storage medium based on longitudinal federated learning system
CN112307331B (en) * 2020-10-14 2023-11-24 湖南天河国云科技有限公司 Intelligent recruitment information pushing method, system and terminal equipment for college graduates based on blockchain
CN112381428B (en) * 2020-11-19 2023-09-19 平安科技(深圳)有限公司 Service distribution method, device, equipment and storage medium based on reinforcement learning
CN112486180A (en) * 2020-12-10 2021-03-12 深圳前海微众银行股份有限公司 Vehicle control method, device, equipment, storage medium and program product
CN112560059B (en) * 2020-12-17 2022-04-29 浙江工业大学 Vertical federal model stealing defense method based on neural pathway feature extraction
CN112738035B (en) * 2020-12-17 2022-04-29 杭州趣链科技有限公司 Block chain technology-based vertical federal model stealing defense method
CN112560752B (en) * 2020-12-23 2024-03-26 杭州趣链科技有限公司 License plate recognition training method and device based on federal learning and related equipment
WO2022144001A1 (en) * 2020-12-31 2022-07-07 京东科技控股股份有限公司 Federated learning model training method and apparatus, and electronic device
CN113112026A (en) * 2021-04-02 2021-07-13 佳讯飞鸿(北京)智能科技研究院有限公司 Optimization method and device for federated learning model
WO2022226903A1 (en) * 2021-04-29 2022-11-03 浙江大学 Federated learning method for k-means clustering algorithm
CN113516250B (en) * 2021-07-13 2023-11-03 北京百度网讯科技有限公司 Federal learning method, device, equipment and storage medium
CN113673696B (en) * 2021-08-20 2024-03-22 山东鲁软数字科技有限公司 Power industry hoisting operation violation detection method based on reinforcement federal learning
CN114548426B (en) * 2022-02-17 2023-11-24 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN115169576B (en) * 2022-06-24 2024-02-09 上海富数科技有限公司 Model training method and device based on federal learning and electronic equipment
CN115796309A (en) * 2022-09-20 2023-03-14 天翼电子商务有限公司 Horizontal and vertical combination algorithm for federated learning
CN115238065B (en) * 2022-09-22 2022-12-20 太极计算机股份有限公司 Intelligent document recommendation method based on federal learning
CN115759248B (en) * 2022-11-07 2023-06-13 吉林大学 Financial system analysis method and storage medium based on decentralised hybrid federal learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180316502A1 (en) * 2017-04-27 2018-11-01 Factom Data Reproducibility Using Blockchains
CN109167695A (en) * 2018-10-26 2019-01-08 深圳前海微众银行股份有限公司 Alliance Network construction method, equipment and readable storage medium storing program for executing based on federation's study
CN109871702A (en) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 Federal model training method, system, equipment and computer readable storage medium
CN110263936A (en) * 2019-06-14 2019-09-20 深圳前海微众银行股份有限公司 Laterally federation's learning method, device, equipment and computer storage medium
CN110245510A (en) * 2019-06-19 2019-09-17 北京百度网讯科技有限公司 Method and apparatus for predictive information
CN110782042A (en) * 2019-10-29 2020-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113490184A (en) * 2021-05-10 2021-10-08 北京科技大学 Smart factory-oriented random access resource optimization method and device
CN113490184B (en) * 2021-05-10 2023-05-26 北京科技大学 Random access resource optimization method and device for intelligent factory
CN113238867A (en) * 2021-05-19 2021-08-10 浙江凡双科技有限公司 Federated learning method based on network unloading
CN113238867B (en) * 2021-05-19 2024-01-19 浙江凡双科技股份有限公司 Federal learning method based on network unloading
CN113515890A (en) * 2021-05-21 2021-10-19 华北电力大学 Renewable energy day-ahead scene generation method based on federal learning
CN113515890B (en) * 2021-05-21 2024-03-08 华北电力大学 Renewable energy day-ahead scene generation method based on federal learning
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning
CN113435604B (en) * 2021-06-16 2024-05-07 清华大学 Federal learning optimization method and device
CN113536667A (en) * 2021-06-22 2021-10-22 同盾科技有限公司 Federal model training method and device, readable storage medium and equipment
CN113536667B (en) * 2021-06-22 2024-03-01 同盾科技有限公司 Federal model training method, federal model training device, readable storage medium and federal model training device
CN114363176A (en) * 2021-12-20 2022-04-15 中山大学 Network identification method, device, terminal and medium based on federal learning
CN114363176B (en) * 2021-12-20 2023-08-08 中山大学 Network identification method, device, terminal and medium based on federal learning

Also Published As

Publication number Publication date
CN110782042A (en) 2020-02-11
CN110782042B (en) 2022-02-11

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20881497; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 20881497; Country of ref document: EP; Kind code of ref document: A1)