WO2021083276A1 - Method, device, and apparatus for combining horizontal federation and vertical federation, and medium - Google Patents


Info

Publication number
WO2021083276A1
Authority
WO
WIPO (PCT)
Prior art keywords
preset
federation
model
reinforcement learning
vertical
Prior art date
Application number
PCT/CN2020/124846
Other languages
French (fr)
Chinese (zh)
Inventor
梁新乐
刘洋
陈天健
董苗波
Original Assignee
深圳前海微众银行股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021083276A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method, device, and apparatus for combining a horizontal federation and a vertical federation, and a medium. The method for combining a horizontal federation and a vertical federation comprises: acquiring available public information, and inputting the available public information into a preset vertical federation server to obtain vector information (S10); training, on the basis of the vector information, a vertical federation model of the preset vertical federation server, and updating network weights of respective preset reinforcement learning models (S20); and regularly inputting each of the updated preset reinforcement learning models into a preset horizontal federation server, and iteratively updating each of the updated preset reinforcement learning models (S30). The method solves the technical problem in the prior art in which reinforcement learning models consume a considerable amount of computing system resources.

Description

Method, device, and apparatus for combining horizontal federation and vertical federation, and medium
This application claims priority to Chinese patent application No. 201911035368.0, filed with the Chinese Patent Office on October 29, 2019 and entitled "Method, device, and apparatus for combining horizontal federation and vertical federation, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of machine learning in financial technology (Fintech), and in particular to a method, device, apparatus, and medium for combining horizontal federation and vertical federation.
Background
With the continuous development of financial technology, especially Internet finance, more and more technologies (such as distributed computing, blockchain, and artificial intelligence) are being applied in the financial field. At the same time, the financial industry places higher demands on these technologies, for example, higher requirements for the distribution of its pending tasks.
With the gradual development of artificial intelligence, the use of reinforcement learning for optimal control in industry has been widely studied. In the prior art, a reinforcement learning model typically learns, optimizes, and controls using data it collects itself, but self-collected data often has limitations. For example, the radar of an unmanned vehicle cannot see through occlusions, and the limited mounting height of its image sensor prevents the vehicle from obtaining comprehensive data (such as the distribution and operating states of surrounding vehicles). This leads to low sample-processing efficiency and poor control performance of the reinforcement learning model. Furthermore, under these conditions, obtaining good optimal-control results through the reinforcement learning model learning, optimizing, and controlling entirely on its own consumes a large amount of computing system resources. The prior art therefore suffers from the technical problem that reinforcement learning models consume a large amount of computing system resources.
Technical Solution
The main purpose of the present invention is to provide a method, device, apparatus, and medium for combining horizontal federation and vertical federation, aiming to solve the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
To achieve the above objective, an embodiment of the present invention provides a method for combining horizontal federation and vertical federation. The method is applied to an apparatus for combining horizontal federation and vertical federation and comprises:
acquiring available public information, and inputting the available public information into a preset vertical federation server to obtain vector information;
training a vertical federation model of the preset vertical federation server based on the vector information, and updating the network weights of each preset reinforcement learning model; and
regularly inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
In addition, to achieve the above objective, the present invention further provides a device for combining horizontal federation and vertical federation. The device is applied to an apparatus for combining horizontal federation and vertical federation and comprises:
an input module, configured to acquire available public information and input the available public information into a preset vertical federation server to obtain vector information;
a first update module, configured to train a vertical federation model of the preset vertical federation server based on the vector information and update the network weights of each preset reinforcement learning model; and
a second update module, configured to regularly input each updated preset reinforcement learning model into a preset horizontal federation server and iteratively update each updated preset reinforcement learning model.
In addition, to achieve the above objective, the present invention further provides an apparatus for combining horizontal federation and vertical federation, comprising a memory, a processor, and a program for the method for combining horizontal federation and vertical federation that is stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method for combining horizontal federation and vertical federation described above.
In addition, to achieve the above objective, the present invention further provides a medium, which is a computer-readable storage medium storing a program that implements the method for combining horizontal federation and vertical federation, wherein the program, when executed by a processor, implements the steps of the method for combining horizontal federation and vertical federation described above.
In this application, available public information is acquired and input into a preset vertical federation server to obtain vector information; based on the vector information, the vertical federation model of the preset vertical federation server is trained and the network weights of each preset reinforcement learning model are updated; further, each updated preset reinforcement learning model is regularly input into a preset horizontal federation server and iteratively updated.
That is, this application first acquires available public information, inputs it into the preset vertical federation server to obtain vector information, then trains the vertical federation model based on the vector information to update the network weights of each preset reinforcement learning model, and finally inputs each updated preset reinforcement learning model into the preset horizontal federation server at regular intervals for iterative updating. By inputting the available public information into the preset vertical federation model and performing vertical federated learning on it before updating each preset reinforcement learning model, the training data used for model training becomes more comprehensive and broad, so the control performance of the models is improved and the models become more robust, avoiding training a model on a single source of local data. Further, by regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server and performing horizontal federated learning to iteratively update them, the effective training data of each preset reinforcement learning model is increased, which reduces training passes with poor training effect and, in turn, reduces the computing system resources consumed by each individual preset reinforcement learning model. This solves the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
Brief Description of the Drawings
The drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the principles of the present invention.
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a first embodiment of the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 2 is a schematic diagram of the reinforcement learning architecture based on hybrid horizontal and vertical federation in the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 3 is a schematic flowchart of another embodiment of the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 4 is a schematic flowchart of a second embodiment of the method for combining horizontal federation and vertical federation according to the present invention;
FIG. 5 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present invention.
The realization of the objectives, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Embodiments of the Present Invention
It should be understood that the specific embodiments described herein are intended only to explain the present invention, not to limit it.
The present invention provides a method for combining horizontal federation and vertical federation, applied to an apparatus for combining horizontal federation and vertical federation. In a first embodiment of the method of this application, referring to FIG. 1, the method comprises:
Step S10: acquiring available public information, and inputting the available public information into a preset vertical federation server to obtain vector information.
In this embodiment, it should be noted that the vector information refers to the gradient information generated during the training of a preset reinforcement learning model. The gradient is a vector obtained by taking the partial derivatives of a preset loss function; its negative direction is the direction in which the current function value approaches a minimum, that is, the direction in which the loss function value decreases fastest, and the step size along the gradient corresponds to the maximum rate of change of the loss function value. The preset vertical federation server is a pre-configured server that can join different preset reinforcement learning models for vertical federated learning. Vertical federated learning applies when the participants' data features overlap little while their users overlap heavily: the users that the participants share, together with their differing user data features, are taken out for joint machine learning training. For example, suppose two participants A and B belong to the same region, where A is a bank and B is an e-commerce platform. A and B share many users in that region, but because their businesses differ, the user data features they record differ and may in fact be complementary. In such a scenario, vertical federated learning can be used to help A and B build a joint machine learning prediction model and provide better services to their customers.
Specifically, a preset reinforcement learning model sends a message request to the preset vertical federation server, where the message request includes identification information. Based on the identification information, the public-information federation party obtains the corresponding available public information from a preset public data source and inputs it into its vertical federation model to obtain the vector information. For example, suppose the vertical federation model is trained by batch gradient descent: the available public information is input into the vertical federation model as a batch of training values to obtain the model's output values, and the degree of difference between the output values and the true values corresponding to the training values, that is, the current error value of this training pass, is computed. Partial derivatives of the preset loss function, a quadratic function of the model weights and the model error, are then taken with respect to the model error and the model weights of the vertical federation model, yielding the partial-derivative values jointly corresponding to the current weight value and the current error value, that is, the gradient vector value, which is the vector information.
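For illustration only (the patent does not prescribe a concrete model or loss), the following Python sketch shows how a batch of available public information could be turned into a gradient, i.e. the "vector information", for a hypothetical linear model under a mean-squared-error loss; all function and variable names here are assumptions.

```python
import numpy as np

def batch_gradient(weights, inputs, targets):
    """Batch gradient of an MSE loss for a toy linear model y = X @ w.

    The linear model and MSE loss are illustrative assumptions; the
    patent only states that partial derivatives of a preset loss
    function are taken with respect to the model weights and error.
    """
    predictions = inputs @ weights           # current output values
    errors = predictions - targets           # current error values
    # Gradient of (1/2n) * sum(errors^2) with respect to w:
    return inputs.T @ errors / len(targets)  # the "vector information"

# Hypothetical usage with one batch of available public information:
X = np.array([[1.0, 2.0], [0.5, 1.5], [2.0, 0.5]])  # batch of inputs
y = np.array([3.0, 2.0, 2.5])                        # true values
w = np.zeros(2)                                      # current weights
grad = batch_gradient(w, X, y)
w -= 0.1 * grad  # one descent step along the negative gradient
```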
In step S10, the step of acquiring available public information includes:
Step S11: receiving a message request from a preset reinforcement learning model, and obtaining the identification information in the message request through the preset vertical federation party.
In this embodiment, specifically, the message request of each preset reinforcement learning model is sent to the public-information federation party, which then extracts the identification information from the message request. The message request includes identification information such as geographic coordinates and license plate numbers, and the identification information can be extracted by methods such as tag matching and keyword matching.
Step S12: based on the identification information, matching, through the preset vertical federation party, the available public information corresponding to the identification information in a preset public data source.
In this embodiment, it should be noted that the public data source includes the model training information of numerous reinforcement learning models, where the model training information includes both available public information and unavailable public information.
Specifically, the identification information includes identification tags, identification keywords, identification strings, and the like. The preset vertical federation party compares the model training information in the public data source item by item and selects the model training information containing the identification information, thereby obtaining the available public information.
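A minimal sketch of the item-by-item matching described above, assuming a hypothetical record layout (a set of tags plus a payload) that the patent does not specify:

```python
def match_available_public_info(public_data_source, identification_info):
    """Select the model training information whose tags contain all of
    the identification information (tag/keyword matching)."""
    matches = []
    for record in public_data_source:
        # Keep records carrying every identifier, e.g. location
        # coordinates or a license plate number.
        if identification_info <= record["tags"]:
            matches.append(record["payload"])
    return matches

# Hypothetical usage:
source = [
    {"tags": {"plate:ABC123", "region:shenzhen"}, "payload": "..."},
    {"tags": {"region:beijing"}, "payload": "..."},
]
available = match_available_public_info(source, {"plate:ABC123"})
```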
In step S10, the preset vertical federation server includes a vertical federation model, and the vertical federation model includes a current weight value.
The step of inputting the available public information into the preset vertical federation server to obtain vector information includes:
Step S13: inputting the available public information into the vertical federation model as a current input value to obtain a current output value.
In this embodiment, it should be noted that the vertical federation model includes a neural network model, and each current input value corresponds to one current output value.
Specifically, the available public information is input into the vertical federation model as the current input value and processed by preset data-processing methods, which include convolution, pooling, full connection, and the like. Assuming the current input value is an image, convolution refers to the element-wise multiplication and summation of the image matrix with a convolution kernel to obtain image feature values, where the convolution kernel is the weight matrix corresponding to the image features; pooling refers to integrating the feature values obtained by convolution into new feature values; and full connection can be viewed as a special convolution whose result is a one-dimensional vector corresponding to the image. The current output value is thereby obtained, and it may be an image, a vector, a classification result, a feature value, or the like.
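The convolution, pooling, and full-connection operations described above can be sketched as follows; this is an illustrative NumPy rendering with assumed shapes, not the patented implementation.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: element-wise multiplication and summation
    of the kernel (the weight matrix) over each image patch."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Integrate convolved feature values into new feature values by
    taking the maximum over non-overlapping size x size windows."""
    h = feature_map.shape[0] - feature_map.shape[0] % size
    w = feature_map.shape[1] - feature_map.shape[1] % size
    return (feature_map[:h, :w]
            .reshape(h // size, size, w // size, size)
            .max(axis=(1, 3)))

def fully_connected(feature_map, weight_matrix):
    """Flatten to a one-dimensional vector and apply a weight matrix."""
    return weight_matrix @ feature_map.reshape(-1)

# Hypothetical shapes for illustration:
image = np.random.rand(6, 6)
kernel = np.random.rand(3, 3)                 # convolution kernel
features = max_pool(conv2d(image, kernel))    # (4, 4) -> (2, 2)
output = fully_connected(features, np.random.rand(3, 4))  # length 3
```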
Step S14: comparing the current output value with a preset current true value to obtain a current error value.
In this embodiment, it should be noted that each current input value corresponds to a current true value, which is the theoretical output value of the model.
Specifically, for example, if the current output value is X and the preset current true value is Y, then the difference between the output value and the true value is X-Y, and the current error value is (X-Y)/X.
Step S15: based on the current weight value and the current error value, taking partial derivatives of the preset loss function to obtain the vector information jointly corresponding to the current weight value and the current error value.
In this embodiment, it should be noted that the preset loss function is a quadratic function of the model weights and the model error.
Specifically, the partial derivatives of the preset loss function with respect to the model weights and the model error are computed. Since the current weight value and the current error value locate a specific point of the preset loss function, the partial derivatives at that point are obtained, yielding the vector information jointly corresponding to the current weight value and the current error value. For example, if the preset loss function is f(x, y), with model weight x and model error y, then the gradient vector, that is, the vector of partial derivatives, is (∂f(x,y)/∂x, ∂f(x,y)/∂y); if the current weight value is 0.5 and the current error value is 0.1, the vector information is the gradient vector value at x = 0.5, y = 0.1.
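For concreteness, take a hypothetical quadratic loss f(x, y) = x² + 2y² (the patent leaves f unspecified); the gradient at the current weight value 0.5 and current error value 0.1 then evaluates to:

```latex
\nabla f(x, y) = \left( \frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y} \right) = (2x,\; 4y),
\qquad
\nabla f(0.5,\, 0.1) = (1.0,\; 0.4).
```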
Step S20: based on the vector information, training the vertical federation model of the preset vertical federation server and updating the network weights of each preset reinforcement learning model.
In this embodiment, it should be noted that the vector information includes a gradient vector.
Specifically, based on the vector information, the vertical federation model of the preset vertical federation server is trained to obtain sample information; then, based on the sample information, each preset reinforcement learning model is trained and its network weights are updated.
Step S30: regularly inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
In this embodiment, it should be noted that the preset horizontal federation server is a pre-configured server that can join different preset reinforcement learning models for horizontal federated learning. Horizontal federated learning applies when the participants' data features overlap heavily while their users overlap little: the data whose features are the same across participants but whose users are not identical is taken out for joint machine learning. For example, suppose two participating banks are located in different regions; their user groups come from their respective regions and intersect very little, but their businesses are very similar, so most of the recorded user data features are the same. Horizontal federated learning can then be used to help the two banks build a joint model to predict their customers' behavior. In addition, all information exchanges in this embodiment can optionally be encrypted, with the user deciding whether encryption is applied.
Specifically, the updated model parameters of each preset reinforcement learning model are regularly input into the preset horizontal federation server, where they are fused into global model parameters; the model parameters include gradient information, weight information, and the like. The global model parameters are then distributed to each preset reinforcement learning model, which uses them either as the starting point for local model training or as the latest parameters of the local model, in order to start or continue training the preset reinforcement learning model. FIG. 2 shows the reinforcement learning architecture based on hybrid horizontal and vertical federation, in which reinforcement learning Agent1 and reinforcement learning Agent2 are different reinforcement learning models, the data store is a repository holding sample information, the data source receives the sensor data sent by each preset reinforcement learning model, and the controller performs the operation corresponding to the control information.
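One round of this periodic fusion and redistribution might look like the following sketch, assuming plain or weighted averaging as the preset fusion rule; the agent interface (get_parameters / set_parameters) is a hypothetical stand-in.

```python
import numpy as np

def fuse_model_parameters(local_parameters, proportions=None):
    """Fuse the model parameters uploaded by each preset reinforcement
    learning model into global model parameters, by uniform averaging
    or by a user-set weighted average (both are named in the text)."""
    stacked = np.stack(local_parameters)
    if proportions is None:
        return stacked.mean(axis=0)
    proportions = np.asarray(proportions, dtype=float)
    return (proportions / proportions.sum()) @ stacked

def federation_round(agents):
    """Collect, fuse, and redistribute parameters; each agent then
    continues local training from the global model parameters."""
    global_params = fuse_model_parameters(
        [agent.get_parameters() for agent in agents])
    for agent in agents:
        agent.set_parameters(global_params)  # local training start point
```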
In this embodiment, available public information is acquired and input into the preset vertical federation server to obtain vector information; based on the vector information, the vertical federation model of the preset vertical federation server is trained and the network weights of each preset reinforcement learning model are updated; further, each updated preset reinforcement learning model is regularly input into the preset horizontal federation server and iteratively updated. By inputting the available public information into the preset vertical federation model and performing vertical federated learning before updating each preset reinforcement learning model, the training data used for model training becomes more comprehensive and broad, so the control performance of the models is improved and the models become more robust. Further, regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server for horizontal federated learning and iterative updating further improves the control performance and robustness of the models, increases the effective training data of each preset reinforcement learning model, reduces training passes with poor training effect, and in turn reduces the computing system resources consumed by each individual preset reinforcement learning model. This solves the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
Further, referring to FIG. 3, based on the first embodiment of this application, in another embodiment of the method for combining horizontal federation and vertical federation, the step of training each preset reinforcement learning model based on the vector information to update each preset reinforcement learning model includes:
Step S21: receiving the sensor data sent by each preset reinforcement learning model, and generating control information through the vertical federation model based on the sensor data and the vector information.
In this embodiment, it should be noted that, based on the control information, a preset controller can control the preset reinforcement learning model. For example, if the vertical federation model corresponds to an unmanned vehicle, the control information can control the vehicle's travel speed and direction.
Specifically, the sensor data is obtained from the local data source corresponding to each preset reinforcement learning model and sent to the preset public federation party. The sensor data includes distance sensor data, pressure sensor data, speed sensor data, and the like; that is, the sensor data indicates the state information of the vertical federation model at the current time step. Control information is then generated through the vertical federation model based on the sensor data and the vector information. The direction of the gradient vector corresponding to the vector information is the direction in which the vertical federation model needs to be trained, so that the model is trained toward the state information of the next time step, and the control information can steer the vertical federation model toward that next-time-step state.
Step S22: training the vertical federation model in the training environment corresponding to the control information, to obtain reward information and next-time-step state information.
In this embodiment, it should be noted that the reward information is computed by a preset reward function, which introduces non-linear factors into the vertical federation model. The next-time-step state information is the model state information of the vertical federation model after its network weights have been updated following training. Before the vertical federation model is updated, that is, before the next-time-step state information is obtained, it is judged whether the update helps reduce the model error: the update is applied only if it reduces the model error and is skipped otherwise.
Specifically, in the training environment corresponding to the control information, the vertical federation model is trained, and the reward information and network weights of each neuron of the neural network in the vertical federation model are obtained, that is, the reward information and the next-time-step state information are obtained, where the neural network includes convolutional layers, pooling layers, fully connected layers, and the like.
Step S23: storing the reward information, the next-time-step state information, and the control information as sample information, and updating the network weights of each preset reinforcement learning model based on the sample information.
In this embodiment, specifically, the reward information, the next-time-step state information, and the control information are merged into sample information and stored in the data store corresponding to each preset reinforcement learning model. Each preset reinforcement learning model can then draw sample information from its corresponding data store for training and update its network weights according to the training results.
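The per-agent data store described above behaves like an experience replay buffer; a minimal sketch, with the capacity and sampling strategy as illustrative assumptions:

```python
import random
from collections import deque

class SampleStore:
    """Data store holding (control, reward, next-state) sample
    information for one preset reinforcement learning model."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, control_info, reward_info, next_state_info):
        # Merge the three pieces of information into one sample.
        self.buffer.append((control_info, reward_info, next_state_info))

    def sample(self, batch_size):
        # Draw a training batch for the reinforcement learning model.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```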
In step S23, the step of updating the network weights of each preset reinforcement learning model based on the sample information includes:
Step S231: inputting the sample information as training data into the preset reinforcement learning model to train the preset reinforcement learning model and obtain a training output value.
In this embodiment, specifically, the sample information is input into the preset reinforcement learning model as training data, and the training data undergoes data processing including convolution, pooling, full connection, and the like, to obtain the training output value, which may be an image, a vector, a numeric value, or the like.
Step S232: comparing the training output value with the true output value corresponding to the training data to obtain a model error value.
In this embodiment, the training output value is compared with the true output value corresponding to the training data to obtain the model error value. Specifically, for example, if the training output value is X and the true output value is Y, the difference between them is X-Y, and the model error value is (X-Y)/X.
Step S233: comparing the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, completing the training of the preset reinforcement learning model.
In this embodiment, it should be noted that the model error value being less than the preset error threshold is one of the optional completion conditions for the training of the preset reinforcement learning model. The completion conditions also include convergence of the loss function, convergence of the model parameters, reaching the maximum number of iterations, reaching the maximum training time, and the like, where the model parameters include the model error value.
Step S234: if the model error value is greater than or equal to the preset error threshold, updating the network weights of the preset reinforcement learning model based on the model error value and retraining the preset reinforcement learning model.
In this embodiment, it should be noted that the network weights are the convolution kernels or weight matrices.
Specifically, if the model error value is greater than or equal to the preset error threshold, the corresponding gradient vector value is obtained based on the model error value, the network weights of the preset reinforcement learning model are updated based on that gradient vector value, and the preset reinforcement learning model is retrained until the preset training completion condition is reached.
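Steps S231 through S234 amount to the following training loop, sketched with a hypothetical model interface (forward / error / gradient / update_weights) that the patent does not prescribe:

```python
def train_until_threshold(model, samples, error_threshold, max_iterations=1000):
    """Train on sample information until the model error value falls
    below the preset error threshold (or iterations run out)."""
    for _ in range(max_iterations):
        training_output = model.forward(samples.inputs)              # S231
        error_value = model.error(training_output, samples.targets)  # S232
        if error_value < error_threshold:                            # S233
            return model                       # training is complete
        gradient = model.gradient(error_value)                       # S234
        model.update_weights(gradient)         # update and retrain
    return model
```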
In this embodiment, the sensor data sent by each preset reinforcement learning model is received; control information is generated through the vertical federation model based on the sensor data and the vector information; the vertical federation model is trained in the training environment corresponding to the control information to obtain reward information and next-time-step state information; and the reward information, next-time-step state information, and control information are stored as sample information, based on which the network weights of each preset reinforcement learning model are updated. By converting the available public information corresponding to each preset reinforcement learning model into sample information, this embodiment achieves the goal of jointly using the data of multiple preset reinforcement learning models to train and update each of them, which greatly enhances the control performance and robustness of each preset reinforcement learning model and reduces the model training time and training workload of each individual model, thereby reducing its computing system resource consumption. This lays a foundation for solving the technical problem in the prior art that reinforcement learning models consume a large amount of computing system resources.
Further, referring to FIG. 4, based on the first and second embodiments of this application, in another embodiment of the method for combining horizontal federation and vertical federation, the step of regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server and iteratively updating each updated preset reinforcement learning model includes:
Step S31: regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server, to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model.
In this embodiment, it should be noted that the preset horizontal federation server is a pre-configured server for horizontal federated learning, and the interval of the regular updates can be set by the user. For example, if the interval is set to 10 minutes, each updated preset reinforcement learning model is sent to the preset horizontal federation server every 10 minutes.
Specifically, each updated preset reinforcement learning model is regularly input into the preset horizontal federation server, so that the model parameters of each preset reinforcement learning model are sent to the horizontal federation server, where they are fused into global model parameters; each preset reinforcement learning model is then updated based on the global model parameters to obtain the horizontal federation model.
Each updated preset reinforcement learning model includes updated model parameters.
The step of regularly inputting each updated preset reinforcement learning model into the preset horizontal federation server to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model includes:
Step S311: regularly inputting each set of updated model parameters into the preset horizontal federation server, to fuse the updated model parameters and obtain global model parameters.
In this embodiment, specifically, each set of updated model parameters is input into the preset horizontal federation server and processed according to preset rules, which include averaging, weighted averaging, and the like, to obtain the global model parameters, where the weight proportion of each set of updated model parameters participating in the weighted average is set by the user.
Step S312: distributing the global model parameters to each updated preset reinforcement learning model, to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model.
In this embodiment, specifically, the global model parameters are distributed to each updated preset reinforcement learning model, either as the starting point for local model training or to directly replace the local model parameters of each preset reinforcement learning model; the updated preset reinforcement learning models are then trained to obtain the horizontal federation model.
Step S32: iteratively updating each updated preset reinforcement learning model based on the horizontal federation model.
In this embodiment, specifically, based on the global model parameters in the horizontal federation model, the global model parameters are used as the training starting point of each preset reinforcement learning model or directly replace its local model parameters; the updated preset reinforcement learning model is then trained, and it is judged whether the trained model has reached a training completion condition. If so, the training of the preset reinforcement learning model is complete; if not, the network weights of the preset reinforcement learning model are updated and the model is retrained until the training completion condition is reached. The training completion conditions include convergence of the loss function, convergence of the model parameters, reaching the maximum number of iterations, reaching the maximum training time, and the like.
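The alternative training completion conditions listed above could be checked jointly, as in this hypothetical helper (the thresholds and the fields of `state` are illustrative assumptions):

```python
import time

def training_complete(state, *, loss_tol=1e-6, param_tol=1e-6,
                      max_iterations=10000, max_seconds=3600.0):
    """True when any preset completion condition holds: loss
    convergence, parameter convergence, maximum iteration count,
    or maximum training time."""
    return (
        abs(state.loss - state.previous_loss) < loss_tol       # loss converged
        or state.parameter_delta < param_tol                   # params converged
        or state.iteration >= max_iterations                   # max iterations
        or time.monotonic() - state.start_time >= max_seconds  # max time
    )
```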
In this embodiment, each updated preset reinforcement learning model is regularly input into the preset horizontal federation server to perform horizontal federation based on preset federation rules and obtain a horizontal federation model, and each updated preset reinforcement learning model is then iteratively updated based on the horizontal federation model. This embodiment thus provides a method of performing horizontal federation: by regularly inputting the updated preset reinforcement learning models into the preset horizontal federation server, jointly learning across them, obtaining the horizontal federation model they jointly correspond to, and iteratively updating them based on it, the control performance and robustness of the models are further improved, and the model training time and training workload of each individual preset reinforcement learning model are reduced, in turn reducing its computing system resource consumption. This lays a foundation for solving the technical problems of poor control performance and low robustness of reinforcement learning models in the prior art.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present invention.
As shown in FIG. 5, the apparatus for combining horizontal federation and vertical federation may include a processor 1001 such as a CPU, a memory 1005, and a communication bus 1002, where the communication bus 1002 implements connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Optionally, the apparatus for combining horizontal federation and vertical federation may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and the like. The rectangular user interface may include a display (Display) and an input sub-module such as a keyboard (Keyboard); optionally, it may also include standard wired and wireless interfaces. The network interface may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
Those skilled in the art will understand that the structure of the apparatus for combining horizontal federation and vertical federation shown in FIG. 5 does not constitute a limitation on the apparatus; the apparatus may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 5, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, and a program for combining horizontal federation and vertical federation. The operating system is a program that manages and controls the hardware and software resources of the apparatus and supports the running of the program for combining horizontal federation and vertical federation as well as other software and/or programs. The network communication module implements communication among the components within the memory 1005 and with other hardware and software in the system for combining horizontal federation and vertical federation.
In the apparatus for combining horizontal federation and vertical federation shown in FIG. 5, the processor 1001 is configured to execute the program for combining horizontal federation and vertical federation stored in the memory 1005, implementing the steps of any one of the methods for combining horizontal federation and vertical federation described above.
The specific implementation of the apparatus for combining horizontal federation and vertical federation of the present invention is substantially the same as the embodiments of the method for combining horizontal federation and vertical federation described above, and is not repeated here.
The present invention further provides a horizontal federation and vertical federation combined apparatus, the horizontal federation and vertical federation combined apparatus including:

an input module, configured to acquire available public information, and input the available public information into a preset vertical federation service party to obtain vector information;

a first update module, configured to train a vertical federation model of the preset vertical federation service party based on the vector information, and update network weights of each preset reinforcement learning model;

a second update module, configured to periodically input each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively update each updated preset reinforcement learning model.
Optionally, the first update module includes:

an acquisition unit, configured to receive sensor data sent by each preset reinforcement learning model, and generate control information through the vertical federation model based on the sensor data and the vector information;

a first training unit, configured to train the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;

a first update unit, configured to store the reward information, the next-time-step state information, and the control information as sample information, and update the network weights of each preset reinforcement learning model based on the sample information (an illustrative sketch follows this list).
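As an illustrative, non-limiting sketch of the acquisition unit, first training unit, and first update unit cooperating, the following Python fragment collects (reward, next-state, control) tuples as sample information and uses them to update a model's network weights. The `act`, `step`, and `train_on_samples` interfaces, and the replay-buffer design, are assumptions made for illustration only; the embodiment does not fix them.

```python
import random
from collections import deque

class FirstUpdateModuleSketch:
    """Collect sample information and update network weights (illustrative)."""

    def __init__(self, vertical_model, environment, buffer_size=10_000):
        self.vertical_model = vertical_model   # vertical federation model (assumed API)
        self.environment = environment         # training environment (assumed API)
        self.samples = deque(maxlen=buffer_size)

    def collect_step(self, sensor_data, vector_info):
        # Acquisition unit: generate control information from the sensor data
        # and vector information through the vertical federation model.
        control_info = self.vertical_model.act(sensor_data, vector_info)
        # First training unit: train in the environment corresponding to the
        # control information; obtain reward and next-time-step state.
        next_state, reward = self.environment.step(control_info)
        # First update unit: store the tuple as sample information.
        self.samples.append((reward, next_state, control_info))
        return next_state

    def update_weights(self, rl_model, batch_size=32):
        # Update a preset reinforcement learning model from stored samples.
        batch = random.sample(list(self.samples),
                              min(batch_size, len(self.samples)))
        rl_model.train_on_samples(batch)       # assumed training interface
```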
Optionally, the first update unit includes:

a first training subunit, configured to input the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;

a comparison subunit, configured to compare the training output value with a real output value corresponding to the training data, to obtain a model error value;

a first judgment subunit, configured to compare the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, complete the training of the preset reinforcement learning model;

a second judgment subunit, configured to, if the model error value is greater than or equal to the preset error threshold, update the network weights of the preset reinforcement learning model based on the model error value and retrain the preset reinforcement learning model (a minimal training-loop sketch follows this list).
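A minimal sketch of the train, compare, and judge loop just described, assuming placeholder `forward`, `loss`, and `backward_and_update` methods (the embodiment does not fix the model form, the loss, or the update rule):

```python
def train_until_below_threshold(model, training_data, real_outputs,
                                error_threshold, max_rounds=1000):
    """Train, compare against the real output value, and either finish
    (error below the preset threshold) or update weights and retrain."""
    for _ in range(max_rounds):
        training_output = model.forward(training_data)           # training output value
        model_error = model.loss(training_output, real_outputs)  # model error value
        if model_error < error_threshold:    # first judgment subunit: done
            return model
        model.backward_and_update(model_error)  # second judgment subunit:
                                                # update weights and retrain
    return model
```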
Optionally, the second update module includes:

a periodic sending unit, configured to periodically input each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;

a second update unit, configured to iteratively update each updated preset reinforcement learning model based on the horizontal federation model.
Optionally, the periodic sending unit includes:

a fusion subunit, configured to periodically input each updated model parameter into the preset horizontal federation server, so as to fuse each updated model parameter and obtain global model parameters;

a second training subunit, configured to distribute the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model (a fusion sketch follows this list).
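A sketch of the fusion and distribution steps, assuming the preset federation rule is sample-size-weighted parameter averaging (a FedAvg-style rule; the embodiment leaves the rule itself unspecified) and that each model's parameters are a list of NumPy arrays, one per layer:

```python
import numpy as np

def fuse_model_parameters(client_params, client_sizes):
    """Fusion subunit sketch: fuse each client's updated model parameters
    into global model parameters by sample-size-weighted averaging."""
    total = float(sum(client_sizes))
    num_layers = len(client_params[0])
    return [
        sum(params[layer] * (size / total)
            for params, size in zip(client_params, client_sizes))
        for layer in range(num_layers)
    ]

def distribute(global_params, clients):
    """Second training subunit sketch: send the global model parameters back
    so each updated preset reinforcement learning model trains from them."""
    for client in clients:
        client.set_weights(global_params)   # assumed client interface

# Example usage with two clients and single-layer "models":
params_a = [np.array([1.0, 2.0])]
params_b = [np.array([3.0, 4.0])]
global_params = fuse_model_parameters([params_a, params_b], [100, 300])
# -> [array([2.5, 3.5])]: the larger client contributes 3/4 of the weight.
```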
Optionally, the input module includes:

an input unit, configured to input the available public information into the vertical federation model as a current input value, to obtain a current output value;

a comparison unit, configured to compare the current output value with a preset current true value, to obtain a current error value;

a partial-derivative unit, configured to take a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value (a worked sketch follows this list).
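To make the input, compare, and differentiate chain concrete, the following sketch assumes, purely for illustration, a linear vertical federation model and a squared-error preset loss function; under those assumptions the partial derivative with respect to the current weight value is the familiar gradient expression, which plays the role of the vector information. Neither the model form nor the loss function is fixed by the embodiment.

```python
import numpy as np

def compute_vector_info(current_weights, public_info, preset_true_value):
    """Input unit, comparison unit, and partial-derivative unit in sequence
    (illustrative: linear model, squared-error loss)."""
    # Input unit: current output value from the available public information.
    current_output = public_info @ current_weights
    # Comparison unit: current error value against the preset current true value.
    current_error = current_output - preset_true_value
    # Partial-derivative unit: for L(w) = 0.5 * ||X @ w - y||^2,
    # dL/dw = X.T @ (X @ w - y); this gradient serves as the vector information.
    return public_info.T @ current_error

# Example: 3 records of available public information with 2 features each.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([0.5, -0.5])
y = np.array([1.0, 0.0, 0.5])
print(compute_vector_info(w, X, y))   # vector information for this weight value
```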
Optionally, the input module includes:

a receiving unit, configured to receive a message request of a preset reinforcement learning model, and acquire identification information in the message request through a preset vertical federation party;

a matching unit, configured to match, based on the identification information, the available public information corresponding to the identification information in a preset public data source through the preset vertical federation party (a lookup sketch follows this list).
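A minimal sketch of the receive-then-match flow, assuming the message request is a dictionary carrying an "id" field and the preset public data source is a key-value mapping; both representations are assumptions for illustration only:

```python
def acquire_available_public_info(message_request, public_data_source):
    """Receiving unit: read the identification information from the message
    request. Matching unit: match the corresponding available public
    information in the preset public data source."""
    identification = message_request["id"]          # assumed field name
    return public_data_source.get(identification)   # None if no match

# Example usage:
source = {"user-42": {"region": "south", "segment": "retail"}}
print(acquire_available_public_info({"id": "user-42"}, source))
```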
The specific implementation of the horizontal federation and vertical federation combined apparatus of the present invention is substantially the same as the foregoing embodiments of the horizontal federation and vertical federation combined method, and details are not repeated here.

The present invention provides a medium, and the medium is a computer-readable storage medium. The medium stores one or more programs, and the one or more programs may further be executed by one or more processors to implement the steps of the horizontal federation and vertical federation combined method described in any one of the above.

The specific implementation of the medium of the present invention is substantially the same as the foregoing embodiments of the horizontal federation and vertical federation combined method, and details are not repeated here.
The above are only preferred embodiments of the present invention and are not intended to limit the patent scope of the present invention. Any equivalent structural or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (20)

  1. A horizontal federation and vertical federation combined method, wherein the horizontal federation and vertical federation combined method comprises:
    acquiring available public information, and inputting the available public information into a preset vertical federation service party to obtain vector information;
    training a vertical federation model of the preset vertical federation service party based on the vector information, and updating network weights of each preset reinforcement learning model;
    periodically inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
  2. The horizontal federation and vertical federation combined method according to claim 1, wherein the step of training the vertical federation model of the preset vertical federation service party based on the vector information to update the network weights of each preset reinforcement learning model comprises:
    receiving sensor data sent by each preset reinforcement learning model, and generating control information through the vertical federation model based on the sensor data and the vector information;
    training the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;
    storing the reward information, the next-time-step state information, and the control information as sample information, and updating the network weights of each preset reinforcement learning model based on the sample information.
  3. The horizontal federation and vertical federation combined method according to claim 2, wherein the step of updating the network weights of each preset reinforcement learning model based on the sample information comprises:
    inputting the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;
    comparing the training output value with a real output value corresponding to the training data, to obtain a model error value;
    comparing the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, completing the training of the preset reinforcement learning model;
    if the model error value is greater than or equal to the preset error threshold, updating the network weights of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model.
  4. The horizontal federation and vertical federation combined method according to claim 1, wherein the step of periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server and iteratively updating each updated preset reinforcement learning model comprises:
    periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;
    iteratively updating each updated preset reinforcement learning model based on the horizontal federation model.
  5. The horizontal federation and vertical federation combined method according to claim 4, wherein each updated preset reinforcement learning model comprises updated model parameters, and
    the step of periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on the preset federation rules and obtain the horizontal federation model, comprises:
    periodically inputting each updated model parameter into the preset horizontal federation server, so as to fuse each updated model parameter and obtain global model parameters;
    distributing the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model.
  6. The horizontal federation and vertical federation combined method according to claim 1, wherein the preset vertical federation service party comprises a vertical federation model, the vertical federation model comprises a current weight value, and
    the step of inputting the available public information into the preset vertical federation service party to obtain the vector information comprises:
    inputting the available public information into the vertical federation model as a current input value, to obtain a current output value;
    comparing the current output value with a preset current true value, to obtain a current error value;
    taking a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value.
  7. The horizontal federation and vertical federation combined method according to claim 1, wherein the step of acquiring the available public information comprises:
    receiving a message request of a preset reinforcement learning model, and acquiring identification information in the message request through a preset vertical federation party;
    matching, based on the identification information, the available public information corresponding to the identification information in a preset public data source through the preset vertical federation party.
  8. A horizontal federation and vertical federation combined apparatus, wherein the horizontal federation and vertical federation combined apparatus is applied to a horizontal federation and vertical federation combined device, and the horizontal federation and vertical federation combined apparatus comprises:
    an input module, configured to acquire available public information, and input the available public information into a preset vertical federation service party to obtain vector information;
    a first update module, configured to train a vertical federation model of the preset vertical federation service party based on the vector information, and update network weights of each preset reinforcement learning model;
    a second update module, configured to periodically input each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively update each updated preset reinforcement learning model.
  9. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the first update module comprises:
    an acquisition unit, configured to receive sensor data sent by each preset reinforcement learning model, and generate control information through the vertical federation model based on the sensor data and the vector information;
    a first training unit, configured to train the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;
    a first update unit, configured to store the reward information, the next-time-step state information, and the control information as sample information, and update the network weights of each preset reinforcement learning model based on the sample information.
  10. The horizontal federation and vertical federation combined apparatus according to claim 9, wherein the first update unit comprises:
    a first training subunit, configured to input the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;
    a comparison subunit, configured to compare the training output value with a real output value corresponding to the training data, to obtain a model error value;
    a first judgment subunit, configured to compare the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, complete the training of the preset reinforcement learning model;
    a second judgment subunit, configured to, if the model error value is greater than or equal to the preset error threshold, update the network weights of the preset reinforcement learning model based on the model error value and retrain the preset reinforcement learning model.
  11. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the second update module comprises:
    a periodic sending unit, configured to periodically input each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;
    a second update unit, configured to iteratively update each updated preset reinforcement learning model based on the horizontal federation model.
  12. The horizontal federation and vertical federation combined apparatus according to claim 11, wherein the periodic sending unit comprises:
    a fusion subunit, configured to periodically input each updated model parameter into the preset horizontal federation server, so as to fuse each updated model parameter and obtain global model parameters;
    a second training subunit, configured to distribute the global model parameters to each updated preset reinforcement learning model, so as to train the updated preset reinforcement learning models based on the global model parameters and obtain the horizontal federation model.
  13. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the input module comprises:
    an input unit, configured to input the available public information into the vertical federation model as a current input value, to obtain a current output value;
    a comparison unit, configured to compare the current output value with a preset current true value, to obtain a current error value;
    a partial-derivative unit, configured to take a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value.
  14. The horizontal federation and vertical federation combined apparatus according to claim 8, wherein the input module comprises:
    a receiving unit, configured to receive a message request of a preset reinforcement learning model, and acquire identification information in the message request through a preset vertical federation party;
    a matching unit, configured to match, based on the identification information, the available public information corresponding to the identification information in a preset public data source through the preset vertical federation party.
  15. A horizontal federation and vertical federation combined device, wherein the horizontal federation and vertical federation combined device comprises a memory, a processor, and a horizontal federation and vertical federation combined program stored on the memory and executable on the processor, and the horizontal federation and vertical federation combined program, when executed by the processor, implements the following steps:
    acquiring available public information, and inputting the available public information into a preset vertical federation service party to obtain vector information;
    training a vertical federation model of the preset vertical federation service party based on the vector information, and updating network weights of each preset reinforcement learning model;
    periodically inputting each updated preset reinforcement learning model into a preset horizontal federation server, and iteratively updating each updated preset reinforcement learning model.
  16. The horizontal federation and vertical federation combined device according to claim 15, wherein the step of training the vertical federation model of the preset vertical federation service party based on the vector information to update the network weights of each preset reinforcement learning model comprises:
    receiving sensor data sent by each preset reinforcement learning model, and generating control information through the vertical federation model based on the sensor data and the vector information;
    training the vertical federation model in a training environment corresponding to the control information, to obtain reward information and next-time-step state information;
    storing the reward information, the next-time-step state information, and the control information as sample information, and updating the network weights of each preset reinforcement learning model based on the sample information.
  17. The horizontal federation and vertical federation combined device according to claim 16, wherein the step of updating the network weights of each preset reinforcement learning model based on the sample information comprises:
    inputting the sample information into the preset reinforcement learning model as training data, so as to train the preset reinforcement learning model and obtain a training output value;
    comparing the training output value with a real output value corresponding to the training data, to obtain a model error value;
    comparing the model error value with a preset error threshold, and if the model error value is less than the preset error threshold, completing the training of the preset reinforcement learning model;
    if the model error value is greater than or equal to the preset error threshold, updating the network weights of the preset reinforcement learning model based on the model error value, and retraining the preset reinforcement learning model.
  18. The horizontal federation and vertical federation combined device according to claim 15, wherein the step of periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server and iteratively updating each updated preset reinforcement learning model comprises:
    periodically inputting each updated preset reinforcement learning model into the preset horizontal federation server, so as to perform horizontal federation on each updated preset reinforcement learning model based on preset federation rules and obtain a horizontal federation model;
    iteratively updating each updated preset reinforcement learning model based on the horizontal federation model.
  19. The horizontal federation and vertical federation combined device according to claim 15, wherein the preset vertical federation service party comprises a vertical federation model, the vertical federation model comprises a current weight value, and
    the step of inputting the available public information into the preset vertical federation service party to obtain the vector information comprises:
    inputting the available public information into the vertical federation model as a current input value, to obtain a current output value;
    comparing the current output value with a preset current true value, to obtain a current error value;
    taking a partial derivative of a preset loss function based on the current weight value and the current error value, to obtain vector information jointly corresponding to the current weight value and the current error value.
  20. A medium, wherein a program implementing a horizontal federation and vertical federation combined method is stored on the medium, and the program implementing the horizontal federation and vertical federation combined method is executed by a processor to implement the steps of the horizontal federation and vertical federation combined method according to any one of claims 1 to 7.
PCT/CN2020/124846 2019-10-29 2020-10-29 Method, device, and apparatus for combining horizontal federation and vertical federation, and medium WO2021083276A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911035368.0 2019-10-29
CN201911035368.0A CN110782042B (en) 2019-10-29 2019-10-29 Method, device, equipment and medium for combining horizontal federation and vertical federation

Publications (1)

Publication Number Publication Date
WO2021083276A1 (en)

Family

ID=69387208

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124846 WO2021083276A1 (en) 2019-10-29 2020-10-29 Method, device, and apparatus for combining horizontal federation and vertical federation, and medium

Country Status (2)

Country Link
CN (1) CN110782042B (en)
WO (1) WO2021083276A1 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782042B (en) * 2019-10-29 2022-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation
CN111353167A (en) * 2020-02-26 2020-06-30 深圳前海微众银行股份有限公司 Data discrimination method, device, equipment and storage medium based on multiple providers
CN111369042B (en) * 2020-02-27 2021-09-24 山东大学 Wireless service flow prediction method based on weighted federal learning
CN111383094A (en) * 2020-03-06 2020-07-07 深圳前海微众银行股份有限公司 Product service full-chain driving method, equipment and readable storage medium
CN111401552B (en) * 2020-03-11 2023-04-07 浙江大学 Federal learning method and system based on batch size adjustment and gradient compression rate adjustment
CN113392101A (en) * 2020-03-13 2021-09-14 京东城市(北京)数字科技有限公司 Method, main server, service platform and system for constructing horizontal federated tree
CN113554476B (en) * 2020-04-23 2024-04-19 京东科技控股股份有限公司 Training method and system of credit prediction model, electronic equipment and storage medium
CN112001500B (en) * 2020-08-13 2021-08-03 星环信息科技(上海)股份有限公司 Model training method, device and storage medium based on longitudinal federated learning system
CN112307331B (en) * 2020-10-14 2023-11-24 湖南天河国云科技有限公司 Intelligent recruitment information pushing method, system and terminal equipment for college graduates based on blockchain
CN112381428B (en) * 2020-11-19 2023-09-19 平安科技(深圳)有限公司 Service distribution method, device, equipment and storage medium based on reinforcement learning
CN112486180A (en) * 2020-12-10 2021-03-12 深圳前海微众银行股份有限公司 Vehicle control method, device, equipment, storage medium and program product
CN112560059B (en) * 2020-12-17 2022-04-29 浙江工业大学 Vertical federal model stealing defense method based on neural pathway feature extraction
CN112738035B (en) * 2020-12-17 2022-04-29 杭州趣链科技有限公司 Block chain technology-based vertical federal model stealing defense method
CN112560752B (en) * 2020-12-23 2024-03-26 杭州趣链科技有限公司 License plate recognition training method and device based on federal learning and related equipment
WO2022144001A1 (en) * 2020-12-31 2022-07-07 京东科技控股股份有限公司 Federated learning model training method and apparatus, and electronic device
CN113112026A (en) * 2021-04-02 2021-07-13 佳讯飞鸿(北京)智能科技研究院有限公司 Optimization method and device for federated learning model
WO2022226903A1 (en) * 2021-04-29 2022-11-03 浙江大学 Federated learning method for k-means clustering algorithm
CN113516250B (en) * 2021-07-13 2023-11-03 北京百度网讯科技有限公司 Federal learning method, device, equipment and storage medium
CN113673696B (en) * 2021-08-20 2024-03-22 山东鲁软数字科技有限公司 Power industry hoisting operation violation detection method based on reinforcement federal learning
CN114548426B (en) * 2022-02-17 2023-11-24 北京百度网讯科技有限公司 Asynchronous federal learning method, business service prediction method, device and system
CN115169576B (en) * 2022-06-24 2024-02-09 上海富数科技有限公司 Model training method and device based on federal learning and electronic equipment
CN115796309A (en) * 2022-09-20 2023-03-14 天翼电子商务有限公司 Horizontal and vertical combination algorithm for federated learning
CN115238065B (en) * 2022-09-22 2022-12-20 太极计算机股份有限公司 Intelligent document recommendation method based on federal learning
CN115759248B (en) * 2022-11-07 2023-06-13 吉林大学 Financial system analysis method and storage medium based on decentralised hybrid federal learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180316502A1 (en) * 2017-04-27 2018-11-01 Factom Data Reproducibility Using Blockchains
CN109167695A (en) * 2018-10-26 2019-01-08 深圳前海微众银行股份有限公司 Alliance Network construction method, equipment and readable storage medium storing program for executing based on federation's study
CN109871702A (en) * 2019-02-18 2019-06-11 深圳前海微众银行股份有限公司 Federal model training method, system, equipment and computer readable storage medium
CN110263936A (en) * 2019-06-14 2019-09-20 深圳前海微众银行股份有限公司 Laterally federation's learning method, device, equipment and computer storage medium
CN110245510A (en) * 2019-06-19 2019-09-17 北京百度网讯科技有限公司 Method and apparatus for predictive information
CN110782042A (en) * 2019-10-29 2020-02-11 深圳前海微众银行股份有限公司 Method, device, equipment and medium for combining horizontal federation and vertical federation

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113490184A (en) * 2021-05-10 2021-10-08 北京科技大学 Smart factory-oriented random access resource optimization method and device
CN113490184B (en) * 2021-05-10 2023-05-26 北京科技大学 Random access resource optimization method and device for intelligent factory
CN113238867A (en) * 2021-05-19 2021-08-10 浙江凡双科技有限公司 Federated learning method based on network unloading
CN113238867B (en) * 2021-05-19 2024-01-19 浙江凡双科技股份有限公司 Federal learning method based on network unloading
CN113515890A (en) * 2021-05-21 2021-10-19 华北电力大学 Renewable energy day-ahead scene generation method based on federal learning
CN113515890B (en) * 2021-05-21 2024-03-08 华北电力大学 Renewable energy day-ahead scene generation method based on federal learning
CN113435604A (en) * 2021-06-16 2021-09-24 清华大学 Method and device for optimizing federated learning
CN113435604B (en) * 2021-06-16 2024-05-07 清华大学 Federal learning optimization method and device
CN113536667A (en) * 2021-06-22 2021-10-22 同盾科技有限公司 Federal model training method and device, readable storage medium and equipment
CN113536667B (en) * 2021-06-22 2024-03-01 同盾科技有限公司 Federal model training method, federal model training device, readable storage medium and federal model training device
CN114363176A (en) * 2021-12-20 2022-04-15 中山大学 Network identification method, device, terminal and medium based on federal learning
CN114363176B (en) * 2021-12-20 2023-08-08 中山大学 Network identification method, device, terminal and medium based on federal learning

Also Published As

Publication number Publication date
CN110782042A (en) 2020-02-11
CN110782042B (en) 2022-02-11

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20881497; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 20881497; Country of ref document: EP; Kind code of ref document: A1)