CN115357931A - Federated transfer learning recommendation method based on homomorphic encryption - Google Patents

Federated transfer learning recommendation method based on homomorphic encryption

Info

Publication number
CN115357931A
CN115357931A (application CN202210983985.9A)
Authority
CN
China
Prior art keywords
network
model
data
learning
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210983985.9A
Other languages
Chinese (zh)
Inventor
吴刚
黎煜祺
薛其韵
张耿荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Pugongying Technology Co ltd
Original Assignee
Hangzhou Pugongying Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Pugongying Technology Co ltd
Priority to CN202210983985.9A
Publication of CN115357931A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/953 - Querying, e.g. by the use of web search engines
    • G06F 16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/602 - Providing cryptographic facilities or services
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/60 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to nutrition control, e.g. diets
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 - Network architectures or network communication protocols for network security
    • H04L 63/04 - Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L 63/0428 - Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L 63/0442 - Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/008 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption

Abstract

The invention discloses a recommendation method based on federated transfer learning with homomorphic encryption. A central server generates the initial training gradient of the federated learning model and encrypts it immediately on the server; the encrypted gradient is sent to each user's app side along a path chosen by a deep reinforcement learning algorithm. After receiving the initial gradient from the server, the app side trains on the user's private data; when training has iterated for a certain number of rounds, the result is encrypted and transmitted back to the central server, which generates the global model.

Description

Recommendation method based on federated transfer learning with homomorphic encryption
Technical Field
The invention relates to a recommendation method based on federated transfer learning with homomorphic encryption.
Background
Data elements are the foundation of artificial intelligence: rich accumulated data, rising compute capability, and rapidly falling hardware costs have made major breakthroughs in AI possible. In recent years, however, supervision of data privacy has become broader, deeper, and stricter worldwide. In China alone, a series of policies protecting citizens' data privacy has been issued since 2018, for example the Scientific Data Management Measures, the National Health and Medical Big Data Standards, Security and Service Management Measures (Trial), and the Data Security Management Measures (Draft for Comment). These national laws, administrative regulations, and departmental rules safeguard data privacy and standardize how enterprises collect, store, protect, and use user data.
Privacy computing and the related field of federated learning have grown up under this increasingly strict data supervision. On the premise of data compliance, these technologies let enterprises mobilize all parties, including data owners, users, operators, and regulators, and enable large-scale aggregation, trading, and circulation of data resources, so that the value of third-party data can be realized and the market allocation of data elements promoted; with the promulgation of the national Data Security Law and related legislation, the value of privacy computing has become even more prominent. Privacy computing is in fact a collection of "data usable but invisible" technologies, including federated learning, confidential computing, differential privacy, and data desensitization.
Traditionally, most user-side app data is stored in client/server or distributed databases, usually as plaintext or with only low-level security, so the risk of attack is generally high.
Private data, in particular sensitive data such as personal medical and physical-examination records, must not be processed or sold, yet much of citizens' private information is still leaked against their wishes. If the arrival of deep learning revived AI, then the enormous business opportunities created by data circulation and sharing have revived privacy computing. Like traditional capital, land, labor, and technology, data is a production factor; combined with computing power and algorithms it becomes a new kind of social productivity, and more and more business scenarios require the circulation and sharing of multi-party data elements. In the medical field, pharmaceutical enterprises, medical institutions, insurance companies, and the broader health industry can further improve accuracy by building joint AI models over shared case data.
Disclosure of Invention
In view of the problems in the prior art, and of the concern described in the background that citizens' private data, in particular health and physical-examination data, may be misused, the invention provides a recommendation method based on federated transfer learning (FTL) with homomorphic encryption. The private data of the user's family members, such as health and physical-examination records, is encrypted for storage, AI training, and circulation/exchange, so that the client's private data is invisible both inside and outside the company and the user's private data elements are protected to the greatest extent while the relevant national data security policies are satisfied.
To achieve this purpose, the technical scheme of the invention is as follows. A diet-plan recommendation method based on federated transfer learning with homomorphic encryption comprises the following steps:
Step one: generate the data gradient of the federated transfer learning model (FTL model for short), encrypt it, and distribute it to the user app side for training:
101: the central server of federated transfer learning generates, at a certain time point, the initial average gradient (iag for short) for training a user recommendation model (RECOM for short); this initial time point of the model is denoted S; the central server is equipped with an in-memory computing chip (ReRAM), a new type of non-volatile memory;
201: at time point S, when the central server encrypts the initial average gradient of federated transfer learning and distributes it to each user's app side, a signal called the multi-level cache signaling identifier (ReRAM MLCSI) is generated for this initial average gradient (iag) in the instruction register (IR) of the ReRAM and in the L1/L2/L3 caches of the chipset; the ReRAM MLCSI consists of L1 MLCSI, L2 MLCSI, L3 MLCSI, and MLCSI External.
202: as the number of training iterations of the federated transfer learning model increases, more and more ReRAM MLCSI are created; this series of ReRAM MLCSI is organized into a dynamic shortest-path-first routing table (OSPF Table for short), and the dynamic OSPF Table maintains the shortest paths connecting the ReRAM MLCSI with the L1/L2/L3 caches and with each external app end on the network side:
when the connection timestamp of a link in the OSPF Table exceeds the time threshold obtained through reinforcement learning, the table treats that network as unreachable: the link is abandoned and a reconnection handshake is initiated toward the ReRAM MLCSI;
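The patent text does not give an implementation of this routing table; the following is a minimal Python sketch under the assumption that each link is keyed by an MLCSI identifier plus an app endpoint and carries the timestamp of its last successful contact. The class name, the fields, and the handshake callback are illustrative, not part of the specification.

```python
import time

class DynamicOspfTable:
    """Minimal sketch of the dynamic shortest-path-first table of step 202.

    Each entry links a ReRAM MLCSI identifier (L1/L2/L3/External) to an
    external app endpoint and remembers when the link was last seen alive.
    """

    def __init__(self, time_threshold_s: float):
        # threshold obtained through reinforcement learning in the text
        self.time_threshold_s = time_threshold_s
        self.links = {}  # (mlcsi_id, app_endpoint) -> last_seen timestamp

    def register_link(self, mlcsi_id: str, app_endpoint: str) -> None:
        self.links[(mlcsi_id, app_endpoint)] = time.time()

    def refresh(self, mlcsi_id: str, app_endpoint: str) -> None:
        # called whenever traffic is observed on the link
        self.links[(mlcsi_id, app_endpoint)] = time.time()

    def prune_and_rehandshake(self, handshake) -> list:
        """Drop links whose last-seen timestamp exceeds the threshold and
        initiate a reconnection handshake toward the ReRAM MLCSI."""
        now = time.time()
        dropped = []
        for key, last_seen in list(self.links.items()):
            if now - last_seen > self.time_threshold_s:
                del self.links[key]   # link considered unreachable
                handshake(*key)       # re-initiate the connection
                dropped.append(key)
        return dropped
```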
step two: pre-processing of user-side app data gradients (feature gradients) of FTL model:
301: as the initial time S passes, and because the app-side data of each user (i.e., the users' model feature dimensions) differs in the initial state, only federated transfer learning (the FTL model) is used to handle such inconsistent data-dimension distributions. Using the time sliding-window mechanism of the big-data component Flink in the development environment, the time average of a piece of data generated by a user over adjacent time points can be computed; for example, over the two time points S+1 and S+2 the time average is mean = (x(S+1) + x(S+2)) / 2, where x(t) denotes the user's data at time point t;
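As an illustration of this windowed averaging (in production the text refers to Flink's window operators; the plain-Python helper below is only an assumed stand-in, not Flink code):

```python
from collections import deque

def sliding_time_average(samples, window):
    """samples: iterable of (timestamp, value); window: number of points to average.
    Yields (latest_timestamp, mean of the last `window` values), e.g. the
    two-point case of step 301: mean = (x(S+1) + x(S+2)) / 2."""
    buf = deque(maxlen=window)
    for ts, value in samples:
        buf.append(value)
        if len(buf) == window:
            yield ts, sum(buf) / window

# two-point example matching step 301
data = [(1, 4.0), (2, 6.0), (3, 10.0)]             # (time point, user data value)
print(list(sliding_time_average(data, window=2)))  # [(2, 5.0), (3, 8.0)]
```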
302: since the feature dimensions on each user's app side differ, and federated transfer learning is precisely suited to such heterogeneous data, the following is done (a minimal aggregation sketch is given after this list):
first, the data gradients within a certain regional range must be distributed; suppose the parent node of the region is Father_Node_HZ, and compute the distribution of the user data gradients within this range so that it approximately tends toward an independent and identically distributed (IID) data distribution;
in our algorithm, assume that under Father_Node_HZ there is a data-gradient distribution for the next-level sub-regions, each of which can be set as a child node:
"1" at the child-node level: each child node aggregates and averages the user-side data gradients within its network region;
"2" the data gradients of the child nodes are uploaded to the parent node Father_Node_HZ when certain conditions are met, and the final total data gradient (feature gradient) is returned to the central server of the FTL model along the optimal return path selected according to step 401 below.
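The node names Father_Node_HZ and the child nodes come from the text above; the simple averaging at the child nodes, the user-count weighting at the parent node, and the helper names in the following sketch are assumptions about how the aggregation and the upload conditions <a>/<b>/<c> of the preferred embodiment could be realized.

```python
import numpy as np

def aggregate_child_node(user_gradients):
    """"1": a child node averages the user-side gradients in its network region."""
    return np.mean(np.stack(user_gradients), axis=0)

def aggregate_father_node(child_gradients, child_counts):
    """"2": Father_Node_HZ combines child-node gradients; weighting by the number
    of users per child node is an assumption (the text only says 'aggregate')."""
    weights = np.array(child_counts, dtype=float)
    weights /= weights.sum()
    return np.average(np.stack(child_gradients), axis=0, weights=weights)

def should_upload(now, preset_time, grad_norm, grad_threshold, rounds, round_threshold):
    """Optional upload conditions <a>/<b>/<c> from the preferred embodiment."""
    return now >= preset_time or grad_norm >= grad_threshold or rounds >= round_threshold

# tiny example: two child nodes with 3 and 2 users respectively
g_child_a = aggregate_child_node([np.ones(4), 2 * np.ones(4), 3 * np.ones(4)])
g_child_b = aggregate_child_node([4 * np.ones(4), 6 * np.ones(4)])
g_total = aggregate_father_node([g_child_a, g_child_b], child_counts=[3, 2])
print(g_total)  # feature gradient returned to the FTL central server
```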
Step three: find, with a deep reinforcement learning method, the optimal path for returning the user-side data gradient to the federated transfer learning server:
401: after the first-round initial feature-gradient information of the user side is obtained from steps 301 and 302, it must be transmitted back immediately to the central server of federated transfer learning (the FTL model);
Step four: federated transfer learning makes recommendations according to the users' gradient information:
501: "1" the user uploads app private data to the federated transfer learning server via 401; the user-side app trains on local data to obtain a local model, the central server weights and aggregates the local models into a global model, and after multiple iterations a model approaching the result of centralized machine learning is obtained, effectively reducing the many privacy risks of aggregating source data in traditional machine learning;
"2" the federated transfer learning central server operates on the encrypted gradient information of the computation participants (the various app users) of the different models: the gradients of the different parts of each participant's data are computed and derived separately;
"3" the recommendation system makes recommendations based on the user's historical behavior combined with the recommenders' labeled data;
"4" the obtained recommendation information and the user's private data are desensitized, encrypted, and stored on distributed servers.
Preferably, in 201, the initial average gradient is encrypted using RSA256, MD5, or SHA1.
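RSA256, MD5, and SHA1 are not themselves homomorphic schemes (MD5 and SHA1 are hash functions); since the title of the method refers to homomorphic encryption, the sketch below uses the Paillier additively homomorphic scheme from the python-paillier (phe) package purely as an illustration of how a gradient could be encrypted so that ciphertexts are aggregated without decryption. The choice of scheme, the key length, and all values are assumptions, not part of the claims.

```python
from phe import paillier  # pip install phe

# key pair held by the coordinating party; key length is an illustrative choice
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# initial average gradient (iag) as a flat vector of floats
iag = [0.12, -0.03, 0.44]
encrypted_iag = [public_key.encrypt(g) for g in iag]

# two user-side gradient updates, also encrypted before leaving the app side
enc_user1 = [public_key.encrypt(g) for g in [0.10, -0.05, 0.40]]
enc_user2 = [public_key.encrypt(g) for g in [0.20, 0.01, 0.50]]

# the server can average ciphertexts without decrypting them
enc_avg = [(a + b) * 0.5 for a, b in zip(enc_user1, enc_user2)]

# only the key holder can recover the aggregated plaintext gradient
print([round(private_key.decrypt(c), 3) for c in enc_avg])  # [0.15, -0.02, 0.45]
```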
Preferably, the total data gradient in 302 is returned to the central server of the FTL model when preset conditions are met, which may include: <a> a predetermined time point; <b> exceeding a predetermined gradient threshold; <c> a predetermined number of iteration rounds.
Further, regarding 401: unlike a traditional LAN-based distributed machine learning environment, the biggest challenge for this recommendation algorithm over encrypted health privacy information such as user menus and physical-examination data under federated transfer learning is that users are scattered across the whole country, which brings many complex network problems (network throughput, delay, access, and so on) and the possibility that the parameters and data of the training model are lost at any moment; a trade-off therefore has to be made between transmission delay and model accuracy, and the following innovation is made in transmitting the model parameters of federated transfer learning:
the connectivity (or arrival rate) of the complex network is modeled with deep reinforcement learning (metropolitan-area-network connectivity being poorer than that of a traditional LAN), which autonomously learns the connection path from each user's app side to the federated transfer learning central server.
The specific innovative method comprises the following steps:
"1" is equally set under a certain time point S, there are N users who are simultaneously operating various behaviors by using our app, and these N users constitute multi agent (multi agent) under the current time point, we set the initial state of these agents as S |, and the initial Environment of these agents under the S |, state as en _ environmental | (abbreviated as E |):
these E |, include: the feature gradients of N users at time S, the status of each user side is represented by an encrypted status code, and the series of parameters form a user side initial training data set, and these E |, at the user level, are called user initial context: e £ user;
meanwhile, E |, also includes the external conditions & attributes of the N users, including the current network path situation, here we also encrypt the indicators in each network path situation, and the series of encrypted status codes constitute the external condition part of the initial environment E |, of the multi-agents at a certain moment S: e external, when the action (A) is not at the same time S, the reward (R) is 0, the discount rate is γ, the learning rate is α:
the following is the formula (defined in the reinforcement learning environment) in the initial state of the user side network: s = (agent, E = (user + external), a = None, R =0, γ = 0);
"2" takes the lead of time to the next point: s +1, the operation behaviors of the N users may trigger a series of actions and the following problems of corresponding reward values, discount rates, learning rates, and the like:
at S +1 time, the expected reward for N users to perform an action (abbreviated as "a") is: q pi (S, a) = E pi [ Rt | St = S =, at = a ];
at this point, after the series of actions is triggered (actions are defined herein as operations by a certain user), a series of reward values and environmental changes are generated: for example: how many hops (hops) the current location of the app of user M (M belongs to N) needs to go to reach the Federal migration learning Server (usually, there are many (countless) paths; for example, in a certain round of training: we select the first 100 (which can be customized according to the network connection situation) paths, and in these 100 paths, we will construct some data when the number of arriving hops is different, such as: the method comprises the following steps that a network service provider brand, network speed, ip longitude and latitude, whether other users (users who use apps at the same time) exist in a certain area or not and the like, and the network parameter characteristics form a data source for next-stage model training, namely a RawDataSet @ network access;
"3" as the state-action value function in reinforcement learning obtained in the "2" step, in reinforcement learning: we go to a value that maximizes the state-action value function, and build this value to find the optimal state-action value function from the optimal policy, which is traditionally done according to greedy policy or epsilon-greedy policy, but in our complex metro-based path connectivity, if we use the traditional approach to find the optimal state-action value function, we have to define a very huge table in the policy network, which is exponentially expanded to be maintainable as network connectivity nodes and more users join.
Here, we introduce an improved version of the ResNet-50 (residual network) of the deep learning neural network in the reinforcement learning of this complex network path finding: the prototype of the network is composed of 3 blocks (Block) + as the last fully connected layer (total 50 layers of network), and the neural network is characterized in that: once a certain round of the model in the training is trapped in the local optimum and stagnates for training at a certain layer, the optimal gradient can be found in the network by a method similar to a multi-hop mechanism (can be colloquially called as an "identical shortcut key": one or more layers can be directly "hopped") so that the training of the model trapped in the local optimum is continued and the training is accelerated until the model converges;
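For reference, a residual (identity-shortcut) block of the kind ResNet-50 stacks can be sketched as below; PyTorch is used purely for illustration, and this plain block is not the patent's modified AK ResNet-50.

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """ResNet-style bottleneck: the input skips the stacked convolutions and is
    added back, the identity shortcut described above as a mechanism that lets
    gradients keep flowing when training stalls at a layer."""
    def __init__(self, channels: int, hidden: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1), nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)  # identity shortcut: output = F(x) + x
```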
our innovation points are as follows:
based on the ResNet-50 model embedded in our original model, the model is initialized for training, and as time goes on, the original model of ResNet-50 gradually falls into a local optimal state like the original model, so that temporary training is stopped.
While we initialize the model: recording the semaphore of each parameter of the model passing through the network at each moment, and creating a set of adaptive Kalman filter scheme with multiple fading factors to realize the separation of real network signals, network noises and the like, wherein the network noises comprise: index parameters such as instantaneous flow of instantaneous passing of the network, network abnormal data and the like:
firstly, collecting communication signals in a network channel when a model is initialized (a certain time point is set as S), obtaining respective state prediction estimated values of multidimensional signals in the channel according to a Kalman filter (alternative: simultaneously drawing a visual graph as an assistant),
because the conventional kalman filter has memory, if the previous data measurement is not accurate (here, it mainly refers to all data carried under a certain time node in network communication): the general statistical properties are inaccurate: it may directly result in inaccurate state estimation of the current state and the future state, and in severe cases, may cause the filter to diverge.
The scheme for processing the filter with better divergence can directly correct the next predicted mean square error matrix of the state estimation by selecting a proper fading factor, thereby playing the role of inhibiting the filter divergence.
Because the traffic in the network traffic (traffic) in our model learning is transmitted at a speed faster than the actual training data and the encrypted gradient, we estimate the traffic through each layer of the neural network ResNet-50 because it is trained to each layer (ResNet-50 totals 50 layers): the network returns some training parameters, the space occupied by the specific parameters in a memory or a buffer is large, congestion can be caused by back-and-forth transmission, and now, whether a signal is generated continuously when a certain layer is reached is judged by a filter, so that the layer where the model falls into local optimization (which means a certain gradient of the training staying at the certain layer) can be inferred;
until this step, a set of adaptive Kalman filter with multiple fading factors is embedded in the existing ResNet-50 neural network architecture to filter different semaphores in a network path to judge the training progress of the model;
5 ResNet-50 neural network of adaptive Kalman filter with embedded multiple fading factors (hereinafter referred to as addition Kalman ResNet-50, or AK ResNet-50) modified from the above step, puts the AK ResNet-50 in step 3 of 401 above:
let this AK ResNet-50 network be in a larger range of metropolitan networks: and searching an optimal path connecting the user app side and the Federal migration learning server side, returning an unreachable state code once a certain network path is unreachable, and updating the state code into a strategy for searching the optimal path through reinforcement learning.
The beneficial effects of the invention are:
(1) innovation in the server chip on the federated transfer learning server side: the server carries an in-memory computing chip (ReRAM) whose role is to maintain instant communication and scheduling by creating a routing table that connects the signaling identifier inside the in-memory computing chip, the server's L1/L2/L3 caches, and the external user apps; the relevant step herein is 201;
(2) the method for returning the user-side app data gradients to the federated transfer learning server: the gradients are uploaded layer by layer to the federated learning server through a mechanism of hierarchical user information (dimension) gradients; the relevant step herein is 302;
(3) the optimized method for transmitting each user side's federated-transfer-learning parameters back to the federated transfer learning server: the optimal return path is learned autonomously, combining reinforcement learning to find the best path in a complex network; the relevant steps herein are 401 "1", "2", "3";
(4) the problem of transmitting model parameters on each user side: the optimal transmission nodes for model parameters (including gradients) and data along the network transmission path are judged by modifying the existing open-source deep learning neural network ResNet-50 (residual network) to embed a multiple-fading-factor adaptive Kalman filter; the relevant step herein is 401 "4".
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
Example 1
This embodiment covers the comprehensive scenario of encrypting/decrypting, storing, computing, transmitting, and recommending over private information such as the health information, meal plans, and user-authorized physical-examination reports of the family members on the user's app side. First, the health information, physical-examination reports, and other information of all family members are cleaned, features are extracted, and the data is classified. The extracted privacy features are then combined with healthy diet recipes curated by national first-level dieticians and scored comprehensively across six dimensions (health, balance, diversity, attractiveness, preference, and value). The data then enters a recommendation-system training unit in which users and items (recipes) are kept separate (at this point all of the user's private data is stored on the user side and only models leave the user side). Training stops when it converges to a certain threshold (accuracy), and the model is immediately returned to the server to obtain the lunch and dinner recipes that best suit the health of each family member. Finally, this series of user information is encrypted and scattered across distributed servers for storage according to a certain strategy.
This technical scheme provides a diet-plan recommendation method based on federated transfer learning with homomorphic encryption, comprising the following steps:
the central server generates the initial training gradient of the federated learning model and encrypts it immediately on the server; the encrypted gradient is sent to each user's app side along the optimal path found by a deep reinforcement learning algorithm. After receiving the initial gradient sent by the server, the app side trains on the user's private data; when training has iterated for a certain number of rounds, the result is encrypted and transmitted back to the central server, which generates the global model.
The role of the central server is as follows. The diet recommendation system is a highly comprehensive cross-domain system spanning federated transfer learning over heterogeneous models ("1" different types of user hardware terminals, generally mobile phones; "2" asymmetric data-dimension spaces, since the health/physical-examination data uploaded by each user side differs) and cryptographic operations (encryption and decryption) in a complex network environment (WLAN). Without a central coordination server, as used in traditional distributed machine learning, to coordinate the consumption and scheduling of parameters and data, the following main problems would very likely arise:
"A" because no server is responsible for scheduling during model training, the data and parameters involved in training are transmitted through the memories of multiple servers, and throughput may climb exponentially as the data volume and parameter count increase;
"B" meanwhile, encrypting and decrypting private data such as the user's health and physical-examination data may frequently occupy large amounts of I/O, slowing the convergence of the overall global model;
"C" the servers participating in computation over the user's health/physical-examination data differ from servers in traditional distributed machine learning (traditional distributed machine learning servers generally sit in the same LAN, where network transmission quality can be guaranteed); the actual physical locations of the app users are inconsistent, so each end participating in federated transfer learning sits in a complex network at a different location, and factors such as network I/O throughput, network rate, network connectivity, and shortest path can greatly affect the transmission of the parameters involved in FTL model training, so that some data packets may be lost in transit;
"D" because traditional working memory, unlike ReRAM, is volatile, network congestion, power failure, natural disasters, or other man-made or accidental packet loss would make the user's health/physical-examination data and training gradients disappear, giving the user a bad experience.
The method comprises the following specific steps:
101: the central server of federated transfer learning generates, at a certain time point (hereinafter denoted S, the initial time point of the model), the initial average gradient (iag) for training a user recommendation model (hereinafter the RECOM model);
201: following 101, at time point S, when the server equipped with the in-memory computing chip (ReRAM) is responsible for encrypting the initial average gradient of federated transfer learning with, for example, RSA256, MD5, or SHA1 and distributing it to each user's app side, then, because ReRAM is a new kind of non-volatile memory, a signal called the Multi-Level Cache Signaling Identifier (hereinafter ReRAM MLCSI) is generated for this initial average gradient (iag) in the instruction register (IR) of the ReRAM and in the L1/L2/L3 caches of the chipset; the ReRAM MLCSI consists of L1 MLCSI, L2 MLCSI, L3 MLCSI, and MLCSI External.
202: as the number of training iterations of the federated transfer learning model increases, more and more ReRAM MLCSI are established; this series of ReRAM MLCSI is organized into a dynamic shortest-path-first routing table (hereinafter OSPF Table), and the dynamic OSPF Table maintains the shortest paths connecting the ReRAM MLCSI with the L1/L2/L3 caches and with each external app end on the network side:
when the connection timestamp of a link in the OSPF Table exceeds the time threshold obtained through reinforcement learning, the table treats that network as unreachable: the link is abandoned and a reconnection handshake is initiated toward the ReRAM MLCSI;
301: because the app-side data of each user (i.e., the users' model feature dimensions) differs in the initial state, only the federated transfer learning (FTL) model is used to handle such inconsistent data-dimension distributions; using the time sliding-window mechanism of the big-data component Flink in the development environment, the time average of a piece of data generated by a user over adjacent time points can be computed, for example over the two time points S+1 and S+2: mean = (x(S+1) + x(S+2)) / 2;
since the feature dimensions on each user's app side differ (for example, user A may have the "three highs", user B may have a history of gout and cannot eat purine-rich food, while user C may be pregnant and needs balanced daily nutrition), and federated transfer learning is precisely suited to such heterogeneous data, our practice here is as follows:
first, the data gradients within a certain regional range must be distributed; suppose the parent node of the region is Father_Node_HZ, and compute the distribution of the user data gradients within this range so that it approximately tends toward an independent and identically distributed (IID) data distribution;
in the algorithm, the data-gradient distribution of the next-level sub-regions under Father_Node_HZ is set; for example, the sub-regions can be Child_Node_GS, Child_Node_YH, Child_Node_BJ, and so on:
"1" at the child-node level: each child node (for example, the Child_Node_GS node) aggregates and averages the user-side data gradients within its network region;
"2" the data gradients of the child nodes are uploaded to the parent node Father_Node_HZ when certain conditions are met, and the final total data gradient (feature gradient) is returned along the optimal path selected according to step 401 below; these conditions are optional and include the following (more can be added later depending on the actual situation): <a> a predetermined time point; <b> exceeding a predetermined gradient threshold; <c> a predetermined number of iteration rounds.
401: after the first-round initial feature-gradient information of the user side is obtained from steps 301 and 302, it must be transmitted back immediately to the central server of federated transfer learning (FTL). However, unlike a traditional LAN-based distributed machine learning environment, the biggest challenge for this recommendation algorithm over encrypted health privacy information such as the user's menus and physical-examination data under federated transfer learning is that users are scattered across the whole country, which brings many complex network problems (network throughput, delay, access, and so on) and the possibility that the parameters and data of the training model are lost at any moment, so a trade-off must be made between transmission delay and model accuracy. For this reason, our innovation in transmitting the model parameters of federated transfer learning is as follows:
the connectivity (or arrival rate) of the complex network is modeled with deep reinforcement learning (metropolitan-area-network connectivity being poorer than that of a traditional LAN), which autonomously learns the connection path from each user's app side to the federated transfer learning central server; the specific method is as follows:
"1" as above, suppose that at a certain time point S there are N users simultaneously performing various operations in our app; these N users constitute multiple agents (multi-agent) at the current time point. We denote the initial state of these agents by S⊙ and their initial environment in state S⊙ by Environment⊙ (E⊙ for short):
E⊙ includes the feature gradients of the N users at time S, which may be the user's search, browsing, menu-change, menu-add, payment, or physical-examination-report-upload state; the state of each user side is represented by a status code encrypted with, for example, SHA1 (e.g., the search status code is SX01), and this series of parameters forms the user-side initial training data set; at the user level this part of E⊙ is called the user initial environment: E⊙_user;
E⊙ also includes the external conditions and attributes of these N users, including the current network path situation (important parameters such as network connectivity, network packet-loss ratio, and shortest route); the indicators of each network path situation are likewise encrypted, and this series of encrypted status codes forms the external-condition part of the initial environment E⊙ of the multi-agents at time S: E⊙_external. At time S there is as yet no action (A), the reward (R) is 0, the discount rate is γ, and the learning rate is α:
the initial state of the user-side network is therefore defined (in the reinforcement learning environment) as:
S⊙ = (agent, E⊙ = (E⊙_user + E⊙_external), A = None, R = 0, γ = 0)
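As one concrete reading of this definition, a minimal sketch follows; the encrypted status codes (e.g., SX01) are taken from the text, while the data structure, field names, and example values are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class InitialEnvironment:
    """E⊙ = (E⊙_user + E⊙_external) at the initial time point S."""
    user: dict = field(default_factory=dict)      # encrypted user status codes, e.g. {"user_0": "SX01"}
    external: dict = field(default_factory=dict)  # encrypted network-path indicators (connectivity, loss, route)

@dataclass
class InitialState:
    """S⊙ = (agent, E⊙, A=None, R=0, γ=0) for the N app users acting as multi-agents."""
    agents: list
    environment: InitialEnvironment
    action: object = None   # A = None at time S
    reward: float = 0.0     # R = 0
    gamma: float = 0.0      # discount rate γ in the initial state
    alpha: float = 0.1      # learning rate α (value is an illustrative assumption)

s0 = InitialState(agents=[f"user_{i}" for i in range(3)],
                  environment=InitialEnvironment(user={"user_0": "SX01"},
                                                 external={"path_7": "EN4F"}))
```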
"2" takes the lead of time to the next point: s +1, the operation behaviors of the N users may trigger a series of actions and the following problems of corresponding reward values, discount rates, learning rates, and the like:
at S +1 time, the expected reward for N users to perform an action (abbreviated as "a") is: q pi (S, A) = E pi [ Rt | St = S |, at = A ]
At this point, after the series of actions is triggered (actions are defined herein as operations by a certain user), a series of reward values and environmental changes are generated: for example: how many hops (hops) the current location of the app of user M (M e N) needs to go to reach the Federal migration learning Server-typically, there will be many (countless) paths, such as in a certain round of training: we select the first 100 (which can be customized according to the network connection situation) paths, and in these 100 paths, the number of arriving hops is different, and we will form some data, such as: the method comprises the following steps that a network service provider brand, network speed, ip longitude and latitude, whether other users (users who use apps at the same time) exist in a certain area or not and the like, and the network parameter characteristics form a data source for next-stage model training, namely a RawDataSet @ network access;
"3" regarding the state-action value function in reinforcement learning, as determined in step "2" of 401, in reinforcement learning: we go to find a value that maximizes the state-action value function, and construct the value to find the optimal state-action value function from the optimal strategy, which is conventionally done according to greedy strategy or epsilon-greedy strategy, but in our complex metro-based path connectivity, if we use the conventional method to find the optimal state-action value function, we have to define a very huge table in the policy network, which is exponentially inflated to be too large to maintain as network connectivity nodes and more users join.
Here an improved version of the deep learning neural network ResNet-50 (residual network) is introduced into the reinforcement learning for this complex network path search: the prototype network consists of stacked residual blocks plus a final fully connected layer (50 layers in total). Its characteristic is that once a training round gets stuck in a local optimum at some layer, a multi-hop mechanism (colloquially, an "identity shortcut" that can skip one or more layers directly) still lets the optimal gradient be found in the network, so training of the model stuck in the local optimum continues and is accelerated until the model converges. Our innovation points are as follows:
Based on the ResNet-50 model embedded in our original model, the model is first initialized for training; as time goes on, the ResNet-50 model, like the original model, gradually falls into a locally optimal state and training temporarily stalls.
At the same time, when the model is initialized, the semaphore (signal volume) of each model parameter passing through the network is recorded at every moment, and an adaptive Kalman filter scheme with multiple fading factors is created to separate the real network signal from network noise and the like, where the network noise includes index parameters such as the instantaneous traffic passing through the network and anomalous network data:
first, the communication signals in the network channel are collected when the model is initialized (denote the time point as S), and the state prediction estimates of the multidimensional signals in the channel are obtained with a Kalman filter (optionally, a visualization is drawn as an aid). A conventional Kalman filter has memory: if the earlier measurements are inaccurate (here this mainly refers to all data carried under a given time point in the network communication), i.e., their statistical properties are inaccurate, the state estimates of the current and future states can be directly distorted and, in severe cases, the filter diverges.
A better scheme for handling divergence is to choose a suitable fading factor that directly corrects the predicted mean-square-error matrix of the next state estimate, thereby suppressing filter divergence.
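A minimal scalar sketch of a fading-factor Kalman update is given below; the fading factor λ ≥ 1 inflates the predicted error covariance so that older, possibly inaccurate measurements are forgotten, which is the divergence-suppression idea described above. The multidimensional, multiple-fading-factor version embedded in ResNet-50 is not specified in the text, so this is an illustration only, with assumed noise values.

```python
def fading_kalman_step(x_est, p_est, z, q, r, fading=1.0):
    """One scalar Kalman update with a fading factor.
    x_est, p_est: previous state estimate and error covariance
    z: new measurement (e.g. the semaphore observed on the network path)
    q, r: process and measurement noise variances
    fading: lambda >= 1; values > 1 discount the influence of older data."""
    # prediction (identity state model), with the fading factor inflating P
    x_pred = x_est
    p_pred = fading * p_est + q
    # correction
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# example: track a noisy network-signal level
x, p = 0.0, 1.0
for z in [1.1, 0.9, 1.3, 5.0, 1.0]:    # the 5.0 spike stands in for network noise
    x, p = fading_kalman_step(x, p, z, q=0.01, r=0.5, fading=1.05)
print(round(x, 3))
```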
Because the semaphore in the network traffic of our model learning is transmitted faster than the actual training data and the encrypted gradients, the semaphore passing through each layer of the ResNet-50 neural network is estimated. As training proceeds layer by layer (ResNet-50 has 50 layers in total), the network returns some training parameters; these parameters occupy considerable memory or buffer space and their back-and-forth transmission can cause congestion. By using the filter to judge whether signals continue to be generated once a given layer is reached, the layer at which the model is stuck in a local optimum (meaning the gradient at which training stalls at that layer) can be inferred.
Up to this point, an adaptive Kalman filter with multiple fading factors has been embedded in the existing ResNet-50 neural network architecture to filter the different semaphores in the network channel and judge the training progress of the model;
"5" the ResNet-50 neural network with the embedded multiple-fading-factor adaptive Kalman filter modified above (hereinafter Adaptation Kalman ResNet-50, or AK ResNet-50) is placed in step "3" of 401: the AK ResNet-50 network searches, over the wider metropolitan area network, for the optimal path connecting the user app side and the federated transfer learning server side; once a network path is unreachable, an unreachable status code is returned and, through reinforcement learning, updated into the policy for finding the optimal path.
501:
"1" the user uploads private app data such as health and physical-examination information to our federated transfer learning server via step 401. The user-side app trains on local data to obtain a local model; the central server weights and aggregates the local models into a global model, and after multiple iterations a model approaching the result of centralized machine learning is obtained, effectively reducing the many privacy risks of aggregating source data in traditional machine learning.
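The weighting rule is not spelled out in the text; the sketch below shows the standard federated-averaging style of weighting each local model by its number of training samples, as one assumption of how "weighting and aggregating the local model" could be realized.

```python
import numpy as np

def federated_average(local_models, sample_counts):
    """local_models: list of dicts {param_name: np.ndarray} returned by user apps.
    sample_counts: number of local training samples behind each model.
    Returns the global model as the sample-weighted average of the local ones."""
    weights = np.array(sample_counts, dtype=float)
    weights /= weights.sum()
    global_model = {}
    for name in local_models[0]:
        stacked = np.stack([m[name] for m in local_models])
        global_model[name] = np.tensordot(weights, stacked, axes=1)
    return global_model

# two app-side models, the second trained on twice as much data
m1 = {"w": np.array([1.0, 1.0]), "b": np.array([0.0])}
m2 = {"w": np.array([4.0, 4.0]), "b": np.array([3.0])}
print(federated_average([m1, m2], sample_counts=[100, 200]))  # {'w': [3., 3.], 'b': [2.]}
```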
"2" the federated transfer learning central server operates on the encrypted gradient information of the computation participants (the various app users) of the different models: the gradients of the different parts of each participant's data are computed and derived separately (in this process, no participant learns another user's private-domain information).
"3" the recommendation system makes recommendations based on the user's historical behavior combined with data labeled by dieticians.
"4" the resulting private data, such as the user's recipe information, is desensitized, encrypted, and stored on distributed servers.
The above describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A diet-plan recommendation method based on federated transfer learning with homomorphic encryption, characterized by comprising the following steps:
Step one: generate the data gradient of the federated transfer learning model (FTL model for short), encrypt it, and distribute it to the user app side for training:
101: the central server of federated transfer learning generates, at a certain time point, the initial average gradient (iag for short) for training a user recommendation model (RECOM for short); this initial time point of the model is denoted S; the central server is equipped with an in-memory computing chip (ReRAM), a new type of non-volatile memory;
201: at time point S, when the central server encrypts the initial average gradient of federated transfer learning and distributes it to each user's app side, a signal called the multi-level cache signaling identifier (ReRAM MLCSI) is generated for this initial average gradient (iag) in the instruction register (IR) of the ReRAM and in the L1/L2/L3 caches of the chipset; the ReRAM MLCSI consists of L1 MLCSI, L2 MLCSI, L3 MLCSI, and MLCSI External;
202: as the number of training iterations of the federated transfer learning model increases, more and more ReRAM MLCSI are established; this series of ReRAM MLCSI is organized into a dynamic shortest-path-first routing table (OSPF Table for short), and the dynamic OSPF Table maintains the shortest paths connecting the ReRAM MLCSI with the L1/L2/L3 caches and with each external app end on the network side:
when the connection timestamp of a link in the OSPF Table exceeds the time threshold obtained through reinforcement learning, the table treats that network as unreachable: the link is abandoned and a reconnection handshake is initiated toward the ReRAM MLCSI;
step two: preprocessing of user-side app data gradients (feature gradients) of FTL model:
301: as the initial time S passes, and because the app-side data of each user (i.e., the users' model feature dimensions) differs in the initial state, only federated transfer learning (the FTL model) is used to handle such inconsistent data-dimension distributions; using the time sliding-window mechanism of the big-data component Flink in the development environment, the time average of a piece of data generated by a user over adjacent time points can be computed, for example over the two time points S+1 and S+2: mean = (x(S+1) + x(S+2)) / 2;
302: since the feature dimensions on each user's app side differ, and federated transfer learning is precisely suited to such heterogeneous data, the following is done:
first, the data gradients within a certain regional range must be distributed; suppose the parent node of the region is Father_Node_HZ, and compute the distribution of the user data gradients within this range so that it approximately tends toward an independent and identically distributed (IID) data distribution;
in our algorithm, assume that under Father_Node_HZ there is a data-gradient distribution for the next-level sub-regions, each of which can be set as a child node:
"1" at the child-node level: each child node aggregates and averages the user-side data gradients within its network region;
"2" the data gradients of the child nodes are uploaded to the parent node Father_Node_HZ when certain conditions are met, and the final total data gradient (feature gradient) is returned to the central server of the FTL model along the optimal return path selected according to step 401 below;
Step three: find, with a deep reinforcement learning method, the optimal path for returning the user-side data gradient to the federated transfer learning server:
401: after the first-round initial feature-gradient information of the user side is obtained from steps 301 and 302, it must be transmitted back immediately to the central server of federated transfer learning (the FTL model);
Step four: federated transfer learning makes recommendations according to the users' gradient information:
501: "1" the user uploads app private data to the federated transfer learning server via 401; the user-side app trains on local data to obtain a local model, the central server weights and aggregates the local models into a global model, and after multiple iterations a model approaching the result of centralized machine learning is obtained, effectively reducing the many privacy risks of aggregating source data in traditional machine learning;
"2" the federated transfer learning central server operates on the encrypted gradient information of the computation participants (the various app users) of the different models: the gradients of the different parts of each participant's data are computed and derived separately;
"3" the recommendation system makes recommendations based on the user's historical behavior combined with the recommenders' labeled data;
"4" the obtained recommendation information and the user's private data are desensitized, encrypted, and stored on distributed servers.
2. The diet-plan recommendation method based on federated transfer learning with homomorphic encryption of claim 1, characterized in that in 201 the initial average gradient is encrypted using RSA256, MD5, or SHA1.
3. The diet-plan recommendation method based on federated transfer learning with homomorphic encryption of claim 1, characterized in that the total data gradient in 302 is returned to the central server of the FTL model when preset conditions are met: <a> a predetermined time point; <b> exceeding a predetermined gradient threshold; <c> a predetermined number of iteration rounds.
4. The diet-plan recommendation method based on federated transfer learning with homomorphic encryption of claim 1, characterized in that, based on 401, a trade-off is made between transmission delay and model accuracy and the following innovation is made in transmitting the model parameters of federated transfer learning: the connectivity (or arrival rate) of the complex network is modeled with deep reinforcement learning (metropolitan-area-network connectivity being poorer than that of a traditional LAN), which autonomously learns the connection path from each user's app side to the federated transfer learning central server.
5. The diet-plan recommendation method based on federated transfer learning with homomorphic encryption of claim 4, characterized by the following specific steps:
"1" similarly, suppose that at a certain time point S there are N users simultaneously performing various operations in our app; these N users constitute multiple agents (multi-agent) at the current time point. We denote the initial state of these agents by S⊙ and their initial environment in state S⊙ by Environment⊙ (E⊙ for short):
E⊙ includes the feature gradients of the N users at time S; the state of each user side is represented by an encrypted status code, and this series of parameters forms the user-side initial training data set, called at the user level the user initial environment: E⊙_user;
E⊙ also includes the external conditions and attributes of these N users, including the current network path situation; the indicators of each network path situation are likewise encrypted, and this series of encrypted status codes forms the external-condition part of the initial environment E⊙ of the multi-agents at time S: E⊙_external. At time S there is as yet no action (A), the reward (R) is 0, the discount rate is γ, and the learning rate is α:
the initial state of the user-side network is therefore defined (in the reinforcement learning environment) as:
S⊙ = (agent, E⊙ = (E⊙_user + E⊙_external), A = None, R = 0, γ = 0);
"2" takes the lead of time to the next point: at S +1, the operation behaviors of the N users may prompt a series of actions and the following problems of corresponding reward value, discount rate, learning rate, and the like:
at S +1 time, the expected reward obtained when N users perform an action (abbreviated as "a") is: q pi (S, a) = E pi [ Rt | St = S =, at = a ];
at this point, after the series of actions is triggered (actions are defined herein as operations by a certain user), a series of reward values and environmental changes are generated: for example: how many hops (hops) the current location of the app of user M (M belongs to N) needs to go to reach the Federal migration learning Server (usually, there are many (countless) paths; for example, in a certain round of training: we select the first 100 (which can be customized according to the network connection situation) paths, and in these 100 paths, the number of arriving hops is different, and we will form some data, such as: the brand of a network service provider, the network speed, the longitude and latitude of an ip, whether other users (users using apps) exist in a certain area or not, and the network parameter characteristics form a data source for next-stage model training, namely a RawDataSet' network path;
"3" as the state-action value function in reinforcement learning obtained in the "2" step, in reinforcement learning: we find a value that maximizes this state-action value function, and the purpose of constructing this value is to find the optimal state value function from the optimal strategy;
"4" we introduced an improved version of the ResNet-50 (residual network) of the deep learning neural network in the reinforcement learning of this complex network path finding: the prototype of the network is composed of 3 blocks (Block) + as the last fully connected layer (total 50 layers of network), and the neural network is characterized in that: once a certain round of the model in the training is trapped in the local optimum and stagnates for training at a certain layer, the optimal gradient can be found in the network by a method similar to a multi-hop mechanism (can be colloquially called as an "identical shortcut key": one or more layers can be directly "hopped") so that the training of the model trapped in the local optimum is continued and the training is accelerated until the model converges;
after the ResNet-50 model is embedded into the original model, the model is first initialized for training; as training proceeds, the embedded ResNet-50, like the original model, may gradually fall into a locally optimal state, causing temporary training stagnation (a minimal sketch of an identity-shortcut residual block is given below);
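The following PyTorch-style sketch only illustrates the identity-shortcut ("skip one or more layers") idea described above; it is not the patented AK ResNet-50 itself, and the layer sizes are arbitrary assumptions:

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A basic residual block: out = F(x) + x (identity shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                      # the shortcut: stacked layers can be "hopped"
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity              # gradient can bypass the stacked layers
        return self.relu(out)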
while the model is initialized: the method records, at each moment, the signal of each model parameter passing through the network, and builds an adaptive Kalman filter scheme with multiple fading factors to separate the real network signal from network noise and the like, where the network noise includes: instantaneous traffic passing through the network and abnormal network data index parameters:
first, the communication signals in the network channel are collected when the model is initialized (a certain time point is set as S), and the respective state prediction estimates of the multidimensional signals in the channel are obtained with a Kalman filter (optionally, a visual graph is drawn at the same time as an aid);
then, the next predicted mean-square error (covariance) matrix of the state estimate is corrected by selecting a suitable fading factor, so as to suppress filter divergence;
in the network communication of model learning, these signals travel faster than the actual training data and the encrypted gradients, and the signal passing through each layer of the ResNet-50 neural network is estimated. Because training touches every layer (ResNet-50 has 50 layers), the network returns training parameters at each layer, and these parameters occupy a large amount of memory or buffer space, so transmitting them back and forth may cause congestion. With the filter, it can be judged whether a certain layer keeps generating signals, and it can thus be deduced at which layer the model has fallen into a local optimum (that is, at which layer a certain gradient has stalled);
in this way, a set of adaptive Kalman filters with multiple fading factors is embedded into the existing ResNet-50 neural network architecture to filter the different signals in the network channel and judge the training progress of the model (a minimal sketch of a fading-factor Kalman update is given below);
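As a hedged numerical sketch only, one Kalman predict/update cycle in which a fading factor λ ≥ 1 inflates the predicted error covariance (to suppress divergence, as in the step above) could look like this; the symbols F, H, Q, R and the fading value are conventional Kalman notation and assumptions, not values fixed by the claim:

import numpy as np

def fading_kalman_step(x, P, z, F, H, Q, R, fading=1.0):
    """One predict/update cycle of a Kalman filter whose predicted covariance
    is inflated by a fading factor to counteract filter divergence."""
    # Prediction
    x_pred = F @ x
    P_pred = fading * (F @ P @ F.T) + Q    # fading factor weights recent data more heavily
    # Update with the measured channel signal z
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return x_new, P_new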
"5" The above modified ResNet-50 neural network with the embedded multi-fading-factor adaptive Kalman filter (hereinafter "Adaptive Kalman ResNet-50" or "AK ResNet-50") is applied in step 3 of 401:
the AK ResNet-50 network is used over the larger scope of the metropolitan area network to search for an optimal path connecting the user app side and the federated transfer learning server side; once a certain network path is unreachable, an unreachable status code is returned, and this status code is fed back through reinforcement learning to update the policy for searching the optimal path (a minimal sketch of such a path-selection loop is given below).
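Finally, purely as an illustrative sketch of the path-search loop in this claim (ε-greedy path selection, with the unreachable status code fed back into the policy), where every name and reward shape is an assumption rather than part of the claim:

import random

UNREACHABLE = -1.0  # status code / penalty when a candidate path cannot be reached

def choose_path(Q, state, candidate_paths, epsilon=0.1):
    """Epsilon-greedy selection of a path from the user app side to the FTL server."""
    if random.random() < epsilon:
        return random.choice(candidate_paths)
    return max(candidate_paths, key=lambda p: Q.get((state, p), 0.0))

def feedback(Q, state, path, reachable, latency, alpha=0.1):
    """Update the path-search policy: unreachable paths receive the UNREACHABLE
    penalty, reachable ones are rewarded inversely to their latency."""
    reward = UNREACHABLE if not reachable else 1.0 / max(latency, 1e-6)
    old = Q.get((state, path), 0.0)
    Q[(state, path)] = old + alpha * (reward - old)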
CN202210983985.9A 2022-08-17 2022-08-17 Federal transfer learning recommendation method based on homomorphic encryption Pending CN115357931A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210983985.9A CN115357931A (en) 2022-08-17 2022-08-17 Federal transfer learning recommendation method based on homomorphic encryption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210983985.9A CN115357931A (en) 2022-08-17 2022-08-17 Federal transfer learning recommendation method based on homomorphic encryption

Publications (1)

Publication Number Publication Date
CN115357931A true CN115357931A (en) 2022-11-18

Family

ID=84033319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983985.9A Pending CN115357931A (en) 2022-08-17 2022-08-17 Federal transfer learning recommendation method based on homomorphic encryption

Country Status (1)

Country Link
CN (1) CN115357931A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116361561A (en) * 2023-05-30 2023-06-30 安徽省模式识别信息技术有限公司 Distributed cross-border service recommendation method and system based on variational reasoning

Similar Documents

Publication Publication Date Title
Kang et al. Reliable federated learning for mobile networks
Kang et al. Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory
Chen et al. From technological networks to social networks
CN106682984A (en) Block chain-based transaction business processing method and system
CN113505882B (en) Data processing method based on federal neural network model, related equipment and medium
Gao et al. Measures of node centrality in mobile social networks
CN113779608A (en) Data protection method based on WOE mask in multi-party longitudinal federal learning LightGBM training
CN115357931A (en) Federal transfer learning recommendation method based on homomorphic encryption
He et al. Privacy-preserving and low-latency federated learning in edge computing
CN113065046A (en) Product defect detection equipment and method
Zhang et al. Detection performance in balanced binary relay trees with node and link failures
Rafi et al. Fairness and privacy preserving in federated learning: A survey
Kim et al. P2P computing for trusted networking of personalized IoT services
CN108536776A (en) Unification user malicious act detection method and system in a kind of social networks
Khoshkbarchi et al. Coping with unfair ratings in reputation systems based on learning approach
CN115174237A (en) Method and device for detecting malicious traffic of Internet of things system and electronic equipment
CN114463063A (en) Data processing method and related device
Babbitt et al. Trust metric integration in resource constrained networks via data fusion
Jayasinghe Trust evaluation in the IoT environment
CN114996733B (en) Aggregation model updating processing method and device
Jung et al. A framework for optimization in big data: Privacy-preserving multi-agent greedy algorithm
Yang et al. Federated continual learning via knowledge fusion: A survey
CN115955402B (en) Service function chain determining method, device, equipment, medium and product
Hui et al. A study on directed neutrosophic social networks
Li et al. SocialMix: Supporting privacy-aware trusted social networking services

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wu Gang

Inventor after: Li Canfeng

Inventor after: Xue Qiyun

Inventor after: Zhang Gengrong

Inventor before: Wu Gang

Inventor before: Li Yuqi

Inventor before: Xue Qiyun

Inventor before: Zhang Gengrong
