CN114124784B - Intelligent routing decision protection method and system based on vertical federation

Info

Publication number
CN114124784B (application CN202210096691.4A)
Authority
CN
China
Prior art keywords
training, state data, model, client
Legal status
Active
Application number
CN202210096691.4A
Other languages
Chinese (zh)
Other versions
CN114124784A
Inventor
杨林 (Yang Lin)
高先明 (Gao Xianming)
冯涛 (Feng Tao)
张京京 (Zhang Jingjing)
陶沛琳 (Tao Peilin)
王雯 (Wang Wen)
Current Assignee
Institute of Network Engineering, Institute of Systems Engineering, Academy of Military Sciences
Original Assignee
Institute of Network Engineering, Institute of Systems Engineering, Academy of Military Sciences
Application filed by Institute of Network Engineering, Institute of Systems Engineering, Academy of Military Sciences
Priority to CN202210096691.4A
Publication of CN114124784A
Application granted
Publication of CN114124784B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/08 Learning-based routing, e.g. using neural networks or artificial intelligence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning


Abstract

The invention provides an intelligent routing decision protection method and system based on vertical federation. The method comprises the following steps: step S1, obtaining, through sampling, sampling state data of an agent in an application scene, dividing the sampling state data into N groups of sampling sub-state data and sending them to N clients respectively, where N ≥ 2 and N is a positive integer; step S2, each of the N clients generating feature data of the sampling sub-state data by using a constructed client model based on the received sampling sub-state data and sending the feature data to a server side; and step S3, the server side generating, by using a constructed server-side model, a routing decision for the overall task of the agent based on the N groups of feature data received from the N clients.

Description

Intelligent routing decision protection method and system based on vertical federation
Technical Field
The invention belongs to the field of data processing for intelligent routing, and particularly relates to an intelligent routing decision protection method and system based on vertical federation.
Background
Against the background that the objects connected by network systems have grown massive in number and their interconnection relations have become complicated, traditional routing decision methods based on manual configuration cannot produce an optimal routing decision within a limited time, which has prompted researchers to introduce artificial intelligence algorithms into the intelligent routing decision process. With the successful application of deep reinforcement learning in fields such as robot control, game playing, computer vision and autonomous driving, researchers have applied deep reinforcement learning to the field of intelligent routing decision, improving network traffic scheduling efficiency, the rationality of network resource allocation, and other aspects.
Although deep reinforcement learning can effectively improve the level of routing decision, its training process is easily attacked: the training-set data can be made abnormal, which in turn distorts the judgments or action selections the intelligent routing agent learns during training, until it finally learns actions in the direction of failure. In the field of security protection for intelligent routing decision models, model protection techniques oriented to deep reinforcement learning have made little new progress, and how to protect the security of intelligent routing decision models has become an important challenge in the field of security applications.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an intelligent routing decision protection scheme based on vertical federation, which aims to protect a routing decision model based on deep reinforcement learning from being influenced by decision vulnerabilities or malicious attacks.
The first aspect of the invention discloses an intelligent routing decision protection method based on vertical federation. The method comprises the following steps:
step S1, obtaining, through sampling, sampling state data of an agent in an application scene, dividing the sampling state data into N groups of sampling sub-state data and sending them to N clients respectively, wherein N ≥ 2 and N is a positive integer;
step S2, each of the N clients generating feature data of the sampling sub-state data by using a constructed client model based on the received sampling sub-state data and sending the feature data to a server side;
and step S3, the server side generating, by using a constructed server-side model, a routing decision for the overall task of the agent based on the N groups of feature data received from the N clients.
According to the method of the first aspect of the present invention, in step S2, the N constructed client models have the same model structure; each client model includes two client submodels, the client submodels likewise have the same model structure, and each client submodel includes two fully-connected layers and two activation function layers.
According to the method of the first aspect of the present invention, in step S3, the server side performs splicing processing on the N groups of feature data received from the N clients to obtain complete feature data, and the server-side model generates a routing decision for the overall task of the agent from the complete feature data, where the server-side model includes a fully-connected layer and a Tanh activation function layer.
According to the method of the first aspect of the present invention, before the step S1 to the step S3, the method further comprises: step S0, pre-training the server-side model and the N client-side models, where the pre-training specifically includes:
step S0-1, acquiring training state data of the agent in the application scene through pre-sampling, wherein the training state data is divided into N groups of training sub-state data; adding interference noise representing a malicious attack to the k-th group of training sub-state data among the N groups, and then sending the k-th group and the other N-1 groups of training sub-state data to the N clients respectively, wherein 1 ≤ k ≤ N and k is a positive integer;
step S0-2, each of the N clients generates training feature data of the training sub-state data by using the client model based on the received training sub-state data, and sends the training feature data to the server;
step S0-3, the server side generates a routing decision for a training task of the agent based on the N groups of received training feature data from the N clients by using the server side model;
s0-4, acquiring a real decision of a training task of the agent, and calculating a loss function based on a routing decision of the training task and the real decision of the training task;
step S0-5, the loss function is fed back to the N clients, the N clients repeat the steps S0-1 to S0-4 after receiving the loss function until the calculated loss function is lower than a threshold, and then execute the steps S1 to S3 using the pre-trained server-side model and the N client-side models.
According to the method of the first aspect of the present invention, in said step S0-4:
the loss function is expressed using the following formula:

$$L(\theta) = L_{act}(\theta) + L_{dis}(\theta)$$

wherein $L_{act}(\theta)$ represents the loss function of the action network in the client model, $L_{dis}(\theta)$ represents the loss function of the discriminant network in the client model, and $\theta$ represents the model parameters of the client model;

the loss function of the action network is:

$$L_{act}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}\,\hat{A}_t,\ \operatorname{clip}\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)},\, 1-\varepsilon,\, 1+\varepsilon\right)\hat{A}_t^{old}\right)\right]$$

wherein $\pi_{\theta}(a_t \mid s_t)$ represents the state transition probability of the action network, $\pi_{\theta_{old}}(a_t \mid s_t)$ represents the previous state transition probability of the action network, $\theta$ represents the current model parameters of the client model, $\theta_{old}$ represents the previous model parameters of the client model, $\operatorname{clip}(\cdot)$ represents the intercept (clipping) function that keeps values within the range $[1-\varepsilon,\, 1+\varepsilon]$, $\varepsilon$ represents a hyper-parameter, $\hat{A}_t$ represents the estimated advantage at time step $t$, and $\hat{A}_t^{old}$ represents the estimated advantage at time step $t$ under the previous model parameters of the client model;

the loss function of the discriminant network is:

$$L_{dis} = \mathbb{E}\left[\left(V_{tgt}(s,a) - V_{pred}(s,a)\right)^2\right]$$

wherein $V_{tgt}$ is the target value function, $V_{pred}$ is the predicted value, $s$ and $a$ respectively represent the state and the action, and $\gamma$ and $\lambda$ represent hyper-parameters.
According to the method of the first aspect of the invention, when the sampling state data and the training state data are obtained, a near-end policy optimization algorithm is adopted to collect the states, actions and reward values at a plurality of moments, specifically: at the first moment, the agent obtains state data from a simulation environment of the application scene, the action network makes a corresponding action based on the state data, and the judgment network gives a reward value for the action made by the action network; at every other moment, the state, action and reward value of that moment are acquired in the same way.
The second aspect of the invention discloses an intelligent routing decision protection system based on vertical federation. The system comprises:
the state sampling module is configured to acquire sampling state data of the agent in an application scene through sampling, the sampling state data is divided into N groups of sampling sub-state data and is respectively sent to N clients, and N is greater than or equal to 2 and is a positive integer;
the characteristic generating module is configured to generate characteristic data of the sampling sub-state data by utilizing the constructed client model based on the sampling sub-state data received by each of the N clients and send the characteristic data to the server;
a routing decision module configured to generate, by using the constructed server-side model, a routing decision for the overall task of the agent based on the N sets of feature data received by the server side from the N clients.
According to the system of the second aspect of the present invention, the N constructed client models have the same model structure, each client model includes two client submodels, each client submodel also has the same model structure, and each client submodel includes two fully-connected layers and two activation function layers.
According to the system of the second aspect of the present invention, the server side performs splicing processing on the N groups of feature data received from the N clients to obtain complete feature data and generates a routing decision for the overall task of the agent from the complete feature data, where the server-side model includes a fully-connected layer and a Tanh activation function layer.
According to the system of the second aspect of the invention, the system further comprises: a preprocessing module configured to pre-train the server-side model and the N client models, where the pre-training specifically includes:
acquiring training state data of the agent in the application scene through pre-sampling, wherein the training state data is divided into N groups of training sub-state data; adding interference noise representing a malicious attack to the k-th group of training sub-state data among the N groups, and then sending the k-th group and the other N-1 groups of training sub-state data to the N clients respectively, wherein 1 ≤ k ≤ N and k is a positive integer;
each client side in the N client sides generates training characteristic data of the training sub-state data by utilizing the client side model based on the received training sub-state data and sends the training characteristic data to the server side;
the server side generates a routing decision aiming at a training task of the agent based on N groups of training characteristic data received from the N clients by utilizing the server side model;
acquiring a real decision of a training task of the agent, and calculating a loss function based on a routing decision of the training task and the real decision of the training task;
and feeding the loss function back to the N clients, and repeating the steps after the N clients receive the loss function until the calculated loss function is lower than a threshold value.
According to the system of the second aspect of the invention, the loss function is expressed by the following formula:
$$L(\theta) = L_{act}(\theta) + L_{dis}(\theta)$$

wherein $L_{act}(\theta)$ represents the loss function of the action network in the client model, $L_{dis}(\theta)$ represents the loss function of the discriminant network in the client model, and $\theta$ represents the model parameters of the client model;

the loss function of the action network is:

$$L_{act}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}\,\hat{A}_t,\ \operatorname{clip}\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)},\, 1-\varepsilon,\, 1+\varepsilon\right)\hat{A}_t^{old}\right)\right]$$

wherein $\pi_{\theta}(a_t \mid s_t)$ represents the state transition probability of the action network, $\pi_{\theta_{old}}(a_t \mid s_t)$ represents the previous state transition probability of the action network, $\theta$ represents the current model parameters of the client model, $\theta_{old}$ represents the previous model parameters of the client model, $\operatorname{clip}(\cdot)$ represents the intercept (clipping) function that keeps values within the range $[1-\varepsilon,\, 1+\varepsilon]$, $\varepsilon$ represents a hyper-parameter, $\hat{A}_t$ represents the estimated advantage at time step $t$, and $\hat{A}_t^{old}$ represents the estimated advantage at time step $t$ under the previous model parameters of the client model;

the loss function of the discriminant network is:

$$L_{dis} = \mathbb{E}\left[\left(V_{tgt}(s,a) - V_{pred}(s,a)\right)^2\right]$$

wherein $V_{tgt}$ is the target value function, $V_{pred}$ is the predicted value, $s$ and $a$ respectively represent the state and the action, and $\gamma$ and $\lambda$ represent hyper-parameters.
According to the system of the second aspect of the invention, when the sampling state data and the training state data are obtained, a near-end policy optimization algorithm is adopted to collect the states, actions and reward values at a plurality of moments, specifically: at the first moment, the agent obtains state data from a simulation environment of the application scene, the action network makes a corresponding action based on the state data, and the judgment network gives a reward value for the action made by the action network; at every other moment, the state, action and reward value of that moment are acquired in the same way.
The third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps of the intelligent routing decision protection method based on vertical federation in the first aspect of the present invention.
The fourth aspect of the invention discloses a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the intelligent routing decision protection method based on vertical federation of the first aspect of the present invention.
In summary, the technical scheme of the invention draws on the vertical federation model and its data protection function: a reinforcement learning framework based on vertical federation is designed, training of the model is split between local clients and a server side, the number of clients is arbitrary, and different clients train on different feature data, while the data uploaded to the server side contains only features. An attacker is thus confounded: even if it obtains the input and output of some client, it cannot reconstruct an equivalent of the overall policy model, because the input features are split across different clients for training. With the invention, an attacker can hardly steal the complete training task of the intelligent routing decision and cannot steal the whole intelligent routing decision model, thereby achieving the purpose of protecting the intelligent routing decision model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description in the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of an intelligent routing decision protection method based on vertical federation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a vertical federation architecture according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a near-end policy optimization algorithm according to an embodiment of the present invention;
FIG. 4 is a block diagram of a vertical federation based intelligent routing decision protection system in accordance with an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first aspect of the invention discloses an intelligent routing decision protection method based on vertical federation. Fig. 1 is a flowchart of an intelligent routing decision protection method based on vertical federation according to an embodiment of the present invention; as shown in fig. 1, the method includes: step S1, obtaining, through sampling, sampling state data of an agent in an application scene, dividing the sampling state data into N groups of sampling sub-state data and sending them to N clients respectively, where N ≥ 2 and N is a positive integer; step S2, each of the N clients generating feature data of the sampling sub-state data by using a constructed client model based on the received sampling sub-state data and sending the feature data to a server side; and step S3, the server side generating, by using a constructed server-side model, a routing decision for the overall task of the agent based on the N groups of feature data received from the N clients.
FIG. 2 is a schematic diagram of a vertical federation architecture according to an embodiment of the present invention; as shown in FIG. 2, the solid lines represent forward propagation and the dashed lines represent backward propagation, and the simulation environment may be any of various reinforcement learning scenarios.
In step S1, sampling status data of the agent in the application scenario is obtained through sampling, the sampling status data is divided into N groups of sampling sub-status data, and the N groups of sampling sub-status data are respectively sent to N clients, where N is greater than or equal to 2 and is a positive integer.
In step S2, each of the N clients generates feature data of the sampled sub-state data by using the constructed client model based on the received sampled sub-state data, and sends the feature data to the server.
In some embodiments, in the step S2, the N client models are constructed to have the same model structure, each client model includes two client submodels, each client submodel also has the same model structure, and each client submodel includes two fully-connected layers and two activation function layers.
Specifically, a traditional deep reinforcement learning model is divided into a plurality of clients and a server side; here two clients are designed. The sampled state is split and distributed to the clients so that the clients hold data with different characteristics, establishing a vertical federated environment. After receiving the data, a client performs data processing (feature extraction) locally; the feature extraction may adopt principal component analysis, multidimensional scaling analysis or the like, and the resulting features are output through the client model and sent to the server side. Then the client models and the server-side model are built: each client model has a consistent structure and consists of two submodels; the submodels have consistent structures, each comprising two fully-connected layers and two activation function layers; and the server-side model comprises a fully-connected layer.
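To make this structure concrete, the following is a minimal PyTorch sketch of one possible client model consistent with the description above; the class names, hidden width, and the choice of ReLU activations are illustrative assumptions, since the patent fixes only the layer counts:

```python
import torch
import torch.nn as nn

class ClientSubModel(nn.Module):
    """One client submodel: two fully-connected layers, each followed by
    an activation function layer, as specified above."""
    def __init__(self, in_dim: int, hidden_dim: int, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class ClientModel(nn.Module):
    """A client model consisting of two structurally identical submodels,
    interpreted here as the action-network branch and the
    discriminant-network branch of the local feature extractor."""
    def __init__(self, in_dim: int, hidden_dim: int = 64, feat_dim: int = 32):
        super().__init__()
        self.action_branch = ClientSubModel(in_dim, hidden_dim, feat_dim)
        self.discriminant_branch = ClientSubModel(in_dim, hidden_dim, feat_dim)

    def forward(self, sub_state: torch.Tensor):
        # Each branch maps the local sub-state to a feature vector that
        # is uploaded to the server side.
        return self.action_branch(sub_state), self.discriminant_branch(sub_state)
```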
In step S3, the server generates a routing decision for the overall task of the agent based on the N sets of feature data received from the N clients using the constructed server model.
In some embodiments, in step S3, the server side performs splicing processing on the N groups of feature data received from the N clients to obtain complete feature data, and the server-side model generates a routing decision for the overall task of the agent from the complete feature data, where the server-side model includes a fully-connected layer and a Tanh activation function layer.
Specifically, the feature information output by each client model is aggregated at the server side, where the aggregator performs a splicing operation on the features transmitted by the clients. The features output by the local models are uploaded to the server side; the server side aggregates the data using the aggregator and then feeds it into the server-side model for processing, so as to generate a routing decision for the overall task of the agent.
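A matching sketch of the server side follows, assuming the aggregator simply concatenates (splices) the uploaded features before the stated fully-connected layer and Tanh activation; the action dimensionality is an assumption:

```python
import torch
import torch.nn as nn

class ServerModel(nn.Module):
    """Server-side model: splice the N clients' feature vectors into the
    complete feature data, then apply one fully-connected layer and a
    Tanh activation to produce the routing decision."""
    def __init__(self, n_clients: int, feat_dim: int, action_dim: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(n_clients * feat_dim, action_dim),
            nn.Tanh(),
        )

    def forward(self, client_features: list[torch.Tensor]) -> torch.Tensor:
        # The aggregator performs the splicing (concatenation) operation
        # on the features uploaded by the clients.
        full_features = torch.cat(client_features, dim=-1)
        return self.head(full_features)
```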
In some embodiments, before the step S1 to the step S3, the method further comprises: step S0, pre-training the server-side model and the N client-side models, where the pre-training specifically includes:
step S0-1, acquiring training state data of the agent in the application scene through pre-sampling, wherein the training state data is divided into N groups of training sub-state data; adding interference noise representing a malicious attack to the k-th group of training sub-state data among the N groups, and then sending the k-th group and the other N-1 groups of training sub-state data to the N clients respectively, wherein 1 ≤ k ≤ N and k is a positive integer;
step S0-2, each of the N clients generates training feature data of the training sub-state data by using the client model based on the received training sub-state data, and sends the training feature data to the server;
step S0-3, the server side generates a routing decision for a training task of the agent based on the N groups of received training feature data from the N clients by using the server side model;
s0-4, acquiring a real decision of a training task of the agent, and calculating a loss function based on a routing decision of the training task and the real decision of the training task;
step S0-5, the loss function is fed back to the N clients, the N clients repeat the steps S0-1 to S0-4 after receiving the loss function until the calculated loss function is lower than a threshold, and then execute the steps S1 to S3 using the pre-trained server-side model and the N client-side models.
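As a schematic illustration of one pre-training round (steps S0-1 to S0-5), the following sketch reuses the hypothetical ClientModel and ServerModel classes from later in this description; the Gaussian noise model, the MSE loss standing in for the composite loss of step S0-4, and the optimizer handling are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def pretrain_round(state, real_decision, client_models, server_model,
                   optimizers, k: int, noise_std: float = 0.1) -> float:
    """One pre-training round: split the training state into N sub-states,
    add interference noise (representing a malicious attack) to the k-th
    group, run the vertical forward pass, and feed the loss back to all
    participants."""
    n = len(client_models)
    sub_states = list(torch.chunk(state, n, dim=-1))           # step S0-1: split
    sub_states[k] = sub_states[k] + noise_std * torch.randn_like(sub_states[k])
    features = [model(sub)[0]                                  # step S0-2: local features
                for model, sub in zip(client_models, sub_states)]
    decision = server_model(features)                          # step S0-3: routing decision
    # Step S0-4: compare against the real decision of the training task;
    # MSE is a stand-in for the composite action/discriminant loss.
    loss = F.mse_loss(decision, real_decision)
    for opt in optimizers:                                     # step S0-5: feed back
        opt.zero_grad()
    loss.backward()
    for opt in optimizers:
        opt.step()
    return loss.item()
```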
In some embodiments, in said step S0-4:
the loss function is expressed using the following formula:

$$L(\theta) = L_{act}(\theta) + L_{dis}(\theta)$$

wherein $L_{act}(\theta)$ represents the loss function of the action network in the client model, $L_{dis}(\theta)$ represents the loss function of the discriminant network in the client model, and $\theta$ represents the model parameters of the client model;

the loss function of the action network is:

$$L_{act}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}\,\hat{A}_t,\ \operatorname{clip}\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)},\, 1-\varepsilon,\, 1+\varepsilon\right)\hat{A}_t^{old}\right)\right]$$

wherein $\pi_{\theta}(a_t \mid s_t)$ represents the state transition probability of the action network, $\pi_{\theta_{old}}(a_t \mid s_t)$ represents the previous state transition probability of the action network, $\theta$ represents the current model parameters of the client model, $\theta_{old}$ represents the previous model parameters of the client model, $\operatorname{clip}(\cdot)$ represents the intercept (clipping) function that keeps values within the range $[1-\varepsilon,\, 1+\varepsilon]$, $\varepsilon$ represents a hyper-parameter, $\hat{A}_t$ represents the estimated advantage at time step $t$, and $\hat{A}_t^{old}$ represents the estimated advantage at time step $t$ under the previous model parameters of the client model;

the loss function of the discriminant network is:

$$L_{dis} = \mathbb{E}\left[\left(V_{tgt}(s,a) - V_{pred}(s,a)\right)^2\right]$$

wherein $V_{tgt}$ is the target value function, $V_{pred}$ is the predicted value, $s$ and $a$ respectively represent the state and the action, and $\gamma$ and $\lambda$ represent hyper-parameters.
Specifically, consider the attacks that may exist in the testing stage: the trained models are distributed in various places and are difficult to manipulate simultaneously, so even if an attacker obtains one of the client models and adds noise to its input through various attack strategies, such an operation can hardly exert a large influence on the overall task. Accordingly, in the training phase, interference noise characterizing a malicious attack is added at one of the clients; in other embodiments, such interference noise may also be added at more than one client.
Each client model also updates its model parameters using the loss fed back by the server-side model. Although the training loss function of the server-side model is similar to that of a near-end Policy Optimization (PPO) model, the network models differ: here, both the action network and the evaluation network on the server side are constructed from one fully-connected layer plus a Tanh activation function.
In some embodiments, when the sampling state data and the training state data are obtained, a near-end policy optimization algorithm is adopted to collect the states, actions and reward values at a plurality of moments, specifically: at the first moment, the agent obtains state data from a simulation environment of the application scene, the action network makes a corresponding action based on the state data, and the judgment network gives a reward value for the action made by the action network; at every other moment, the state, action and reward value of that moment are acquired in the same way.
Specifically, taking PPO as an example, an observation data set is generated. FIG. 3 is a schematic structural diagram of the near-end policy optimization algorithm (PPO) according to an embodiment of the present invention; as shown in fig. 3, reinforcement learning mainly optimizes decisions continuously by observing the surrounding environment, taking optimal actions, and obtaining feedback. State, action and reward value tuples $\{(s_t, a_t, r_t)\}_{t=1}^{N}$ are collected from the training scene at N moments, and this data set is taken as the sample set to be trained. The target model is a deep reinforcement learning (DRL) model based on the PPO algorithm, and attack defense is performed on the basis of this model; the DRL model based on the PPO algorithm is shown in fig. 2. The model decision process is described by the tuple $(S, A, P, R, \gamma)$, in which $S$ is a finite set of states, $A$ is a finite set of actions, $P$ is the state transition probability, $R$ is the reward function, and $\gamma$ is the discount factor used to calculate the long-term cumulative return. During DRL model training the agent needs to interact with the environment continuously: in the current state $S_t$ the agent takes an action $A_t$ according to the learned policy, and at the same time the environment feeds back a reward value $R_{t+1}$ to the agent to evaluate the quality of the current action. PPO uses importance sampling to solve the problem that sampling from a target distribution directly may be difficult: samples are instead drawn from another distribution that is easy to sample. When PPO combines importance sampling with the action-discrimination (actor-critic) framework, the agent consists of two parts: an action part, responsible for interacting with the environment to collect samples, and a discrimination part, responsible for judging the quality of actions.
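A minimal sketch of this sampling loop under the vertical split follows, assuming a Gymnasium-style environment interface and the hypothetical models sketched above:

```python
import torch

def collect_rollout(env, client_models, server_model, horizon: int = 128):
    """Collect (state, action, reward) tuples for `horizon` time steps:
    the agent reads the state from the simulation environment, the action
    network produces an action, and the environment returns the reward
    value used by the judgment network to evaluate that action."""
    trajectory = []
    state, _ = env.reset()
    for _ in range(horizon):
        s = torch.as_tensor(state, dtype=torch.float32)
        sub_states = torch.chunk(s, len(client_models), dim=-1)
        features = [m(sub)[0] for m, sub in zip(client_models, sub_states)]
        action = server_model(features)
        next_state, reward, terminated, truncated, _ = env.step(action.detach().numpy())
        trajectory.append((state, action.detach(), reward))
        state = next_state
        if terminated or truncated:
            state, _ = env.reset()
    return trajectory
```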
Updating the action network: the action network can be updated using the PPO gradient update formula:

$$L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}\,\hat{A}_t,\ \operatorname{clip}\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)},\, 1-\varepsilon,\, 1+\varepsilon\right)\hat{A}_t\right)\right]$$

wherein $\theta$ is the policy parameter, $\hat{\mathbb{E}}_t$ denotes the empirical expectation over time steps, $\pi_{\theta}(a_t \mid s_t)$ is the state transition probability of the action network to be trained, $\pi_{\theta_{old}}(a_t \mid s_t)$ is the state transition probability of the old action network, $\varepsilon$ is a hyper-parameter, usually taking the value 0.1 or 0.2, and $\hat{A}_t$ is the estimated advantage at time step $t$. The advantage function is calculated as:

$$\hat{A}_t = \delta_t + (\gamma\lambda)\,\delta_{t+1} + \cdots + (\gamma\lambda)^{T-t+1}\,\delta_{T-1}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$

wherein $V(s_t)$ is the state value function obtained by the judgment network at time $t$, and $r_t$ is the reward value at time $t$.
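Rendered as code, the clipped update and the advantage computation above might look as follows; this is a sketch of the standard PPO formulation, with tensor shapes and the default hyper-parameter values as assumptions:

```python
import torch

def gae_advantages(rewards, values, gamma: float = 0.99, lam: float = 0.95):
    """Generalized advantage estimation matching the formula above:
    A_t = delta_t + (gamma*lam)*delta_{t+1} + ..., with
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    T = len(rewards)
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

def ppo_clip_loss(logp_new, logp_old, advantages, eps: float = 0.2):
    """Clipped surrogate objective: the ratio pi_theta / pi_theta_old is
    intercepted to [1 - eps, 1 + eps]; eps typically takes 0.1 or 0.2."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Minimise the negative of the surrogate objective.
    return -torch.min(unclipped, clipped).mean()
```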
Updating the discriminant network: the other part of the PPO model that needs updating is the discriminant network, whose loss function is calculated as:

$$L_{dis} = \mathbb{E}\left[\left(V_{tgt}(s,a) - V_{pred}(s,a)\right)^2\right]$$

wherein $V_{tgt}$ is the target value function, $V_{pred}$ is the predicted value, and $s$ and $a$ are the state and the action respectively; the network parameters are updated by back-propagating this loss function.
Client model stealing attack in the training phase (or testing phase)
In order to improve the effect of model stealing, the model structure of the stealing model is chosen as a DQN with the same input as the target model.
(1) Stealing datasets
In the testing stage, the trained deep reinforcement learning model is used as the target model, the sampled state-action pairs are used as the stealing data set, and these pairs serve as training samples for an equivalent model.
(2) Training equivalent models
On the basis of the stolen data, an equivalent policy is trained by imitation learning. In the training of the imitation policy, the generator G is replaced with an action network; the output action of the generator G and the state are input to the discriminator in pairs and compared with the expert data, and the discriminator's judgment $D(s,a)$ of the generator's output serves as the reward value guiding the policy learning of imitation learning. Thus, the discriminator loss function in imitation learning can be expressed as:

$$L_D = \mathbb{E}_{\pi_E}\left[\log D(s,a)\right] + \mathbb{E}_{\pi_{\theta}}\left[\log\left(1 - D(s,a)\right)\right]$$

wherein $\pi_{\theta}$ denotes the policy resulting from imitation learning and $\pi_E$ denotes the sampled expert policy; the first term $\mathbb{E}_{\pi_E}[\log D(s,a)]$ represents the discriminator's judgment of the real data, and the second term $\mathbb{E}_{\pi_{\theta}}[\log(1 - D(s,a))]$ its judgment of the generated data. Through this max-min game process, G and D are optimized cyclically and alternately to train the required action network and discriminant network.
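A sketch of this discriminator objective using binary cross-entropy follows, assuming D(s, a) outputs the probability that a state-action pair comes from the expert data:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_expert: torch.Tensor, d_generated: torch.Tensor) -> torch.Tensor:
    """Imitation-learning discriminator loss: the first term judges the
    expert (real) state-action pairs, the second term the pairs produced
    by the generator G."""
    real_term = F.binary_cross_entropy(d_expert, torch.ones_like(d_expert))
    fake_term = F.binary_cross_entropy(d_generated, torch.zeros_like(d_generated))
    return real_term + fake_term

def imitation_reward(d_generated: torch.Tensor) -> torch.Tensor:
    """The discriminator's judgment of the generator's output serves as
    the reward value guiding the imitation policy."""
    return torch.log(d_generated.clamp_min(1e-8))
```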
In the training process, the parameters of the discriminant network and the action network are updated backwards by minimizing a loss function through gradient derivation, where the loss function is:

$$L = L_D - \lambda_H\,H(\pi_{\theta})$$

wherein $H(\pi_{\theta})$ is the entropy of the imitation policy $\pi_{\theta}$, controlled by the constant $\lambda_H$ as a policy regularization term in the loss function; the trained equivalent model is then used to generate adversarial samples to attack the target model.
Analysis of defense feasibility
Federated learning aims to build a federated learning model based on distributed data sets. During model training, model-related information can be exchanged between the parties, but the raw data cannot; this exchange does not expose any protected private portion of the data at any site. The trained federated learning model can be deployed at each participant of the federated learning system, or shared among multiple parties, so that private information is protected. Based on the characteristic that the clients' data features have low overlap, the vertical federation uploads only the model-processed features to the server side, so the model and the data privacy are well protected. Model protection is thereby improved: an attacker that merely approximates a single client model cannot learn an approximation of the overall policy and cannot acquire the overall task, and an attack on a single client model cannot greatly influence the overall task.
Specific examples
Assume a vertical federated scenario in which the original input is $x$, and there are two clients whose data are $x_1$ and $x_2$ respectively, with no feature overlap between $x_1$ and $x_2$. Let the client models be $f_1$ and $f_2$ and the server-side model be $g$. A model attacker attacks at the client: supposing the attacker can obtain the data model of one of the clients and interferes with the input of that client model through various strategies, the perturbed model executes as:

$$y = g\left(f_1(x_1 + \delta) \oplus f_2(x_2)\right)$$

wherein $\delta$ is the noise, $x_1$ and $x_2$ are the inputs of the two clients respectively, and $\oplus$ is the feature connection (splicing) operation. The input change of the client model at this time is $\delta$. Obviously, if $x$ has dimension $a$, then $x_1$ and $x_2$ each have dimension $a/2$: if $a$ is small enough, the noise $\delta$ has a large influence, while if $a$ is larger than a certain threshold, the influence on the whole is small. And if there are $n$ client models, each has an input dimension of $a/n$; if $n$ is large enough, one client being disturbed by noise does not have a significant impact on the overall task. Therefore, the larger the input feature dimension of the model and the smaller each client's share of it, the stronger the defense capability.
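The dimension argument can be checked numerically; the following toy sketch (all sizes and the noise scale are illustrative) perturbs one client's a/n-dimensional share of an a-dimensional input and reports the relative change of the spliced input:

```python
import numpy as np

def relative_perturbation(a: int, n: int, noise_std: float = 0.5,
                          trials: int = 1000) -> float:
    """Average ratio ||delta|| / ||x|| when only one of n clients,
    holding a/n of the a input dimensions, is attacked with noise."""
    rng = np.random.default_rng(0)
    ratios = []
    share = a // n
    for _ in range(trials):
        x = rng.standard_normal(a)
        delta = np.zeros(a)
        delta[:share] = noise_std * rng.standard_normal(share)  # attack one client
        ratios.append(np.linalg.norm(delta) / np.linalg.norm(x))
    return float(np.mean(ratios))

# More clients means a smaller share per client, hence a smaller relative impact.
for n in (2, 4, 8):
    print(n, round(relative_perturbation(a=64, n=n), 4))
```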
The second aspect of the invention discloses an intelligent routing decision protection system based on vertical federation. FIG. 4 is a block diagram of an intelligent routing decision protection system based on vertical federation according to an embodiment of the present invention; as shown in fig. 4, the system 400 includes:
the state sampling module 401 is configured to obtain sampling state data of an agent in an application scene through sampling, wherein the sampling state data is divided into N groups of sampling sub-state data and respectively sent to N clients, and N is greater than or equal to 2 and is a positive integer;
a feature generation module 402, configured to generate feature data of the sampled sub-state data by using the constructed client model based on the sampled sub-state data received by each of the N clients, and send the feature data to the server;
a routing decision module 403 configured to generate, by using the constructed server-side model, a routing decision for the overall task of the agent based on the N sets of feature data received by the server side from the N clients.
According to the system of the second aspect of the present invention, the N constructed client models have the same model structure, each client model includes two client submodels, each client submodel also has the same model structure, and each client submodel includes two fully-connected layers and two activation function layers.
According to the system of the second aspect of the present invention, the server side performs splicing processing on the N groups of feature data received from the N clients to obtain complete feature data and generates a routing decision for the overall task of the agent from the complete feature data, where the server-side model includes a fully-connected layer and a Tanh activation function layer.
According to the system of the second aspect of the invention, the system further comprises: a preprocessing module 404 configured to pre-train the server-side model and the N client models, where the pre-training specifically includes:
acquiring training state data of the agent in the application scene through pre-sampling, wherein the training state data is divided into N groups of training sub-state data; adding interference noise representing a malicious attack to the k-th group of training sub-state data among the N groups, and then sending the k-th group and the other N-1 groups of training sub-state data to the N clients respectively, wherein 1 ≤ k ≤ N and k is a positive integer;
each client side in the N client sides generates training characteristic data of the training sub-state data by utilizing the client side model based on the received training sub-state data and sends the training characteristic data to the server side;
the server side generates a routing decision aiming at a training task of the agent based on N groups of training characteristic data received from the N clients by utilizing the server side model;
acquiring a real decision of a training task of the agent, and calculating a loss function based on a routing decision of the training task and the real decision of the training task;
and feeding the loss function back to the N clients, and repeating the steps after the N clients receive the loss function until the calculated loss function is lower than a threshold value.
According to the system of the second aspect of the invention, the loss function is expressed by the following formula:
$$L(\theta) = L_{act}(\theta) + L_{dis}(\theta)$$

wherein $L_{act}(\theta)$ represents the loss function of the action network in the client model, $L_{dis}(\theta)$ represents the loss function of the discriminant network in the client model, and $\theta$ represents the model parameters of the client model;

the loss function of the action network is:

$$L_{act}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}\,\hat{A}_t,\ \operatorname{clip}\left(\frac{\pi_{\theta}(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)},\, 1-\varepsilon,\, 1+\varepsilon\right)\hat{A}_t^{old}\right)\right]$$

wherein $\pi_{\theta}(a_t \mid s_t)$ represents the state transition probability of the action network, $\pi_{\theta_{old}}(a_t \mid s_t)$ represents the previous state transition probability of the action network, $\theta$ represents the current model parameters of the client model, $\theta_{old}$ represents the previous model parameters of the client model, $\operatorname{clip}(\cdot)$ represents the intercept (clipping) function that keeps values within the range $[1-\varepsilon,\, 1+\varepsilon]$, $\varepsilon$ represents a hyper-parameter, $\hat{A}_t$ represents the estimated advantage at time step $t$, and $\hat{A}_t^{old}$ represents the estimated advantage at time step $t$ under the previous model parameters of the client model;

the loss function of the discriminant network is:

$$L_{dis} = \mathbb{E}\left[\left(V_{tgt}(s,a) - V_{pred}(s,a)\right)^2\right]$$

wherein $V_{tgt}$ is the target value function, $V_{pred}$ is the predicted value, $s$ and $a$ respectively represent the state and the action, and $\gamma$ and $\lambda$ represent hyper-parameters.
According to the system of the second aspect of the invention, when the sampling state data and the training state data are obtained, a near-end policy optimization algorithm is adopted to collect the states, actions and reward values at a plurality of moments, specifically: at the first moment, the agent obtains state data from a simulation environment of the application scene, the action network makes a corresponding action based on the state data, and the judgment network gives a reward value for the action made by the action network; at every other moment, the state, action and reward value of that moment are acquired in the same way.
The third aspect of the invention discloses an electronic device. The electronic device comprises a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the steps of the intelligent routing decision protection method based on vertical federation in the first aspect of the present invention.
FIG. 5 is a block diagram of an electronic device according to an embodiment of the invention; as shown in fig. 5, the electronic apparatus includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the electronic device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, Near Field Communication (NFC) or other technologies. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
It will be understood by those skilled in the art that the structure shown in fig. 5 is only a partial block diagram related to the technical solution of the present disclosure, and does not constitute a limitation of the electronic device to which the solution of the present application is applied, and a specific electronic device may include more or less components than those shown in the drawings, or combine some components, or have a different arrangement of components.
A fourth aspect of the invention discloses a computer-readable storage medium. The computer readable storage medium stores thereon a computer program, which when executed by a processor implements the steps of the vertical federation-based intelligent route decision protection method of the first aspect of the present invention.
In summary, the technical scheme of the invention draws on the vertical federation model and its data protection function: a reinforcement learning framework based on vertical federation is designed, training of the model is split between local clients and a server side, the number of clients is arbitrary, and different clients train on different feature data, while the data uploaded to the server side contains only features. An attacker is thus confounded: even if it obtains the input and output of some client, it cannot reconstruct an equivalent of the overall policy model, because the input features are split across different clients for training. With the invention, an attacker can hardly steal the complete training task of the intelligent routing decision and cannot steal the whole intelligent routing decision model, thereby achieving the purpose of protecting the intelligent routing decision model.
In the deep reinforcement learning training process, a trained model carries great potential safety hazards: the model and its data are easily exploited maliciously by an attacker, who can train an equivalent model from the input states and output actions and generate malicious samples to influence the decisions of the target agent. In view of this, and drawing on the vertical federation model and its data protection function, a reinforcement learning framework based on vertical federation is designed: training of the model is split between local clients and a server side, the number of clients is arbitrary, different clients train on different feature data, and the data uploaded to the server side contains only features, so that an attacker, even holding the input and output of some client, cannot reconstruct an equivalent overall policy model, thereby achieving the functions of model and data protection.
The invention has the following beneficial effects: a deep reinforcement learning model protection method based on vertical federation is provided against poisoning of deep reinforcement learning models; it protects not only the model but also the data; the input state is split in the reinforcement learning training process so that the clients hold data with different feature distributions, which protects both the data and the model; and the method has good applicability, can effectively detect model poisoning, and does not affect the execution of normal policies.
It should be noted that the technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description. The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An intelligent routing decision protection method based on vertical federation, characterized in that the method comprises the following steps:
step S1, obtaining, through sampling, sampling state data of an agent in an application scene, dividing the sampling state data into N groups of sampling sub-state data and sending them to N clients respectively, wherein N ≥ 2 and N is a positive integer;
step S2, each of the N clients generating feature data of the sampling sub-state data by using a constructed client model based on the received sampling sub-state data and sending the feature data to a server side;
and step S3, the server side generating, by using a constructed server-side model, a routing decision for the overall task of the agent based on the N groups of feature data received from the N clients.
2. The method according to claim 1, wherein in step S2, the N constructed client models have the same model structure, each client model includes two client submodels, each client submodel likewise has the same model structure, and each client submodel includes two fully-connected layers and two activation function layers.
3. The method according to claim 2, wherein in step S3, the server side splices the N sets of feature data received from the N clients to obtain complete feature data, and the server side model generates a routing decision for an overall task of the agent according to the complete feature data, and the server side model includes a full connection layer and a Tanh activation function layer.
4. The method according to claim 3, wherein before the step S1-S3, the method further comprises: step S0, pre-training the server-side model and the N client-side models, where the pre-training specifically includes:
step S0-1, acquiring training state data of the agent in the application scene through pre-sampling, wherein the training state data is divided into N groups of training sub-state data; adding interference noise representing a malicious attack to the k-th group of training sub-state data among the N groups, and then sending the k-th group and the other N-1 groups of training sub-state data to the N clients respectively, wherein 1 ≤ k ≤ N and k is a positive integer;
step S0-2, each of the N clients generates training feature data of the training sub-state data by using the client model based on the received training sub-state data, and sends the training feature data to the server;
step S0-3, the server side generates a routing decision for a training task of the agent based on the N groups of received training feature data from the N clients by using the server side model;
s0-4, acquiring a real decision of a training task of the agent, and calculating a loss function based on a routing decision of the training task and the real decision of the training task;
step S0-5, the loss function is fed back to the N clients, the N clients repeat the steps S0-1 to S0-4 after receiving the loss function until the calculated loss function is lower than a threshold, and then execute the steps S1 to S3 using the pre-trained server-side model and the N client-side models.
5. The method for intelligent vertical federation-based routing decision protection according to claim 4, wherein in the step S0-4:
the loss function is expressed using the following formula:
Figure DEST_PATH_IMAGE001
wherein the content of the first and second substances,
Figure 419964DEST_PATH_IMAGE002
a loss function representing a network of actions in the client model,
Figure DEST_PATH_IMAGE003
a loss function representing a discriminative network in the client model,
Figure 976585DEST_PATH_IMAGE004
current model parameters representing the client model;
the loss function of the action network is:

$$L^{a}(\theta) = \hat{\mathbb{E}}_{t}\!\left[\min\!\left(\frac{\pi_{\theta}(a_{t}\mid s_{t})}{\pi_{\theta_{\mathrm{old}}}(a_{t}\mid s_{t})}\,\hat{A}_{t},\ \operatorname{clip}\!\left(\frac{\pi_{\theta}(a_{t}\mid s_{t})}{\pi_{\theta_{\mathrm{old}}}(a_{t}\mid s_{t})},\,1-\epsilon,\,1+\epsilon\right)\hat{A}_{t}^{\mathrm{old}}\right)\right]$$

where $\pi_{\theta}$ denotes the state transition probability of the action network, $\pi_{\theta_{\mathrm{old}}}$ denotes the previous state transition probability of the action network, $\theta$ denotes the current model parameters of the client model, $\theta_{\mathrm{old}}$ denotes the previous model parameters of the client model, $\operatorname{clip}(\cdot,\,1-\epsilon,\,1+\epsilon)$ denotes the intercept (clipping) function, which restricts its argument to values within the range $[1-\epsilon,\,1+\epsilon]$, $\epsilon$ denotes a hyper-parameter, $\hat{A}_{t}$ denotes the estimated advantage at time step $t$, and $\hat{A}_{t}^{\mathrm{old}}$ denotes the estimated advantage at time step $t$ under the previous model parameters of the client model;
the loss function of the discrimination network is:

$$L^{c}(\theta) = \mathbb{E}\!\left[\left(y(s, a) - \hat{y}(s, a)\right)^{2}\right]$$

where $y$ is the target value function, $\hat{y}$ is the predicted value, $s$ and $a$ respectively denote the state and the action, and $\gamma$ and $\lambda$ denote hyper-parameters used in computing the target value.
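Reading claim 5 as a PPO-style objective, the two loss terms could be computed as in the following sketch; the use of a single advantage tensor (computed under the previous model parameters) for both the clipped and unclipped terms, the default value of ε, and the plain sum of the two terms are assumptions:

```python
import torch

def actor_loss(ratio, advantages, eps=0.2):
    """Clipped surrogate loss L^a(theta) of the action network.

    ratio      -- pi_theta(a_t|s_t) / pi_theta_old(a_t|s_t), per time step
    advantages -- estimated advantage A_hat_t
    eps        -- the clipping hyper-parameter epsilon
    """
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Negated because optimizers minimize, while the surrogate is maximized.
    return -torch.mean(torch.min(unclipped, clipped))

def critic_loss(target_values, predicted_values):
    """Discrimination-network loss L^c(theta): mean squared error between
    the target value y(s, a) and the predicted value y_hat(s, a)."""
    return torch.mean((target_values - predicted_values) ** 2)

def total_loss(ratio, advantages, target_values, predicted_values):
    # Total client loss of claim 5 (simple sum assumed): L = L^a + L^c.
    return actor_loss(ratio, advantages) + critic_loss(target_values, predicted_values)
```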
6. The method according to claim 5, wherein a proximal policy optimization (PPO) algorithm is adopted to collect the state s, the action a, and the reward value at multiple time steps when acquiring the sampling state data and the training state data, specifically: at the first time step, the agent obtains state data from a simulation environment of the application scene, the action network makes a corresponding action based on the state data, and the discrimination network gives a reward value for the action made by the action network; at each subsequent time step, the state, action, and reward value are acquired in the same way.
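A minimal collection loop matching claim 6 might look as follows; the `env.reset`/`env.step` interface and the discrimination network taking (state, action) as input are assumed for illustration:

```python
import torch

def collect_trajectory(env, action_net, discrim_net, n_steps):
    """Collect (state, action, reward) triples over multiple time steps."""
    trajectory = []
    state = env.reset()  # first time step: state from the simulation environment
    for _ in range(n_steps):
        with torch.no_grad():
            action = action_net(state)           # action network acts on the state
            reward = discrim_net(state, action)  # discrimination network scores the action
        trajectory.append((state, action, reward))
        state = env.step(action)                 # advance the simulation (assumed API)
    return trajectory
```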
7. A vertical federation-based intelligent routing decision protection system, the system comprising:
a state sampling module configured to acquire sampling state data of the agent in an application scene through sampling, the sampling state data being divided into N groups of sampling sub-state data that are sent to N clients respectively, where N is a positive integer and N ≥ 2;
a feature generation module configured to cause each of the N clients to generate, by using the constructed client model, feature data from the received sampling sub-state data and to send the feature data to the server side; and
a routing decision module configured to cause the server side to generate, by using the constructed server-side model, a routing decision for the overall task of the agent based on the N groups of feature data received from the N clients.
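The vertical (feature-wise) split performed by the state sampling module can be illustrated in a few lines; the equal-width split is an assumption, since in a vertical-federation setting each client would typically own a fixed subset of the state features:

```python
import numpy as np

state = np.random.rand(64)             # sampled agent state (illustrative size)
N = 4                                  # number of clients
sub_states = np.array_split(state, N)  # N groups of sampling sub-state data
# sub_states[i] is sent to client i; no single client sees the complete state.
```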
8. The vertical federation-based intelligent routing decision protection system according to claim 7, further comprising a preprocessing module configured to pre-train the server-side model and the N client models, the pre-training specifically comprising:
acquiring training state data of the agent in an application scene through pre-sampling, dividing the training state data into N groups of training sub-state data, adding interference noise representing a malicious attack to the k-th group of the N groups of training sub-state data, and then sending the k-th group and the other N-1 groups of training sub-state data to the N clients respectively, where k is a positive integer and 1 ≤ k ≤ N;
each of the N clients generating training feature data from the received training sub-state data by using the client model and sending the training feature data to the server side;
the server side generating, by using the server-side model, a routing decision for a training task of the agent based on the N groups of training feature data received from the N clients;
acquiring a real decision of the training task of the agent, and calculating a loss function based on the routing decision of the training task and the real decision of the training task;
and feeding the loss function back to the N clients; after receiving the loss function, the N clients repeat the above steps until the calculated loss function falls below a threshold.
9. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the vertical federation-based intelligent routing decision protection method of any one of claims 1 to 6.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the vertical federation-based intelligent routing decision protection method of any one of claims 1 to 6.
CN202210096691.4A 2022-01-27 2022-01-27 Intelligent routing decision protection method and system based on vertical federation Active CN114124784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210096691.4A CN114124784B (en) 2022-01-27 2022-01-27 Intelligent routing decision protection method and system based on vertical federation

Publications (2)

Publication Number Publication Date
CN114124784A (en) 2022-03-01
CN114124784B (en) 2022-04-12

Family

ID=80361987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210096691.4A Active CN114124784B (en) 2022-01-27 2022-01-27 Intelligent routing decision protection method and system based on vertical federation

Country Status (1)

Country Link
CN (1) CN114124784B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109802981A (en) * 2017-11-17 2019-05-24 车伯乐(北京)信息科技有限公司 A kind of configuration method of global data, apparatus and system
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112182982A (en) * 2020-10-27 2021-01-05 北京百度网讯科技有限公司 Multi-party combined modeling method, device, equipment and storage medium
CN113191484A (en) * 2021-04-25 2021-07-30 清华大学 Federal learning client intelligent selection method and system based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1887484B1 (en) * 2002-11-06 2009-10-14 Tellique Kommunikationstechnik GmbH Method for pre-transmission of structured data sets between a client device and a server device


Similar Documents

Publication Publication Date Title
CN113408743B (en) Method and device for generating federal model, electronic equipment and storage medium
CN113609521B (en) Federated learning privacy protection method and system based on countermeasure training
Ma et al. On safeguarding privacy and security in the framework of federated learning
CN111461226A (en) Countermeasure sample generation method, device, terminal and readable storage medium
CN112884131A (en) Deep reinforcement learning strategy optimization defense method and device based on simulation learning
CN110852448A (en) Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN112884130A (en) SeqGAN-based deep reinforcement learning data enhanced defense method and device
CN113645197B (en) Decentralized federal learning method, device and system
CN113255936A (en) Deep reinforcement learning strategy protection defense method and device based on simulation learning and attention mechanism
CN113077052A (en) Reinforced learning method, device, equipment and medium for sparse reward environment
CN111625820A (en) Federal defense method based on AIoT-oriented security
CN112396187A (en) Multi-agent reinforcement learning method based on dynamic collaborative map
Xiao et al. Network security situation prediction method based on MEA-BP
CN112600794A (en) Method for detecting GAN attack in combined deep learning
CN115208604B (en) AMI network intrusion detection method, device and medium
CN116861239A (en) Federal learning method and system
CN107347064B (en) Cloud computing platform situation prediction method based on neural network algorithm
CN115481441A (en) Difference privacy protection method and device for federal learning
CN117235742B (en) Intelligent penetration test method and system based on deep reinforcement learning
CN111091102B (en) Video analysis device, server, system and method for protecting identity privacy
CN114124784B (en) Intelligent routing decision protection method and system based on vertical federation
CN115001937B (en) Smart city Internet of things-oriented fault prediction method and device
Gao et al. Multi-source feedback based light-weight trust mechanism for edge computing
Wang et al. Deep reinforcement learning for joint sensor scheduling and power allocation under DoS attack
CN116957067B (en) Reinforced federal learning method and device for public safety event prediction model

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant