CN113158544A - Edge pre-caching strategy based on federated learning in a vehicular content-centric network - Google Patents

Edge pre-caching strategy based on federated learning in a vehicular content-centric network

Info

Publication number
CN113158544A
Authority
CN
China
Prior art keywords
content
rsu
action
network
vehicle
Prior art date
Legal status
Granted
Application number
CN202110149492.0A
Other languages
Chinese (zh)
Other versions
CN113158544B (en)
Inventor
姚琳
李兆洋
吴国伟
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN202110149492.0A
Publication of CN113158544A
Application granted
Publication of CN113158544B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The invention belongs to the technical field of vehicular content-centric networks and provides a federated-learning-based edge pre-caching strategy for the vehicular content-centric network. Based on the historical movement paths of vehicles and the content they are likely to request, the RSU models the system state and the actions it can take, then solves for the optimal content placement using deep reinforcement learning and stores the required content on the corresponding RSU in advance, thereby reducing the delay incurred when vehicles retrieve content from the RSU. Each RSU trains the model on its locally collected data; federated learning is then used to aggregate the models trained by the individual RSUs, the models are averaged with weights proportional to their data volumes, and the aggregated model is distributed uniformly to every RSU. Finally, the priority of duplicated content during cache replacement is lowered according to the cache lists of neighboring nodes, thereby reducing cache redundancy.

Description

Edge pre-caching strategy based on federated learning in a vehicular content-centric network
Technical Field
The invention relates to a federated-learning-based edge pre-caching strategy in a vehicular content-centric network and belongs to the technical field of vehicular content-centric networks.
Background
Vehicular ad-hoc networks (VANETs) are a special type of mobile ad-hoc network containing a number of fixed infrastructure nodes and vehicles. In a VANET, each vehicle may communicate with other vehicles or with fixed roadside units. Over the past decades, the VANET has become a content sharing platform in which the origin of the content is irrelevant, i.e. the VANET is more concerned with the content itself than with the actual carrier of the content. Content-oriented applications cover different areas such as entertainment, sports and shopping. To match the content-oriented nature of the VANET, a new network architecture, content-centric networking (CCN), has been proposed. Unlike IP networks, content names are the basic elements in a CCN, whose characteristic exchange consists of content request packets (called Interest packets) and content response packets (called Data packets). The in-network caching of CCN facilitates efficient distribution of streaming content under the mobility and intermittent connectivity of vehicles, giving rise to the vehicular content-centric network (VCCN). The VCCN can achieve better network performance for safety applications, traffic applications and content applications (such as file sharing and commercial advertising).
Similar to the vehicular network, the VCCN mainly includes two types of nodes: mobile nodes such as vehicles, also called OBUs (On-Board Units), and roadside fixed infrastructure nodes (RSUs). Both types of nodes can forward Interest packets and cache content. As an edge node, the RSU receives requests from mobile nodes and fetches content from the cloud data source, so a well-configured RSU caching strategy plays a vital role in improving the efficiency with which users obtain content. For vehicular network edge caching, the operating environment is very complex, and the local content popularity near a mobile node is influenced by many factors. In particular, user preferences for content are influenced by user context (e.g., location, personal characteristics and device diversity) in complex patterns. Furthermore, the edge nodes selected to satisfy a particular user request are subject to complex effects of network conditions (e.g., network topology, wireless channels and cooperation between base stations). Due to the natural dynamics of wireless networks, the caching environment of a vehicular network changes over time. The edge nodes should therefore have the intelligence to learn new states and new actions and to match them, so as to take optimal or near-optimal actions, and to judge how good an action is from the feedback it produces. An intelligent caching strategy should be able to accept such feedback and thus adapt to dynamic changes in the operating environment.
Disclosure of Invention
In order to effectively improve the performance of the edge caching system in a vehicular content-centric network, the invention provides a federated-learning-based edge pre-caching strategy. Based on the historical movement paths of vehicles and the content they are likely to request, the RSU models the system state and the actions it can take, then solves for the optimal content placement using deep reinforcement learning and stores the required content on the corresponding RSU in advance, thereby reducing the delay incurred when vehicles retrieve content from the RSU. Each RSU trains the model on its locally collected data; federated learning is then used to aggregate the models trained by the individual RSUs, the models are averaged with weights proportional to their data volumes, and the aggregated model is distributed uniformly to every RSU. Finally, the priority of duplicated content during cache replacement is lowered according to the cache lists of neighboring nodes, thereby reducing cache redundancy.
The technical scheme of the invention is as follows:
An edge pre-caching strategy based on federated learning in a vehicular content-centric network comprises the following steps:
(1) First, data on content requests and the corresponding vehicle movement information are collected in the dynamic environment of the vehicular network, and a deep reinforcement learning (DRL) agent deployed on the RSU is trained to make, under given conditions, the decision that is most beneficial for reducing request delay. The training process of the DRL agent first needs to define a state space, an action space and a reward function:
(1.1) The state space is composed of two main parts: the movement state of the vehicle and the request probability of the content. The movement state of a vehicle comprises its current position and the position it may reach after one time slice. The current position is easy to obtain, but the position a vehicle may reach cannot be observed directly, so a Markov chain is used to predict it from the vehicle's historical path, and the prediction result serves as one component of the state space. The request probability of content is likewise divided into two parts: the popularity of the content, and the content likely to be requested next, predicted from the content currently requested by the vehicle.
(1.2) To prevent the action space from becoming too large, the DRL agent is restricted to selecting only one content item to store in the cache at a time, and the selection is repeated multiple times so that the high-priority content items are all stored in the cache. To further improve efficiency, the range of selectable content is narrowed according to content popularity: only content whose popularity exceeds a threshold can serve as a pre-caching candidate.
(1.3) The cache hit rate represents the working efficiency of the DRL agent. To balance short-term and long-term benefits, the reward function is expressed as an exponentially weighted average hit rate:
$$R = \sum_{i} w^{i}\, r_{i}$$
where $r_i$ denotes the hit rate of the $i$-th time slice counted from the current time, and $w \in (0,1)$ is an exponential weighting factor; the larger $w$ is, the more slowly the contribution of future hit rates to the reward decays over time.
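As an illustration of this exponentially weighted hit-rate reward, a minimal sketch is given below; the truncation to a finite list of time slices is an assumption made for illustration.

```python
def exp_weighted_hit_rate(hit_rates, w=0.9):
    """Exponentially weighted average hit rate used as the DRL reward.

    hit_rates[i] is the cache hit rate observed in the i-th time slice counted
    from the current time; w in (0, 1) controls how slowly the contribution of
    future time slices decays (larger w means slower decay).
    """
    return sum((w ** i) * r for i, r in enumerate(hit_rates))

# Example: hit rates observed over the next four time slices
print(exp_weighted_hit_rate([0.42, 0.38, 0.40, 0.35], w=0.8))
```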
(2) Once the state space, the action space and the reward function have been defined, the deep learning framework of the agent can be constructed and trained. The deep reinforcement learning framework adopted by this patent consists of the following parts:
(2.1) The actor network, parameterized by $\theta^{\mu}$, is a mapping from the state space to the action space. Given a state from the state space, the actor network computes, according to its parameters, a proto-action in the corresponding action space,
$$\hat{a}_t = \mu(s_t \mid \theta^{\mu}),$$
as its output.
(2.2) Generating a single proto-action effectively reduces the computational complexity caused by a large-scale action space, but reducing the dimensionality of the action space in this way easily leads to inaccurate decisions. The K-Nearest Neighbor (KNN) method is therefore used to expand the generated proto-action into a set of actions, i.e. a set of valid actions in the action space, any element of which may become the action to be executed.
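A minimal sketch of this expansion step is shown below; embedding each cacheable content item as a feature vector and using scikit-learn's NearestNeighbors are assumptions, since the patent only specifies that KNN maps the proto-action to its k closest valid actions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def expand_proto_action(proto_action, valid_action_embeddings, k=5):
    """Map a continuous proto-action to the k nearest valid (discrete) actions.

    proto_action: 1-D array produced by the actor network.
    valid_action_embeddings: (N, d) array, one row per cacheable content item
        whose popularity exceeds the pre-caching threshold.
    Returns the indices of the k nearest valid actions.
    """
    knn = NearestNeighbors(n_neighbors=k).fit(valid_action_embeddings)
    _, indices = knn.kneighbors(proto_action.reshape(1, -1))
    return indices[0]

# Example with 100 candidate content items embedded in an 8-dimensional space
candidates = np.random.rand(100, 8)
proto = np.random.rand(8)
print(expand_proto_action(proto, candidates, k=5))
```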
(2.3) To avoid selecting actions with low Q values, a critic network is defined to constrain the output of the actor network and to update the actor network's parameters. The critic network evaluates the Q value of each action as follows:
$$Q(s_t, a_t \mid \theta^{Q}) = \mathbb{E}\big[\, r(s_t, a_t) + \gamma\, Q(s_{t+1}, \mu(s_{t+1} \mid \theta^{\mu}) \mid \theta^{Q}) \,\big]$$
where $s_t$ denotes the state at time $t$, $a_t$ the action taken at time $t$, and $\theta^{Q}$ and $\theta^{\mu}$ the parameters of the critic network and the actor network respectively; $\mathbb{E}[\cdot]$ denotes the expectation of the bracketed value under the environment $E$, $r(s_t, a_t)$ denotes the reward obtained by taking action $a_t$ in state $s_t$, $\gamma \in (0,1]$ is the weight decay factor applied to future cumulative rewards, and $\mu(s_{t+1} \mid \theta^{\mu})$ is the action produced by the actor network for the state at time $t+1$. For each possible action in the action set generated in the previous step, the critic network computes a corresponding Q value from the current state and the next state, and the action that attains the maximum value is selected as the action to execute.
Then $N$ state transition records are randomly sampled from the replay pool, and the critic network is updated by minimizing a loss function $L$ defined as:
$$L = \frac{1}{N} \sum_{i} \big( y_i - Q(s_i, a_i \mid \theta^{Q}) \big)^2$$
where $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$, $i$ denotes the $i$-th selected record, and $Q'$ and $\mu'$ denote the target critic and target actor networks, i.e. the networks as they were before the state transition in this record occurred.
The parameters of the actor network are updated using the sampled policy gradient, computed as:
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}$$
that is, the gradient with respect to the actor network parameters $\theta^{\mu}$ is computed by the chain rule, where $\nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}$ is the gradient of the critic network with respect to the action $a = \mu(s_i)$ taken in state $s_i$, and $\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}$ is the gradient of the actor network with respect to its parameters $\theta^{\mu}$.
(3) Training the deep reinforcement learning agent requires a large amount of data as a training set, and these data are usually collected at different RSUs. Uploading all of them to a central node, such as a specific RSU or a remote server, would on the one hand consume a large amount of bandwidth and on the other hand make the computing capacity of that single node a bottleneck, while leaving the computing resources of the many edge nodes underused. The method therefore adopts a federated learning framework: each RSU collects data locally and trains the given network, and then periodically uploads its model parameters to a remote server. The remote server performs federated averaging to obtain updated model parameters and distributes them to each RSU again. The federated learning procedure is as follows:
(3.1) First, the remote server initializes a model of the deep reinforcement learning agent and assigns random initial parameter values to the current actor network and critic network. The remote server then distributes this model to the RSUs within the region.
(3.2) Upon receiving the model, an RSU starts to train it; the training process is as described in step (2). If historical data are available, they can be processed and used to train the model, and new data obtained while the system runs after the model is received further update the model.
(3.3) After a period of training, each RSU transmits its locally trained model back to the remote server, and the remote server performs federated averaging (FedAvg). Considering that different RSUs are located at different positions and therefore observe different traffic flows, the parameters are weighted by the number of requests each RSU received; the specific calculation is:
$$\theta_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, \theta_{t+1}^{k}$$
where $\theta_{t+1}$ denotes the network parameters after this iteration, $K$ is the total number of RSUs participating in the federated learning, $n$ is the total number of requests received by all RSUs during the single training period of this iteration, $n_k$ is the number of requests received by the $k$-th RSU, and $\theta_{t+1}^{k}$ denotes the parameters trained by the $k$-th RSU. The whole process is repeated until the model parameters become stable.
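A minimal sketch of this request-weighted federated averaging step is given below; representing each RSU's model as a dictionary of NumPy arrays is an assumption made for illustration.

```python
import numpy as np

def federated_average(rsu_params, request_counts):
    """Request-weighted federated averaging of RSU model parameters.

    rsu_params: list of dicts, one per RSU, mapping parameter name -> np.ndarray.
    request_counts: list of n_k values, the number of requests each RSU served.
    Returns the aggregated parameters theta_{t+1} = sum_k (n_k / n) * theta^k.
    """
    n_total = float(sum(request_counts))
    aggregated = {}
    for name in rsu_params[0]:
        aggregated[name] = sum(
            (n_k / n_total) * params[name]
            for params, n_k in zip(rsu_params, request_counts)
        )
    return aggregated

# Example: two RSUs with a single weight matrix each
rsu_a = {"w": np.ones((2, 2))}
rsu_b = {"w": np.zeros((2, 2))}
print(federated_average([rsu_a, rsu_b], request_counts=[300, 100])["w"])
```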
(3.4) The remote server redistributes the trained model to each RSU, and every RSU uses the unified agent to guide its caching operations.
(4) As mentioned in step (1), the DRL agent selects only one content item at a time for pre-caching and then pre-caches a number of likely content items by repeating the selection many times. Thus, in effect, each pre-cached content item corresponds to the Q value of one action. On this basis, in order to reduce the space wasted when several adjacent RSUs store the same content, each RSU exchanges its cache list with its neighboring RSUs when computing the Q value of each action; if a content item already exists in several neighboring RSUs, the priority of the corresponding action is additionally lowered. The specific calculation is as follows:
[Equation image in the original: the Q value of the action is adjusted downward according to $n_d$.]
where $n_d$ is the number of neighboring RSUs that already store this content. The RSU reorders all content items according to the adjusted Q values and then pre-caches, in order, the content items that satisfy the conditions.
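As an illustration of this neighbor-aware re-ranking, a minimal sketch follows; since the exact adjustment formula appears only as an image in the source, the division by $1 + n_d$ used here is an assumed placeholder for "lower the priority as $n_d$ grows".

```python
def rank_precache_candidates(q_values, neighbor_cache_lists):
    """Re-rank pre-caching candidates, penalizing content already cached by neighbors.

    q_values: dict mapping content id -> Q value from the critic network.
    neighbor_cache_lists: list of sets, one per neighboring RSU, holding the
        content ids each neighbor currently caches.
    Returns content ids sorted by adjusted Q value, highest first.
    """
    adjusted = {}
    for content, q in q_values.items():
        n_d = sum(content in cache for cache in neighbor_cache_lists)
        # Assumed penalty: the patent only states that priority decreases with n_d.
        adjusted[content] = q / (1 + n_d)
    return sorted(adjusted, key=adjusted.get, reverse=True)

# Example: content "c2" is already held by both neighbors, so it drops in rank
print(rank_precache_candidates(
    {"c1": 0.8, "c2": 0.9, "c3": 0.5},
    neighbor_cache_lists=[{"c2"}, {"c2", "c3"}],
))
```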
The invention has the following beneficial effects. For the vehicular mobile network, the operating environment is very complex and the local content popularity near a mobile node is affected by many factors. Deep reinforcement learning can model this complex operating environment: the caching environment is represented through mobility prediction and prediction of user-requested content, and an optimal pre-caching selection is obtained by training on a large amount of data.
Since the RSUs are located in different regions, their user densities and request volumes differ. In general, the larger the training set, the more accurate the resulting model; but if the RSUs uploaded all of their training data to a specific RSU or to a remote server, the data transmission would occupy a large amount of bandwidth, and the single-point performance bottleneck would limit the training efficiency of the whole model. Federated learning effectively solves these problems: transmitting model parameters instead of raw data reduces bandwidth consumption, while the computing resources of the RSUs are fully utilized for model training, avoiding the single-point performance bottleneck.
Finally, additionally lowering the priority of duplicated content according to the cache lists of neighboring RSUs effectively reduces the space wasted by redundant caching and improves caching efficiency.
Drawings
Fig. 1 is an organizational chart of a pre-caching strategy according to the present invention.
FIG. 2 is a flowchart of deep reinforcement learning modeling according to the present invention.
FIG. 3 is a flow chart of deep reinforcement learning agent training according to the present invention.
Fig. 4 is a flow chart of federal learning in accordance with the present invention.
Fig. 5 is a flowchart of the RSU pre-caching according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by examples and drawings.
The method consists of three parts: modeling the edge caching environment with deep reinforcement learning, integrating the trained model parameters using a federated learning framework, and pre-caching performed by the RSU (roadside unit) through its local agent.
Referring to fig. 2, the specific process of modeling the edge caching environment for deep reinforcement learning is as follows:
Step 1. During a warm-up stage, the RSU records the historical movement path of each vehicle.
Step 2. For each vehicle, a Markov-chain-based movement prediction model is established from its historical movement path.
Step 3. Each vehicle periodically uploads its position, so that the RSU obtains the current position $l_t$ of each vehicle as part of the state space.
Step 4. The RSU feeds each vehicle's positions in the two most recent time slices into the movement prediction model to compute the most probable position $l_{t+1}$ of that vehicle in the next time slice, which also forms part of the state space.
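A minimal sketch of such a Markov-chain mobility predictor is shown below; discretizing positions into road-segment identifiers and using a second-order transition count table are assumptions made for illustration (the patent states only that the prediction uses the two most recent positions and the historical path).

```python
from collections import Counter, defaultdict

class MobilityPredictor:
    """Second-order Markov chain over discretized positions (road segments).

    Trained from each vehicle's historical path; predicts the most likely
    position in the next time slice from the two most recent positions.
    """
    def __init__(self):
        self.transitions = defaultdict(Counter)

    def fit(self, path):
        # path: sequence of position ids visited in consecutive time slices
        for prev, cur, nxt in zip(path, path[1:], path[2:]):
            self.transitions[(prev, cur)][nxt] += 1

    def predict(self, prev, cur):
        counts = self.transitions.get((prev, cur))
        if not counts:
            return cur  # fall back to "stays where it is" when the pair is unseen
        return counts.most_common(1)[0][0]

# Example: a vehicle that historically drove A -> B -> C -> D
predictor = MobilityPredictor()
predictor.fit(["A", "B", "C", "D", "A", "B", "C", "D"])
print(predictor.predict("B", "C"))  # -> "D"
```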
Step 5. Based on the assumption that a user accessing video stream data requests its segments in sequence, the content the user is likely to access next can be calculated. For example, if the content accessed at time $t$ is $c_i$, the content likely to be requested at time $t+1$ is computed as:
$$c_{t+1} = c_i + \Delta_i, \qquad \Delta_i = \Delta t / d_c$$
where $\Delta t$ denotes the duration of one time slice and $d_c$ denotes the average playback time of one content item.
Step 6. The RSU calculates the popularity of each content item as follows:
$$P_{t} = \lambda\, P_{t-1} + (1-\lambda)\, n_{t}$$
where $\lambda \in [0,1]$ is a decay factor characterizing the weight of historical requests relative to recent requests, and $n_t$ denotes the number of requests for the content during time period $t$. The popularity of content also serves as a component of the state space.
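A minimal sketch of maintaining this popularity score per content item is given below; the exponential-moving-average form is an assumption consistent with the description of $\lambda$ (the exact published formula appears only as an image in the source), and the threshold filter corresponds to step 7 below.

```python
def update_popularity(prev_popularity, requests_in_period, lam=0.6):
    """Assumed EMA-style popularity update blending historical and recent requests.

    prev_popularity: popularity score carried over from the previous period.
    requests_in_period: n_t, number of requests for this content in period t.
    lam: decay factor in [0, 1]; larger lam weights history more heavily.
    """
    return lam * prev_popularity + (1.0 - lam) * requests_in_period

def precache_candidates(popularity, threshold):
    """Only content whose popularity exceeds the threshold is eligible (step 7 below)."""
    return [c for c, p in popularity.items() if p > threshold]

pop = {"c1": 10.0, "c2": 2.0}
pop["c1"] = update_popularity(pop["c1"], requests_in_period=8)
pop["c2"] = update_popularity(pop["c2"], requests_in_period=1)
print(precache_candidates(pop, threshold=5.0))  # -> ["c1"]
```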
Step 7. The content is screened by popularity: only content whose popularity exceeds a threshold $\rho_t$ can serve as a pre-caching candidate.
Step 8. The DRL agent is restricted to selecting one content item to store in the cache at a time, and the selection is repeated multiple times so that the high-priority content is stored in the cache. The action space of a single operation consists of all the content items screened in step 7:
$$\mathcal{A} = \{c_1, c_2, \ldots, c_N\}$$
where $N$ is the number of content items whose popularity reaches the threshold.
Step 9. The cache hit rate represents the working efficiency of the DRL agent. To balance short-term and long-term benefits, the reward function is expressed as an exponentially weighted average hit rate:
$$R = \sum_{i} w^{i}\, r_{i}$$
where $r_i$ denotes the hit rate of the $i$-th time slice counted from the current time, and $w \in (0,1)$ is an exponential weighting factor; the larger $w$ is, the more slowly the contribution of future hit rates to the reward decays over time.
Referring to fig. 3, the specific training process of the deep reinforcement learning agent on the RSU is as follows:
step 10, initializing operator network mu (s | theta)μ) Criticc network Q (s, a | θ)Q) With the parameters respectively being thetaμAnd thetaQ(ii) a Simultaneously initializing the target networks mu 'and Q', the initialization parameter theta of whichμ′←θμ,θQ′←θQ. The experienced playback set R is initialized.
Step 11. Based on the state at time $t$, select a proto-action
$$\hat{a}_t = \mu(s_t \mid \theta^{\mu}).$$
Step 12. Use the KNN algorithm to select the $k$ valid actions nearest to $\hat{a}_t$, recorded as
$$A_k = \{a_1, a_2, \ldots, a_k\}.$$
Step 13. According to the current policy, select the action with the maximum Q value,
$$a_t = \arg\max_{a \in A_k} Q(s_t, a \mid \theta^{Q}),$$
and execute it; observe the reward $r_t$ and the new state $s_{t+1}$, and record the transition $(s_t, a_t, r_t, s_{t+1})$ in the experience replay set $R$.
Step 14. Sample a batch of state transition records $(s_i, a_i, r_i, s_{i+1})$ of a certain size from the experience replay set $R$, and set $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$.
Step 15. Update the critic network by minimizing the loss function
$$L = \frac{1}{N} \sum_{i} \big( y_i - Q(s_i, a_i \mid \theta^{Q}) \big)^2,$$
and update the actor network with the parameter gradient
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}.$$
Step 16. Update the target networks:
$$\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1-\tau)\, \theta^{Q'},$$
$$\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1-\tau)\, \theta^{\mu'},$$
where $\tau < 1$ is the update coefficient.
Steps 11 to 16 constitute the model update performed within one time slice and are repeated once per time slice in a loop.
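Tying steps 11 to 16 together, a minimal per-time-slice training loop might look as follows; the helper names (observe_state, knn_expand, execute) and the replay-buffer handling are assumptions, the environment is assumed to return torch tensors throughout, and ddpg_update refers to the update sketch given earlier.

```python
import random
import torch

def soft_update(target_net, source_net, tau=0.01):
    # theta' <- tau * theta + (1 - tau) * theta'   (step 16)
    for tp, sp in zip(target_net.parameters(), source_net.parameters()):
        tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

def run_time_slice(env, actor, critic, target_actor, target_critic,
                   actor_opt, critic_opt, replay, batch_size=64, gamma=0.99):
    s_t = env.observe_state()                         # movement + request features
    proto = actor(s_t)                                # step 11: proto-action
    candidates = env.knn_expand(proto)                # step 12: k nearest valid actions
    a_t = max(candidates, key=lambda a: critic(s_t, a).item())  # step 13
    r_t, s_next = env.execute(a_t)                    # pre-cache content, observe reward
    replay.append((s_t, a_t, r_t, s_next))
    if len(replay) >= batch_size:                     # steps 14-15
        batch = [torch.stack(x) for x in zip(*random.sample(replay, batch_size))]
        ddpg_update(actor, critic, target_actor, target_critic,
                    actor_opt, critic_opt, batch, gamma)
    soft_update(target_actor, actor)                  # step 16
    soft_update(target_critic, critic)
```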
Referring to fig. 4, the specific flow of federated learning based on deep reinforcement learning is as follows:
and step 17, the remote server initializes a model of the deep reinforcement learning agent and endows random parameter initial values for the current actor network and the critic network.
Step 18. The remote server distributes the model to the RSUs within the region.
Step 19. The RSU trains the deep reinforcement learning agent online according to steps 10 to 16.
Step 20. After training for a period of time, each RSU transmits its locally trained model back to the remote server, and the remote server performs federated averaging (FedAvg); the specific calculation is given in part (3.3) of the technical scheme.
Step 21. The remote server redistributes the trained model to each RSU, and every RSU uses the unified agent to guide its caching operations. Steps 19 to 21 are then repeated until the model converges.
Referring to fig. 5, the specific process of the RSU performing the pre-caching is as follows:
and step 22, periodically exchanging respective cache lists adjacent to the RSU.
Step 23. After model training is finished, at the beginning of each time slice the RSU collects environment information and constructs the corresponding state, including the movement state of the vehicles and the request probability of the content.
Step 24. Select a set of valid actions as described in steps 11-12.
Step 25. First, remove from the set any content whose popularity does not reach the threshold.
Step 26. If the set still contains selectable content, execute steps 27-29; otherwise, end the pre-caching operation for the current time slice.
Step 27. When an RSU calculates the Q value of each action with the critic network, if a content item already exists in several neighboring RSUs, the priority of the corresponding action is additionally lowered; the specific calculation is as follows:
[Equation image in the original: the Q value of the action is adjusted downward according to $n_d$.]
where $n_d$ is the number of neighboring RSUs in which this content is present. The RSU reorders all content items according to the adjusted Q values and then pre-caches the content with the highest Q value.
Step 28. If the cache is full, select content to discard using the LRU cache replacement strategy, and then place the pre-cached content into the cache.
Step 29. If the amount of pre-cached content reaches 3/5 of the cache space, end the pre-caching operation for the current time slice; otherwise, return to step 24 and repeat the above operations.
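A minimal sketch of this per-time-slice pre-caching procedure (steps 22-29) is given below; the cache and helper object names are assumptions, rank_precache_candidates is the neighbor-aware re-ranking sketched earlier, and an OrderedDict is used as a simple stand-in for the LRU cache the patent prescribes.

```python
from collections import OrderedDict

class LRUCache:
    """Simple LRU content store; capacity counted in content items."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def put(self, content):
        if content in self.items:
            self.items.move_to_end(content)
            return
        if len(self.items) >= self.capacity:          # step 28: evict via LRU
            self.items.popitem(last=False)
        self.items[content] = True

def precache_time_slice(rsu_cache, q_values, neighbor_cache_lists,
                        popularity, threshold, budget_fraction=3 / 5):
    # Step 25: drop candidates whose popularity is below the threshold.
    q_values = {c: q for c, q in q_values.items() if popularity.get(c, 0) > threshold}
    # Step 27: rank by Q value adjusted for duplication in neighboring caches.
    ranked = rank_precache_candidates(q_values, neighbor_cache_lists)
    # Step 29: stop once roughly 3/5 of the cache space is used for pre-caching.
    budget = int(rsu_cache.capacity * budget_fraction)
    for content in ranked[:budget]:
        rsu_cache.put(content)                         # step 28 handled inside put()
```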

Claims (1)

1. An edge pre-caching strategy based on federated learning in a vehicular content-centric network, characterized by comprising the following steps:
(1) first, collecting data on content requests and corresponding vehicle movement information in the dynamic environment of the vehicular network, training a deep reinforcement learning (DRL) agent deployed on the RSU, and making, under given conditions, the decision most beneficial for reducing request delay; the training process of the DRL agent first needs to define the state space, the action space and the reward function:
(1.1) the state space is composed of two main parts: one is the movement state of the vehicle and the other is the request probability of the content; the movement state of a vehicle comprises its current position and the position it may reach after one time slice; the current position is easy to obtain, but the position a vehicle may reach cannot be observed directly, so a Markov chain is adopted to predict, from the vehicle's historical path, the position it may reach, and the prediction result serves as a component of the state space; the request probability of content is likewise divided into two parts: the popularity of the content, and the content predicted, from the content currently requested by the vehicle, to be requested next;
(1.2) in order to prevent the action space from becoming too large, the DRL agent is restricted to selecting one content item to store in the cache at a time, and the selection is repeated multiple times so that the high-priority content is stored in the cache; in order to further improve efficiency, the range of selectable content is narrowed according to content popularity, and only content whose popularity exceeds a threshold can serve as a pre-caching candidate;
(1.3) the cache hit rate represents the working efficiency of the DRL agent, and, in order to balance short-term and long-term benefits, the reward function is expressed as an exponentially weighted average hit rate:
$$R = \sum_{i} w^{i}\, r_{i}$$
where $r_i$ denotes the hit rate of the $i$-th time slice counted from the current time, and $w \in (0,1)$ is an exponential weighting factor; the larger $w$ is, the more slowly the contribution of future hit rates to the reward decays over time;
(2) after the state space, the action space and the reward function are defined, the deep learning framework of the agent can be constructed and trained; the deep reinforcement learning framework adopted by the method consists of the following parts:
(2.1) the actor network, parameterized by $\theta^{\mu}$, is a mapping from the state space to the action space; given a state from the state space, the actor network computes, according to its parameters, a proto-action in the corresponding action space,
$$\hat{a}_t = \mu(s_t \mid \theta^{\mu}),$$
as its output;
(2.2) the generated proto-action is expanded into a group of actions by the K-nearest-neighbor method, i.e. a set of valid actions in the action space, any element of which may serve as the action to be executed;
(2.3) in order to avoid selecting actions with low Q values, a critic network is defined to constrain the output of the actor network and to update the actor network's parameters; the deterministic target policy is as follows:
$$Q(s_t, a_t \mid \theta^{Q}) = \mathbb{E}\big[\, r(s_t, a_t) + \gamma\, Q(s_{t+1}, \mu(s_{t+1} \mid \theta^{\mu}) \mid \theta^{Q}) \,\big]$$
where $s_t$ denotes the state at time $t$, $a_t$ the action taken at time $t$, and $\theta^{Q}$ and $\theta^{\mu}$ the parameters of the critic network and the actor network respectively; $\mathbb{E}[\cdot]$ denotes the expectation of the bracketed value under the environment $E$, $r(s_t, a_t)$ denotes the reward obtained by taking action $a_t$ in state $s_t$, $\gamma \in (0,1]$ is the weight decay factor applied to future cumulative rewards, and $\mu(s_{t+1} \mid \theta^{\mu})$ denotes the action produced by the actor network for the state at time $t+1$; for each possible action in the action set generated in the previous step, the critic network computes a corresponding Q value from the current state and the next state, and the action attaining the maximum value is selected as the action to execute;
the critic network is then updated by minimizing a loss function defined as:
$$L = \frac{1}{N} \sum_{i} \big( y_i - Q(s_i, a_i \mid \theta^{Q}) \big)^2$$
where $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$, $i$ denotes the $i$-th selected record, and $Q'$ and $\mu'$ denote the target critic and target actor networks, i.e. the networks as they were before the state transition in this record occurred;
the parameters of the actor network are updated using the sampled policy gradient:
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}$$
that is, the gradient with respect to the actor network parameters $\theta^{\mu}$ is computed by the chain rule, where $\nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)}$ is the gradient of the critic network with respect to the action $a = \mu(s_i)$ taken in state $s_i$, and $\nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_i}$ is the gradient of the actor network with respect to its parameters $\theta^{\mu}$;
(3) the method adopts a federated learning framework: each RSU collects data locally and trains the given network, and then periodically uploads its model parameters to a remote server; the remote server performs federated averaging to obtain updated model parameters and sends them to each RSU again; the federated learning process is as follows:
(3.1) first, the remote server initializes a model of the deep reinforcement learning agent and assigns random initial parameter values to the current actor network and critic network; the remote server then distributes the model to the RSUs in the region;
(3.2) the RSU starts to train the model after receiving it; the training process is the same as in step (2); if usable historical data exist, the processed historical data are used to train the model, and new data obtained while the system runs after the model is received further update the model;
(3.3) after a period of training, each RSU transmits its locally trained model back to the remote server, and the remote server performs federated averaging; considering that different RSUs are located at different positions and therefore observe different traffic flows, the parameters are weighted by the number of requests, calculated as follows:
$$\theta_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n}\, \theta_{t+1}^{k}$$
where $\theta_{t+1}$ denotes the network parameters after this iteration, $K$ is the total number of RSUs participating in the federated learning, $n$ is the total number of requests received by all RSUs during the single training period of the current iteration, $n_k$ is the number of requests received by the $k$-th RSU, and $\theta_{t+1}^{k}$ denotes the parameters trained by the $k$-th RSU; the whole process is repeated until the model parameters become stable;
(3.4) the remote server redistributes the trained model to each RSU, and every RSU uses the unified agent to guide its caching operations;
(4) in step (1), the DRL agent selects only one content item at a time for pre-caching and then pre-caches a number of likely content items by repeating the selection many times; thus, in effect, each pre-cached content item corresponds to the Q value of one action; on this basis, in order to reduce the space wasted when several adjacent RSUs store the same content, each RSU exchanges its cache list with its neighboring RSUs when computing the Q value of each action, and if a content item already exists in several neighboring RSUs, the priority of the corresponding action is additionally lowered; the specific calculation is as follows:
[Equation image in the original: the Q value of the action is adjusted downward according to $n_d$.]
where $n_d$ is the number of neighboring RSUs in which the content exists; the RSU reorders all content items according to the adjusted Q values and then sequentially pre-caches the content items that satisfy the conditions.
CN202110149492.0A 2021-02-03 2021-02-03 Edge pre-caching strategy based on federated learning in a vehicular content-centric network Active CN113158544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110149492.0A CN113158544B (en) 2021-02-03 2021-02-03 Edge pre-caching strategy based on federated learning in a vehicular content-centric network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110149492.0A CN113158544B (en) 2021-02-03 2021-02-03 Edge pre-caching strategy based on federated learning in a vehicular content-centric network

Publications (2)

Publication Number Publication Date
CN113158544A (en) 2021-07-23
CN113158544B CN113158544B (en) 2024-04-12

Family

ID=76882726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110149492.0A Active CN113158544B (en) Edge pre-caching strategy based on federated learning in a vehicular content-centric network

Country Status (1)

Country Link
CN (1) CN113158544B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109617962A (en) * 2018-12-11 2019-04-12 电子科技大学 A kind of car networking mist node content caching method based on the content degree of association
KR102124979B1 (en) * 2019-07-31 2020-06-22 (주)크래프트테크놀로지스 Server and method for performing order execution for stock trading
CN110535875A (en) * 2019-09-19 2019-12-03 大连理工大学 Caching under vehicle-mounted content center network based on cooperation mode pollutes attack detection method
CN111491175A (en) * 2019-10-18 2020-08-04 北京大学 Edge network caching method and device based on video content characteristics
CN110958573A (en) * 2019-11-22 2020-04-03 大连理工大学 Mobile perception cooperative caching method based on consistent Hash under vehicle-mounted content center network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QU DAPENG; YANG WEN; YANG YUE; CHENG TIANFANG; WU SIJIN; WANG XINGWEI: "Node caching strategy combining content popularity and subjective preference", Journal of Chinese Computer Systems (小型微型计算机系统), no. 11, 15 November 2018 (2018-11-15) *
HUO YUEHUA; LIU YINLONG: "Cooperative caching strategy based on content popularity and node attributes in content-centric networks", Journal of Taiyuan University of Technology (太原理工大学学报), no. 01, 15 January 2018 (2018-01-15) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024065903A1 (en) * 2022-09-29 2024-04-04 福州大学 Joint optimization system and method for computation offloading and resource allocation in multi-constraint-edge environment
CN116567719A (en) * 2023-07-05 2023-08-08 北京集度科技有限公司 Data transmission method, vehicle-mounted system, device and storage medium
CN116567719B (en) * 2023-07-05 2023-11-10 北京集度科技有限公司 Data transmission method, vehicle-mounted system, device and storage medium

Also Published As

Publication number Publication date
CN113158544B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
Tang et al. Survey on machine learning for intelligent end-to-end communication toward 6G: From network access, routing to traffic control and streaming adaption
CN110312231A (en) Content caching decision and resource allocation joint optimization method based on mobile edge calculations in a kind of car networking
CN109391681A (en) V2X mobility prediction based on MEC unloads scheme with content caching
CN111385734B (en) Internet of vehicles content caching decision optimization method
CN112020103B (en) Content cache deployment method in mobile edge cloud
CN114143891A (en) FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network
CN113158544A (en) Edge pre-caching strategy based on federal learning under vehicle-mounted content center network
Nomikos et al. A survey on reinforcement learning-aided caching in heterogeneous mobile edge networks
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Somesula et al. Cooperative cache update using multi-agent recurrent deep reinforcement learning for mobile edge networks
CN113727306A (en) Decoupling C-V2X network slicing method based on deep reinforcement learning
CN114423061B (en) Wireless route optimization method based on attention mechanism and deep reinforcement learning
CN113950113B (en) Internet of vehicles switching decision method based on hidden Markov
Balasubramanian et al. FedCo: A federated learning controller for content management in multi-party edge systems
CN116321307A (en) Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
Khanal et al. Route-based proactive content caching using self-attention in hierarchical federated learning
CN114374949A (en) Power control mechanism based on information freshness optimization in Internet of vehicles
CN104822150B (en) The spectrum management method of information active cache in the multi-hop cognition cellular network of center
Hazarika et al. AFL-DMAAC: Integrated resource management and cooperative caching for URLLC-IoV networks
CN116249162A (en) Collaborative caching method based on deep reinforcement learning in vehicle-mounted edge network
CN116484976A (en) Asynchronous federal learning method in wireless network
Cai et al. Cooperative content caching and delivery in vehicular networks: A deep neural network approach
Zhang et al. Novel resource allocation algorithm of edge computing based on deep reinforcement learning mechanism
Tirupathi et al. HybridCache: AI-assisted cloud-RAN caching with reduced in-network content redundancy
Khanal et al. Proactive content caching at self-driving car using federated learning with edge cloud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant