CN114143891A - FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network - Google Patents

FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network Download PDF

Info

Publication number
CN114143891A
CN114143891A (application CN202111447130.6A)
Authority
CN
China
Prior art keywords
base station
ddql
fdql
content
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111447130.6A
Other languages
Chinese (zh)
Inventor
高志宇
王天荆
沈航
白光伟
田一博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN202111447130.6A priority Critical patent/CN114143891A/en
Publication of CN114143891A publication Critical patent/CN114143891A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/54Allocation or scheduling criteria for wireless resources based on quality criteria
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/54Allocation or scheduling criteria for wireless resources based on quality criteria
    • H04W72/542Allocation or scheduling criteria for wireless resources based on quality criteria using measured or perceived quality

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

Mobile edge networks are becoming increasingly intelligent, diversified and converged, which makes the optimal allocation of multidimensional resources challenging. To improve the accuracy of multidimensional resource optimization, the invention provides an FDQL-based multidimensional resource collaborative optimization method for mobile edge networks. The method constructs a multidimensional resource allocation model with maximizing the MOS as the optimization target and designs a two-layer decision scheme. First, the base stations at the bottom layer use double deep Q-learning (DDQL) for local model training to obtain the optimal decision within a short period; then, the edge node at the upper layer uses federated deep Q-learning (FDQL) for global model training to reduce the deviation of the distributed decisions over a long period. Experimental results show that the algorithm outperforms other methods in reducing content service delay and improving the quality of user experience.

Description

FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network
Technical Field
The invention belongs to the technical field of communication, and particularly relates to a multidimensional resource collaborative optimization method based on federated deep Q-learning (FDQL) in a mobile edge network.
Background
According to the Ericsson Mobility Report, 5G subscriptions will reach 1.9 billion by 2024 [1]. The rapidly increasing data traffic exacerbates the conflict between limited spectrum, computing and caching resources and the growing resource demand. Meanwhile, the rapid development of the Internet of Things (IoT) [2] and the Internet of Vehicles (IoV) [3] increases the complexity of the network environment. Network communication now faces diversification, convergence and intelligence, which makes resource management more difficult. In response, operators have deployed part of the service processing and resource scheduling functions on cloud platforms to improve service performance [4].
However, facing future 100% global coverage, ultra-large-scale terminal access and massive data transmission with sub-millisecond delay, traditional processing platforms that rely on cloud computing face a huge challenge. In particular, emerging services such as intelligent driving, Virtual Reality (VR), Augmented Reality (AR) and ultra-high-definition video streaming increasingly depend on highly reliable, low-latency real-time data processing; a cloud center far away from users and terminal devices cannot process such demanding applications in time, and network congestion and transmission delay also seriously degrade the user experience. Mobile Edge Computing (MEC), which pushes network resources to the edge and localizes them [5-9], is one of the key technologies for solving these problems.
MEC deploys servers with strong computing power at edge nodes close to users and macro base stations to increase local computing resources; at the same time, these nodes are configured with large caching capacity, improving the quality of emerging data services such as web browsing, multimedia and social networking. Sinking computing and storage resources to the network edge significantly relieves bandwidth pressure, reduces the backhaul load and lowers service delay, thus overcoming the excessive service latency of the cloud center. Although densely deployed MEC servers in a large-scale mobile network can provide users with ultra-low-latency, high-bandwidth services, rapidly growing user service requests still lead to unbalanced resource usage. How to optimize the multidimensional resource allocation scheme while satisfying the users' Quality of Experience (QoE) has therefore become one of the problems that MEC urgently needs to solve.
Traditional approaches such as genetic algorithms, particle swarm optimization, game theory and graph-coloring algorithms can be used to solve the MEC resource optimization problem. Document [10] proposes a two-stage heuristic optimization algorithm based on a genetic algorithm, which decouples the joint optimization of computation offloading and resource allocation and obtains the allocation strategy with minimum energy consumption by iteratively updating the solution. For the multi-access characteristics and resource limitations of base stations in dense cells, document [11] models the joint optimization of computation offloading decisions and the allocation of spectrum, power and computing resources as an NP-hard mixed-integer nonlinear programming problem, and obtains a low-overhead computation offloading and resource scheduling scheme using Particle Swarm Optimization (PSO). Document [12] establishes an efficient network resource optimization strategy based on Stackelberg game theory, realizes low-delay, high-reliability video services and alleviates the conflict between transmission performance and QoE. Document [13] proposes a network resource sharing scheme based on graph-coloring theory which, taking the system resource overhead as the optimization target, realizes efficient V2X collaborative caching, computing and communication resource allocation and reduces multimedia service delay. However, the new characteristics of today's MEC networks, such as big data, dynamics and multiple objectives, prevent these traditional methods from fully mining network information to generate optimal resource allocation decisions.
Artificial intelligence represented by machine learning and deep learning has gradually shifted from being purely algorithm-driven to being jointly driven by data, algorithms and computing power, and can effectively solve many problems in application domains. Document [14] proposes a Reinforcement Learning (RL) optimization framework based on edge-cloud computing, which exploits RL's ability to adapt to environmental diversity and dynamics to quickly make optimal task offloading and resource allocation decisions, minimizing task delay and the energy consumption of user batteries. With the exponential growth in MEC network size and structural complexity, RL-based resource optimization algorithms converge slowly because of the huge state space and can hardly find the optimal solution. Deep Reinforcement Learning (DRL) estimates the RL value function with a Deep Neural Network (DNN) to obtain an accurate approximate solution. Deep Q-Learning (DQL), as a DRL algorithm, combines the perception capability of deep learning with the decision capability of reinforcement learning, and solves the perception-decision problem of complex systems through continuous trial and error [15]. For an MEC server providing three multimedia services (live streaming, buffered streaming and low-delay enhanced mobile broadband), document [16] first designs a QoS evaluation model and then dynamically allocates network resources with a DQN to satisfy users' QoS to the maximum extent; its resource scheduling performance is superior to round-robin and priority scheduling algorithms. Document [17] constructs a resource allocation model that minimizes the average task energy consumption under communication, computing and caching resource constraints, and proposes a multidimensional resource allocation method based on Double Deep Q-Learning (DDQL); simulation results show that, compared with a random algorithm, a greedy algorithm, particle swarm optimization and DQL, DDQL better solves the multi-task resource allocation problem and reduces the average task energy consumption by at least 5%. Using two neural networks to estimate the cumulative delay and the reward of each action, document [18] proposes an attention-based DDQL method that obtains a CPU frequency and transmission power scheduling strategy with minimum delay and energy consumption over a long period. Document [19], aiming to maximize long-term profit while meeting users' low-delay computation requirements, uses DDQL to jointly optimize the computation offloading and cache resource allocation of edge nodes, achieving maximum profit while guaranteeing the QoS of services.
DRL training usually relies on "big data"; in practice, however, it is easier for industry to collect "small data" at low cost, and these distributed small datasets form numerous "data islands" that greatly restrict the usability of DRL decisions. On the other hand, centralized DRL training poses a huge challenge to the computing and storage capabilities of the MEC server. Federated Learning (FL) [20] breaks down data islands and obtains a privacy-preserving global optimal model by sharing network parameters. Document [21] proposes an FL-based resource management method that addresses the bottleneck of intensive resource usage (computing, bandwidth, energy and data) in MEC. To solve the decrease in average aggregation accuracy caused by large differences in participants' data volumes, document [22] proposes a fair alpha-FedAvg algorithm that re-weights the FL aggregation process with the alpha value to generate a fairer global resource optimization model and improve the efficiency of local resource allocation. Document [23] proposes a collaborative edge caching framework based on Federated Deep Reinforcement Learning (FDRL), in which near-optimal local DRL parameters are uploaded to the edge node to participate in the next round of global FL training; the resulting local cache resource optimization scheme effectively reduces backhaul traffic and improves the content hit rate.
Disclosure of Invention
As mobile edge networks become increasingly intelligent, diversified and converged, the optimal allocation of multidimensional resources faces many challenges. Addressing the demand for massive content services in MEC and aiming to improve the accuracy of multidimensional resource optimization, the invention models the joint optimization of spectrum, computing and caching resources as a mixed-integer nonlinear programming problem and, based on federated deep reinforcement learning, constructs a two-layer multidimensional resource allocation framework that decouples the original problem. The base stations at the bottom layer adopt DDQL to obtain local resource collaborative optimization strategies within short periods, making full use of local multidimensional resources; the edge node at the upper layer adopts FDQL to train the globally optimal resource collaborative optimization strategy over long periods, so that users obtain the requested content with low delay and high quality.
The invention discloses an FDQL-based multidimensional resource collaborative optimization method for mobile edge networks, which constructs a multidimensional resource allocation model with maximizing the MOS as the optimization target and designs a two-layer decision scheme. First, the base stations at the bottom layer use double deep Q-learning (DDQL) for local model training to obtain the optimal decision within a short period; then, the edge node at the upper layer uses federated deep Q-learning (FDQL) for global model training to reduce the deviation of distributed decisions over a long period. Simulation experiments show that the method outperforms other methods in reducing content service delay and improving the quality of user experience.
Drawings
FIG. 1 is a schematic diagram of a mobile edge network;
FIG. 2 is a workflow diagram of an FDQL framework;
FIG. 3 is a graph comparing the training performance of four algorithms;
FIG. 4 is a comparison graph of the resource co-optimization performance of four algorithms under different short periods;
FIG. 5 is a comparison graph of decision times of four algorithms in different short periods;
FIG. 6 is a comparison graph of the resource co-optimization performance of four algorithms for different numbers of users;
FIG. 7 is a graph of loss function comparison of centralized DDQL and FDQL;
FIG. 8 is a graph of average MOS value comparison of a centralized DDQL and an FDQL;
FIG. 9 is a graph of decision time comparison of centralized DDQL and FDQL;
FIG. 10 is a QoE performance comparison graph of centralized DDQL and FDQL for different numbers of users;
fig. 11 is a graph comparing QoE performance of centralized DDQL and FDQL under different numbers of base stations.
Detailed Description
The invention is further described with reference to the following detailed description and accompanying drawings.
1. Overview
The invention provides a multidimensional resource collaborative optimization method based on federated deep Q-learning (FDQL), which constructs a two-layer resource management architecture. Its main technical contributions cover the following three aspects:
(1) Based on the users' content service requests, a Mean Opinion Score (MOS) maximization model is established under the constraints of wireless rate, content acquisition delay and cache capacity, and the allocation of spectrum, computing and caching resources in the MEC system is optimized on this basis.
(2) The base station converts the MOS optimization model into a Markov Decision Process (MDP) with a continuous state space and a high-dimensional action space, and designs a multidimensional resource collaborative optimization algorithm based on double deep Q-learning (DDQL) to realize local multidimensional resource joint optimization within a short period.
(3) For the global multidimensional resource joint optimization problem, the edge node aggregates parameters from all associated base stations over a long period and implements FDQL-based resource collaborative optimization. After repeated iterations, the resource optimization decision of the converged global model approaches the decision of centralized deep Q-learning (DQL), so that the multidimensional resource allocation meets the users' QoE requirements.
2. System model
In this example, the MEC system consists of N base stations (BSs) capable of providing computing and caching services and an edge server; the base stations are connected to the edge node and to neighboring base stations via wired optical cables, as shown in Fig. 1. The cell covered by base station $n \in \mathcal{N}=\{1,\dots,N\}$ contains $U_n$ randomly distributed smart terminal users, and each user $u \in \mathcal{U}_n=\{1,\dots,U_n\}$ is connected to its base station by a wireless link.
The edge node and the N base stations cooperatively cache a large amount of content to meet the users' content service demands. Suppose the set of contents cached by the whole system within a long period is $\mathcal{F}=\{1,\dots,F\}$, where the size of content $f \in \mathcal{F}$ is denoted $D_f$. Defining the local popularity of content f at base station n as $p_{n,f}$, the global popularity $p_f$ satisfies the Mandelbrot-Zipf (MZipf) distribution

$$p_f = \frac{(r_f+\tau)^{-\beta}}{\sum_{f'\in\mathcal{F}}(r_{f'}+\tau)^{-\beta}} \qquad (1)$$

where $r_f$ is the rank of content f in the descending popularity order, $\tau$ is the stationary factor and $\beta$ is the skewness factor. Each base station caches, in descending order of popularity, a subset $\mathcal{F}_n \subseteq \mathcal{F}$ of the contents, and its cache status is denoted $c_n$; the edge node, equipped with sufficient resources, caches the whole set $\mathcal{F}$ to guarantee the quality of service of the system.
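As an illustration of the popularity model in equation (1), the following sketch computes MZipf popularities and selects the most popular contents for a base station cache; the parameter values and function names are assumptions for illustration, not values from the patent.

```python
import numpy as np

def mzipf_popularity(num_contents: int, tau: float = 5.0, beta: float = 0.8) -> np.ndarray:
    """Global content popularity p_f under the Mandelbrot-Zipf distribution (eq. (1)).

    r_f is the popularity rank of content f (1 = most popular),
    tau is the stationary factor and beta the skewness factor.
    """
    ranks = np.arange(1, num_contents + 1)
    weights = (ranks + tau) ** (-beta)
    return weights / weights.sum()

# Example: 100 contents, the base station caches the 20 most popular ones.
p = mzipf_popularity(100)
top_fn = np.argsort(-p)[:20]          # indices of the 20 most popular contents
print(p[:5], top_fn[:5])
```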
Within the same period, suppose the content requests of the $U_n$ users in the nth cell form the set $\mathcal{R}_n$; the request status of user u is a tuple that records the requested content f and the required content acquisition rate, and the request indicator is 0 when content f is not requested. The base station therefore maintains a content request list (Table) recording each user's demand, which facilitates the subsequent multidimensional resource allocation and content distribution.
2.1 Communication model
In the edge caching system, a user sends a content request to its associated base station, which adds the request to the content request list. If the base station caches the requested content locally, it delivers the content to the user directly; otherwise, if one or more neighboring base stations cache the requested content, the local base station fetches it from the nearest neighbor and forwards it to the user; failing that, the local base station fetches the content from the edge node and forwards it to the user. Accordingly, $y_{n,u,f}=1$ indicates that the requested content f is cached at the local base station and $y_{n,u,f}=0$ that it is not; similarly, $z_{n,u,f}=1$ indicates that the requested content f is cached at the nearest neighboring base station and $z_{n,u,f}=0$ that it is not.
Because a user may request several contents within the same period, and services such as autonomous navigation and live video have strict delay requirements, the base station serves identical content requests from different users by multicast, improving spectrum utilization and thereby the content service quality. To this end, the system spectrum W is divided into L subchannels of bandwidth B, $\mathcal{L}=\{1,\dots,L\}$, shared by the N cells. The Signal to Interference plus Noise Ratio (SINR) of the content received by user u over subchannel l can be expressed as

$$\mathrm{SINR}_{n,u,l} = \frac{P_{n,u,l}\,h_{n,u,l}}{\sum_{n'\neq n} P_{n',u,l}\,h_{n',u,l} + \sigma^2} \qquad (2)$$

where $P_{n,u,l}$ is the transmission power of base station n to user u on subchannel l, $h_{n,u,l}$ is the channel gain and $\sigma^2$ is the noise power. The downlink transmission rate of subchannel l is then

$$v_{n,u,l} = B\log_2(1+\mathrm{SINR}_{n,u,l}) \qquad (3)$$

and the corresponding downlink transmission delay of content f follows from (3) as

$$TD_{n,u,l} = \frac{D_f}{v_{n,u,l}} \qquad (4)$$
According to the content request set $\mathcal{R}_n$ and the list Table, base station n allocates distribution subchannels for its $F_n$ requested contents. A binary channel connection matrix $X_n=[x_{n,u,l}]$ is introduced, where $x_{n,u,l}=1$ denotes that user u receives a content transmission on subchannel l and $x_{n,u,l}=0$ denotes no access. Defining the u-th row vector of $X_n$, $x_{n,u}=(x_{n,u,1},\dots,x_{n,u,L})$, as the channel connection vector of user u, the total transmission delay of the $U_n$ users in the nth cell can be expressed as

$$TD_n = \sum_{u=1}^{U_n}\sum_{l=1}^{L} x_{n,u,l}\Big[ y_{n,u,f}\,TD_{n,u,l} + (1-y_{n,u,f})\big(z_{n,u,f}(TD_{n,u,l}+TD_{n,n'}) + (1-z_{n,u,f})(TD_{n,u,l}+TD_{n,m})\big)\Big] \qquad (5)$$

where $1-y_{n,u,f}$ (i.e. $y_{n,u,f}=0$) indicates that the requested content f is cached at another base station or at the edge node, $1-z_{n,u,f}$ (i.e. $z_{n,u,f}=0$) indicates that it is cached only at the edge node, and $TD_{n,n'}$ and $TD_{n,m}$ denote the wired transmission delays between base station n and the nearest neighboring base station n' and the edge node m, respectively.
2.2 Computation model
At present, streaming media content such as panoramic video, live video and VR video accounts for about 80% of network traffic, and the content response and the subsequent encoding and reconstruction all require support from computing units. The computing resource allocation scheme therefore directly affects the processing delay of content requests; for immersive interactive VR video in particular, excessive delay causes frequent stalling and dizziness, seriously degrading the user experience. To speed up the distribution of requested content, the base station needs to plan the allocation of its computing units rationally. Let $\gamma_n$ denote the number of CPU cycles required by base station n to process a unit task; the total computation delay of the $F_n$ requested contents can then be expressed as

$$CD_n = \sum_{f\in\mathcal{F}_n} \frac{\gamma_n D_f}{\lambda_{n,f}} \qquad (6)$$

where $\lambda_{n,f}$ is the amount of computing resources allocated to content f. Combining (5) and (6), the content acquisition delay of the $U_n$ users in the nth cell comprises the transmission delay and the computation delay, i.e.

$$CA_n = TD_n + CD_n \qquad (7)$$
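To make equations (3)–(7) concrete, the sketch below computes the downlink rate on a subchannel, the wireless transmission delay of a content of size $D_f$, the computation delay under a given CPU allocation, and the resulting content acquisition delay. All numbers and names are illustrative assumptions, not the patent's parameters.

```python
import math

def downlink_rate(bandwidth_hz: float, tx_power: float, channel_gain: float,
                  interference: float, noise_power: float) -> float:
    """Shannon rate of a subchannel, eq. (3): v = B * log2(1 + SINR)."""
    sinr = tx_power * channel_gain / (interference + noise_power)
    return bandwidth_hz * math.log2(1.0 + sinr)

def transmission_delay(content_bits: float, rate_bps: float, backhaul_delay: float = 0.0) -> float:
    """Wireless delay D_f / v plus any wired backhaul delay (neighbor BS or edge node), cf. eqs. (4)-(5)."""
    return content_bits / rate_bps + backhaul_delay

def computation_delay(content_bits: float, cycles_per_bit: float, cpu_hz: float) -> float:
    """Eq. (6): processing delay = required CPU cycles / allocated computing resources."""
    return content_bits * cycles_per_bit / cpu_hz

# Illustrative values only
rate = downlink_rate(1e6, 0.5, 1e-7, 1e-9, 1e-10)          # 1 MHz subchannel
td = transmission_delay(8e6, rate, backhaul_delay=0.002)    # 1 MB content fetched from a neighbor BS
cd = computation_delay(8e6, 50, 2e9)                        # 50 cycles/bit on a 2 GHz allocation
ca = td + cd                                                # eq. (7): content acquisition delay
print(f"rate={rate:.2e} bps, TD={td:.4f}s, CD={cd:.4f}s, CA={ca:.4f}s")
```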
2.3 Cache update model
Base stations in a cache-enabled wireless network are equipped with caches of limited capacity, so after the content service of one period ends, base station n needs to update its cache according to the content request set $\mathcal{R}_n$. First, base station n computes the local request probability of cached content f,

$$q_{n,f} = \frac{c_{n,f}}{\sum_{f'\in\mathcal{F}_n} c_{n,f'}} \qquad (8)$$

where $c_{n,f}$ is the number of requests for content f. Considering the average request probability of content f over T consecutive periods,

$$\bar{q}_{n,f} = \frac{1}{T}\sum_{t=1}^{T} q^{t}_{n,f} \qquad (9)$$

where $q^{t}_{n,f}$ is the local request probability within period t, a caching threshold $\varepsilon_q$ is set and the base station compares $\bar{q}_{n,f}$ with $\varepsilon_q$ to decide whether the currently requested content f should remain cached. Thus, when $\bar{q}_{n,f}\geq\varepsilon_q$, the cache update variable $g_{n,f}=1$ indicates that content f remains cached; when $\bar{q}_{n,f}<\varepsilon_q$, $g_{n,f}=0$ indicates that content f is no longer cached and its cache space is released. The cached content set of base station n, $\mathcal{F}_n$, is accordingly updated to $\mathcal{F}'_n$.
Base station n then looks up the global popularity $p_f$ of every requested but uncached content f; if $p_f$ is larger than the smallest popularity among the contents in $\mathcal{F}'_n$, i.e.

$$p_f > \min_{f'\in\mathcal{F}'_n} p_{f'} \qquad (10)$$

content f is added to $\mathcal{F}'_n$ and $g_{n,f}$ is set to 1. Since each base station has a cache of limited size, the total size of the cached contents cannot exceed the maximum cache capacity $C_{\max}$ of the base station. At the same time, the cache update policy should maximize the hit ratio of the requested contents, which can be converted into maximizing the sum of the popularities of the cached contents:

$$\max \sum_{f\in\mathcal{F}} g_{n,f}\,p_f \quad \text{s.t.} \quad \sum_{f\in\mathcal{F}} g_{n,f}\,D_f \leq C_{\max} \qquad (11)$$
Since user content requests generally exhibit temporal continuity, equation (11) enables the user to achieve a higher QoE within one period.
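A minimal sketch of the cache-update rule of Section 2.3 (threshold-based retention by average request probability, followed by popularity-based admission under the capacity budget). The threshold, capacity and data-structure choices are assumptions for illustration only.

```python
def update_cache(cached, request_counts_per_period, global_popularity, sizes,
                 eps_q=0.05, capacity=1e9):
    """Cache update following Section 2.3.

    cached: set of currently cached content ids
    request_counts_per_period: list of dicts {content_id: request count}, one per period
    global_popularity: dict {content_id: p_f}
    sizes: dict {content_id: D_f in bytes}
    """
    T = len(request_counts_per_period)
    # average local request probability over T periods (eqs. (8)-(9))
    avg_prob = {}
    for counts in request_counts_per_period:
        total = sum(counts.values()) or 1
        for f, c in counts.items():
            avg_prob[f] = avg_prob.get(f, 0.0) + (c / total) / T

    # keep cached content whose average request probability reaches the threshold eps_q
    kept = {f for f in cached if avg_prob.get(f, 0.0) >= eps_q}

    # admit requested-but-uncached content whose global popularity beats the cached minimum (eq. (10))
    candidates = sorted((f for f in avg_prob if f not in kept),
                        key=lambda f: global_popularity.get(f, 0.0), reverse=True)
    used = sum(sizes[f] for f in kept)
    min_pop = min((global_popularity.get(f, 0.0) for f in kept), default=0.0)
    for f in candidates:
        if global_popularity.get(f, 0.0) > min_pop and used + sizes[f] <= capacity:
            kept.add(f)
            used += sizes[f]
    return kept
```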
2.4 Problem modeling
Inspired by widely used QoE metrics, the Mean Opinion Score (MOS) model [24] can be used to measure the quality of content services such as video streaming download or web browsing. Based on equations (7) and (11), the invention designs the MOS model of the nth cell as a linear combination of the content acquisition delay (equation (7)) and the cache-update utility (equation (11)), given in equation (12), where the linear-model parameters $C_{n,1}, C_{n,2}$ keep $MOS_n \in [1,5]$ and the weight factors $w_{n,1}, w_{n,2}$ represent the influence of the content acquisition delay and of the cache update on the MOS, respectively. The higher the $MOS_n$ score of the nth cell, the higher the users' QoE.
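Equation (12) is reproduced only as an image in the original filing; the sketch below therefore uses an assumed linear form that matches its stated properties (decreasing in the content acquisition delay, increasing in the cache-update utility, clipped to [1, 5]), not the patent's exact expression. All parameter values are placeholders.

```python
def mos_score(ca_n: float, ps_n: float,
              c1: float = 5.0, c2: float = 1.0,
              w1: float = 0.5, w2: float = 0.5) -> float:
    """Assumed linear MOS model: c1, c2 scale the score into [1, 5];
    w1, w2 weight the content acquisition delay CA_n (penalty) and the
    cache-update utility PS_n (bonus)."""
    raw = c1 + c2 * (w2 * ps_n - w1 * ca_n)
    return max(1.0, min(5.0, raw))

print(mos_score(ca_n=0.8, ps_n=0.3))   # illustrative values only
```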
The invention aims to maximize the system MOS through multidimensional resource cooperation so as to meet the users' QoE requirements. The multidimensional resource optimization model is therefore

$$\max_{X_n,\,P_n,\,\lambda_n,\,g_n}\; MOS_n \quad \text{s.t.}\; C1\text{--}C7 \qquad (13)$$

where constraints C1 and C2 represent the 0-1 decisions for content delivery; constraint C3 represents the 0-1 decision for channel allocation; constraint C4 requires the transmission power allocated by the base station to subchannel l not to exceed the maximum transmission power; constraint C5 guarantees that the user's receiving rate exceeds the required acquisition rate; constraint C6 requires that the total amount of allocated computing resources not exceed the maximum computing capacity of the base station; and constraint C7 guarantees that the total amount of content after the cache update does not exceed the base station's cache capacity.
The mixed-integer programming problem (13) requires jointly solving four classes of 0-1 decision variables, $x_{n,u,l}, y_{n,u,f}, z_{n,u,f}, g_{n,f}$, together with the discrete variables $\lambda_{n,f}$ and $P_{n,u,l}$. These decision variables make the problem non-convex, so convex optimization methods cannot be applied. Solving problem (13) efficiently and accurately is the basis of the multidimensional resource collaborative allocation scheme, and the invention uses federated deep reinforcement learning to realize the resource optimization of the mobile edge system.
3. Multidimensional resource collaborative optimization method based on federated deep reinforcement learning
The mobile edge system of the invention is a two-layer network architecture that solves the resource collaborative optimization problem on two different time scales. The base stations at the bottom layer use DDQL for local model training to obtain the optimal decision within a short period; the edge node at the upper layer then uses FDQL for global model training to reduce the deviation of distributed decisions over a long period.
3.1 local multidimensional resource collaborative optimization based on DDQL
For the local multidimensional resource collaborative optimization problem, base station n acts as the agent: the problem is modeled as a Markov decision process, DDQL is adopted to interact with the environment by continuous trial and error, and the optimal policy $\pi^{*}$ is found by maximizing the cumulative reward.

The MDP is expressed as a quadruple $\langle S_n, A_n, PR_n, R_n\rangle$, where $S_n$ denotes the state space, $A_n$ the action space, $PR_n$ the state transition probability and $R_n$ the reward function.
State space: the agent needs to know the user and base station information before deciding which action to select, so the state space $S_n$ consists of the user requests and the base station cache status. At time slot i, the system state is

$$s_n^{i} = \big\{r_{n,1}^{i},\dots,r_{n,U_n}^{i},\,c_n^{i}\big\} \qquad (14)$$

where $r_{n,u}^{i}$ denotes the content request state of user u and $c_n^{i}$ the cache state of base station n.
Action space: the action space is the set of actions the agent can take. The action vector of the invention covers spectrum allocation, computing resource allocation and cache updates, so the action space $A_n$ is defined as the multidimensional resource co-optimization mode

$$a_n^{i} = \big\{X_n^{i},\,P_n^{i},\,\lambda_n^{i},\,g_n^{i}\big\} \qquad (15)$$

where $X_n^{i}$ denotes the channel connection matrix, $P_n^{i}$ the power allocation vector, $\lambda_n^{i}$ the computing unit allocation vector and $g_n^{i}$ the updated content cache vector.
Reward function: when the environment is in state $s_n^{i}$ and action $a_n^{i}$ is executed, the system enters the next state $s_n^{i+1}$ and obtains the instant reward $R_n^{i}$. Since the optimization goal of the invention is to maximize the users' QoE, the MOS score is set as the reward function:

$$R_n^{i} = MOS_n^{i} \qquad (16)$$
A mapping from the state space $S_n$ to the action space $A_n$ constitutes the policy $\pi: S_n \rightarrow A_n$. The state-action value function of the action $a_n^{i}=\pi(s_n^{i})$ taken by policy $\pi$ in the current state $s_n^{i}$ can be expressed as

$$Q_\pi\big(s_n^{i},a_n^{i}\big) = \mathbb{E}_\pi\Big[\sum_{j\geq 0}\gamma^{\,j} R_n^{i+j}\,\Big|\,s_n^{i},a_n^{i}\Big] \qquad (17)$$

where $\gamma\in(0,1)$ is the discount factor. According to the Bellman equation, the Q function is updated by

$$Q\big(s_n^{i},a_n^{i}\big) \leftarrow Q\big(s_n^{i},a_n^{i}\big) + \eta\Big[R_n^{i} + \gamma\max_{a}Q\big(s_n^{i+1},a\big) - Q\big(s_n^{i},a_n^{i}\big)\Big] \qquad (18)$$

where $\eta\in(0,1)$ is the learning rate controlling the learning speed.
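For intuition, a tabular version of the update rule (18) can be written as follows; the state and action labels here are toy placeholders, whereas the invention approximates Q with a deep neural network as described next.

```python
def q_update(Q, s, a, reward, s_next, actions, eta=0.1, gamma=0.9):
    """Tabular form of the Bellman update in equation (18):
    Q(s,a) <- Q(s,a) + eta * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + eta * (reward + gamma * best_next - Q.get((s, a), 0.0))

# Toy example (illustrative only):
Q = {}
q_update(Q, s="low_load", a="cache_f1", reward=4.2, s_next="high_load",
         actions=["cache_f1", "evict_f1"])
print(Q)
```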
The invention uses DDQL to search for the local multidimensional resource collaborative optimization strategy, updating the deep neural network parameters $\theta_n$ so that the estimated Q value approaches the Q value corresponding to the optimal strategy. To break the correlation between adjacent data and keep the samples independent, DDQL establishes an experience replay mechanism: each experience $\big(s_n^{i},a_n^{i},R_n^{i},s_n^{i+1}\big)$ obtained by the agent is stored in an experience replay pool that records every interaction with the environment. When the pool is full, new experiences randomly replace old ones. Moreover, to overcome the over-estimation of the Q value during training, DDQL builds a current network for selecting actions and a target network for estimating the value function, with state-action value functions $Q\big(s,a;\theta_n\big)$ and $Q\big(s,a;\theta_n^{-}\big)$, where $\theta_n$ and $\theta_n^{-}$ denote the parameters of the two deep neural networks.
When training starts, DDQL randomly samples a mini-batch from the experience replay pool and feeds it to both the current network and the target network, computing the corresponding Q values by forward propagation; the loss function

$$L(\theta_n) = \mathbb{E}\Big[\big(y_n^{i} - Q(s_n^{i},a_n^{i};\theta_n)\big)^2\Big],\qquad y_n^{i} = R_n^{i} + \gamma\,Q\Big(s_n^{i+1},\arg\max_{a}Q(s_n^{i+1},a;\theta_n);\,\theta_n^{-}\Big) \qquad (20)$$

is then back-propagated through the current network to update the network parameters. Specifically, computing the gradient of (20) with respect to $\theta_n$,

$$\nabla_{\theta_n} L(\theta_n) \qquad (21)$$

the parameter update can be expressed as

$$\theta_n \leftarrow \theta_n - \alpha\,\nabla_{\theta_n} L(\theta_n) \qquad (22)$$

where $\alpha$ denotes the learning rate.
The DDQL-based local multidimensional resource collaborative optimization model training method is given below.
(The pseudocode of this training procedure is reproduced as an image in the original document.)
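Since the pseudocode itself is only available as an image, the following PyTorch-style sketch illustrates the ingredients of the DDQL training described in Section 3.1: an experience replay pool, a current network for action selection, a target network for value estimation, and the double-Q target of equation (20). The network sizes, hyperparameters, environment interface and variable names are assumptions for illustration, not the patent's exact implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Feedforward Q-network; the 300-600-300 hidden sizes follow the selection in Section 4.1."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 300), nn.ReLU(),
            nn.Linear(300, 600), nn.ReLU(),
            nn.Linear(600, 300), nn.ReLU(),
            nn.Linear(300, action_dim))

    def forward(self, s):
        return self.net(s)

def ddql_step(current, target, optimizer, replay, batch_size=32, gamma=0.9):
    """One DDQL update: sample a mini-batch, build the double-Q target of eq. (20),
    back-propagate the loss and apply the gradient step of eqs. (21)-(22)."""
    if len(replay) < batch_size:
        return None
    batch = random.sample(replay, batch_size)
    s, a, r, s_next = map(torch.stack, zip(*batch))   # a must be a LongTensor of action indices
    with torch.no_grad():
        # current network selects the next action, target network evaluates it (limits Q over-estimation)
        next_a = current(s_next).argmax(dim=1, keepdim=True)
        y = r + gamma * target(s_next).gather(1, next_a).squeeze(1)
    q = current(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: the replay pool stores (state, action, reward, next_state) tensors
# produced by the MDP of Section 3.1; the target network is refreshed periodically.
replay = deque(maxlen=10_000)
current, target = QNet(64, 16), QNet(64, 16)
target.load_state_dict(current.state_dict())
optimizer = torch.optim.Adam(current.parameters(), lr=1e-3)
```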
3.2 FDQL-based global multidimensional resource collaborative optimization
As described above, each base station collects the content demand data of its local users and uses DDQL to maximize the short-term utility of the multidimensional resource collaborative optimization strategy; how to help the edge node quickly find a globally optimal strategy, however, remains difficult. If the edge node used the traditional centralized DQL training method, the communication cost would increase and the confidentiality and security of the data would be compromised. The invention therefore designs an FDQL framework on top of DDQL to construct a high-quality global multidimensional resource collaborative optimization model. Each base station participating in federated learning uploads model parameters rather than user content demand data to the edge node, which effectively reduces the data transmission burden and avoids leaking user privacy. For simplicity of presentation, the invention performs multidimensional resource collaborative optimization over the time periods {1, …, t, …, T, T+1, …, T+t, …, 2T, …}. Within a short period t ≠ kT, each base station carries out DDQL model training to obtain a locally optimal strategy. Since users' content requests are generally continuous in time, the local resource allocation strategy of each base station can satisfy the users' QoE over a period of time. After contents have been received over a long period, the user demand is likely to change, so in the long period t = kT the edge node performs FDQL model training to obtain a globally optimal strategy and feeds it back to each base station to enhance the generalization capability of the local DDQL, thereby improving the users' content acquisition experience through a better resource allocation strategy.
Fig. 2 shows the workflow of the FDQL framework. First, each base station updates its parameters $\theta_n$ according to equations (21) and (22) and uploads them to the edge node. The edge node aggregates the N uploaded parameter sets with weights and obtains the global parameters

$$\theta_g = \sum_{n=1}^{N} \frac{w_n}{\sum_{n'=1}^{N} w_{n'}}\,\theta_n \qquad (23)$$

where $w_n$ is the total number of content requests of the users in the nth cell. Next, each base station uses the fed-back global parameters $\theta_g$ to carry out the next round of DDQL training. The system repeats these steps until the FDQL algorithm converges.
The FDQL-based global multidimensional resource collaborative optimization model training method is provided below.
(The pseudocode of this training procedure is reproduced as an image in the original document.)
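A minimal sketch of the aggregation step of equation (23): a request-weighted federated average of the DDQL parameters uploaded by the N base stations. The function and variable names are assumptions; the parameter dictionaries could be, for example, PyTorch state_dicts from the networks sketched above.

```python
def federated_average(local_params, request_counts):
    """Eq. (23): theta_g = sum_n (w_n / sum_n' w_n') * theta_n.

    local_params:   list of N parameter dictionaries (e.g. PyTorch state_dicts)
                    uploaded by the base stations
    request_counts: list of w_n, the total number of content requests of the
                    users in cell n, used as aggregation weights
    """
    total = float(sum(request_counts))
    global_params = {}
    for key in local_params[0]:
        global_params[key] = sum(
            (w / total) * params[key]
            for params, w in zip(local_params, request_counts))
    return global_params

# Each base station then loads the fed-back global parameters for the next DDQL round:
#   current.load_state_dict(federated_average(uploaded_params, request_counts))
```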
4. Simulation analysis
The following experiments, run on a Python platform with 100 Monte Carlo runs per experiment, verify that the proposed multidimensional resource collaborative optimization scheme maximally satisfies the users' QoE. In the simulation scenario, one edge node is deployed at the center of the region, five base stations are uniformly distributed in the edge network, each base station serves 20 users, and the users' content demands are generated randomly. The other simulation parameters are listed in Table 1.
TABLE 1 simulation parameters
(Table 1 is reproduced as an image in the original document.)
4.1 DDQL parameter selection
The invention selects a feedforward neural network with three hidden layers as the DDQL network model; since the numbers of neurons in the hidden layers can be combined in many ways, extensive experiments are needed to tune them. The invention runs five time periods in the 3rd cell and takes the average MOS value as the evaluation index; the results are shown in Table 2. The average MOS value of DDQL is highest when the first hidden layer has 300 neurons, the second 600 and the third 300. Without loss of generality, the following experiments adopt the optimal parameter setting of Table 2 to ensure fast convergence of DDQL.
TABLE 2 DDQL parameter set comparison
(Table 2 is reproduced as an image in the original document.)
4.2 Performance comparison of local multidimensional resource collaborative optimization algorithm
To evaluate the performance of the proposed local multidimensional resource collaborative optimization algorithm, it is compared with the genetic algorithm (GA), particle swarm optimization (PSO) [11] and deep Q-learning (DQL), where PSO is a swarm intelligence optimization algorithm that simulates the behavior of a bird flock with a population of particles.
Fig. 3 compares the training performance of the four algorithms. The convergence performance of the GA, with its greedy selection strategy, is lower than that of the other three algorithms, because GA settles for the best resource allocation of the current iteration and cannot guarantee a globally optimal solution; PSO converges prematurely, i.e. it tends to fall into local minima, so its global optimality and MOS value are lower than those of DQL. Facing the complex state-action space model, the solving capability of DQL is lower than that of DDQL, whereas DDQL converges quickly and stably, attains the highest MOS value and shows the best performance. When the number of training rounds exceeds 600, all four algorithms reach the convergence plateau, but the MOS value of DDQL is 89.58%, 44.41% and 26.39% higher than that of GA, PSO and DQL, respectively, and the obtained multidimensional resource collaborative optimization strategy provides users with the best content service experience.
Fig. 4 tests the local multidimensional resource collaborative optimization performance of the four algorithms over six short periods, with 100 Monte Carlo experiments per period. As shown in Fig. 4, DDQL attains a higher average MOS value than GA, PSO and DQL in every period, giving a better user experience. The decision time is also a key indicator of the users' QoE: a high decision delay slows down content acquisition and thus reduces the network service quality. Corresponding to Fig. 4, Fig. 5 shows the average decision time of the four algorithms in each period. GA needs more iterations to find a feasible solution of the complex problem, so its average decision time is much higher than that of the other algorithms; although the average decision time of PSO is greatly reduced compared with GA, its particles tend to become homogeneous, which reduces the diversity of the solutions, so its convergence speed is lower than that of the reinforcement learning methods. Compared with DQL, the average decision time of DDQL increases slightly, but its average MOS value is the highest, achieving the best multidimensional resource collaborative optimization performance. Table 3 shows that, over the six short periods, the total average decision time of DDQL is about 89.76% and 64.86% lower than that of GA and PSO, respectively.
TABLE 3 Total mean decision time for the four algorithms
(Table 3 is reproduced as an image in the original document.)
Fig. 6 compares the resource collaborative optimization performance of the four algorithms for different numbers of users. As the number of users served by each base station increases, the total content demand increases correspondingly and the average MOS values of the four algorithms gradually decrease, because the higher the total content demand in the system, the fewer multidimensional resources each user is allocated, which slows content acquisition and thus lowers the users' QoE. DDQL adopts a double-network structure and can obtain the optimal resource allocation strategy of the local optimization model (13); its MOS value is therefore about 97.62%, 47.34% and 30.10% higher than that of GA, PSO and DQL, providing the best content service.
4.3 Performance comparison of Global multidimensional resource collaborative optimization Algorithm
In the invention, the edge node uses FDQL for global model training to reduce the deviation of distributed decisions over a long period. To verify the effectiveness of FDQL, it is compared with conventional centralized DDQL using the loss function (20) as the evaluation criterion. Fig. 7 shows that the loss function of centralized DDQL is much higher than that of FDQL in the first 100 training rounds, and the loss functions of the two algorithms are similar in the last 100 rounds, indicating that FDQL converges faster and is more stable.
Fig. 8 tests the global multidimensional resource collaborative optimization performance of the two algorithms over six long periods. In centralized DDQL, each base station uploads all its data to the edge node, which consumes a large amount of multidimensional resources and degrades the global resource collaborative optimization performance; its average MOS value over the six long periods is 13.7% lower than that of FDQL. Fig. 9 shows the corresponding decision time: the Q matrix trained by centralized DDQL is large and consumes a great deal of decision time, whereas FDQL lets each base station train on its local data in parallel while the edge node only performs the federated fusion operation, so the total average decision time is 83.80% lower than that of centralized DDQL.
Finally, Figs. 10 and 11 compare the global multidimensional resource collaborative optimization performance of centralized DDQL and FDQL in different network environments. When the number of users served by each base station increases to 26, Fig. 10 shows that centralized DDQL, facing an enlarged data scale, cannot easily obtain the globally optimal solution and its average MOS value drops sharply. Similarly, when the number of base stations, i.e. the network scale, increases, Fig. 11 shows that centralized DDQL still hits a performance bottleneck and cannot provide good content services for users of a large-scale network. FDQL, in contrast, obtains the optimal multidimensional resource optimization scheme whether the content demand of users in the local cell grows or the global network scale grows, providing users with stable content services and ensuring the efficient operation of the MEC system.
5. Summary of the invention
Aiming at the demand for massive content services in MEC, the invention proposes a multidimensional resource collaborative optimization algorithm based on federated deep reinforcement learning, which helps reduce the system's service delay and improve the user experience. The method models the joint optimization of spectrum, computing and caching resources as a mixed-integer nonlinear programming problem and, based on federated deep reinforcement learning, constructs a two-layer multidimensional resource allocation framework that decouples the original problem. The base stations at the bottom layer adopt DDQL to obtain local resource collaborative optimization strategies within short periods, making full use of local multidimensional resources; the edge node at the upper layer adopts FDQL to train the globally optimal resource collaborative optimization strategy over long periods, so that users obtain the requested content with low delay and high quality. Simulation experiments show that the DDQL algorithm achieves higher local QoE performance than GA, PSO and DQL, and that the method has better global decision stability than centralized DDQL.
Reference documents:
[1] Ghosh A, Maeder A, Baker M, et al. 5G evolution: a view on 5G cellular technology beyond 3GPP Release 15 [J]. IEEE Access, 2019, 7(99): 127639-127651.
[2] Huang J Y, Nkenyereye L, Sung N M, et al. IoT service slicing and task offloading for edge computing [J]. IEEE Access, 2020, 8(14): 11526-11547.
[3] Cheng J, Yuan G, Zhou M, et al. Accessibility analysis and modeling for IoV in an urban scene [J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4246-4256.
[4] He S W, Huang W, Wang J H, et al. Cache-enabled coordinated mobile edge network: opportunities and challenges [J]. IEEE Wireless Communications, 2020, 27(2): 204-211.
[5] Yang Z, Du Y, Che C, et al. Energy-efficient joint resource allocation algorithms for MEC-enabled emotional computing in urban communities [J]. IEEE Access, 2019, 7(7): 137410-137419.
[6] Guo H Z, Liu J J, Zhang J. Computation offloading for multi-access mobile edge computing in ultradense networks [J]. IEEE Communications Magazine, 2018, 56(8): 14-19.
[7] Kamel M, Hamouda W, Youssef A. Ultra-dense networks: a survey [J]. IEEE Communications Surveys & Tutorials, 2016, 18(4): 2522-2545.
[8] Abbas N, Zhang Y, Taherkordi A, et al. Mobile edge computing: a survey [J]. IEEE Internet of Things Journal, 2018, 5(1): 450-465.
[9] Zhang K, Leng S P, He Y J, et al. Mobile edge computing and networking for green and low-latency Internet of Things [J]. IEEE Communications Magazine, 2018, 56(5): 39-45.
[10] Li H, Xu H, Zhou C, et al. Joint optimization strategy of computation offloading and resource allocation in multi-access edge computing environment [J]. IEEE Transactions on Vehicular Technology, 2020, 69(9): 10214-10226.
[11] Guo F, Zhang H, Hong J, et al. An efficient computation offloading management scheme in the densely deployed small cell networks with mobile edge computing [J]. IEEE/ACM Transactions on Networking, 2018, 26(6): 2651-2664.
[12] Cao T, Xu C, Du J, et al. Reliable and efficient multimedia service optimization for edge computing-based 5G networks: game theoretic approaches [J]. IEEE Transactions on Vehicular Technology, 2020, 17(3): 1610-1625.
[13] MEC-based V2X cooperative caching and resource allocation in the Internet of Vehicles (in Chinese) [J]. Journal on Communications, 2021, 42(02): 26-36.
[14] Kiran N, Pan C, Wang S, et al. Joint resource allocation and computation offloading in mobile edge computing for SDN based wireless networks [J]. Journal of Communications and Networks, 2020, 22(1): 1-11.
[15] Peng J, Wang C L, Jiang F, Zhao Y, Liu W R. A fast deep Q-learning network edge-cloud migration strategy for vehicular services (in Chinese) [J]. Journal of Electronics & Information Technology, 2020, 42(01): 58-64.
[16] Guo B, Zhang X, Wang Y, et al. Deep-Q-network-based multimedia multi-service QoS optimization for mobile edge computing systems [J]. IEEE Access, 2019, 7: 160961-160972.
[17] Zhang J, Li W J, Zhou F, et al. (in Chinese): 148-161.
[18] Ren J, Wang H, Hou T T, et al. Collaborative edge computing and caching with deep reinforcement learning decision agents [J]. IEEE Access, 2020, 8: 120604-120612.
[19] Liu T, Zhang Y, Zhu Y, et al. Online computation offloading and resource scheduling in mobile-edge computing [J]. IEEE Internet of Things Journal, 2021, 8(8): 6649-6664.
[20] Messaoud S, Bradai A, Ahmed O, et al. Deep federated Q-learning-based network slicing for industrial IoT [J]. IEEE Transactions on Industrial Informatics, 2021, 17(8): 5572-5582.
[21] Yu R, Li P. Toward resource-efficient federated learning in mobile edge computing [J]. IEEE Network, 2021, 35(1): 148-155.
[22] Zhong Z, Zhou Y P, Wu D, et al. P-FedAvg: parallelizing federated learning with theoretical guarantees [J]. IEEE Transactions on Vehicular Technology, 2020, 69(4): 4246-4256.
[23] Wang X, Wang C, Li X, et al. Federated deep reinforcement learning for Internet of Things with decentralized cooperative edge caching [J]. IEEE Internet of Things Journal, 2020, 7(10): 9441-9455.
[24] Rugelj M, Sedlar U, Volk M, et al. Novel cross-layer QoE-aware radio resource allocation algorithms in multiuser OFDMA systems [J]. IEEE Transactions on Communications, 2014, 62(9): 3196-3208.

Claims (4)

1. An FDQL-based multidimensional resource collaborative optimization method in a mobile edge network, the mobile edge computing (MEC) system comprising a plurality of base stations and an edge node, each base station communicating with the edge node and with neighboring base stations, the base stations and the edge node being capable of providing computing and caching services; the method is characterized in that it comprises the following steps: 1) constructing a multidimensional resource allocation model to represent the allocation of spectrum and computing resources and the cache update; 2) optimizing the multidimensional resource allocation model;
in the step 1), the multidimensional resource allocation model is constructed by taking the maximization of the mean opinion score (MOS) as the optimization target;
the MOS model is as follows:
Figure FDA0003382283090000011
wherein the parameters C of the linear modeln,1,Cn,2Make the MOSn∈[1,5]Weight factor wn,1,wn,2Respectively representing the influence degrees of content acquisition delay and cache updating on the MOS; CAnIs U in the nth cellnThe content acquisition delay of each user comprises transmission delay and calculation delay; ps isnIs U in the nth cellnThe base station updates the cache according to the content request set; the nth cell is the range covered by the base station n;
MOS of nth cellnThe higher the score is, the higher the QoE (quality of experience) of the user is, and the multidimensional resource optimization model is maxMOSn
in the step 2),
2.1) the base stations at the bottom layer perform local model training using double deep Q-learning (DDQL) to obtain the optimal decision within a short period:
2.1.1) taking base station n as the agent, the local resource allocation problem is modeled as a Markov decision process (MDP);
2.1.2) DDQL is adopted to interact with the environment by continuous trial and error, and the optimal strategy is found by maximizing the cumulative reward;
2.2) the edge node at the upper layer performs global model training using federated deep Q-learning (FDQL) to reduce the deviation of distributed decisions over a long period:
the multidimensional resource collaborative optimization is performed over the time periods {1, …, t, …, T, T+1, …, T+t, …, 2T, …};
within a short period t ≠ kT, each base station carries out DDQL model training to obtain the locally optimal multidimensional resource allocation strategy;
in the long period t = kT, the edge node carries out FDQL model training to obtain the globally optimal multidimensional resource allocation strategy and feeds it back to each base station to enhance the generalization capability of the local DDQL, thereby improving the users' content acquisition experience with a better resource allocation strategy.
2. The FDQL-based multidimensional resource collaborative optimization method in a mobile edge network according to claim 1, wherein in the step 2.1.1), the Markov decision process (MDP) is expressed as a quadruple $\langle S_n, A_n, PR_n, R_n\rangle$, where $S_n$ denotes the state space, $A_n$ the action space, $PR_n$ the state transition probability and $R_n$ the reward function;
state space: the agent needs to know the user and base station information before deciding which action to select, so the state space $S_n$ consists of the user requests and the base station cache status; at time slot i, the system state is

$$s_n^{i} = \big\{r_{n,1}^{i},\dots,r_{n,U_n}^{i},\,c_n^{i}\big\}$$

where r and c denote content requests and content caches respectively, $r_{n,1}^{i}$ and $r_{n,U_n}^{i}$ denote the states of the 1st and the $U_n$-th user, and $c_n^{i}$ denotes the cache state of base station n;
action space: the action space is the set of actions taken by the agent; the action vector comprises spectrum allocation, computing resource allocation and cache update, so the action space $A_n$ is defined as the multidimensional resource collaborative optimization mode

$$a_n^{i} = \big\{X_n^{i},\,P_n^{i},\,\lambda_n^{i},\,g_n^{i}\big\}$$

where $X_n^{i}$ denotes the channel connection matrix, $P_n^{i}$ the power allocation vector, $\lambda_n^{i}$ the computing unit allocation vector and $g_n^{i}$ the updated content cache vector;
reward function: when the environment is in state $s_n^{i}$ and action $a_n^{i}$ is executed, the system enters the next state $s_n^{i+1}$ and obtains the instant reward $R_n^{i}$; the MOS score is set as the reward function, $R_n^{i} = MOS_n^{i}$;
a mapping from the state space $S_n$ to the action space $A_n$ constitutes the policy $\pi: S_n \rightarrow A_n$;
the action-state value function of the action $a_n^{i}=\pi(s_n^{i})$ taken by policy $\pi$ in the current state $s_n^{i}$ is expressed as

$$Q_\pi\big(s_n^{i},a_n^{i}\big) = \mathbb{E}_\pi\Big[\sum_{j\geq 0}\gamma^{\,j} R_n^{i+j}\,\Big|\,s_n^{i},a_n^{i}\Big]$$

where $\gamma\in(0,1)$ is the discount factor;
according to the Bellman equation, the Q function is updated by

$$Q\big(s_n^{i},a_n^{i}\big) \leftarrow Q\big(s_n^{i},a_n^{i}\big) + \eta\Big[R_n^{i} + \gamma\max_{a}Q\big(s_n^{i+1},a\big) - Q\big(s_n^{i},a_n^{i}\big)\Big]$$

where $\eta\in(0,1)$ is the learning rate controlling the learning speed.
3. The method as claimed in claim 2, wherein in step 2.1.2), DDQL is used to find the local multidimensional resource collaborative optimization strategy, and parameters of the deep neural network are updated
Figure FDA0003382283090000022
To approach the Q value corresponding to the optimal strategy,
Figure FDA0003382283090000023
DDQL establishes experience playback mechanism to transfer the experience obtained by the agent
Figure FDA0003382283090000024
Storing the experience playback pool to record each interaction process of the experience playback pool and the environment;
when the experience playback pool is full, the new experience randomly replaces the old experience;
DDQL constructs a current network for action selection and a target network for evaluation, setting their state-action value functions as $Q(s_n^i, a_n^i; \theta_n)$ and $Q(s_n^i, a_n^i; \theta_n^-)$ respectively, where $\theta_n$ and $\theta_n^-$ both denote deep neural network parameters;
when training starts, DDQL first randomly samples a mini-batch from the experience replay pool and feeds it into the current network and the target network, whose Q values are computed by forward propagation; the loss function is then back-propagated through the current network to update its parameters;
the loss function is
$$L(\theta_n) = \mathbb{E}\big[(y_n^i - Q(s_n^i, a_n^i; \theta_n))^2\big],$$
where $y_n^i$ is the target value computed with the target network; its gradient with respect to the parameter $\theta_n$ is $\nabla_{\theta_n} L(\theta_n)$, and the update of the parameter $\theta_n$ is expressed as
$$\theta_n \leftarrow \theta_n - \alpha \nabla_{\theta_n} L(\theta_n),$$
where $\alpha$ represents the learning rate.
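The following PyTorch sketch illustrates, under stated assumptions, the mechanics claim 3 describes: an experience replay pool, a current network for action selection, a target network for evaluation, and the gradient update of $\theta_n$. The network sizes, the FIFO replay buffer (the claim specifies random replacement when full), and the double-Q target form are illustrative choices, not the patent's exact design.

```python
import random
from collections import deque

import torch
import torch.nn as nn

state_dim, num_actions = 8, 4          # hypothetical dimensions
alpha, gamma = 1e-3, 0.9               # learning rate alpha, discount factor gamma

def make_net():
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, num_actions))

current_net = make_net()               # Q(s, a; theta_n): action selection
target_net = make_net()                # Q(s, a; theta_n^-): evaluation
target_net.load_state_dict(current_net.state_dict())
# (periodically re-sync: target_net.load_state_dict(current_net.state_dict()))

replay_pool = deque(maxlen=10000)      # experience replay pool (FIFO here for simplicity)
optimizer = torch.optim.Adam(current_net.parameters(), lr=alpha)

def train_step(batch_size=32):
    if len(replay_pool) < batch_size:
        return
    batch = random.sample(replay_pool, batch_size)
    states, actions, rewards, next_states = zip(*batch)
    s = torch.tensor(states, dtype=torch.float32)
    a = torch.tensor(actions, dtype=torch.int64)
    r = torch.tensor(rewards, dtype=torch.float32)
    s_next = torch.tensor(next_states, dtype=torch.float32)

    q = current_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # forward pass, current net
    with torch.no_grad():
        # double-Q target: current net picks the action, target net evaluates it
        a_star = current_net(s_next).argmax(dim=1, keepdim=True)
        y = r + gamma * target_net(s_next).gather(1, a_star).squeeze(1)

    loss = nn.functional.mse_loss(q, y)   # L(theta_n) = E[(y - Q(s,a;theta_n))^2]
    optimizer.zero_grad()
    loss.backward()                        # back-propagation through the current network
    optimizer.step()                       # theta_n <- theta_n - alpha * grad L(theta_n)

# Hypothetical usage: fill the pool with random transitions, then run one update.
for _ in range(64):
    s0 = [random.random() for _ in range(state_dim)]
    s1 = [random.random() for _ in range(state_dim)]
    replay_pool.append((s0, random.randrange(num_actions), random.random(), s1))
train_step()
```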
4. The method for collaborative optimization of multi-dimensional resources based on FDQL in a mobile edge network as claimed in claim 3, wherein in step 2.2) the workflow of the FDQL framework is as follows:
first, each base station uploads its updated parameters $\theta_n$ to the edge node;
the edge node weight-aggregates the $N$ uploaded parameters to obtain the global parameter
$$\theta_g = \sum_{n=1}^{N} \frac{w_n}{\sum_{m=1}^{N} w_m}\, \theta_n,$$
where the subscript $g$ indicates that $\theta_g$ is the global parameter and $w_n$ is the sum of the content requests of the users in the $n$-th cell;
next, each base station performs the next round of DDQL training using the fed-back global parameter $\theta_g$;
these steps repeat until the FDQL algorithm converges.
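A minimal sketch of the aggregation step in claim 4, assuming each base station's parameters can be flattened into a NumPy vector and that the per-cell content request counts serve as the weights $w_n$; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def aggregate(local_params, request_counts):
    """Weighted aggregation: theta_g = sum_n (w_n / sum_m w_m) * theta_n."""
    weights = np.asarray(request_counts, dtype=float)
    weights /= weights.sum()
    return sum(w * theta for w, theta in zip(weights, local_params))

# Hypothetical usage: 3 base stations, 5-dimensional parameter vectors.
local_params = [np.random.randn(5) for _ in range(3)]
request_counts = [120, 80, 200]          # w_n: sum of user content requests per cell
theta_g = aggregate(local_params, request_counts)
# Each base station would then reload theta_g and run the next DDQL round,
# repeating until the FDQL algorithm converges.
```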
CN202111447130.6A 2021-11-30 2021-11-30 FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network Pending CN114143891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111447130.6A CN114143891A (en) 2021-11-30 2021-11-30 FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111447130.6A CN114143891A (en) 2021-11-30 2021-11-30 FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network

Publications (1)

Publication Number Publication Date
CN114143891A true CN114143891A (en) 2022-03-04

Family

ID=80386175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111447130.6A Pending CN114143891A (en) 2021-11-30 2021-11-30 FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network

Country Status (1)

Country Link
CN (1) CN114143891A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114449482A (en) * 2022-03-11 2022-05-06 南京理工大学 Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN114449482B (en) * 2022-03-11 2024-05-14 南京理工大学 Heterogeneous Internet of vehicles user association method based on multi-agent deep reinforcement learning
CN114745383A (en) * 2022-04-08 2022-07-12 浙江金乙昌科技股份有限公司 Mobile edge calculation assisted multilayer federal learning method
CN115002212A (en) * 2022-04-12 2022-09-02 广州大学 Combined caching and unloading method and system based on cross entropy optimization algorithm
CN115002212B (en) * 2022-04-12 2024-02-27 广州大学 Combined caching and unloading method and system based on cross entropy optimization algorithm
CN115361688B (en) * 2022-07-13 2023-11-10 西安电子科技大学 Industrial wireless edge gateway optimization layout scheme based on machine learning
CN115361688A (en) * 2022-07-13 2022-11-18 西安电子科技大学 Industrial wireless edge gateway optimization layout scheme based on machine learning
CN115208952A (en) * 2022-07-20 2022-10-18 北京交通大学 Intelligent collaborative content caching method
CN115208952B (en) * 2022-07-20 2023-09-26 北京交通大学 Intelligent collaborative content caching method
CN115080249A (en) * 2022-08-22 2022-09-20 南京可信区块链与算法经济研究院有限公司 Vehicle networking multidimensional resource allocation method and system based on federal learning
CN115756873A (en) * 2022-12-15 2023-03-07 北京交通大学 Mobile edge computing unloading method and platform based on federal reinforcement learning
CN115756873B (en) * 2022-12-15 2023-10-13 北京交通大学 Mobile edge computing and unloading method and platform based on federation reinforcement learning
CN116032757A (en) * 2022-12-16 2023-04-28 缀初网络技术(上海)有限公司 Network resource optimization method and device for edge cloud running scene
CN116032757B (en) * 2022-12-16 2024-05-10 派欧云计算(上海)有限公司 Network resource optimization method and device for edge cloud running scene
CN116321219A (en) * 2023-01-09 2023-06-23 北京邮电大学 Self-adaptive honeycomb base station federation forming method, federation learning method and device
CN116321219B (en) * 2023-01-09 2024-04-19 北京邮电大学 Self-adaptive honeycomb base station federation forming method, federation learning method and device
CN116346921A (en) * 2023-03-29 2023-06-27 华能澜沧江水电股份有限公司 Multi-server collaborative cache updating method and device for security management and control of river basin dam
CN116346921B (en) * 2023-03-29 2024-06-11 华能澜沧江水电股份有限公司 Multi-server collaborative cache updating method and device for security management and control of river basin dam
CN116209015B (en) * 2023-04-27 2023-06-27 合肥工业大学智能制造技术研究院 Edge network cache scheduling method, system and storage medium
CN116209015A (en) * 2023-04-27 2023-06-02 合肥工业大学智能制造技术研究院 Edge network cache scheduling method, system and storage medium
CN117042051B (en) * 2023-08-29 2024-03-08 燕山大学 Task unloading strategy generation method, system, equipment and medium in Internet of vehicles
CN117042051A (en) * 2023-08-29 2023-11-10 燕山大学 Task unloading strategy generation method, system, equipment and medium in Internet of vehicles

Similar Documents

Publication Publication Date Title
CN114143891A (en) FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network
Zhou et al. Incentive-driven deep reinforcement learning for content caching and D2D offloading
Wei et al. Joint optimization of caching, computing, and radio resources for fog-enabled IoT using natural actor–critic deep reinforcement learning
Lin et al. Resource management for pervasive-edge-computing-assisted wireless VR streaming in industrial Internet of Things
CN112020103B (en) Content cache deployment method in mobile edge cloud
Yang et al. Social-energy-aware user clustering for content sharing based on D2D multicast communications
Zhang et al. Joint optimization of cooperative edge caching and radio resource allocation in 5G-enabled massive IoT networks
Shan et al. A survey on computation offloading for mobile edge computing information
Cao et al. Reliable and efficient multimedia service optimization for edge computing-based 5G networks: game theoretic approaches
Ko et al. Joint client selection and bandwidth allocation algorithm for federated learning
Majidi et al. Hfdrl: An intelligent dynamic cooperate cashing method based on hierarchical federated deep reinforcement learning in edge-enabled iot
CN104426979A (en) Distributed buffer scheduling system and method based on social relations
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Cha et al. Fuzzy logic based client selection for federated learning in vehicular networks
Fu et al. Traffic prediction-enabled energy-efficient dynamic computing resource allocation in cran based on deep learning
CN113918829A (en) Content caching and recommending method based on federal learning in fog computing network
Xi et al. Real-time resource slicing for 5G RAN via deep reinforcement learning
Balasubramanian et al. FedCo: A federated learning controller for content management in multi-party edge systems
Sun et al. A DQN-based cache strategy for mobile edge networks
CN113821346B (en) Edge computing unloading and resource management method based on deep reinforcement learning
Seid et al. Blockchain-empowered resource allocation in Multi-UAV-enabled 5G-RAN: a multi-agent deep reinforcement learning approach
Lin et al. Joint optimization of preference-aware caching and content migration in cost-efficient mobile edge networks
Zhang et al. Toward intelligent resource allocation on task-oriented semantic communication
Cui et al. Multi-Agent Reinforcement Learning Based Cooperative Multitype Task Offloading Strategy for Internet of Vehicles in B5G/6G Network
Liu et al. Multi-agent federated reinforcement learning strategy for mobile virtual reality delivery networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination