CN116996921A - Whole-network multi-service joint optimization method based on meta-reinforcement learning - Google Patents

Whole-network multi-service joint optimization method based on meta-reinforcement learning

Info

Publication number
CN116996921A
CN116996921A (application CN202311252903.4A)
Authority
CN
China
Prior art keywords
network
module
reinforcement learning
rate
route
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311252903.4A
Other languages
Chinese (zh)
Other versions
CN116996921B (en)
Inventor
黄川
崔曙光
李然
符浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen filed Critical Chinese University of Hong Kong Shenzhen
Priority to CN202311252903.4A priority Critical patent/CN116996921B/en
Publication of CN116996921A publication Critical patent/CN116996921A/en
Application granted granted Critical
Publication of CN116996921B publication Critical patent/CN116996921B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/02Arrangements for optimising operational condition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0985Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a whole-network multi-service joint optimization method based on meta-reinforcement learning, which comprises the following steps: S1, a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network is built, and an objective function of joint optimization is determined; S2, a multi-service-oriented route cache module is constructed; S3, a meta-reinforcement learning model is constructed, comprising Actor networks, a Critic network and a task experience set caching module; S4, the parameters of the route cache module are trained and determined based on the meta-reinforcement learning model; S5, multi-service joint optimization of the whole 5G network is carried out. By controlling the route caching method of the whole communication network at each layer of network management, the invention realizes whole-network joint optimization of multiple services.

Description

Whole-network multi-service joint optimization method based on meta-reinforcement learning
Technical Field
The invention relates to the field of communication, in particular to a whole-network multi-service joint optimization method based on meta-reinforcement learning.
Background
In recent years, the rapid development of 5G mobile communication technology has greatly improved the communication quality of end-to-end transmission services. However, with the increasing coverage and interaction depth of terminal devices, 5G communication networks face service traffic that is large in volume, high in jitter, and diverse in type and number, which brings great challenges to whole-network resource control.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a whole-network multi-service joint optimization method based on meta-reinforcement learning, which realizes whole-network joint optimization of multiple services by controlling the route caching method of the whole communication network at each layer of network management.
The aim of the invention is achieved by the following technical scheme: a whole-network multi-service joint optimization method based on meta-reinforcement learning comprises the following steps:
S1, constructing a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network, and determining a joint optimization objective function;
S2, constructing a multi-service-oriented route cache module;
S3, constructing a meta-reinforcement learning model, comprising Actor networks, a Critic network and a task experience set caching module;
S4, training and determining parameters of the route cache module based on the meta-reinforcement learning model;
S5, carrying out multi-service joint optimization of the whole 5G network.
The beneficial effects of the invention are as follows: by controlling the route caching method of the whole communication network at each layer of network management, the invention realizes whole-network joint optimization of multiple services. Traditional multi-service joint optimization focuses on designing the time and spectrum slicing of each network's communication resources, which makes it difficult to approach the optimization upper bound of whole-network multi-service transmission, and even harder to cope with unstable multi-service rate flows. The invention adopts a meta-reinforcement learning algorithm that can learn route caching methods under different traffic-flow distributions of multiple services, greatly improving whole-network resource utilization and the quality of service of multiple services.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The technical solution of the present invention will be described in further detail with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description.
As shown in fig. 1, a whole-network multi-service joint optimization method based on meta-reinforcement learning comprises the following steps:
s1, constructing a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network, and determining a joint optimization objective function;
s101: and building a 5G open communication platform and a three-layer network structure comprising a wireless access network, a transmission network and a core network.
S1011: building a radio access network comprisingA wireless access terminal and an access base station, < >>Personal wireless access terminal initiation->Communication service flows and corresponding air interface communication rate is +.>The method comprises the steps of carrying out a first treatment on the surface of the The access base station comprises->Radio channel resources and channel gain of +.>The method comprises the steps of carrying out a first treatment on the surface of the The radio access network adopts a standard 5G air interface channel allocation scheme, which is marked as +.>;/>After entering the radio access network, the radio access network will be based on +.>Channel allocation is carried out on the communication service flows, and based on the result of the channel allocation, the speed of the communication service flows can correspondingly change when the wireless access network is output, and the communication service flows are recorded as +.>The output rate of the individual communication traffic streams is +.>Its value is->,/> and />And (5) completely determining.
S1012: and constructing a transmission network, including a transmission network route and a transmission network link. The transmission network comprisesBackground stream traffic in individual dimensions, rate is denoted +.>The method comprises the steps of carrying out a first treatment on the surface of the The transmission network adopts standard 5G transmission route protocol, which is marked as +.>. The transmission network is based on->After the routing configuration is completed, the method comprises the steps of (1)>The size of each communication service flow is correspondingly changed, and weThe input rate of the transmission network is recorded asOutput rate is +.>, wherein />The value of (2) is->,/> and />And (5) completely determining.
S1013: and building a core network, wherein the core network comprises a core network route and a core network link. The core network comprisesBackground stream traffic in individual dimensions, rate is denoted +.>The method comprises the steps of carrying out a first treatment on the surface of the The core network adopts the standard 5G core network routing protocol, which is marked as +.>. The core network is based on->After the routing configuration is completed, the method comprises the steps of (1)>The size of each communication service flow will change correspondingly, and we will note that the input rate of the core network is as followsOutput rate is +.>, wherein />The value of (2) is->,/> and />And (5) completely determining.
S102: and (5) representing a multi-service joint optimization target. In the first placeAt time we will->The optimization objective function of individual business is marked +.>The size is +.>,/>,/>,/>,/> and />Influence. The objective function of the multi-service joint optimization can be written as
(1.1)
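The per-layer rate transformations of S101 and the objective of S102 can be summarized compactly. The symbols below are illustrative stand-ins for the patent's original formula images: v, A, g are the access-network input rates, channel allocation scheme and channel gains; x and y are a layer's input and output rates; b is its background traffic; f_n(t) is the objective of the n-th service at time t:

```latex
u   = \Phi_{\mathrm{RAN}}(v, A, g), \qquad
y_T = \Phi_{\mathrm{TN}}(x_T, R_T, b_T), \qquad
y_C = \Phi_{\mathrm{CN}}(x_C, R_C, b_C), \qquad
\max \; F = \sum_{t=1}^{T} \sum_{n=1}^{N} f_n(t),
```

where each Φ denotes the (possibly stochastic) rate transformation performed by the corresponding network layer.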
S2, constructing a multi-service-oriented route cache module;
s201: a first route buffer module is constructed between the wireless access network and the transmission network, and the module can realize the rate control of any input flow by adopting a classical route buffer algorithm. We will input wireless into the networkThe communication traffic flows are input into the module, and the corresponding traffic flow rate is +>The output traffic flow rate is +.>
S202: and constructing a second route buffer module between the transmission network and the core network, wherein the module can realize the rate control of any input flow by adopting a classical route buffer algorithm. We will transport the networkThe communication traffic flows are input into the module, and the corresponding traffic flow rate is +>The output traffic flow rate is +.>
S3, constructing a meta-reinforcement learning model, comprising 2N Actor networks, a Critic network and a task experience set caching module, N being the number of communication service flows;
s301: constructionAnd an Actor network. Each Actor network is a double-layer fully-connected neural network, which is marked by +.>The parameters of the individual Actor network are +.>The input is +.>When->At the time->The output of the personal Actor network is +.>I.e. +.>Is>Element>The rate at which the individual traffic streams are output from the first route buffer module; when->At the time->The output of the personal Actor network is +.>I.e. +.>Is the first of (2)Element>The rate at which the individual traffic streams are output from the second route buffer module;
s302: constructing a Critic network which is a double-layer fully-connected nerveNetwork and by parametersCharacterization, its input includes->,/> and />The output of which characterizes the value function of the input variable value;
s303: and (3) constructing a task experience set caching module: the module is a buffer with a fixed storage space, and the initial state is empty and is used for storing task experience generated in the training process of meta reinforcement learning.
S4, training and determining parameters of a route cache module based on the meta reinforcement learning model;
S401: establishing a Markov decision model;
S4011: define the state as s_t, the action as a_t = (a^1, a^2), and the reward as r_t;
S4012: determine the state transition relationship: based on s_t, a_t, b_T and b_C, Bayesian inference is used to estimate the values or distributions of u, y_T and y_C at the next time step, thereby obtaining the value or distribution of s_{t+1};
since s_{t+1} is defined in terms of u, y_T and y_C, knowing the values or distributions of these quantities yields the value or distribution of s_{t+1};
s402: generating a task data set;
s4021: order the
S4022: random initializationTraffic flow rate distribution for individual wireless access terminals;
s4023: order the
S4024: observationAnd send in->An Actor network to give a probability of 0.98 +.>The output of the personal Actor network is assigned to +.>Back->The output of the personal Actor network is assigned to +.>And thus get->Is assigned to +.a.A set of random values with a probability of 0.02>To ensure the exploration of the meta reinforcement learning algorithm;
s4025: execution ofThe input rate of the transmission network and the core network is made to be +.> and />
S4026: observing and recording and />Is a value of (2);
s4027: will beArchive as an experience and store in task experience set caching module +.>The experience of each task is concentrated;
s4028: if it isTerminating the loop, proceeding to step S4029, otherwise letting +.>Returning to step S4024;
s4029: if it isTerminating the loop, proceeding to step S403, otherwise letting +.>Returning to step S4022;
s403: training a meta reinforcement learning model;
s4031: random initialization and />Is a value of (2);
s4032: order the
S4033: randomly selecting a task experience set from 100 task experience sets;
s4034: order the
S4035: taking K experiences from task experience setCalculating a loss function
(1.2)
wherein ,,/>is->Personal Actor network->For output at input, ++>For Critic network with-> and />Is the output at the time of input. We use the->Value back propagation update parameter in Critic network>
S4036: minimization ofTo update->Parameter in personal Actor network +.>, wherein ,
s4037: if it isTerminating the loop, proceeding to step S4038, otherwise letting +.>Returning to step S4035;
s4038: if it isTerminating the loop, proceeding to step S404, otherwise letting +.>Returning to step S4033;
s404: before trainingThe Actor network is deployed to the first route buffer module and then +.>And deploying the Actor network to a second route cache module.
S5, carrying out multi-service joint optimization of the 5G whole network;
s501: order the
S502: observationThe value of (2) is fed into a route buffer module, and is obtained based on an Actor network in the route buffer module> and />Is a value of (2);
s503: performing a slave at a first route cache moduleTo->Is performed from +.>To->Is a rate conversion of (2);
s504: judging whether or not to meet
If it isThe circulation is terminated, and multi-service joint optimization of the whole network is completed at the moment;
it should be noted that: this process calculates and />Is updated at different times, since the size of the objective function of the multi-service joint optimization is subject to +.>,/>Influence, so optimize-> and />The effect of the joint optimization of multiple services is achieved.
Otherwise, letAnd returns to step S502.
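The deployment stage S5 is an observe-act loop: observe s_t, query the deployed Actor networks, apply the two rate conversions, and stop at the horizon T. A self-contained sketch, where the environment and module interfaces are illustrative stand-ins rather than the patent's implementation:

```python
def run_joint_optimization(env, actors_first, actors_second, horizon):
    """Online control loop of step S5.

    Assumed interface (hypothetical):
      env.observe()     -> current state s_t
      env.apply(a1, a2) -> performs the rate conversions u -> a1 at the first
                           route cache module and y_T -> a2 at the second
    """
    for t in range(1, horizon + 1):
        s_t = env.observe()                     # S502: observe the state
        a1 = [mu(s_t) for mu in actors_first]   # outputs of the first N Actors
        a2 = [mu(s_t) for mu in actors_second]  # outputs of the remaining N Actors
        env.apply(a1, a2)                       # S503: execute the rate conversions
    # S504: the loop terminates once t reaches the horizon T
```

Each iteration corresponds to one pass through S502-S504, with the termination check folded into the `for` loop bound.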
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A whole-network multi-service joint optimization method based on meta-reinforcement learning, characterized by comprising the following steps:
S1, constructing a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network, and determining a joint optimization objective function;
S2, constructing a multi-service-oriented route cache module;
S3, constructing a meta-reinforcement learning model, comprising Actor networks, a Critic network and a task experience set caching module;
S4, training and determining parameters of the route cache module based on the meta-reinforcement learning model;
S5, carrying out multi-service joint optimization of the whole 5G network.
2. The whole-network multi-service joint optimization method based on meta-reinforcement learning according to claim 1, characterized in that step S1 comprises:
S101: building a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network:
s1011: building a radio access network comprisingA wireless access terminal and an access base station, < >>Personal wireless access terminal initiation->Communication service flows and corresponding air interface communication rate is +.>The method comprises the steps of carrying out a first treatment on the surface of the The access base station comprises->Radio channel resources and channel gain of +.>The method comprises the steps of carrying out a first treatment on the surface of the The radio access network adopts a standard 5G air interface channel allocation scheme, which is marked as +.>;/>After entering the radio access network, the radio access network will be based on +.>Channel allocation is carried out on the communication service flows, and based on the result of the channel allocation, the speed of the communication service flows can correspondingly change when the wireless access network is output, and the communication service flows are recorded as +.>The output rate of the individual communication traffic streams is +.>Its value is->,/> and />Completely determining;
S1012: constructing a transmission network, comprising transmission network routes and transmission network links; the transmission network carries background stream traffic whose rates are denoted b_T; the transmission network adopts a standard 5G transmission routing protocol, denoted R_T; after the transmission network completes routing configuration according to R_T, the rates of the N communication service flows change correspondingly; the input rates of the transmission network are denoted x_T and the output rates y_T, where the value of y_T is completely determined by x_T, R_T and b_T;
S1013: building a core network, comprising core network routes and core network links; the core network carries background stream traffic whose rates are denoted b_C; the core network adopts a standard 5G core network routing protocol, denoted R_C; after the core network completes routing configuration according to R_C, the rates of the N communication service flows change correspondingly; the input rates of the core network are denoted x_C and the output rates y_C, where the value of y_C is completely determined by x_C, R_C and b_C;
S102: characterizing the multi-service joint optimization objective: at time t, the optimization objective function of the n-th service is denoted f_n(t), and the objective function of the multi-service joint optimization is written as

F = Σ_{t=1}^{T} Σ_{n=1}^{N} f_n(t) (1.1)

whose size is influenced by v, A, g, R_T, b_T, R_C and b_C.
3. The whole-network multi-service joint optimization method based on meta-reinforcement learning according to claim 1, characterized in that step S2 comprises:
S201: constructing a first route cache module between the wireless access network and the transmission network, the module adopting a classical route caching algorithm to realize rate control of any input flow; the N communication service flows output by the wireless access network are input into the module, the corresponding input traffic flow rate being u and the output traffic flow rate being a^1;
S202: constructing a second route cache module between the transmission network and the core network, the module adopting a classical route caching algorithm to realize rate control of any input flow; the N communication service flows output by the transmission network are input into the module, the corresponding input traffic flow rate being y_T and the output traffic flow rate being a^2.
4. The whole-network multi-service joint optimization method based on meta-reinforcement learning according to claim 1, characterized in that step S3 comprises:
S301: constructing 2N Actor networks, each Actor network being a two-layer fully connected neural network; the parameters of the i-th Actor network are denoted θ_i and its input is s_t; when i ≤ N, the output of the i-th Actor network is a^1_i, i.e. the i-th element of a^1, the rate at which the i-th traffic stream is output from the first route cache module; when N < i ≤ 2N, the output of the i-th Actor network is a^2_{i−N}, i.e. the (i−N)-th element of a^2, the rate at which the (i−N)-th traffic stream is output from the second route cache module;
S302: constructing a Critic network, which is a two-layer fully connected neural network characterized by parameters ω; its input comprises s_t, a^1 and a^2, and its output characterizes the value function of the input variable values;
S303: constructing a task experience set caching module: the module is a buffer with a fixed storage space, whose initial state is empty, used for storing the task experiences generated in the training process of meta-reinforcement learning.
5. The whole-network multi-service joint optimization method based on meta-reinforcement learning according to claim 1, characterized in that step S4 comprises:
S401: establishing a Markov decision model;
S4011: define the state as s_t, the action as a_t = (a^1, a^2), and the reward as r_t;
S4012: determine the state transition relationship: based on s_t, a_t, b_T and b_C, Bayesian inference is used to estimate the values or distributions of u, y_T and y_C at the next time step, thereby obtaining the value or distribution of s_{t+1};
S402: generating a task data set;
S4021: let m = 1, where m indexes the task experience sets;
S4022: randomly initialize the traffic flow rate distributions of the M wireless access terminals;
S4023: let t = 1;
S4024: observe s_t and feed it into the 2N Actor networks; with probability 0.98, assign the outputs of the first N Actor networks to a^1 and the outputs of the remaining N Actor networks to a^2, thereby obtaining a_t; with probability 0.02, assign a set of random values to a_t, to ensure the exploration of the meta-reinforcement learning algorithm;
S4025: execute a_t, so that the input rates of the transmission network and the core network become x_T = a^1 and x_C = a^2;
S4026: observe and record the values of r_t and s_{t+1};
S4027: archive (s_t, a_t, r_t, s_{t+1}) as one experience and store it in the m-th task experience set of the task experience set caching module;
S4028: if t = T, terminate the loop and proceed to step S4029; otherwise let t = t + 1 and return to step S4024;
S4029: if m = 100, terminate the loop and proceed to step S403; otherwise let m = m + 1 and return to step S4022;
S403: training the meta-reinforcement learning model;
S4031: randomly initialize the values of θ_i (i = 1, …, 2N) and ω;
S4032: let j = 1, where j indexes the training iterations;
S4033: randomly select a task experience set from the 100 task experience sets;
S4034: let k = 1;
S4035: take K experiences (s_t, a_t, r_t, s_{t+1}) from the task experience set and calculate the loss function

L(ω) = Σ ( r_t + γQ(s_{t+1}, a_{t+1}; ω) − Q(s_t, a_t; ω) )² (1.2)

where a_{t+1} = (μ_1(s_{t+1}; θ_1), …, μ_{2N}(s_{t+1}; θ_{2N})), μ_i(·; θ_i) denotes the output of the i-th Actor network for a given input, Q(·, ·; ω) denotes the output of the Critic network for a given state-action input, and γ is a discount factor; the value of L(ω) is back-propagated to update the parameter ω of the Critic network;
S4036: minimize J_i to update the parameter θ_i of the i-th Actor network, where J_i = −Q(s_t, (μ_1(s_t; θ_1), …, μ_{2N}(s_t; θ_{2N})); ω);
S4037: if k = K (a preset number of update steps), terminate the loop and proceed to step S4038; otherwise let k = k + 1 and return to step S4035;
S4038: if j = J (a preset number of training iterations), terminate the loop and proceed to step S404; otherwise let j = j + 1 and return to step S4033;
S404: deploy the first N trained Actor networks to the first route cache module, and deploy the remaining N trained Actor networks to the second route cache module.
6. The whole-network multi-service joint optimization method based on meta-reinforcement learning according to claim 1, characterized in that step S5 comprises:
S501: let t = 1;
S502: observe the value of s_t and feed it into the route cache modules; based on the Actor networks in the route cache modules, obtain the values of a^1 and a^2;
S503: perform at the first route cache module a rate conversion from u to a^1, and perform at the second route cache module a rate conversion from y_T to a^2;
S504: judge whether t = T is satisfied;
if t = T, the loop is terminated, and the multi-service joint optimization of the whole network is completed; otherwise, let t = t + 1 and return to step S502.
CN202311252903.4A 2023-09-27 2023-09-27 Whole-network multi-service joint optimization method based on meta-reinforcement learning Active CN116996921B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311252903.4A CN116996921B (en) 2023-09-27 2023-09-27 Whole-network multi-service joint optimization method based on meta-reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311252903.4A CN116996921B (en) 2023-09-27 2023-09-27 Whole-network multi-service joint optimization method based on meta-reinforcement learning

Publications (2)

Publication Number Publication Date
CN116996921A true CN116996921A (en) 2023-11-03
CN116996921B CN116996921B (en) 2024-01-02

Family

ID=88525186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311252903.4A Active CN116996921B (en) Whole-network multi-service joint optimization method based on meta-reinforcement learning

Country Status (1)

Country Link
CN (1) CN116996921B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020249299A1 (en) * 2019-06-11 2020-12-17 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatus for data traffic routing
CN113411826A (en) * 2021-06-17 2021-09-17 天津大学 Edge network equipment caching method based on attention mechanism reinforcement learning
CN113596138A (en) * 2021-07-26 2021-11-02 东北大学 Heterogeneous information center network cache allocation method based on deep reinforcement learning
CN113676513A (en) * 2021-07-15 2021-11-19 东北大学 Deep reinforcement learning-driven intra-network cache optimization method
US20230171640A1 (en) * 2021-11-30 2023-06-01 Samsung Electronics Co., Ltd. Traffic optimization module and operating method thereof
CN116321307A (en) * 2023-03-10 2023-06-23 北京邮电大学 Bidirectional cache placement method based on deep reinforcement learning in non-cellular network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020249299A1 (en) * 2019-06-11 2020-12-17 Telefonaktiebolaget Lm Ericsson (Publ) Methods and apparatus for data traffic routing
CN113411826A (en) * 2021-06-17 2021-09-17 天津大学 Edge network equipment caching method based on attention mechanism reinforcement learning
CN113676513A (en) * 2021-07-15 2021-11-19 东北大学 Deep reinforcement learning-driven intra-network cache optimization method
CN113596138A (en) * 2021-07-26 2021-11-02 东北大学 Heterogeneous information center network cache allocation method based on deep reinforcement learning
US20230171640A1 (en) * 2021-11-30 2023-06-01 Samsung Electronics Co., Ltd. Traffic optimization module and operating method thereof
CN116321307A (en) * 2023-03-10 2023-06-23 北京邮电大学 Bidirectional cache placement method based on deep reinforcement learning in non-cellular network

Also Published As

Publication number Publication date
CN116996921B (en) 2024-01-02

Similar Documents

Publication Publication Date Title
Guo et al. An adaptive wireless virtual reality framework in future wireless networks: A distributed learning approach
CN110809306B (en) Terminal access selection method based on deep reinforcement learning
CN114338504B (en) Micro-service deployment and routing method based on network edge system
CN112020103B (en) Content cache deployment method in mobile edge cloud
CN113475089B (en) Method and system for user-oriented content streaming
CN112995950B (en) Resource joint allocation method based on deep reinforcement learning in Internet of vehicles
CN114390057B (en) Multi-interface self-adaptive data unloading method based on reinforcement learning under MEC environment
Wang et al. Multimodal semantic communication accelerated bidirectional caching for 6G MEC
CN113098714B (en) Low-delay network slicing method based on reinforcement learning
CN112395090B (en) Intelligent hybrid optimization method for service placement in mobile edge calculation
CN113626104B (en) Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN114205791A (en) Depth Q learning-based social perception D2D collaborative caching method
CN114281718A (en) Industrial Internet edge service cache decision method and system
CN116321307A (en) Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
CN113993168B (en) Collaborative caching method based on multi-agent reinforcement learning in fog wireless access network
CN115633380A (en) Multi-edge service cache scheduling method and system considering dynamic topology
CN115314944A (en) Internet of vehicles cooperative caching method based on mobile vehicle social relation perception
Chen et al. Twin delayed deep deterministic policy gradient-based intelligent computation offloading for IoT
CN116996921B (en) Whole-network multi-service joint optimization method based on meta-reinforcement learning
CN111465057B (en) Edge caching method and device based on reinforcement learning and electronic equipment
CN112911614A (en) Cooperative coding caching method based on dynamic request D2D network
CN116204319A (en) Yun Bianduan collaborative unloading method and system based on SAC algorithm and task dependency relationship
CN115756873A (en) Mobile edge computing unloading method and platform based on federal reinforcement learning
CN113766540B (en) Low-delay network content transmission method, device, electronic equipment and medium
WO2023039905A1 (en) Ai data transmission method and apparatus, device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant