CN116996921A

CN116996921A - Whole-network multi-service joint optimization method based on element reinforcement learning

Info

Publication number: CN116996921A
Application number: CN202311252903.4A
Authority: CN
Inventors: 黄川�; 崔曙光; 李然; 符浩
Original assignee: Chinese University of Hong Kong Shenzhen
Current assignee: Chinese University of Hong Kong Shenzhen
Priority date: 2023-09-27
Filing date: 2023-09-27
Publication date: 2023-11-03
Anticipated expiration: 2043-09-27
Also published as: CN116996921B

Abstract

The invention discloses a full-network multi-service joint optimization method based on element reinforcement learning, which comprises the following steps: s1, a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network is built, and an objective function of joint optimization is determined; s2, constructing a multi-service-oriented route cache module; s3, constructing a meta reinforcement learning model, which comprisesThe system comprises an Actor network, a Critic network and a task experience set caching module; s4, training and determining parameters of a route cache module based on the meta reinforcement learning model; s5, multi-service joint optimization of the 5G whole network is carried out. The invention is controlled byThe route caching method of the whole system communication network at each layer of network management realizes the whole network joint optimization of multiple services.

Description

Whole-network multi-service joint optimization method based on element reinforcement learning

Technical Field

The invention relates to the field of communication, in particular to a full-network multi-service joint optimization method based on meta reinforcement learning.

Background

In recent years, the rapid development of 5G mobile communication technology has greatly improved the communication quality of end-to-end transmission services. However, with the increasing coverage and interaction depth of terminal devices, the 5G communication network meets the difficulties of large service traffic, high jitter, multiple types and multiple numbers, and brings great challenges to the whole network resource control.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides a full-network multi-service joint optimization method based on element reinforcement learning, which realizes the full-network joint optimization of multi-service by controlling a route caching method of a communication full-network at each layer of network management.

The aim of the invention is realized by the following technical scheme: a full-network multi-service joint optimization method based on element reinforcement learning comprises the following steps:

s1, constructing a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network, and determining a joint optimization objective function;

s2, constructing a multi-service-oriented route cache module;

s3, constructing a meta reinforcement learning model, which comprisesThe system comprises an Actor network, a Critic network and a task experience set caching module;

s4, training and determining parameters of a route cache module based on the meta reinforcement learning model;

s5, multi-service joint optimization of the 5G whole network is carried out.

The beneficial effects of the invention are as follows: the invention realizes the whole network joint optimization of multiple services by controlling the route caching method of the communication whole network at each layer of network management. Considering that the traditional multi-service joint optimization focuses on the design of time and frequency spectrum slice diversity of communication resources of each network, the optimization upper bound of the whole-network multi-service transmission is difficult to approach, and the situation of unstable multi-service rate flow is more difficult to deal with. The invention adopts a meta-reinforcement learning algorithm, can learn the route caching method under the distribution of different business flows of multiple businesses, and greatly improves the utilization rate of the whole network resources and the service quality of the multiple businesses.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

The technical solution of the present invention will be described in further detail with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description.

As shown in fig. 1, a method for optimizing the combination of multiple services in a whole network based on element reinforcement learning includes the following steps:

s101: and building a 5G open communication platform and a three-layer network structure comprising a wireless access network, a transmission network and a core network.

S1011: building a radio access network comprisingA wireless access terminal and an access base station, < >>Personal wireless access terminal initiation->Communication service flows and corresponding air interface communication rate is +.>The method comprises the steps of carrying out a first treatment on the surface of the The access base station comprises->Radio channel resources and channel gain of +.>The method comprises the steps of carrying out a first treatment on the surface of the The radio access network adopts a standard 5G air interface channel allocation scheme, which is marked as +.>；/>After entering the radio access network, the radio access network will be based on +.>Channel allocation is carried out on the communication service flows, and based on the result of the channel allocation, the speed of the communication service flows can correspondingly change when the wireless access network is output, and the communication service flows are recorded as +.>The output rate of the individual communication traffic streams is +.>Its value is->，/> and />And (5) completely determining.

S1012: and constructing a transmission network, including a transmission network route and a transmission network link. The transmission network comprisesBackground stream traffic in individual dimensions, rate is denoted +.>The method comprises the steps of carrying out a first treatment on the surface of the The transmission network adopts standard 5G transmission route protocol, which is marked as +.>. The transmission network is based on->After the routing configuration is completed, the method comprises the steps of (1)>The size of each communication service flow is correspondingly changed, and weThe input rate of the transmission network is recorded asOutput rate is +.>, wherein />The value of (2) is->，/> and />And (5) completely determining.

S1013: and building a core network, wherein the core network comprises a core network route and a core network link. The core network comprisesBackground stream traffic in individual dimensions, rate is denoted +.>The method comprises the steps of carrying out a first treatment on the surface of the The core network adopts the standard 5G core network routing protocol, which is marked as +.>. The core network is based on->After the routing configuration is completed, the method comprises the steps of (1)>The size of each communication service flow will change correspondingly, and we will note that the input rate of the core network is as followsOutput rate is +.>, wherein />The value of (2) is->，/> and />And (5) completely determining.

S102: and (5) representing a multi-service joint optimization target. In the first placeAt time we will->The optimization objective function of individual business is marked +.>The size is +.>，/>，/>，/>，/> and />Influence. The objective function of the multi-service joint optimization can be written as

（1.1）

S2, constructing a multi-service-oriented route cache module;

s201: a first route buffer module is constructed between the wireless access network and the transmission network, and the module can realize the rate control of any input flow by adopting a classical route buffer algorithm. We will input wireless into the networkThe communication traffic flows are input into the module, and the corresponding traffic flow rate is +>The output traffic flow rate is +.>；

S202: and constructing a second route buffer module between the transmission network and the core network, wherein the module can realize the rate control of any input flow by adopting a classical route buffer algorithm. We will transport the networkThe communication traffic flows are input into the module, and the corresponding traffic flow rate is +>The output traffic flow rate is +.>；

s301: constructionAnd an Actor network. Each Actor network is a double-layer fully-connected neural network, which is marked by +.>The parameters of the individual Actor network are +.>The input is +.>When->At the time->The output of the personal Actor network is +.>I.e. +.>Is>Element>The rate at which the individual traffic streams are output from the first route buffer module; when->At the time->The output of the personal Actor network is +.>I.e. +.>Is the first of (2)Element>The rate at which the individual traffic streams are output from the second route buffer module;

s302: constructing a Critic network which is a double-layer fully-connected nerveNetwork and by parametersCharacterization, its input includes->，/> and />The output of which characterizes the value function of the input variable value;

s303: and (3) constructing a task experience set caching module: the module is a buffer with a fixed storage space, and the initial state is empty and is used for storing task experience generated in the training process of meta reinforcement learning.

s401, establishing a Markov decision model;

s4011 define a state asAction is +.>The reward is +.>；

S4012. Determine a state transition relationship: based on，/>，/>，/>Is estimated by Bayesian reasoning>，/>，/> and />Values or distributions of (2) to thereby obtainA value or distribution of (2);

due toDefined as->，/>，/>，/>Is able to infer +.>Is->，，/> and />Is aware of->，/>，/> and />The value or distribution of (2) is obtained>A value or distribution of (2);

s402: generating a task data set;

s4021: order the；

S4022: random initializationTraffic flow rate distribution for individual wireless access terminals;

s4023: order the；

S4024: observationAnd send in->An Actor network to give a probability of 0.98 +.>The output of the personal Actor network is assigned to +.>Back->The output of the personal Actor network is assigned to +.>And thus get->Is assigned to +.a.A set of random values with a probability of 0.02>To ensure the exploration of the meta reinforcement learning algorithm;

s4025: execution ofThe input rate of the transmission network and the core network is made to be +.> and />；

S4026: observing and recording and />Is a value of (2);

s4027: will beArchive as an experience and store in task experience set caching module +.>The experience of each task is concentrated;

s4028: if it isTerminating the loop, proceeding to step S4029, otherwise letting +.>Returning to step S4024;

s4029: if it isTerminating the loop, proceeding to step S403, otherwise letting +.>Returning to step S4022;

s403: training a meta reinforcement learning model;

s4031: random initialization and />Is a value of (2);

s4032: order the；

S4033: randomly selecting a task experience set from 100 task experience sets;

s4034: order the；

S4035: taking K experiences from task experience setCalculating a loss function

（1.2）

wherein ,，/>is->Personal Actor network->For output at input, ++>For Critic network with-> and />Is the output at the time of input. We use the->Value back propagation update parameter in Critic network>；

S4036: minimization ofTo update->Parameter in personal Actor network +.>, wherein ,；

s4037: if it isTerminating the loop, proceeding to step S4038, otherwise letting +.>Returning to step S4035;

s4038: if it isTerminating the loop, proceeding to step S404, otherwise letting +.>Returning to step S4033;

s404: before trainingThe Actor network is deployed to the first route buffer module and then +.>And deploying the Actor network to a second route cache module.

S5, carrying out multi-service joint optimization of the 5G whole network;

s501: order the；

S502: observationThe value of (2) is fed into a route buffer module, and is obtained based on an Actor network in the route buffer module> and />Is a value of (2);

s503: performing a slave at a first route cache moduleTo->Is performed from +.>To->Is a rate conversion of (2);

s504: judging whether or not to meet；

If it isThe circulation is terminated, and multi-service joint optimization of the whole network is completed at the moment;

it should be noted that: this process calculates and />Is updated at different times, since the size of the objective function of the multi-service joint optimization is subject to +.>，/>Influence, so optimize-> and />The effect of the joint optimization of multiple services is achieved.

Otherwise, letAnd returns to step S502.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A full-network multi-service joint optimization method based on element reinforcement learning is characterized by comprising the following steps of: the method comprises the following steps:

s2, constructing a multi-service-oriented route cache module;

s5, multi-service joint optimization of the 5G whole network is carried out.

2. The method for optimizing the whole network multi-service combination based on element reinforcement learning according to claim 1, wherein the method comprises the following steps: the step S1 includes:

s101: building a 5G communication platform comprising a three-layer network structure of a wireless access network, a transmission network and a core network:

s1011: building a radio access network comprisingA wireless access terminal and an access base station, < >>Personal wireless access terminal initiation->Communication service flows and corresponding air interface communication rate is +.>The method comprises the steps of carrying out a first treatment on the surface of the The access base station comprises->Radio channel resources and channel gain of +.>The method comprises the steps of carrying out a first treatment on the surface of the The radio access network adopts a standard 5G air interface channel allocation scheme, which is marked as +.>；/>After entering the radio access network, the radio access network will be based on +.>Channel allocation is carried out on the communication service flows, and based on the result of the channel allocation, the speed of the communication service flows can correspondingly change when the wireless access network is output, and the communication service flows are recorded as +.>The output rate of the individual communication traffic streams is +.>Its value is->，/> and />Completely determining;

s1012: constructing a transmission network, including a transmission network route and a transmission network link; the transmission network comprisesBackground stream traffic in individual dimensions, rate is denoted +.>The method comprises the steps of carrying out a first treatment on the surface of the The transmission network adopts standard 5G transmission route protocol, which is marked as +.>The method comprises the steps of carrying out a first treatment on the surface of the The transmission network is based on->After the routing configuration is completed, the method comprises the steps of (1)>The size of each communication service flow is correspondingly changed, and the input rate of the transmission network is recorded as +.>Output rate is +.>, wherein />The value of (2) is->，/> and />Completely determining;

s1013: building a core network, including a core network route and a core network link; the core network comprisesBackground stream traffic in individual dimensions, rate is denoted +.>The method comprises the steps of carrying out a first treatment on the surface of the The core network adopts the standard 5G core network routing protocol, which is marked as +.>The core network is based on->After the routing configuration is completed, the method comprises the steps of (1)>The size of each communication service flow will change correspondingly, and the input rate of the core network is recorded as +.>Output rate is +.>, wherein />The value of (2) is->，/> and />Completely determining;

s102: characterizing multi-business joint optimization objectives, in the firstAt time we will->The optimization objective function of each service is recorded asThe objective function of the multi-service joint optimization is recorded as

（1.1）

Size is subject to->，/>，/>，/>，/> and />Influence.

3. The method for optimizing the whole network multi-service combination based on element reinforcement learning according to claim 1, wherein the method comprises the following steps: the step S2 includes:

s201: a first route buffer module is constructed between the wireless access network and the transmission network, the module adopts classical route buffer algorithm to realize the rate control of any input flow, and the wireless input network is connected with the first route buffer moduleThe communication traffic flows are input into the module, and the corresponding traffic flow rate is +>The output traffic flow rate is +.>；

S202: a second route buffer module is constructed between the transmission network and the core network, the module adopts classical route buffer algorithm to realize the rate control of any input flow, and the transmission network is connected with the second route buffer moduleIndividual traffic streamsInto the module, the corresponding traffic flow rate is +.>The output traffic flow rate is +.>。

4. The method for optimizing the whole network multi-service combination based on element reinforcement learning according to claim 1, wherein the method comprises the following steps: the step S3 includes:

s301: constructionAn Actor network, each of which is a double-layer fully-connected neural network, record +.>The parameters of the individual Actor network are +.>The input is +.>When->At the time->The output of the personal Actor network is +.>I.e. +.>Is>Element>The rate at which the individual traffic streams are output from the first route buffer module; when->At the time->The output of the personal Actor network is +.>I.e. +.>Is>Element>The rate at which the individual traffic streams are output from the second route buffer module;

s302: constructing a Critic network which is a two-layer fully connected neural network and is composed of parametersCharacterization, its input includes->，/> and />The output of which characterizes the value function of the input variable value;

5. The method for optimizing the whole network multi-service combination based on element reinforcement learning according to claim 1, wherein the method comprises the following steps: the step S4 includes:

s401, establishing a Markov decision model;

s4011 define a state asAction is +.>The reward is +.>；

S4012. Determine a state transition relationship: based on，/>，/>，/>Is estimated by Bayesian reasoning>，/>，/> and />The value or distribution of (2) thereby obtaining +.>A value or distribution of (2);

s402: generating a task data set;

s4021: order the；

s4023: order the；

S4024: observationAnd send in->An Actor network to give a probability of 0.98 +.>The output of the personal Actor network is assigned to +.>Back->The output of the personal Actor network is assigned to +.>And thus get->To follow a group with a probability of 0.02Assignment of machine value to +.>To ensure the exploration of the meta reinforcement learning algorithm;

S4026: observing and recording and />Is a value of (2);

s4029: if it isTerminating the loop, proceeding to step S403, otherwiseLet->Returning to step S4022;

s403: training a meta reinforcement learning model;

s4031: random initialization and />Is a value of (2);

s4032: order the；

S4033: randomly selecting a task experience set from 100 task experience sets;

s4034: order the；

S4035: taking K experiences from task experience setCalculating a loss function

（1.2）

wherein ,，/>is->Personal Actor network->For the output at the time of input,for Critic network with-> and />Is the output at the time of input; we use the->Value back propagation update parameter in Critic network>；

s4038: if it isThe cycle is terminated and, if necessary,step S404 is entered, otherwise let +.>Returning to step S4033;

6. The method for optimizing the whole network multi-service combination based on element reinforcement learning according to claim 1, wherein the method comprises the following steps: the step S5 includes:

s501: order the；

s504: judging whether or not to meet；

If it isThe circulation is terminated, and multi-service joint optimization of the whole network is completed at the moment; otherwise, let->And returns to step S502.