CN117811907A - Satellite network micro-service deployment method and device based on multi-agent reinforcement learning

Info

Publication number: CN117811907A
Authority: CN (China)
Prior art keywords: model, information, agent, satellite, deployment
Legal status: Pending
Application number: CN202311360363.1A
Other languages: Chinese (zh)
Inventors: 吴胜, 段皓月, 纪哲, 虞志刚, 丁文慧, 陆洲
Current Assignee: Beijing University of Posts and Telecommunications
Original Assignee: Beijing University of Posts and Telecommunications
Application filed by Beijing University of Posts and Telecommunications; priority to CN202311360363.1A; publication of CN117811907A.


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08: Configuration management of networks or network elements
    • H04L 41/0803: Configuration setting
    • H04L 41/16: Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence


Abstract

The embodiments of the application provide a satellite network micro-service deployment method and device based on multi-agent reinforcement learning. The method comprises the following steps: acquiring resource demand information of the micro-service; determining resource utilization rate information and time delay information of the satellite nodes according to a pre-established resource utilization rate model and time delay model together with configuration information of the satellite nodes; when the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, determining the deployment strategy of the satellite nodes corresponding to the resource demand information of the micro-service with a pre-trained multi-agent strategy deployment model; and configuring the server terminal according to the deployment strategy of the satellite nodes. Resources can thus be configured on each satellite node according to the resource demand of the micro-services on the server and the remaining resources of each satellite node, which improves the resource utilization balance of the satellite nodes, reduces the invocation delay, and improves configuration efficiency.

Description

Satellite network micro-service deployment method and device based on multi-agent reinforcement learning
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a satellite network micro-service deployment method and device based on multi-agent reinforcement learning.
Background
With the continuous evolution of software construction and operation modes, the traditional centralized architecture, which lacks flexibility and is difficult to expand and migrate, can no longer meet diverse application requirements and has gradually evolved into the distributed micro-service architecture. Under the micro-service architecture, a complex application is split by logical relationship into several relatively independent small applications. These micro-services can be independently developed, updated, expanded and deployed without affecting one another, communicate over lightweight protocols, and can be deployed onto different satellite edge nodes, so engineering projects based on the micro-service architecture have high scalability, high reliability and flexible distributed deployment capability.
At present, with the growing demand for communication services, the drawbacks of terrestrial communication networks, such as insufficient spectrum resources and limited coverage, have become apparent. Compared with terrestrial communication, satellite communication has unique advantages, such as wide coverage, high system reliability, large communication capacity and immunity to natural disasters such as earthquakes. However, each satellite has a different amount of resources, different micro-services make different requests for satellite resources, and the remaining resources of the satellites differ. How to deploy the different micro-services onto the satellites so that satellite resources are configured reasonably is a problem that currently needs to be solved.
Disclosure of Invention
The aim of some embodiments of the present application is to provide a satellite network micro-service deployment method and device based on multi-agent reinforcement learning. In the technical scheme of the embodiments, resource demand information of the micro-service is acquired; resource utilization rate information and time delay information of the satellite nodes are determined according to a pre-established resource utilization rate model and time delay model together with configuration information of the satellite nodes; and when the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, a pre-trained multi-agent strategy deployment model is used to determine the deployment strategy of the satellite node corresponding to the resource demand information of the micro-service, where the pre-trained multi-agent strategy deployment model is obtained by training each parameter of an agent network model with a multi-agent deep deterministic policy gradient algorithm. In the embodiments of the application, the resource utilization rate model and the time delay model are established based on the micro-service architecture, the resource utilization rate information and time delay information of the satellite nodes are determined from the configuration information of the satellite nodes, the deployment strategy of the satellite nodes corresponding to the resource demand information of the micro-service is then determined with the pre-trained multi-agent strategy deployment model, and the server terminal is configured according to that deployment strategy. Resources can thus be configured on each satellite node according to the resource demand of the micro-services and the remaining resources of each satellite node, which improves the resource utilization balance of the satellite nodes, reduces the invocation delay, and improves configuration efficiency.
In a first aspect, some embodiments of the present application provide a satellite network micro-service deployment method based on multi-agent reinforcement learning, including:
acquiring resource demand information of the micro service;
determining resource utilization rate information and time delay information of the satellite node according to a pre-established resource utilization rate model and time delay model and configuration information of the satellite node;
when the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, determining the deployment strategy of the satellite node corresponding to the resource demand information of the micro-service with a pre-trained multi-agent strategy deployment model, wherein the pre-trained multi-agent strategy deployment model is obtained by training each parameter of an agent network model with a multi-agent deep deterministic policy gradient algorithm;
and configuring the server terminal according to the deployment strategy of the satellite node.
According to some embodiments of the application, the resource utilization rate model and the time delay model are established, the resource utilization rate information and time delay information of the satellite nodes are determined from the configuration information of the satellite nodes, the deployment strategy of the satellite nodes corresponding to the resource demand information of the micro-service is then determined with a pre-trained multi-agent strategy deployment model, and the server terminal is configured according to that deployment strategy. Resources can thus be configured on each satellite node according to the resource demand of the micro-services and the remaining resources of each satellite node, i.e. micro-services with different resource demands are deployed onto suitable satellite nodes, which improves the resource utilization balance of the satellite nodes, reduces the invocation delay, and improves configuration efficiency.
Optionally, the multi-agent strategy deployment model is obtained by:
obtaining agent sample parameters, wherein the agent sample parameters at least comprise the agent's observed environment;
acquiring an agent network model, wherein the agent network model at least comprises an actor network model and a critic network model;
inputting the agent's observed environment into the actor network model, and outputting the deployment action of the agent;
inputting the deployment action and the global state of the agent into the critic network model, and outputting an action evaluation value;
establishing a replay pool from the agent's current state information, action information, reward information and next-moment state information;
updating the network parameters in the actor network model and the critic network model with a multi-agent deep deterministic policy gradient algorithm, according to the current state information, action information, reward information and next-moment state information sampled from the replay pool;
and, when the actor network model and the critic network model have converged, determining the converged actor network model and critic network model as the multi-agent strategy deployment model.
In some embodiments of the application, the micro-service deployment problem is converted into a partially observable Markov decision process and solved with a multi-agent reinforcement learning method, using centralized training and distributed execution. In the training stage, the container instances of the micro-services act as agents and acquire global information to obtain the optimal deployment scheme; in the execution stage, a micro-service can be deployed relying only on its own observation space, which greatly reduces the communication overhead between micro-services.
Optionally, updating the network parameters in the actor network model and the critic network model includes:
acquiring a first loss function of the actor network model and a second loss function of the critic network model;
performing gradient calculation on the first loss function and the second loss function respectively;
and updating the network parameters in the actor network model and the critic network model using the gradient descent method.
Some embodiments of the application adopt a fixed-network method: the target network is held fixed, and the original network parameters are transferred to the target network at intervals, which avoids a constantly changing update target and ensures training stability.
Optionally, the configuration information of the satellite nodes at least includes the number of satellite nodes, the total number of resource types and the heterogeneous resource capacities.
Optionally, the resource utilization model is obtained by:
acquiring resource balance information of a first resource utilization rate model of different types of resources on the same satellite node and node balance information of a second resource utilization rate model of the same type of resources on different satellite nodes;
and determining the resource utilization rate model according to the resource balance degree information and the weight value corresponding to the resource balance degree information, and the node balance degree information and the weight value corresponding to the node balance degree information.
Optionally, the delay model includes at least a transmission delay sub-model, a propagation delay sub-model, and a migration delay sub-model.
Some embodiments of the present application build a resource utilization model and a latency model, minimize resource utilization variance and latency, and represent the micro-service deployment problem as a multi-objective optimization problem.
In a second aspect, some embodiments of the present application provide a satellite network micro-service deployment apparatus based on multi-agent reinforcement learning, including:
The acquisition module is used for acquiring resource demand information of the micro service;
the first determining module is used for determining the resource utilization rate information and the time delay information of the satellite node according to a pre-established resource utilization rate model and a time delay model and configuration information of the satellite node;
the second determining module is configured to determine the deployment strategy of the satellite node corresponding to the resource demand information of the micro-service using a pre-trained multi-agent strategy deployment model when the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, where the pre-trained multi-agent strategy deployment model is obtained by training each parameter of an agent network model with a multi-agent deep deterministic policy gradient algorithm;
and the configuration module is used for configuring the server terminal according to the deployment strategy of the satellite node.
According to some embodiments of the application, a resource utilization rate model and a time delay model are established based on a micro-service architecture, resource utilization rate information and time delay information of satellite nodes are determined according to configuration information of the satellite nodes, then a pre-trained multi-agent strategy deployment model is adopted to determine a deployment strategy of the satellite nodes corresponding to resource demand information of the micro-service, and a server terminal is configured according to the deployment strategy of the satellite nodes.
Optionally, the apparatus further comprises a model training module for:
obtaining agent sample parameters, wherein the agent sample parameters at least comprise the agent's observed environment;
acquiring an agent network model, wherein the agent network model at least comprises an actor network model and a critic network model;
inputting the agent's observed environment into the actor network model, and outputting the deployment action of the agent;
inputting the deployment action and the global state of the agent into the critic network model, and outputting an action evaluation value;
establishing a replay pool from the agent's current state information, action information, reward information and next-moment state information;
updating the network parameters in the actor network model and the critic network model with a multi-agent deep deterministic policy gradient algorithm, according to the current state information, action information, reward information and next-moment state information sampled from the replay pool;
and, when the actor network model and the critic network model have converged, determining the converged actor network model and critic network model as the multi-agent strategy deployment model.
In some embodiments of the application, the micro-service deployment problem is converted into a partially observable Markov decision process and solved with a multi-agent reinforcement learning method, using centralized training and distributed execution. In the training stage, the container instances of the micro-services act as agents and acquire global information to obtain the optimal deployment scheme; in the execution stage, a micro-service can be deployed relying only on its own observation space, which greatly reduces the communication overhead between micro-services.
Optionally, the model training module is configured to:
acquire a first loss function of the actor network model and a second loss function of the critic network model;
perform gradient calculation on the first loss function and the second loss function respectively;
and update the network parameters in the actor network model and the critic network model using the gradient descent method.
Some embodiments of the application adopt a fixed-network method: the target network is held fixed, and the original network parameters are transferred to the target network at intervals, which avoids a constantly changing update target and ensures training stability.
Optionally, the configuration information of the satellite nodes at least includes the number of satellite nodes, the total number of resource types and the heterogeneous resource capacities.
Optionally, the model training module is configured to:
acquiring resource balance information of a first resource utilization rate model of different types of resources on the same satellite node and node balance information of a second resource utilization rate model of the same type of resources on different satellite nodes;
and determining the resource utilization rate model according to the resource balance degree information and the weight value corresponding to the resource balance degree information, and the node balance degree information and the weight value corresponding to the node balance degree information.
Optionally, the delay model includes at least a transmission delay sub-model, a propagation delay sub-model, and a migration delay sub-model.
Some embodiments of the present application build a resource utilization model and a latency model, minimize resource utilization variance and latency, and represent the micro-service deployment problem as a multi-objective optimization problem.
In a third aspect, some embodiments of the present application provide an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, may implement the satellite network micro-service deployment method based on multi-agent reinforcement learning according to any of the embodiments of the first aspect.
In a fourth aspect, some embodiments of the present application provide a computer readable storage medium having stored thereon a computer program, which when executed by a processor, may implement a satellite network micro-service deployment method based on multi-agent reinforcement learning according to any of the embodiments of the first aspect.
In a fifth aspect, some embodiments of the present application provide a computer program product, where the computer program product includes a computer program, where the computer program when executed by a processor may implement the satellite network micro-service deployment method based on multi-agent reinforcement learning according to any of the embodiments of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of some embodiments of the present application, the drawings that are required to be used in some embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort to a person having ordinary skill in the art.
Fig. 1 is a schematic flow chart of a satellite network micro-service deployment method based on multi-agent reinforcement learning according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another satellite network micro-service deployment method based on multi-agent reinforcement learning according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a micro-service deployment scenario provided in an embodiment of the present application;
FIG. 4 is a network architecture diagram of model training provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart of model training according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a satellite network micro-service deployment device based on multi-agent reinforcement learning according to an embodiment of the present application;
fig. 7 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in some embodiments of the present application will be described below with reference to the drawings in some embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
With the continuous evolution of software construction and operation modes, the traditional centralized architecture, which lacks flexibility and is difficult to expand and migrate, can no longer meet diverse application requirements and has gradually evolved into the distributed micro-service architecture. Under the micro-service architecture, a complex application is split by logical relationship into several relatively independent small applications. These micro-services can be independently developed, updated, expanded and deployed without affecting one another, communicate over lightweight protocols, and can be deployed onto different satellite edge nodes, so engineering projects based on the micro-service architecture have high scalability, high reliability and flexible distributed deployment capability.
At present, with the growing demand for communication services, the drawbacks of terrestrial communication networks, such as insufficient spectrum resources and limited coverage, have become apparent. Compared with terrestrial communication, satellite communication has unique advantages, such as wide coverage, high system reliability, large communication capacity and immunity to natural disasters such as earthquakes. However, each satellite has a different amount of resources, different micro-services make different requests for satellite resources, and the remaining resources of the satellites differ. Therefore, some embodiments of the application provide a satellite network micro-service deployment method based on multi-agent reinforcement learning, which comprises: obtaining resource demand information of the micro-service; determining resource utilization rate information and time delay information of the satellite nodes according to a pre-established resource utilization rate model and time delay model together with configuration information of the satellite nodes; when the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, determining the deployment strategy of the satellite node corresponding to the resource demand information of the micro-service with a pre-trained multi-agent strategy deployment model, where the pre-trained multi-agent strategy deployment model is obtained by training each parameter of an agent network model with a multi-agent deep deterministic policy gradient algorithm; and configuring the server terminal according to the deployment strategy of the satellite node. A resource utilization rate model and a time delay model are established based on the micro-service architecture, the resource utilization rate information and time delay information of the satellite nodes are determined from the configuration information of the satellite nodes, the deployment strategy of the satellite nodes corresponding to the micro-service resource demand information is then determined with the pre-trained multi-agent strategy deployment model, and the server terminal is configured according to that deployment strategy.
As shown in fig. 1, an embodiment of the present application provides a satellite network micro-service deployment method based on multi-agent reinforcement learning, which includes:
s101, acquiring resource demand information of a micro service;
the server terminal is used for executing micro services, each micro service corresponds to at least one container instance, and the container technology is a lightweight resource virtualization technology, which is a technology for abstracting, converting and dividing computing resources and presenting one or more computing resources. Among them, docker is the most popular container technology at present, and is widely applied to micro service deployment and cloud computing platforms.
The scheduling platform obtains the resource demand information of each micro-service; for example, the resource demand information records how much of a satellite node's resources, such as CPU, memory and disk IO, a given micro-service requests.
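As an illustration only (the patent does not prescribe a data format, so the field names below are hypothetical), the resource demand information gathered by the scheduling platform might be represented as:

```python
from dataclasses import dataclass

@dataclass
class ResourceDemand:
    """Resource demand of one micro-service, as gathered by the scheduling platform."""
    microservice: str
    cpu_cores: float      # CPU request
    memory_mb: float      # memory request
    disk_io_mbps: float   # disk IO request

demands = [ResourceDemand("ms-imaging", cpu_cores=2.0, memory_mb=4096, disk_io_mbps=50.0),
           ResourceDemand("ms-routing", cpu_cores=1.0, memory_mb=1024, disk_io_mbps=20.0)]
```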
s102, determining resource utilization rate information and time delay information of a satellite node according to a pre-established resource utilization rate model and time delay model and configuration information of the satellite node;
the configuration information of the satellite nodes at least comprises the number of the satellite nodes, the total quantity of resource types and heterogeneous resource capacity.
Specifically, a resource utilization rate model and a time delay model are pre-established on the scheduling platform. The resource utilization rate model is expressed in terms of variance and falls into two categories. One is the variance of different resource types on the same node, which prevents one type of resource from being consumed excessively, causing a short-board effect and wasting resources; the other is the variance of the same resource type across different nodes, which prevents satellite node resources from sitting idle.
The time delay is divided into transmission delay, propagation delay and migration delay. The transmission delay can be expressed as the quotient of the data size and the transmission rate, where the transmission rate can be determined by the Shannon formula. The propagation delay is proportional to the physical distance between nodes. The migration delay is determined by the migration frequency of the micro-service.
The scheduling platform determines the resource utilization rate information and the time delay information of the satellite nodes according to a pre-established resource utilization rate model and a time delay model and configuration information of the satellite nodes.
S103, when the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, determining the deployment strategy of the satellite node corresponding to the resource demand information of the micro-service with a pre-trained multi-agent strategy deployment model, wherein the pre-trained multi-agent strategy deployment model is obtained by training each parameter of an agent network model with a multi-agent deep deterministic policy gradient algorithm;
Specifically, the scheduling platform acquires global state information in advance. The global state information comprises the resource occupancy of the satellite nodes, the position information of the satellites and the deployment status of the containers. Since the satellite positions change continuously over time, an agent takes corresponding actions in response to the position changes, causing the state to change. A reward function is established so that the resource utilization variance and the time delay are kept as small as possible, an agent network model is built from this information, and each parameter of the agent network model is then trained with the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to obtain the multi-agent strategy deployment model.
And under the condition that the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, determining a deployment strategy of the satellite node corresponding to the resource demand information of the micro-service by adopting a pre-trained multi-agent strategy deployment model.
S104, configuring the server terminal according to the deployment strategy of the satellite node.
Specifically, the scheduling platform configures each micro service with the obtained deployment strategy of each satellite node, so that the satellite nodes execute the deployment strategy, the resource utilization balance of the satellite nodes is improved, the calling time delay is reduced, and the configuration efficiency is improved.
According to some embodiments of the application, a resource utilization rate model and a time delay model are established based on the micro-service architecture, the resource utilization rate information and time delay information of the satellite nodes are determined from the configuration information of the satellite nodes, the deployment strategy of the satellite nodes corresponding to the resource demand information of the micro-services is then determined with a pre-trained multi-agent strategy deployment model, and the server terminal is configured according to that deployment strategy. Resources can thus be configured on each satellite node according to the resource demand of the micro-services on the server and the remaining resources of each satellite node, i.e. micro-services with different resource demands are deployed onto suitable satellite nodes, which improves the resource utilization balance of the satellite nodes, reduces the invocation delay, and improves configuration efficiency.
The satellite network micro-service deployment method based on multi-agent reinforcement learning provided by the above embodiment is described further in the following embodiment.
Fig. 2 is a flow chart of another satellite network micro-service deployment method based on multi-agent reinforcement learning according to an embodiment of the present application, as shown in fig. 2, where the satellite network micro-service deployment method based on multi-agent reinforcement learning includes:
step 1: constructing a micro-service deployment model;
specifically, determining the network structure, the number of satellite nodes and the resource types of the satellite edge computing system, and constructing a micro-service deployment model, as shown in fig. 3, including steps 101 to 104, as follows:
step 101: consider a satellite edge computation scenario, as shown in figure one, comprising a set of satellite edge nodes s= { S 1 ,s 2 ,...,s N Where N is the number of satellite nodes. In the edge computing scenario, the total amount of resource types is R (CPU, memory, disk IO, etc.), denoted as r= { R 1 ,r 2 ,...,r R }. For node s i Its heterogeneous resource capacity is represented as vector V i ={V i 1 ,V i 2 ,...,V i R }, wherein V i j Representing node s i Upper resource r j Is used to determine the available capacity of the battery.
Step 102: the micro-service set of the target deployment application in the satellite edge computing platform is ms= { MS 1 ,ms 2 ,...,ms M Where M is the number of micro services deployed in the form of containers into edge nodes. Resource requests for different microservices are set as vectorsWherein->Representing microservices ms i For resource r j Is a request amount of (a) to be used.
Step 103: each micro service can have multiple copies, namely multiple containers are deployed on different nodes, and the number of the containers of each micro service is set as Q= { Q 1 ,q 2 ,...,q M }, where q i Representing microservices ms i The number of copies of the container. The total amount of containers to be deployed isΣq, which can be expressed as a set Representing microservices ms i A corresponding jth container copy. Defining the container instance scheduling decision variable +.>When->Time-indicating container instanceDeployed at node s i And otherwise, the value is 0.
Step 104: the call relationship between micro services is represented by directed acyclic graph, and the adjacency matrix Y represents the call relationship when Y ij Representing micro-service ms when=1 i The next micro-service invoked is ms j Otherwise the value is 0.
Step 2: the optimization problem is represented by establishing a resource utilization rate model and a time delay model. Minimizing the resource utilization variance and latency represents the micro-service deployment problem as a multi-objective optimization problem.
Alternatively, the resource utilization model is obtained by:
Acquiring resource balance information of a first resource utilization rate model of different types of resources on the same satellite node and node balance information of a second resource utilization rate model of the same type of resources on different satellite nodes;
and determining a resource utilization rate model according to the resource balance degree information and the weight value corresponding to the resource balance degree information, and the node balance degree information and the weight value corresponding to the node balance degree information.
In this step, the resource utilization rate model is expressed in terms of variance and falls into two categories. One is the variance of different resource types on the same node, which prevents one type of resource from being consumed excessively, causing a short-board effect and wasting resources; the other is the variance of the same resource type across different nodes, which prevents satellite node resources from sitting idle.
Step 201: and establishing a satellite node resource utilization rate model.
Node s i Upper resource r j Utilization u of (2) i.j Can be expressed as:
the resource utilization rate model is divided into two types, wherein one type is the resource utilization rate of different types of resources on the same node. When a plurality of micro services with the same resource type are deployed to the same node, other micro services cannot be deployed on the node, so that a 'short-board effect' is formed, and resource waste is caused. The resource balance is expressed by standard deviation, node s i Balance epsilon of all resources on i Can be expressed as:
the other type is the resource utilization of the same type of resource on different nodes. When the number of micro services increases, all satellite edge nodes are hoped to be utilized, so that resource idling is prevented, and resource waste is caused. Resource r j Equalization on different nodesCan be expressed as:
the calculation formula of the resource utilization ratio U of the cluster is as follows, wherein α is a weight factor:
optionally, the delay model comprises at least a transmission delay sub-model, a propagation delay sub-model, and a migration delay sub-model.
The time delay is divided into transmission delay, propagation delay and migration delay. The transmission delay can be expressed as the quotient of the data size and the transmission rate, where the transmission rate can be determined by the Shannon formula. The propagation delay is proportional to the physical distance between nodes. The migration delay is determined by the migration frequency of the micro-service.
Step 202: and establishing a time delay model, wherein the time delay model is divided into transmission time delay, propagation time delay and migration time delay.
In particular, depending on the service deployment scenario, both the communication links between ground stations and satellite nodes and the communication links between satellite nodes need to be considered. According to Shannon's theorem, the data transmission rate from a ground station to the destination satellite node can be expressed as:

$v_{g\_s} = W_{g\_s} \log_2\!\left(1 + \dfrac{p_g\, g_{g\_s}}{N_0 + \sum I}\right)$

where $W_{g\_s}$ is the channel bandwidth, $p_g$ is the transmit power of the ground station, $g_{g\_s}$ is the channel gain between the ground station and the destination satellite, $N_0$ represents the background noise, and $\sum I$ represents the sum of the other noise and interference powers on the ground-station-to-satellite link.
Let $W_{i,j}$ be the channel bandwidth of the inter-satellite link between nodes $s_i$ and $s_j$, and $SNR_{i,j}$ the signal-to-noise ratio between the two nodes; the data transmission rate between the two satellite nodes is then:

$v_{i,j} = W_{i,j} \log_2\big(1 + SNR_{i,j}\big)$
From the adjacency matrix, the data transmission delay of the complete scheduling chain can be calculated as:

$\tau_{trans} = \dfrac{d_{g\_s}}{v_{g\_s}} + \sum_{i,j} Y_{ij}\, \dfrac{d_{i,j}}{v_{i,j}}$

where $d_{g\_s}$ represents the size of the data transmitted from the ground station to the satellite, and $d_{i,j}$ represents the size of the data transmitted between satellite nodes $s_i$ and $s_j$.
The information propagation speed is the speed of light $c$, and the distance between nodes $s_i$ and $s_j$ is denoted $dis_{i,j}$; the propagation delay can then be expressed as:

$\tau_{prop} = \dfrac{dis_{g\_s}}{c} + \sum_{i,j} Y_{ij}\, \dfrac{dis_{i,j}}{c}$

where $dis_{g\_s}$ is the distance from the ground station to the satellite.
When a satellite moves out of the visible range of the serving cell, a migration action is required, and when an agent performs a migration action, a migration delay is incurred. The migration actions of an agent are modeled as a directed weighted graph whose edge weights are the migration delays from the source satellite to the target satellite. The overall migration cost is the sum of the weights, expressed as

$\tau_{mig} = \sum_{(i,j) \in Tr} w_{i,j}$

where $Tr$ represents the whole migration chain produced by the agent's actions within the time period, and $w_{i,j}$ represents the edge weight.
The overall delay $D$ is then calculated as:

$D = \tau_{trans} + \tau_{prop} + \tau_{mig}$
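A minimal sketch of the three delay components under the formulas above; the function names and unit conventions are assumptions, not from the patent:

```python
from math import log2

C_LIGHT = 3.0e8  # propagation speed (speed of light), m/s

def shannon_rate(bandwidth_hz, power_w, gain, noise_w, interference_w=0.0):
    """Link rate v = W * log2(1 + p*g / (N0 + sum of interference))."""
    return bandwidth_hz * log2(1.0 + power_w * gain / (noise_w + interference_w))

def total_delay(d_gs, v_gs, dist_gs, hops, migration_weights):
    """hops: (data_bits, rate_bps, distance_m) per inter-satellite link on the
    scheduling chain; migration_weights: w_ij of the migration edges in Tr."""
    tau_trans = d_gs / v_gs + sum(d / v for d, v, _ in hops)
    tau_prop = dist_gs / C_LIGHT + sum(dist / C_LIGHT for _, _, dist in hops)
    tau_mig = sum(migration_weights)
    return tau_trans + tau_prop + tau_mig
```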
Some embodiments of the present application build a resource utilization model and a latency model, minimize resource utilization variance and latency, and represent the micro-service deployment problem as a multi-objective optimization problem.
Step 203: the optimization problem is represented.
Based on the above models, a joint optimization problem can be established that minimizes the resource utilization index $U$ and the overall delay $D$, expressed as:

$P:\quad \min U,\;\; \min D$

$\text{s.t.}\quad C1:\ \sum_{k=1}^{q_i} \sum_{n=1}^{N} x_{c_i^k, n} \ge 1 \quad \forall\, ms_i \in MS, \qquad C2:\ \sum_{c \in C} x_{c,n}\, d_c^j \le V_n^j \quad \forall\, s_n \in S,\ r_j \in \mathcal{R}$

where C1 states that each micro-service deploys at least one container instance so that its function is realized, and C2 states that the amount of resources requested on a node cannot exceed the node's maximum resource capacity.
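For illustration, the constraints C1 and C2 can be checked against the structures from the earlier sketch (hypothetical helper):

```python
import numpy as np

def feasible(x, d_container, V, q):
    """C1: every micro-service has at least one deployed container instance;
    C2: the resources requested on a node never exceed its capacity."""
    owner = np.array([i for i, cnt in enumerate(q) for _ in range(cnt)])
    placed = x.sum(axis=1)                      # placements per container instance
    c1 = all(placed[owner == i].sum() >= 1 for i in range(len(q)))
    c2 = bool(((x.T @ d_container) <= V).all())
    return c1 and c2
```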
Step 3: the micro-service deployment problem is expressed as a partially observable Markov decision process, and is solved by adopting a multi-agent reinforcement learning method.
As shown in fig. 4, the agents cannot acquire all of the state information, so each agent has its own separate observation space. Moreover, the relative positions of the satellites change over time, which affects the communication delay, so the environment state changes with both time and the actions of the agents.
Optionally, the multi-agent strategy deployment model is obtained by:
obtaining agent sample parameters, wherein the agent sample parameters at least comprise the agent's observed environment;
acquiring an agent network model, wherein the agent network model at least comprises an actor network model and a critic network model;
inputting the agent's observed environment into the actor network model, and outputting the deployment action of the agent;
inputting the deployment action and the global state of the agent into the critic network model, and outputting an action evaluation value;
establishing a replay pool from the agent's current state information, action information, reward information and next-moment state information;
updating the network parameters in the actor network model and the critic network model with a multi-agent deep deterministic policy gradient algorithm, according to the current state information, action information, reward information and next-moment state information sampled from the replay pool;
and, when the actor network model and the critic network model have converged, determining the converged actor network model and critic network model as the multi-agent strategy deployment model.
In some embodiments of the application, the micro-service deployment problem is converted into a partially observable Markov decision process and solved with a multi-agent reinforcement learning method, using centralized training and distributed execution. In the training stage, the container instances of the micro-services act as agents and acquire global information to obtain the optimal deployment scheme; in the execution stage, a micro-service can be deployed relying only on its own observation space, which greatly reduces the communication overhead between micro-services.
Optionally, updating the network parameters in the actor network model and the critic network model includes:
acquiring a first loss function of the actor network model and a second loss function of the critic network model;
performing gradient calculation on the first loss function and the second loss function respectively;
and updating the network parameters in the actor network model and the critic network model using the gradient descent method.
Some embodiments of the application adopt a fixed-network method: the target network is held fixed, and the original network parameters are transferred to the target network at intervals, which avoids a constantly changing update target and ensures training stability.
Specifically, the partially observable Markov decision process in some embodiments of the present application, i.e. the node pre-selection, is represented as follows:
step 301: state space representation. The global state information includes the resource occupancy of the satellite nodes, the location information of the satellites, and the deployment of the containers, and may be represented as s= [ u, p, c ], where:
u=[u 1,1 ,u 1,2 ,...,u 1,R ,u 2,1 ,u 2,2 ,...,u 2,R ,...,u N,1 ,u N,2 ,...,u N,R ]
p=[x 1 ,y 1 ,z 1 ,x 2 ,y 2 ,z 2 ,...,x N ,y N ,z N ]
u i,j for node s i Upper resource r j Availability of [ x ] i ,y i ,z i ]For node s i Is provided with a coordinate of the position of (c),the index sequence number of the node is deployed for the container.
During deployment a container, acting as an agent, cannot acquire the global state information; the observation space of container instance $j$ of micro-service $i$ is denoted $o_i^j$.
Step 302: and (5) representing the action space. The action space of container instance j on microservice i is expressed ask is the number of nodes meeting the resource requirement in the observation space, when +.>When it indicates that the container is deployed to that node, otherwise 0./>
Step 303: state transfer function representation. The satellite position changes continuously with time, the intelligent body can make corresponding action according to the change of the position, the state changes, and the state transfer function can be expressed as
Step 304: the bonus function is represented. It is desirable that the resource utilization variance and delay be as small as possible, so the reward function can be expressed as
reward=-(βU+(1-β)D)
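A minimal sketch of the state vector and the reward computation, assuming $U$ and $D$ are obtained as in the earlier sketches (names hypothetical):

```python
import numpy as np

def global_state(u, positions, deploy_idx):
    """s = [u, p, c]: flattened utilization matrix, node coordinates, placements."""
    return np.concatenate([u.ravel(), positions.ravel(), deploy_idx.astype(float)])

def reward(U, D, beta=0.5):
    """Negative weighted sum of the utilization index U and the total delay D."""
    return -(beta * U + (1 - beta) * D)
```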
Step 4: model training is carried out on each parameter of an intelligent agent network model by adopting a multi-intelligent agent depth determination strategy gradient algorithm MADDPG, including building a neural network and updating network parameters;
the method comprises the following steps:
an agent network is built as shown in fig. 4. The container is considered as one agent, each agent comprising four networks, an Actor network μ (o i ;θ i ) Target actor network t_μ (o i ;θ i ) And Critic networks c (s, a; omega i ) A Targetcritic network t_c (s, a; omega i ). Comprising steps 401 to 402. The method of fixing the network is adopted, the Target network is fixed, the original network parameters are transmitted to the Target network at intervals, the continuous change of the updated Target is avoided, and the training stability is ensured.
Step 401: two network settings. The input of the Actor network, i.e. the Actor network model, is the local observation information o of the current intelligent agent i The resource occupation condition of the node is contained, and the output is deployment action a. The input of the Critic network, namely the criticizer network model, is the action and the global state output by the Actor network, namely the global state information s and the action a, and the output is the corresponding Q value for judging the quality of the action executed by the agent in the current state.
Step 402: network parameter delivery process. In parameter updating, if the updating target is changed continuously, the updating is difficult. Therefore, a fixed network method is adopted, the parameters of the Target network are fixed, the original network parameters are transmitted to the Target network at intervals, and the training stability is ensured.
Step five: building an experience playback pool D, randomly taking deployment actions by the agent according to the noise setting, generating a quadruple, namely recording state, action of the agent, next time state and rewards, and recording as(s) t ,a t ,r t ,s t+1 ). Since MADDPG algorithm is an exclusive strategy, the channel can be utilizedThe pool is played back to eliminate the correlation of the historical experience, the historical experience is broken up, and a batch of experience data is randomly selected when the neural network is trained, so that the neural network is trained better.
Step six: and executing MADDPG algorithm to update network parameters for centralized training. And randomly selecting four elements in the playback pool, and updating the Actor and Critic network parameters until convergence.
The updating process mainly comprises steps 601-602,
step 601: and updating the Actor network parameters. The loss function of the Actor network is-Q, -Q needs to be obtained by inputting the output action of the Actor network into the current Critic network, -Q is smaller and better. Observation space o for playback of agent i in pool i An Actor network μ (o i ;θ i ) In (a), a deployment action a is obtained i Then the global state information s and a i Inputting to Critic network to obtain Q value of the action, and updating network parameter theta by gradient descent with-Q as loss function i . In particular, the loss function may be expressed as
Wherein x= (o) 1 ,o 2 ,...,o N ) Representing the observation space of all agents, a i Representing agent i in its policy μ i The following actions. According to the chain law, its gradient can be expressed as
Updating the parameter θ using gradient descent i
Step 602: critic network parameters are updated. Critic network needs to make the predicted Q value as accurate as possible, so its loss function is Critic networkOutput Q(s) 0 ,a 0 ;ω i ) Sum of value (predicted value) and output Q value of Targetcritc network and prize r 1 +γQ(s 1 ,a 1 ;ω i ) The smaller the difference between (actual values), the better. The difference can be represented by MSE, and the network parameter omega is updated by gradient descent method i . In particular, the loss function may be expressed as
Calculating gradients
Updating parameter omega using gradient descent method i
The gradients of the two networks are computed from their respective loss functions, and the network parameters are updated using the gradient descent method.
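Putting steps 601 and 602 together, one update of a single agent's networks might be sketched as follows (illustrative; a full MADDPG critic takes all agents' observations and actions, which is elided here, and the batch entries are assumed to be stacked tensors):

```python
import torch
import torch.nn.functional as F

def maddpg_update(batch, actor, critic, target_actor, target_critic,
                  actor_opt, critic_opt, gamma=0.95):
    """One update step for a single agent from a sampled batch."""
    obs, state, action, reward_t, obs_next, state_next = batch

    # Step 602: Critic loss = MSE between predicted Q and r + gamma * target Q
    with torch.no_grad():                          # targets are held fixed
        y = reward_t + gamma * target_critic(state_next, target_actor(obs_next))
    critic_loss = F.mse_loss(critic(state, action), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Step 601: Actor loss = -Q of the action currently proposed by the actor
    actor_loss = -critic(state, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# optimizers, e.g.: actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
```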
Fig. 5 is a schematic flow chart of model training provided in an embodiment of the present application, as shown in fig. 5, including:
1) Initializing the networks and the satellite nodes to obtain the candidate node set;
2) The container agents randomly generate actions to obtain quadruples;
3) Storing the quadruples into the experience replay pool;
4) Randomly sampling quadruples from the pool;
5) Updating the Actor network and the Critic network according to the loss functions;
6) Updating the Target network parameters, i.e. the target network parameters;
7) Outputting the trained strategy network, i.e. the multi-agent strategy deployment model.
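Assembled into one loop (hypothetical env and agent interfaces, reusing the ReplayPool above), the flow of fig. 5 might read:

```python
def train(env, agents, pool, episodes=1000, batch_size=64, sync_every=100):
    """Centralized-training loop mirroring fig. 5; env/agent APIs are assumptions."""
    step = 0
    for _ in range(episodes):
        state, done = env.reset(), False              # 1) init networks and nodes
        while not done:
            actions = [ag.act(o, noise=True)          # 2) noisy exploratory actions
                       for ag, o in zip(agents, env.observations())]
            next_state, reward_t, done = env.step(actions)
            pool.push(state, actions, reward_t, next_state)   # 3) store quadruple
            if len(pool.buf) >= batch_size:
                batch = pool.sample(batch_size)       # 4) sample quadruples
                for ag in agents:
                    ag.update(batch)                  # 5) update Actor/Critic by loss
            if step % sync_every == 0:
                for ag in agents:
                    ag.sync_targets()                 # 6) copy params to Target nets
            state, step = next_state, step + 1
    return [ag.actor for ag in agents]                # 7) trained strategy networks
```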
Step 5: policy network deployment.
The trained policy network is deployed to the container agent, and the microservices can independently make optimal decisions based on local observations.
The satellite network micro-service deployment method based on multi-agent reinforcement learning is provided to solve the micro-service deployment problem. The neural network part adopts a fixed-network method and is divided into an original network and a target network: the target network is first held fixed, and the original network parameters are transferred to the target network at intervals, which avoids the update difficulty caused by a constantly changing update target and ensures training stability. After training is complete, an agent only needs its own observation space to take the best action with the strategy network, which reduces the overhead caused by frequent interaction between micro-services.
It should be noted that, in this embodiment, each of the possible embodiments may be implemented separately, or may be implemented in any combination without conflict, which is not limited to the implementation of the present application.
Another embodiment of the present application provides a satellite network micro-service deployment device based on multi-agent reinforcement learning, which is configured to execute the satellite network micro-service deployment method based on multi-agent reinforcement learning provided in the foregoing embodiment.
Fig. 6 is a schematic structural diagram of a satellite network micro-service deployment device based on multi-agent reinforcement learning according to an embodiment of the present application. The satellite network micro-service deployment device based on multi-agent reinforcement learning comprises an acquisition module 601, a first determination module 602, a second determination module 603 and a configuration module 604, wherein:
the acquisition module 601 is configured to acquire resource requirement information of a micro service;
the first determining module 602 is configured to determine resource utilization information and delay information of the satellite node according to a pre-established resource utilization model and delay model and configuration information of the satellite node;
the second determining module 603 is configured to determine the deployment strategy of the satellite node corresponding to the resource demand information of the micro-service using a pre-trained multi-agent strategy deployment model when the resource utilization rate information is smaller than a first preset value or the time delay information is smaller than a second preset value, where the pre-trained multi-agent strategy deployment model is obtained by training each parameter of an agent network model with a multi-agent deep deterministic policy gradient algorithm;
The configuration module 604 is configured to configure the server terminal according to a deployment policy of the satellite node.
The specific manner in which the individual modules perform the operations of the apparatus of this embodiment has been described in detail in connection with embodiments of the method and will not be described in detail herein.
According to some embodiments of the application, a resource utilization rate model and a time delay model are established based on the micro-service architecture, the resource utilization rate information and time delay information of the satellite nodes are determined from the configuration information of the satellite nodes, the deployment strategy of the satellite nodes corresponding to the resource demand information of the micro-services is then determined with a pre-trained multi-agent strategy deployment model, and the server terminal is configured according to that deployment strategy. Resources can thus be configured on each satellite node according to the resource demand of the micro-services on the server and the remaining resources of each satellite node, i.e. micro-services with different resource demands are deployed onto suitable satellite nodes, which improves the resource utilization balance of the satellite nodes, reduces the invocation delay, and improves configuration efficiency.
The satellite network micro-service deployment device based on multi-agent reinforcement learning provided by the above embodiment is described further in the following embodiment.
Optionally, the apparatus further comprises a model training module for:
obtaining agent sample parameters, wherein the agent sample parameters at least comprise the agent's observed environment;
acquiring an agent network model, wherein the agent network model at least comprises an actor network model and a critic network model;
inputting the agent's observed environment into the actor network model, and outputting the deployment action of the agent;
inputting the deployment action and the global state of the agent into the critic network model, and outputting an action evaluation value;
establishing a replay pool from the agent's current state information, action information, reward information and next-moment state information;
updating the network parameters in the actor network model and the critic network model with a multi-agent deep deterministic policy gradient algorithm, according to the current state information, action information, reward information and next-moment state information sampled from the replay pool;
and, when the actor network model and the critic network model have converged, determining the converged actor network model and critic network model as the multi-agent strategy deployment model.
According to the method and apparatus of the embodiments, the micro-service deployment problem is converted into a partially observable Markov decision process and solved with multi-agent reinforcement learning under a centralized-training, distributed-execution paradigm. In the training stage, each container instance of a micro-service acts as an agent with access to global information, from which an optimal deployment scheme is obtained; in the execution stage, each micro-service can be deployed relying only on its own observation space, which greatly reduces the communication overhead among micro-services. A minimal sketch of this training loop, under assumed interfaces, follows.
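The sketch below illustrates only the centralized-training, distributed-execution structure; the toy environment, placeholder networks, and all sizes are assumptions for illustration and are not specified in the disclosure:

```python
import random
from collections import deque

NUM_AGENTS, OBS_DIM, BATCH, EPISODES, STEPS = 3, 4, 32, 10, 20

class Net:
    """Placeholder network with a no-op update; a real MADDPG
    implementation would use neural networks here."""
    def __call__(self, x):
        return random.random()          # placeholder forward pass (an action)
    def update(self, batch):
        pass                            # placeholder gradient step

class ToyEnv:
    def reset(self):
        return [[0.0] * OBS_DIM for _ in range(NUM_AGENTS)]
    def step(self, actions):
        next_obs = [[random.random()] * OBS_DIM for _ in range(NUM_AGENTS)]
        rewards = [random.random() for _ in range(NUM_AGENTS)]
        global_state = [x for o in next_obs for x in o]   # concatenated views
        return next_obs, rewards, global_state

actors = [Net() for _ in range(NUM_AGENTS)]
critics = [Net() for _ in range(NUM_AGENTS)]
env, replay_pool = ToyEnv(), deque(maxlen=100_000)

for _ in range(EPISODES):
    obs = env.reset()
    for _ in range(STEPS):
        # Distributed execution: each agent acts on its own observation only.
        actions = [actors[i](obs[i]) for i in range(NUM_AGENTS)]
        next_obs, rewards, global_state = env.step(actions)
        # Store (s, a, r, s') plus the global state for the centralized critics.
        replay_pool.append((obs, actions, rewards, next_obs, global_state))
        obs = next_obs
    # Centralized training: critics may use joint actions and the global state.
    if len(replay_pool) >= BATCH:
        batch = random.sample(list(replay_pool), BATCH)
        for i in range(NUM_AGENTS):
            critics[i].update(batch)    # minimize TD error (second loss)
            actors[i].update(batch)     # policy gradient via critic (first loss)
```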
Optionally, the model training module is configured to:
obtain a first loss function of the actor network model and a second loss function of the critic network model;
compute gradients of the first loss function and the second loss function respectively;
and update the network parameters of the actor network model and the critic network model by gradient descent.
Some embodiments of the application adopt a fixed-network method: a target network is held fixed, and the parameters of the original network are copied into it at intervals. This prevents the update target from changing continuously and keeps the training stable.
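As a minimal sketch, assuming plain Python lists stand in for the network parameter vectors and an assumed refresh interval K, the periodic hard update described above might read:

```python
def hard_update(target_params, online_params):
    """Copy the online-network parameters into the fixed target network."""
    for i, p in enumerate(online_params):
        target_params[i] = p

K = 100                                   # refresh interval (assumed value)
online, target = [0.5, -0.2], [0.0, 0.0]  # toy parameter vectors
for step in range(1, 1001):
    online = [p + 0.001 for p in online]  # stand-in for one gradient step
    if step % K == 0:
        hard_update(target, online)       # TD targets stay fixed in between
```

A soft update (Polyak averaging), which blends the two parameter sets a little each step instead of copying periodically, is a common alternative in MADDPG-style implementations.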
Optionally, the configuration information of the satellite nodes includes at least the number of satellite nodes, the total number of resource types, and the heterogeneous resource capacities.
Optionally, the model training module is configured to:
obtain resource balance information from a first resource utilization model for different types of resources on the same satellite node, and node balance information from a second resource utilization model for the same type of resource on different satellite nodes;
and determine the resource utilization model from the resource balance information and its corresponding weight value, together with the node balance information and its corresponding weight value.
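For illustration only, one plausible reading of this weighted combination, using variance as the balance measure (consistent with the variance-minimization objective stated below) and assumed weight values, is:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def utilization_score(node_usage, w_resource=0.5, w_node=0.5):
    """node_usage[n][r]: utilization of resource type r on satellite node n.

    Resource balance: spread of different resource types within one node.
    Node balance: spread of one resource type across different nodes.
    Both terms are combined with (assumed) weight values into one score.
    """
    resource_balance = sum(variance(node) for node in node_usage) / len(node_usage)
    num_types = len(node_usage[0])
    node_balance = sum(
        variance([node[r] for node in node_usage]) for r in range(num_types)
    ) / num_types
    return w_resource * resource_balance + w_node * node_balance

# Example: 3 nodes, 2 resource types (e.g. CPU, memory), utilizations in [0, 1].
print(utilization_score([[0.6, 0.4], [0.5, 0.5], [0.9, 0.1]]))
```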
Optionally, the delay model comprises at least a transmission delay sub-model, a propagation delay sub-model, and a migration delay sub-model.
Some embodiments of the present application build a resource utilization model and a delay model, minimize the resource utilization variance and the delay, and cast the micro-service deployment problem as a multi-objective optimization problem.
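For illustration only, the three delay sub-models might compose as follows; the formulas are standard networking approximations assumed here, not quoted from the disclosure:

```python
SPEED_OF_LIGHT = 3e8  # m/s, free-space propagation between satellites

def transmission_delay(data_bits, link_rate_bps):
    """Time to push the data onto the inter-satellite link."""
    return data_bits / link_rate_bps

def propagation_delay(distance_m):
    """Time for the signal to traverse the link distance."""
    return distance_m / SPEED_OF_LIGHT

def migration_delay(image_bits, link_rate_bps, startup_s):
    """Time to move a micro-service container image and restart it."""
    return image_bits / link_rate_bps + startup_s

def total_delay(data_bits, image_bits, link_rate_bps, distance_m, startup_s=1.0):
    return (transmission_delay(data_bits, link_rate_bps)
            + propagation_delay(distance_m)
            + migration_delay(image_bits, link_rate_bps, startup_s))

# Example: 1 MB request, 200 MB image, 100 Mbps link, 2000 km hop.
print(total_delay(8e6, 1.6e9, 1e8, 2e6))
```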
It should be noted that the optional embodiments above may be implemented separately or in any combination where no conflict arises; the present application is not limited in this respect.
The embodiment of the application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program can implement the operations of the method corresponding to any embodiment of the satellite network micro-service deployment method based on multi-agent reinforcement learning provided above.
The embodiment of the application also provides a computer program product comprising a computer program; when executed by a processor, the computer program can implement the operations of the method corresponding to any embodiment of the satellite network micro-service deployment method based on multi-agent reinforcement learning provided above.
As shown in fig. 7, some embodiments of the present application provide an electronic device 700, the electronic device 700 comprising a memory 710, a processor 720, and a computer program stored on the memory 710 and executable on the processor 720; when the processor 720 reads the program from the memory 710 over the bus 730 and executes it, the method of any of the embodiments included in the satellite network micro-service deployment method based on multi-agent reinforcement learning described above can be implemented.
Processor 720 may process digital signals and may include various computing architectures, such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. In some examples, processor 720 may be a microprocessor.
Memory 710 may be used to store instructions to be executed by processor 720, or data related to the execution of the instructions. Such instructions and/or data may include code implementing some or all of the functions of one or more of the modules described in the embodiments of the present application. The processor 720 of the disclosed embodiments may be configured to execute the instructions in memory 710 to implement the methods shown above. Memory 710 may include dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
It should be noted that like reference numerals and letters denote like items in the figures, so that once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The foregoing is merely specific embodiments of the present application, and the scope of protection of the present application is not limited thereto. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application, and any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein, shall fall within the scope of protection of the present application, which shall be subject to the scope of the claims.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A satellite network micro-service deployment method based on multi-agent reinforcement learning, characterized by comprising the following steps:
acquiring resource demand information of the micro service;
determining resource utilization information and delay information of the satellite node according to a pre-established resource utilization model, a pre-established delay model, and configuration information of the satellite node;
when the resource utilization information is less than a first preset value or the delay information is less than a second preset value, determining a deployment policy of a satellite node corresponding to the resource demand information of the micro service by using a pre-trained multi-agent policy deployment model, wherein the pre-trained multi-agent policy deployment model is obtained by training the parameters of an agent network model with a multi-agent deep deterministic policy gradient (MADDPG) algorithm;
and configuring the server terminal according to the deployment policy of the satellite node.
2. The satellite network micro-service deployment method based on multi-agent reinforcement learning according to claim 1, wherein the multi-agent policy deployment model is obtained by:
obtaining agent sample parameters, wherein the agent sample parameters include at least the agent's observed environment;
obtaining an agent network model, wherein the agent network model comprises at least an actor network model and a critic network model;
inputting the agent's observed environment into the actor network model and outputting the agent's deployment action;
inputting the agent's deployment action and the global state into the critic network model and outputting an action evaluation value;
building a replay pool from the agent's current state information, action information, reward information, and next-moment state information;
updating the network parameters of the actor network model and the critic network model with a multi-agent deep deterministic policy gradient algorithm, using the current state information, action information, reward information, and next-moment state information sampled from the replay pool;
and, when the actor network model and the critic network model converge, determining the converged actor network model and critic network model as the multi-agent policy deployment model.
3. The satellite network micro-service deployment method based on multi-agent reinforcement learning according to claim 2, wherein updating the network parameters of the actor network model and the critic network model comprises:
obtaining a first loss function of the actor network model and a second loss function of the critic network model;
computing gradients of the first loss function and the second loss function respectively;
and updating the network parameters of the actor network model and the critic network model by gradient descent.
4. The satellite network micro-service deployment method based on multi-agent reinforcement learning according to claim 1, wherein the configuration information of the satellite nodes includes at least the number of satellite nodes, the total number of resource types, and the heterogeneous resource capacities.
5. The satellite network micro-service deployment method based on multi-agent reinforcement learning according to claim 1, wherein the resource utilization model is obtained by:
obtaining resource balance information from a first resource utilization model for different types of resources on the same satellite node, and node balance information from a second resource utilization model for the same type of resource on different satellite nodes;
and determining the resource utilization model from the resource balance information and its corresponding weight value, together with the node balance information and its corresponding weight value.
6. The satellite network micro-service deployment method based on multi-agent reinforcement learning of claim 1, wherein the delay model comprises at least a transmission delay sub-model, a propagation delay sub-model, and a migration delay sub-model.
7. A satellite network micro-service deployment device based on multi-agent reinforcement learning, the device comprising:
the acquisition module is configured to acquire resource demand information of the micro service;
the first determining module is configured to determine resource utilization information and delay information of the satellite node according to a pre-established resource utilization model, a pre-established delay model, and configuration information of the satellite node;
the second determining module is configured to determine, when the resource utilization information is less than a first preset value or the delay information is less than a second preset value, a deployment policy of a satellite node corresponding to the resource demand information of the micro service by using a pre-trained multi-agent policy deployment model, wherein the pre-trained multi-agent policy deployment model is obtained by training the parameters of an agent network model with a multi-agent deep deterministic policy gradient algorithm;
and the configuration module is configured to configure the server terminal according to the deployment policy of the satellite node.
8. The satellite network micro-service deployment device based on multi-agent reinforcement learning according to claim 7, further comprising a model training module configured to:
obtain agent sample parameters, wherein the agent sample parameters include at least the agent's observed environment;
obtain an agent network model, wherein the agent network model comprises at least an actor network model and a critic network model;
input the agent's observed environment into the actor network model and output the agent's deployment action;
input the agent's deployment action and the global state into the critic network model and output an action evaluation value;
build a replay pool from the agent's current state information, action information, reward information, and next-moment state information;
update the network parameters of the actor network model and the critic network model with a multi-agent deep deterministic policy gradient algorithm, using the current state information, action information, reward information, and next-moment state information sampled from the replay pool;
and, when the actor network model and the critic network model converge, determine the converged actor network model and critic network model as the multi-agent policy deployment model.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the multi-agent reinforcement learning-based satellite network micro-service deployment method of any one of claims 1-6.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the program, when executed by a processor, implements the satellite network micro-service deployment method based on multi-agent reinforcement learning according to any one of claims 1 to 6.
CN202311360363.1A 2023-10-19 2023-10-19 Satellite network micro-service deployment method and device based on multi-agent reinforcement learning Pending CN117811907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311360363.1A CN117811907A (en) 2023-10-19 2023-10-19 Satellite network micro-service deployment method and device based on multi-agent reinforcement learning

Publications (1)

Publication Number Publication Date
CN117811907A true CN117811907A (en) 2024-04-02

Family

ID=90432359

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118331591A (en) * 2024-06-11 2024-07-12 之江实验室 Method, device, storage medium and equipment for deploying intelligent algorithm on satellite


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination