CN111444009A - Resource allocation method and device based on deep reinforcement learning


Info

Publication number: CN111444009A
Application number: CN201911117328.0A
Authority: CN (China)
Prior art keywords: service, resource, representing, state parameters, evaluation parameter
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111444009B (en)
Inventors: 张海涛, 郭彤宇, 郭建立, 黄瀚, 何晨泽
Current Assignee: Beijing University of Posts and Telecommunications; CETC 54 Research Institute
Original Assignee: Beijing University of Posts and Telecommunications; CETC 54 Research Institute
Application filed by Beijing University of Posts and Telecommunications and CETC 54 Research Institute
Priority to CN201911117328.0A
Publication of CN111444009A
Application granted; publication of CN111444009B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F9/00
    • G06F 2209/50 Indexing scheme relating to G06F9/50
    • G06F 2209/5021 Priority


Abstract

Embodiments of the invention provide a resource allocation method and apparatus based on deep reinforcement learning. The method comprises: determining the services requiring resource allocation that are contained in a user's application program request, together with the allocation priority of each service; determining the state parameters of the current edge micro cloud system, the state parameters comprising a resource balance evaluation parameter, a response delay evaluation parameter and the remaining amount of resources of each computing node in each micro cloud; inputting the state parameters into a pre-trained resource balance optimization model to obtain a first target computing node for a first service, the model being trained by deep reinforcement learning; deploying the first service on the first target computing node; and updating the state parameters and returning to the parameter input step until every service requiring resources in the application program request has been allocated resources. Compared with traditional resource allocation methods, this method can meet communication delay requirements while achieving a higher balance of resource utilization.

Description

Resource allocation method and device based on deep reinforcement learning
Technical Field
The present invention relates to the field of wireless communication technologies, and in particular, to a resource allocation method and apparatus based on deep reinforcement learning.
Background
In recent years, with the progress of informatization and networking, information systems have played an increasingly important role in fields such as military operations and disaster relief. In such highly dynamic environments, mission plans and equipment configurations may change frequently, and network connectivity may fluctuate. The service resources of stand-alone equipment are very limited and cannot handle complex computing tasks. Cloud computing is an effective means of addressing such scenarios: resources can be configured in a user-defined manner according to task requirements, providing convenient and flexible management services for large-scale applications. However, a traditional cloud platform is usually deployed far away from the user, its communication delay is high, and it is difficult for it to provide continuous and reliable service when the network is unstable.
To solve the above problem, the edge micro cloud platform has emerged. The edge micro cloud platform is an emerging cloud computing model composed of multiple edge micro clouds deployed in a distributed manner; each edge micro cloud comprises several small servers, and the scale of the platform can be adjusted with task requirements. Most edge micro clouds are deployed on mobile vehicles and move according to task requirements, thereby providing higher-quality cloud services. With the development of microservice technology, an application is generally composed of multiple combined services that communicate with each other, and each combined service has different requirements for resources of different dimensions. Because the computing power of a single edge micro cloud is limited and cannot meet all service requirements, the combined services can be deployed across different edge micro clouds, which cooperate to provide computing power together.
However, when the existing edge micro cloud technology allocates resources for services, resource fragmentation often occurs, that is, resource allocation is unbalanced, which wastes resources in certain dimensions. In addition, when allocating resources, not only the resource requirements of each service but also the communication requirements between services need to be considered, which further increases the complexity of resource allocation; the related art does not consider the communication requirements between services.
Therefore, a resource allocation method that can satisfy the communication requirements between services and has a high balance of resource utilization is needed.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a method and an apparatus for resource allocation based on deep reinforcement learning, so as to improve the balance of resource utilization. The specific technical scheme is as follows:
in order to achieve the above object, an embodiment of the present invention provides a resource allocation method based on deep reinforcement learning, which is applied to a control platform of an edge cloudlet system, where the edge cloudlet system further includes a plurality of cloudlets, each cloudlet includes a plurality of computing nodes, and the method includes:
determining the services requiring resource allocation that are contained in an application program request of a user, and the allocation priority of each service;
determining state parameters of a current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud;
inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service; the first service is the service with the highest priority currently allocated; the resource balance optimization model is completed based on deep reinforcement learning training, wherein a training set of the deep reinforcement learning comprises: sample state parameters of the edge micro-cloud system;
deploying the first service to the first target computing node;
and updating the state parameters, and returning to the step of inputting the state parameters into the resource balance optimization model which is trained in advance until the resource allocation of each service of the resources to be allocated contained in the application program request is completed.
Optionally, the resource balance evaluation parameter is calculated based on the formulas presented as images in the original publication, namely a resource utilization variance for each computing node, a resource balance rate for each computing node, a normalized resource utilization balance degree for each computing node, a resource utilization balance degree RUBD_i for each micro cloud, and a resource balance evaluation parameter RUBD_Total for the edge micro cloud system;
wherein σ²_ij(X) represents the resource utilization variance of the jth computing node in the ith micro cloud, D represents the number of resource types, u_ij^d represents the utilization rate of the dth type of resource in the jth computing node of the ith micro cloud, ū_ij represents the average utilization rate over all resource types in the jth computing node of the ith micro cloud, X represents the resource allocation policy, a further symbol (shown as an image in the original publication) represents the resource balance rate of the jth computing node in the ith micro cloud, RUBD_i represents the resource utilization balance degree of the ith micro cloud, L_i represents the total number of computing nodes in the ith micro cloud, RUBD_Total represents the resource balance evaluation parameter of the edge micro cloud system, and K represents the total number of micro clouds in the edge micro cloud system;
calculating the response delay evaluation parameter based on the following formula:
t_Total = T_Comp(X) + T_TR(X)
wherein t_Total represents the response delay evaluation parameter, T_Comp(X) represents the computation delay, and T_TR(X) represents the transmission delay.
Optionally, the resource balancing optimization model is trained according to the following steps:
acquiring a preset neural network model and the training set;
inputting the sample state parameters into the neural network model to obtain service placement actions; the service placement action represents determining a placed target compute node for a sample service;
updating the sample state parameters based on the service placement action to obtain updated sample state parameters;
calculating a reward value of the service placement action based on the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the sample state parameter, and the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the updated sample state parameter;
substituting the sample state parameter, the updated sample state parameter, the service placing action and the reward value of the service placing action into a preset loss function, and calculating the loss value of the service placing action;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the updated sample state parameters into the neural network model to obtain a service placing action;
and if so, determining the current neural network model as a resource balance optimization model.
Optionally, the loss function is given by the two formulas presented as images in the original publication: an n-step loss over the prioritized historical iteration data, and the corresponding cumulative reward over the n iterations;
wherein L represents the loss function, E[·] represents a mathematical expectation, n represents the number of sets of historical iteration data referenced in each iteration, t represents the time step, w_t represents the priority weight of the n sets of historical iteration data after time t, G_t represents the sum of the reward values of the n iterations after time t, γ^n represents the decay factor applied to the reward value over the n iterations after time t, Q_target represents the target network, Q_eva represents the estimated network, s_t represents the sample state parameter at time t, a_t represents the service placement action at time t, s_{t+n} represents the sample state parameter after n iterations, a' represents the service placement action that maximizes the output of the estimated network, k represents the iteration index, γ^k represents the decay factor applied to the reward value of the kth iteration after time t, and r_{t+k+1} represents the reward value of the kth iteration after time t.
In order to achieve the above object, an embodiment of the present invention further provides a resource allocation apparatus based on deep reinforcement learning, which is applied to a control platform of an edge cloudlet system, where the edge cloudlet system further includes a plurality of cloudlets, each cloudlet includes a plurality of computing nodes, and the apparatus includes:
the first determining module is used for determining services of various resources to be allocated contained in an application program request of a user and the allocation priority of each service;
the second determining module is used for determining state parameters of the current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud;
the input module is used for inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service; the first service is the service with the highest priority currently allocated; the resource balance optimization model is completed based on deep reinforcement learning training, wherein a training set of the deep reinforcement learning comprises: sample state parameters of the edge micro-cloud system;
a deployment module to deploy the first service to the first target computing node;
and the updating module is used for updating the state parameters and triggering the input module until each service of the resources to be allocated contained in the application program request completes resource allocation.
Optionally, the apparatus further comprises: a calculating module, configured to calculate the resource balance evaluation parameter based on the formulas presented as images in the original publication, namely a resource utilization variance for each computing node, a resource balance rate for each computing node, a normalized resource utilization balance degree for each computing node, a resource utilization balance degree RUBD_i for each micro cloud, and a resource balance evaluation parameter RUBD_Total for the edge micro cloud system;
wherein σ²_ij(X) represents the resource utilization variance of the jth computing node in the ith micro cloud, D represents the number of resource types, u_ij^d represents the utilization rate of the dth type of resource in the jth computing node of the ith micro cloud, ū_ij represents the average utilization rate over all resource types in the jth computing node of the ith micro cloud, X represents the resource allocation policy, a further symbol (shown as an image in the original publication) represents the resource balance rate of the jth computing node in the ith micro cloud, RUBD_i represents the resource utilization balance degree of the ith micro cloud, L_i represents the total number of computing nodes in the ith micro cloud, RUBD_Total represents the resource balance evaluation parameter of the edge micro cloud system, and K represents the total number of micro clouds in the edge micro cloud system;
and to calculate the response delay evaluation parameter based on the following formula:
t_Total = T_Comp(X) + T_TR(X)
wherein t_Total represents the response delay evaluation parameter, T_Comp(X) represents the computation delay, and T_TR(X) represents the transmission delay.
Optionally, the apparatus further comprises: a training module, configured to train the resource balancing optimization model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the sample state parameters into the neural network model to obtain service placement actions; the service placement action represents determining a placed target compute node for a sample service;
updating the sample state parameters based on the service placement action to obtain updated sample state parameters;
calculating a reward value of the service placement action based on the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the sample state parameter, and the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the updated sample state parameter;
substituting the sample state parameter, the updated sample state parameter, the service placing action and the reward value of the service placing action into a preset loss function, and calculating the loss value of the service placing action;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the updated sample state parameters into the neural network model to obtain a service placing action;
and if so, determining the current neural network model as a resource balance optimization model.
Optionally, the loss function is given by the two formulas presented as images in the original publication: an n-step loss over the prioritized historical iteration data, and the corresponding cumulative reward over the n iterations;
wherein L represents the loss function, E[·] represents a mathematical expectation, n represents the number of sets of historical iteration data referenced in each iteration, t represents the time step, w_t represents the priority weight of the n sets of historical iteration data after time t, G_t represents the sum of the reward values of the n iterations after time t, γ^n represents the decay factor applied to the reward value over the n iterations after time t, Q_target represents the target network, Q_eva represents the estimated network, s_t represents the sample state parameter at time t, a_t represents the service placement action at time t, s_{t+n} represents the sample state parameter after n iterations, a' represents the service placement action that maximizes the output of the estimated network, k represents the iteration index, γ^k represents the decay factor applied to the reward value of the kth iteration after time t, and r_{t+k+1} represents the reward value of the kth iteration after time t.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any method step when executing the program stored in the memory.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.
Therefore, the resource allocation method and device based on deep reinforcement learning provided by the embodiment of the invention can determine the services of various resources to be allocated contained in the application program request of the user and the allocation priority of each service; determining state parameters of a current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud; inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service, and deploying the first service in the first target computing node; and updating the state parameters, and returning to the step of inputting the state parameters into the resource balance optimization model which is trained in advance until each service of the resources to be distributed contained in the application program request completes the resource distribution. Therefore, response delay and resource allocation balance are comprehensively considered, and a network model is trained in a deep reinforcement learning mode.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a resource allocation method based on deep reinforcement learning according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a process for training a resource balancing optimization model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a deep reinforcement learning-based resource allocation apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the technical problem of low resource utilization balance of the existing service resource allocation method in the field of edge cloudiness, the embodiment of the invention provides a resource allocation method and device based on deep reinforcement learning, electronic equipment and a computer readable storage medium.
For ease of understanding, the following description will first describe an application scenario of the embodiment of the present invention.
The resource allocation method based on deep reinforcement learning provided by the embodiment of the invention can be applied to highly dynamic scenarios, such as the military and disaster relief fields, in which an edge micro cloud system is usually used to provide services for applications. An edge micro cloud system may include a control platform and a plurality of micro clouds, each micro cloud containing a plurality of computing nodes. A computing node may be an electronic device containing a processor, a communication interface, a memory and a communication bus, such as a personal computer, and is typically carried on a mobile vehicle. A micro cloud represents a collection of computing nodes and may typically include the computing nodes on a mobile vehicle. The resource allocation method based on deep reinforcement learning provided by the embodiment of the invention can be applied to the control platform, that is, the control platform determines on which computing node each service contained in an application program is deployed.
Specifically, referring to fig. 1, the resource allocation method based on deep reinforcement learning according to an embodiment of the present invention may include the following steps:
s101: determining a plurality of services of resources to be allocated contained in the application program request of the user and the allocation priority of each service.
In the embodiment of the present invention, the application request of the user may include a plurality of services, such as a location service, an image processing service, and the like, which need to be deployed on the computing node in the micro cloud, so that the computing node can provide corresponding resources for the services. In the embodiment of the present invention, the process of allocating resources to a service may also be understood as a process of allocating a computing node to a service.
In addition, since there is a certain association relationship between services, the services to be allocated with resources have allocation priorities. For example, if service a needs to depend on service b, then service b has a higher allocation priority than service a, i.e. service a can only be allocated if service b is allocated a compute node first.
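For illustration, such dependency-based priorities could be derived by topologically ordering the service dependency graph, as in the sketch below (the dependency-graph input format, function and service names are assumptions made for illustration; the patent does not prescribe a particular ordering algorithm):

    from collections import deque

    def allocation_priority(services, depends_on):
        """Order services so that a service is allocated only after every service
        it depends on. depends_on[a] = {b, ...} means service a depends on b
        (hypothetical input format). Returns services from highest to lowest
        allocation priority."""
        indegree = {s: 0 for s in services}
        dependents = {s: [] for s in services}
        for svc, deps in depends_on.items():
            for dep in deps:
                indegree[svc] += 1            # svc must wait for dep
                dependents[dep].append(svc)   # dep unlocks svc
        queue = deque(s for s in services if indegree[s] == 0)
        order = []
        while queue:
            s = queue.popleft()
            order.append(s)
            for nxt in dependents[s]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    queue.append(nxt)
        return order

    # Example from the text: service a depends on service b, so b is allocated first
    print(allocation_priority(['service_a', 'service_b'], {'service_a': {'service_b'}}))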
S102: and determining state parameters of the current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource residual quantity of each computing node in each micro cloud.
In the embodiment of the invention, the resource balance evaluation parameter represents the balance of resource allocation of the computing nodes in the micro cloud.
In one embodiment of the present invention, the resource balance evaluation parameter may be calculated as follows.
First, the resource utilization variance is defined (the formula appears as an image in the original publication) as the variance, over the D resource types, of the per-type resource utilization of a computing node, wherein σ²_ij(X) denotes the resource utilization variance of the jth computing node in the ith micro cloud, D denotes the number of resource types, u_ij^d denotes the utilization rate of the dth type of resource in the jth computing node of the ith micro cloud, ū_ij denotes the average utilization rate over all resource types in the jth computing node of the ith micro cloud, and X denotes the resource allocation policy.
Because the resource utilization variance alone cannot reflect the imbalanced situation in which the utilization of one type of resource is very high while the utilization of the other types is low, the resource balance rate of the jth computing node in the ith micro cloud is further defined (formula shown as an image in the original publication).
Normalizing the resource balance rate then gives the resource utilization balance degree RUBD_ij of the jth computing node in the ith micro cloud (formula shown as an image in the original publication). Its value lies between 0 and 1, and a larger value indicates more balanced resource utilization.
Further, the resource utilization balance degree of the whole edge micro cloud system can be determined by aggregating the per-node values (the two aggregation formulas appear as images in the original publication), wherein RUBD_i denotes the resource utilization balance degree of the ith micro cloud, L_i denotes the total number of computing nodes in the ith micro cloud, RUBD_Total denotes the resource utilization balance degree of the edge micro cloud system, namely the resource balance evaluation parameter, and K denotes the total number of micro clouds in the edge micro cloud system.
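The exact closed forms of the resource balance rate and of the normalization step are only available as images in the original publication. The following Python sketch is therefore a plausible reconstruction under explicit assumptions: the balance rate is taken as the standard deviation of the per-type utilizations relative to their mean, the normalization maps it into (0, 1], and the micro cloud level and system level balance degrees are simple averages. All function and variable names are illustrative.

    import numpy as np

    def node_balance_degree(util):
        """util: utilization of each of the D resource types on one node (0..1).
        Returns a value in (0, 1]; larger means more balanced usage."""
        util = np.asarray(util, dtype=float)
        mean = util.mean()
        variance = ((util - mean) ** 2).mean()            # resource utilization variance
        # Assumed balance rate: dispersion relative to the mean utilization, so that
        # "one resource very high, the others low" is penalized even at a low mean.
        balance_rate = np.sqrt(variance) / (mean + 1e-9)
        return 1.0 / (1.0 + balance_rate)                 # assumed normalization into (0, 1]

    def system_balance_degree(cloudlets):
        """cloudlets: list of micro clouds, each a list of per-node utilization vectors.
        Returns RUBD_Total, assumed to be the mean over micro clouds of the mean
        per-node balance degree (RUBD_i)."""
        rubd_per_cloudlet = [
            np.mean([node_balance_degree(node) for node in cloudlet])
            for cloudlet in cloudlets
        ]
        return float(np.mean(rubd_per_cloudlet))

    # Example: two micro clouds with two nodes each and three resource types per node
    cloudlets = [
        [[0.6, 0.5, 0.55], [0.9, 0.1, 0.2]],
        [[0.3, 0.3, 0.35], [0.4, 0.45, 0.5]],
    ]
    print(system_balance_degree(cloudlets))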
In the embodiment of the present invention, the response delay evaluation parameter represents the sum of the computation delay and the communication delay of the edge micro-cloud system, that is:
t_Total = T_Comp(X) + T_TR(X)
wherein t_Total denotes the response delay evaluation parameter, T_Comp(X) denotes the computation delay, and T_TR(X) denotes the transmission delay.
In the embodiment of the present invention, a symbol (shown as an image in the original publication) may be used to represent the remaining amount of resources of the jth computing node in the ith micro cloud.
Furthermore, in the embodiment of the present invention, a set s may be used to represent the state parameters of the current edge micro cloud system (the definition of s appears as an image in the original publication). That is, the state parameters of the edge micro cloud system may be composed of the resource balance evaluation parameter, the response delay evaluation parameter, and the remaining amount of resources of each computing node in each micro cloud.
S103: inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service; the first service is the service with the highest priority currently allocated, and the resource balance optimization model is completed based on deep reinforcement learning training, wherein the training set of the deep reinforcement learning comprises: sample state parameters of the edge clouding system.
In the embodiment of the invention, after the control platform obtains the state parameters of the current edge micro cloud system, it can input the state parameters into the resource balance optimization model. Because the resource balance optimization model has been trained by deep reinforcement learning on the training set, it can output the resource allocation decision best suited to the current state parameters.
Specifically, the resource balance optimization model outputs a first target computing node for the first service, where the first service is the service with the highest current allocation priority. For example, if the service currently having the highest allocation priority is the positioning service, the resource balance optimization model outputs the target computing node of the positioning service, which can be recorded by an indicator (shown as an image in the original publication) denoting that the positioning service is assigned to the jth computing node in the ith micro cloud.
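One simple way to map the model's discrete output to a (micro cloud, computing node) pair is shown below; the flat action index and the argmax over masked Q-values are assumptions made for illustration, since the patent does not fix a particular encoding:

    import numpy as np

    def decode_action(action_index, nodes_per_cloudlet):
        """Map a flat action index to (micro cloud i, node j), assuming actions are
        enumerated micro cloud by micro cloud."""
        for i, n_nodes in enumerate(nodes_per_cloudlet):
            if action_index < n_nodes:
                return i, action_index
            action_index -= n_nodes
        raise ValueError("action index out of range")

    def select_node(q_values, feasible_mask, nodes_per_cloudlet):
        """Pick the feasible placement with the highest Q-value."""
        q = np.where(feasible_mask, q_values, -np.inf)
        return decode_action(int(np.argmax(q)), nodes_per_cloudlet)

    # Two micro clouds with 2 and 3 nodes: flat action 3 is node 1 of micro cloud 1
    print(decode_action(3, [2, 3]))  # (1, 1)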
The training process of the resource balancing optimization model can be referred to below, and is not described herein again.
S104: the first service is deployed to a first target compute node.
In this embodiment of the present invention, after determining a first target computing node of a first service, the first service may be deployed in the first target computing node, and then the first service may be run based on resources in the first target computing node.
S105: and updating the state parameters, and returning to the step of inputting the state parameters into the resource balance optimization model which is trained in advance until each service of the resources to be distributed contained in the application program request completes the resource distribution.
In the embodiment of the invention, after resources are allocated to the first service, the state parameters of the edge micro cloud system change, so the control platform collects the current state parameters of the edge micro cloud system again and continues to allocate resources to the next service.
And the control platform inputs the current state parameters into the resource balance optimization model so as to obtain the target computing node of the service with the highest current distribution priority.
In the embodiment of the present invention, the above steps may be executed in a loop until each service of the resource to be allocated included in the application request completes resource allocation.
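Putting S101 to S105 together, the control platform's allocation loop could look roughly like the sketch below. The helpers allocation_priority and select_node are the illustrative ones defined above, while request, model and env (with observe, feasible, deploy and nodes_per_cloudlet) are hypothetical stand-ins for the application request, the pre-trained resource balance optimization model and the edge micro cloud system:

    def allocate_application(request, model, env):
        """Sketch of the allocation loop S101-S105."""
        ordered = allocation_priority(request.services, request.depends_on)   # S101
        for service in ordered:                           # highest allocation priority first
            state = env.observe()                         # S102: RUBD_Total, t_Total, remaining resources
            q_values = model.q_values(state, service)     # S103: pre-trained optimization model
            i, j = select_node(q_values, env.feasible(service), env.nodes_per_cloudlet)
            env.deploy(service, cloudlet=i, node=j)       # S104: place the service on the target node
            # S105: the next env.observe() call reflects the updated state parameters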
Therefore, the resource allocation method based on deep reinforcement learning provided by the embodiment of the invention can determine the services of various resources to be allocated contained in the application program request of the user and the allocation priority of each service; determining state parameters of a current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud; inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service, and deploying the first service in the first target computing node; and updating the state parameters, and returning to the step of inputting the state parameters into the resource balance optimization model which is trained in advance until each service of the resources to be distributed contained in the application program request completes the resource distribution. Therefore, response delay and resource allocation balance are comprehensively considered, and a network model is trained in a deep reinforcement learning mode.
In the embodiment of the invention, the resource balance optimization model can be trained based on deep reinforcement learning. Deep reinforcement learning is a combination of reinforcement learning and deep learning.
For ease of understanding, reinforcement learning is briefly described below.
Reinforcement learning is a type of machine learning. Its basic idea is to generate actions according to the state of the environment, obtain learning signals by receiving the environment's rewards for those actions, and update the model parameters accordingly, so that eventually the optimal action can be produced for a given environment state.
In the embodiment of the invention, the resource allocation process can be modeled as a reinforcement learning model, wherein the scene state is a state parameter of the edge micro-cloud system, and the action is a resource allocation strategy for a certain service. Therefore, the trained resource balance optimization model can output an optimal resource allocation strategy for a certain service according to the state parameters of the edge micro-cloud system.
In one embodiment of the present invention, referring to fig. 2, the resource balancing optimization model may be trained by the following steps:
s201: and acquiring a preset neural network model and a training set.
As those skilled in the art will understand, in contrast to conventional supervised learning, reinforcement learning does not require labeled samples during training; only initial input states are required as the training set.
In the embodiment of the invention, the training set can be a sample state parameter of the edge micro-cloud system.
S202: inputting the sample state parameters into a neural network model to obtain service placement actions; the service placement action represents determining a placed target compute node for the sample service.
In the embodiment of the invention, the input of the neural network model is a sample state parameter and the output is a service placement action, which represents the target computing node on which a sample service is placed. The sample state parameter may be denoted by s and, as above, consists of the resource balance evaluation parameter, the response delay evaluation parameter, and the remaining amount of resources of each computing node in each micro cloud. The service placement action may be denoted by a, indicating that the service is assigned to the jth computing node in the ith micro cloud (the detailed notation appears as images in the original publication).
S203: and updating the sample state parameters based on the service placing action to obtain the updated sample state parameters.
After the service placement action is generated, the sample state parameters are updated to obtain updated sample state parameters which are used as input of the next round of training.
S204: and calculating the reward value of the service placement action according to the resource balance evaluation parameter and the response delay evaluation parameter contained in the sample state parameter and the resource balance evaluation parameter contained in the updated sample state parameter.
In the embodiment of the invention, after each iteration, the reward value of the service placement action of that iteration can be calculated. Intuitively, the more balanced the resource distribution indicated by the resource balance evaluation parameter after the current placement action, and the lower the response delay indicated by the response delay evaluation parameter, the higher the reward value of the current placement action.
Specifically, in one embodiment of the present invention, the reward value for the service placement action may be calculated based on the following formula:
(The reward formula appears as an image in the original publication.) In this formula, r_n denotes the reward value of the nth iteration, RUBD_Total^(n-1) denotes the resource balance evaluation parameter after the (n-1)th iteration, RUBD_Total^(n) denotes the resource balance evaluation parameter after the nth iteration, t_Total^(n) denotes the service delay after the nth iteration, and L_ave denotes the average service delay constraint.
The above formula is only one way to calculate the reward value, and the embodiments of the present invention are not limited to calculating the reward value using this formula.
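Because the exact reward expression is only available as an image, the following is a plausible sketch that is consistent with the description: the reward grows with the improvement in the balance degree and is penalized when the post-placement service delay exceeds the average delay constraint. The specific combination (difference of balance degrees minus a normalized delay penalty) is an assumption.

    def reward(rubd_prev, rubd_curr, delay_curr, l_ave):
        """Hypothetical reward for one placement action.
        rubd_prev / rubd_curr: RUBD_Total before / after the action (0..1).
        delay_curr: service delay after the action; l_ave: average delay constraint."""
        balance_gain = rubd_curr - rubd_prev                   # more balanced -> positive
        delay_penalty = max(0.0, delay_curr - l_ave) / l_ave   # penalize only violations
        return balance_gain - delay_penalty

    print(reward(0.70, 0.78, delay_curr=30.0, l_ave=40.0))  # 0.08: balance improved, no delay violation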
S205: and substituting the sample state parameter, the updated sample state parameter, the service placing action and the reward value of the service placing action into a preset loss function, and calculating the loss value of the service placing action.
For ease of understanding, take the nth iteration as an example: the sample state parameter after the (n-1)th iteration is s_{n-1}, the sample state parameter after the nth iteration is s_n, the service placement action output by the nth iteration is a_n, and the reward value of the nth iteration is r_n. The loss value of the service placement action in the nth iteration can then be calculated from s_{n-1}, s_n, a_n and r_n according to a preset loss function.
Those skilled in the art will appreciate that in deep reinforcement learning each iteration yields a new Q value, which is a function of the state s and the action a and represents the expected gain obtained by taking action a in state s. The Q value is typically determined by a target network and an estimated network, where the target network is denoted Q_target and the estimated network is denoted Q_eva; the Q value output by the target network is the iteratively updated value, while the Q value output by the estimated network is the value before the iterative update.
In the embodiment of the invention, the output Q values of the target network and the estimated network can be determined based on the sample state parameters before and after iteration and corresponding service placement actions, and then a loss function is constructed by combining the reward values.
Specifically, in an embodiment of the present invention, the preset loss function may be:
Loss = E[(r_{n+1} + γ·Q_target(s_{n+1}, argmax_{a'} Q_eva(s_{n+1}, a')) - Q_eva(s_n, a_n))²]
wherein E[·] denotes a mathematical expectation, r_{n+1} denotes the reward value of the (n+1)th iteration, γ denotes the decay factor, Q_target denotes the target network, Q_eva denotes the estimated network, s_n denotes the sample state parameter after the nth iteration, a_n denotes the service placement action of the nth iteration, and a' denotes the service placement action that maximizes the output of the estimated network.
In the embodiment of the invention, the sample state parameters before and after each iteration, the service placement action and the reward value are used as variables in the loss function in order to train the neural network model. To accelerate convergence and improve the accuracy of the network model, the data of several previous steps can be combined in subsequent iterations. For example, for the third iteration, the sample state parameter, service placement action and reward value of the first iteration and those of the second iteration may both be used as training data; that is, the data of the preceding iterations can be considered together in the loss function.
In addition, the difference between the Q values output by the target network and the estimated network in an iteration reflects how valuable the data of that iteration is as training data: the larger the difference, the more that iteration's data is worth training on, so a larger sampling weight can be set for it. For example, suppose the current iteration is the 5th iteration and the data of the 2nd to 4th iterations are selected for training; if, among the 2nd to 4th iterations, the difference between the Q values output by the target network and the estimated network is largest in the 3rd iteration, a larger sampling weight may be set for the data of the 3rd iteration.
In the embodiment of the present invention, the loss function may be improved based on two aspects, namely, multi-step joint training and setting of sampling weights, and specifically, the loss function improved based on the two aspects may be:
(The improved loss function and the corresponding n-step cumulative reward appear as images in the original publication.)
wherein L denotes the improved loss function, E[·] denotes a mathematical expectation, n denotes the number of sets of historical iteration data referenced in each iteration, t denotes the time step, w_t denotes the priority weight of the n sets of historical iteration data after time t and can be determined from the difference between the Q values output by the target network and the estimated network in the historical iteration data (the larger the difference, the larger the priority weight of that round of historical iteration data), G_t denotes the sum of the reward values of the n iterations after time t, γ^n denotes the decay factor applied to the reward value over the n iterations after time t and can be set according to the actual situation, Q_target denotes the target network, Q_eva denotes the estimated network, s_t denotes the sample state parameter at time t, a_t denotes the service placement action at time t, s_{t+n} denotes the sample state parameter after n iterations, a' denotes the service placement action that maximizes the output of the estimated network, k denotes the iteration index, γ^k denotes the decay factor applied to the reward value of the kth iteration after time t, and r_{t+k+1} denotes the reward value of the kth iteration after time t.
Therefore, the improved loss function takes into account multiple sets of historical iteration data generated by previous iterations and sets priority weights based on the difference of the Q values in each previous iteration, so the neural network can be trained in a more targeted way, which accelerates convergence and improves the accuracy of the network model.
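Because the exact form of the improved loss is only shown as images in the original publication, the sketch below assumes the standard combination of the two ideas described above: an n-step double-DQN target and priority weights proportional to the gap between the target and estimated Q values. The proportional prioritization, the squared-error form and the Transition layout are assumptions.

    import numpy as np
    from collections import namedtuple

    Transition = namedtuple("Transition", "s_t a_t rewards s_t_plus_n")

    def n_step_loss(batch, q_target, q_eva, gamma=0.99):
        """batch: Transition records whose rewards field holds the n rewards
        r_{t+1}..r_{t+n}. q_target / q_eva: callables mapping a state to a vector
        of Q-values. Returns the priority-weighted squared n-step TD error."""
        td_errors, weights = [], []
        for tr in batch:
            n = len(tr.rewards)
            g = sum(gamma ** k * r for k, r in enumerate(tr.rewards))   # n-step return
            a_best = int(np.argmax(q_eva(tr.s_t_plus_n)))               # double-DQN action choice
            target = g + gamma ** n * q_target(tr.s_t_plus_n)[a_best]
            td = target - q_eva(tr.s_t)[tr.a_t]
            td_errors.append(td)
            weights.append(abs(td))            # assumed: priority weight ~ |Q-value gap|
        weights = np.asarray(weights)
        weights = weights / (weights.sum() + 1e-9)
        return float(np.sum(weights * np.square(td_errors)))

    # Toy example with a 2-action Q-function shared by both networks
    q = lambda s: np.array([0.1, 0.5])
    tr = Transition(s_t=0, a_t=1, rewards=[0.2, 0.3], s_t_plus_n=1)
    print(n_step_loss([tr], q_target=q, q_eva=q))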
S206: and determining whether the neural network model converges according to the loss value, otherwise executing S207, and executing S208.
When the loss value does not exceed the preset loss threshold, the neural network model may be considered to have converged. In addition, the maximum number of iterations may also be preset, and when the maximum number of iterations is reached, the neural network model may also be considered to have converged, which is not limited.
S207: and adjusting the parameter values in the neural network model, and returning to execute the step S202.
And when the loss value shows that the neural network model does not converge, adjusting the parameter value, returning to the step S202, and starting a new round of iterative training.
S208: and determining the current neural network model as a resource balance optimization model.
Based on the same inventive concept, according to the above embodiment of the resource allocation method based on deep reinforcement learning, an embodiment of the present invention further provides a resource allocation apparatus based on deep reinforcement learning. Referring to fig. 3, the apparatus may include the following modules:
a first determining module 301, configured to determine services of multiple resources to be allocated included in an application request of a user, and an allocation priority of each service;
a second determining module 302, configured to determine state parameters of the current edge micro-cloud system, where the state parameters include a resource balance evaluation parameter, a response delay evaluation parameter, and a resource remaining amount of each computing node in each micro-cloud;
an input module 303, configured to input the state parameter into a resource balancing optimization model that is trained in advance, to obtain a first target computing node of the first service; the first service is the service with the highest priority currently allocated; the resource balance optimization model is completed based on deep reinforcement learning training, wherein the training set of the deep reinforcement learning comprises the following steps: sample state parameters of the edge micro-cloud system;
a deployment module 304 for deploying the first service to the first target computing node;
the updating module 305 is configured to update the status parameters and trigger the input module until the resource allocation is completed for each service of the resource to be allocated included in the application request.
In an embodiment of the present invention, on the basis of the apparatus shown in fig. 3, a calculation module may further be included, where the calculation module is configured to calculate a resource balance evaluation parameter based on the following formula:
(The formulas are presented as images in the original publication: a resource utilization variance for each computing node, a resource balance rate for each computing node, a normalized resource utilization balance degree for each computing node, a resource utilization balance degree RUBD_i for each micro cloud, and a resource balance evaluation parameter RUBD_Total for the edge micro cloud system.)
wherein σ²_ij(X) represents the resource utilization variance of the jth computing node in the ith micro cloud, D represents the number of resource types, u_ij^d represents the utilization rate of the dth type of resource in the jth computing node of the ith micro cloud, ū_ij represents the average utilization rate over all resource types in the jth computing node of the ith micro cloud, X represents the resource allocation policy, a further symbol (shown as an image in the original publication) represents the resource balance rate of the jth computing node in the ith micro cloud, RUBD_i represents the resource utilization balance degree of the ith micro cloud, L_i represents the total number of computing nodes in the ith micro cloud, RUBD_Total represents the resource balance evaluation parameter of the edge micro cloud system, and K represents the total number of micro clouds in the edge micro cloud system;
calculating the response delay evaluation parameter based on the following formula:
t_Total = T_Comp(X) + T_TR(X)
wherein t_Total represents the response delay evaluation parameter, T_Comp(X) represents the computation delay, and T_TR(X) represents the transmission delay.
In an embodiment of the present invention, on the basis of the apparatus shown in fig. 3, a training module may further be included, where the training module is configured to train a resource balancing optimization model according to the following steps:
acquiring a preset neural network model and a training set;
inputting the sample state parameters into a neural network model to obtain service placement actions; the service placement action represents determining a placed target compute node for the sample service;
updating the sample state parameters based on the service placement action to obtain updated sample state parameters;
calculating the reward value of the service placement action based on the resource balance evaluation parameter and the response delay evaluation parameter contained in the sample state parameter, and the resource balance evaluation parameter and the response delay evaluation parameter contained in the updated sample state parameter;
substituting the sample state parameter, the updated sample state parameter, the service placing action and the reward value of the service placing action into a preset loss function, and calculating the loss value of the service placing action;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the updated sample state parameters into the neural network model to obtain a service placing action;
and if so, determining the current neural network model as a resource balance optimization model.
In one embodiment of the invention, the loss function may be:
(The two formulas appear as images in the original publication: an n-step loss over the prioritized historical iteration data, and the corresponding cumulative reward over the n iterations.)
wherein L represents the loss function, E[·] represents a mathematical expectation, n represents the number of sets of historical iteration data referenced in each iteration, t represents the time step, w_t represents the priority weight of the n sets of historical iteration data after time t, G_t represents the sum of the reward values of the n iterations after time t, γ^n represents the decay factor applied to the reward value over the n iterations after time t, Q_target represents the target network, Q_eva represents the estimated network, s_t represents the sample state parameter at time t, a_t represents the service placement action at time t, s_{t+n} represents the sample state parameter after n iterations, a' represents the service placement action that maximizes the output of the estimated network, k represents the iteration index, γ^k represents the decay factor applied to the reward value of the kth iteration after time t, and r_{t+k+1} represents the reward value of the kth iteration after time t.
By applying the resource allocation device based on deep reinforcement learning provided by the embodiment of the invention, the services of various resources to be allocated contained in the application program request of a user and the allocation priority of each service can be determined; determining state parameters of a current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud; inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service, and deploying the first service in the first target computing node; and updating the state parameters, and returning to the step of inputting the state parameters into the resource balance optimization model which is trained in advance until each service of the resources to be distributed contained in the application program request completes the resource distribution. Therefore, response delay and resource allocation balance are comprehensively considered, and a network model is trained in a deep reinforcement learning mode.
Based on the same inventive concept, according to the implementation of the above resource allocation method based on deep reinforcement learning, the embodiment of the present invention provides an electronic device, as shown in fig. 4, including a processor 401, a communication interface 402, a memory 403, and a communication bus 404, where the processor 401, the communication interface 402, and the memory 403 complete mutual communication via the communication bus 404,
a memory 403 for storing a computer program;
the processor 401, when executing the program stored in the memory 403, implements the following steps:
determining services of various resources to be allocated contained in an application program request of a user and allocation priority of each service;
determining state parameters of a current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud;
inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service; the first service is the service with the highest priority currently allocated; the resource balance optimization model is completed based on deep reinforcement learning training, wherein the training set of the deep reinforcement learning comprises the following steps: sample state parameters of the edge micro-cloud system;
deploying a first service at a first target computing node;
and updating the state parameters, and returning to the step of inputting the state parameters into the resource balance optimization model which is trained in advance until each service of the resources to be distributed contained in the application program request completes the resource distribution.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
By applying the electronic device provided by the embodiment of the invention, the services of the various resources to be allocated contained in the application program request of a user and the allocation priority of each service can be determined; the state parameters of the current edge micro cloud system are determined, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud; the state parameters are input into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service, and the first service is deployed on the first target computing node; and the state parameters are updated and the step of inputting the state parameters into the pre-trained resource balance optimization model is repeated until each service of the resources to be allocated contained in the application program request has completed resource allocation. In this way, response delay and resource allocation balance are considered jointly, and the network model is trained by deep reinforcement learning.
Based on the same inventive concept, and in accordance with the above deep reinforcement learning-based resource allocation method, yet another embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above deep reinforcement learning-based resource allocation methods.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the above-mentioned resource allocation apparatus embodiment, electronic device embodiment and computer-readable storage medium embodiment based on deep reinforcement learning, since they are substantially similar to the above-mentioned resource allocation method embodiment based on deep reinforcement learning, the description is relatively simple, and for the relevant points, refer to the partial description of the above-mentioned resource allocation method embodiment based on deep reinforcement learning.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A resource allocation method based on deep reinforcement learning is applied to a control platform of an edge micro-cloud system, wherein the edge micro-cloud system further comprises a plurality of micro-clouds, each micro-cloud comprises a plurality of computing nodes, and the method comprises the following steps:
determining services of various resources to be allocated contained in an application program request of a user and allocation priority of each service;
determining state parameters of a current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud;
inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service; the first service is the service with the highest priority currently allocated; the resource balance optimization model is completed based on deep reinforcement learning training, wherein a training set of the deep reinforcement learning comprises: sample state parameters of the edge micro-cloud system;
deploying the first service to the first target computing node;
and updating the state parameters, and returning to the step of inputting the state parameters into the resource balance optimization model which is trained in advance until the resource allocation of each service of the resources to be allocated contained in the application program request is completed.
2. The method of claim 1, wherein the resource balance evaluation parameter is calculated based on the following formula:
Figure FDA0002274422590000011
Figure FDA0002274422590000012
Figure FDA0002274422590000013
Figure FDA0002274422590000014
Figure FDA0002274422590000021
wherein RUV_i^j represents the resource utilization variance of the j-th computing node in the i-th cloudlet, D represents the number of types of resources,
Figure FDA0002274422590000022
represents the resource utilization rate of the d-th type of resource in the j-th computing node in the i-th micro cloud,
Figure FDA0002274422590000023
represents the average resource utilization rate over all types of resources in the j-th computing node in the i-th micro cloud, X represents the resource allocation strategy,
Figure FDA0002274422590000024
represents the resource balance rate of the j-th computing node in the i-th cloudlet, RUBD_i represents the resource utilization balance of the i-th cloudlet, L_i represents the total number of computing nodes in the i-th cloudlet, RUBD_Total represents the resource balance evaluation parameter of the edge micro cloud system, and K represents the total number of micro clouds in the edge micro cloud system;
calculating the response delay evaluation parameter based on the following formula:
t_Total = T_Comp(X) + T_TR(X)
wherein t_Total represents the response delay evaluation parameter, T_Comp(X) represents the computation delay, and T_TR(X) represents the transmission delay.
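For illustration only, one possible reading of the balance and delay metrics of this claim is sketched in Python below; the exact formulas are given as images in the original publication, so the per-node variance, the square-root balance rate and the plain averaging over nodes and cloudlets used here, as well as the function names, are assumptions.

import numpy as np

def balance_evaluation(utilization):
    # utilization[i][j] is a length-D vector with the per-resource-type utilization of
    # computing node j in cloudlet i; the aggregation below is an assumed instantiation.
    cloudlet_scores = []
    for nodes in utilization:
        node_scores = []
        for u in nodes:
            u = np.asarray(u, dtype=float)
            ruv = np.mean((u - u.mean()) ** 2)        # resource utilization variance RUV_i^j
            node_scores.append(np.sqrt(ruv))          # assumed balance rate of node j
        cloudlet_scores.append(np.mean(node_scores))  # RUBD_i over the L_i nodes of cloudlet i
    return float(np.mean(cloudlet_scores))            # RUBD_Total over the K cloudlets

def response_delay(t_comp, t_tr):
    # t_Total = T_Comp(X) + T_TR(X): computation delay plus transmission delay
    return t_comp + t_tr

print(balance_evaluation([[[0.5, 0.7], [0.2, 0.9]], [[0.4, 0.4]]]))
print(response_delay(0.12, 0.03))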
3. The method of claim 1, wherein the resource balancing optimization model is trained by:
acquiring a preset neural network model and the training set;
inputting the sample state parameters into the neural network model to obtain a service placement action; the service placement action represents the target computing node on which a sample service is to be placed;
updating the sample state parameters based on the service placement action to obtain updated sample state parameters;
calculating a reward value of the service placement action based on the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the sample state parameters and the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the updated sample state parameters;
substituting the sample state parameters, the updated sample state parameters, the service placement action and the reward value of the service placement action into a preset loss function, and calculating a loss value of the service placement action;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the updated sample state parameters into the neural network model to obtain a service placement action;
and if so, determining the current neural network model as a resource balance optimization model.
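For illustration only, the training procedure of this claim can be sketched in Python as follows; to keep the sketch short the neural network is replaced by a small Q-table, the loss is a one-step TD error rather than the claimed n-step loss, and the callables env_step and reward_fn together with the toy usage values are assumptions.

import numpy as np

def train_balance_model(train_states, env_step, reward_fn, n_actions,
                        gamma=0.9, lr=0.1, tol=1e-4, max_rounds=500):
    # Q-table stand-in for the neural network: one row per sample state, one column per node.
    q = np.zeros((len(train_states), n_actions))
    for _ in range(max_rounds):
        total_loss = 0.0
        for s_id, state in enumerate(train_states):
            # feed the sample state parameters to the model to obtain a placement action
            action = int(np.argmax(q[s_id]))
            # update the sample state parameters according to the placement
            next_id, next_state = env_step(s_id, state, action)
            # reward from the balance and delay metrics before and after the placement
            r = reward_fn(state, next_state)
            # loss value of the placement action (one-step TD error as a stand-in)
            td_error = r + gamma * np.max(q[next_id]) - q[s_id, action]
            total_loss += td_error ** 2
            # adjust the model parameters
            q[s_id, action] += lr * td_error
        # convergence check on the loss value
        if total_loss / len(train_states) < tol:
            break
    return q

# toy usage: two sample states, two candidate computing nodes
states = [{"balance": 0.6, "delay": 1.0}, {"balance": 0.3, "delay": 0.8}]
env_step = lambda s_id, st, a: (1 - s_id, states[1 - s_id])
reward_fn = lambda before, after: (before["balance"] - after["balance"]) + (before["delay"] - after["delay"])
print(train_balance_model(states, env_step, reward_fn, n_actions=2))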
4. The method of claim 3, wherein the loss function is:
Figure FDA0002274422590000031
Figure FDA0002274422590000032
wherein L represents the loss function, E[·] represents the mathematical expectation, n represents the number of sets of historical iteration data referenced in each iteration, t represents the time step,
Figure FDA0002274422590000033
represents the priority weight of the n sets of historical iteration data after time t, r_t^(n) represents the sum of the reward values of the n iterations after time t,
Figure FDA0002274422590000034
represents the decay factor applied to the reward value over the n iterations after time t, Q_target represents the target network, Q_eva represents the estimation network, s_t represents the sample state parameters at time t, a_t represents the service placement action at time t, s_{t+n} represents the sample state parameters after n iterations, a' represents the service placement action that maximizes the output of the estimation network, and k represents the iteration index,
Figure FDA0002274422590000035
represents the decay factor applied to the reward of the k-th iteration after time t, and r_{t+k+1} represents the reward value of the k-th iteration after time t.
5. A resource allocation device based on deep reinforcement learning is applied to a control platform of an edge micro-cloud system, wherein the edge micro-cloud system further comprises a plurality of micro-clouds, each micro-cloud comprises a plurality of computing nodes, and the device comprises:
the first determining module is used for determining services of various resources to be allocated contained in an application program request of a user and the allocation priority of each service;
the second determining module is used for determining state parameters of the current edge micro cloud system, wherein the state parameters comprise a resource balance degree evaluation parameter, a response delay evaluation parameter and the resource surplus of each computing node in each micro cloud;
the input module is used for inputting the state parameters into a resource balance optimization model which is trained in advance to obtain a first target computing node of a first service; the first service is the service with the highest priority currently allocated; the resource balance optimization model is completed based on deep reinforcement learning training, wherein a training set of the deep reinforcement learning comprises: sample state parameters of the edge micro-cloud system;
a deployment module to deploy the first service to the first target computing node;
and the updating module is used for updating the state parameters and triggering the input module until each service of the resources to be allocated contained in the application program request completes resource allocation.
6. The apparatus of claim 5, further comprising: a calculating module, configured to calculate the resource balance evaluation parameter based on the following formula:
Figure FDA0002274422590000041
Figure FDA0002274422590000042
Figure FDA0002274422590000043
Figure FDA0002274422590000044
Figure FDA0002274422590000045
wherein RUV_i^j represents the resource utilization variance of the j-th computing node in the i-th cloudlet, D represents the number of types of resources,
Figure FDA0002274422590000046
represents the resource utilization rate of the d-th type of resource in the j-th computing node in the i-th micro cloud,
Figure FDA0002274422590000047
represents the average resource utilization rate over all types of resources in the j-th computing node in the i-th micro cloud, X represents the resource allocation strategy,
Figure FDA0002274422590000048
represents the resource balance rate of the j-th computing node in the i-th cloudlet, RUBD_i represents the resource utilization balance of the i-th cloudlet, L_i represents the total number of computing nodes in the i-th cloudlet, RUBD_Total represents the resource balance evaluation parameter of the edge micro cloud system, and K represents the total number of micro clouds in the edge micro cloud system;
calculating the response delay evaluation parameter based on the following formula:
t_Total = T_Comp(X) + T_TR(X)
wherein t_Total represents the response delay evaluation parameter, T_Comp(X) represents the computation delay, and T_TR(X) represents the transmission delay.
7. The apparatus of claim 5, further comprising: a training module, configured to train the resource balancing optimization model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the sample state parameters into the neural network model to obtain a service placement action; the service placement action represents the target computing node on which a sample service is to be placed;
updating the sample state parameters based on the service placement action to obtain updated sample state parameters;
calculating a reward value of the service placement action based on the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the sample state parameters and the resource balance degree evaluation parameter and the response delay evaluation parameter contained in the updated sample state parameters;
substituting the sample state parameters, the updated sample state parameters, the service placement action and the reward value of the service placement action into a preset loss function, and calculating a loss value of the service placement action;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the updated sample state parameters into the neural network model to obtain a service placement action;
and if so, determining the current neural network model as a resource balance optimization model.
8. The apparatus of claim 7, wherein the loss function is:
Figure FDA0002274422590000051
Figure FDA0002274422590000052
wherein L represents the loss function, E[·] represents the mathematical expectation, n represents the number of sets of historical iteration data referenced in each iteration, t represents the time step,
Figure FDA0002274422590000053
represents the priority weight of the n sets of historical iteration data after time t, r_t^(n) represents the sum of the reward values of the n iterations after time t,
Figure FDA0002274422590000054
represents the decay factor applied to the reward value over the n iterations after time t, Q_target represents the target network, Q_eva represents the estimation network, s_t represents the sample state parameters at time t, a_t represents the service placement action at time t, s_{t+n} represents the sample state parameters after n iterations, a' represents the service placement action that maximizes the output of the estimation network, and k represents the iteration index,
Figure FDA0002274422590000055
represents the decay factor applied to the reward of the k-th iteration after time t, and r_{t+k+1} represents the reward value of the k-th iteration after time t.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 4 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 4.
CN201911117328.0A 2019-11-15 2019-11-15 Resource allocation method and device based on deep reinforcement learning Active CN111444009B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911117328.0A CN111444009B (en) 2019-11-15 2019-11-15 Resource allocation method and device based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111444009A true CN111444009A (en) 2020-07-24
CN111444009B CN111444009B (en) 2022-10-14

Family

ID=71626797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911117328.0A Active CN111444009B (en) 2019-11-15 2019-11-15 Resource allocation method and device based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111444009B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112492651A (en) * 2020-11-23 2021-03-12 中国联合网络通信集团有限公司 Resource scheduling scheme optimization method and device
CN112600906A (en) * 2020-12-09 2021-04-02 中国科学院深圳先进技术研究院 Resource allocation method and device for online scene and electronic equipment
CN112650583A (en) * 2020-12-23 2021-04-13 新智数字科技有限公司 Resource allocation method, device, readable medium and electronic equipment
CN112799817A (en) * 2021-02-02 2021-05-14 中国科学院计算技术研究所 Micro-service resource scheduling system and method
CN112836796A (en) * 2021-01-27 2021-05-25 北京理工大学 Method for super-parameter collaborative optimization of system resources and model in deep learning training
CN112860512A (en) * 2021-01-29 2021-05-28 平安国际智慧城市科技股份有限公司 Interface monitoring optimization method and device, computer equipment and storage medium
CN112866041A (en) * 2021-04-23 2021-05-28 南京蓝洋智能科技有限公司 Adaptive network system and training method
CN112988380A (en) * 2021-02-25 2021-06-18 电子科技大学 Kubernetes-based cluster load adjusting method and storage medium
CN113014649A (en) * 2021-02-26 2021-06-22 济南浪潮高新科技投资发展有限公司 Cloud Internet of things load balancing method, device and equipment based on deep learning
CN113176947A (en) * 2021-05-08 2021-07-27 武汉理工大学 Dynamic task placement method based on delay and cost balance in serverless computing
CN113364831A (en) * 2021-04-27 2021-09-07 国网浙江省电力有限公司电力科学研究院 Multi-domain heterogeneous computing network resource credible cooperation method based on block chain
CN113391907A (en) * 2021-06-25 2021-09-14 中债金科信息技术有限公司 Task placement method, device, equipment and medium
CN113408641A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Method and device for training resource generation model and generating service resources
CN113448425A (en) * 2021-07-19 2021-09-28 哈尔滨工业大学 Dynamic parallel application program energy consumption runtime optimization method and system based on reinforcement learning
CN113691840A (en) * 2021-08-31 2021-11-23 江苏赞奇科技股份有限公司 Video stream control method and system with high availability
CN114116156A (en) * 2021-10-18 2022-03-01 武汉理工大学 Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
CN114301922A (en) * 2020-10-07 2022-04-08 智捷科技股份有限公司 Reverse proxy method with delay perception load balancing and storage device
CN114339311A (en) * 2021-12-09 2022-04-12 北京邮电大学 Video cloud transcoding and distribution joint decision method and system
CN114338504A (en) * 2022-03-15 2022-04-12 武汉烽火凯卓科技有限公司 Micro-service deployment and routing method based on network edge system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874108A (en) * 2016-12-28 2017-06-20 广东工业大学 Thin cloud is minimized in mobile cloud computing use number technology
US20190325304A1 (en) * 2018-04-24 2019-10-24 EMC IP Holding Company LLC Deep Reinforcement Learning for Workflow Optimization
CN110351571A (en) * 2019-07-05 2019-10-18 清华大学 Live video cloud transcoding resource allocation and dispatching method based on deeply study
CN110418416A (en) * 2019-07-26 2019-11-05 东南大学 Resource allocation methods based on multiple agent intensified learning in mobile edge calculations system
CN110445866A (en) * 2019-08-12 2019-11-12 南京工业大学 Task immigration and collaborative load-balancing method in a kind of mobile edge calculations environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIN HU et al.: "A Deep Reinforcement Learning-Based Framework for Dynamic", IEEE COMMUNICATIONS LETTERS *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114301922A (en) * 2020-10-07 2022-04-08 智捷科技股份有限公司 Reverse proxy method with delay perception load balancing and storage device
CN112492651B (en) * 2020-11-23 2023-07-21 中国联合网络通信集团有限公司 Resource scheduling scheme optimization method and device
CN112492651A (en) * 2020-11-23 2021-03-12 中国联合网络通信集团有限公司 Resource scheduling scheme optimization method and device
CN112600906A (en) * 2020-12-09 2021-04-02 中国科学院深圳先进技术研究院 Resource allocation method and device for online scene and electronic equipment
CN112650583A (en) * 2020-12-23 2021-04-13 新智数字科技有限公司 Resource allocation method, device, readable medium and electronic equipment
CN112836796A (en) * 2021-01-27 2021-05-25 北京理工大学 Method for super-parameter collaborative optimization of system resources and model in deep learning training
CN112836796B (en) * 2021-01-27 2022-07-01 北京理工大学 Method for super-parameter collaborative optimization of system resources and model in deep learning training
CN112860512A (en) * 2021-01-29 2021-05-28 平安国际智慧城市科技股份有限公司 Interface monitoring optimization method and device, computer equipment and storage medium
CN112860512B (en) * 2021-01-29 2022-07-15 平安国际智慧城市科技股份有限公司 Interface monitoring optimization method and device, computer equipment and storage medium
CN112799817A (en) * 2021-02-02 2021-05-14 中国科学院计算技术研究所 Micro-service resource scheduling system and method
CN112988380A (en) * 2021-02-25 2021-06-18 电子科技大学 Kubernetes-based cluster load adjusting method and storage medium
CN112988380B (en) * 2021-02-25 2022-06-17 电子科技大学 Kubernetes-based cluster load adjusting method and storage medium
CN113014649A (en) * 2021-02-26 2021-06-22 济南浪潮高新科技投资发展有限公司 Cloud Internet of things load balancing method, device and equipment based on deep learning
CN112866041A (en) * 2021-04-23 2021-05-28 南京蓝洋智能科技有限公司 Adaptive network system and training method
CN113364831A (en) * 2021-04-27 2021-09-07 国网浙江省电力有限公司电力科学研究院 Multi-domain heterogeneous computing network resource credible cooperation method based on block chain
CN113364831B (en) * 2021-04-27 2022-07-19 国网浙江省电力有限公司电力科学研究院 Multi-domain heterogeneous computing network resource credible cooperation method based on block chain
CN113176947A (en) * 2021-05-08 2021-07-27 武汉理工大学 Dynamic task placement method based on delay and cost balance in serverless computing
CN113176947B (en) * 2021-05-08 2024-05-24 武汉理工大学 Dynamic task placement method based on delay and cost balance in server-free calculation
CN113391907A (en) * 2021-06-25 2021-09-14 中债金科信息技术有限公司 Task placement method, device, equipment and medium
CN113408641B (en) * 2021-06-30 2024-04-26 北京百度网讯科技有限公司 Training of resource generation model and generation method and device of service resource
CN113408641A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Method and device for training resource generation model and generating service resources
CN113448425B (en) * 2021-07-19 2022-09-09 哈尔滨工业大学 Dynamic parallel application program energy consumption runtime optimization method and system based on reinforcement learning
CN113448425A (en) * 2021-07-19 2021-09-28 哈尔滨工业大学 Dynamic parallel application program energy consumption runtime optimization method and system based on reinforcement learning
CN113691840A (en) * 2021-08-31 2021-11-23 江苏赞奇科技股份有限公司 Video stream control method and system with high availability
CN114116156A (en) * 2021-10-18 2022-03-01 武汉理工大学 Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
CN114116156B (en) * 2021-10-18 2022-09-09 武汉理工大学 Cloud-edge cooperative double-profit equilibrium taboo reinforcement learning resource allocation method
CN114339311A (en) * 2021-12-09 2022-04-12 北京邮电大学 Video cloud transcoding and distribution joint decision method and system
CN114338504A (en) * 2022-03-15 2022-04-12 武汉烽火凯卓科技有限公司 Micro-service deployment and routing method based on network edge system

Also Published As

Publication number Publication date
CN111444009B (en) 2022-10-14

Similar Documents

Publication Publication Date Title
CN111444009B (en) Resource allocation method and device based on deep reinforcement learning
CN110851529B (en) Calculation power scheduling method and related equipment
CN110163368B (en) Deep learning model training method, device and system based on mixed precision
CN113326126B (en) Task processing method, task scheduling method, device and computer equipment
US10838839B2 (en) Optimizing adaptive monitoring in resource constrained environments
US20190325304A1 (en) Deep Reinforcement Learning for Workflow Optimization
US20190251443A1 (en) Automatically scaling neural networks based on load
CN110413396B (en) Resource scheduling method, device and equipment and readable storage medium
CN109799550B (en) Method and device for predicting rainfall intensity
US20170277620A1 (en) Systems and methods for providing dynamic and real time simulations of matching resources to requests
CN113824489A (en) Satellite network resource dynamic allocation method, system and device based on deep learning
CN112884016A (en) Cloud platform credibility evaluation model training method and cloud platform credibility evaluation method
CN110795217A (en) Task allocation method and system based on resource management platform
CN113485833B (en) Resource prediction method and device
CN113378498A (en) Task allocation method and device
CN112019382B (en) Health assessment method, system and device of cloud computing management platform
CN111813524B (en) Task execution method and device, electronic equipment and storage medium
CN111836274B (en) Service processing method and device
CN113591999A (en) End edge cloud federal learning model training system and method
CN112101394B (en) Provider domain deployment method, device, computing equipment and computer storage medium
US11599793B2 (en) Data integration demand management using artificial intelligence
CN117455660B (en) Financial real-time safety detection system, method, equipment and storage medium
CN113313195B (en) Labeling task processing method, labeling task processing device, labeling task processing equipment, labeling task processing storage medium and labeling task processing program product
US20230064500A1 (en) Optimizing machine learning as-a-service performance for cellular communication systems
US11567809B2 (en) Accelerating large-scale image distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant