CN113822456A - Service combination optimization deployment method based on deep reinforcement learning in a cloud and fog hybrid environment - Google Patents


Info

Publication number
CN113822456A
CN113822456A (application CN202010562269.4A)
Authority
CN
China
Prior art keywords
service
edge
user
request
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010562269.4A
Other languages
Chinese (zh)
Inventor
周峰
吕智慧
吴杰
陈晓伟
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202010562269.4A
Publication of CN113822456A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources to service a request
    • G06F9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F2209/00: Indexing scheme relating to G06F9/00
    • G06F2209/50: Indexing scheme relating to G06F9/50
    • G06F2209/502: Proximity
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical fields of cloud computing, edge computing, and service computing, and relates to a service combination optimized deployment scheme based on deep reinforcement learning in a cloud and fog hybrid environment; in particular, to a dynamic optimized deployment method for application services, based on deep reinforcement learning, in an edge computing scenario. The method saves energy consumption and cost while making efficient use of the system resources of the edge servers.

Description

Service combination optimization deployment method based on deep reinforcement learning in a cloud and fog hybrid environment
Technical Field
The invention belongs to the technical fields of cloud computing, edge computing, and service computing, and relates to a service combination optimized deployment scheme based on deep reinforcement learning in a cloud and fog hybrid environment; in particular, to a dynamic optimized deployment method for application services, based on deep reinforcement learning, primarily in an edge computing scenario.
Background
Service deployment has become a research focus in cloud computing. Existing studies of service deployment in cloud scenarios show that, among cloud service providers in the public cloud market, the service components that support computing are relatively dispersed in terms of cloud platform interfaces, pricing, and platform functions. For data-intensive applications in particular, these service components have data and logic dependencies on one another. The service deployment problem is therefore: how to select appropriate cloud resources for a set of logically integrated service components, choose an optimal deployment strategy within the global search space of deployment schemes, and deploy the components so that both the computing efficiency of each service and the communication efficiency between services are close to optimal.
The service deployment problem is a typical combinatorial optimization problem, and most conventional service deployment schemes are based on heuristic algorithms. Heuristic algorithms are usually inspired by rules observed in nature or by experience summarized from practice; their aim is to find a high-quality solution to a combinatorial optimization problem as quickly as possible at a cost acceptable to users. Common heuristic algorithms include genetic algorithms, ant colony algorithms, and simulated annealing.
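As a background illustration of such heuristics, the following is a minimal simulated-annealing sketch for a toy service-to-server assignment problem; the cost function, loads, and problem size are invented for illustration and are not part of the patent:

```python
import math
import random

def simulated_annealing(cost, neighbor, init, t0=1.0, cooling=0.95, steps=200):
    """Generic simulated-annealing loop: always accept improvements, and
    accept worse solutions with a probability that shrinks as the
    temperature cools."""
    current, best = init, init
    t = t0
    for _ in range(steps):
        cand = neighbor(current)
        delta = cost(cand) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = cand
            if cost(current) < cost(best):
                best = current
        t *= cooling
    return best

# Toy deployment problem: assign 4 services to 2 servers,
# minimizing the load imbalance between the servers.
loads = [5, 3, 2, 4]

def cost(assign):
    s0 = sum(l for l, a in zip(loads, assign) if a == 0)
    s1 = sum(l for l, a in zip(loads, assign) if a == 1)
    return abs(s0 - s1)

def neighbor(assign):
    a = list(assign)
    i = random.randrange(len(a))
    a[i] = 1 - a[i]  # move one service to the other server
    return tuple(a)

random.seed(0)
best = simulated_annealing(cost, neighbor, (0, 0, 0, 0))
print(best, cost(best))
```

As the passage notes, such heuristics find a high-quality (not necessarily optimal) assignment quickly, which is why they were the conventional choice before learning-based approaches.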
With the development of 5G technology, the number of mobile and Internet-of-Things devices accessing the network edge keeps increasing, and these devices rely on the functions provided by services deployed on edge servers. However, the number of connected devices such as mobile phones changes regularly over time and with user movement. Service deployment schemes based on heuristic algorithms allocate fixed resource quotas to the services on the edge servers, or pin the services in place, which makes it difficult to fully exploit edge server performance and leads to low resource utilization.
Based on this state of the art, the inventors propose a globally better deployment strategy for service combinations formed by logically dependent service components, which effectively reduces the average response time of user services and the energy consumption of the edge servers; in particular, a service combination optimized deployment scheme based on deep reinforcement learning in a cloud and fog hybrid environment.
Disclosure of Invention
Based on the current state of the art, the invention aims to provide a globally better service deployment strategy for service combinations formed by the logical dependence of multiple service components under a cloud and fog hybrid architecture; in particular, a service combination optimized deployment scheme based on deep reinforcement learning in a cloud and fog hybrid architecture environment, and especially a dynamic optimized deployment method for application services, based on deep reinforcement learning, primarily in an edge computing scenario. The optimization problem is solved with the deep reinforcement learning DDPG (Deep Deterministic Policy Gradient) algorithm, which effectively reduces the average response time of user services and the energy consumption of the edge servers.
The invention provides a dynamic optimized deployment method for application services, based on deep reinforcement learning, in an edge computing scenario. An intelligent decision agent is added between the users and the services running on each edge node. The agent integrates the application services currently running on each edge node and, from the resource conditions and user request volume in each time period, predicts the access volume of each service in the next time period; it then adjusts the resource usage of each service on the edge nodes, performing dynamic service deployment of the service set on each edge node. This helps the service provider formulate an optimized deployment strategy for each time period, improving service quality and saving energy and cost while making efficient use of the edge servers' system resources.
Specifically, the invention centers on an overall design framework, which is elaborated with each part described and designed in detail. The framework includes: entity models for the cloud service entities and the network edge node entities in fog computing, expressing the configuration of their application specifications, together with a cloud and fog hybrid service scenario.
In the invention, the problem of dynamic optimized deployment of application services under a hybrid cloud computing and fog computing architecture is modeled on the basis of a deep reinforcement learning algorithm; the constraint conditions and objective functions of optimized service deployment on edge nodes are given, and a dynamic optimization strategy is applied to dependent service combinations in this scenario. By designing an effective service combination optimized deployment scheme, the invention improves service quality in the cloud and fog hybrid scenario, finds a better balance point between service quality and system energy consumption for the service provider, and saves cost.
More specifically, the framework of the invention includes: for the complex and changeable cloud and fog hybrid scenarios found in practice, a mathematical model that is easy to describe yet close to the real environment, and a system implementing a dynamic service deployment strategy in the cloud and fog hybrid scenario. The technical scheme of the invention comprises the following aspects:
in a first aspect: the mathematical modeling of the dynamic service deployment in the cloud and fog mixed configuration scene comprises the problem assumption of the dynamic service deployment of the mobile edge computing and a service deployment strategy system optimization target calculation formula;
the problem assumptions for dynamic service deployment in mobile edge computing make the service deployment strategy system closer to reality; they consist of a service request independence assumption, a service request type assumption, a service request message length assumption, and a data transmission delay assumption;
the service deployment strategy system optimization target formulas formally represent the optimization targets of the service deployment strategy system; they consist of a user-request average response time formula and a server energy consumption formula.
In a second aspect: the system for the dynamic service deployment strategy in the cloud and fog hybrid scenario comprises a parameter configuration module, a service request module, an agent module, an experience sampling module, an actor neural network module, and a critic neural network module;
the parameter configuration module consists of a user service request parameter submodule, an edge server parameter submodule, a transmission network bandwidth parameter submodule and a DDPG algorithm hyper-parameter submodule and is respectively used for setting user service request related parameters, edge server related parameters, transmission network bandwidth parameters and DDPG algorithm hyper-parameters;
the service request module is used for preprocessing the task request sent by the mobile user to the edge server in each time, obtaining the distribution situation of the types and the quantity of the user service requests in each time period, splitting the user service requests into sub-services and storing the logic dependency relationship among the sub-services.
the agent module performs the iterative computation of deep reinforcement learning to obtain an agent yielding the optimal edge service deployment solution; the agent is the processing module of the deep reinforcement learning algorithm and coordinates and schedules the other modules;
the experience sampling module collects the (state, action, reward, next state) quadruples generated during the iterations of the deep reinforcement learning algorithm according to a sampling strategy; the collected quadruple data are used to train the actor and critic neural networks;
the actor neural network module learns the policy neural network and finds a suitable service deployment scheme, optimizing the average latency of service requests and the energy consumption of the edge servers;
the critic neural network module learns the value-estimation neural network and evaluates the quality of the corresponding service deployment scheme under the current service request distribution.
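The experience sampling module described above corresponds to what the deep-RL literature calls a replay buffer. A minimal sketch (illustrative only, not the patent's implementation) of storing and sampling the (state, action, reward, next state) quadruples:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state) quadruples produced during
    the deep-RL iterations and serves random minibatches for training the
    actor and critic networks."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)

# Toy usage with scalar placeholders for the state/action matrices.
buf = ReplayBuffer(capacity=100)
for step in range(5):
    buf.push(state=step, action=step % 2, reward=-step, next_state=step + 1)
states, actions, rewards, next_states = buf.sample(3)
print(len(buf), len(states))
```

In the real system the states and actions would be the N × M deployment matrices described later in the specification.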
In the embodiment of the invention, under an architecture in which cloud computing and fog computing coexist and multiple edge computing nodes exist simultaneously, the method predicts and selects deployment schemes for the specific service applications on the edge nodes and, at the same time, adjusts and allocates the resources those applications need. The predicted initial service deployment scheme and the resource allocation adjustment strategy are produced by a resource decision agent in the system, whose core prediction algorithm is implemented with deep reinforcement learning DDPG (Deep Deterministic Policy Gradient). The application optimized deployment system under the hybrid cloud and fog computing architecture mainly focuses on how to automatically and optimally deploy service applications in the edge servers located at the edge nodes of the fog computing scenario. The intended user of the scheme is mainly the cloud service provider: the invention meets the user service deployment and quality-of-service requirements while using the limited edge server resources as efficiently as possible and reducing server energy consumption in each edge node, thereby macroscopically reducing the service provider's capital investment while maintaining acceptable application performance. The specific implementation process is as follows:
(1) Design of an edge node service deployment prediction system based on deep reinforcement learning DDPG (Deep Deterministic Policy Gradient).
To predict and select the optimal deployment scheme for service combinations across multiple edge nodes, the invention implements a general, fully functional intermediate agent architecture on each edge server. The architecture periodically receives data from the collectors (cAdvisors) on each edge server, which report the average resource utilization and state over the previous time period. The intermediate agent integrates, cleans, and processes the input data collected on each edge server into an input data set, which is then fed to the intelligent decision agent. The decision agent computes the deployment strategy and resource allocation for each service application in the next time period, and the services on each edge server then adjust their deployment and resources accordingly.
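A hedged sketch of the integration step performed by the intermediate agent: aggregating per-server collector reports into a fixed-shape input for the decision agent. The field names, report format, and sizes below are illustrative assumptions, not the patent's actual data format:

```python
# Hypothetical per-server reports from the collectors (all names and
# values are illustrative placeholders).
reports = [
    {"server": "edge-0", "cpu_avg": 0.62, "requests": {"n1": 120, "n2": 40}},
    {"server": "edge-1", "cpu_avg": 0.35, "requests": {"n1": 15, "n2": 90}},
]

def build_agent_input(reports, services):
    """Integrate and clean the per-period collector data into a fixed-shape
    input: one row per edge server, columns = [cpu_avg, req(service_1), ...].
    Services a server did not see default to zero requests."""
    rows = []
    for r in reports:
        row = [r["cpu_avg"]] + [r["requests"].get(s, 0) for s in services]
        rows.append(row)
    return rows

matrix = build_agent_input(reports, services=["n1", "n2"])
print(matrix)
```

The resulting matrix plays the role of the "input data set" fed to the intelligent decision agent each period.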
(2) A mathematical model for the system's dynamic service deployment.
The total server resources of an edge node are small and the number of service applications it can carry is limited, but the edge node is close to the various mobile and smart devices, so services can respond to requests better and faster, i.e., with low latency. In an edge computing scenario the setting is complex and many factors influence service deployment; therefore, before the deep reinforcement learning algorithm can predict deployment strategies, the whole edge computing service deployment scenario must be defined and modeled, the influencing factors in the physical scenario quantized, model entities established for the algorithm, and the data formats defined and processed. This mainly solves the problems of measuring and defining the algorithm parameters in the scenario. Establishing a mathematical model of dynamic service deployment lets deep reinforcement learning be applied effectively to the dynamic service deployment problem in the cloud and fog hybrid scenario; since the algorithm's input data come from real-time monitoring data collected on the cloud service provider's operation-and-maintenance edge servers, the system transfers well to a real environment.
(3) providing a service combination deployment model in an edge computing environment
Real application scenarios based on edge computing are at an early stage of development. Massive user demand has made service application combinations grow explosively, so the service pressure borne by cloud data centers is enormous, service response latency grows, and user experience degrades; deploying part of the service combinations on edge nodes close to the users is therefore the clear trend of both demand and technology. Moreover, in the edge computing scenario, edge server resources are limited, and service quality must still be guaranteed while deploying service combinations, which greatly increases the difficulty of deployment; at present there is no good solution to the service deployment problems in the edge scenario.
First, each edge node is composed of a small number of servers, and the resources on these edge servers are limited, so the number of service combinations that can be carried is bounded and the resources that can be allocated to the service components within a combination are also limited. At the initial point of a service combination's life cycle, each combination can be deployed on the edge nodes according to user requests. Traditionally a static deployment scheme is used: once the combination is initially deployed, the location of each service on the servers and its resource quota are fixed, and neither the deployment position nor the resource limit of the whole service can change for the rest of the service life cycle. However, user requests for the service combinations on each edge node change with user movement, time, environmental factors, and other conditions. If the deployment positions and resource quotas are never adjusted to actual conditions, the services can hardly use system resources efficiently and provide high-quality service at all times. If, instead, the whole service combination deployment strategy for the next time period is adjusted according to the service running state and request conditions of the previous period, and the resources allocated to services are adjusted accordingly, the problem of low resource utilization can be solved and service quality improved; this is dynamic service deployment.
In general, a cloud service provider states the specific requirements of the services it must offer according to its business attributes: for example, how many edge server nodes it has on which to deploy the services it provides to actual users, where each edge server may host multiple or single service components of a service combination. How to deploy the multiple service components reasonably onto the edge nodes, and how to seek a dynamic balance between server energy consumption and service quality, are the problems the service provider cares about. The invention therefore designs a unified intermediate agent system at the edge node layer: an edge node service deployment prediction system developed on the basis of deep reinforcement learning DDPG, i.e., the intelligent decision agent. A data acquisition module runs on each edge node and periodically sends the request conditions and resource usage of each service component on that node during the previous time period to the intelligent decision agent. The service provider sets different parameters for the agent according to the time period; the agent combines the configured parameters with the collected data, processes them into input data, feeds the input to the algorithm module to compute the recommended service deployment strategy for the next time period, and then applies that strategy to the service deployment of each edge node.
In the invention, as noted above, the scene parameters in an edge computing scenario are complex and hard to quantify; but to apply a deep reinforcement learning algorithm to the problem, the key and usable parameters must be screened out of the numerous scene parameters. The scenario therefore needs to be defined and modeled mathematically; after modeling, the parameters are quantified within bounds, finally yielding input data suitable for the intelligent decision agent.
In the invention, to study the dynamic service deployment problem, the complex and changeable real-world cloud and fog hybrid scenario must be abstracted into a mathematical model that is easy to describe yet close to the real environment. In converting the problem definition into a mathematical model, and to keep the model close to the real scenario, some predefined problem conditions are needed; the problem is developed mainly on the basis of the following four assumptions:
Assumption one: requests from different users in different time periods are mutually independent; that is, one user's request has no connection with another user's request in the current time period.
Assumption two: there are m types of user service requests, and the m different request types are formed by arranging and combining n sub-services; that is, there are m request types in total, each user request consists of several of the n sub-services, and there are logical dependencies and execution-order relations among these sub-services. For example, user request A can be split into the sub-service chain n1 -> n2 -> n3: request A is first computed by service n1; the result of n1 is the input of service n2; the result of n2 is the input of service n3; and finally the result computed by n3 is returned to the user.
Assumption three: the message lengths at each stage of a user request are approximately equal; that is, the request initiated by the user at the edge, the forwarding of the user request among the sub-services, and the calculation result finally returned to the edge are all considered to have approximately equal message lengths.
Assumption four: the transmission delay per unit length of a data packet is fixed and equal throughout data transmission; it is assumed that the unit transmission speeds of data packets between the user's mobile device and the edge server are equal, the unit transmission speeds between the edge server and the cloud center are also equal, and the transmission delay between edge servers is zero.
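Assumption two, in which a request type is a chain of logically dependent sub-services, can be illustrated with a small sketch; the sub-service handlers below are toy stand-ins for real application components:

```python
# Toy sub-service handlers: each consumes the previous sub-service's output.
sub_services = {
    "n1": lambda x: x + 1,
    "n2": lambda x: x * 2,
    "n3": lambda x: x - 3,
}

# Request type A splits into the chain n1 -> n2 -> n3 (assumption two).
request_types = {"A": ["n1", "n2", "n3"]}

def handle_request(req_type, payload):
    result = payload
    for name in request_types[req_type]:
        result = sub_services[name](result)  # output feeds the next sub-service
    return result

print(handle_request("A", 10))  # n3(n2(n1(10))) = ((10 + 1) * 2) - 3 = 19
```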
In the invention, in edge dynamic service deployment the optimization target has two aspects: the average response time of user requests and the energy consumption of the server side. The invention dynamically adjusts the proportion of edge-server-side service resources according to the distribution of user service request volumes and types in different time periods, so as to minimize the average user response time and the server energy consumption under the current request load.
In the invention, since the computing power of an edge server is much weaker than that of a large cloud computing center, dynamic service deployment is carried out according to the number of user requests in each time period, in order to make full use of edge server computing resources and reduce the average response time of user requests and the server energy consumption as much as possible. The distribution of mobile edge users' requests differs across time periods, and dynamically deploying mobile edge computing services for the differing request distributions of each period can satisfy both low interactive response times for user requests and low energy consumption at the edge server side. The criteria for evaluating the quality of a mobile edge computing dynamic service deployment scheme are the average response time of user requests and the server energy consumption.
In the invention, the average response time of user service requests in a given time period is calculated as
T_avg = (1/n) * Σ_{i=1}^{n} t_i
where t_i denotes the response time of a single user service request during the time period and n denotes the total number of user requests during the time period.
The response time of a single user service request is the time consumed over the whole service request, from the moment the user initiates the request until the user obtains the calculation result. Following the processing stages of a service request, it can be divided into three parts: the service request initiation transmission time, the service request calculation time, and the service request return time.
T = T_in + T_process + T_out
T_in is the service request initiation transmission time: the time for the mobile edge user's request to be transmitted from the edge device to the designated edge server. In the request initiation process, the mobile user's edge device sends a service request R of data size D_in to the edge server over a transmission network of bandwidth V_in. The service request initiation transmission time is calculated as
T_in = D_in / V_in
T_process is the service request calculation time: the time for the mobile edge computing user request to be split, within the edge server network, into sub-services with logical dependencies according to the request type, plus the forwarding operations between edge servers. In the calculation process, the edge service cluster receives the request R of data size D_in and splits it into n sub-services with logical dependencies according to its type; each sub-service is then forwarded and computed among the edge servers according to the current service placement on each edge computing server, yielding the final result. The user service request calculation time is composed of the calculation times of the sub-services, where t_process_i is the time the i-th sub-service needs to run on its edge server. Assuming the information transmission delay between edge servers is zero, the service request calculation time is
T_process = Σ_{i=1}^{n} t_process_i
T_out is the service request return time: the time for the processed result of the mobile edge computing user request to be returned by the edge server of the last sub-service to the user's edge device. In the return process, the data of size D_out produced by the edge server's computation are returned to the mobile user's edge device over a network of bandwidth V_out. The service request return time is
T_out = D_out / V_out
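The three-part decomposition T = T_in + T_process + T_out can be computed directly. The sketch below uses illustrative numbers and, per assumption four, treats the inter-server transmission delay as zero:

```python
def response_time(d_in, v_in, sub_service_times, d_out, v_out):
    """T = T_in + T_process + T_out, per the three-part decomposition."""
    t_in = d_in / v_in                  # request upload: D_in / V_in
    t_process = sum(sub_service_times)  # sum of per-sub-service compute times
    t_out = d_out / v_out               # result download: D_out / V_out
    return t_in + t_process + t_out

def average_response_time(requests):
    """Mean of single-request response times over a time period (T_avg)."""
    times = [response_time(**r) for r in requests]
    return sum(times) / len(times)

# Two illustrative requests in one time period.
reqs = [
    dict(d_in=8.0, v_in=4.0, sub_service_times=[0.5, 0.3], d_out=2.0, v_out=2.0),
    dict(d_in=8.0, v_in=4.0, sub_service_times=[0.2], d_out=2.0, v_out=2.0),
]
print(average_response_time(reqs))  # (3.8 + 3.2) / 2 = 3.5
```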
The linear power model based on CPU utilization is the server energy consumption model most widely used in data-center energy research, so edge server energy consumption is computed here with a CPU-utilization power model, which can accurately track the power usage of a server. Let u be the CPU utilization of the server at the current moment, P_max the average power when the server CPU is fully utilized, P_idle the average power when the server CPU is idle, T_u the running time of the server at CPU utilization u, and E the total energy consumption of the server. The energy consumption of any server on the edge can then be expressed as follows:

E = ((P_max − P_idle) * u + P_idle) * T_u
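The linear power model can be evaluated directly; a small sketch, where the wattage figures are illustrative assumptions rather than values from the patent:

```python
def server_energy(u, p_max, p_idle, t_u):
    """E = ((P_max - P_idle) * u + P_idle) * T_u, the linear CPU-utilization model."""
    return ((p_max - p_idle) * u + p_idle) * t_u

# 60% utilization for 3600 s on a server drawing 100 W idle and 250 W at full load:
print(server_energy(0.6, 250.0, 100.0, 3600.0))  # 684000.0 joules
```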
in the invention, once the prediction system has its input parameters, the next question is how to relate the scenario parameters to the service deployment problem: the service combination deployment model in the edge computing environment must be modeled, and the representation of a deployment strategy inside the algorithm must be quantified;
the problem is solved by the following steps:
for the edge service deployment prediction system environment, resource allocation on multiple edge servers is adjusted simultaneously. The adjustment can be represented by an N × M matrix whose elements give the proportion of service resources each of the N edge servers adjusts for each of the M edge services; A_nm denotes the resource adjustment the n-th edge server applies to the m-th service, i.e. the n-th edge server newly adds or removes A_nm of resources on the m-th service;
the edge computing service deployment state describes the service deployment of mobile edge computing in the current time slice during the iterations of the deep reinforcement learning algorithm. The state is the percentage of resources allocated to the M services on the N edge servers at a given moment, and can be represented by an N × M matrix whose elements give the proportion of service resources each of the N edge servers allocates to each of the M service deployments; S_nm indicates that the n-th edge server has allocated S_nm of its service resources to the m-th service;
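The state and action matrices can be sketched with NumPy as follows; the matrix sizes and the adjustment values are illustrative assumptions:

```python
import numpy as np

N, M = 4, 3  # illustrative: 4 edge servers, 3 edge services

# State S: S[n, m] is the fraction of server n's resources currently
# allocated to service m; each row sums to 1 (a fully allocated server).
S = np.full((N, M), 1.0 / M)

# Action A: A[n, m] is the adjustment server n applies to service m
# (positive = allocate more, negative = release). Rows sum to zero, so the
# action redistributes resources rather than creating new capacity.
A = np.tile([0.10, -0.05, -0.05], (N, 1))

S_next = np.clip(S + A, 0.0, 1.0)
print(S_next.round(3))
```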
the whole service deployment strategy prediction system also defines the feedback reward that the agent in the algorithm receives for the edge computing service deployment adjustment action taken at the current moment under the current deployment state. The edge computing service deployment reward is the reward the agent obtains when, at some point in the algorithm's iterations, a particular deployment action is selected to adjust the edge service deployment state. The agent computes the average user request response time and the total server energy consumption of the current period from the mobile edge computing deployment state after the current action is applied, and feeds back the corresponding incentive by comparing these values with the average response time and total server energy consumption of the previous iteration. The incentive feedback for the four possible cases is as follows:
1. If T_{t+1} < T_t and En_{t+1} < En_t, then Reward = β * Reward with β = 2: the next average user response time is lower than the current one and the next total server energy consumption is lower than the current one, so the agent gives 2× positive feedback.
2. If T_{t+1} < T_t and En_{t+1} > En_t, then Reward = β * Reward with β = 1: the next average user response time is lower but the next total server energy consumption is higher, so the agent gives 1× positive feedback.
3. If T_{t+1} > T_t and En_{t+1} < En_t, then Reward = β * Reward with β = 0.5: the next average user response time is higher but the next total server energy consumption is lower, so the agent gives 0.5× positive feedback.
4. If T_{t+1} > T_t and En_{t+1} > En_t, then Reward = β * Reward with β = −1: both the next average user response time and the next total server energy consumption are higher, so the agent gives 1× negative feedback.
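The four feedback cases can be captured by a small scaling function — a hedged sketch; the helper name and the strict inequalities are assumptions:

```python
def reward_scale(t_next, t_cur, en_next, en_cur):
    """beta for the four cases: 2 (both improve), 1 (only response time improves),
    0.5 (only energy improves), -1 (both degrade)."""
    if t_next < t_cur:
        return 2.0 if en_next < en_cur else 1.0
    return 0.5 if en_next < en_cur else -1.0

print(reward_scale(0.8, 1.0, 90, 100))   # both improved  -> 2.0
print(reward_scale(1.2, 1.0, 110, 100))  # both got worse -> -1.0
```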
In the present invention, under the cloud-fog hybrid configuration scenario, the many influencing factors and the increasingly broad market conditions make this a complex decision problem with a large problem space. For this edge computing environment, the invention designs an intelligent decision Agent system based on deep reinforcement learning that can predict the deployment strategy of the edge-side service combination for the next stage. The core of the system is the agent module, and the design of the algorithm inside the agent module follows these steps:
First, two neural networks are established: the evaluation network Q(s, a | θ^Q) and the policy network μ(s | θ^μ). The parameters of the online evaluation network and the online policy network are initialized to random values, θ^Q = RandomInit(Q), θ^μ = RandomInit(μ). The parameters of the two online networks are then copied to the corresponding target network parameters, θ^Q′ = θ^Q, θ^μ′ = θ^μ, giving two identical networks Q′ and μ′ that are used for soft updates. Finally, the memory buffer size is set to m and the replay buffer is initialized to empty, R = ∅.
For epoch = 1 to the maximum number of rounds do:
    Initialize the state of the deep reinforcement learning environment, loading the number and type distributions of user service requests in each time period: s_1 = Env.reset()
    For t = 1 to the maximum number of steps per round do:
        1. The policy network obtains an edge service deployment adjustment action a_t according to the behavior policy: a_t = μ(s_t | θ^μ)
        2. Exploration noise is added to the current action to obtain the new edge service deployment adjustment action: a_t = a_t + Norm(ExploreNoise)
        3. Interact with the environment, obtaining the next state and the reward from the two indices of average user request response time and server energy consumption: s_{t+1}, r_{t+1} = Env.step(a_t)
        4. Save the state, action, reward and next state into the replay memory: R.save(s_t, a_t, r_t, s_{t+1})
        5. Sample n transitions from the memory: R^(n) = R.sample(n)
        6. Estimate the target value with the target networks: y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ^μ′) | θ^Q′)
        7. Compute the losses of the evaluation network and the policy network, i.e. the difference between the estimates and the sampled memory:
            L_Q = (1/n) ∑_i (y_i − Q(s_i, a_i | θ^Q))²
            ∇_{θ^μ} L_μ ≈ (1/n) ∑_i ∇_a Q(s, a | θ^Q)|_{s = s_i, a = μ(s_i)} ∇_{θ^μ} μ(s | θ^μ)|_{s = s_i}
        8. Update the parameters according to the gradients: Q.update(L_Q), μ.update(L_μ)
        Soft-update the parameters of the target evaluation network: θ^Q′ = τ θ^Q + (1 − τ) θ^Q′
        Soft-update the parameters of the target policy network: θ^μ′ = τ θ^μ + (1 − τ) θ^μ′
Save(μ′) — output the target action (policy) network
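The target estimate of step 6 and the soft updates at the end of each step can be sketched numerically. This is a minimal illustration with toy parameter arrays; the γ and τ values are assumptions:

```python
import numpy as np

def td_targets(rewards, next_q, gamma=0.99):
    """Step 6: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    return rewards + gamma * next_q

def soft_update(target_params, online_params, tau=0.05):
    """theta' <- tau * theta + (1 - tau) * theta', per parameter array."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]

y = td_targets(np.array([1.0, 0.5]), np.array([2.0, 4.0]))
print(y)  # [2.98 4.46]

# repeated soft updates make the target network slowly track the online network
rng = np.random.default_rng(0)
online = [rng.standard_normal((2, 2))]
target = [np.zeros((2, 2))]
for _ in range(200):
    target = soft_update(target, online)
print(np.abs(target[0] - online[0]).max() < 1e-3)  # True
```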
Through the algorithm flow designed above, the agent module can effectively predict and suggest the service combination strategy on each edge server for the next stage.
The invention has the advantages that:
the DDPG-based user service request deployment algorithm of the dynamic service deployment strategy system in the cloud-fog hybrid scenario can effectively reduce the average user service request response time and the edge server energy consumption. The system trains a policy network and a value network, so that the agent can derive a better, expectation-meeting mobile edge computing service deployment strategy from the user service requests of the current time period.
Drawings
FIG. 1 is a schematic diagram of mobile edge computing.
Fig. 2 is a module level diagram of a service deployment policy system according to an embodiment of the present invention.
Fig. 3 is a flow chart of a user service request.
FIG. 4 is a DAG diagram of a service request processing task flow.
Fig. 5 is a diagram illustrating the distribution of the number of service requests per hour according to the embodiment.
FIG. 6 is a diagram illustrating the distribution of different types of user service requests according to an embodiment.
FIG. 7 is a graph of Q_Loss values during the iterative neural network training process.
FIG. 8 is a graph comparing average response times of different algorithms.
FIG. 9 is a graph comparing energy consumption of edge servers for different algorithms.
Fig. 10 is a general structural view of the present invention.
Detailed Description
In order to clarify the technical problems to be solved and the technical solutions proposed by the present invention, and to highlight its advantages for edge dynamic service deployment under a cloud-fog hybrid architecture, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The dynamic service deployment system mainly faces the mobile edge computing scenario under a cloud-fog hybrid architecture; the structure of the scenario is shown in fig. 1, and the system is developed on the basis of the following four assumptions.
Assume one: the requests of different users in different time periods are independent of each other, i.e. one user request is not linked to another user request in the current time period.
Assume two: there are m user service request types, and the m different user requests are formed as permutations and combinations of n sub-services; that is, there are m request types in total, each user request is composed of several of the n sub-services, and the sub-services have logical dependencies and an execution order. For example, user request A can be split into the sub-services n1 -> n2 -> n3: request A is first computed by service n1, the result of n1 is taken as the input of service n2, the result of n2 is taken as the input of service n3, and finally n3 computes the result and returns it to the user.
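The sequential sub-service chain in this assumption can be sketched as a simple pipeline; the stage functions below are illustrative stand-ins for real sub-services:

```python
def run_chain(request, sub_services):
    """Execute sub-services in order: each result feeds the next (n1 -> n2 -> n3)."""
    result = request
    for service in sub_services:
        result = service(result)
    return result

# toy stages standing in for n1 -> n2 -> n3
n1 = lambda x: x + 1
n2 = lambda x: x * 2
n3 = lambda x: x - 5
print(run_chain(3, [n1, n2, n3]))  # ((3 + 1) * 2) - 5 = 3
```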
Assume three: the message lengths in every stage of a user request are approximately equal; that is, whether the user initiates the request at the edge, the request is forwarded and computed among the sub-services, or the computation result is finally returned to the edge, the message lengths are considered approximately equal.
Assume four: the transmission time delay of the unit length data message in the data transmission process is fixed and equal. It is assumed that the unit transmission speeds of the data packets between the user mobile device and the edge server are equal, the unit transmission speeds of the data packets between the edge server and the cloud center are also equal, and the transmission delay between the edge server and the edge server is equal to zero.
In the dynamic service deployment system for the cloud-fog hybrid scenario, the optimization target has two aspects: the average response time of user requests and the energy consumption on the server side. The invention dynamically adjusts the ratio of service resources on the edge servers according to the distribution of user service request numbers and types in different time periods, so as to minimize the average user response time and the server energy consumption under the current request load.

The request distribution of mobile edge users differs across time periods, and dynamically deploying mobile edge computing services for these different distributions can satisfy the requirements of lower interactive response time for user requests and lower edge server energy consumption. The criteria for evaluating the quality of a mobile edge computing dynamic service deployment scheme are the average response time of user requests and the server energy consumption; these two evaluation indices are formally defined and explained below.
For the calculation of the average response time of the user service request in a certain time period, a flow chart of the user service request is shown in fig. 3, and a calculation formula is shown in formula 1.
T_avg = (1/n) ∑_{i=1}^{n} T_i (1)

T_i represents the response time of a single user service request during that time period, and n represents the total number of user requests during that period.
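Formula 1 amounts to a simple mean over the period's requests; a one-line sketch:

```python
def average_response_time(response_times):
    """Formula 1: T_avg = (1/n) * sum of T_i over the n requests in the period."""
    return sum(response_times) / len(response_times)

print(average_response_time([1.2, 0.8, 1.0]))  # 1.0
```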
The response time of a single user service request is the time consumed from the moment the user initiates the request until the computation result of the whole service request is obtained. According to the processing stages of a service request, it can be divided into three parts: the service request initiation transmission time, the service request computation time, and the service request return time. The response time of a single user service request is computed by formula 2:
T = T_in + T_process + T_out (2)
(1) T_in is the service request initiation transmission time, representing the time for the mobile edge user's request to be sent from the edge terminal device to the designated edge server. In the initiation process, the mobile user's edge device sends a service request of data size D_in, transmitting the request R to the edge server over a network whose transmission bandwidth is V_in. The initiation transmission time is computed by formula 3:

T_in = D_in / V_in (3)
(2) T_process is the service request computation time, representing the time during which the mobile edge user's request is split, within the edge server network, into sub-services with logical dependencies according to the request type and forwarded between edge servers. In the computation process, the edge service cluster receives the request R of data size D_in, splits it into n logically dependent sub-services according to its type, and each sub-service is then forwarded and computed across the edge servers according to the services deployed on each current edge computing server to obtain the final result. The user service request computation time is the sum of the sub-service computation times t_process_i, where t_process_i is the time the i-th sub-service takes to run on its edge server. Assuming that the information transmission delay between edge servers is zero, the computation time is given by formula 4:

T_process = ∑_{i=1}^{n} t_process_i (4)
(3) T_out is the service request return time, representing the time for the processed result of the mobile edge request to be returned to the user's edge device by the edge server that executed the last sub-service. In the return process, the result of data size D_out produced by the edge server computation is returned to the mobile user's edge device over a network whose bandwidth is V_out. The return time is expressed by formula 5:

T_out = D_out / V_out (5)
The linear power model based on CPU utilization is the server energy consumption model most widely used in data-center energy research, so edge server energy consumption is computed with a CPU-utilization power model, which can accurately track the power usage of a server. Let u be the CPU utilization of the server at the current moment, P_max the average power when the server CPU is fully utilized, P_idle the average power when the server CPU is idle, T_u the running time of the server at CPU utilization u, and E the total energy consumption of the server. The energy consumption of any server on the edge is given by formula 6:

E = ((P_max − P_idle) * u + P_idle) * T_u (6)
according to the method, the dynamic service deployment strategy system under the cloud-fog hybrid scenario provides a scheme for matching service resources on the edge servers, taking the average user service request response time and the server energy consumption formulas as the optimization targets. The strategy system mainly comprises a parameter configuration module, a service request module, an agent module, an experience sampling module, an actor neural network module and a critic neural network module; the framework of the strategy system is shown in FIG. 2;
the parameter configuration module consists of a user service request parameter submodule, an edge server parameter submodule, a transmission network bandwidth parameter submodule and a DDPG algorithm hyper-parameter submodule, which respectively set the user service request parameters, the edge server parameters, the transmission network bandwidth parameters and the DDPG hyper-parameters. The user service request parameter submodule covers the number of service types available to a user, the input data volume required by each service component's computation, the output data volume of each service component, the average number of instructions a service executes, the other resource demand of a service component while executing a computation task, and the maximum response time threshold of each service component. The edge server parameter submodule covers the number of edge servers, the resource capacity of the edge servers, the CPU cycle frequency of the cloud server, and the average number of CPU cycles the server needs per instruction. The transmission network bandwidth parameter submodule covers the network transmission bandwidth between the mobile devices and the edge servers, and between the edge servers and the cloud server. The DDPG hyper-parameter submodule covers the neural network optimizer type, the activation function type, the learning rates of the actor and critic networks, the number of iterations, the noise distribution of the model's output actions, the return discount factor, the soft-update parameter, the capacity of the replay buffer, and the mini-batch size;
the service request module preprocesses the task requests sent by mobile users to the edge servers, obtains the type and number distributions of user service requests in each time period, splits a user service request into sub-services, and stores the logical dependencies among them. Most user service requests consist of several logically dependent sub-jobs and can be represented by a DAG (directed acyclic graph); each job can consist of several sub-work tasks, and the job is complete only when all of its sub-work tasks are complete. That is, if sub-work task-2 depends on sub-work task-1, any instance of sub-work task-2 must be processed after sub-work task-1 completes. The task DAG within a job can be inferred from the name field of the user request category, e.g. M1_2_4 means that sub-work task-2 depends on the completion of sub-work task-1, and sub-work task-4 then depends on the completion of sub-work task-2; the DAG of the work task flow is shown in fig. 4.
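Parsing the dependency chain out of a category name like "M1_2_4" can be sketched as follows; the function name is an illustrative assumption:

```python
def parse_task_chain(task_name):
    """Turn 'M1_2_4' into the ordered chain task-1 -> task-2 -> task-4,
    plus a map from each sub-work task to the task it depends on."""
    ids = task_name.lstrip("M").split("_")
    chain = [f"task-{i}" for i in ids]
    depends_on = {chain[k]: chain[k - 1] for k in range(1, len(chain))}
    return chain, depends_on

chain, deps = parse_task_chain("M1_2_4")
print(chain)  # ['task-1', 'task-2', 'task-4']
print(deps)   # {'task-2': 'task-1', 'task-4': 'task-2'}
```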
The agent module performs the iterative computation of deep reinforcement learning to obtain an agent yielding the optimal edge service deployment. The agent is the processing module of the deep reinforcement learning algorithm and coordinates and schedules the other modules; it solves the problem of how to deploy service resources in the edge service deployment environment to obtain the minimum average request delay and minimum energy consumption. Through cyclic iteration, the agent module communicates with the experience sampling module, the actor neural network module and the critic neural network module to train the actor and critic neural networks.
The experience sampling module collects the (state, action, reward, next state) quadruples generated during the iterations of the deep reinforcement learning algorithm according to a sampling strategy, and supplies the sampled quadruples in batches as the training set of the deep reinforcement learning neural networks, used to train the actor and critic neural networks so that they converge to the expected values.
The actor neural network module learns the policy neural network to find a suitable service deployment scheme, optimizing the average service request delay and the edge server energy consumption; the actor network derives the adjustment of the edge server resource ratio for the next iteration from the service request distribution of the current time period and the edge server resource ratio state of the previous iteration.

The critic neural network module learns the evaluation (value) neural network and assesses the quality of a candidate service deployment scheme under the current service request distribution; it evaluates according to the current-period service request distribution, the edge server resource matching state, and the edge server resource adjustment action of the next iteration, so that the agent module can judge the training effect of the neural networks and proceed to the next iteration.
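A minimal numerical sketch of the actor mapping — deployment state in, zero-sum adjustment action out. The layer sizes and the zero-sum normalization are illustrative assumptions, not the patent's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, H = 4, 3, 16  # servers, services, hidden units (illustrative)

W1 = rng.standard_normal((N * M, H)) * 0.1
W2 = rng.standard_normal((H, N * M)) * 0.1

def actor(state):
    """Map the N x M deployment state to an N x M adjustment action whose
    rows sum to zero (pure redistribution of each server's resources)."""
    h = np.tanh(state.reshape(-1) @ W1)
    a = np.tanh(h @ W2).reshape(N, M)
    return a - a.mean(axis=1, keepdims=True)

action = actor(np.full((N, M), 1.0 / M))
print(action.shape)  # (4, 3)
```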
Example 1
In order to implement the whole process of the method and evaluate its performance, the experimental part of the invention studies user service requests from a real cloud-fog hybrid scenario in a simulation environment, obtains the experimental results of the mobile edge computing dynamic service deployment algorithm based on the method of the invention, and selects a frequency-based service deployment algorithm and an LSTM-based service deployment method as comparison experiments to verify the quality of the invention's deployment scheme on this problem.
In this embodiment, part of the preprocessed Alibaba open cluster trace cluster-trace-v2018 is used as the user service request data of the experiment. The data in this data set originates from a real production cluster environment, in which online services (also known as long-running applications, LRAs) and batch workloads run on each machine in the cluster. The cluster-trace-v2018 data covers about 4000 machines over a period of 8 days and consists of 6 tables (each a separate file).
Because the cluster-trace-v2018 data set records data about the physical service machines, the containers running on them, and the batch task instances and types, while the experiment is only concerned with user service request data, the data set needs certain preprocessing and abstraction to simulate the service request data of mobile edge users. Typically, a user service request in a mobile edge computing scenario contains these fields: time, timestamp, user IP address, request type, accessed edge server address, number of instructions required by the request's computation, and the request's service dependencies. Each batch task in the batch_task table of the data set is abstracted into a service request of the mobile edge: the task_name field of the batch_task table contains the logical dependencies of the batch task, the start_time field contains the time the batch task starts working, and the job_name field contains the batch task's name. Considering the mobile edge user service request scenario, the task_name field is mapped to the dependency field of the service request table, the start_time field to the time and timestamp fields, and the job_name field to the request type field. For the mobile user IP address field, the IP address of a mobile edge user may in reality come from different users in the same region; in this embodiment, to better simulate the real environment, the mobile user IP address is randomly generated to indicate that the request may come from any mobile user. For the accessed edge server IP address field, the service request of a mobile edge user is in reality processed by certain fixed edge servers in a region, and the accessed edge server's IP address is determined by the user's IP address and geographic location; in this embodiment, N edge server IP addresses are set, and every user request is routed to one of the N edge servers. For the instruction count field, the number of execution instructions a user request needs in reality depends on the request type; in this embodiment, the required instruction count is set according to the type of the user request.
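The field mapping described above can be sketched as a row transformer. The field and function names are illustrative, and the random IPs and per-type instruction counts are synthetic, as in the embodiment:

```python
import random

def batch_task_to_request(row, edge_server_ips, instructions_by_type):
    """Map one cluster-trace batch_task record to a simulated edge service request."""
    return {
        "time": row["start_time"],        # start_time -> time / timestamp
        "timestamp": row["start_time"],
        "user_ip": "10.%d.%d.%d"
                   % tuple(random.randint(0, 255) for _ in range(3)),  # any mobile user
        "request_type": row["job_name"],  # job_name -> request type
        "edge_server_ip": random.choice(edge_server_ips),  # one of N fixed servers
        "dependency": row["task_name"],   # task_name -> sub-service dependency chain
        "instructions": instructions_by_type[row["job_name"]],  # set per request type
    }

servers = ["192.168.0.1", "192.168.0.2"]
req = batch_task_to_request(
    {"start_time": 86400, "job_name": "j_42", "task_name": "M1_2_4"},
    servers,
    {"j_42": 5_000_000},
)
print(req["request_type"], req["instructions"])  # j_42 5000000
```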
100000 preprocessed cluster-trace-v2018 user service requests are selected as experimental data. The hourly distribution of the number of service requests from 2018-10-02 to 2018-10-08 is analyzed and counted according to the distribution of user service requests along the time axis, as shown in fig. 5, and the distribution of service requests from 2018-10-02 to 2018-10-08 across service types is analyzed and counted, as shown in fig. 6.
According to the method, the DDPG-based mobile edge computing dynamic service deployment algorithm uses the data from 2018-10-02 to 2018-10-07 as training data, obtains suitable neural network parameters after iterative training, and uses the resulting neural network model to carry out mobile edge computing dynamic service deployment. Then the user service request data of 2018-10-08 is used as the new input request parameters, and the algorithm model outputs, for each hourly period, the mobile edge computing service resource matching scheme with the highest valuation, together with the corresponding average user request response time and edge server energy consumption. The Q_Loss curve of the agent module's evaluation network after 25000 iterations is shown in fig. 7.
In order to compare the dynamic service experimental effect of the DDPG-based mobile edge computing method proposed by the invention, a frequency-based service request deployment algorithm and an algorithm that deploys services based on LSTM-predicted request distributions are selected as baseline algorithms, and their results are compared;
the service request deployment algorithm based on frequency is to count according to a service request data set in a time period from 2018-10-02 to 2018-10-07 to obtain the distribution condition of user service requests in each hour, count according to the types of the user service requests and the sub-service logic dependency relationship of the types, finally obtain the proportion condition of the request quantity of each sub-service in each hour and adjust the proportion condition of each service resource in each time period from 2018-10-08 according to the proportion condition;
the algorithm for service deployment based on the LSTM prediction request distribution situation takes data in a time period from 2018-10-02 to 2018-10-07 as training data, predicts the user service request distribution situation in the time period from 2018-10-08, and adjusts the service resource proportion situation in the current time period according to the predicted proportion situation of the number of the sub-service requests;
ranking by average user service request response time, the DDPG-based algorithm gives the shortest average response time, the algorithm that deploys services based on the LSTM-predicted request distribution comes next, and the frequency-based service request deployment algorithm gives the longest. The DDPG-based user service request deployment algorithm can effectively reduce the average user service request response time and the edge server energy consumption; by training the policy network and the value network, the agent can derive a better, expectation-meeting mobile edge computing service deployment strategy from the user service requests of the current time period.
Under the DDPG-based mobile edge service deployment scheme of the method, the average user request response time is the shortest, but the edge server energy consumption under this scheme is noticeably larger than under the LSTM-based and frequency-based schemes. The comparison of edge server energy consumption across algorithms also shows that the DDPG-based mobile edge computing service deployment scheme cannot simultaneously satisfy both the shortest average response time and the minimum edge server energy consumption. When modeling the deep reinforcement learning problem, the weight of the average user request response time in the Reward was set larger than that of the edge server energy consumption, so the corresponding DDPG-based deployment scheme favors the average response time index: while minimizing the average user request response time, the algorithm places heavier computation on some edge servers, and the corresponding energy consumption index rises accordingly. The experimental effect of the service deployment schemes is shown in figs. 8 and 9.
The foregoing is a preferred embodiment of the present invention, described in considerable detail, but it is not to be construed as limiting the scope of the invention. It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the present invention, and these should be considered within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (7)

1. A service combination optimization deployment method based on deep reinforcement learning in a cloud and fog mixed environment, characterized in that, in an architecture scenario where cloud computing and fog computing coexist and multiple edge computing nodes exist simultaneously, the method predicts and selects a deployment scheme for specific service applications on the edge nodes and adjusts and allocates the resources those service applications require; a resource decision body (intelligent decision Agent) in the system produces the predicted initial service deployment scheme and the resource allocation adjustment strategy of the services, and the core prediction algorithm in the intelligent decision Agent is implemented based on the deep reinforcement learning DDPG (Deep Deterministic Policy Gradient) algorithm.
2. The method according to claim 1, characterized in that the deep-reinforcement-learning-based application optimization deployment system mainly focuses on the edge servers where the edge nodes in the fog computing scenario are located and deploys service applications automatically and optimally; the users targeted by the service combination optimization deployment scheme are mainly cloud service providers; while meeting user service deployment requirements and service quality requirements, the scheme efficiently utilizes edge server resources, reduces the server energy consumption in each edge node, macroscopically reduces the capital investment of the service provider, and at the same time guarantees acceptable application performance; the specific implementation process is as follows:
(1) An edge node service deployment prediction system based on deep reinforcement learning DDPG (Deep Deterministic Policy Gradient) is designed, including a general-purpose, full-featured intermediate agent framework on each edge server: it periodically receives, from the collector cAdvisor on each edge server, the average utilization and status of the CPU, memory, and hard disk storage resources over the last time period; the intermediate agent framework integrates, cleans, and processes the input data collected on each edge server to form an input data set; the data set is then used as the input of the intelligent decision Agent, which computes the deployment strategy and the resource allocation amount for each service application in the next time period; the services on each edge server then adjust service deployment and allocate resources according to that strategy and allocation amount;
(2) a mathematical model for dynamic service deployment of the system is proposed,
A mathematical model of dynamic service deployment is established, and deep reinforcement learning is applied to dynamic service deployment in the cloud and fog mixed configuration scenario; the input data of the algorithm comes from real-time data collected on the edge servers operated and maintained by the cloud service provider, so the system applies well to real environments. The established mathematical model of dynamic service deployment quantifies the various influencing factors in the physical scenario, defines and processes the data formats, and solves the problems of measuring and defining the algorithm parameters in the scenario;
(3) a service combination deployment model in an edge computing environment is proposed,
the method comprises the steps of enabling service application combinations to be increased explosively based on the requirements of a large number of users, deploying part of service combinations to edge nodes close to the users, establishing a service combination department calculation model in an edge computing environment, and supporting deployment requests of the service combinations on the edge nodes.
3. The method according to claim 1, characterized in that the overall operation of the system includes:
Carrying out dynamic service deployment: each edge node consists of a small number of servers. At the initial stage of the service combination life cycle, each service combination is initially deployed on each edge node according to user requests; as user positions, time, and environmental factors change, the overall service combination deployment strategy for the next time period is adjusted according to the service running state and request conditions in the previous time period, and the resource usage allocated to the services is adjusted, which alleviates low resource utilization and improves service quality;
An intermediate Agent system is designed above the edge node layer, namely the edge node service deployment prediction system developed based on deep reinforcement learning DDPG (Deep Deterministic Policy Gradient), i.e., the intelligent decision Agent: a data acquisition module runs on each edge node and periodically sends the request conditions and resource usage of each service component on the edge node in the previous time period to the intelligent decision Agent; the service provider sets different parameters for the intelligent decision Agent according to the time period; the Agent combines the set parameters with the collected data and processes them to generate input data, which is then fed to the algorithm module to compute the recommended service deployment strategy for the next time period; the strategy is then applied to the service deployment of each edge node;
This solves the problem, of concern to service providers, of reasonably deploying multiple service components to the edge nodes while seeking a dynamic balance between server energy consumption and service quality.
4. The method according to claim 2 or 3, characterized in that the data quantization and collection work for the input data of the intelligent decision Agent is completed by the collection program of the edge node service deployment prediction system based on deep reinforcement learning DDPG (Deep Deterministic Policy Gradient);
The dynamic service deployment problem requires abstracting the complex and changeable cloud and fog mixed scenario in reality into an easily described mathematical model close to the real environment, in which the predefined conditions of the problem are based on the following four assumptions:
Assumption one: the requests of different users in different time periods are mutually independent, i.e., one user's request has no connection with another user's request in the current time period;
Assumption two: there are m types of user service requests, and the m different user requests are formed by permutations and combinations of n sub-services; that is, there are m request types in total, each user request is composed of several of the n sub-services, and these sub-services have logical dependencies and an execution order. For example, user request A can be split into the sub-service chain n1 -> n2 -> n3: request A is first computed by service n1, the result of n1 is taken as the input of service n2, the result of n2 is taken as the input of service n3, and finally n3 computes the result and returns it to the user;
Assumption three: the message lengths at each stage of processing a user request are approximately equal; that is, whether it is the request initiated by the user at the edge, the forwarding of the request between sub-services, or the final computation result returned to the edge, the message lengths are considered approximately equal;
Assumption four: the transmission delay per unit data length is fixed and equal throughout data transmission; it is assumed that the unit transmission speeds of data packets between the user mobile device and the edge server are equal, the unit transmission speeds of data packets between the edge server and the cloud center are equal, and the transmission delay between edge servers is zero.
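As an illustration of assumption two, the chained sub-service execution (e.g. n1 -> n2 -> n3) can be sketched in Python, where each sub-service is modeled as a function whose output feeds the next stage; the function names and toy computations are hypothetical, not part of the claims:

```python
def run_service_chain(request, sub_services):
    """Pass the request through an ordered chain of sub-services.

    Each sub-service consumes the previous result as its input,
    mirroring the n1 -> n2 -> n3 pipeline in assumption two.
    """
    result = request
    for service in sub_services:
        result = service(result)
    return result

# Hypothetical sub-services for a user request A
n1 = lambda x: x * 2      # first computation stage
n2 = lambda x: x + 3      # consumes n1's output
n3 = lambda x: x ** 2     # consumes n2's output, returns the final result

final = run_service_chain(5, [n1, n2, n3])  # ((5*2)+3)**2 = 169
```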
5. The method according to claim 2 or 3, characterized in that, in the method,
In the edge dynamic service deployment problem, the optimization objective mainly covers two aspects: the average response time of user requests and the energy consumption at the server side. The proportion of service resources on the edge servers is dynamically adjusted according to the distribution of user service request volumes and types in different time periods, so as to achieve the minimum average user response time and the minimum server energy consumption under the user requests of the current period;
An evaluation standard is set for evaluating the mobile edge computing dynamic service deployment scheme, comprising two evaluation indicators: the average response time of user requests and the server energy consumption, formally defined and explained as follows:
The average response time of the user service requests in a certain time period is calculated by the following formula:

T_avg = (1/n) * Σ_{i=1}^{n} t_i

where t_i denotes the response time of a single user service request in the time period and n denotes the total number of user requests in the time period.
The response time of a single user service request refers to the time consumed from the user initiating the request until the user obtains the computation result. According to the processing stages of a service request, it is divided into three parts: the request transmission time, the request computation time, and the request return time. The response time of a single user service request is calculated by the following formula:

T = T_in + T_process + T_out
T_in is the request transmission time, representing the time for the user request of the mobile edge terminal to be sent from the edge terminal device to the designated edge server. In the request initiation process, the mobile user sends a service request of data size D_in from the edge terminal device and transmits the request R to the edge server over a network of bandwidth V_in. The request transmission time is calculated as:

T_in = D_in / V_in
T_process is the request computation time, representing the time for the mobile edge computing user request to be split, according to its request type, into sub-services with logical dependencies within the edge server network and forwarded among the edge servers. In the computation process, the edge server cluster receives a data volume of size D_in; according to the request type, request R is split into n sub-services with logical dependencies; each sub-service is then forwarded and computed among the edge servers according to the current service placement on each edge computing server to obtain the final result. The user service request computation time is composed of the computation times t_process_i of the individual sub-services. Assuming the information transmission delay between edge servers is zero, the request computation time is calculated as:

T_process = Σ_{i=1}^{n} t_process_i
T_out is the request return time, representing the time for the processed result to be returned from the edge server hosting the last sub-service to the user's edge device. In the return process, the data volume D_out produced by the edge server computation is returned to the mobile user's edge device over a network of bandwidth V_out. The request return time is calculated as:

T_out = D_out / V_out
The linear power model based on CPU utilization is the most widely used server energy consumption model in data center energy research; it is adopted here to calculate the edge server energy consumption and can accurately track the power usage of a server. Let u be the CPU utilization of the server at the current moment, P_max the average power when the server CPU is fully utilized, P_idle the average power when the server CPU is idle, and T_u the duration. The energy consumption of any server at the edge is expressed by the following formula:

E = ((P_max - P_idle) * u + P_idle) * T_u
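The formulas above can be combined into a minimal sketch, assuming simple scalar inputs; the helper names are illustrative only, not part of the claims:

```python
def response_time(d_in, v_in, sub_service_times, d_out, v_out):
    """T = T_in + T_process + T_out for a single service request.

    T_in      = D_in / V_in   (request transmission)
    T_process = sum of per-sub-service times t_process_i
                (inter-edge-server delay assumed zero, per assumption four)
    T_out     = D_out / V_out (result return)
    """
    return d_in / v_in + sum(sub_service_times) + d_out / v_out

def avg_response_time(times):
    """T_avg = (1/n) * sum(t_i) over all requests in the period."""
    return sum(times) / len(times)

def server_energy(u, p_max, p_idle, t_u):
    """Linear CPU-utilization power model: E = ((P_max - P_idle)*u + P_idle) * T_u."""
    return ((p_max - p_idle) * u + p_idle) * t_u
```

For example, a 10-unit request over a bandwidth-5 link, two sub-services taking 1 and 2 time units, and a 20-unit result over a bandwidth-10 link gives T = 2 + 3 + 2 = 7.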
6. The method according to claim 2 or 3, characterized in that the relation between the scene parameters and the service deployment problem is established, the service combination deployment model in the edge computing environment is modeled, and the expression of the deployment strategy in the algorithm is quantified as follows:
The edge service deployment adjustment action in the prediction system refers to adjusting the resource allocation on multiple edge servers at the same time and is represented by an N x M matrix, whose elements denote the service resource adjustments made by the N edge servers to the M edge services; A_nm indicates that the N-th edge server adjusts the resources of the M-th service by A_nm, i.e., the N-th edge server adds or removes A_nm resources on the M-th service;
The edge computing service deployment state describes the service deployment state of mobile edge computing in the current time slice during the iterations of the deep reinforcement learning algorithm. The state refers to the percentages of the M service resources allocated on the N edge servers at a certain moment and is represented by an N x M matrix, whose elements denote the proportions of service resources allocated by the N edge servers to the M edge services; S_nm indicates that the N-th edge server has allocated S_nm of service resources to the M-th service;
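The N x M action and state matrices can be sketched with NumPy; clipping the updated allocations to the valid [0, 1] range is an added assumption, since the claim does not state how out-of-range allocations are handled:

```python
import numpy as np

N, M = 3, 4  # illustrative sizes: 3 edge servers, 4 edge services

# State S: S[n, m] is the proportion of server n's resources allocated to service m
S = np.full((N, M), 0.25)

# Action A: A[n, m] is the resource adjustment server n applies to service m
A = np.array([[ 0.10, -0.05, 0.0, -0.05],
              [ 0.00,  0.00, 0.0,  0.00],
              [-0.10,  0.05, 0.0,  0.05]])

# Apply the adjustment; clip keeps each allocation a valid proportion (assumption)
S_next = np.clip(S + A, 0.0, 1.0)
```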
In the service deployment strategy prediction system, the agent in the algorithm must feed back a reward for the edge computing service deployment adjustment action at the current moment according to the current deployment state environment. The edge computing service deployment reward is the reward given by the agent when a certain deployment action is selected for adjustment from a deployment state during an algorithm iteration: the agent calculates the average user request response time and the total server energy consumption of the current time period from the mobile edge computing deployment state after the action is applied, compares them with the values from the previous iteration, and feeds back the corresponding reward;
The reward feedback rules are as follows:
(1) If T_{t+1} < T_t and En_{t+1} < En_t, then Reward = β * Reward (β = 2); that is, when the next average user response time is less than the current one and the next total server energy consumption is less than the current one, the agent gives 2x positive feedback;
(2) If T_{t+1} < T_t and En_{t+1} > En_t, then Reward = β * Reward (β = 1); that is, when the next average user response time is less than the current one but the next total server energy consumption is greater than the current one, the agent gives 1x positive feedback;
(3) If T_{t+1} > T_t and En_{t+1} < En_t, then Reward = β * Reward (β = 0.5); that is, when the next average user response time is greater than the current one but the next total server energy consumption is less than the current one, the agent gives 0.5x positive feedback;
(4) If T_{t+1} > T_t and En_{t+1} > En_t, then Reward = β * Reward (β = -1); that is, when the next average user response time is greater than the current one and the next total server energy consumption is greater than the current one, the agent gives 1x negative feedback.
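The four reward cases can be sketched as a single function; treating ties (equal response time or energy) as falling into the less favorable branch is an added assumption, since the claim only specifies strict inequalities:

```python
def scaled_reward(t_next, t_cur, en_next, en_cur, base_reward):
    """Scale the base reward by beta according to the four feedback cases."""
    if t_next < t_cur and en_next < en_cur:
        beta = 2.0       # both metrics improved: 2x positive feedback
    elif t_next < t_cur:
        beta = 1.0       # only response time improved: 1x positive feedback
    elif en_next < en_cur:
        beta = 0.5       # only energy consumption improved: 0.5x positive feedback
    else:
        beta = -1.0      # both metrics worse: 1x negative feedback
    return beta * base_reward
```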
7. The method according to claim 1, characterized in that an intelligent decision Agent system is designed which can predict the deployment strategy of the service combinations at the edge in the next stage. The core of the system is the intelligent agent module, whose algorithm design flow is mainly based on the following steps:
First, two neural networks are established: the evaluation network Q(s, a | θ^Q) and the policy network μ(s | θ^μ). The parameters of the online evaluation network and the online policy network are initialized to random values, θ^Q = RandomInit(Q) and θ^μ = RandomInit(μ), and then copied to the corresponding target network parameters, θ^Q' = θ^Q and θ^μ' = θ^μ, yielding two identical networks Q' and μ' for soft updates. Finally, the size of the memory buffer is set to m and the replay buffer is initialized: R = ∅.
For epoch = 1 to the maximum number of rounds do:
Initialize the state of the deep reinforcement learning environment and load the volume and type distributions of the user service requests in each time period: s_1 = Env.reset()
For t = 1 to the maximum number of steps in the cycle do:
(1) The policy network obtains an edge service deployment adjustment action a_t according to the behavior policy β: a_t = μ(s_t | θ^μ)
(2) Exploration noise is added to the action at the current moment to obtain a new edge service deployment adjustment action: a_t = a_t + Norm(ExploreNoise)
(3) Interact with the environment and obtain the next state and the reward according to the two indicators, average user request response time and server energy consumption: s_{t+1}, r_{t+1} = env.step(a_t)
(4) Save the state, action, reward, and next state into the memory: R.store(s_t, a_t, r_t, s_{t+1})
(5) Sample n samples from the memory: R(n) = R.sample(n)
(6) Estimate the target value using the target networks: y_i = r_i + γ * Q'(s_{i+1}, μ'(s_{i+1} | θ^μ') | θ^Q')
(7) Update the parameters of the evaluation network and the policy network by calculating the difference between the estimate and the memory:
L_Q = (1/n) * Σ_i (y_i - Q(s_i, a_i | θ^Q))^2
L_μ = -(1/n) * Σ_i Q(s_i, μ(s_i | θ^μ) | θ^Q)
(8) Update the parameters according to the gradients: Q.update(L_Q), μ.update(L_μ)
θ^Q' = τ * θ^Q + (1 - τ) * θ^Q'  (soft update of the target evaluation network parameters)
θ^μ' = τ * θ^μ + (1 - τ) * θ^μ'  (soft update of the target policy network parameters)
Save(μ'): output the target action network
Through the algorithm flow designed above, the intelligent agent module can predict and suggest the service combination strategy on each edge server in the next stage.
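Two mechanics of the loop above, the replay memory R and the soft update of the target network parameters, can be sketched in isolation; the class and function names are illustrative, and the neural networks themselves are omitted:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity memory R storing (s_t, a_t, r_t, s_{t+1}) transitions."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # old transitions drop off when full

    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, n):
        # R(n) = R.sample(n): uniform minibatch for steps (5)-(7)
        return random.sample(self.buf, min(n, len(self.buf)))

def soft_update(theta_target, theta_online, tau):
    """theta' <- tau * theta + (1 - tau) * theta', element-wise over parameter lists."""
    return [tau * o + (1 - tau) * t for t, o in zip(theta_target, theta_online)]

# Usage sketch
R = ReplayBuffer(capacity=1000)
R.store(s=[0.2], a=[0.1], r=1.0, s_next=[0.3])
batch = R.sample(1)
theta_t = soft_update([0.0, 0.0], [1.0, 2.0], tau=0.1)  # -> [0.1, 0.2]
```

A small tau (e.g. 0.1 or less) makes the target networks trail the online networks slowly, which is what stabilizes the bootstrapped target y_i in step (6).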
CN202010562269.4A 2020-06-18 2020-06-18 Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment Pending CN113822456A (en)

Publications (1)

Publication Number Publication Date
CN113822456A true CN113822456A (en) 2021-12-21


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174681A (en) * 2022-06-14 2022-10-11 武汉大学 Method, equipment and storage medium for scheduling edge computing service request
CN115174681B (en) * 2022-06-14 2023-12-15 武汉大学 Method, equipment and storage medium for scheduling edge computing service request
CN115348265A (en) * 2022-08-15 2022-11-15 中南大学 Node dynamic service deployment strategy based on service characteristic value calculation
CN115348265B (en) * 2022-08-15 2024-04-19 中南大学 Node dynamic service deployment strategy based on service characteristic value calculation
CN117255126A (en) * 2023-08-16 2023-12-19 广东工业大学 Data-intensive task edge service combination method based on multi-objective reinforcement learning
CN117499491A (en) * 2023-12-27 2024-02-02 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning
CN117499491B (en) * 2023-12-27 2024-03-26 杭州海康威视数字技术股份有限公司 Internet of things service arrangement method and device based on double-agent deep reinforcement learning
CN117499313A (en) * 2024-01-02 2024-02-02 中移(苏州)软件技术有限公司 Request control method, device, storage medium and electronic equipment
CN117499313B (en) * 2024-01-02 2024-05-03 中移(苏州)软件技术有限公司 Request control method, device, storage medium and electronic equipment
CN117573376A (en) * 2024-01-16 2024-02-20 杭州天舰信息技术股份有限公司 Data center resource scheduling monitoring management method and system
CN117573376B (en) * 2024-01-16 2024-04-05 杭州天舰信息技术股份有限公司 Data center resource scheduling monitoring management method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination