CN117255126A - Data-intensive task edge service combination method based on multi-objective reinforcement learning - Google Patents

Data-intensive task edge service combination method based on multi-objective reinforcement learning

Info

Publication number
CN117255126A
Authority
CN
China
Prior art keywords
service
task
data
subtask
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311038793.1A
Other languages
Chinese (zh)
Inventor
程良伦
黄诗卿
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Nengge Knowledge Technology Co ltd
Guangdong University of Technology
Original Assignee
Guangdong Nengge Knowledge Technology Co ltd
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Nengge Knowledge Technology Co ltd, Guangdong University of Technology filed Critical Guangdong Nengge Knowledge Technology Co ltd
Priority to CN202311038793.1A priority Critical patent/CN117255126A/en
Publication of CN117255126A publication Critical patent/CN117255126A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/60Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data-intensive task edge service combination method based on multi-objective reinforcement learning. The method comprises: receiving a service request and acquiring task request data; decomposing the task request data to obtain a plurality of subtask data; constructing a dependency graph of the task execution flow; determining an optimal service scheduling strategy in combination with a service combination optimization algorithm; and performing service scheduling to execute the data-intensive task. The service combination optimization algorithm makes full use of the advantages of edge computing: it reasonably combines and coordinates the computing resources and network resources of the edge nodes, uses limited resources effectively, avoids resource waste, and reduces resource-occupancy imbalance among edge nodes. It optimizes the efficiency and performance of task processing while balancing node load, achieves real-time and efficient processing of data-intensive tasks, and improves production efficiency and resource utilization. The method is widely applicable in the technical field of data processing.

Description

Data-intensive task edge service combination method based on multi-objective reinforcement learning
Technical Field
The application relates to the technical field of data processing, in particular to a data-intensive task edge service combination method based on multi-objective reinforcement learning.
Background
In the last decades, urban and industrial internet environments have undergone rapid development and revolution. With the popularity of internet of things (IoT) technology, sensors, devices, and systems generate large amounts of data, covering a variety of areas from urban infrastructure to industrial production. These data are widely used for real-time monitoring, prediction, optimization, decision making, and the like.
However, when facing massive real-time data processing tasks, the conventional cloud computing mode has certain limitations. Because data may encounter problems such as network delay, bandwidth limitation, and potential security risks during transmission, it may be impractical to transmit all data to the cloud for centralized processing. In particular, for tasks with high real-time requirements, such as intelligent traffic management, smart factory automation, and real-time environment monitoring, the latency of conventional cloud computing may degrade performance; moreover, data-intensive tasks place high demands on computing, storage, and network resources, so conventional cloud computing cannot meet the requirement of efficient processing.
Term definition:
Data-intensive task: a computer program task that involves a large amount of data analysis and processing, characterized by a large data volume, high data complexity, and a high rate of data change.
Disclosure of Invention
In order to solve at least one technical problem in the related art, an embodiment of the present application provides a data-intensive task edge service combination method based on multi-objective reinforcement learning, which aims to efficiently process data-intensive tasks across distributed edge service components, balance the load of the edge nodes, and achieve higher overall performance.
The data-intensive task edge service combination method based on multi-objective reinforcement learning provided by the embodiment of the application comprises the following steps:
receiving a service request and acquiring task request data; the task request data is relevant task data corresponding to the data-intensive task sending out the service request;
decomposing the task request data to obtain a plurality of subtask data;
constructing a dependency graph of a task execution flow according to the plurality of subtask data;
determining an optimal service scheduling strategy according to a preset service combination optimization algorithm and the dependency graph;
and carrying out service scheduling according to the optimal service scheduling strategy so as to execute the data-intensive task.
In some embodiments, the subtask data includes subtask category data, subtask number data, and subtask execution data.
In some embodiments, the step of constructing a dependency graph of the task execution flow according to the plurality of subtask data specifically includes:
initializing the dependency graph;
acquiring task dependency relations among all the subtasks according to the plurality of subtask data;
and constructing the dependency graph according to the task dependency.
In some embodiments, the step of determining the optimal service scheduling policy according to a preset service combination optimization algorithm and the dependency graph specifically includes:
constructing a multi-objective Markov decision process (MDP) model, and training the MDP model by means of deep reinforcement learning;
and traversing the dependency graph, and selecting services and nodes through the MDP model to obtain the optimal service scheduling strategy.
In some embodiments, the dependency graph is specifically represented by the following formula:
G_de = (N, V, W)
N = RS
V = {v_ij, v_jk, ...}
wherein G_de denotes the dependency graph, RS is the task, N is the node set in which each node is one subtask of the task RS, and V is the edge set, in which v_ij indicates that there is a precedence relation between subtask SF_i and subtask SF_j, with SF_i as the start point and as a front task of SF_j; SF_i and SF_j are respectively the i-th and j-th subtasks of RS; W is the node additional information, whose entries respectively record, for the current subtask SF_i, the comprehensive processing time of SF_i, the result waiting time after SF_i finishes processing, the service that processes SF_i, the node carrying that service, and the execution state of SF_i.
In some embodiments, the MDP model is specifically represented by the following formula:
MOMDP = {SS, A, P, R, γ, ω, D_ω}
wherein MOMDP denotes the MDP model; SS is the state set of the service scheduling process; A is the action; P is the probability that executing action A in state SS yields the reward R and changes the state; R is the reward for executing action A in state SS; γ is the discount rate, γ ∈ [0,1], which expresses the influence of historical rewards on the future (the larger γ is, the more important past selections are); ω is the task allocation preference parameter, ω = <ω_processing/responding, ω_network, ω_resource>, which expresses the system's degree of preference for each optimization objective, where ω_processing/responding is the preference parameter for the processing time and response time of the service, ω_network is the preference parameter for the network condition when the service is executed, and ω_resource is the preference parameter for the resource allocation condition when the service is executed; D_ω is the probability distribution of ω;
wherein SS_k is the k-th state in SS; SF_k is the subtask currently to be executed corresponding to state SS_k; SF_{k-1} is the front task of the current subtask SF_k; the state also records the data amount to be processed by subtask SF_{k-1}, the service entity node that executed the front task SF_{k-1} (N denotes the set containing all service entity nodes), the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type, the minimum resource requirement for executing subtask SF_k, an indicator of whether an atomic micro-service satisfying the minimum resource requirement of SF_k exists, the available resources of each service entity node n, an indicator of whether an available service entity node n exists, the bandwidths BA of all links, the network reachability matrix PA, the packet queuing delays QT of all nodes, and the post task SF_{k+1} of the current subtask SF_k;
wherein A_k is the action for executing subtask SF_k: it selects, from the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type (k identifying the subtask and j identifying the position of an atomic micro-service in that set), an atomic micro-service that may be deployed on a service entity node n to provide the service, N denoting the set containing all service entity nodes;
wherein P is the probability that, in the current state SS_k, selecting action A_kj yields the reward and the state is observed to change to SS_{k+1};
wherein R is a reward vector representing the instant reward obtained by executing action A_kj in state SS_k; its components are the user's satisfaction with the processing time of the atomic micro-service, the user's satisfaction with the time taken to respond to the user request, and the user's satisfaction with the time the data packet is transmitted in the network after the atomic micro-service finishes execution.
In some embodiments, the step of constructing a multi-objective Markov decision process (MDP) model and training the MDP model by means of deep reinforcement learning specifically includes:
setting an optimization target and constraint conditions;
establishing a service scheduling database;
acquiring historical service scheduling data and storing the historical service scheduling data into the service scheduling database;
invoking the historical service scheduling data from the service scheduling database by adopting a deep reinforcement learning mode, performing iterative training on the MDP model according to the optimization target and the constraint condition, and calculating a loss function;
And stopping training the MDP model when the error result reflected by the loss function is in a preset range or the iterative training frequency reaches a preset frequency threshold value.
In some embodiments, the loss function is set by using the homotopy optimization method, and is specifically expressed by the following formula:
L(θ) = (1 − λ)L_A(θ) + λL_B(θ), λ ∈ [0,1]
wherein L_A(θ) is the temporal-difference mean-square error of the Q values output by the network; θ is the model parameter; L_B(θ) is a smoother auxiliary loss on the values; ω and ω' are the sampled preference vectors; Q(ss, a, ω; θ) is the value of selecting action a in the current task state ss; ss and ss' are task states; a is the action; y is the target approximation of Q(ss, a, ω; θ); Q'(ss', a, ω'; θ_k) denotes the Q value of the next state whose inner product with y is largest under the currently selected preference ω; r is the current reward value; λ is 0 initially and is then increased exponentially with step, the number of iterations of the model, according to a discount factor.
In some embodiments, the step of traversing the dependency graph, and selecting a service and a node through the MDP model to obtain the optimal service scheduling policy specifically includes:
Setting a time period of service scheduling; the time period comprises a state setting period, a service scheduling period and an information storage period;
when the state setting period is in, detecting the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state;
when the service scheduling period is in, traversing the dependency graph, detecting node additional information corresponding to each subtask, determining an arranging task, selecting a service and a node for executing the arranging task through the MDP model, and determining a service scheduling strategy of the arranging task; the scheduling task is a subtask which is used for carrying out service scheduling in the service scheduling period and is executed after the service scheduling period is finished;
collecting and storing historical service schedule data while in the information storage period; the historical service schedule data is used for training by the MDP model.
In some embodiments, when executing the step of detecting the service execution state corresponding to each subtask while in the state setting period, modifying the task state of each subtask, and updating the task state, the method further includes:
And stopping executing service scheduling operation on the current task when the state of service execution corresponding to each subtask is detected to be completed, and receiving task request data corresponding to the next task sending out a service request.
According to the data-intensive task edge service combination method based on multi-objective reinforcement learning provided by the embodiments of the present application, a service request is received and task request data is acquired; the task request data is decomposed to obtain a plurality of subtask data; a dependency graph of the task execution flow is constructed; an optimal service scheduling strategy is determined in combination with a service combination optimization algorithm; and service scheduling is performed to execute the task. The service combination optimization algorithm makes full use of the advantages of edge computing: it reasonably combines and coordinates the computing resources and network resources of the edge nodes, uses limited resources effectively, avoids resource waste, reduces resource-occupancy imbalance among edge nodes, optimizes the efficiency and performance of task processing while balancing node load, achieves real-time and efficient processing of data-intensive tasks, and improves production efficiency and resource utilization.
Drawings
FIG. 1 is a flow chart of a method for data-intensive task edge service composition based on multi-objective reinforcement learning provided in an embodiment of the present application;
FIG. 2 is a schematic algorithm flow diagram of a model training mode of an MDP model in an embodiment of the present application;
FIG. 3 is a flow chart of determining an optimal service scheduling policy using a dependency graph and MDP model in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that although functional block division is performed in a device diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than the block division in the device, or in the flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Referring to fig. 1, fig. 1 is an optional flowchart of a data-intensive task edge service combining method based on multi-objective reinforcement learning according to an embodiment of the present application, which may include, but is not limited to, steps S101 to S105:
step S101, receiving a service request and acquiring task request data;
step S102, decomposing task request data to obtain a plurality of subtask data;
step S103, constructing a dependency graph of a task execution flow according to the plurality of subtask data;
step S104, determining an optimal service scheduling strategy according to a preset service combination optimization algorithm and a dependency graph;
step S105, service scheduling is carried out according to the optimal service scheduling strategy to execute the data-intensive tasks.
In step S101 of some embodiments, the task request data is the task-related data corresponding to the data-intensive task RS that issued the service request, and it contains a plurality of subtask data. Specifically, the task request data of RS is obtained, and RS can be divided into a plurality of smaller subtasks SF_i, each subtask SF_i having its own subtask data, namely:
RS = {SF_1, SF_2, ..., SF_k, ..., SF_j}
When all SF_i have finished executing, RS is considered to have finished executing. It is assumed here that only sequential execution relations exist between subtasks, and each subtask can be completed by relying on one atomic micro-service (the smallest unit of service provided). Meanwhile, to mask problems caused by system heterogeneity, every atomic micro-service (i identifying the subtask and j its position in the candidate set) is packaged in the form of a docker image. When needed, the task dispatch center asks a service entity node (which may be a stand-alone machine or an edge node cluster that appears externally as a single node providing resources) to dynamically load, instantiate, and immediately execute the image; one notation denotes the service entity node that executes task SF_j, and a binary indicator denotes whether an atomic micro-service is deployed on node n (n ∈ N), where N = {1, 2, 3, ...} denotes the set containing all service entity nodes.
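For illustration only, the decomposition entities described above can be pictured with the following minimal Python sketch; the class names, fields, and example values are assumptions made for this sketch and are not the notation of the application.

```python
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class AtomicMicroservice:
    """Smallest service unit, packaged as a docker image (SLA fields simplified)."""
    name: str
    image: str                 # docker image reference
    min_resources: float       # minimum resources needed to run it
    max_data_volume: float     # largest data volume it may process

@dataclass
class Subtask:
    """One subtask SF_i obtained by decomposing the data-intensive task RS."""
    index: int
    data_volume: float                              # data amount D to process
    candidates: List[AtomicMicroservice] = field(default_factory=list)

@dataclass
class ServiceEntityNode:
    """A stand-alone machine or an edge cluster exposed as a single node."""
    node_id: int
    available_resources: float

    def can_host(self, ms: AtomicMicroservice) -> bool:
        # a node accepts a service only if it can cover its minimum requirement
        return self.available_resources >= ms.min_resources

# RS = {SF_1, ..., SF_j}: the task is simply the ordered collection of its subtasks
rs: List[Subtask] = [Subtask(1, data_volume=120.0), Subtask(2, data_volume=80.0)]
nodes: Dict[int, ServiceEntityNode] = {1: ServiceEntityNode(1, 4.0),
                                       2: ServiceEntityNode(2, 2.0)}
```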
In some embodiments, the following definitions are made for the application scenario and related formulas of the service combination optimization algorithm in the embodiments of the present application:
Each atomic micro-service described above predefines its own SLA (Service Level Agreement), a contract or agreement that defines the service level requirements of the service provider. The SLA defines, for the atomic micro-service, its maximum overall delay, its minimum overall delay, the maximum data volume it can process, the minimum data volume, and the minimum resource requirement for its execution, specifically as follows:
Considering hardware differences and the performance loss that may occur when a service is executed (including instruction virtualization and the like), different resource footprints may be required to achieve the same performance; therefore, the minimum resources required for execution may also differ across machines. The optional resource requirement for executing an atomic micro-service, as well as the maximum and minimum data volumes, are set according to practical experience, so that a balance between resource demand and processing efficiency is maintained in the reference running environment.
The maximum overall delay can be subdivided into a maximum processing delay, a maximum response delay, and a maximum transmission delay:
the maximum processing delay is the expected upper bound on the delay, in the reference running environment, from the moment the atomic micro-service starts processing data of the corresponding size until the correct result is obtained; the maximum response delay is the upper bound, in the reference running environment, on the delay from the complete acquisition of the input data to the moment processing actually starts; the maximum transmission delay is the expected upper bound, in the reference running environment, on the delay from the start of transmission of the result data until the last frame of result data has been transmitted. The reference running environment, comprising computing resources and network resources, can be set according to the service characteristics, and the maximum response delay and maximum transmission delay are determined according to the characteristics of the service itself and may also be given in combination with actual experience.
The minimum overall delay can likewise be subdivided into a minimum processing delay, a minimum response delay, and a minimum transmission delay:
the minimum processing delay is the expected delay, in the reference running environment, from the moment the atomic micro-service starts processing data of the corresponding size until the correct result is obtained; the minimum response delay is the lower bound, in the reference running environment, on the delay from the complete acquisition of the input data to the moment processing actually starts; the minimum transmission delay is the expected minimum delay, in the reference running environment, from the start of transmission of the result data until the last frame of result data has been transmitted. The minimum response delay and minimum transmission delay are determined according to the characteristics of the service itself and may also be given in combination with actual experience.
In some embodiments, for a determined subtask SF_i and the subtask data amount D that it needs to process, the system generates a set of selectable micro-services that are capable of processing SF_i and whose maximum processable data volume satisfies D.
Any atomic micro-service in the set of selectable micro-services for processing subtask SF_i can independently complete subtask SF_i, and all of them have the same input-output type.
When the service is deployed to a node, host n must allocate, from its currently available resources, no less than the minimum resource requirement for executing subtask SF_i in order to support the service operation; otherwise host n refuses to provide the service.
When service execution is completed, i.e., after subtask SF_i has been completed, host n immediately reclaims the resources it occupied, and the execution result is kept on host n (if the execution ended abnormally, the saved result is the input data of the failed service, so that the dispatch center can later assign another machine to complete the service). If the task dispatch center does not respond to the result within a fixed length of time (the result waiting time η), i.e., does not indicate the machine to be executed next so that the current machine can upload this step's result to the next executing machine and form a workflow, host n discards the result.
The network topology between the service entity nodes is known: the available bandwidth of the network link from service entity node i to service entity node k is known, and so is the packet queuing delay of service entity node i. The available bandwidths, the network topology, and the packet queuing delays are refreshed at a fixed time interval, ensuring that the service entity nodes keep track of the network state. The matrix BA represents the available bandwidth of all network links; PA = (pa_ik) ∈ {0,1}^(N×N) is the network reachability matrix (pa_ik = 1 if there is a direct link between service entity node i and service entity node k, and 0 otherwise); QT = {qt_i | i ∈ N} represents the packet queuing delay of all service entity nodes.
The best path for transmitting data of size O from service entity node i to service entity node k (i ≠ k), and the corresponding total transmission delay, are obtained as follows:
If pa_ik = 1, the best path is the direct link from i to k, and the total delay is given by formula (1).
If pa_ik = 0, a directed graph NET(N, V) is built from PA, where N represents the service entity nodes, m and n are both service entity nodes, and the edge weight v_mn represents the total delay from service entity node m to service entity node n.
Here pa_mn denotes the network reachability entry from service entity node m to service entity node n; if v_mn is not 0, there is an edge of weight v_mn from service entity node m to service entity node n, and its value is the total time for transmitting O from service entity node m to service entity node n.
After G = NET(N, V) is obtained, the Dijkstra algorithm is used to find the optimal transmission path of the data from service entity node i to service entity node k, and the total delay of transmitting O from i to k is obtained at the same time, as in formula (2).
Here i, m, n, and k are all service entity nodes; when i = k, the current transmission does not involve the network, and the corresponding transmission delay is 0.
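The path selection described above can be sketched as below. Because the concrete delay formulas (1) and (2) are shown only in the original drawings, the sketch assumes a simple per-hop cost of O / bandwidth plus the sender's queuing delay; that weight, and all names, are illustrative assumptions.

```python
import heapq

def transmission_delay(i, k, O, BA, PA, QT):
    """Best-path delay for sending O units of data from node i to node k.

    BA[m][n]: available bandwidth of link m->n, PA[m][n]: 1 if a direct link
    exists, QT[m]: packet queuing delay at node m.  The per-hop weight
    O / BA[m][n] + QT[m] is an assumed stand-in for formula (1).
    """
    if i == k:                     # transmission does not touch the network
        return 0.0
    n_nodes = len(PA)
    def hop(m, n):
        return O / BA[m][n] + QT[m]
    if PA[i][k]:                   # direct link: formula (1)
        return hop(i, k)
    # otherwise build the weighted digraph NET(N, V) and run Dijkstra: formula (2)
    dist = {i: 0.0}
    pq = [(0.0, i)]
    while pq:
        d, m = heapq.heappop(pq)
        if m == k:
            return d
        if d > dist.get(m, float("inf")):
            continue
        for n in range(n_nodes):
            if PA[m][n]:
                nd = d + hop(m, n)
                if nd < dist.get(n, float("inf")):
                    dist[n] = nd
                    heapq.heappush(pq, (nd, n))
    return float("inf")            # k is unreachable from i

# toy 3-node example: route 0 -> 1 -> 2
PA = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
BA = [[1, 10.0, 1], [1, 1, 5.0], [1, 1, 1]]
QT = [0.1, 0.2, 0.0]
print(transmission_delay(0, 2, 50.0, BA, PA, QT))   # 15.3
```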
From the above scenario and formula definitions, the optimization objectives and constraint conditions of the service combination optimization algorithm in the embodiments of the present application are obtained; they are expressed by formulas (3) to (9), which are explained below.
Formula (3) above is an optimization objective: it requires maximizing the accumulated processing-time satisfaction and response-time satisfaction.
The processing-time satisfaction represents the user's degree of satisfaction with the processing time of task SF_i (the total execution time excluding the time for data preparation, waiting for processing, and data transmission); the larger its value, the better. It is computed from a binary label (0 or 1) marking whether the given atomic micro-service processes SF_i, the actual processing delay of the atomic micro-service (the time spent from starting to process SF_i, after the preparation work is completed, until SF_i is processed), the maximum processing delay, and the minimum processing delay.
The response-time satisfaction represents the user's degree of satisfaction with the speed at which service SF_i responds to the user request; the larger its value, the better. It is computed from a binary label (0 or 1) marking whether the given atomic micro-service processes SF_i, the actual response delay of the atomic micro-service (the delay between receiving the processing request and actually starting to process SF_i), the maximum response delay, and the minimum response delay.
Formula (4) above is an optimization objective: it requires maximizing the accumulated transmission satisfaction, i.e., the user's satisfaction with the time taken to transmit the execution result of SF_i over the network to the node processing SF_{i+1}; the larger its value, the better. It is computed from a binary label (0 or 1) marking whether the given atomic micro-service processes SF_i and the transmission delay of the atomic micro-service (the delay, after execution of SF_i is completed, for transmitting the result data completely from m to n), which can be obtained by calculation from formula (1) and formula (2) above.
Formula (5) above is an optimization objective: it requires the resource allocation of the system to be balanced, where P_n (n ∈ N) represents the proportion of the system resources occupied by the resource sum of service entity node n, the available resources of each service entity node n are taken into account, and N denotes the set containing all service entity nodes.
Formula (6) above is a constraint: for any task SF_i, an atomic micro-service that can satisfy the task's function must be found in the set of selectable micro-services for processing subtask SF_i.
Formula (7) above is a constraint, in which a binary indicator denotes whether an atomic micro-service is deployed on node n (n ∈ N): an atomic micro-service can be deployed on a node only when the available resources of the current computing node are greater than the resources the service requires.
Formula (8) above is a constraint, using the same indicator: for task SF_i, it must be possible to find an atomic micro-service that is deployed on some service entity node n to execute the task request of SF_i.
Formula (9) above is a constraint, in which indicators denote whether an atomic micro-service is deployed on node n (n ∈ N) and on node m (m ∈ N): SF_i is processed only by an atomic micro-service deployed on a single service entity node n.
With the above scenario, optimization objectives, and constraint conditions defined, the system models the above optimization problem as an MDP problem and solves it by means of reinforcement learning. Since the optimization objectives are related to the choice of micro-services and nodes, the reinforcement learning model generated from the MDP problem can, under the condition that the constraints are satisfied, find a combination of an atomic micro-service and a service entity node n that maximizes the optimization objectives, namely formula (3), formula (4), and formula (5), as far as possible.
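Since the concrete satisfaction formulas appear only in the original drawings, the following sketch assumes a simple linear normalization of each measured delay between its SLA minimum and maximum, clipped to [0, 1], purely to illustrate how the three satisfaction terms behind objectives (3) and (4) could be assembled into a reward vector.

```python
def satisfaction(measured, d_min, d_max):
    """Assumed satisfaction model: 1 at the SLA minimum delay, 0 at the maximum."""
    if d_max <= d_min:
        return 1.0
    s = (d_max - measured) / (d_max - d_min)
    return max(0.0, min(1.0, s))

def reward_vector(proc, resp, trans, sla):
    """Reward vector with the three satisfaction terms described for R above.

    `sla` holds (min, max) delay bounds declared by the atomic micro-service.
    """
    return [satisfaction(proc, *sla["processing"]),
            satisfaction(resp, *sla["response"]),
            satisfaction(trans, *sla["transmission"])]

sla = {"processing": (1.0, 5.0), "response": (0.2, 1.0), "transmission": (0.5, 3.0)}
print(reward_vector(proc=2.0, resp=0.4, trans=1.0, sla=sla))  # [0.75, 0.75, 0.8]
```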
In some embodiments, step S103 may include, but is not limited to including, step S201 to step S203:
step S201, initializing a dependency graph;
step S202, task dependency relations among all the subtasks are obtained according to the subtask data;
Step S203, constructing a dependency graph according to the task dependency.
In steps S201 to S203 of some embodiments, the task RS is decomposed to obtain a plurality of subtasks SF_i and the corresponding subtask data, including subtask category data, subtask number data, subtask execution data, and the like, and the dependency graph is constructed according to the task dependency relations among all the subtasks. The dependency graph is specifically represented by the following formulas:
G_de = (N, V, W)
N = RS
V = {v_ij, v_jk, ...}
wherein G_de denotes the dependency graph, RS is the task, N is the node set in which each node is one subtask of the task RS, and V is the edge set, in which v_ij indicates that there is a precedence relation between subtask SF_i and subtask SF_j, with SF_i as the start point and as a front task of SF_j; SF_i and SF_j are respectively the i-th and j-th subtasks of RS; W is the node additional information, whose entries respectively record, for the current subtask SF_i, the comprehensive processing time of SF_i, the result waiting time after SF_i finishes processing, the service that processes SF_i, the node carrying that service, and the execution state of SF_i.
The execution states of SF_i mentioned above include:
To be executed: the current task has an uncompleted front task and does not meet the execution condition.
Executable: all front tasks are completed, the task meets the execution condition and can be scheduled.
Executing: the task has been scheduled and its execution has not yet finished.
Normal execution ended: the current task has executed normally and the correct result has been obtained.
Abnormal execution ended: the current task ended abnormally and the correct result was not obtained.
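A minimal sketch of the dependency graph G_de = (N, V, W) and the five execution states listed above is given below; the field names inside the node additional information W and the update rule are illustrative assumptions.

```python
from enum import Enum, auto

class ExecState(Enum):
    TO_BE_EXECUTED = auto()    # an unfinished front task exists
    EXECUTABLE = auto()        # all front tasks finished, can be scheduled
    EXECUTING = auto()         # scheduled, not yet finished
    NORMAL_END = auto()        # finished with a correct result
    ABNORMAL_END = auto()      # finished abnormally, no correct result

class DependencyGraph:
    """G_de = (N, V, W): nodes are subtasks, edges are precedence relations."""

    def __init__(self):
        self.edges = {}   # V: subtask -> set of successor subtasks
        self.info = {}    # W: subtask -> additional info (state, timings, ...)

    def add_subtask(self, sf):
        self.edges.setdefault(sf, set())
        self.info[sf] = {"state": ExecState.TO_BE_EXECUTED,
                         "processing_time": None, "result_wait": None,
                         "service": None, "node": None}

    def add_dependency(self, sf_i, sf_j):
        # v_ij: SF_i must finish before SF_j may start
        self.edges[sf_i].add(sf_j)

    def refresh_executable(self):
        # a subtask becomes executable once every predecessor ended normally
        for sf, meta in self.info.items():
            preds = [p for p, succ in self.edges.items() if sf in succ]
            if meta["state"] == ExecState.TO_BE_EXECUTED and all(
                    self.info[p]["state"] == ExecState.NORMAL_END for p in preds):
                meta["state"] = ExecState.EXECUTABLE

g = DependencyGraph()
for sf in ("SF1", "SF2"):
    g.add_subtask(sf)
g.add_dependency("SF1", "SF2")
g.refresh_executable()
print(g.info["SF1"]["state"], g.info["SF2"]["state"])  # EXECUTABLE, TO_BE_EXECUTED
```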
In some embodiments, step S104 may include, but is not limited to including, step S301 to step S302:
step S301, constructing a multi-objective Markov decision process MDP model, and training the MDP model in a deep reinforcement learning mode;
step S302, traversing the dependency graph, and selecting services and nodes through an MDP model to obtain an optimal service scheduling strategy.
In step S301 of some embodiments, considering the Markov property that naturally exists between task executions, task scheduling balances the load of the nodes while maximizing the relevant QoE indicators as far as possible, so as to meet the optimization objectives as far as possible. The optimization problem is modeled as an action-selection problem: for the current task SF_i, whose data volume to be processed is known, how should a suitable <service, node> action be selected so that SF_i can be executed while the processing-time, response-time, and transmission satisfactions above are maximized as far as possible and system load balancing is guaranteed.
The multi-objective Markov decision process MDP model is specifically represented by the following formula:
MOMDP = {SS, A, P, R, γ, ω, D_ω}
wherein MOMDP denotes the MDP model; SS is the state set of the service scheduling process; A is the action; P is the probability that executing action A in state SS yields the reward R and changes the state; R is the reward for executing action A in state SS; γ is the discount rate, γ ∈ [0,1], which expresses the influence of historical rewards on the future (the larger γ is, the more important past selections are); ω is the task allocation preference parameter, ω = <ω_processing/responding, ω_network, ω_resource>, which expresses the system's degree of preference for each optimization objective, where ω_processing/responding is the preference parameter for the processing time and response time of the service, ω_network is the preference parameter for the network condition when the service is executed, and ω_resource is the preference parameter for the resource allocation condition when the service is executed; D_ω is the probability distribution of ω;
wherein SS_k is the k-th state in SS; SF_k is the subtask currently to be executed corresponding to state SS_k; SF_{k-1} is the front task of the current subtask SF_k; the state also records the data amount to be processed by subtask SF_{k-1}, the service entity node that executed the front task SF_{k-1} (N denotes the set containing all service entity nodes), the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type, the minimum resource requirement for executing subtask SF_k, an indicator of whether an atomic micro-service satisfying the minimum resource requirement of SF_k exists, the available resources of each service entity node n, an indicator of whether an available service entity node n exists, the bandwidths BA of all links, the network reachability matrix PA, the packet queuing delays QT of all nodes, and the post task SF_{k+1} of the current subtask SF_k;
wherein A_k is the action for executing subtask SF_k: it selects, from the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type (k identifying the subtask and j identifying the position of an atomic micro-service in that set), an atomic micro-service that may be deployed on a service entity node n to provide the service, N denoting the set containing all service entity nodes;
wherein P is the probability that, in the current state SS_k, selecting action A_kj yields the reward and the state is observed to change to SS_{k+1};
wherein R is a reward vector representing the instant reward obtained by executing action A_kj in state SS_k; its components are the user's satisfaction with the processing time of the atomic micro-service, the user's satisfaction with the time taken to respond to the user request, and the user's satisfaction with the time the data packet is transmitted in the network after the atomic micro-service finishes execution;
wherein P_n represents the proportion of the system resources occupied by the resource sum of service entity node n, the available resources of each service entity node n are taken into account, and N denotes the set containing all service entity nodes.
In the embodiments of the present application, the multiple objectives are represented by a preference vector, i.e., the task allocation preference parameter ω is added. Most conventional reinforcement learning has no concept of preference: the reward function it uses is a value function which, given a state and an action, outputs only a single value. Here the user preference vector is selected by the user from the preset task allocation preference distribution, and the single-valued reward function in the traditional sense can be obtained by taking the inner product of the vector output by the reward function and the user preference vector.
By introducing the user preference vector, the user can select the emphasis of the task to be completed, such as shorter time or lower system load, and then input the user preference vector, the subtask to be completed, and the system state into the trained model, obtaining a subtask processing scheme that satisfies the user preference as far as possible on the premise of completing the user's task. For example, if the reward function outputs [goal 1, goal 2, goal 3], the user's preference may look like [1, 2, 3], where 1, 2, and 3 are the weights the user assigns to goal 1, goal 2, and goal 3 respectively, so this preference vector indicates that the user cares most about the third goal.
Each subtask can be given a different preference, and the system selects a suitable scheme to execute according to the subtask that currently needs to be executed and the corresponding user preference vector. Since the combined services in this edge service combination method are provided by edge devices, the real-time performance of data processing is enhanced.
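The inner product described above, which turns the reward vector and the user preference vector into a conventional single-valued reward, can be written in a few lines; the numbers are only an example.

```python
def scalarize(reward_vec, omega):
    """Traditional single-valued reward = <reward vector, preference vector>."""
    assert len(reward_vec) == len(omega)
    return sum(r * w for r, w in zip(reward_vec, omega))

reward = [0.75, 0.75, 0.8]        # e.g. the three satisfaction terms above
omega = [1.0, 2.0, 3.0]           # user weights for goal 1, goal 2, goal 3
print(scalarize(reward, omega))   # 0.75 + 1.5 + 2.4 = 4.65
```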
In some embodiments, step S301 may include, but is not limited to including, step S401 to step S405:
step S401, setting an optimization target and constraint conditions;
Step S402, a service scheduling database is established;
step S403, acquiring historical service scheduling data, and storing the historical service scheduling data into a service scheduling database;
step S404, adopting a deep reinforcement learning mode to call historical service scheduling data from a service scheduling database, carrying out iterative training on the MDP model according to an optimization target and constraint conditions, and calculating a loss function;
and step S405, when the error result reflected by the loss function is within a preset range or the iterative training times reach a preset time threshold, stopping training the MDP model.
In step S405 of some embodiments, to ensure the smoothness of the loss function, the loss function is set by using the homotopy optimization method, specifically expressed by the following formula:
L(θ) = (1 − λ)L_A(θ) + λL_B(θ), λ ∈ [0,1]
Considering that the optimal-solution frontier of L_A(θ) is discrete, the resulting function is steep and harder to optimize; therefore L_B(θ) is introduced as a smoothing term. Here L_A(θ) is the temporal-difference mean-square error of the Q values output by the network; θ is the model parameter; L_B(θ) is a smoother auxiliary loss on the values; ω and ω' are the sampled preference vectors; Q(ss, a, ω; θ) is the value of selecting action a in the current task state ss; ss and ss' are task states; a is the action; y is the target approximation of Q(ss, a, ω; θ); Q'(ss', a, ω'; θ_k) denotes the Q value of the next state whose inner product with y is largest under the currently selected preference ω; r is the current reward value; λ is 0 initially and is then increased exponentially with step, the number of iterations of the model, according to a discount factor.
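Since the exact forms of L_A(θ) and L_B(θ) appear only in the original drawings, the sketch below assumes that L_A is the mean-squared temporal-difference error on the Q vectors against their targets y, that L_B is a smoother loss on the preference-scalarized values, and that λ ramps exponentially from 0 towards 1 with the iteration count; it only illustrates the homotopy mixing L(θ) = (1 − λ)L_A(θ) + λL_B(θ).

```python
import numpy as np

def homotopy_loss(q, y, omega, step, decay=0.99):
    """Assumed sketch of L = (1 - lambda) * L_A + lambda * L_B.

    q, y  : (batch, n_objectives) predicted Q vectors and their targets
    omega : (n_objectives,) preference vector currently sampled
    lambda starts at 0 and grows towards 1 as training iterations accumulate.
    """
    lam = 1.0 - decay ** step                      # assumed exponential ramp
    l_a = np.mean((q - y) ** 2)                    # TD mean-squared error on vectors
    l_b = np.mean((q @ omega - y @ omega) ** 2)    # smoother loss on scalarized values
    return (1.0 - lam) * l_a + lam * l_b

q = np.array([[0.7, 0.6, 0.5], [0.4, 0.9, 0.2]])
y = np.array([[0.8, 0.5, 0.5], [0.5, 0.8, 0.3]])
omega = np.array([0.5, 0.3, 0.2])
print(homotopy_loss(q, y, omega, step=100))
```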
In some embodiments, step S302 may include, but is not limited to including, step S501 through step S504:
step S501, setting a time period of service scheduling;
step S502, when the state setting period is in, detecting the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state;
step S503, traversing the dependency graph when the service scheduling period is in, detecting node additional information corresponding to each subtask, determining a scheduling task, selecting a service and a node for executing the scheduling task through an MDP model, and determining a service scheduling strategy of the scheduling task;
step S504, when in the information storage period, collects and stores history service schedule data.
In step S501 of some embodiments, the time period ψ includes a state setting period ψ_{i,check}, a service scheduling period ψ_{i,chor}, and an information storage period ψ_{i,model}; every time a fixed time period ψ elapses, the system checks the system execution state once and arranges the next task to be executed according to the task execution states.
In step S502 of some embodiments, during the state setting period ψ_{i,check}, each subtask is handled according to the state of its service execution, which includes normal completion of execution and abnormal ending of execution, specifically:
Normal completion of execution: the service has finished processing the task; at this point the state of the task SF_k it was executing is changed from executing to normal execution ended, a timer θ_k is initialized, and for every edge with SF_k as its start point the task at the corresponding end point is changed to the executable state.
Abnormal ending of execution: an exception occurred during service execution, including the service exiting abnormally or the total service processing time exceeding the maximum-delay threshold η. At this point the state of SF_k is changed from executing to abnormal execution ended, and a timer θ_k is initialized.
In step S503 of some embodiments, the orchestration task is a subtask for which service scheduling is performed during the service scheduling period and which is executed after the service scheduling period ends; the historical service scheduling data is used for training the MDP model. After the task state update of step S502 is completed, the service scheduling period ψ_{i,chor} is entered: the dependency graph is traversed, and the node additional information W of the node corresponding to each subtask is checked to determine whether any task whose execution ended abnormally exists; the subtask SF_k with the highest corresponding θ_k is selected; if there is none, W is traversed again to search for subtasks in the executable state, and these are taken as the tasks needing scheduling (orchestration tasks).
Subsequently, assuming SF_k is determined this time to be the task that needs scheduling (the orchestration task), the system collects the system state, including:
the result data of the front task SF_{k-1} that needs to be processed;
the node temporarily holding that result data, determined according to the state of SF_k: if SF_k is executable, the data should be held on the node that executed the front task SF_{k-1}; otherwise it is held on the machine that previously failed to execute SF_k;
the set of selectable atomic micro-services for SF_k, the minimum amount of resources required by the service, and the available resources of the nodes;
the system network conditions: BA, PA, and QT.
After the system state is collected, the system sets an appropriate task allocation preference parameter ω according to the current execution time of task RS and the system load, inputs the collected system state and ω into the model, and selects the next suitable combination of an atomic micro-service and a service entity node.
After the combination is obtained, node n is required to load the corresponding image and to allocate sufficient resources; the node temporarily holding the result data is then instructed to transmit the data to n; the timing of θ_k is ended, timing is restarted for the execution of SF_{k+1}, and the orchestration of SF_{k+1} is completed. If no executable task is currently found, or no appropriate task execution combination is found, this stage is skipped.
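The choice of a <service, node> combination can be pictured as follows; the value table standing in for the trained model's Q values and every name here are placeholders, and the resource check mirrors the spirit of constraints (7) to (9).

```python
def select_action(candidates, nodes, omega, q_values):
    """Pick the feasible (micro-service, node) pair with the best preference-weighted score.

    candidates: list of (ms_name, min_resources) for the current subtask
    nodes:      dict node_id -> available resources
    q_values:   dict (ms_name, node_id) -> value vector (placeholder for the model)
    """
    best, best_score = None, float("-inf")
    for ms_name, need in candidates:
        for node_id, avail in nodes.items():
            if avail < need:                      # not enough resources on this node
                continue
            vec = q_values.get((ms_name, node_id), [0.0] * len(omega))
            score = sum(v * w for v, w in zip(vec, omega))
            if score > best_score:
                best, best_score = (ms_name, node_id), score
    return best

candidates = [("ms_a", 2.0), ("ms_b", 1.0)]
nodes = {1: 1.5, 2: 4.0}
omega = [0.5, 0.3, 0.2]
q_values = {("ms_a", 2): [0.9, 0.7, 0.4], ("ms_b", 1): [0.6, 0.8, 0.9],
            ("ms_b", 2): [0.5, 0.5, 0.5]}
print(select_action(candidates, nodes, omega, q_values))   # ('ms_a', 2)
```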
In step S504 of some embodiments, after the service scheduling of the orchestration task is completed, the system enters the information storage period ψ_{i,model}. During this period the currently collected system state is stored as historical service scheduling data, facilitating later iterative training of the model. The historical service scheduling data includes information such as the state observed by the current system and the accumulated service running time. Specifically, in a given ψ_{i,model} period, if the system orchestrated task SF_k during the ψ_{i,chor} period of the same time period, the observed subtask state SS_k and the action A_kj taken are stored in the service scheduling database; and, during the execution of the atomic micro-service that processes the front task SF_{k-1}, the reward vector corresponding to the front task SF_{k-1} is calculated by combining formula (11), formula (13), formula (15), and formula (16) above and stored in the service scheduling database.
The MDP model is then trained by randomly sampling consecutive state-action pairs (historical service scheduling data) from the service scheduling database, specifically tuples of the form (SS_{k-1}, A_{k-1p}, R_{k-1}, SS_k),
wherein SS_{k-1} is the task state of the front task SF_{k-1}, A_{k-1p} is the action selected for the front task SF_{k-1}, R_{k-1} is the reward vector obtained by the front task SF_{k-1} selecting action A_{k-1p} in state SS_{k-1}, and SS_k is the task state of the current task SF_k.
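Storing and randomly sampling the consecutive state-action pairs can be sketched as a plain replay buffer; the tuple layout follows the (SS_{k-1}, A_{k-1p}, R_{k-1}, SS_k) form described above, while the class and parameter names are illustrative.

```python
import random
from collections import deque

class ServiceScheduleBuffer:
    """Service scheduling database used as a replay buffer for MDP training."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state_prev, action, reward_vec, state_next):
        # one consecutive state-action pair (SS_{k-1}, A_{k-1p}, R_{k-1}, SS_k)
        self.buffer.append((state_prev, action, reward_vec, state_next))

    def sample(self, batch_size):
        # random sampling breaks temporal correlation before each training step
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

db = ServiceScheduleBuffer()
db.store({"subtask": "SF1"}, ("ms_a", 2), [0.8, 0.7, 0.6], {"subtask": "SF2"})
print(db.sample(4))
```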
Referring to fig. 2, fig. 2 is an optional algorithm flow diagram of the model training manner of the MDP model in the embodiment of the present application. After a sufficient number of consecutive state-action pairs have been stored in the service scheduling database, the MDP model starts to draw data from this action-experience set for training; when the error result reflected by the loss function is within a preset range, or the number of training iterations reaches a preset threshold, the training of the MDP model is stopped and training ends.
In step S502 of some embodiments, when it is detected that the service execution state corresponding to each subtask is completed, the execution of the service scheduling operation on the current task RS is stopped, and task request data corresponding to the next task that issues a service request is received.
Referring to fig. 3, fig. 3 is an optional flowchart for determining an optimal service scheduling policy by using a dependency graph and an MDP model according to an embodiment of the present application, where the method includes the following steps:
decomposing task request data of the data-intensive tasks, and establishing a dependency relationship graph according to task dependency relationships among all subtasks;
Detecting service execution states of all the subtasks, and updating task states of all the subtasks;
traversing the dependency graph, and determining an orchestration task;
determining an optimal service scheduling combination for arranging tasks through an MDP model;
calculating historical service scheduling data generated by selecting the optimal service scheduling combination, and storing the historical service scheduling data into a service scheduling database;
performing service scheduling of arranging tasks;
detecting the service execution state of each subtask in the next time period;
and judging whether the data-intensive task is executed or not.
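To tie the flow of fig. 3 together, the following toy loop simulates the per-period control cycle on a three-subtask workflow; the three period handlers are drastically simplified stand-ins for steps S502 to S504, the scheduling decision is a stub, and all names are hypothetical.

```python
def run_periods(subtask_states, dependencies, max_periods=10):
    """Toy per-period loop: check states, orchestrate one task, store history."""
    history = []
    for period in range(max_periods):
        # state setting period: promote subtasks whose front tasks have finished
        for sf, preds in dependencies.items():
            if subtask_states[sf] == "to_be_executed" and all(
                    subtask_states[p] == "done" for p in preds):
                subtask_states[sf] = "executable"
        # service scheduling period: pick one executable subtask (stub policy)
        ready = [sf for sf, st in subtask_states.items() if st == "executable"]
        if ready:
            chosen = ready[0]
            subtask_states[chosen] = "done"      # pretend it executed instantly
            # information storage period: keep the (period, action) record
            history.append((period, chosen))
        if all(st == "done" for st in subtask_states.values()):
            break                                 # the data-intensive task finished
    return history

states = {"SF1": "executable", "SF2": "to_be_executed", "SF3": "to_be_executed"}
deps = {"SF1": [], "SF2": ["SF1"], "SF3": ["SF2"]}
print(run_periods(states, deps))   # [(0, 'SF1'), (1, 'SF2'), (2, 'SF3')]
```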
According to the data-intensive task edge service combination method based on multi-objective reinforcement learning provided by the embodiments of the present application, a service request is received and task request data is acquired; the task request data is decomposed to obtain a plurality of subtask data; a dependency graph of the task execution flow is constructed; an optimal service scheduling strategy is determined in combination with a service combination optimization algorithm; and service scheduling is performed to execute the task. The service combination optimization algorithm makes full use of the advantages of edge computing: it reasonably combines and coordinates the computing resources and network resources of the edge nodes, uses limited resources effectively, avoids resource waste, reduces resource-occupancy imbalance among edge nodes, optimizes the efficiency and performance of task processing while balancing node load, achieves real-time and efficient processing of data-intensive tasks, and improves production efficiency and resource utilization.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and as those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the technical solutions shown in the figures do not constitute limitations of the embodiments of the present application, and may include more or fewer steps than shown, or may combine certain steps, or different steps.
Preferred embodiments of the present application are described above with reference to the accompanying drawings, and thus do not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A data-intensive task edge service combining method based on multi-objective reinforcement learning, comprising:
receiving a service request and acquiring task request data; the task request data is relevant task data corresponding to the data-intensive task sending out the service request;
Decomposing the task request data to obtain a plurality of subtask data;
constructing a dependency graph of a task execution flow according to the plurality of subtask data;
determining an optimal service scheduling strategy according to a preset service combination optimization algorithm and the dependency graph;
and carrying out service scheduling according to the optimal service scheduling strategy so as to execute the data-intensive task.
2. The data-intensive task edge service combining method of claim 1, wherein the subtask data comprises subtask category data, subtask number data, and subtask execution data.
3. The method for combining data-intensive task edge services according to claim 1, wherein the step of constructing a dependency graph of task execution flow from a plurality of subtask data specifically comprises:
initializing the dependency graph;
acquiring task dependency relations among all the subtasks according to the plurality of subtask data;
and constructing the dependency graph according to the task dependency.
4. The method for combining data-intensive task edge services according to claim 1, wherein the step of determining an optimal service scheduling policy according to a preset service combination optimization algorithm and the dependency graph specifically comprises:
constructing a multi-objective Markov decision process MDP model, and training the MDP model by means of deep reinforcement learning;
and traversing the dependency graph and selecting services and nodes through the MDP model to obtain the optimal service scheduling strategy.
5. The method of claim 4, wherein the dependency graph is specifically represented by the following formula:
G_de = (N, V, W)
N = RS
V = {v_ij, v_jk, ...}
wherein G_de denotes the dependency graph; RS is the task; N is the node set, each node in the node set being one subtask of the task RS; V is the edge set, where v_ij indicates that there is an execution order between subtask SF_i and subtask SF_j, SF_i being the starting point and a predecessor task of SF_j; SF_i and SF_j are respectively the i-th and j-th subtasks of RS; and W is the node additional information, whose elements respectively record, for the current subtask SF_i, the integrated processing time of SF_i, the waiting time for the result once SF_i has been processed, the service that processes SF_i, the node carrying that service, and the execution state of SF_i.
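Purely for illustration and not as part of claim 5, the triple G_de = (N, V, W) could be held in memory as in the following sketch; the field names used for the node additional information W are assumptions, since the original symbols are not reproduced in this text.

```python
# Sketch of the dependency graph G_de = (N, V, W); field names are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeInfo:                      # W: additional information attached to subtask SF_i
    processing_time: float           # integrated processing time of SF_i
    result_wait_time: float          # waiting time for the result once SF_i is processed
    service: Optional[str] = None    # service chosen to process SF_i
    node: Optional[str] = None       # edge node carrying that service
    state: str = "pending"           # execution state of SF_i


@dataclass
class DependencyGraph:               # G_de = (N, V, W)
    nodes: set                       # N: one element per subtask SF_i of the task RS
    edges: set                       # V: (i, j) means SF_i is a predecessor of SF_j
    info: dict                       # W: subtask id -> NodeInfo

    def predecessors(self, j: int) -> set:
        """All subtasks that must complete before SF_j can start."""
        return {i for (i, k) in self.edges if k == j}
```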
6. The data-intensive task edge service combining method of claim 4, wherein the MDP model is specifically represented by the following formula:
MOMDP = {SS, A, P, R, γ, ω, D_ω}
wherein MOMDP denotes the MDP model; SS is the state set of the service scheduling process; A is the action; P is the probability that executing action A in state SS yields the reward R and changes the state; R is the reward for executing action A in state SS; γ is the discount rate, γ ∈ [0,1], describing the influence of historical rewards on the future, a larger γ meaning that past selections matter more; ω is the task allocation preference parameter, ω = <ω_proc, ω_net, ω_res>, denoting the system's degree of preference for each optimization objective, where ω_proc is the preference parameter for the processing time and response time of the service, ω_net is the preference parameter for the network condition when the service is executed, and ω_res is the preference parameter for the resource allocation situation when the service is executed; and D_ω is the probability distribution of ω;
wherein SS_k is the k-th state in SS; SF_k is the subtask currently to be executed in state SS_k; SF_{k-1} is the predecessor task of the current subtask SF_k; the state further records the data volume still to be processed for subtask SF_{k-1}; the node executing the predecessor task SF_{k-1}, with N denoting the set containing all service entity nodes; the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type; the minimum resource requirement for executing subtask SF_k; an indicator of whether an atomic micro-service satisfying the minimum resource requirement of subtask SF_k exists; the available resources of each service entity node n; an indicator of whether an available service entity node n exists; BA, the bandwidth of all links; PA, the network reachability matrix; QT, the packet queuing delay of all nodes; and SF_{k+1}, the successor task of the current subtask SF_k;
wherein A_k is the action of executing subtask SF_k; each candidate atomic micro-service belongs to the set of atomic micro-services that can independently complete subtask SF_k and have the same input-output type, k identifying the subtask and j identifying the position of the atomic micro-service within that set; the action A_kj states that the j-th such atomic micro-service may be deployed on a service entity node n, taken from the set N of all service entity nodes, to provide the service;
wherein the transition probability denotes the probability that, in the current state SS_k, action A_kj is selected, the corresponding reward is obtained, and the state is observed to change to SS_{k+1};
wherein the reward of action A_kj is a reward vector denoting the immediate reward obtained by executing action A_kj in state SS_k; its components are the user's satisfaction with the processing time of the executed atomic micro-service, the user's satisfaction with the time taken to respond to the user request, and the user's satisfaction with the time the data packet is transmitted in the network after the atomic micro-service has been executed.
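As a non-limiting illustration of the tuple MOMDP = {SS, A, P, R, γ, ω, D_ω}, a state SS_k, an action A_kj, and a preference-weighted reward could be laid out as follows; the attribute names and the linear scalarization are assumptions, not the claimed formulation.

```python
# Sketch of one state SS_k, one action A_kj and a preference-weighted reward; names are assumptions.
from dataclasses import dataclass
import numpy as np


@dataclass
class State:                          # one element SS_k of the state set SS
    current_subtask: int              # SF_k, the subtask to be executed in this state
    predecessor: int                  # SF_{k-1}
    pending_data: float               # data volume still to be processed for SF_{k-1}
    candidate_services: tuple         # atomic micro-services able to complete SF_k
    node_resources: dict              # available resources of every service entity node in N
    bandwidth: np.ndarray             # BA: bandwidth of all links
    reachability: np.ndarray          # PA: network reachability matrix
    queue_delay: np.ndarray           # QT: packet queuing delay of all nodes
    successor: int                    # SF_{k+1}


@dataclass
class Action:                         # A_kj: deploy a candidate micro-service on a node for SF_k
    service: str
    node: str


def scalarize(reward: np.ndarray, omega: np.ndarray) -> float:
    """Weight the reward vector (processing/response, network, resource terms) by the preference ω."""
    return float(np.dot(omega, reward))
```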
7. The method for data-intensive task edge service combining as defined in claim 4, wherein the step of constructing a multi-objective Markov decision process MDP model and training the MDP model by deep reinforcement learning comprises:
setting an optimization target and constraint conditions;
establishing a service scheduling database;
acquiring historical service scheduling data and storing the historical service scheduling data into the service scheduling database;
retrieving the historical service scheduling data from the service scheduling database, performing iterative training on the MDP model in a deep reinforcement learning manner according to the optimization target and the constraint conditions, and calculating a loss function;
and stopping training the MDP model when the error reflected by the loss function falls within a preset range or the number of training iterations reaches a preset threshold.
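As a rough, non-limiting reading of this training procedure, the loop below iterates over batches drawn from a history database until either stopping condition is met; the database API (a sample() method), the optimizer choice, and the threshold values are assumptions.

```python
# Sketch of the iterative training loop of claim 7; the database API and stopping values are assumptions.
import torch


def train_mdp_model(model: torch.nn.Module, history_db, loss_fn,
                    max_steps: int = 10_000, tol: float = 1e-3, lr: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(max_steps):
        batch = history_db.sample()               # historical service scheduling data
        loss = loss_fn(model, batch, step)        # e.g. the preference-weighted loss of claim 8
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < tol:                     # error within the preset range -> stop training
            break
    return model
```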
8. The method of claim 7, wherein the loss function is set by means of a homotopy optimization method and is specifically represented by the following formula:
L(θ) = (1 - λ)L_A(θ) + λL_B(θ), λ ∈ [0,1]
wherein L_A(θ) is the temporal-difference mean square error of the Q value output by the network; θ is the model parameter; L_B(θ) is the term that makes the value function smoother; ω and ω' are the set preference vectors; Q(ss, a, ω; θ) is the value of selecting action a under the current task state ss and preference ω; ss and ss' are task states; a is the action; y is an approximation of Q(ss, a, ω; θ) formed from the current reward value r and the discounted value of the next state; Q'(ss', a, ω'; θ_k) denotes the Q value of the next state that has the maximum inner product with y under the currently selected preference ω; r is the current reward value; and λ is initially 0 and is thereafter increased exponentially with step, where step is the number of training iterations of the model.
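A minimal PyTorch sketch of this homotopy-style blend of the two loss terms is given below; the exponential schedule for λ and its base are assumptions, since the exact update rule is not fully reproduced in this text.

```python
# Sketch of L(θ) = (1 - λ)·L_A(θ) + λ·L_B(θ); the λ schedule base is an assumption.
import torch


def homotopy_loss(q_pred: torch.Tensor, td_target: torch.Tensor,
                  aligned_target: torch.Tensor, lam: float) -> torch.Tensor:
    """Blend the temporal-difference MSE on Q (L_A) with the smoother, preference-aligned term (L_B)."""
    loss_a = torch.mean((q_pred - td_target) ** 2)       # L_A(θ): TD mean square error
    loss_b = torch.mean((q_pred - aligned_target) ** 2)  # L_B(θ): smoother, ω-aligned target
    return (1.0 - lam) * loss_a + lam * loss_b


def lam_schedule(step: int, base: float = 1.0005) -> float:
    """λ starts at 0 and grows exponentially with the iteration count, capped at 1 (base is illustrative)."""
    return min(1.0, base ** step - 1.0)
```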
9. The method for combining data-intensive task edge services according to claim 5, wherein the step of traversing the dependency graph, performing service and node selection through the MDP model, and obtaining the optimal service scheduling policy specifically comprises:
setting a time period of service scheduling; the time period comprises a state setting period, a service scheduling period and an information storage period;
during the state setting period, detecting the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state;
during the service scheduling period, traversing the dependency graph, detecting the node additional information corresponding to each subtask, determining an orchestration task, selecting a service and a node for executing the orchestration task through the MDP model, and determining a service scheduling strategy for the orchestration task; the orchestration task is a subtask for which service scheduling is carried out within the service scheduling period and which is executed after the service scheduling period ends;
during the information storage period, collecting and storing historical service scheduling data; the historical service scheduling data is used for training the MDP model.
10. The method of claim 9, wherein the step of detecting, during the state setting period, the service execution state corresponding to each subtask, modifying the task state of each subtask, and updating the task state further comprises:
stopping the service scheduling operation for the current task when the service execution state corresponding to every subtask is detected to be completed, and receiving the task request data corresponding to the next task that issues a service request.
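As a final non-limiting illustration, the three-phase cycle of claims 9 and 10 (state setting, service scheduling, information storage) could be organised as in the sketch below; the polling helper, the period length, and the history container are assumptions, and the DependencyGraph shape is the one sketched after claim 5.

```python
# Sketch of the state-setting / service-scheduling / information-storage cycle of claims 9 and 10.
# poll_state, the period length, and the history container are assumptions.
import time


def scheduling_cycle(graph, mdp_policy, history_db: list, poll_state, period: float = 1.0):
    while True:
        # 1) State-setting period: refresh the execution state recorded for every subtask.
        for task_id, info in graph.info.items():
            info.state = poll_state(task_id)
        if all(info.state == "completed" for info in graph.info.values()):
            break                                  # claim 10: stop scheduling, accept the next request

        # 2) Service-scheduling period: pick a service and node for every subtask that is ready.
        ready = [t for t, info in graph.info.items()
                 if info.state == "pending"
                 and all(graph.info[p].state == "completed" for p in graph.predecessors(t))]
        for task_id in ready:
            graph.info[task_id].service, graph.info[task_id].node = mdp_policy.select(task_id)
            graph.info[task_id].state = "scheduled"

        # 3) Information-storage period: log the decisions as historical service scheduling data.
        history_db.extend((t, graph.info[t].service, graph.info[t].node) for t in ready)
        time.sleep(period)
```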
CN202311038793.1A 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning Pending CN117255126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311038793.1A CN117255126A (en) 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311038793.1A CN117255126A (en) 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning

Publications (1)

Publication Number Publication Date
CN117255126A true CN117255126A (en) 2023-12-19

Family

ID=89128417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311038793.1A Pending CN117255126A (en) 2023-08-16 2023-08-16 Data-intensive task edge service combination method based on multi-objective reinforcement learning

Country Status (1)

Country Link
CN (1) CN117255126A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180041905A1 (en) * 2016-08-05 2018-02-08 Nxgen Partners Ip, Llc Ultra-broadband virtualized telecom and internet
CN113282368A (en) * 2021-05-25 2021-08-20 国网湖北省电力有限公司检修公司 Edge computing resource scheduling method for substation inspection
CN113641447A (en) * 2021-07-16 2021-11-12 北京师范大学珠海校区 Online learning type scheduling method based on container layer dependency relationship in edge calculation
CN113822456A (en) * 2020-06-18 2021-12-21 复旦大学 Service combination optimization deployment method based on deep reinforcement learning in cloud and mist mixed environment
CN113873022A (en) * 2021-09-23 2021-12-31 中国科学院上海微系统与信息技术研究所 Mobile edge network intelligent resource allocation method capable of dividing tasks
US20220035878A1 (en) * 2021-10-19 2022-02-03 Intel Corporation Framework for optimization of machine learning architectures
WO2022171082A1 (en) * 2021-02-10 2022-08-18 中国移动通信有限公司研究院 Information processing method, apparatus, system, electronic device and storage medium
CN115022332A (en) * 2022-05-30 2022-09-06 广西师范大学 Dynamic service placement method based on deep reinforcement learning in edge calculation
CN115604768A (en) * 2022-11-30 2023-01-13 成都中星世通电子科技有限公司(Cn) Electromagnetic perception task dynamic migration method, system and terminal based on resource state
CN115714820A (en) * 2022-11-14 2023-02-24 北方工业大学 Distributed micro-service scheduling optimization method
WO2023091664A1 (en) * 2021-11-19 2023-05-25 Intel Corporation Radio access network intelligent application manager
CN116489226A (en) * 2023-04-25 2023-07-25 重庆邮电大学 Online resource scheduling method for guaranteeing service quality


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
乐光学; 戴亚盛; 杨晓慧; 刘建华; 游真旭; 朱友康: "Modeling of Trustworthy Collaborative Service Strategies for Edge Computing", Journal of Computer Research and Development, no. 05, 15 May 2020 (2020-05-15) *
刘伟; 黄宇成; 杜薇; 王伟: "Resource-Constrained Serial Task Offloading Strategy in Mobile Edge Computing", Journal of Software, no. 06, 8 June 2020 (2020-06-08) *
白昱阳; 黄彦浩; 陈思远; 张俊; 李柏青; 王飞跃: "Cloud-Edge Intelligence: Edge Computing Methods for Power System Operation and Control and Their Application Status and Prospects", Acta Automatica Sinica, no. 03, 15 March 2020 (2020-03-15) *

Similar Documents

Publication Publication Date Title
CN109491790B (en) Container-based industrial Internet of things edge computing resource allocation method and system
Zhang et al. A-SARSA: A predictive container auto-scaling algorithm based on reinforcement learning
CN113824489B (en) Satellite network resource dynamic allocation method, system and device based on deep learning
CN111274036A (en) Deep learning task scheduling method based on speed prediction
WO2020186872A1 (en) Expense optimization scheduling method for deadline constraint under cloud scientific workflow
CN114253735B (en) Task processing method and device and related equipment
CN116069512B (en) Serverless efficient resource allocation method and system based on reinforcement learning
CN116662010B (en) Dynamic resource allocation method and system based on distributed system environment
CN110086855A (en) Spark task Intellisense dispatching method based on ant group algorithm
CN115934333A (en) Historical data perception-based cloud computing resource scheduling method and system
Ibn-Khedher et al. Edge computing assisted autonomous driving using artificial intelligence
CN106502790A (en) A kind of task distribution optimization method based on data distribution
KR20230007941A (en) Edge computational task offloading scheme using reinforcement learning for IIoT scenario
Badri et al. A sample average approximation-based parallel algorithm for application placement in edge computing systems
AlOrbani et al. Load balancing and resource allocation in smart cities using reinforcement learning
CN117349026B (en) Distributed computing power scheduling system for AIGC model training
Bensalem et al. Scaling Serverless Functions in Edge Networks: A Reinforcement Learning Approach
CN113205128A (en) Distributed deep learning performance guarantee method based on serverless computing
CN117311973A (en) Computing device scheduling method and device, nonvolatile storage medium and electronic device
Bensalem et al. Towards optimal serverless function scaling in edge computing network
CN117255126A (en) Data-intensive task edge service combination method based on multi-objective reinforcement learning
CN109298932B (en) OpenFlow-based resource scheduling method, scheduler and system
CN116109058A (en) Substation inspection management method and device based on deep reinforcement learning
CN115766475A (en) Semi-asynchronous power federal learning network based on communication efficiency and communication method thereof
CN111367632B (en) Container cloud scheduling method based on periodic characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination