CN116436797A - Task scheduling model training method and device, task scheduling method and electronic equipment - Google Patents

Task scheduling model training method and device, task scheduling method and electronic equipment

Info

Publication number
CN116436797A
Authority
CN
China
Prior art keywords
task
task scheduling
time
scheduling model
edge server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310303602.3A
Other languages
Chinese (zh)
Inventor
杨术
崔来中
张利民
常晓磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202310303602.3A
Publication of CN116436797A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y10/00 Economic sectors
    • G16Y10/75 Information technology; Communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application is applicable to the technical field of the Internet of Things, and provides a task scheduling model training method and device, a task scheduling method, and electronic equipment, where the task scheduling model is applied to mobile edge computing. The task scheduling model training method comprises the following steps: acquiring a first running duration of tasks in an edge server within a first period, wherein the first period is determined according to a first time node; updating parameters of the task scheduling model according to the first running duration of the tasks; deploying the tasks to be processed to the edge server according to the updated task scheduling model; determining a second running duration of the deployed tasks within a second period, determining a reward according to the second running duration, and iteratively updating the parameters of the task scheduling model until the task scheduling model meets a preset requirement. The method and device can adapt to different task scheduling demands, shorten task scheduling time, and provide low-delay task scheduling services for users.

Description

Task scheduling model training method and device, task scheduling method and electronic equipment
Technical Field
The application belongs to the technical field of the Internet of things, and particularly relates to a task scheduling model training method and device, a task scheduling method and electronic equipment.
Background
With the continuous growth of intelligent devices and the rapid development of the Internet of Things industry, the number of user devices connected to the Internet of Things keeps increasing and user tasks are becoming more diverse. To cope with massive device connections and growing task diversity, mobile edge computing platforms have been designed. A traditional mobile edge computing platform adopts heuristic algorithms or reinforcement-learning-based task offloading and scheduling algorithms to provide task scheduling services for large numbers of users and complete their computations. Although such platforms can solve the task scheduling problem quickly and efficiently in most cases, they cannot provide fast and efficient scheduling for massive, diverse, random, and dynamically changing task requests; for complex task requests, the generated schedules take a great deal of time to complete the tasks and are inefficient.
Disclosure of Invention
The embodiment of the application provides a task scheduling model training method and device, a task scheduling method and electronic equipment, and can solve the problem of high delay of task scheduling service.
In a first aspect, an embodiment of the present application provides a task scheduling model training method, where the task scheduling model is applied to mobile edge computing, and the task scheduling model training method includes the following steps:
acquiring a first running duration of tasks in an edge server within a first period, wherein the first period is determined according to a first time node;
updating parameters of the task scheduling model according to the first running duration of the tasks;
deploying the tasks to be processed to the edge server according to the updated task scheduling model;
determining a reward for the updated task scheduling model according to the first running duration and the updated task scheduling model, and iteratively updating the parameters of the task scheduling model according to the reward until the task scheduling model meets a preset requirement.
In a possible implementation manner of the first aspect, acquiring a first running duration of tasks in an edge server includes:
acquiring the tasks run by the edge server within the first period, and acquiring the running duration of each task within the first period;
determining the first running duration according to the tasks run within the first period and the running duration of each task within the first period.
In a possible implementation manner of the first aspect, acquiring the running duration of a task within the first period includes:
acquiring the start time and end time of the task, and the start time and end time of the first period;
determining the running duration of the task within the first period according to the order of the task's start time relative to the start time of the first period and the order of the task's end time relative to the end time of the first period.
In a possible implementation manner of the first aspect, updating parameters of the task scheduling model according to the first running duration of the tasks includes:
acquiring the average reward before the first time node;
updating the parameters of the task scheduling model according to the cumulative reward and the average reward, in combination with a preset learning rate.
In a possible implementation manner of the first aspect, updating parameters of the task scheduling model according to the first running duration of the tasks includes:
acquiring task request information and edge server information;
inputting the task request information, the edge server information, and the first running duration into the task scheduling model, and outputting the parameters of the task scheduling model.
In a possible implementation manner of the first aspect, the task request information includes one or more of features of the subtask level obtained by task division, features of the task level, and features of the global task level, and the edge server information includes one or more of computing power features, communication resource features, computing resource state features, and storage resource state features of the edge server.
In a second aspect, an embodiment of the present application provides a task scheduling method that generates a task scheduling scheme using a trained task scheduling model, where the method includes the following steps:
acquiring a global task request and edge server information;
inputting the global task request and the edge server information into the trained task scheduling model obtained by the task scheduling model training method according to any one of the first aspect, to obtain a scheduling policy output by the task scheduling model;
dividing the global task request into one or more micro-services according to the scheduling policy, and deploying the micro-services to the edge server.
In a third aspect, an embodiment of the present application provides a task scheduling model training device, where the task scheduling model is applied to mobile edge computing, and the device includes:
a configuration module, configured to acquire a first running duration of tasks in an edge server within a first period, where the first period is determined according to a first time node;
a parameter updating module, configured to update parameters of the task scheduling model according to the first running duration of the tasks;
a task scheduling module, configured to deploy the tasks to be processed to the edge server according to the updated task scheduling model;
and a training module, configured to determine a reward for the updated task scheduling model according to the first running duration and the updated task scheduling model, and to iteratively update the parameters of the task scheduling model according to the reward until the task scheduling model meets the preset requirement.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the task scheduling model training method as described in any one of the first aspects when executing the computer program, or implements the task scheduling method as described in any one of the second aspects when executing the computer program.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, where the computer program when executed by a processor implements a task scheduling model training method as described in any one of the first aspects, or where the computer program when executed by a processor implements a task scheduling method as described in any one of the second aspects.
In a sixth aspect, embodiments of the present application provide a computer program product that, when run on an electronic device, causes the electronic device to perform the task scheduling model training method described in any one of the first aspects above, or causes the electronic device to perform the task scheduling method described in any one of the second aspects above.
It will be appreciated that the advantages of the second to sixth aspects may be found in the relevant description of the first aspect, and are not described here again.
Compared with the prior art, the embodiments of the present application have the following beneficial effects:
The application provides a task scheduling model training method comprising: acquiring a first running duration of tasks in an edge server within a first period determined by a first time node; updating parameters of a task scheduling model according to the first running duration; deploying the tasks to be processed to the edge server based on the updated task scheduling model; determining a reward according to the first running duration and the updated task scheduling model; and updating the parameters of the task scheduling model based on the reward, training of the task scheduling model being complete when it meets a preset requirement. Task scheduling based on this model can effectively adapt to massive, diverse, random, and dynamically changing task scheduling demands, and helps improve scheduling efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an application scenario diagram of task scheduling work provided by an embodiment of the present application;
FIG. 2 is a flow chart of a task scheduling model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a training scenario of a task scheduling model provided by an embodiment of the present application;
fig. 4 is a schematic implementation flow chart of a task scheduling method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a task scheduling model training device according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
The task scheduling model training method and device, task scheduling method, and electronic equipment of the present application can solve the problem of high task scheduling service delay and improve users' task completion efficiency; they are applied in the task scheduling process and are suitable for scenarios in which user tasks are computed.
Fig. 1 is a schematic diagram of an application scenario of a task scheduling method provided in an embodiment of the present application. As shown in fig. 1, the application scenario includes a user equipment 101, a radio base station 102, an edge server 103, and a central controller 104. Only one of each is shown in fig. 1, but one or more of each are needed in practical applications. For example, an application scenario may include a plurality of user equipments 101, a radio base station 102, an edge server 103, and a central controller 104, where the tasks requested by the plurality of user equipments 101 are scheduled to different edge servers 103 by the central controller 104. The task scheduling model is located at the central controller 104.
The user equipment 101 may continuously generate task request information and task data.
The radio base station 102 may collect and aggregate task request information generated by the user equipment 101, and may also complete data transmission.
The edge servers 103 have different edge server information; they can acquire task data for computation and return the computation result once computation is complete.
The central controller 104 is configured to maintain and collect the aggregated task request information, train the task scheduling model, use the trained task scheduling model to continuously generate scheduling decisions according to the task request information, and return those decisions to control task scheduling.
As shown in fig. 1, a specific task scheduling procedure is:
1. the user equipment 101 sends the task information to the radio base station 102, and the radio base station 102 forwards the task information to the central controller 104.
2. The central controller 104 acquires the task information and transmits the task information to the task scheduling model for processing, and the task scheduling model combines the edge server information to generate a scheduling policy and transmits the scheduling policy to the central controller 104 for returning to the user equipment 101 and the edge server 103.
3. After the user equipment 101 acquires the scheduling policy, the task data is transmitted to the radio base station 102, and the radio base station 102 forwards the task data to the edge server 103.
4. The edge server 103 computes the task data according to the scheduling policy, and the computation result obtained is returned to the user equipment 101, completing the task scheduling work.
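A minimal sketch of this four-step control loop is given below; all class and method names are illustrative assumptions, not interfaces defined by this application.

```python
# Hedged sketch of the scheduling flow in fig. 1; every name here is
# an assumption made for illustration only.
def handle_task(user_equipment, base_station, controller, edge_servers):
    # 1. UE -> base station -> central controller: task request information
    task_info = base_station.forward(user_equipment.task_request())

    # 2. The task scheduling model combines edge server information into a
    #    scheduling policy, which the controller returns to UE and servers
    policy = controller.model.decide(task_info, controller.server_info)
    controller.return_policy(policy, user_equipment, edge_servers)

    # 3. UE ships task data through the base station to the chosen server
    server = edge_servers[policy.server_id]
    server.receive(base_station.forward(user_equipment.task_data()))

    # 4. The server computes under the policy and returns the result
    user_equipment.receive(server.compute(policy))
```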
Fig. 2 is a schematic flow chart of a task scheduling model training method according to an embodiment of the present application, where the task scheduling model is applied to mobile edge computing. The steps of the method are as follows:
In S201, a first running duration of tasks in an edge server within a first period is obtained.
The first period is determined according to a first time node. For example, the first period may be the period from the time node preceding the first time node to the first time node.
In one possible implementation, the above step may set a predetermined time interval and determine a plurality of time nodes according to that interval. Illustratively, in the embodiments of the present application, the position of any time node in the ordering may be represented by a positive integer k, e.g., the kth time node, the (k+1)th time node, and so on.
In one possible implementation, during training of the task scheduling model, the edge servers are a certain number of randomly generated edge servers whose computing power, resources, bandwidth, memory, and other attributes differ.
In one possible implementation, the above tasks are generated according to a Poisson process. A Poisson process assigns a random number of events to each bounded time interval (or each bounded region of some space) such that the numbers of events in non-overlapping intervals or regions are independent, and the number of events in each interval or region follows a Poisson distribution; technically, each set of finite measure is assigned a Poisson-distributed random variable. Tasks are thereby generated at random.
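As an illustration, a minimal sketch of such task generation follows, exploiting the fact that the inter-arrival gaps of a Poisson process with a given intensity are i.i.d. exponential; `rate`, `horizon`, and `seed` are illustrative parameters, not values fixed by this application.

```python
import random

def generate_task_arrivals(rate, horizon, seed=None):
    """Sample task arrival times on [0, horizon) from a Poisson process."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)   # exponential inter-arrival gap, mean 1/rate
        if t >= horizon:
            return arrivals
        arrivals.append(t)

# Example: roughly 2 task arrivals per unit time over 10 time units.
print(generate_task_arrivals(rate=2.0, horizon=10.0, seed=42))
```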
In a possible implementation, at the moment of the kth time node, the running duration of tasks in the edge server within the kth period corresponding to the kth time node is acquired, where the kth period is determined according to the kth time node. For example, the first period is determined according to the first time node and may be the period from the time node preceding the first time node to the first time node. Specifically, by acquiring the number of tasks the edge server ran between the (k-1)th time node and the kth time node, combined with the length of the period between them, the running duration of the tasks at the current time node can be obtained. The kth time node may represent the current moment, and the (k-1)th time node the moment before it.
In S202, parameters of the task scheduling model are updated according to the first running duration of the tasks.
In one possible implementation, the task scheduling model updates its trained parameters according to the running duration of the tasks: during training, the model determines, from the acquired first running duration of each task in the edge server within the first period, parameters that can reduce task completion time. These parameters are updated continuously during iterative updating, thereby optimizing the task allocation scheme.
In the embodiments of the present application, the reward corresponding to the first time node can be determined according to the task states corresponding to the first running duration, so that the parameters in the task scheduling model can be updated according to the reward.
When determining the reward corresponding to the first time node, with the goal of minimizing the average completion time of tasks, the reward of the kth time node can be defined as the penalty term $r_k = -(t_k - t_{k-1})N_k$, where k is a positive integer, $t_k - t_{k-1}$ is the duration from the (k-1)th time node to the kth time node, i.e., the kth period, and $N_k$ is the number of tasks in the system during the kth period, or equivalently the number of tasks handled by the edge server. The goal of the agent can therefore be seen as minimizing the expected penalty:

$$\min\;\mathbb{E}\left[\frac{1}{t_T}\sum_{k=1}^{T}(t_k - t_{k-1})\,N_k\right]$$

i.e., minimizing the average completion time of tasks by minimizing the number of outstanding tasks in the system, where T is the number of time nodes (time periods) included in the training segment and $t_T$ is the duration of the training segment.
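A one-line sketch of this per-step penalty reward (the function name is an assumption for illustration):

```python
def step_reward(t_k, t_k_minus_1, num_tasks_in_system):
    """r_k = -(t_k - t_{k-1}) * N_k: elapsed time weighted by the number of
    unfinished tasks, so maximizing the reward minimizes average completion time."""
    return -(t_k - t_k_minus_1) * num_tasks_in_system
```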
In the training process, one training segment (episode) is considered finished only when all generated tasks have been scheduled and executed. However, returning only one reward per training segment results in an excessively long action sequence and overly sparse rewards, which makes optimization of the task scheduling model difficult. Therefore, a cumulative reward can be returned at each time node of training, and the running duration of a task within the first period can be determined based on the start and end times of the task, the start and end times of the first period, and the order of the task's start time relative to the start of the first period and of the task's end time relative to the end of the first period.
For example, the cumulative reward may be expressed by the following formula:

$$r_k = -\sum_{T_i \in \mathcal{T}_k}\bigl(\min(t_k, c_i) - \max(t_{k-1}, b_i)\bigr)$$

where $\mathcal{T}_k$ is the set of tasks that have arrived in the system, $t_k$ is the system time at step k, and $c_i$ and $b_i$ are the completion time and start time of task $T_i$. The cumulative reward thus gives, summed over the tasks, the time each task spent between two scheduling time nodes, i.e., within the kth period. The reward may be set to a negative number, in which case the agent's goal is to maximize the cumulative reward. The goal of minimizing the average task completion time can thus be achieved by reducing the number of tasks in the system, on which basis scheduling decisions can be made.
The parameters in the task scheduling model may be updated using a gradient-descent method based on the reward determined at the first time node.
According to the Monte Carlo method, an agent (e.g., the central controller) may sample multiple trajectories according to the policy output by the current task scheduling model and evaluate the gradient over those trajectories. The parameter θ can then be updated by gradient descent using the REINFORCE-with-baseline algorithm, i.e., the parameters of the task scheduling model are updated based on the average reward before the first time node combined with the reward determined at the first time node. For the kth time node (k a positive integer), the update process can be expressed as:

$$\theta \leftarrow \theta + \alpha\sum_{k=1}^{T}\nabla_\theta \log \pi_\theta(s_k, a_k)\left(\sum_{k'=k}^{T} r_{k'} - b_k\right)$$

where α is the learning rate and $b_k$ is the average cumulative-reward baseline (the average reward) before step k. $\sum_{k'=k}^{T} r_{k'}$ represents the expected cumulative reward under the current policy, so $\sum_{k'=k}^{T} r_{k'} - b_k$ represents the advantage of the current policy over the baseline reward at step k, and $\nabla_\theta \log \pi_\theta(s_k, a_k)$ gives the direction in which to reinforce the policy $\pi_\theta(s_k, a_k)$ when the expected cumulative reward under the current policy is better than the baseline. Reinforcing this policy direction can thus lead the policy to better rewards. In a possible implementation, an entropy term may also be added to explore the action space and obtain a more robust policy.
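A sketch of this update is given below, assuming trajectories collected as (state, action, reward) triples and an illustrative `grad_log_prob(theta, s, a)` callable returning the gradient of log pi_theta(s, a); none of these names are interfaces fixed by this application.

```python
import numpy as np

def reinforce_with_baseline(theta, trajectories, grad_log_prob, alpha):
    """One REINFORCE-with-baseline update over Monte Carlo trajectories."""
    T = max(len(traj) for traj in trajectories)
    # Reward-to-go per step, then the per-step baseline b_k averaged
    # across trajectories (the average cumulative-reward benchmark).
    returns = []
    for traj in trajectories:
        rs = np.array([r for _, _, r in traj])
        returns.append(np.cumsum(rs[::-1])[::-1])
    baselines = [np.mean([R[k] for R in returns if len(R) > k])
                 for k in range(T)]

    grad = np.zeros_like(theta)
    for traj, R in zip(trajectories, returns):
        for k, (s, a, _) in enumerate(traj):
            advantage = R[k] - baselines[k]          # reward-to-go minus baseline
            grad += grad_log_prob(theta, s, a) * advantage
    # Gradient ascent: rewards are negative penalties to be maximized.
    return theta + alpha * grad / len(trajectories)
```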
During the interaction between the agent (the central controller) and the environment, the agent can observe the state of the environment. The present application can split a requested task into multiple micro-services and deploy each micro-service in a container on an edge server, thereby optimizing the average task completion time in the system.
In addition, since the graph information and node dependencies of the micro-services split from different tasks differ, and both the number of task directed acyclic graphs (DAGs) and the number of subtasks per task vary, the task scheduling model needs to adapt to changes in the environment state information. The embodiments of the present application can input the task request information and the edge server information into the task scheduling model to adapt to such changes. Moreover, if only the features of subtask nodes were taken as input, those features would often be isolated and the structural information of the graph (such as the computation/data volume on the critical path from a given node) would be ignored. The application can use a graph convolutional network to embed three levels of DAG state information. The DAG state information may include:

Features of the subtask level, i.e., the embedding $e_v^i$ of each node (subtask), which collects information about that node and its child nodes and may include one or more of the total computation/data volume and the resource demand on the critical path from the node.

Features of the task level, i.e., the embedding $y^i$ of each task, which summarizes the information of the whole task DAG and may include one or more of the computation/data volume of the whole task and its runnable subtask nodes.

Features of the global task level, i.e., the embedding $z$ of global task information, which integrates the information embeddings of all tasks and may include one or more of the number of nodes, the number of allocated servers, the total computation volume, and the resource demand of each task.
For a particular task DAG $G_i$, we define $x_v^i$ as the original feature information of node v in $G_i$ (e.g., the computation/data volume and resource demand of that node). Through message passing (iteratively propagating information from child nodes to parent nodes), node v aggregates the information of the child nodes it can reach; we use $\xi(v)$ to denote the set of child nodes of node v. The node-level information embedding process can thus be expressed by the following formula:

$$e_v^i = g\Bigl[\sum_{u \in \xi(v)} f\bigl(e_u^i\bigr)\Bigr] + x_v^i$$

where f(·) and g(·) denote nonlinear transformations. In the network model input, the agent starts the iterative computation from the leaf nodes, so that $e_v^i$ can be computed for every node. By adding a task-DAG summary node, of which all nodes (subtasks) of a task are child nodes, we can aggregate the embeddings of one task $G_i$ to obtain the summary information $y^i$ of that task DAG. Likewise, we can add a global summary node, whose child nodes are the summary nodes of all task DAGs, to compute the global-level summary information $z$. Finally, these three levels of information features serve as part of the input for training the task scheduling model, realizing the embedding of the graph's information features.
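A minimal sketch of this message-passing embedding, with f and g replaced by tanh as illustrative stand-ins for the learned nonlinear transformations:

```python
import numpy as np

def embed_dag(x, children, f, g, order):
    """Per-node embedding e_v = g(sum_{u in children(v)} f(e_u)) + x_v.
    `x` maps node -> raw feature vector, `children` maps node -> child list,
    and `order` lists nodes leaves-first so children are embedded first."""
    e = {}
    for v in order:
        agg = sum((f(e[u]) for u in children[v]),
                  np.zeros_like(x[v]))       # aggregate child messages
        e[v] = g(agg) + x[v]
    return e

# Toy usage: three subtask nodes, node 2 the parent of nodes 0 and 1.
f = g = np.tanh
x = {0: np.ones(4), 1: np.ones(4), 2: np.ones(4)}
children = {0: [], 1: [], 2: [0, 1]}
embeddings = embed_dag(x, children, f, g, order=[0, 1, 2])
```

A task-DAG summary node and a global summary node can be added as extra entries of `x` and `children` in the same way, yielding the task-level and global-level embeddings.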
In a possible implementation, the DAG task features input into the task scheduling model for selecting micro-services (or subtasks resulting from task division) may include one or more of the following: 1) the number of assigned edge servers; 2) the number of remaining micro-services; 3) the number of executable micro-services; 4) the number of micro-service parent nodes; 5) the micro-service computation demand; 6) the micro-service data volume; 7) the CPU demand; 8) the memory demand.
The edge server features for selecting an edge server may include computing power, communication resource, computing resource state, and storage resource state features of the edge server, for example one or more of the following: 1) the edge server's computing power; 2) the edge server's transmission bandwidth; 3) the number of CPUs available to the edge server; 4) the memory available to the edge server.
In S203, the tasks to be processed are deployed to the edge server according to the updated task scheduling model.
According to the dynamically changing task features and edge server features, the parameters of the task scheduling model are adjusted in combination with the determined reward to update the model. Based on the updated task scheduling model, a policy for allocating tasks in the environment given by the current task features and edge server features can be computed.
In a possible implementation, the task features may be combined with the embedded information extracted by the graph convolutional network as input to the task selection policy network, i.e., the task scheduling model. The policy network prioritizes the individual micro-services and edge servers, and a softmax operation may be employed to compute the probability of selecting each micro-service or server. At each scheduling trigger, the agent selects a micro-service $ms_i$ and an edge server $s_j$ according to the probability distribution and combines them into one scheduling decision $(ms_i, s_j)$, i.e., micro-service $ms_i$ is deployed to edge server $s_j$ in a container, completing one action selection.
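A minimal sketch of this softmax action selection over an assumed score matrix with one row per micro-service and one column per edge server:

```python
import numpy as np

def select_action(scores, rng):
    """Sample one (micro-service, server) scheduling decision from
    softmax-normalized policy scores."""
    logits = scores.flatten()
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    probs /= probs.sum()
    idx = rng.choice(len(probs), p=probs)
    ms_i, s_j = divmod(idx, scores.shape[1])
    return ms_i, s_j                         # decision (ms_i, s_j)

# Example: 2 micro-services x 2 servers; higher score, higher probability.
scores = np.array([[1.0, 0.2], [0.3, 2.0]])
ms_i, s_j = select_action(scores, np.random.default_rng(0))
```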
In S204, a second running duration of the deployed tasks within a second period is determined, a reward is determined according to the second running duration, and the parameters of the task scheduling model are iteratively updated according to the reward until the task scheduling model meets the preset requirement.
The second period may be the period immediately following the first period, and the second time node is a time node after the first time node; for example, the two may be adjacent time nodes.
The second period corresponding to the second time node, and the second running duration of the tasks within it, are determined in the same way as the first running duration within the first period. A corresponding reward is determined based on the second running duration, from which the advantage of the scheduling policy of the first time node over the previous scheduling policy can be assessed. If there is an advantage, the parameters of the task scheduling model may be further updated according to it and a new scheduling policy generated; the corresponding third running duration at the third time node is then obtained under the new policy, that is, the model parameters are updated again using the third running duration. This iterative updating is repeated until the task scheduling model attains the maximized cumulative reward, which may include, for example, convergence of the task scheduling model.
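Putting these steps together, a hedged sketch of the iterative training loop follows, where `model` and `env` stand in for the scheduling policy network and a simulated edge cluster (assumed interfaces, not ones defined by this application):

```python
def train(model, env, num_episodes, alpha):
    """Roll out under the current policy, compute per-period penalty rewards,
    and update parameters until the cumulative reward stops improving."""
    for _ in range(num_episodes):
        trajectory, t_prev = [], 0.0
        state, done = env.reset(), False
        while not done:                        # episode ends when all tasks finish
            action = model.act(state)
            next_state, t_now, n_tasks, done = env.step(action)
            r = -(t_now - t_prev) * n_tasks    # per-period penalty reward r_k
            trajectory.append((state, action, r))
            state, t_prev = next_state, t_now
        model.update(trajectory, alpha)        # e.g. REINFORCE with baseline
```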
Based on fig. 2 and the description above, the following explanation is given with reference to the application scenario of task scheduling model training shown in fig. 3.
Fig. 3 shows an application scenario of the task scheduling model training method provided in an embodiment of the present application. As shown in fig. 3, the environment includes user devices and edge servers, and task request information and edge server information are obtained from them in a distributed manner as the state. The task request information is processed by the graph convolutional network in the task scheduling model, realizing the embedding of graph information features, and the graph convolutional network outputs the processed task request information to the policy network of the task scheduling model. The policy network updates its parameters according to the task request information and the edge server information, outputs the updated policy, generates an action according to that policy, and returns the deployment of the selected task request onto the selected edge server to the environment as the action; after the environment receives the action, it returns a reward to the task scheduling model.
Fig. 4 shows a schematic flow chart of a task scheduling method provided in an embodiment of the present application, in which a task scheduling scheme is computed and generated using the trained task scheduling model; the task scheduling model is located in the central controller. The method includes the following steps:
in S401, global task request and edge server information are acquired.
In one possible implementation, the types of task request information and edge server information obtained are as described in detail earlier in the embodiments of the present application.
In S402, the global task request and the edge server information are input to the trained task scheduling model, so as to obtain a scheduling policy output by the task scheduling model.
In one possible implementation, the trained task scheduling model obtains the task request information and generates scheduling parameters according to the task request information and the edge server information, in combination with the scheduling policy finally obtained after training. The individual micro-services and edge servers may be prioritized according to the scheduling policy, and a softmax operation employed to compute the probability of selecting each micro-service or server. At each scheduling trigger, the agent selects a micro-service $ms_i$ and an edge server $s_j$ according to the probability distribution and combines them into one scheduling decision $(ms_i, s_j)$, i.e., micro-service $ms_i$ is deployed to edge server $s_j$ in a container, thereby completing one action selection.
In S403, the global task request is divided into one or more micro services according to the scheduling policy, and deployed to the edge server.
According to the selected micro-services and edge servers, the micro-services are distributed to the edge servers so that the edge servers process the scheduled micro-services, reducing the average completion time of tasks.
Fig. 5 is a schematic structural diagram of a task scheduling model training device according to an embodiment of the present application, where the device includes:
a first operation duration obtaining unit 501, configured to obtain a first operation duration of a task in an edge server in a first period, where the first period is determined according to a first time node;
a deployment unit 502, configured to update parameters of the task scheduling model according to a first running duration of the task, and deploy a task to be processed to the edge server according to the updated task scheduling model;
a parameter updating unit 503, configured to determine a second operation duration of the deployed task in a second period, determine a reward according to the second operation duration, and iteratively update parameters of the task scheduling model according to the reward until the task scheduling model meets a preset requirement, where the second period is determined according to a second time node, and the second time node is subsequent to the first time node.
The task scheduling model training device corresponds to the task scheduling model training method.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 6 of this embodiment includes: at least one processor 60 (only one is shown in fig. 6), a memory 61, and a computer program 62 stored in the memory 61 and executable on the at least one processor 60, the processor 60 implementing, when executing the computer program 62, the steps in any of the above task scheduling model training method or task scheduling method embodiments.
The electronic device 6 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The electronic equipment can be used for training a task scheduling model and can also be used for task scheduling. The electronic device may include, but is not limited to, a processor 60, a memory 61. It will be appreciated by those skilled in the art that fig. 6 is merely an example of an electronic device and is not meant to be limiting as to the electronic device 6, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 60 may be a central processing unit (Central Processing Unit, CPU), the processor 60 may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 61 may in some embodiments be an internal storage unit of the electronic device 6, such as a hard disk or a memory of the electronic device 6. The memory 61 may in other embodiments also be an external storage device of the electronic device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the electronic device 6. The memory 61 is used for storing an operating system, application programs, boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps that may implement the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on a mobile terminal, causes the mobile terminal to perform steps that may be performed in the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, where the computer program, when executed by a processor, may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to an apparatus/terminal device, recording medium, computer Memory, read-Only Memory (ROM), random access Memory (RAM, random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for training a task scheduling model, wherein the task scheduling model is applied to mobile edge computing, and the method comprises the following steps:
acquiring a first operation time length of a task in an edge server in a first time period, wherein the first time period is determined according to a first time node;
updating parameters of the task scheduling model according to the first running time of the task, and deploying the task to be processed to the edge server according to the updated task scheduling model;
determining a second operation time length of the deployed task in a second time period, determining rewards according to the second operation time length, and iteratively updating parameters of the task scheduling model according to the rewards until the task scheduling model meets preset requirements, wherein the second time period is determined according to a second time node, and the second time node is behind the first time node.
2. The method of claim 1, wherein obtaining a first run length of a task in an edge server comprises:
acquiring a task of the edge server running in the first period, and acquiring the running time of the task in the first period;
and determining the first operation time length according to the tasks operated in the first time period and the operation time length of each task in the first time period.
3. The method of claim 2, wherein obtaining the run length of the task in the first period of time comprises:
acquiring the starting time and the ending time of the task, and the starting time and the ending time of the first period;
determining the running duration of the task in the first time period according to the precedence relation between the starting time of the task and the starting time of the first time period and the precedence relation between the ending time of the task and the ending time of the first time period.
4. The method according to any one of claims 1 to 2, wherein updating parameters of the task scheduling model according to a first run length of the task comprises:
acquiring an average reward positioned before a first time node;
and updating parameters of the task scheduling model according to the rewards and the average rewards and in combination with a preset learning rate.
5. The method of claim 1, wherein updating parameters of the task scheduling model based on the first run time of the task comprises:
acquiring task request information and edge server information;
and inputting the task scheduling model according to the task request information, the edge server information and the first operation time length, and outputting parameters of the task scheduling model.
6. The method of claim 5, wherein the task request information includes one or more of a feature of a subtask hierarchy of the task partition, a feature of the task hierarchy, and a feature of a global task hierarchy, and the edge server information includes one or more of a computational power feature, a communication resource feature, a computing resource state feature, and a storage resource state feature of the edge server.
7. A method of task scheduling, the method comprising:
acquiring global task requests and edge server information;
inputting the global task request and the edge server information into the trained task scheduling model obtained by the method according to any one of claims 1-6, so as to obtain a scheduling policy output by the task scheduling model;
and dividing the global task request into one or more micro-services according to the scheduling strategy, and deploying the micro-services to the edge server.
8. A task scheduling model training device, wherein the task scheduling model is applied to mobile edge computing, the device comprising:
the first operation time length acquisition unit is used for acquiring a first operation time length of a task in the edge server in a first time period, wherein the first time period is determined according to a first time node;
the deployment unit is used for updating parameters of the task scheduling model according to the first running time of the task and deploying the task to be processed to the edge server according to the updated task scheduling model;
the parameter updating unit is used for determining a second operation time length of the deployed task in a second time period, determining rewards according to the second operation time length, and iteratively updating parameters of the task scheduling model according to the rewards until the task scheduling model meets preset requirements, wherein the second time period is determined according to a second time node, and the second time node is behind the first time node.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the task scheduling model training method of any one of claims 1 to 6 and/or the task scheduling method of claim 7.
10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the task scheduling model training method of any one of claims 1 to 6 and/or the task scheduling method of claim 7.
CN202310303602.3A 2023-03-16 2023-03-16 Task scheduling model training method and device, task scheduling method and electronic equipment Pending CN116436797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310303602.3A CN116436797A (en) 2023-03-16 2023-03-16 Task scheduling model training method and device, task scheduling method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310303602.3A CN116436797A (en) 2023-03-16 2023-03-16 Task scheduling model training method and device, task scheduling method and electronic equipment

Publications (1)

Publication Number Publication Date
CN116436797A true CN116436797A (en) 2023-07-14

Family

ID=87082550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310303602.3A Pending CN116436797A (en) 2023-03-16 2023-03-16 Task scheduling model training method and device, task scheduling method and electronic equipment

Country Status (1)

Country Link
CN (1) CN116436797A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination