CN117873689A - Task allocation method, device, equipment and computer readable storage medium

Info

Publication number: CN117873689A
Application number: CN202410270361.1A
Authority: CN
Inventors: 万翔
Original assignee: Inspur Computer Technology Co Ltd
Current assignee: Inspur Computer Technology Co Ltd
Priority date: 2024-03-11
Filing date: 2024-03-11
Publication date: 2024-04-12
Anticipated expiration: 2044-03-11
Also published as: CN117873689B

Abstract

The invention relates to the technical field of edge terminal computing, and discloses a task allocation method, apparatus, device and computer-readable storage medium. A cost function for executing a task is determined based on the first control delay and the first total energy consumption value corresponding to the edge terminal executing the task in the current time slot, and the second control delay and the second total energy consumption value corresponding to the edge server executing the task in the current time slot. A metric function for selecting policy actions is constructed according to the number of states, the number of policy actions and the cost function corresponding to the current time slot. The values of the number of states and the number of policy actions are adjusted according to the policy selection mode corresponding to the policy actions and the minimum-metric principle, and a target policy action in the current time slot is determined. An action task issued after the target device pointed to by the target policy action has executed the computing task is acquired and executed, until all tasks are completed. Delay and energy consumption are thus jointly optimized, which improves task offloading efficiency.

Description

Task allocation method, device, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of edge terminal computing technologies, and in particular, to a task allocation method, apparatus, device, and computer readable storage medium.

Background

In a fully interactive multi-edge-terminal scenario, the control delay consists of the end-to-end network delay for acquiring the parameter data of all edge terminals, the delay for running the computation, and the end-to-end network delay for transmitting the action command/reference to the execution unit. The control delay affects the stability of the whole system and must be kept within the system sampling period. Limiting the control delay is therefore one requirement of a fully interactive multi-edge-terminal scenario. The energy consumed by the edge terminals for transmission and computation is likewise a concern in system design.

With the development of multi-core computing, mobile edge computing (Mobile Edge Computing, MEC) can provide sufficient computing power at the network edge while guaranteeing low latency with the assistance of a suitable wireless network.

Current task offloading methods generally consider selecting a fixed MEC server to perform a computational task. However, in a practical scenario with limited computing resources, selecting a fixed MEC server to perform a computing task may result in inefficient task offloading.

How to improve the task offloading efficiency of a multi-edge-terminal system is therefore a problem to be solved by those skilled in the art.

Disclosure of Invention

An object of the embodiments of the present invention is to provide a task allocation method, apparatus, device and computer-readable storage medium that can solve the problem of low task offloading efficiency in a multi-edge-terminal system.

In order to solve the above technical problems, an embodiment of the present invention provides a task allocation method, including:

determining a cost function for executing the task based on a first control time delay and a first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot, and a second control time delay and a second total energy consumption value corresponding to the edge server when executing the task in the current time slot;

constructing a metric function for selecting policy actions according to the number of states, the number of policy actions and the cost function corresponding to the current time slot; the policy actions comprise selecting an edge server to execute a computing task and selecting an edge terminal to execute the computing task;

adjusting the values of the number of states and the number of policy actions according to the policy selection mode corresponding to the policy actions and the minimum-metric principle, and determining a target policy action in the current time slot;

and acquiring an action task issued after the target device pointed to by the target policy action has executed the computing task, and executing the action task, until all tasks are completed.

In one aspect, adjusting the values of the number of states and the number of policy actions according to the policy selection mode corresponding to the policy actions and the minimum-metric principle, and determining the target policy action in the current time slot, includes:

determining the policy action selection mode in the current time slot according to the proportional relationship between the number of states and a random number, and adjusting the number of states and the number of policy actions;

when the policy action selection mode is random selection, taking any randomly selected policy action as the target policy action in the current time slot;

and when the policy action selection mode is selection by the minimum-metric principle, taking the policy action for which the metric function takes its minimum value as the target policy action in the current time slot.

In one aspect, determining the policy action selection mode in the current time slot according to the proportional relationship between the number of states and the random number, and adjusting the number of states and the number of policy actions, includes:

generating a random number in the current time slot;

judging whether the random number is greater than the reciprocal of the number of states;

if the random number is greater than the reciprocal of the number of states, adding one to the number of policy actions and setting the policy action selection mode to selection by the minimum-metric principle;

if the random number is less than or equal to the reciprocal of the number of states, adding one to the number of states and setting the policy action selection mode to random selection.
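The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the names `Counters` and `choose_selection_mode`, the mode labels, and the initial counter values are hypothetical, and the random number is assumed to be drawn uniformly from [0, 1).

```python
import random
from dataclasses import dataclass


@dataclass
class Counters:
    # Initial values are assumptions; starting num_states at 1 avoids division by zero.
    num_states: int = 1
    num_policy_actions: int = 0


def choose_selection_mode(counters, rng=random.random):
    """Compare a random number with 1 / num_states and update the counters."""
    r = rng()  # assumed uniform in [0, 1)
    if r > 1.0 / counters.num_states:
        counters.num_policy_actions += 1
        return "min_metric"   # pick the action with the smallest metric
    counters.num_states += 1
    return "random"           # pick any policy action at random
```

Under these assumptions, random selection becomes less likely as the number of states grows, because the threshold 1 / num_states shrinks.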

In one aspect, after the policy action for which the metric function takes its minimum value is taken as the target policy action in the current time slot, the method further includes:

recording the minimum value of the metric function in the current time slot, so that it can be retrieved in the next time slot.

In one aspect, acquiring the action task issued after the target device pointed to by the target policy action has executed the computing task includes:

when the target device is an edge terminal, uploading its own parameter data to a base station, so that the base station, once it has received the parameter data uploaded by all edge terminals, sends all the parameter data to all edge terminals by broadcast;

when all the parameter data broadcast by the base station has been acquired, determining the next action task based on all the parameter data, and executing the action task;

when the target device is an edge server, uploading its own parameter data to the base station, so that the base station, once it has received the parameter data uploaded by all edge terminals, determines the next action task based on all the parameter data and sends the action task to all edge terminals by broadcast;

and when the action task broadcast by the base station has been acquired, executing the action task.
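The two exchange patterns above differ only in what the base station broadcasts once every terminal has uploaded. The following sketch models that with a hypothetical `BaseStation` class; the class, its method names and the message tags are illustrative assumptions, and `decide` stands in for whatever computation produces the action task when the edge server is selected.

```python
class BaseStation:
    """Collects uploads and broadcasts once all terminals have reported."""

    def __init__(self, terminal_ids, decide=None):
        self.expected = set(terminal_ids)
        self.received = {}
        # decide is None  -> edge-terminal case: broadcast raw parameter data
        # decide is set   -> edge-server case: broadcast the computed action task
        self.decide = decide

    def upload(self, terminal_id, params):
        """Store one upload; return the broadcast message, or None while waiting."""
        self.received[terminal_id] = params
        if set(self.received) != self.expected:
            return None
        payload = dict(self.received)
        self.received = {}  # reset for the next time slot
        if self.decide is None:
            return ("parameters", payload)
        return ("action_task", self.decide(payload))
```

In the edge-terminal case each terminal would derive the next action task locally from the broadcast parameters; in the edge-server case it simply executes the broadcast action task.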

In one aspect, the determining the cost function for executing the task based on the first control delay and the first total energy value corresponding to the edge terminal when executing the task in the current time slot, and the second control delay and the second total energy value corresponding to the edge server when executing the task in the current time slot includes:

determining a first control time delay corresponding to the edge terminal when executing a task in the current time slot based on the data volume of the parameter data, the link performance parameter and the terminal performance parameter of the edge terminal in the current time slot;

determining a first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot according to the first control time delay corresponding to the edge terminal when executing the task in the current time slot and the terminal performance parameter;

determining a second control time delay corresponding to the edge server when executing the task in the current time slot based on the data volume of the parameter data, the link performance parameter and the server performance parameter of the edge server in the current time slot;

determining a second total energy consumption value corresponding to the edge server when executing the task in the current time slot according to a second control time delay corresponding to the edge server when executing the task in the current time slot and the server performance parameter;

and constructing a cost function for executing the task based on the first control time delay, the first total energy consumption value, the second control time delay, the second total energy consumption value, the weight parameter and the cost penalty parameter.

In one aspect, the determining, based on the data amount of the parameter data, the link performance parameter and the terminal performance parameter of the edge terminal in the current time slot, the first control delay corresponding to the task executed by the edge terminal in the current time slot includes:

determining the upload time according to the data volume of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power and the single-sided noise power spectral density;

determining the propagation time according to the data volume of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station and the single-sided noise power spectral density;

determining the terminal computation time according to the terminal's own computation amount and its own computing power;

and determining the first control time delay based on the upload time, the propagation time and the terminal computation time.

In one aspect, determining the upload time according to the data volume of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power and the single-sided noise power spectral density includes:

invoking an upload time calculation formula to process the data volume of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power and the single-sided noise power spectral density, so as to obtain the upload time; the upload time calculation formula is as follows:

[formula not reproduced]

where M denotes the total number of edge terminals, w^U denotes the uplink spectrum bandwidth and N_0 denotes the single-sided noise power spectral density. The remaining symbols, in order of appearance, denote: the upload time by which all edge terminals have finished uploading their parameter data; the upload time of edge terminal i in the t-th time slot; the data volume of the parameter data of edge terminal i in the t-th time slot; the uplink throughput rate of edge terminal i in the t-th time slot; the uplink channel gain of edge terminal i in the t-th time slot; and the transmit power of edge terminal i itself.
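Since the formula itself is not available here, the following is only a sketch of how an upload time could be computed from the listed inputs. It assumes a Shannon-type throughput rate, w^U * log2(1 + gain * power / (N_0 * w^U)), and assumes that the total upload time is either the sum or the maximum of the per-terminal times; both choices, and all function names, are assumptions rather than the patent's definition.

```python
import math


def uplink_rate(bandwidth_hz, channel_gain, tx_power_w, noise_psd_w_per_hz):
    """Assumed Shannon-type uplink throughput rate in bit/s."""
    snr = channel_gain * tx_power_w / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def upload_time(data_bits, bandwidth_hz, channel_gain, tx_power_w, noise_psd_w_per_hz):
    """Upload time of one terminal in one slot: data volume / throughput rate."""
    rate = uplink_rate(bandwidth_hz, channel_gain, tx_power_w, noise_psd_w_per_hz)
    return data_bits / rate


def total_upload_time(per_terminal_times, sequential=True):
    """Time until all M terminals have uploaded (aggregation rule is an assumption)."""
    return sum(per_terminal_times) if sequential else max(per_terminal_times)
```

For example, with a signal-to-noise ratio of 1 the assumed rate equals the bandwidth, so 2 Mbit over 1 MHz takes about 2 s.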

In one aspect, determining the propagation time according to the data volume of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station and the single-sided noise power spectral density includes:

invoking a propagation time calculation formula to process the data volume of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station and the single-sided noise power spectral density, so as to obtain the propagation time; the propagation time calculation formula is as follows:

[formula not reproduced]

where w^D denotes the downlink spectrum bandwidth, P^D denotes the transmit power of the base station, and the superscript LD denotes the broadcast corresponding to the edge terminals. The remaining symbols, in order of appearance, denote: the set of edge terminals for which an edge terminal is selected to execute the computing task; the propagation time of edge terminal i in the t-th time slot; the propagation time from edge terminal i-1 starting to upload its parameter data to the completion of its broadcast; the propagation time from edge terminal i starting to upload its parameter data to the completion of its broadcast; the downlink throughput rate in the t-th time slot; and the downlink channel gain of edge terminal i in the t-th time slot.
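The definitions above refer to the broadcast of terminal i-1 completing before that of terminal i, which suggests an accumulating schedule. The sketch below illustrates one such reading: each terminal's data is broadcast at an assumed Shannon-type downlink rate, and completion times accumulate in terminal order. The rate model, the accumulation rule and the function names are all assumptions.

```python
import math
from itertools import accumulate


def downlink_rate(bandwidth_hz, channel_gain, bs_power_w, noise_psd_w_per_hz):
    """Assumed Shannon-type downlink throughput rate in bit/s."""
    snr = channel_gain * bs_power_w / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def broadcast_completion_times(data_bits, channel_gains, bandwidth_hz,
                               bs_power_w, noise_psd_w_per_hz):
    """Cumulative time at which each terminal's broadcast completes (assumed sequential)."""
    durations = [
        bits / downlink_rate(bandwidth_hz, gain, bs_power_w, noise_psd_w_per_hz)
        for bits, gain in zip(data_bits, channel_gains)
    ]
    return list(accumulate(durations))
```

With two terminals whose broadcasts take about 1 s and 2 s individually, the completion times under this reading are about 1 s and 3 s.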

In one aspect, determining the terminal computation time according to the terminal's own computation amount and its own computing power includes:

dividing the terminal's own computation amount by its own computing power to obtain the terminal computation time.
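As a small worked sketch, the computation time is the computation amount divided by the computing power, and the first control time delay combines it with the upload and propagation times. The text only says the delay is determined "based on" the three components; combining them by plain addition, and the function names, are assumptions.

```python
def computation_time(computation_amount, computing_power):
    """Computation amount (e.g. CPU cycles) divided by computing power (cycles/s)."""
    if computing_power <= 0:
        raise ValueError("computing power must be positive")
    return computation_amount / computing_power


def first_control_delay(upload_time, propagation_time, terminal_computation_time):
    """Assumed additive combination of the three delay components."""
    return upload_time + propagation_time + terminal_computation_time
```

For instance, 2e9 cycles on a 1e9 cycles/s terminal take 2 s; with 0.5 s of upload and 0.25 s of propagation the assumed total is 2.75 s.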

In one aspect, determining, according to the first control time delay corresponding to the edge terminal when executing the task in the current time slot and the terminal performance parameter, the first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot includes:

determining the first total energy consumption value according to the terminal's own transmit power, the upload time, the energy consumption value per unit time, the propagation time, the terminal's own computation amount and its own computing power.

In one aspect, determining the first total energy consumption value according to the terminal's own transmit power, the upload time, the energy consumption value per unit time, the propagation time, the terminal's own computation amount and its own computing power includes:

invoking a first energy consumption calculation formula to process the terminal's own transmit power, the upload time, the energy consumption value per unit time, the propagation time, the terminal's own computation amount and its own computing power, so as to obtain the first total energy consumption value; the first energy consumption calculation formula is as follows:

[formula not reproduced]

where B_t denotes the computation amount of the edge terminal itself in the t-th time slot, f_i denotes the computing power of edge terminal i itself, and the superscript L denotes the energy consumption of the edge terminal. The remaining symbols, in order of appearance, denote: the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumption value per second of edge terminal i when receiving downlink data; the propagation time of edge terminal i in the t-th time slot; the switched-capacitance energy coefficient of the computing chip; and the first total energy consumption value.
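One plausible way to combine the listed quantities is sketched below: transmit energy (transmit power times upload time), receive energy (energy per second times propagation time) and computation energy. The computation term uses the common model coefficient * computation amount * computing power squared; this model, the additive combination and the parameter names are assumptions, not the formula of the patent.

```python
def terminal_energy(tx_power_w, upload_time_s, rx_energy_per_s, propagation_time_s,
                    switched_capacitance, computation_amount, computing_power):
    """Assumed first total energy consumption value of one edge terminal in one slot."""
    transmit = tx_power_w * upload_time_s             # uploading parameter data
    receive = rx_energy_per_s * propagation_time_s    # receiving the broadcast
    # Assumed dynamic-power model for local computation: kappa * B_t * f_i^2
    compute = switched_capacitance * computation_amount * computing_power ** 2
    return transmit + receive + compute
```

With 0.5 W for 2 s, 0.1 J/s for 1 s, and a coefficient of 1e-27 at 1e9 cycles and 1e9 cycles/s, each of the transmit and compute terms is about 1 J and the total is about 2.1 J.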

In one aspect, the determining, based on the data size, the link performance parameter and the server performance parameter of the parameter data of the edge server in the current time slot, the second control delay corresponding to the task executed by the edge server in the current time slot includes:

determining the server computation time according to the computation amount of the edge server and the computing power of the edge server;

and determining the second control time delay based on the upload time, the propagation time and the server computation time.

In one aspect, the determining, according to the second control delay corresponding to the task performed by the edge server in the current time slot and the server performance parameter, the second total energy consumption value corresponding to the task performed by the edge server in the current time slot includes:

determining the second total energy consumption value according to the transmit power of the edge terminal, the upload time, the energy consumption value of the edge server per unit time, the propagation time, the computation amount of the edge server and the computing power of the edge server.

In one aspect, determining the second total energy consumption value according to the transmit power of the edge terminal, the upload time, the energy consumption value of the edge server per unit time, the propagation time, the computation amount of the edge server and the computing power of the edge server includes:

invoking a second energy consumption calculation formula to process the transmit power of the edge terminal, the upload time, the energy consumption value of the edge server per unit time, the propagation time, the computation amount of the edge server and the computing power of the edge server, so as to obtain the second total energy consumption value; the second energy consumption calculation formula is as follows:

[formula not reproduced]

where the superscript ED denotes the broadcast corresponding to the edge server and the superscript E denotes the energy consumption of the edge server. The remaining symbols, in order of appearance, denote: the set of edge terminals for which the edge server is selected to execute the computing task; the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumption value per second of edge server i when receiving downlink data; the propagation time of edge server i in the t-th time slot; and the second total energy consumption value.

In one aspect, constructing the cost function for executing the task based on the first control time delay, the first total energy consumption value, the second control time delay, the second total energy consumption value, the weight parameter and the cost penalty parameter includes:

invoking a cost function calculation formula to process the first control time delay, the first total energy consumption value, the second control time delay, the second total energy consumption value, the weight parameter and the cost penalty parameter, so as to obtain the cost function; the cost function calculation formula is as follows:

[formula not reproduced]

where C^p denotes the cost penalty parameter. The remaining symbols, in order of appearance, denote: the total control delay; the total energy consumption value; the first control time delay; the first total energy consumption value; the second control time delay; the second total energy consumption value; the weight parameter; and the cost function. The formula further assigns indicator values in four cases: the total control delay is greater than the set delay threshold; the total control delay is less than or equal to the set delay threshold; the edge terminal is selected to execute the computing task; and the edge server is selected to execute the computing task. [indicator values not reproduced]
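The ingredients listed above (a weight parameter, a penalty parameter, a delay threshold and a terminal/server selector) are consistent with a weighted delay-energy cost plus a penalty when the delay threshold is exceeded. The sketch below shows one such form. The convex weighting `weight * delay + (1 - weight) * energy`, the additive penalty and all names are assumptions made for illustration.

```python
def task_cost(offload_to_server, local_delay, local_energy, server_delay, server_energy,
              weight, penalty, delay_threshold):
    """Assumed cost of executing the task in one slot under the chosen policy action."""
    # The selector picks which delay/energy pair counts as the total.
    delay = server_delay if offload_to_server else local_delay
    energy = server_energy if offload_to_server else local_energy
    cost = weight * delay + (1.0 - weight) * energy
    # Penalise violating the set delay threshold (assumed additive penalty C^p).
    if delay > delay_threshold:
        cost += penalty
    return cost
```

With a weight of 0.5, a penalty of 10 and a threshold of 1.5 s, local execution at 2 s and 4 J costs 13.0 under these assumptions, while offloading at 1 s and 6 J costs 3.5.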

In one aspect, the constructing a metric function for selecting a policy action according to the number of states, the number of policy actions, and the cost function corresponding to the current time slot includes:

when the random number generated in the current time slot is greater than the reciprocal of the number of states corresponding to the current time slot, invoking a metric function calculation formula to process the number of policy actions and the cost function corresponding to the current time slot, so as to obtain the metric function; the metric function calculation formula is as follows:

[formula not reproduced]

where s denotes the state and x the policy action selected by edge terminal i in the τ-th time slot. The remaining symbols, in order of appearance, denote: the cost function of edge terminal i in the τ-th time slot; the selection of state s and policy action x by edge terminal i in the τ-th time slot; the number of policy actions corresponding to the current time slot; and the metric function of edge terminal i in the t-th time slot.
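Read together, these definitions suggest a metric that accumulates past costs of slots in which a given state and policy action were selected, normalised by the number of policy actions. The sketch below implements that reading; the exact form of the metric, the history layout `(state, action, cost)` and the function names are assumptions.

```python
def metric(history, state, action, num_policy_actions):
    """Assumed metric: summed past cost for (state, action) / number of policy actions."""
    if num_policy_actions <= 0:
        return 0.0
    total = sum(cost for s, x, cost in history if s == state and x == action)
    return total / num_policy_actions


def pick_min_metric_action(history, state, actions, num_policy_actions):
    """Minimum-metric principle: choose the action with the smallest metric value."""
    return min(actions, key=lambda x: metric(history, state, x, num_policy_actions))
```

Note that under this reading an action never tried in a state has metric 0 and is preferred, which is one reason the random selection mode is useful alongside it.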

In one aspect, after the action task issued once the target device pointed to by the target policy action has executed the computing task is acquired and executed, the method further includes:

adding one to the time slot index each time the action task has been executed;

judging whether the latest time slot index is greater than or equal to the total number of time slots;

ending the task allocation operation when the latest time slot index is greater than or equal to the total number of time slots;

and, when the latest time slot index is less than the total number of time slots, returning to the step of determining the cost function for executing the task based on the first control time delay and the first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot, and the second control time delay and the second total energy consumption value corresponding to the edge server when executing the task in the current time slot.
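The slot loop above checks the termination condition after each executed action task, so it behaves like a do-while loop. A minimal sketch follows; `run_allocation` and the callable `decide_and_execute` (standing in for the cost, metric, selection and execution steps of one slot) are hypothetical names.

```python
def run_allocation(total_slots, decide_and_execute):
    """Run one allocation round per time slot until the slot index reaches total_slots."""
    slot = 0
    results = []
    while True:
        # One full round: cost function -> metric -> target action -> action task.
        results.append(decide_and_execute(slot))
        slot += 1                      # add one after each executed action task
        if slot >= total_slots:        # end when index >= total number of slots
            break
    return results
```

Because the check follows the execution, at least one slot is always processed, even if the total number of slots is given as zero.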

An embodiment of the present invention also provides a task allocation apparatus, which comprises a determining unit, a construction unit, an adjusting unit, an obtaining unit and an execution unit;

the determining unit is used for determining a cost function for executing the task based on a first control time delay and a first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot and a second control time delay and a second total energy consumption value corresponding to the edge server when executing the task in the current time slot;

the construction unit is configured to construct a metric function for selecting policy actions according to the number of states, the number of policy actions and the cost function corresponding to the current time slot; the policy actions comprise selecting an edge server to execute a computing task and selecting an edge terminal to execute the computing task;

the adjusting unit is configured to adjust the values of the number of states and the number of policy actions according to the policy selection mode corresponding to the policy actions and the minimum-metric principle, and to determine a target policy action in the current time slot;

the obtaining unit is configured to acquire an action task issued after the target device pointed to by the target policy action has executed the computing task;

the execution unit is configured to execute the action task until all tasks are completed.

In one aspect, the adjusting unit includes a mode determination subunit, a first designation subunit and a second designation subunit;

the mode determination subunit is configured to determine the policy action selection mode in the current time slot according to the proportional relationship between the number of states and a random number, and to adjust the number of states and the number of policy actions;

the first designation subunit is configured to, when the policy action selection mode is random selection, take any randomly selected policy action as the target policy action in the current time slot;

the second designation subunit is configured to, when the policy action selection mode is selection by the minimum-metric principle, take the policy action for which the metric function takes its minimum value as the target policy action in the current time slot.

In one aspect, the mode determination subunit is configured to: generate a random number in the current time slot; judge whether the random number is greater than the reciprocal of the number of states; if the random number is greater than the reciprocal of the number of states, add one to the number of policy actions and set the policy action selection mode to selection by the minimum-metric principle; and if the random number is less than or equal to the reciprocal of the number of states, add one to the number of states and set the policy action selection mode to random selection.

In one aspect, the apparatus further comprises a recording unit;

the recording unit is configured to record the minimum value of the metric function in the current time slot, so that it can be retrieved in the next time slot.

In one aspect, the obtaining unit is configured to: when the target device is an edge terminal, upload its own parameter data to a base station, so that the base station, once it has received the parameter data uploaded by all edge terminals, sends all the parameter data to all edge terminals by broadcast; and, when all the parameter data broadcast by the base station has been acquired, determine the next action task based on all the parameter data and execute the action task.

In one aspect, the obtaining unit is configured to: when the target device is an edge server, upload its own parameter data to the base station, so that the base station, once it has received the parameter data uploaded by all edge terminals, determines the next action task based on all the parameter data and sends the action task to all edge terminals by broadcast; and, when the action task broadcast by the base station has been acquired, execute the action task.

In one aspect, the determining unit includes a first time delay determining subunit, a first energy consumption determining subunit, a second time delay determining subunit, a second energy consumption determining subunit, and a cost constructing subunit;

the first time delay determining subunit is configured to determine a first control time delay corresponding to the edge terminal when performing a task in a current time slot based on a data amount of parameter data, a link performance parameter and a terminal performance parameter of the edge terminal in the current time slot;

the first energy consumption determining subunit is configured to determine a first total energy consumption value corresponding to the edge terminal when performing a task in a current time slot according to a first control delay corresponding to the edge terminal when performing the task in the current time slot and the terminal performance parameter;

the second delay determining subunit is configured to determine a second control delay corresponding to the edge server when performing a task in the current time slot based on the data size, the link performance parameter and the server performance parameter of the parameter data of the edge server in the current time slot;

the second energy consumption determining subunit is configured to determine a second total energy consumption value corresponding to the edge server when the edge server performs the task in the current time slot according to a second control delay corresponding to the edge server when the edge server performs the task in the current time slot and the server performance parameter;

The cost construction subunit is configured to construct a cost function for executing a task based on the first control delay, the first total energy consumption value, the second control delay, the second total energy consumption value, the weight parameter, and the cost penalty parameter.

In one aspect, the first time delay determining subunit is configured to: determine the upload time according to the data volume of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power and the single-sided noise power spectral density; determine the propagation time according to the data volume of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station and the single-sided noise power spectral density; determine the terminal computation time according to the terminal's own computation amount and its own computing power; and determine the first control time delay based on the upload time, the propagation time and the terminal computation time.

In one aspect, the first time delay determining subunit is configured to invoke an upload time calculation formula to process the data volume of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power and the single-sided noise power spectral density, so as to obtain the upload time; the upload time calculation formula is as follows:

[formula not reproduced]

In one aspect, the first time delay determining subunit is configured to invoke a propagation time calculation formula to process the data volume of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station and the single-sided noise power spectral density, so as to obtain the propagation time; the propagation time calculation formula is as follows:

[formula not reproduced]

where w^D denotes the downlink spectrum bandwidth, P^D denotes the transmit power of the base station, and the superscript LD denotes the broadcast corresponding to the edge terminals. The remaining symbols, in order of appearance, denote: the set of edge terminals for which an edge terminal is selected to execute the computing task; the propagation time of edge terminal i in the t-th time slot; the propagation time from edge terminal i-1 starting to upload its parameter data to the completion of its broadcast; the propagation time from edge terminal i starting to upload its parameter data to the completion of its broadcast; the downlink throughput rate in the t-th time slot; and the downlink channel gain of edge terminal i in the t-th time slot.

In one aspect, the first time delay determining subunit is configured to divide the terminal's own computation amount by its own computing power to obtain the terminal computation time.

In one aspect, the first energy consumption determining subunit is configured to determine the first total energy consumption value according to the terminal's own transmit power, the upload time, the energy consumption value per unit time, the propagation time, the terminal's own computation amount and its own computing power.

In one aspect, the first energy consumption determining subunit is configured to invoke a first energy consumption calculation formula to process the terminal's own transmit power, the upload time, the energy consumption value per unit time, the propagation time, the terminal's own computation amount and its own computing power, so as to obtain the first total energy consumption value; the first energy consumption calculation formula is as follows:

[formula not reproduced]

where B_t denotes the computation amount of the edge terminal itself in the t-th time slot, f_i denotes the computing power of edge terminal i itself, and the superscript L denotes the energy consumption of the edge terminal. The remaining symbols, in order of appearance, denote: the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumption value per second of edge terminal i when receiving downlink data; the propagation time of edge terminal i in the t-th time slot; the switched-capacitance energy coefficient of the computing chip; and the first total energy consumption value.

In one aspect, the second delay determining subunit is configured to determine the upload time according to the data amount of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power, and the single-sided noise power spectral density; determine the propagation time according to the data amount of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station, and the single-sided noise power spectral density; determine the server calculation time according to the computation amount of the edge server and the computing power of the edge server; and determine the second control delay based on the upload time, the propagation time, and the server calculation time.

In one aspect, the second energy consumption determining subunit is configured to determine the second total energy consumption value according to the transmit power of the edge terminal, the upload time, the energy consumption value of the edge server per unit time, the propagation time, the computation amount of the edge server, and the computing power of the edge server.

In one aspect, the second energy consumption determining subunit is configured to invoke a second energy consumption calculation formula and process the transmit power of the edge terminal, the upload time, the energy consumption value of the edge server per unit time, the propagation time, the computation amount of the edge server, and the computing power of the edge server to obtain the second total energy consumption value; the second energy consumption calculation formula is as follows:

；

wherein the symbols of the formula denote, respectively: the set of edge terminals that select the edge server to perform the computing task; the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumed per second when receiving downlink data, for index i in the edge-server case; the propagation time for index i in the t-th time slot in the edge-server case, in which the superscript ED denotes the broadcast corresponding to the edge server; and the second total energy consumption value, in which the superscript E denotes the energy consumption of the edge server.

In one aspect, the cost construction subunit is configured to invoke a cost function calculation formula and process the first control delay, the first total energy consumption value, the second control delay, the second total energy consumption value, the weight parameter, and the cost penalty parameter to obtain the cost function; the cost function calculation formula is as follows:

；

wherein the symbols of the formula denote, respectively: the total control delay; the total energy consumption value; the first control delay; the first total energy consumption value; the second control delay; the second total energy consumption value; the weight parameter; C^p, the cost penalty parameter; and the cost function. The formula has one branch for the case in which the total control delay is greater than the set delay threshold and another for the case in which it is less than or equal to the set delay threshold, and the selection indicator takes one value when the edge terminal is selected to perform the computing task and another when the edge server is selected.
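The following is a minimal sketch of how such a cost could be evaluated. Since the formula itself is not reproduced in this text, the sketch assumes that the selection indicators weight the terminal and server values so that exactly one option is active, that the cost is a weighted sum of total delay and total energy, and that the cost penalty parameter C^p is returned when the delay threshold is exceeded. The names `SlotMeasurements` and `slot_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlotMeasurements:
    """Delay (s) and energy (J) of both execution options in one time slot."""
    delay_terminal: float   # first control delay
    energy_terminal: float  # first total energy consumption value
    delay_server: float     # second control delay
    energy_server: float    # second total energy consumption value


def slot_cost(m, use_terminal, weight, penalty, delay_threshold):
    """Assumed cost: weighted delay/energy sum, or the penalty for a late slot."""
    # Indicator weights: exactly one option is active, so the pair sums to one.
    a_terminal = 1.0 if use_terminal else 0.0
    a_server = 1.0 - a_terminal
    total_delay = a_terminal * m.delay_terminal + a_server * m.delay_server
    total_energy = a_terminal * m.energy_terminal + a_server * m.energy_server
    if total_delay > delay_threshold:
        return penalty  # assumption: C^p applies when the threshold is exceeded
    return weight * total_delay + (1.0 - weight) * total_energy
```

With `weight` close to 1 the sketch favours low delay, and with `weight` close to 0 it favours low energy; the actual combination used in the patent formula may differ.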

In one aspect, the construction unit is configured to call a metric function calculation formula to process the number of policy actions corresponding to the current time slot and the cost function to obtain the metric function when the random number generated in the current time slot is greater than the reciprocal of the number of states corresponding to the current time slot; the calculation formula of the metric function is as follows:

；

Wherein the symbols of the formula denote, respectively: the cost function of edge terminal i in the τ-th time slot; the indicator of the event that, in the τ-th time slot, the state selected by edge terminal i is s and the selected policy action is x; the number of policy actions corresponding to the current time slot; and the metric function of edge terminal i in the t-th time slot.

In one aspect, the device further comprises an accumulation unit, a judging unit, and an ending unit;

the accumulation unit is used for adding one to the time slot number when the action task is executed once;

the judging unit is used for judging whether the latest time slot number is greater than or equal to the total number of time slots, and, when the latest time slot number is smaller than the total number of time slots, for triggering the determining unit to again perform the step of determining the cost function for executing the task based on the first control delay and the first total energy consumption value corresponding to the edge terminal executing the task in the current time slot and the second control delay and the second total energy consumption value corresponding to the edge server executing the task in the current time slot;

the ending unit is configured to end the task allocation operation when the latest time slot number is greater than or equal to the total number of time slots.

The embodiment of the invention also provides a task allocation device, which comprises:

A memory for storing a computer program;

and a processor for executing the computer program to implement the steps of the task allocation method as described above.

The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the steps of the task allocation method when being executed by a processor.

According to the technical scheme, the cost function for executing the task is determined based on the first control time delay and the first total energy consumption value corresponding to the edge terminal executing the task in the current time slot and the second control time delay and the second total energy consumption value corresponding to the edge server executing the task in the current time slot. Constructing a measurement function for selecting strategy actions according to the corresponding state quantity, strategy action quantity and cost function in the current time slot; the policy actions include selecting an edge server to execute a computing task and selecting an edge terminal to execute the computing task. According to a strategy selection mode and a measurement minimum principle corresponding to strategy actions, adjusting the state quantity and the value of the strategy action quantity, and determining a target strategy action under the current time slot; and acquiring an action task issued after the target device pointed by the target strategy action executes the calculation task, and executing the action task until all tasks are completed. The method has the beneficial effect that the measurement function is constructed by jointly optimizing and controlling the time delay and the energy consumption. Based on the strategy selection mode and the measurement function, the optimal strategy action under each time slot can be determined, the interaction efficiency between the edge terminal and the edge server is effectively improved, the distribution of computing power resources of the edge terminal and the edge server is optimized, and the task unloading efficiency of the multi-edge terminal system is improved.

Drawings

To describe the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flowchart of a task allocation method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for determining a cost function based on control delay and energy consumption values according to an embodiment of the present invention;

FIG. 3 is a schematic time sequence diagram of an edge terminal executing a computing task according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a time sequence of an edge server executing a computing task according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a task allocation device according to an embodiment of the present invention;

FIG. 6 is a block diagram of a task allocation device according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without making any inventive effort are within the scope of the present invention.

The terms "comprising" and "having" in the description of the invention and in the above-described figures, as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.

In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description.

Next, a task allocation method provided by the embodiment of the present invention is described in detail. Fig. 1 is a flowchart of a task allocation method according to an embodiment of the present invention, where the method includes:

S101: Determine a cost function for executing the task based on the first control delay and the first total energy consumption value corresponding to the edge terminal executing the task in the current time slot, and the second control delay and the second total energy consumption value corresponding to the edge server executing the task in the current time slot.

The current task offloading method generally adopts a fixed edge server to execute a task scheduling policy, but in a practical scene of limited computing resources, the queuing waiting time of the task on the edge server is a factor which affects the service quality. Meanwhile, energy consumption corresponding to transmission and calculation of edge devices is another concern in system design. The edge devices may include edge terminals and edge servers. Therefore, in the embodiment of the invention, in order to better evaluate the system performance during task processing, the control delay and the energy consumption are considered at the same time.

In practical applications, the computing task may be performed by an edge terminal or by an edge server. In order to select an optimal calculation mode, the control delay and the total energy consumption value corresponding to the task execution of the edge terminal and the control delay and the total energy consumption value corresponding to the task execution of the edge server can be comprehensively analyzed when the cost function is constructed.

For convenience of distinction, the control delay corresponding to the edge terminal may be referred to as a first control delay, and the control delay corresponding to the edge server may be referred to as a second control delay; the total energy consumption value corresponding to the edge terminal is called a first total energy consumption value, and the total energy consumption value corresponding to the edge server is called a second total energy consumption value.

In the embodiment of the invention, N may be used to represent the total number of time slots. Task allocation proceeds in a similar manner in every time slot, so the description below takes the current time slot as an example.

The corresponding cost function in the current time slot is used for evaluating the control delay and energy consumption generated when executing the task in the current time slot.

In practical application, when determining the cost function, weights corresponding to the first control delay and the second control delay and weights corresponding to the first total energy consumption value and the second total energy consumption value can be determined according to the selected strategy action under the current time slot. The sum of the weight corresponding to the first control time delay and the weight corresponding to the second control time delay is one, and the sum of the weight corresponding to the first total energy consumption value and the weight corresponding to the second total energy consumption value is one.

S102: Construct a metric function for selecting policy actions according to the number of states, the number of policy actions, and the cost function corresponding to the current time slot.

The policy actions may include selecting an edge server to perform a computing task and selecting an edge terminal to perform a computing task.

The cost of one edge terminal is affected not only by its own computation policy but also by the computation policies of the other edge terminals. At the same time, collecting global network information so that edge terminals can determine a suitable computation mode may introduce additional latency. For each edge terminal to select an appropriate policy action online despite uncertainty in the overall system dynamics and the operating environment, a method that copes with this uncertainty is needed, and machine learning is such a method. Because each edge terminal obtains cost feedback before taking its next action, online control of the dynamically networked multi-edge-terminal system can be achieved with an algorithm based on temporal-difference reinforcement learning.

The number of states corresponding to the current time slot represents the number of times random selection has been chosen from the first time slot up to the current time slot. For convenience of description, a symbol with the superscript S may be used to represent the number of random selections made by edge terminal i from the 1st time slot to the t-th time slot; the superscript S characterizes randomly selected policy actions.

The number of policy actions corresponding to the current time slot represents the number of times selection by the metric minimum principle has been chosen from the first time slot up to the current time slot. For convenience of description, a symbol with the superscript S-A may be used to represent the number of metric-minimum selections made by edge terminal i from the 1st time slot to the t-th time slot; the superscript S-A characterizes policy actions selected on the basis of the metric minimum principle. The set of policy actions contains two elements: edge terminal i selecting the edge terminal to execute the computing task in the t-th time slot, and edge terminal i selecting the edge server to execute the computing task in the t-th time slot.

In the embodiment of the invention, the strategy action selection mode can be determined based on the magnitude relation between the random number generated in the current time slot and the reciprocal of the state quantity corresponding to the current time slot.

Under the condition that the random number generated in the current time slot is larger than the reciprocal of the corresponding state quantity in the current time slot, a measurement function calculation formula can be called, and the corresponding strategy action quantity and cost function in the current time slot are processed to obtain a measurement function; the calculation formula of the metric function is as follows:

；

Wherein the symbols of the formula denote, respectively: the cost function of edge terminal i in the τ-th time slot; the indicator of the event that, in the τ-th time slot, the state selected by edge terminal i is s and the selected policy action is x; the number of policy actions corresponding to the current time slot; and the metric function of edge terminal i in the t-th time slot.
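As an illustration of the quantities listed above, the sketch below averages the costs of all past time slots in which a given state and policy action were selected, dividing by the number of policy actions. The exact formula is not reproduced in this text, so the averaging form, the function name `metric_value`, and the `(state, action, cost)` history layout are assumptions.

```python
def metric_value(history, state, action, action_count):
    """Average the past costs of one (state, action) pair.

    `history` holds one (state, action, cost) tuple per elapsed time slot;
    `action_count` plays the role of the number of policy actions.
    """
    if action_count <= 0:
        return 0.0  # assumed to match the zero initialisation of the metric
    # The indicator keeps only the slots where this state and action were chosen.
    matching_cost = sum(c for s, x, c in history if s == state and x == action)
    return matching_cost / action_count
```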

In practical application, the state of edge terminal i in the t-th time slot is defined in terms of the uplink channel gain of edge terminal i in the t-th time slot, the downlink channel gain of edge terminal i in the t-th time slot, and the weight parameter.

S103: Adjust the values of the number of states and the number of policy actions according to the policy action selection mode and the metric minimum principle, and determine the target policy action for the current time slot.

In the embodiment of the invention, the policy action selection mode for the current time slot can be determined according to the magnitude relationship between the random number and the reciprocal of the number of states, and the number of states and the number of policy actions are adjusted accordingly.

The policy action selection mode can comprise two types of random selection and metric minimum principle selection. Wherein, the random selection refers to random selection of a strategy action, and the selection of the measurement minimum principle refers to selection of the strategy action corresponding to the minimum value of the measurement function.

The relationship between the number of states and the random number is evaluated by comparing the random number with the reciprocal of the number of states. In practical application, a random number can be generated in the current time slot, and it is then judged whether the random number is greater than the reciprocal of the number of states.

If the random number is greater than the reciprocal of the number of states, the number of policy actions may be incremented by one and the policy action selection mode set to selection by the metric minimum principle. If the random number is less than or equal to the reciprocal of the number of states, the number of states is incremented by one and the policy action selection mode is set to random selection.

And under the condition that the strategy action selection mode is random selection, any strategy action selected randomly can be used as the target strategy action in the current time slot.

Under the condition that the policy action selection mode is selected by a measurement minimum principle, the policy action corresponding to the minimum value of the measurement function can be used as the target policy action under the current time slot.

In a specific implementation, the target policy action for the current time slot may be determined according to the following formula,

；

wherein the condition appearing in the formula denotes that the policy action selection mode is selection by the metric minimum principle.
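The selection rule described in S103 can be sketched as follows. This is an illustrative reading, not the patent's formula: the action labels, the function name `select_action`, and the handling of a zero state count (treated here as "always select randomly", since the reciprocal of zero is undefined) are assumptions.

```python
import random

ACTIONS = ("terminal", "server")  # hypothetical labels for the two policy actions


def select_action(q_values, state_count, action_count, rng=random):
    """Return (action, state_count, action_count, mode) for the current slot.

    `q_values` maps each action to its current metric value.
    """
    r = rng.random()
    # Assumption: with a zero state count the threshold is 1, forcing random selection.
    threshold = 1.0 / state_count if state_count > 0 else 1.0
    if r > threshold:
        # Metric minimum principle: count it and take the lowest-metric action.
        action_count += 1
        action = min(ACTIONS, key=lambda a: q_values[a])
        return action, state_count, action_count, "metric-minimum"
    # Random selection: count it and pick any action.
    state_count += 1
    return rng.choice(ACTIONS), state_count, action_count, "random"
```

Because the threshold is the reciprocal of the number of random selections made so far, random selection becomes less likely as that count grows, so later time slots rely increasingly on the metric function.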

S104: Acquire the action task issued after the target device indicated by the target policy action executes the computing task, and execute the action task, until all tasks are completed.

The following describes the execution of the computing task by the edge terminal, for the case in which the target device indicated by the target policy action is an edge terminal. When the target device is an edge terminal, the edge terminal can upload its own parameter data to the base station, so that the base station, once it has received the parameter data uploaded by all edge terminals, sends all the parameter data to all edge terminals by broadcast. Once the edge terminal has acquired all the parameter data broadcast by the base station, it determines the next action task based on all the parameter data and executes the action task.

The following describes the execution of the computing task by the edge server, for the case in which the target device indicated by the target policy action is an edge server. When the target device is an edge server, the edge terminal can upload its own parameter data to the base station, so that the base station, once it has received the parameter data uploaded by all edge terminals, determines the next action task based on all the parameter data and sends the action task to all edge terminals by broadcast. The edge terminal executes the action task once it has acquired the action task broadcast by the base station.

In an embodiment of the present invention, the constructed metric function may be formulated as follows,

；

Wherein the symbols of the formula denote, respectively: the metric function of edge terminal i in the (t-1)-th time slot; the number of policy actions of edge terminal i in the (t-1)-th time slot; the indicator of the event that, in the t-th time slot, the state selected by edge terminal i is s and the selected policy action is x; and the cost function of edge terminal i in the t-th time slot.

Because this expression computes the metric value of a time slot from that of the preceding time slot, the minimum value of the metric function in the current time slot can be recorded once it has been determined. When the metric value of the next time slot needs to be calculated, the recorded value of the previous time slot can then be adjusted directly.
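One way to carry the recorded value forward is an incremental running average, sketched below. This form is an assumption that is consistent with the listed quantities (previous metric, previous count, indicator, current cost) but is not confirmed by the text; the per-entry counter and the name `update_metric` are simplifications introduced for illustration.

```python
def update_metric(prev_q, prev_count, cost, selected):
    """Return (new_q, new_count) for one (state, action) table entry.

    `selected` plays the role of the indicator: the entry changes only when
    this state and policy action were the ones chosen in the current slot.
    """
    if not selected:
        return prev_q, prev_count
    new_count = prev_count + 1
    # Running average: re-weight the old value and fold in the new cost.
    new_q = (prev_count * prev_q + cost) / new_count
    return new_q, new_count
```

Only the previous metric value and a counter need to be stored, so the cost history of earlier time slots does not have to be kept.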

Each edge terminal initializes the time slot t=1, and the recorded number of states, the number of policy actions, and the corresponding Q values are all initialized to 0. All edge terminals repeat the above process simultaneously and in parallel until the interaction process between the edge terminals ends.

Each time a policy action is determined, the edge terminal can update the table recording the number of states, the number of policy actions, and the corresponding Q values, and report the policy action to the edge server through session signaling. The edge server determines the polling upload sequence of the edge terminals and notifies all edge terminals of the sequence through session signaling. After receiving the polling upload sequence, all edge terminals upload their parameter data to the edge server in a polling manner, while the edge server forwards the parameter data by broadcast to the device executing the computing task.

When the device for executing the calculation task is an edge terminal, the edge terminal calculates a next action instruction after receiving all parameter data. When the equipment for executing the calculation task is an edge server, the edge server calculates a next action instruction of the edge terminal after receiving all parameter data, and transmits the calculated next action instruction to the edge terminal in a broadcasting mode.

In the embodiment of the present invention, the number of slots may be increased by one every time an action task is executed. And judging whether the latest time slot number is larger than or equal to the total number of time slots. And returning to the step of determining the cost function of executing the task based on the first control time delay and the first total energy value corresponding to the edge terminal when executing the task in the current time slot and the second control time delay and the second total energy value corresponding to the edge server when executing the task in the current time slot under the condition that the latest time slot number is smaller than the total time slot number. And ending the task allocation operation when the latest time slot number is greater than or equal to the total time slot number.
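The slot-counting control flow just described can be sketched as a simple loop. The callable `step` is a hypothetical stand-in for steps S101 to S104, and the loop follows the text literally: the slot number starts at 1, is incremented after each action task, and the operation ends once it is greater than or equal to the total number of time slots.

```python
def run_allocation(total_slots, step):
    """Call `step(t)` once per time slot and return the slots that were executed.

    `step` is a hypothetical callable standing in for S101 to S104.
    """
    executed = []
    t = 1  # the slot counter is initialised to 1
    while True:
        step(t)
        executed.append(t)
        t += 1  # one action task was executed, so advance the slot number
        if t >= total_slots:
            break  # latest slot number >= total number of slots: end allocation
    return executed
```

Under this literal reading, `run_allocation(4, step)` executes slots 1, 2, and 3; an implementation that should also run slot N would compare with a strict "greater than" instead.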

Fig. 2 is a flowchart of a method for determining a cost function based on control delay and energy consumption values according to an embodiment of the present invention, where the method includes:

S201: Determine the first control delay corresponding to the edge terminal executing the task in the current time slot, based on the data amount of the parameter data in the current time slot, the link performance parameters, and the terminal performance parameters of the edge terminal.

Based on the processing flow of the parameter data in the actual application scene, the generated first control delay can comprise uploading time, propagation time and terminal calculation time.

The link performance parameters may include uplink spectrum bandwidth, uplink channel gain, single-sided noise power spectral density, downlink spectrum bandwidth, downlink channel gain, transmit power of the base station.

The terminal performance parameters may include the transmit power of edge terminal i itself, its own computation amount, and its own computing power.

In the embodiment of the invention, the upload time can be determined according to the data amount of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power, and the single-sided noise power spectral density.

In a specific implementation, an uploading time calculation formula can be called, and the data volume of parameter data, the uplink spectrum bandwidth, the uplink channel gain, the self transmitting power and the unilateral noise power spectrum density under the current time slot are processed to obtain uploading time; the calculation formula of the uploading time is as follows:

；

Wherein the symbols of the formula denote, respectively: the upload time after which all edge terminals have finished uploading their parameter data; M, the total number of edge terminals; the upload time of edge terminal i in the t-th time slot; the data amount of the parameter data of edge terminal i in the t-th time slot; the uplink throughput rate of edge terminal i in the t-th time slot; w^U, the uplink spectrum bandwidth; the uplink channel gain of edge terminal i in the t-th time slot; the transmit power of edge terminal i itself; and N₀, the single-sided noise power spectral density.
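A sketch of how the upload time could be computed from these inputs is given below. It assumes that the uplink throughput rate follows the Shannon capacity form, that each terminal's upload time is its data amount divided by that rate, and that the overall upload time is the sum over the M terminals uploading in turn. These forms and the function names are assumptions, because the formula itself is not reproduced in this text.

```python
import math


def uplink_rate(bandwidth_hz, channel_gain, tx_power_w, noise_psd_w_per_hz):
    """Assumed Shannon-capacity form of the uplink throughput rate, in bit/s."""
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def total_upload_time(data_bits, channel_gains, tx_powers_w, bandwidth_hz, noise_psd):
    """Return (total, per_terminal) upload times for terminals uploading in turn."""
    times = [
        d / uplink_rate(bandwidth_hz, h, p, noise_psd)
        for d, h, p in zip(data_bits, channel_gains, tx_powers_w)
    ]
    return sum(times), times
```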

For the propagation time, the propagation time can be determined according to the data volume of the parameter data, the downlink spectrum bandwidth, the downlink channel gain, the transmitting power of the base station and the single-side noise power spectrum density under the current time slot.

In a specific implementation, a propagation time calculation formula can be called to process the data amount of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station, and the single-sided noise power spectral density to obtain the propagation time; the propagation time calculation formula is:

；

wherein the symbols of the formula denote, respectively: the set of edge terminals that select the edge terminal to perform the computing task; the propagation time of edge terminal i in the t-th time slot, in which the superscript LD denotes the broadcast corresponding to the edge terminal; the time from when edge terminal i-1 starts uploading its parameter data until its broadcast is completed; the time from when edge terminal i starts uploading its parameter data until its broadcast is completed; the downlink throughput rate of the t-th time slot; w^D, the downlink spectrum bandwidth; the downlink channel gain of edge terminal i in the t-th time slot; and P^D, the transmit power of the base station.
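The dependence on the completion times of edge terminals i-1 and i suggests a two-stage pipeline, which the sketch below illustrates under explicit assumptions: terminals upload one after another, and the base station broadcasts the data of terminal i only after that upload has finished and the previous broadcast has completed. This is one possible reading of the definitions above, not the patent's formula, and `broadcast_finish_times` is a hypothetical name.

```python
def broadcast_finish_times(upload_times, broadcast_times):
    """Finish time of each terminal's broadcast, measured from the slot start.

    Assumed pipeline: uploads run back to back; broadcast i starts when
    upload i is done and broadcast i-1 has finished.
    """
    finish = []
    upload_end = 0.0
    prev_broadcast_end = 0.0
    for up, down in zip(upload_times, broadcast_times):
        upload_end += up
        start = max(upload_end, prev_broadcast_end)
        prev_broadcast_end = start + down
        finish.append(prev_broadcast_end)
    return finish
```

In this sketch each entry of `broadcast_times` would be a data amount divided by the downlink throughput rate, computed in the same way as the uplink case but with w^D, the downlink channel gain, and P^D.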

For the terminal calculation time, the terminal calculation time can be determined according to the calculation amount and the calculation force of the terminal.

In a specific implementation, the terminal's own computation amount can be divided by its own computing power to obtain the terminal calculation time. That is, a calculation formula for the terminal calculation time can be called to process the computation amount and the computing power of edge terminal i, thereby determining the terminal calculation time; the calculation formula of the terminal calculation time is as follows:

(terminal calculation time of edge terminal i in the t-th time slot) = B_t / f_i；

wherein the left-hand side is the terminal calculation time of edge terminal i in the t-th time slot, whose superscript LC denotes the computation corresponding to the edge terminal; B_t represents the computation amount of the edge terminal; and f_i represents the computing power of edge terminal i. B_t is proportional to the sum of the data amounts of all parameter data, that is, to the sum over i = 1, ..., M of the data amount of the parameter data of edge terminal i in the t-th time slot, where M represents the total number of edge terminals.
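As a small worked sketch of this quotient: the proportionality constant between the summed data amount and B_t is not given in the text, so the parameter `cycles_per_bit` below is a hypothetical placeholder for it, as is the function name.

```python
def terminal_compute_time(data_amounts, cycles_per_bit, computing_power):
    """Return B_t / f_i, with B_t proportional to the summed data amounts.

    `cycles_per_bit` is an assumed proportionality constant (not from the source).
    """
    if computing_power <= 0:
        raise ValueError("computing power must be positive")
    workload = cycles_per_bit * sum(data_amounts)  # B_t
    return workload / computing_power
```

For example, two terminals contributing 1000 and 3000 bits at 500 cycles per bit give B_t = 2,000,000 cycles, which a terminal with a computing power of 1,000,000 cycles per second processes in 2 seconds.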

After determining the upload time, the propagation time, and the terminal calculation time, the first control delay may be determined based on the upload time, the propagation time, and the terminal calculation time.

The delay of edge terminal M is calculated differently from that of the other edge terminals: as the last edge terminal to upload its parameter data, edge terminal M already knows its own parameter data and does not need to wait for that data to be broadcast, so it can start computing earlier than the other edge terminals. Correspondingly, the first control delay can be expressed by the following formula,

。

S202: Determine the first total energy consumption value corresponding to the edge terminal executing the task in the current time slot, according to the first control delay and the terminal performance parameters.

In the embodiment of the invention, the first total energy consumption value can be determined according to the terminal's own transmit power, the upload time, the energy consumption value per unit time, the propagation time, the terminal's own computation amount, and its own computing power.

In a specific implementation, a first energy consumption calculation formula can be called to process the terminal's own transmit power, the upload time, the energy consumption value per unit time, the propagation time, the terminal's own computation amount, and its own computing power to obtain the first total energy consumption value; the first energy consumption calculation formula is:

；

wherein the symbols of the formula denote, respectively: the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumed per second by edge terminal i when receiving downlink data; the propagation time of edge terminal i in the t-th time slot; the switched-capacitance consumption value of the computing chip; B_t, the computation amount of the edge terminal itself in the t-th time slot; f_i, the computing power of edge terminal i itself; and the first total energy consumption value, in which the superscript L denotes the energy consumption of the edge terminal.
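The listed inputs suggest three energy components: transmission, reception, and computation. The sketch below combines them under stated assumptions: transmit energy is power times upload time, receive energy is per-second consumption times propagation time, and compute energy uses the common switched-capacitance model in which energy scales with the workload times the square of the computing power. The quadratic form and the function name are assumptions; the patent formula is not reproduced in this text.

```python
def terminal_energy(tx_power_w, upload_time_s, rx_power_w, propagation_time_s,
                    switched_capacitance, workload_cycles, computing_power_hz):
    """Assumed first total energy consumption value of one edge terminal, in joules."""
    transmit = tx_power_w * upload_time_s       # uploading parameter data
    receive = rx_power_w * propagation_time_s   # receiving the broadcast
    # Assumption: dynamic energy per cycle grows with the square of the frequency.
    compute = switched_capacitance * workload_cycles * computing_power_hz ** 2
    return transmit + receive + compute
```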

S203: Determine the second control delay corresponding to the edge server executing the task in the current time slot, based on the data amount of the parameter data in the current time slot, the link performance parameters, and the server performance parameters of the edge server.

Based on the processing flow of the parameter data in the actual application scene, the generated second control delay can comprise uploading time, propagation time and server calculation time.

The server performance parameters may include the computation amount of the edge server itself and the computing power of the edge server itself.

In the embodiment of the invention, the upload time can be determined according to the data amount of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power, and the single-sided noise power spectral density.

For the specific calculation of the upload time and of the propagation time, refer to the description of S201; the details are not repeated here.

In the embodiment of the invention, to distinguish it from the propagation time corresponding to the edge terminal executing the computing task, a separate symbol carrying the superscript ED is used to denote the propagation time corresponding to the edge server executing the computing task.

where the symbols denote, respectively: the set of edge terminals that select the edge server to execute the computing task; the propagation time of edge terminal i in the t-th time slot, whose superscript ED indicates the broadcast corresponding to the edge server; the propagation time from edge terminal i-1 starting to upload its parameter data until its broadcast is completed; and the propagation time from edge terminal i starting to upload its parameter data until its broadcast is completed.

The server computation time can be determined according to the computation amount of the edge server and the computing power of the edge server.

In a specific implementation, a server computation time formula can be called to process the computation amount and the computing power of the edge server; that is, the computation amount of the server is divided by its computing power to obtain the server computation time. The server computation time formula is:

；

where the symbols denote, respectively: the server computation time of edge server i in the t-th time slot, whose superscript EC indicates the computation corresponding to the edge server; B_t, the computation amount of the edge server; and F, the computing power of the edge server. B_t is proportional to the sum of the data amounts of all parameter data, and M denotes the total number of edge terminals.
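The division described above can be sketched in Python as follows. The text states that the server computation amount B_t is divided by the server computing power F and that B_t is proportional to the summed data amounts; the name `cycles_per_bit` for the proportionality constant and the example numbers are hypothetical.

```python
def server_compute_time(data_amounts, cycles_per_bit, server_power):
    """Server computation time = server workload B_t / server computing power F.

    B_t is taken as proportional to the summed data amounts of all terminals;
    `cycles_per_bit` is a hypothetical name for that proportionality constant.
    """
    if server_power <= 0:
        raise ValueError("server computing power must be positive")
    workload = cycles_per_bit * sum(data_amounts)   # B_t
    return workload / server_power                  # B_t / F


# Three terminals with made-up data amounts.
t_ec = server_compute_time([1000, 2000, 3000], cycles_per_bit=100, server_power=2e6)
```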

After the upload time, the propagation time, and the server computation time are determined, the second control delay may be determined based on the upload time, the propagation time, and the server computation time.

The second control delay may be expressed as follows,

。
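A minimal Python sketch of the second control delay follows, assuming the three stages named in the text (upload, propagation, server computation) run one after another so that their durations add. The class name `ServerDelay` and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerDelay:
    upload: float       # time for terminals to upload parameter data (s)
    propagation: float  # time to broadcast the action instruction (s)
    compute: float      # server computation time (s)

    def total(self) -> float:
        # Assumed: the stages are sequential, so the delays are summed.
        return self.upload + self.propagation + self.compute


second_delay = ServerDelay(upload=0.02, propagation=0.01, compute=0.3).total()
```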

S204: Determining a second total energy consumption value corresponding to the edge server when executing the task in the current time slot, according to the second control delay corresponding to the edge server when executing the task in the current time slot and the server performance parameters.

In practical application, the second total energy consumption value may be determined according to the transmit power of the edge terminal, the upload time, the energy consumed by the edge server per unit time, the propagation time, the computation amount of the edge server, and the computing power of the edge server.

In a specific implementation, a second energy consumption calculation formula may be called to process the transmit power of the edge terminal, the upload time, the energy consumed by the edge server per unit time, the propagation time, the computation amount of the edge server, and the computing power of the edge server to obtain the second total energy consumption value; the second energy consumption calculation formula is:

；

where the symbols denote, respectively: the set of edge terminals that select the edge server to execute the computing task; the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumed per second by edge server i when receiving downlink data; the propagation time of edge server i in the t-th time slot; and the second total energy consumption value. The superscript ED indicates the broadcast corresponding to the edge server, and the superscript E indicates energy consumption of the edge server.

S205: Constructing a cost function for executing the task based on the first control delay, the first total energy consumption value, the second control delay, the second total energy consumption value, the weight parameter, and the cost penalty parameter.

In a specific implementation, a cost function calculation formula can be called to process the first control delay, the first total energy consumption value, the second control delay, the second total energy consumption value, the weight parameter, and the cost penalty parameter to obtain the cost function; the cost function calculation formula is:

；

where the symbols denote, respectively: the total control delay; the total energy consumption value; the first control delay; the first total energy consumption value; the second control delay; the second total energy consumption value; the weight parameter; C^p, the cost penalty parameter; and the cost function. The remaining expressions denote, in turn, the case in which the total control delay is greater than the set delay threshold, the case in which the total control delay is less than or equal to the set delay threshold, the selection of the edge terminal to execute the computing task, and the selection of the edge server to execute the computing task.
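The structure suggested by these definitions can be sketched in Python as follows. The formula itself is not reproduced in this text, so this is only one plausible reading: the cost equals the penalty C^p when the total control delay exceeds the threshold, and otherwise a weighted combination of delay and energy. The weighted-sum form `weight * delay + (1 - weight) * energy` and all names are assumptions.

```python
def task_cost(use_server, delay_local, energy_local, delay_server, energy_server,
              weight, penalty, delay_threshold):
    """Hypothetical cost: penalty if the delay bound is violated, else a weighted sum."""
    # Total delay and energy follow the device chosen to run the computing task.
    total_delay = delay_server if use_server else delay_local
    total_energy = energy_server if use_server else energy_local
    if total_delay > delay_threshold:
        return penalty                       # cost penalty parameter C^p
    # Assumed trade-off form; the source only says a weight parameter is used.
    return weight * total_delay + (1.0 - weight) * total_energy


cost_local = task_cost(False, 0.5, 0.2, 0.3, 0.4, weight=0.5, penalty=10.0,
                       delay_threshold=0.4)
cost_server = task_cost(True, 0.5, 0.2, 0.3, 0.4, weight=0.5, penalty=10.0,
                        delay_threshold=0.4)
```

In this made-up example the local option violates the delay threshold and receives the penalty, so the server option has the lower cost.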

As can be seen from the above description, there are two ways to perform the computing task, one is that the edge terminal performs the computing task, and the other is that the edge server performs the computing task.

Fig. 3 is a schematic timing diagram of an edge terminal executing a computing task according to an embodiment of the present invention. The implementation process of edge terminal computing may include two stages: (1) the base station collects the parameter data of all the edge terminals and sends the parameter data to all the edge terminals by broadcasting; (2) the computing unit of each edge terminal computes the next action instruction based on all the parameter data.

FIG. 3 takes M edge terminals as an example, namely edge terminal 1, edge terminal 2, …, and edge terminal M, and takes time slot t as an example to describe the operations of the edge terminals and the base station/edge server within time slot t. The time slot immediately following time slot t is time slot t+1. Each edge terminal may upload the parameter data it has collected to the base station/edge server. In the embodiment of the invention, the time during which the edge terminal collects the parameter data is called the collection time, the time during which the edge terminal uploads the parameter data is called the upload time, and the time during which the edge terminal determines and executes the next action is called the action execution time. The parameter data may include an action parameter and an environmental parameter. The base station transmits all the parameter data to all the edge terminals by broadcasting, and an arrow pointing from the base station to an edge terminal in FIG. 3 indicates the edge terminal to which the parameter data is propagated. Because of the broadcast, each edge terminal can obtain all the parameter data, determine the next action instruction based on all the parameter data, and then execute the task corresponding to the next action instruction.

Fig. 4 is a schematic timing diagram of an edge server executing a computing task according to an embodiment of the present invention. The computing implementation process of the edge server includes three stages: (1) all edge terminals upload their parameter data; (2) the edge server calculates the next action instruction of the edge terminals according to all the parameter data; (3) the edge server transmits the next action instruction to all edge terminals by broadcasting.

FIG. 4 takes M edge terminals as an example, namely edge terminal 1, edge terminal 2, …, and edge terminal M, and takes time slot t as an example to describe the operations of the edge terminals and the base station/edge server within time slot t. The time slot immediately following time slot t is time slot t+1. Each edge terminal can upload the parameter data it has collected to the edge server. After the edge server obtains all the parameter data, it can calculate the next action instruction of the edge terminals. An arrow from the base station to an edge terminal in FIG. 4 indicates the edge terminal to which the next action instruction is propagated. Because of the broadcast, each edge terminal can obtain the next action instruction and then execute the task corresponding to it.
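The two execution modes of FIG. 3 and FIG. 4 can be contrasted with a small Python sketch. The function names `next_action` and `run_slot` and the averaging rule are hypothetical placeholders; the source does not specify how the next action instruction is computed from the parameter data.

```python
def next_action(all_params):
    """Hypothetical stand-in for the real control computation."""
    return sum(all_params) / len(all_params)


def run_slot(terminal_params, use_server):
    """Return the action instruction each terminal ends up with in one slot."""
    collected = list(terminal_params)          # every terminal uploads its data
    if use_server:
        action = next_action(collected)        # the edge server computes once
        return [action for _ in collected]     # and broadcasts the instruction
    # The base station broadcasts all parameter data; each terminal computes itself.
    return [next_action(collected) for _ in collected]


server_mode = run_slot([1.0, 2.0, 3.0], use_server=True)
terminal_mode = run_slot([1.0, 2.0, 3.0], use_server=False)
```

Both modes give every terminal the same instruction; they differ in what is broadcast (raw parameter data versus the finished instruction) and in where the computation runs, which is what drives the different delay and energy values.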

According to the embodiment of the invention, a cost function that jointly accounts for control delay and energy consumption can be constructed from the complete timing diagram of the edge terminal executing the computing task and the complete timing diagram of the edge server executing the computing task. On this basis, resource allocation is optimized by a reinforcement learning temporal-difference algorithm, which effectively improves the interaction efficiency between the edge terminals and the edge server.

Fig. 5 is a schematic structural diagram of a task allocation device according to an embodiment of the present invention, which includes a determining unit 51, a constructing unit 52, an adjusting unit 53, an obtaining unit 54, and an executing unit 55;

the determining unit 51 is configured to determine a cost function for executing a task based on a first control delay and a first total energy value corresponding to the edge terminal when executing the task in the current time slot, and a second control delay and a second total energy value corresponding to the edge server when executing the task in the current time slot;

a construction unit 52, configured to construct a metric function for selecting a policy action according to the number of states, the number of policy actions, and the cost function corresponding to the current time slot; the policy actions comprise selecting an edge server to execute a computing task and selecting an edge terminal to execute the computing task;

the adjusting unit 53 is configured to adjust the state number and the value of the policy action number according to the policy selection mode and the metric minimum principle corresponding to the policy action, and determine the target policy action in the current time slot;

the obtaining unit 54 is configured to obtain an action task issued after the target device to which the target policy action points performs the calculation task;

and the execution unit 55 is used for executing the action tasks until all the tasks are completed.

In some embodiments, the adjusting unit includes a mode determining subunit, a first designating subunit, and a second designating subunit;

the mode determining subunit is configured to determine the policy action selection mode in the current time slot according to the proportional relation between the state number and a random number, and to adjust the state number and the policy action number;

the first designating subunit is configured to take any randomly selected policy action as the target policy action in the current time slot when the policy action selection mode is random selection;

the second designating subunit is configured to take the policy action corresponding to the minimum value of the metric function as the target policy action in the current time slot when the policy action selection mode is selection by the minimum-metric principle.

In some embodiments, the mode determining subunit is configured to: generate a random number in the current time slot; judge whether the random number is greater than the reciprocal of the state number; if the random number is greater than the reciprocal of the state number, add one to the policy action number and set the policy action selection mode to selection by the minimum-metric principle; and if the random number is less than or equal to the reciprocal of the state number, add one to the state number and set the policy action selection mode to random selection.
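The selection logic of the subunits above can be sketched in Python as follows. The function names, the mode labels `"greedy"` and `"random"`, and the dictionary of metric values are hypothetical; the sketch assumes the state number starts at 1 or more so that its reciprocal is defined.

```python
import random


def choose_mode(state_count, action_count, rng=random):
    """Pick the selection mode for the current slot and update the counters.

    Returns (mode, state_count, action_count), where mode is "greedy"
    (minimum-metric selection) or "random".
    """
    r = rng.random()                       # random number in [0, 1)
    if r > 1.0 / state_count:
        return "greedy", state_count, action_count + 1
    return "random", state_count + 1, action_count


def select_action(mode, metrics, rng=random):
    """`metrics` maps each policy action to its current metric value."""
    if mode == "random":
        return rng.choice(list(metrics))
    return min(metrics, key=metrics.get)   # minimum-metric principle


mode, states, actions = choose_mode(state_count=4, action_count=0)
target = select_action(mode, {"edge_terminal": 0.4, "edge_server": 0.3})
```

Because the threshold is the reciprocal of the state number, each random selection raises the state number and makes later random selections less likely, which resembles an exploration rate that decays over time.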

In some embodiments, the apparatus further includes a recording unit;

the recording unit is configured to record the minimum value of the metric function in the current time slot, so that the metric value can be adjusted in the next time slot.

In some embodiments, the obtaining unit is configured to upload, when the target device is an edge terminal, own parameter data to the base station, so that the base station sends, when receiving the parameter data uploaded by all the edge terminals, all the parameter data to all the edge terminals in a broadcast manner; and under the condition that all parameter data broadcast by the base station are acquired, determining a next action task based on all parameter data, and executing the action task.

In some embodiments, the obtaining unit is configured to upload, when the target device is an edge server, own parameter data to the base station, so that the base station determines, when receiving the parameter data uploaded by all edge terminals, a next action task based on all the parameter data, and sends the action task to all the edge terminals in a broadcast manner; and executing the action task under the condition that the action task broadcasted by the base station is acquired.

In some embodiments, the determining unit includes a first time delay determining subunit, a first energy consumption determining subunit, a second time delay determining subunit, a second energy consumption determining subunit, and a cost constructing subunit;

The first time delay determining subunit is used for determining a first control time delay corresponding to the execution of the task by the edge terminal under the current time slot based on the data volume of the parameter data, the link performance parameter and the terminal performance parameter of the edge terminal under the current time slot;

the first energy consumption determining subunit is used for determining a first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot according to a first control time delay corresponding to the edge terminal when executing the task in the current time slot and the terminal performance parameter;

the second time delay determining subunit is used for determining a second control time delay corresponding to the execution of the task by the edge server under the current time slot based on the data volume of the parameter data, the link performance parameter and the server performance parameter of the edge server under the current time slot;

the second energy consumption determining subunit is used for determining a second total energy consumption value corresponding to the edge server when executing the task in the current time slot according to a second control time delay corresponding to the edge server when executing the task in the current time slot and the server performance parameter;

and the cost construction subunit is used for constructing a cost function for executing the task based on the first control time delay, the first total energy consumption value, the second control time delay, the second total energy consumption value, the weight parameter and the cost penalty parameter.

In some embodiments, the first delay determining subunit is configured to determine the uploading time according to a data amount of parameter data, an uplink spectrum bandwidth, an uplink channel gain, a self transmitting power, and a single-side noise power spectrum density in a current time slot; determining propagation time according to the data volume of parameter data, downlink spectrum bandwidth, downlink channel gain, transmitting power of a base station and single-side noise power spectrum density under the current time slot; determining the calculation time of the terminal according to the calculation amount and the calculation force of the terminal; and determining the first control time delay based on the uploading time, the propagation time and the terminal calculation time.

In some embodiments, the first delay determining subunit is configured to call an upload time calculation formula, and process the data amount of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, the terminal's own transmit power, and the single-sided noise power spectral density to obtain the upload time; the upload time calculation formula is:

；

where the symbols denote, respectively: the upload time after all edge terminals have uploaded their parameter data; M, the total number of edge terminals; the upload time of edge terminal i in the t-th time slot; the data amount of the parameter data of edge terminal i in the t-th time slot; the uplink throughput rate of edge terminal i in the t-th time slot; w^U, the uplink spectrum bandwidth; the uplink channel gain of edge terminal i in the t-th time slot; the transmit power of edge terminal i itself; and N_0, the single-sided noise power spectral density.
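As a hedged illustration of how a single terminal's upload time could follow from these quantities, the Python sketch below divides the data amount by an uplink throughput rate. The Shannon-type rate with noise power `noise_psd * bandwidth_hz` is an assumption, since the formula is not reproduced in this text, and how the per-terminal times combine into the overall upload time for all M terminals is not sketched here. All names are hypothetical.

```python
import math


def uplink_rate(bandwidth_hz, channel_gain, tx_power, noise_psd):
    """Shannon-type rate in bit/s (assumed form, not confirmed by the source)."""
    snr = channel_gain * tx_power / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def upload_time(data_bits, bandwidth_hz, channel_gain, tx_power, noise_psd):
    """Upload time of one terminal = data amount / uplink throughput rate."""
    return data_bits / uplink_rate(bandwidth_hz, channel_gain, tx_power, noise_psd)


# Made-up values chosen so that the signal-to-noise ratio is about 1.
t_up = upload_time(data_bits=2e6, bandwidth_hz=1e6, channel_gain=1.0,
                   tx_power=1.0, noise_psd=1e-6)
```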

In some embodiments, the first delay determining subunit is configured to call a propagation time calculation formula, and process the data amount of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station, and the single-sided noise power spectral density to obtain the propagation time; the propagation time calculation formula is:

；

where the symbols denote, respectively: the set of edge terminals that select the edge terminal to execute the computing task; the propagation time of edge terminal i in the t-th time slot, whose superscript LD indicates the broadcast corresponding to the edge terminal; the propagation time from edge terminal i-1 starting to upload its parameter data until its broadcast is completed; the propagation time from edge terminal i starting to upload its parameter data until its broadcast is completed; the downlink throughput rate in the t-th time slot; w^D, the downlink spectrum bandwidth; the downlink channel gain of edge terminal i in the t-th time slot; and P^D, the transmit power of the base station.

In some embodiments, the first delay determining subunit is configured to divide the edge terminal's own computation amount by its own computing power to obtain the terminal computation time.

In some embodiments, the first energy consumption determining subunit is configured to determine the first total energy consumption value according to the terminal's own transmit power, the upload time, the energy consumed per unit time, the propagation time, the terminal's own computation amount, and the terminal's own computing power.

In some embodiments, the first energy consumption determining subunit is configured to call a first energy consumption calculation formula, and process the terminal's own transmit power, the upload time, the energy consumed per unit time, the propagation time, the terminal's own computation amount, and the terminal's own computing power to obtain the first total energy consumption value; the first energy consumption calculation formula is:

；

where the symbols denote, respectively: the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumed per second by edge terminal i when receiving downlink data; the propagation time of edge terminal i in the t-th time slot; the switched-capacitance consumption value of the computing chip; B_t, the computation amount of the edge terminal itself in the t-th time slot; f_i, the computing power of edge terminal i itself; and the first total energy consumption value, whose superscript L indicates energy consumption of the edge terminal.

In some embodiments, the second delay determining subunit is configured to determine the uploading time according to the data size of the parameter data, the uplink spectrum bandwidth, the uplink channel gain, the own transmitting power, and the single-side noise power spectrum density in the current time slot; determining propagation time according to the data volume of parameter data, downlink spectrum bandwidth, downlink channel gain, transmitting power of a base station and single-side noise power spectrum density under the current time slot; determining the calculation time of the server according to the calculation amount of the edge server and the calculation force of the edge server; and determining the second control time delay based on the uploading time, the propagation time and the server calculation time.

In some embodiments, the second energy consumption determining subunit is configured to determine the second total energy consumption value according to a transmission power of the edge terminal, an upload time, a capability consumption value of the edge server per unit time, a propagation time, a calculation amount of the edge server, and a calculation power of the edge server.

In some embodiments, the second energy consumption determining subunit is configured to invoke a second energy consumption calculation formula, and process the transmitting power of the edge terminal, the uploading time, the capability consumption value of the edge server per unit time, the propagation time, the calculated amount of the edge server, and the calculated power of the edge server to obtain a second total energy consumption value; the second energy consumption calculation formula is:

；

In some embodiments, the cost construction subunit is configured to call a cost function calculation formula, and process the first control delay, the first total energy consumption value, the second control delay, the second total energy consumption value, the weight parameter, and the cost penalty parameter to obtain the cost function; the cost function calculation formula is:

；

where the symbols denote, respectively: the total control delay; the total energy consumption value; the first control delay; the first total energy consumption value; the second control delay; the second total energy consumption value; the weight parameter; C^p, the cost penalty parameter; and the cost function. The remaining expressions denote, in turn, the case in which the total control delay is greater than the set delay threshold, the case in which the total control delay is less than or equal to the set delay threshold, the selection of the edge terminal to execute the computing task, and the selection of the edge server to execute the computing task.

In some embodiments, the construction unit is configured to call a metric function calculation formula when the random number generated in the current time slot is greater than the reciprocal of the state number corresponding to the current time slot, and process the policy action number and the cost function corresponding to the current time slot to obtain a metric function; the calculation formula of the metric function is as follows:

；
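The metric function formula is not reproduced in this text. As one hedged possibility consistent with its stated inputs (the policy action number and the cost function), the Python sketch below uses an incremental-average update, a common choice in reinforcement learning. The update rule and the name `update_metric` are assumptions, not the patented formula.

```python
def update_metric(old_metric, cost, action_count):
    """Incremental-average update (assumed form): Q <- Q + (C - Q) / N."""
    if action_count < 1:
        raise ValueError("action_count must be at least 1")
    return old_metric + (cost - old_metric) / action_count


# Two made-up cost observations for the same policy action.
q = update_metric(0.0, 4.0, action_count=1)   # first observation
q = update_metric(q, 2.0, action_count=2)     # running average of 4.0 and 2.0
```

With this rule the metric of a policy action equals the average cost observed for it so far, so selecting the action with the minimum metric favors the option with the lowest average cost.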

In some embodiments, the apparatus further includes an accumulation unit, a judging unit, and an ending unit;

a judging unit for judging whether the latest time slot number is greater than or equal to the total time slot number; under the condition that the latest time slot number is smaller than the total time slot number, triggering a determining unit to execute a step of determining a cost function for executing a task based on a first control time delay and a first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot and a second control time delay and a second total energy consumption value corresponding to the edge server when executing the task in the current time slot;

and an ending unit for ending the task allocation operation in the case that the latest time slot number is greater than or equal to the total time slot number.
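The loop formed by the judging unit and the ending unit can be sketched in Python as follows. The callback `allocate_one_slot` is a hypothetical stand-in for the per-slot work of the determining unit and the units that follow it, and incrementing the slot number is only the assumed role of the accumulation unit, which is not described in this text.

```python
def run_allocation(total_slots, allocate_one_slot):
    """Repeat the per-slot allocation until the slot number reaches the total."""
    slot = 0
    while slot < total_slots:        # judging unit: latest slot number < total
        allocate_one_slot(slot)      # determining unit and later units run here
        slot += 1                    # assumed role of the accumulation unit
    return slot                      # ending unit: the allocation stops


visited = []
finished_at = run_allocation(3, visited.append)
```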

For the features of the embodiment corresponding to Fig. 5, refer to the related description of the embodiments corresponding to Fig. 1 to Fig. 4; the details are not repeated here.

Fig. 6 is a block diagram of a task allocation device according to an embodiment of the present invention, where, as shown in fig. 6, the task allocation device includes: a memory 60 for storing a computer program;

a processor 61 for implementing the steps of the task allocation method according to the above embodiment when executing a computer program.

The task allocation device provided in this embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.

Processor 61 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 61 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 61 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 61 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 61 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

Memory 60 may include one or more computer-readable storage media, which may be non-transitory. Memory 60 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 60 is at least used for storing a computer program 601 which, when loaded and executed by the processor 61, implements the relevant steps of the task allocation method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 60 may further include an operating system 602, data 603, and the like, and the storage may be transient or permanent. The operating system 602 may include Windows, Unix, Linux, and the like. The data 603 may include, but is not limited to, the first control delay and the first total energy consumption value corresponding to the edge terminal when executing a task in the current time slot, the second control delay and the second total energy consumption value corresponding to the edge server when executing a task in the current time slot, the corresponding state number, the policy action number, the policy selection mode, and the like.

In some embodiments, the task allocation device may further include a display 62, an input-output interface 63, a communication interface 64, a power supply 65, and a communication bus 66.

Those skilled in the art will appreciate that the configuration shown in Fig. 6 does not limit the task allocation device, which may include more or fewer components than illustrated.

It will be appreciated that, if the task allocation method in the above embodiments is implemented in the form of software functional units and sold or used as a stand-alone product, it may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product that is stored in a storage medium and used to perform all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, a magnetic disk, or an optical disk.

Based on this, the embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the task allocation method as described above.

The task allocation method, device, equipment, and computer readable storage medium provided by the embodiments of the invention have been described in detail above. The embodiments in this description are presented progressively: each embodiment focuses on its differences from the others, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant points, refer to the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The task allocation method, the device, the equipment and the computer readable storage medium provided by the invention are described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims

1. A task allocation method, comprising:

adjusting the values of the state number and the policy action number according to the policy selection manner and the metric minimization principle corresponding to the policy actions, and determining a target policy action in the current time slot;

2. The task allocation method according to claim 1, wherein adjusting the values of the state number and the policy action number according to the policy selection manner and the metric minimization principle corresponding to the policy actions, and determining the target policy action in the current time slot, comprises:

3. The task allocation method according to claim 2, wherein determining a policy action selection manner in the current time slot according to the proportional relationship between the state number and the random number, and adjusting the state number and the policy action number includes:

generating a random number in the current time slot;
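The selection step outlined in claims 2 to 4 resembles the explore/exploit rule used in tabular reinforcement learning. The following is a minimal sketch under that assumption and is not the claimed procedure: the names `select_action`, `metric` and `explore_prob` are hypothetical, and comparing the random number against a fixed exploration probability stands in for the proportional relationship that the claim refers to.

```python
import random

def select_action(metric, state, explore_prob, rng=random):
    """Pick a policy action for `state` from a metric table.

    metric: dict mapping a state to a list of metric values, one per action
            (lower is better, following the metric minimization principle).
    explore_prob: assumed exploration probability in [0, 1] (hypothetical).
    """
    values = metric[state]
    # Generate a random number for the current time slot.
    if rng.random() < explore_prob:
        # Explore: choose any action uniformly at random.
        return rng.randrange(len(values))
    # Exploit: choose the action with the smallest metric value.
    return min(range(len(values)), key=values.__getitem__)

# Hypothetical encoding: action 0 runs the task on the edge terminal,
# action 1 offloads it to the edge server.
metric = {"slot_state": [3.2, 1.7]}
best = select_action(metric, "slot_state", explore_prob=0.0)
```

With `explore_prob=0.0` the function always exploits, so `best` is the index of the smaller metric value.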

4. The task allocation method according to claim 2, further comprising, after taking the policy action that minimizes the metric function as the target policy action in the current time slot:

5. The task allocation method according to claim 1, wherein obtaining the action task issued after the target device indicated by the target policy action performs the computing task comprises:

when the target device is the edge terminal, uploading the parameter data of the target device to a base station, so that the base station, upon receiving the parameter data uploaded by all the edge terminals, broadcasts all the parameter data to all the edge terminals;

6. The task allocation method according to claim 1, wherein obtaining the action task issued after the target device indicated by the target policy action performs the computing task comprises:

when the target device is the edge server, uploading its own parameter data to a base station, so that the base station, upon receiving the parameter data uploaded by all the edge terminals, determines a next action task based on all the parameter data and broadcasts the action task to all the edge terminals;

7. The task allocation method according to claim 1, wherein determining the cost function for executing the task based on the first control delay and the first total energy consumption value corresponding to the edge terminal executing the task in the current time slot, and the second control delay and the second total energy consumption value corresponding to the edge server executing the task in the current time slot includes:

determining the first control time delay corresponding to the edge terminal when executing a task in the current time slot based on the data volume of the parameter data, the link performance parameter and the terminal performance parameter of the edge terminal in the current time slot;

determining the first total energy consumption value corresponding to the edge terminal when executing the task in the current time slot according to the first control time delay corresponding to the edge terminal when executing the task in the current time slot and the terminal performance parameter;

determining the second control time delay corresponding to the edge server when executing the task in the current time slot based on the data volume of the parameter data, the link performance parameter and the server performance parameter of the edge server in the current time slot;

8. The method for task allocation according to claim 7, wherein determining the first control delay corresponding to the edge terminal when performing the task in the current time slot based on the data amount of the parameter data, the link performance parameter, and the terminal performance parameter of the edge terminal in the current time slot includes:

9. The task allocation method according to claim 8, wherein determining the uploading time according to the data amount of the parameter data in the current time slot, the uplink spectrum bandwidth, the uplink channel gain, its own transmit power, and the single-sided noise power spectral density comprises:

；

wherein the quantities in the formula are: the upload time required for all the edge terminals to finish uploading their parameter data; M, the total number of edge terminals; the upload time of edge terminal i in the t-th time slot; the data amount of the parameter data of edge terminal i in the t-th time slot; the uplink throughput rate of edge terminal i in the t-th time slot; w^U, the uplink spectrum bandwidth; the uplink channel gain of edge terminal i in the t-th time slot; the transmit power of edge terminal i itself; and N₀, the single-sided noise power spectral density.

10. The task allocation method according to claim 9, wherein determining the propagation time according to the data amount of the parameter data in the current time slot, the downlink spectrum bandwidth, the downlink channel gain, the transmit power of the base station, and the single-sided noise power spectral density comprises:

；

wherein the quantities in the formula are: the set of edge terminals that choose to perform the computing task on the edge terminal; the propagation time of edge terminal i in the t-th time slot, in which the superscript LD denotes the broadcast corresponding to the edge terminals; the propagation time from when edge terminal i-1 starts uploading its parameter data until its broadcast is complete; the propagation time from when edge terminal i starts uploading its parameter data until its broadcast is complete; the downlink throughput rate in the t-th time slot; w^D, the downlink spectrum bandwidth; the downlink channel gain of edge terminal i in the t-th time slot; and P^D, the transmit power of the base station.
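The quantities listed in claims 9 and 10 (spectrum bandwidth, channel gain, transmit power, single-sided noise power spectral density) are the usual inputs of a Shannon-capacity rate model. The sketch below assumes that model and assumes that a transfer time is the data amount divided by the rate; the function names and all numeric values are hypothetical and are not taken from the patent. How the per-terminal times are combined across terminals is deliberately left out.

```python
import math

def shannon_rate(bandwidth_hz, tx_power_w, channel_gain, noise_psd_w_per_hz):
    """Achievable rate in bit/s under the assumed Shannon-capacity model."""
    noise_power_w = noise_psd_w_per_hz * bandwidth_hz
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)

def transfer_time(data_bits, rate_bps):
    """Time in seconds needed to move `data_bits` at `rate_bps`."""
    return data_bits / rate_bps

# Illustrative numbers only.
N0 = 1e-13  # single-sided noise power spectral density, W/Hz

# Uplink: the edge terminal transmits with its own power over bandwidth w^U.
up_rate = shannon_rate(bandwidth_hz=1e6, tx_power_w=1.0,
                       channel_gain=1e-7, noise_psd_w_per_hz=N0)
# Downlink: the base station broadcasts with power P^D over bandwidth w^D.
down_rate = shannon_rate(bandwidth_hz=2e6, tx_power_w=6.0,
                         channel_gain=1e-7, noise_psd_w_per_hz=N0)

upload_s = transfer_time(2e6, up_rate)       # upload time for 2 Mbit
broadcast_s = transfer_time(2e6, down_rate)  # propagation time for 2 Mbit
```

With these numbers the uplink signal-to-noise ratio is about 1 and the downlink ratio about 3, so the upload takes roughly 2 s and the broadcast roughly 0.5 s.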

11. The task allocation method according to claim 8, wherein determining the terminal calculation time according to its own computation amount and its own computing power includes:

12. The method for task allocation according to claim 8, wherein determining the first total energy consumption value corresponding to the edge terminal when performing the task in the current time slot according to the first control delay corresponding to the edge terminal when performing the task in the current time slot and the terminal performance parameter includes:

13. The method according to claim 12, wherein determining the first total energy consumption value according to its own transmit power, the upload time, the energy consumption value per unit time, the propagation time, its own computation amount, and its own computing power includes:

；
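Claim 13 names the inputs of the first total energy consumption value but the formula itself is not reproduced in this text. The following sketch shows one plausible way to combine those inputs, assuming the total is the sum of transmit energy, receive energy and compute energy. The compute term uses the common dynamic-power model (energy coefficient times frequency cubed times run time), which is an assumption here, and `energy_coeff` and all other names and values are hypothetical.

```python
def terminal_energy(tx_power_w, upload_s, rx_power_w, propagation_s,
                    cycles, cpu_hz, energy_coeff):
    """Assumed total energy (joules) of an edge terminal in one time slot."""
    # Energy spent transmitting parameter data on the uplink.
    upload_j = tx_power_w * upload_s
    # Energy spent receiving the broadcast (energy per unit time * time).
    receive_j = rx_power_w * propagation_s
    # Compute time is computation amount divided by computing power.
    compute_s = cycles / cpu_hz
    # Assumed dynamic-power model: power = energy_coeff * f^3.
    compute_j = energy_coeff * cpu_hz ** 3 * compute_s
    return upload_j + receive_j + compute_j

# Illustrative numbers only.
total_j = terminal_energy(tx_power_w=0.5, upload_s=2.0,
                          rx_power_w=0.2, propagation_s=1.0,
                          cycles=1e9, cpu_hz=1e9, energy_coeff=1e-27)
```

Here the three terms contribute about 1 J, 0.2 J and 1 J respectively.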

14. The method for task allocation according to claim 7, wherein determining the second control delay corresponding to the edge server when performing the task in the current time slot based on the data amount of the parameter data, the link performance parameter, and the server performance parameter of the edge server in the current time slot includes:

15. The task allocation method according to claim 14, wherein determining the second total energy consumption value corresponding to the edge server when performing the task in the current time slot according to the second control delay corresponding to the edge server when performing the task in the current time slot and the server performance parameter includes:

16. The method according to claim 15, wherein determining the second total energy consumption value according to the transmit power of the edge terminal, the upload time, the energy consumption value per unit time of the edge server, the propagation time, the calculation amount of the edge server, and the calculation power of the edge server includes:

；

wherein the quantities in the formula are: the set of edge terminals that select an edge server to perform the computing task; the transmit power of edge terminal i itself; the upload time of edge terminal i in the t-th time slot; the energy consumption value per second of edge server i when receiving downlink data, in which the superscript ED denotes the broadcast corresponding to the edge server; the propagation time of edge server i in the t-th time slot; and the second total energy consumption value, in which the superscript E denotes the energy consumption of the edge server.

17. The task allocation method according to claim 7, wherein constructing a cost function for executing a task based on the first control delay, the first total energy consumption value, the second control delay, the second total energy consumption value, a weight parameter, and a cost penalty parameter comprises:

；

wherein the quantities in the formula are: the total control delay; the total energy consumption value; the first control delay; the first total energy consumption value; the second control delay; the second total energy consumption value; the weight parameter; C^p, the cost penalty parameter; and the cost function. The formula distinguishes the case in which the total control delay is greater than the set delay threshold from the case in which the total control delay is less than or equal to the set delay threshold, and the case of selecting the edge terminal to perform the computing task from the case of selecting an edge server to perform the computing task.
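As a rough illustration of how a weight parameter, a cost penalty parameter and a delay threshold can be combined into one cost, the sketch below assumes a weighted sum of delay and energy with the penalty added whenever the delay exceeds the threshold. This form is an assumption, since the formula of claim 17 is not reproduced in this text, and the names and numbers are hypothetical.

```python
def task_cost(delay_s, energy_j, weight, penalty, delay_threshold_s):
    """Assumed cost: weighted delay/energy sum, penalized on late completion."""
    base = weight * delay_s + (1.0 - weight) * energy_j
    # Add the cost penalty only when the delay exceeds the set threshold.
    if delay_s > delay_threshold_s:
        return base + penalty
    return base

# Candidate outcomes for one time slot (illustrative numbers only):
# executing on the edge terminal versus offloading to the edge server.
local_cost = task_cost(delay_s=3.0, energy_j=2.0, weight=0.5,
                       penalty=10.0, delay_threshold_s=2.5)
server_cost = task_cost(delay_s=1.0, energy_j=4.0, weight=0.5,
                        penalty=10.0, delay_threshold_s=2.5)
```

In this example both options have the same weighted sum, but local execution misses the delay threshold and is therefore penalized, so offloading has the lower cost.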

18. The task allocation method according to claim 17, wherein constructing a metric function for selecting a policy action according to the corresponding number of states, the number of policy actions and the cost function in the current slot comprises:

；
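Claim 18 builds the metric function from the number of states, the number of policy actions and the cost function. One familiar construction with those ingredients is a tabular value function updated in the style of Q-learning, with a minimum in place of the usual maximum because cost is minimized. The sketch below assumes that construction; `learning_rate`, `discount` and the function names are hypothetical and do not come from the patent.

```python
def make_metric_table(num_states, num_actions, initial=0.0):
    """Create a num_states x num_actions table of metric values."""
    return [[initial] * num_actions for _ in range(num_states)]

def update_metric(table, state, action, cost, next_state,
                  learning_rate=0.5, discount=0.9):
    """Move table[state][action] toward cost + discounted best next value."""
    # Metric minimization: the best follow-up value is the smallest one.
    best_next = min(table[next_state])
    target = cost + discount * best_next
    table[state][action] += learning_rate * (target - table[state][action])
    return table[state][action]

table = make_metric_table(num_states=2, num_actions=2)
updated = update_metric(table, state=0, action=1, cost=2.5, next_state=1)
```

Starting from an all-zero table, a single update with cost 2.5 and learning rate 0.5 moves the entry halfway toward the target, leaving the other entries unchanged.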

19. The task allocation method according to any one of claims 1 to 18, further comprising, after obtaining the action task issued after the target device indicated by the target policy action performs the computing task and executing the action task:

adding one to the time slot number each time the action task is executed;

20. A task allocation device, comprising a determination unit, a construction unit, an adjustment unit, an acquisition unit and an execution unit;

21. Task allocation equipment, comprising:

a memory for storing a computer program;

a processor for executing said computer program to carry out the steps of the task allocation method according to any one of claims 1 to 19.

22. A computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the steps of the task allocation method according to any one of claims 1 to 19.