CN112202886B - Task unloading method, system, device and storage medium - Google Patents
- Publication number: CN112202886B (application number CN202011061127.6A)
- Authority
- CN
- China
- Prior art keywords
- terminal
- offloading
- unloading
- delay
- cooperative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H04L67/59—Providing operational support to end devices by off-loading in the network or by emulation, e.g. when they are unavailable (under H04L67/00 Network arrangements or protocols for supporting network services or applications; H04L67/50 Network services; H04L67/56 Provisioning of proxy services)
- G06F9/44594—Unloading (under G06F9/445 Program loading or initiating)
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
- G06F9/505—Allocation of resources to service a request, the resource being a machine, considering the load
- G06F2209/509—Offload (indexing scheme relating to G06F9/50)
Abstract
The invention discloses a task offloading method, system, device, and storage medium. By matching a first terminal with a second terminal and letting one terminal serve as the cooperative offloading terminal of the other, a parallel transmission structure can be realized for cooperative offloading: the task data of one terminal is offloaded in parallel to the cooperative offloading terminal and to an access point, reducing task offloading delay and energy consumption. Compared with the prior art, the task offloading method of the embodiments is better suited to multi-terminal environments and achieves delay optimization over a wider scope, such as optimizing total time and resource consumption over a longer period. The invention is broadly applicable to the technical field of mobile communications.
Description
Technical Field
The present invention relates to the field of mobile communications technologies, and in particular, to a task offloading method, system, device, and storage medium.
Background
In 5G communication technologies such as NOMA-MEC, a mobile terminal needs to perform task offloading, for example, offloading task data to an Access Point (AP), in order to carry out the communication procedure. One of the problems faced by the task offloading process is latency. Task offloading based on edge computing is one available technique for reducing offloading delay; the principle of edge computing is to move transaction processing that would originally be performed by a core node out to edge nodes deployed near the end user. However, without prior knowledge of the behavior of the other terminals, edge-computing-based task offloading cannot guarantee that total time and resource consumption are optimized over a long period.
Disclosure of Invention
In view of at least one of the above technical problems, an object of the present invention is to provide a task offloading method, system, device and storage medium.
In one aspect, an embodiment of the present invention includes a task offloading method, including:
executing a decision process; the decision process is used for determining a second terminal matched with a first terminal, and, according to a judgment condition, setting the second terminal as a cooperative offloading terminal of the first terminal or setting the first terminal as a cooperative offloading terminal of the second terminal;
determining a network state value according to the setting result of the cooperative offloading terminal;
determining an instant reward value based on the network state value and the action performed in the decision process;
determining a long-term utility value according to the instant reward value;
determining an optimal decision process; the optimal decision process is the decision process that maximizes the long-term utility value;
and executing task offloading by the first terminal, the second terminal and the cooperative offloading terminal determined by the optimal decision process.
Further, the task offloading method further comprises the following steps:
training a neural network; the inputs to the neural network include the network state value and the action performed in the decision process;
and ending the training of the neural network when the deviation between the output of the neural network and the maximized long-term utility value is smaller than a preset threshold value.
Further, in training the neural network, the mean square error is used as the loss function.
Further, the setting the second terminal as a cooperative offloading terminal of the first terminal or the first terminal as a cooperative offloading terminal of the second terminal according to a judgment condition includes:
determining a first offloading delay and a second offloading delay; the first offloading delay is the offloading delay of the first terminal in OMA mode, and the second offloading delay is the offloading delay of the second terminal in OMA mode;
when the second offloading delay is smaller than the first offloading delay, setting the second terminal as the cooperative offloading terminal of the first terminal;
and when the first offloading delay is smaller than the second offloading delay, setting the first terminal as the cooperative offloading terminal of the second terminal.
Further, the instant reward value is the inverse of the sum of the first offloading delay and the second offloading delay.
Further, when the second terminal is the cooperative offloading terminal of the first terminal, the executing task offloading by the first terminal, the second terminal and the cooperative offloading terminal determined by the optimal decision process includes:
separating a first part and a second part from the task data of the first terminal;
offloading, by the first terminal, the first part to an access point and the second part to the second terminal in a first time slot;
synthesizing, by the second terminal, the received second part with local task data and then separating a third part and a fourth part;
and offloading, by the second terminal, the third part to the access point in a second time slot, and performing local computation on the fourth part.
Further, when the first terminal is the cooperative offloading terminal of the second terminal, the executing task offloading by the first terminal, the second terminal and the cooperative offloading terminal determined by the optimal decision process includes:
separating a first part and a second part from the task data of the second terminal;
offloading, by the second terminal, the first part to an access point and the second part to the first terminal in a first time slot;
synthesizing, by the first terminal, the received second part with local task data and then separating a third part and a fourth part;
and offloading, by the first terminal, the third part to the access point in a second time slot, and performing local computation on the fourth part.
In another aspect, an embodiment of the present invention further includes a task offloading system, including:
a first unit for executing a decision process; the decision process is used for determining a second terminal matched with a first terminal, and, according to a judgment condition, setting the second terminal as a cooperative offloading terminal of the first terminal or setting the first terminal as a cooperative offloading terminal of the second terminal;
a second unit for determining a network state value according to the setting result of the cooperative offloading terminal;
a third unit for determining an instant reward value based on the network state value and the action performed in the decision process;
a fourth unit for determining a long-term utility value based on the instant reward value;
a fifth unit for determining an optimal decision process; the optimal decision process is the decision process that maximizes the long-term utility value;
and a sixth unit for executing task offloading by the first terminal, the second terminal and the cooperative offloading terminal determined by the optimal decision process.
In another aspect, embodiments of the present invention also include a computer apparatus comprising a memory for storing at least one program and a processor for loading the at least one program to perform the method of the embodiments.
In another aspect, embodiments of the present invention further include a storage medium having stored therein a processor-executable program for performing the method of the embodiments when executed by a processor.
The beneficial effects of the invention are as follows: according to the task offloading method in the embodiments, by matching a first terminal with a second terminal and letting one terminal serve as the cooperative offloading terminal of the other, a parallel transmission structure can be realized for cooperative offloading: the task data of one terminal is offloaded in parallel to the cooperative offloading terminal and to an access point, reducing task offloading delay and energy consumption. Compared with the prior art, the task offloading method of the embodiments is better suited to multi-terminal environments and achieves delay optimization over a wider scope, such as optimizing total time and resource consumption over a longer period.
Drawings
FIG. 1 is a flow chart of the task offloading method in an embodiment;
FIGS. 2 and 3 are schematic diagrams of the task offloading method in an embodiment;
FIG. 4 is a schematic diagram of executing task offloading in an embodiment;
FIG. 5 is a schematic diagram of the delay optimization effect obtained by executing the task offloading method on multiple communication systems when M=5 in the embodiment;
FIG. 6 is a schematic diagram of the delay optimization effect obtained by executing the task offloading method on multiple communication systems when M=10 in the embodiment.
Detailed Description
In this embodiment, referring to fig. 1, the task offloading method performed includes the following steps:
S1, executing a decision process; the decision process is used for determining a second terminal matched with a first terminal, and, according to a judgment condition, setting the second terminal as a cooperative offloading terminal of the first terminal or setting the first terminal as a cooperative offloading terminal of the second terminal;
S2, determining a network state value according to the setting result of the cooperative offloading terminal;
S3, determining an instant reward value according to the network state value and the action executed in the decision process;
S4, determining a long-term utility value according to the instant reward value;
S5, determining an optimal decision process; wherein the optimal decision process is the decision process that maximizes the long-term utility value;
S6, executing task offloading by the first terminal, the second terminal and the cooperative offloading terminal determined by the optimal decision process.
The principle of steps S1-S6 is shown in fig. 2 and 3. In this embodiment, for convenience in expression, the first terminal may be referred to as terminal n, and the second terminal may be referred to as terminal m.
The decision process in step S1 may be implemented using an action function. In the action function, the first terminal needs to select a suitable action, according to a certain policy, under the currently observed environment state, so as to obtain a larger instant reward value. The action space of the action function is expressed as A = {a_k(n,m)}, where a_k(n,m) denotes the case where, in decision process epoch k, the second terminal m is set as the cooperative offloading terminal of the first terminal n. In this embodiment, if the second terminal m is set as the cooperative offloading terminal of the first terminal n, a_k(n,m) = 1; otherwise a_k(n,m) takes another value. 2M denotes the total number of terminals, including the first terminal n and the second terminal m, and the second terminal m may be any one of the 2M terminals other than the first terminal n and the users that have already been selected. It should be noted that the 2M terminals select cooperative offloading terminals in turn, and a selected terminal accepts the pairing by default, so that all terminals complete one offloading task after the selection process has been performed M times.
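The turn-based pairing described above can be sketched as follows. This is an illustrative simplification, not the claimed algorithm itself: the helper `choose_partner` is a hypothetical stand-in for the learned selection policy, shown here as a uniform random choice.

```python
import random

def pair_terminals(terminals, choose_partner=random.choice):
    """Pair 2M terminals into M cooperative pairs, selecting in turn.

    Each still-unpaired terminal picks a partner from the remaining
    candidates; the selected terminal accepts the pairing by default,
    so after M selection rounds every terminal belongs to one pair.
    """
    unpaired = list(terminals)
    pairs = []
    while unpaired:
        n = unpaired.pop(0)            # next terminal to choose
        m = choose_partner(unpaired)   # its cooperative partner
        unpaired.remove(m)
        pairs.append((n, m))
    return pairs

pairs = pair_terminals(range(10))      # 2M = 10 terminals -> M = 5 pairs
assert len(pairs) == 5
```

Each round removes exactly two terminals from the unpaired pool, which is why M rounds suffice for 2M terminals.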
In this embodiment, the judgment condition for setting the second terminal as the cooperative offloading terminal of the first terminal, or the first terminal as the cooperative offloading terminal of the second terminal, is as follows: the offloading delay of the first terminal n in OMA mode is the first offloading delay T_n, and the offloading delay of the second terminal m in OMA mode is the second offloading delay T_m. If the first offloading delay is smaller than the second offloading delay, i.e. T_n < T_m, the first terminal n is set as the cooperative offloading terminal of the second terminal m; conversely, if the second offloading delay is smaller than the first offloading delay, i.e. T_m < T_n, the second terminal m is set as the cooperative offloading terminal of the first terminal n.
In step S2, the network state value may be expressed as s_k = {m_n, δ_n}. The network state value characterizes the network state of the offloading in each decision process epoch k, where m_n indicates that the first terminal n is matched with the second terminal m, and δ_n indicates the setting result of the cooperative offloading terminal: when δ_n = 0, the first terminal n acts as the cooperative offloading terminal of the second terminal m; when δ_n = 1, the second terminal m acts as the cooperative offloading terminal of the first terminal n.
In step S3, the instant reward value is R_π(s, a), where s is the network state value and a is the action performed in the decision process; R_π(s_k, a_k) represents the instant reward obtained by selecting action a_k at network state value s_k in decision process epoch k, according to the policy π used in the decision process.
In this embodiment, the instant reward value may be set to R(s, a) = 1/(T_n + T_m), i.e. the instant reward value is the inverse of the sum of the first offloading delay T_n and the second offloading delay T_m.
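The judgment condition and the reciprocal-delay reward can be written out directly. The sketch below is illustrative; the delay values are assumed inputs.

```python
def assign_helper(T_n, T_m):
    """Return which terminal acts as cooperative offloading terminal.

    The terminal with the smaller OMA offloading delay helps the other:
    1 means terminal m helps terminal n; 0 means terminal n helps m.
    """
    return 1 if T_m < T_n else 0

def instant_reward(T_n, T_m):
    """Instant reward: inverse of the summed OMA offloading delays."""
    return 1.0 / (T_n + T_m)

# Terminal m (delay 2 s) is faster, so it helps terminal n (delay 3 s).
assert assign_helper(3.0, 2.0) == 1
assert instant_reward(3.0, 2.0) == 0.2
```

Lower combined delay yields a higher reward, so maximizing reward directly pressures the pairing toward low-delay configurations.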
In this embodiment, the objective of performing steps S1-S6 is to find a suitable policy that optimizes the long-term cumulative utility of the terminal. In step S4, the relationship between the long-term utility value V(s, π) and the instant reward value may be expressed as V(s, π) = E[ Σ_{k=0}^{∞} γ^k · R_π(s_k, a_k) | s_0 = s ], where γ ∈ (0, 1) is the discount factor. It can be seen from the formula that the closer an instant reward value is to the current time, the larger its proportion in the long-term utility calculation, so the long-term utility value corresponding to the optimal decision process should be the maximum of all long-term utility values; that is, the decision process that maximizes the long-term utility value is the optimal decision process. Thus, step S5 may be expressed as π* = argmax_π V(s, π).
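The discounted accumulation can be computed numerically as follows; this is an illustrative sketch, with the discount factor and reward sequence as assumed inputs.

```python
def long_term_utility(rewards, gamma=0.9):
    """Discounted sum of instant rewards: V = sum_k gamma**k * R_k.

    Rewards nearer the current epoch carry a larger weight, matching
    the observation that recent rewards dominate the utility.
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# With gamma = 0.5 the second reward counts only half: 1.0 + 0.5 = 1.5.
assert long_term_utility([1.0, 1.0], gamma=0.5) == 1.5
# An early reward contributes more utility than the same reward later.
assert long_term_utility([2.0, 0.0]) > long_term_utility([0.0, 2.0])
```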
In this embodiment, by executing steps S1 to S5, the optimal decision process may be determined; the optimal decision process in turn determines the matching relationship between the first terminal n and the second terminal m and which of the two serves as the cooperative offloading terminal of the other, after which step S6 is executed.
In this embodiment, step S6 will be described taking the case where the second terminal m is the cooperative offloading terminal of the first terminal n as an example. The principle of step S6 is shown in fig. 4, and it specifically includes the following sub-steps:
s601, separating a first part and a second part from task data of a first terminal;
s602, unloading the first part to an access point and unloading the second part to the second terminal by the first terminal in a first time slot;
s603, the second terminal synthesizes the received second part with local task data and then separates a third part and a fourth part;
s604, unloading the third part to an access point by the second terminal in a second time slot, and performing local calculation on the fourth part.
In steps S601-S604, the task data of the first terminal may be denoted as L_u. In step S601, besides being decomposed into a first part and a second part, L_u may be decomposed into a first part l_u,a, a second part l_u,h, and a third part L_u − l_u,h − l_u,a.
In this embodiment, the first time slot t_u is the transmission time from the first terminal to the access point and to the second terminal acting as the cooperative offloading terminal; therefore l_u,h = t_u · R_u,h and l_u,a = t_u · R_u,a, where the rates R_u,a and R_u,h are obtained according to SIC (Successive Interference Cancellation).
In this embodiment, referring to fig. 4, the second terminal, acting as the cooperative offloading terminal, has local task data of size L_h that needs to be processed within the time limit T. Referring to fig. 4, in step S602, the first terminal, in the first time slot t_u, offloads the task data of the first part l_u,a to the access point and the task data of the second part l_u,h to the second terminal, while the task data of the third part L_u − l_u,h − l_u,a is computed locally by the first terminal.
In this embodiment, referring to fig. 4, in step S603, the second terminal synthesizes the received task data of the second part l_u,h with its local task data L_h, and then separates the third part α(l_u,h + L_h) and the fourth part (1−α)(l_u,h + L_h), where α is a proportionality coefficient and α ∈ [0, 1].
In this embodiment, referring to fig. 4, in step S604, the second terminal, in the second time slot t_h, offloads the task data of the third part α(l_u,h + L_h) to the access point, while the task data of the fourth part (1−α)(l_u,h + L_h) is computed locally by the second terminal.
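The two-slot data partition of steps S601-S604 can be sketched as follows. This is an illustrative sketch: the transmission rates R_u,a and R_u,h and the coefficient α are assumed inputs, since the SIC rate formulas are not reproduced here.

```python
def partition_task(L_u, L_h, t_u, R_ua, R_uh, alpha):
    """Split task data for two-slot cooperative offloading.

    Slot 1: the first terminal sends l_ua to the AP and l_uh to the
    helper; the remainder L_u - l_ua - l_uh is computed locally.
    Slot 2: the helper merges l_uh with its own data L_h, offloads the
    fraction alpha to the AP and computes the rest locally.
    """
    l_ua = t_u * R_ua                    # first part, to access point
    l_uh = t_u * R_uh                    # second part, to helper
    local_first = L_u - l_ua - l_uh      # third part, local at first terminal
    merged = l_uh + L_h
    to_ap = alpha * merged               # helper's slot-2 upload
    local_helper = (1 - alpha) * merged  # helper's local computation
    return l_ua, l_uh, local_first, to_ap, local_helper

parts = partition_task(L_u=10.0, L_h=4.0, t_u=1.0,
                       R_ua=3.0, R_uh=2.0, alpha=0.5)
assert parts == (3.0, 2.0, 5.0, 3.0, 3.0)
```

Note how the parallel structure appears in slot 1: the AP upload and the helper upload happen in the same time slot, which is the source of the delay reduction.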
In this embodiment, the first terminal n may likewise serve as the cooperative offloading terminal of the second terminal m; interchanging the first terminal and the second terminal in steps S601-S604 yields the specific sub-steps of step S6 for the case where the first terminal n serves as the cooperative offloading terminal of the second terminal m.
In this embodiment, the task offloading method may further include the following steps:
S7, training a neural network; wherein the input of the neural network comprises the network state value s and the action a executed in the decision process;
S8, ending the training of the neural network when the deviation between the output of the neural network and the maximized long-term utility value is smaller than a preset threshold value.
In step S7, the neural network may be represented as V(s, a; θ), where s is the network state value, a is the action performed in the decision process, and θ denotes the parameters of the neural network.
The training end condition of step S8 may be expressed as V(s, a; θ) ≈ V_π(s, a); that is, after a number of training cycles, the training process ends when the deviation between the output of the neural network and the maximized long-term utility value is smaller than a small preset threshold. In performing steps S7 and S8, the mean square error may be used as the loss function of the training process. In this embodiment, the mean square error used may be L(θ) = (R(s, a) + γ·max_{a′} V(s′, a′; θ) − V(s, a; θ))², where γ is the discount factor. The parameter θ is updated as follows: given the continuously differentiable function L(θ) to be optimized, a learning rate η, and a set of initial values θ_0 = (θ_01, θ_02, ..., θ_0M), compute the gradient ∇L(θ_0) of the function to be optimized and apply the update iteration θ_{t+1} = θ_t − η·∇L(θ_t); then compute the function gradient at θ_{t+1} and iterate in a loop. When the modulus of the gradient vector ‖∇L(θ_t)‖ falls below a preset threshold, the iteration ends, thereby obtaining the optimal parameters of the neural network.
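The gradient update on the squared TD error can be sketched with a linear function approximator standing in for the neural network. This is an illustrative simplification, not the patented training procedure: the feature vector `phi`, the learning rate, and the discount factor are assumed values.

```python
import numpy as np

def td_update(theta, phi, r, next_max, eta=0.01, gamma=0.9):
    """One gradient step on L(theta) = (r + gamma*max V(s') - V(s,a))^2.

    V is approximated linearly as V(s, a) = theta . phi(s, a); the
    gradient of the loss w.r.t. theta is -2 * (target - V) * phi.
    """
    v = theta @ phi
    target = r + gamma * next_max
    grad = -2.0 * (target - v) * phi   # dL/dtheta
    return theta - eta * grad          # gradient descent step

theta = np.zeros(3)
phi = np.array([1.0, 0.5, 0.0])
for _ in range(2000):                  # iterate until roughly converged
    theta = td_update(theta, phi, r=1.0, next_max=0.0)
assert abs(theta @ phi - 1.0) < 1e-3  # V(s, a) approaches the TD target
```

Each step shrinks the gap between V(s, a; θ) and the TD target by a constant factor, so the loop converges geometrically toward the fixed point, mirroring the gradient-modulus stopping rule described above.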
With M=5, the task offloading method in this embodiment was executed for the OMA, NOMA-MEC-random, and NOMA-MEC-DQN communication systems, and the resulting delay is shown in fig. 5. As can be seen from fig. 5, the system delay in OMA mode remains unchanged, since the terminals can only offload computing tasks to the MEC server one by one. After NOMA-MEC is adopted, the system delay drops markedly. In addition, in NOMA-MEC, as the number of actions increases, the system delay of the DQN algorithm decreases; after 1200 periods the value is basically stable with small fluctuations, and although the curve rises slightly it shows no significant increase. Under random pairing, by contrast, the system delay fluctuates without improving; this difference arises because DQN-based offloading can choose the appropriate action depending on the circumstances.
With M=10, the task offloading method in this embodiment was executed for the OMA, NOMA-MEC-random, and NOMA-MEC-DQN communication systems, and the resulting delay is shown in fig. 6. As can be seen from fig. 6, the algorithm converges after about 3000 periods, because the large number of users requires a long learning time. After 3000 iterations, the system delay fluctuates only within a small range and does not increase significantly.
As can be seen from fig. 5 and fig. 6, the task offloading method in this embodiment achieves a good delay reduction effect for task offloading in communication systems such as NOMA-MEC-random and NOMA-MEC-DQN.
In this embodiment, a task offloading system includes:
a first unit for executing a decision process; the decision process is used for determining a second terminal matched with a first terminal, and, according to a judgment condition, setting the second terminal as a cooperative offloading terminal of the first terminal or setting the first terminal as a cooperative offloading terminal of the second terminal;
a second unit for determining a network state value according to the setting result of the cooperative offloading terminal;
a third unit for determining an instant reward value based on the network state value and the action performed in the decision process;
a fourth unit for determining a long-term utility value based on the instant reward value;
a fifth unit for determining an optimal decision process; the optimal decision process is the decision process that maximizes the long-term utility value;
and a sixth unit for executing task offloading by the first terminal, the second terminal and the cooperative offloading terminal determined by the optimal decision process.
The first unit, the second unit, the third unit, the fourth unit, the fifth unit and the sixth unit may be hardware, software, or a combination of hardware and software having the corresponding functions. Applying the task offloading system achieves the same technical effects as described in the embodiments.
In this embodiment, a computer apparatus includes a memory for storing at least one program and a processor for loading the at least one program to perform the task offloading method in the embodiments, achieving the same technical effects as described therein.
In this embodiment, a storage medium has stored therein a processor-executable program which, when executed by a processor, performs the task offloading method in the embodiments, achieving the same technical effects as described therein.
It should be noted that, unless otherwise specified, when a feature is referred to as being "fixed" or "connected" to another feature, it may be directly or indirectly fixed or connected to the other feature. Further, the descriptions of the upper, lower, left, right, etc. used in this disclosure are merely with respect to the mutual positional relationship of the various components of this disclosure in the drawings. As used in this disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, unless defined otherwise, all technical and scientific terms used in this example have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description of the embodiments is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The term "and/or" as used in this embodiment includes any combination of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element of the same type from another. For example, a first element could also be termed a second element, and, similarly, a second element could also be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed.
It should be appreciated that embodiments of the invention may be implemented or realized by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer readable storage medium configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, in accordance with the methods and drawings described in the specific embodiments. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Furthermore, the operations of the processes described in the present embodiments may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes (or variations and/or combinations thereof) described in this embodiment may be performed under control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications), by hardware, or combinations thereof, that collectively execute on one or more processors. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable computing platform, including, but not limited to, a personal computer, mini-computer, mainframe, workstation, network or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and so forth. Aspects of the invention may be implemented in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optical read and/or write storage medium, RAM, ROM, etc., such that it is readable by a programmable computer, which when read by a computer, is operable to configure and operate the computer to perform the processes described herein. Further, the machine readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media includes instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described in this embodiment includes these and other different types of non-transitory computer-readable storage media. The invention also includes the computer itself when programmed according to the methods and techniques of the present invention.
The computer program can be applied to the input data to perform the functions described in this embodiment, thereby converting the input data to generate output data that is stored to the non-volatile memory. The output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on a display.
The present invention is not limited to the above embodiments, but can be modified, equivalent, improved, etc. by the same means to achieve the technical effects of the present invention, which are included in the spirit and principle of the present invention. Various modifications and variations are possible in the technical solution and/or in the embodiments within the scope of the invention.
Claims (9)
1. A method of task offloading comprising:
executing a decision process; the decision process is used for determining a second terminal matched with the first terminal, and setting the second terminal as a cooperative offloading terminal of the first terminal or setting the first terminal as a cooperative offloading terminal of the second terminal according to a judgment condition;
determining a network state value according to the setting result of the cooperative offloading terminal;
determining an immediate reward value based on the network state value and the action performed in the decision process;
determining a long-term utility value according to the immediate reward value;
determining an optimal decision process; the optimal decision process is a decision process that maximizes the long-term utility value;
executing task offloading by the first terminal, the second terminal, and the cooperative offloading terminal determined by the optimal decision process;
wherein the setting the second terminal as the cooperative offloading terminal of the first terminal or setting the first terminal as the cooperative offloading terminal of the second terminal according to the judgment condition comprises:
determining a first offloading delay and a second offloading delay; the first offloading delay is an offloading delay of the first terminal in an OMA mode, and the second offloading delay is an offloading delay of the second terminal in the OMA mode;
setting the second terminal as the cooperative offloading terminal of the first terminal when the second offloading delay is less than the first offloading delay; and
setting the first terminal as the cooperative offloading terminal of the second terminal when the first offloading delay is less than the second offloading delay.
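The judgment condition in claim 1 can be sketched as follows. This is an illustrative reading, not the patented implementation; the function name and return labels are hypothetical, and the claims leave the equal-delay case unspecified.

```python
def assign_cooperative_roles(delay_1: float, delay_2: float) -> str:
    """Decide which terminal acts as the cooperative offloading terminal
    (helper): per claim 1, the terminal with the smaller OMA-mode
    offloading delay helps the other."""
    if delay_2 < delay_1:
        return "terminal_2_helps_terminal_1"
    if delay_1 < delay_2:
        return "terminal_1_helps_terminal_2"
    return "tie"  # equal-delay case is not specified by the claims
```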
2. The task offloading method of claim 1, further comprising:
training a neural network, wherein inputs to the neural network include the network state value and the action performed in the decision process;
and ending the training of the neural network when the deviation between the output of the neural network and the maximized long-term utility value is less than a preset threshold.
3. The task offloading method of claim 2, wherein the neural network is trained with mean square error as the loss function.
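Claims 2 and 3 describe fitting a value approximator with a mean-squared-error loss and stopping once the output is within a preset threshold of its target. A minimal sketch, with a one-weight linear model standing in for the neural network (the claims do not specify the architecture, learning rate, or threshold, so all three are assumptions here):

```python
def train_value_approximator(features, targets, lr=0.05, threshold=1e-3,
                             max_steps=10_000):
    """Fit a one-weight linear approximator by gradient descent on a
    mean-squared-error loss (claim 3), stopping once the largest
    deviation between model output and target drops below the preset
    threshold (claim 2)."""
    w = 0.0  # single weight standing in for the network parameters
    for step in range(max_steps):
        preds = [w * x for x in features]
        # gradient of the MSE loss with respect to w
        grad = sum(2 * (p - t) * x
                   for p, t, x in zip(preds, targets, features)) / len(features)
        w -= lr * grad
        deviation = max(abs(w * x - t) for x, t in zip(features, targets))
        if deviation < threshold:  # stopping rule of claim 2
            return w, step
    return w, max_steps
```

For example, fitting features `[1.0, 2.0]` to targets `[3.0, 6.0]` converges to a weight near 3 in a few dozen steps.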
4. The task offloading method of claim 1, wherein the immediate reward value is the reciprocal of the sum of the first offloading delay and the second offloading delay.
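The reward of claim 4 is direct to compute. The discounted accumulation shown for the long-term utility is an assumption for illustration; the claims state only that the utility is derived from the immediate reward values, not how they are accumulated or what discount factor applies.

```python
def immediate_reward(delay_1: float, delay_2: float) -> float:
    """Claim 4: the immediate reward value is the reciprocal of the sum
    of the first and second offloading delays, so a lower total delay
    yields a higher reward."""
    return 1.0 / (delay_1 + delay_2)

def long_term_utility(rewards, gamma=0.9):
    """Illustrative discounted return standing in for the long-term
    utility value of claim 1; gamma is a hypothetical discount factor."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))
```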
5. The task offloading method of claim 1, wherein, when the second terminal is the cooperative offloading terminal of the first terminal, the executing task offloading by the first terminal, the second terminal, and the cooperative offloading terminal determined by the optimal decision process comprises:
separating a first portion and a second portion from task data of the first terminal;
offloading, by the first terminal, the first portion to an access point and the second portion to the second terminal in a first time slot;
combining, by the second terminal, the received second portion with its local task data, and then separating a third portion and a fourth portion;
and offloading, by the second terminal, the third portion to the access point in a second time slot, and performing local computation on the fourth portion.
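The two-slot cooperative offloading of claim 5 can be sketched with task data modeled as lists of work items. The split points are assumptions: the claims describe the separation steps but not how the partition sizes are chosen.

```python
def cooperative_offload(task_1, task_2, split_1, split_2):
    """Claim 5, with terminal 2 as the helper.
    Slot 1: terminal 1 offloads the first portion to the access point
    and the second portion to terminal 2.  Terminal 2 combines the
    received portion with its local task data, then separates a third
    portion (offloaded to the access point in slot 2) and a fourth
    portion (computed locally)."""
    first, second = task_1[:split_1], task_1[split_1:]   # separation at terminal 1
    merged = second + task_2                             # combination at terminal 2
    third, fourth = merged[:split_2], merged[split_2:]   # separation at terminal 2
    return {"ap_slot_1": first, "ap_slot_2": third, "local_terminal_2": fourth}
```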
6. The task offloading method of claim 1, wherein, when the first terminal is the cooperative offloading terminal of the second terminal, the executing task offloading by the first terminal, the second terminal, and the cooperative offloading terminal determined by the optimal decision process comprises:
separating a first portion and a second portion from task data of the second terminal;
offloading, by the second terminal, the first portion to an access point and the second portion to the first terminal in a first time slot;
combining, by the first terminal, the received second portion with its local task data, and then separating a third portion and a fourth portion;
and offloading, by the first terminal, the third portion to the access point in a second time slot, and performing local computation on the fourth portion.
7. A task offloading system, comprising:
a first unit for executing a decision process; the decision process is used for determining a second terminal matched with the first terminal, and setting the second terminal as a cooperative offloading terminal of the first terminal or setting the first terminal as a cooperative offloading terminal of the second terminal according to a judgment condition;
a second unit for determining a network state value according to the setting result of the cooperative offloading terminal;
a third unit for determining an immediate reward value based on the network state value and the action performed in the decision process;
a fourth unit for determining a long-term utility value according to the immediate reward value;
a fifth unit for determining an optimal decision process; the optimal decision process is a decision process that maximizes the long-term utility value;
a sixth unit for executing task offloading by the first terminal, the second terminal, and the cooperative offloading terminal determined by the optimal decision process;
wherein the setting the second terminal as the cooperative offloading terminal of the first terminal or setting the first terminal as the cooperative offloading terminal of the second terminal according to the judgment condition comprises:
determining a first offloading delay and a second offloading delay; the first offloading delay is an offloading delay of the first terminal in an OMA mode, and the second offloading delay is an offloading delay of the second terminal in the OMA mode;
setting the second terminal as the cooperative offloading terminal of the first terminal when the second offloading delay is less than the first offloading delay; and
setting the first terminal as the cooperative offloading terminal of the second terminal when the first offloading delay is less than the second offloading delay.
8. A computer device comprising a memory for storing at least one program and a processor for loading the at least one program to perform the method of any of claims 1-6.
9. A storage medium having stored therein a processor executable program, wherein the processor executable program when executed by a processor is for performing the method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011061127.6A CN112202886B (en) | 2020-09-30 | 2020-09-30 | Task unloading method, system, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011061127.6A CN112202886B (en) | 2020-09-30 | 2020-09-30 | Task unloading method, system, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112202886A CN112202886A (en) | 2021-01-08 |
CN112202886B true CN112202886B (en) | 2023-06-23 |
Family
ID=74013139
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011061127.6A Active CN112202886B (en) | 2020-09-30 | 2020-09-30 | Task unloading method, system, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112202886B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110096362A (en) * | 2019-04-24 | 2019-08-06 | 重庆邮电大学 | A multitask offloading method based on edge server cooperation |
CN110113190A (en) * | 2019-04-24 | 2019-08-09 | 西北工业大学 | An offloading delay optimization method in a mobile edge computing scenario |
CN110557769A (en) * | 2019-09-12 | 2019-12-10 | 南京邮电大学 | C-RAN calculation unloading and resource allocation method based on deep reinforcement learning |
- 2020-09-30 CN CN202011061127.6A patent/CN112202886B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110096362A (en) * | 2019-04-24 | 2019-08-06 | 重庆邮电大学 | A multitask offloading method based on edge server cooperation |
CN110113190A (en) * | 2019-04-24 | 2019-08-09 | 西北工业大学 | An offloading delay optimization method in a mobile edge computing scenario |
CN110557769A (en) * | 2019-09-12 | 2019-12-10 | 南京邮电大学 | C-RAN calculation unloading and resource allocation method based on deep reinforcement learning |
Non-Patent Citations (1)
Title |
---|
DQN-based task distribution and offloading algorithm for vehicular edge networks; Zhao Haitao et al.; Journal on Communications (通信学报); 2020-08-11 (Issue 10); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112202886A (en) | 2021-01-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6811894B2 (en) | Neural network structure generation methods and devices, electronic devices, and storage media | |
CN110832509B (en) | Black box optimization using neural networks | |
TW201918939A (en) | Method and apparatus for learning low-precision neural network | |
US11734568B2 (en) | Systems and methods for modification of neural networks based on estimated edge utility | |
WO2016195117A1 (en) | Method and system for controlling operation of machine | |
CN112101530A (en) | Neural network training method, device, equipment and storage medium | |
CN112513886B (en) | Information processing method, information processing apparatus, and information processing program | |
CN113645637B (en) | Method and device for unloading tasks of ultra-dense network, computer equipment and storage medium | |
Dai et al. | Distributed stochastic MPC for systems with parameter uncertainty and disturbances | |
JP2023076820A (en) | Image processing method, device, electronic device, and storage medium | |
CN114090108B (en) | Method and device for executing computing task, electronic equipment and storage medium | |
CN112202886B (en) | Task unloading method, system, device and storage medium | |
CN110046670B (en) | Feature vector dimension reduction method and device | |
CN111510473B (en) | Access request processing method and device, electronic equipment and computer readable medium | |
CN111430035B (en) | Method, device, electronic device and medium for predicting number of infectious diseases | |
JP5581528B1 (en) | Control parameter determination apparatus, method, and program, and controller and optimization control system | |
US11100586B1 (en) | Systems and methods for callable options values determination using deep machine learning | |
CN109934348B (en) | Machine learning model super-parameter inference method and device, medium and electronic equipment | |
US20200410367A1 (en) | Scalable Predictive Analytic System | |
CN114830137A (en) | Method and system for generating a predictive model | |
CN117130769A (en) | Frequency modulation method, training method of frequency adjustment neural network and electronic equipment | |
CN113128682A (en) | Automatic neural network model adaptation method and device | |
CN114067415A (en) | Regression model training method, object evaluation method, device, equipment and medium | |
CN108345941A (en) | A kind of parameter regulation means and device | |
CN113159318A (en) | Neural network quantification method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||