CN114466409A

CN114466409A - Data offloading control method and device for machine communication

Info

Publication number: CN114466409A
Application number: CN202210370705.7A
Authority: CN
Inventors: 冯伟; 魏鹏; 王景维; 葛宁
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2022-04-11
Filing date: 2022-04-11
Publication date: 2022-05-10
Anticipated expiration: 2042-04-11
Also published as: CN114466409B

Abstract

Embodiments of the present application disclose a data offloading control method and device for machine communication. The method comprises: establishing a data offloading model for the l-th time slot of the n-th MEC coverage area, wherein the input parameter of the data offloading model is description information of a data offloading task, the output parameters are a speed change indication for the travel speed of the device, the size of the offloaded data and the target MEC of the data offloading, and the data offloading model uses decision parameters to determine the output parameter values corresponding to the input parameter values, the decision parameters comprising at least one of: the processing capability of the MEC, the processing capability of the device and the quality of service of the channel; calculating, by using the data offloading model, the output parameter values corresponding to the input parameter values of each MEC coverage area to obtain a data offloading policy; and performing a data offloading operation on the device according to the data offloading policy, wherein n ≤ N, l ≤ L_n, and n, N, l and L_n are all integers greater than 0.

Description

Data offloading control method and device for machine communication

Technical Field

The present disclosure relates to the field of information processing, and in particular to a data offloading control method and device for machine communication.

Background

In future networks, a large number of machines will be connected, and machine communication will become an important part of those networks. Moreover, production, manufacturing and logistics with few or no human operators have accelerated the demand for machine communication. For example, when an autonomous truck transports cold-chain products over a long distance, a large amount of sensing data (such as visual data on road conditions and sensor readings such as cargo temperature) must be transmitted and processed over the network in time, so that the truck can be monitored and controlled in real time and the products delivered quickly and reliably. However, the truck moves over a wide range and experiences network environments of varying communication quality, such as cellular coverage, satellite coverage and no coverage at all. Machine communication technology therefore urgently needs to be developed to meet the requirements of computation-intensive and delay-sensitive machine applications. The existing Long-Term Evolution (LTE) network mainly serves people, and its main function is to provide a data transmission pipeline: data collected at the network edge is uploaded to a data center for centralized processing, which technically depends on effective deployment of base stations and an ultra-high-performance central server. The existing industrial network does serve machines, but it emphasizes wired deployment and dedicated private networks to ensure system reliability. However, centralized processing makes it difficult to meet the ultra-low latency requirements of machine applications, the diverse communication requirements of mobile machines pose challenges to wired networks, and limited bandwidth and energy supply further degrade the performance of machine communication systems as the application demands of mobile machines increase dramatically.

Machine communication need not be developed from scratch; one feasible technical approach is to adapt current technologies such as LTE and industrial networks. To meet the mobile application requirements of machine communication, Mobile Edge Computing (MEC) servers can be deployed close to the network edge (for example at base stations and aggregation nodes), so that part of the control function of the core network is moved down, the various offloading services of a machine are processed and decided quickly, and the quality of service of the network is improved.

With the continuous development of machine communication, the demand of machines for services such as the mobile internet keeps growing, and the conflict between the unreliability and scarcity of wireless channel resources and the growing demand for mobile machine services is increasingly prominent. Therefore, how to design an MEC-based offloading decision scheme that resolves this conflict is an urgent problem.

Disclosure of Invention

To solve any of the above technical problems, embodiments of the present application provide a data offloading control method and device for machine communication.

To achieve the purpose of the embodiments of the present application, the embodiments provide a data offloading control method in which a device passes sequentially through the coverage areas of N edge computing (MEC) servers, the method comprising:

establishing a data offloading model for the l-th time slot of the n-th MEC coverage area, wherein the input parameter of the data offloading model is description information of a data offloading task, the output parameters are a speed change indication for the travel speed of the device, the size of the offloaded data and the target MEC of the data offloading, and the data offloading model uses decision parameters to determine the output parameter values corresponding to the input parameter values; wherein the decision parameters comprise at least one of: the processing capability of the MEC, the processing capability of the device and the quality of service of the channel;

calculating, by using the data offloading model, the output parameter values corresponding to the input parameter values of each MEC coverage area to obtain a data offloading policy;

performing a data offloading operation on the device according to the data offloading policy,

wherein n ≤ N, l ≤ L_n, and n, N, l and L_n are all integers greater than 0.

A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the method as described above when executed.

An electronic device comprising a memory having a computer program stored therein and a processor arranged to execute the computer program to perform the method as described above.

A data offloading control device provided with the electronic device described above.

One of the above technical solutions has the following advantages or beneficial effects:

A data offloading model is established for the l-th time slot of the n-th MEC coverage area; the output parameter values corresponding to the input parameter values of each MEC coverage area are calculated with the data offloading model to obtain a data offloading policy; and a data offloading operation is performed on the device according to that policy. Because data offloading is controlled with a policy that matches the machine and service characteristics of the communication system, wireless resources are better adapted, resource utilization is improved, the overall performance of the network is improved, and the quality of service of the MEC-based wireless network system is improved.

Additional features and advantages of the embodiments of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

The accompanying drawings are included to provide a further understanding of the embodiments of the present application and are incorporated in and constitute a part of this specification. Together with the examples, they serve to explain the embodiments of the present application and do not constitute a limitation of them.

Fig. 1 is a flowchart of a control method for data offloading for machine-oriented communication according to an embodiment of the present disclosure;

Fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application;

Fig. 3 is a diagram comparing the effects of the two modes provided by the embodiment of the present application with that of the related art;

Fig. 4 is a diagram comparing the effects of the two modes provided by the embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that, in the embodiments of the present application, features in the embodiments and the examples may be arbitrarily combined with each other without conflict.

At present, theoretical research on MEC-based offloading decision optimization follows two approaches: in the first, the device is assumed to move at a constant speed; in the second, the network senses the speed of the machine. The first approach usually ignores the time-varying moving speed when optimizing the offloading decision scheme and configuring wireless resources. The second configures computing and communication resources according to the effect of the moving speed of the device on the communication distance, which requires a complex optimization design of the offloading scheme.

Further, neither of the two approaches in the related art considers adapting to wireless communication resources by controlling the moving speed of the device. Because the moving speed is not adjusted, when the machine passes at low speed through an area where the wireless connection is unavailable, a large amount of new data cannot be offloaded to an MEC server, and excessive local computing increases the processing delay of the service. Meanwhile, severe wireless channel attenuation strongly interferes with the offloaded data, which makes an optimal offloading scheme difficult to obtain. In addition, when the channel is available and of good quality, a machine that moves too fast causes frequent service migration, which increases service cost and reduces resource allocation efficiency.

Based on the above analysis, the embodiments of the present application exploit the controllability of device mobility to explore a process-oriented offloading decision optimization scheme. The scheme fully mines the machine and service characteristics of a machine communication system, adapts to wireless resources and improves resource utilization, so that the overall performance of the network and the quality of service of the MEC-based wireless network system are improved.

Fig. 1 is a flowchart of a data offloading control method for machine communication according to an embodiment of the present application. As shown in Fig. 1, a device passes sequentially through the coverage areas of N MEC servers, and the method comprises:

Step 101, establishing a data offloading model for the l-th time slot of the n-th MEC coverage area, wherein the input parameter of the data offloading model is description information of a data offloading task, the output parameters are a speed change indication for the travel speed of the device, the size of the offloaded data and the target MEC of the data offloading, and the data offloading model uses decision parameters to determine the output parameter values corresponding to the input parameter values;

wherein the decision parameters comprise at least one of: the processing capability of the MEC, the processing capability of the device and the quality of service of the channel; and wherein n ≤ N, l ≤ L_n, and n, N, l and L_n are all integers greater than 0;

Specifically, in a network accessed by a large number of people and machines, the wireless communication environment is very complex. When a truck blindly drives into a wireless coverage area with poor reliability, strong interference and scarce bandwidth resources, or even into an area without network coverage, sensing data on the order of gigabytes cannot be offloaded to an MEC server in time, and excessive local computation significantly increases service delay. In areas with sufficient wireless channel resources, a machine that moves too fast aggravates frequent service migration and increases service cost. One possible solution is for the machine to adjust its moving speed adaptively according to the wireless channel state, thereby improving the utilization of wireless resources. When wireless channel resources are unavailable (or channel quality is poor), the device accelerates toward an area where resources are available; conversely, when wireless channel resources are available (or channel quality is good), the device slows down to obtain more offloading time slots.
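The speed rule described above can be illustrated with a minimal Python sketch. The function name, the speed bounds `v_min` and `v_max`, and their default values are assumptions made for illustration only; the acceleration of 2 m/s² and the 1 s slot are taken from the simulation settings given later in this description.

```python
def adapt_speed(v, channel_available, a=2.0, dt=1.0, v_min=5.0, v_max=30.0):
    """Return the device speed after one slot of length dt seconds.

    Rule from the description: accelerate when the channel is unavailable,
    decelerate when it is available. v_min and v_max are assumed bounds.
    """
    k = -1 if channel_available else 1   # -1: decelerate, +1: accelerate
    return min(v_max, max(v_min, v + k * a * dt))
```

For example, a device travelling at 10 m/s slows to 8 m/s in an area with an available channel and speeds up to 12 m/s where the channel is unavailable.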

In the data offloading model, the task description information of the l-th time slot of the n-th MEC is the input parameter, and the speed change indication of the l-th time slot of the n-th MEC, the size of the data to be offloaded and the target MEC of the data offloading are the output parameters. The model can therefore adapt to the decision parameters and fully mine the machine and service characteristics of the machine communication system, so that data offloading tasks are adapted to the wireless resources.

Step 102, calculating, by using the data offloading model, the output parameter values corresponding to the input parameter values of each MEC coverage area to obtain a data offloading policy;

the data offloading policy of each MEC coverage area is acquired in sequence, and within each MEC coverage area the data offloading policy of each time slot is acquired in chronological order.

Step 103, performing a data offloading operation on the device according to the data offloading policy;

the obtained data offloading policy matches the machine and service characteristics of the communication system, adapts to wireless resources, improves resource utilization and the overall performance of the network, and thereby improves the quality of service of the MEC-based wireless network system.

In the method provided by the embodiments of the present application, a data offloading model is established for the l-th time slot of the n-th MEC coverage area; the output parameter values corresponding to the input parameter values of each MEC coverage area are calculated with the data offloading model to obtain a data offloading policy; and a data offloading operation is performed on the device according to that policy. Because data offloading is controlled with a policy that matches the machine and service characteristics of the communication system, wireless resources are better adapted, resource utilization is improved, the overall performance of the network is improved, and the quality of service of the MEC-based wireless network system is improved.

The method provided by the embodiments of the present application is described below:

In one exemplary embodiment, the data offloading model includes expressions for states, rewards and actions;

wherein the state expression for the l-th time slot of the n-th MEC coverage area includes at least one of the following parameters:

the identifier of the MEC coverage area; whether a channel is available; the offload data size of the l-th time slot of the n-th MEC coverage area; the speed of the device in the l-th time slot of the n-th MEC coverage area; the processing capability of the device; the processing capability of the n-th MEC server;

wherein the reward of the n-th MEC coverage area is obtained as follows:

obtaining the instantaneous reward value of each time slot in which an offloading task is generated in the n-th MEC coverage area, wherein the total number of time slots of the n-th MEC coverage area is determined by the coverage distance of the n-th MEC and the moving speed of the device;

accumulating the instantaneous reward values of the time slots in the n-th MEC coverage area according to set thresholds to obtain the speed change reward value of the n-th MEC coverage area;

wherein the action expression for the l-th time slot of the n-th MEC coverage area includes at least two of the following parameters:

the target MEC of the data offloading, the offload data size of the l-th time slot of the n-th MEC coverage area, and speed change indication information, wherein the speed change indication information is one of deceleration, constant speed and acceleration, and is updated only when the MEC coverage area changes.

Further, obtaining the instantaneous reward value of each time slot in which an offloading task is generated in the n-th MEC coverage area comprises:

obtaining the time delay required by the data offloading action in the l-th time slot of the n-th MEC coverage area and the total time delay for completing the offloading operation;

obtaining the instantaneous reward value of the l-th time slot from the time delay required by the data offloading action and the total time delay for completing the offloading operation;

wherein the time delay required by the data offloading action is the value obtained by removing at least one of the following delays from the total time delay for completing the offloading operation:

the communication delay of the l-th time slot of the n-th MEC coverage area; the computation delay for completing the offloading task in the l-th time slot of the n-th MEC coverage area; the task migration delay of the l-th time slot of the n-th MEC coverage area; and the local computation delay of the device in the l-th time slot of the n-th MEC coverage area.

The device passes through N access points (APs), each equipped with an independent MEC server; optical fiber connects the MECs to one another and the APs to the MECs. The n-th AP has its own coverage distance; the speed of the device is confined to a bounded range; the linear acceleration satisfies a > 0; and the speed change indication variable takes three values corresponding respectively to deceleration, constant speed and acceleration. In the l-th time slot of the n-th coverage area, a given proportion of the data is offloaded. The delay of returning the computation result after the MEC computation completes is neglected. It is assumed that offloading tasks are generated continually while the device moves through each coverage area, at a fixed time slot interval, which determines the total number of time slots L_n of the n-th coverage area.
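As a hedged illustration of how the number of time slots in a coverage area can depend on the coverage distance and the moving speed, the following Python sketch steps the device through one area slot by slot. The function name, the speed bounds and the slot-wise trapezoidal motion update are assumptions made for illustration; they are not the closed-form expressions of this description.

```python
def count_slots(distance, v0, k, a=2.0, tau=1.0, v_min=5.0, v_max=30.0):
    """Count the slots of length tau needed to cross `distance` metres.

    k is the speed change indication held for the whole area:
    -1 decelerate, 0 constant speed, +1 accelerate.
    v_min must be positive so that the loop terminates.
    Returns (number_of_slots, exit_speed).
    """
    x, v, slots = 0.0, float(v0), 0
    while x < distance:
        v_next = min(v_max, max(v_min, v + k * a * tau))
        x += 0.5 * (v + v_next) * tau   # distance covered within one slot
        v = v_next
        slots += 1
    return slots, v
```

Under these assumptions, crossing 100 m from an entry speed of 10 m/s takes fewer slots when accelerating than at constant speed, and more when decelerating, which is the lever the speed change indication provides.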

A state, reward and action model of the system is established based on a Markov Decision Process (MDP).

The state of the l-th time slot of the n-th MEC coverage area is expressed as:

(1)

where D(l) denotes the offload data size (number of bits) of the l-th time slot. The state further contains the available CPU computation frequency of the device in the l-th time slot, the available CPU computation frequency of the n-th MEC server in the l-th time slot, and a channel indicator whose two values denote that the channel is available and that the channel is unavailable, respectively.
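For readers who want to prototype the model, the state and action tuples can be held in small immutable records. The following Python sketch is one possible representation; the class and field names are hypothetical and are not the notation of this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    area: int            # index n of the MEC coverage area
    channel_ok: bool     # whether the channel is available
    data_bits: int       # offload data size D(l), in bits
    speed: float         # device speed in the current slot, in m/s
    f_device_hz: float   # available CPU frequency of the device
    f_mec_hz: float      # available CPU frequency of the n-th MEC server


@dataclass(frozen=True)
class Action:
    target_mec: int      # MEC server that receives the offloaded data
    offload_bits: int    # amount of data offloaded in this slot
    shift: int           # -1 decelerate, 0 constant speed, +1 accelerate
```

Frozen dataclasses are hashable, so a `(State, Action)` pair can be used directly as the key of a tabular Q function.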

The instantaneous reward in the n-th MEC coverage area is expressed as:

(2)

where the expression involves the number of time slots in which offloading tasks are generated in the n-th MEC coverage area and the time delay incurred when the offloading task is completed entirely by local computation. When the linear acceleration is large, an approximate calculation can be performed as follows:

(3)

where the two speeds involved are the initial speed on entering the n-th area and the final speed on leaving the n-th area, respectively. Three cases are distinguished. In the first case, the instantaneous speed can be expressed as:

(4)

In the second case, the instantaneous speed can be expressed as:

(5)

In the third case, the instantaneous speed can be expressed as:

(6)

The communication delay of the l-th time slot of the n-th MEC coverage area is expressed as:

(7)

where h(l) denotes the channel impulse response of the l-th time slot, p denotes the signal transmission power, W_n denotes the signal transmission bandwidth, and the remaining term denotes the channel noise power.
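A communication delay built from a channel response, a transmission power, a bandwidth and a noise power is commonly modelled with the Shannon rate. The following Python sketch makes that assumption explicitly; it is an illustration, not necessarily the exact form of expression (7), and the function and parameter names are hypothetical.

```python
import math


def comm_delay(bits, bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Time in seconds to send `bits` at the Shannon rate (assumed model)."""
    snr = tx_power_w * abs(channel_gain) ** 2 / noise_power_w
    rate = bandwidth_hz * math.log2(1.0 + snr)   # achievable rate in bit/s
    # An unavailable channel (zero gain) yields an infinite delay.
    return math.inf if rate == 0 else bits / rate
```

With a 1 MHz bandwidth and a signal-to-noise ratio of 1, the rate is 1 Mbit/s, so one megabit takes one second.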

The computation delay of the offloading task is expressed as:

(8)

where the expression involves the number of CPU cycles required per bit. The local computation delay of the device is expressed as:

(9)
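Computation delays of this kind are usually the required CPU cycles divided by the available CPU frequency. The following Python sketch assumes that form and splits one task between the MEC server and the device according to an offload ratio; the function names and the exact split are assumptions made for illustration.

```python
def compute_delay(bits, cycles_per_bit, cpu_hz):
    """Seconds needed to process `bits` on a CPU running at cpu_hz."""
    return bits * cycles_per_bit / cpu_hz


def split_delays(total_bits, offload_ratio, cycles_per_bit, f_device_hz, f_mec_hz):
    """Return (MEC computation delay, local computation delay) for one task."""
    offloaded = offload_ratio * total_bits
    local = total_bits - offloaded
    return (compute_delay(offloaded, cycles_per_bit, f_mec_hz),
            compute_delay(local, cycles_per_bit, f_device_hz))
```

For instance, 1000 bits at 800 cycles per bit, split evenly between a 2 GHz MEC server and a 1 GHz device, take about 0.2 ms on the server and 0.4 ms on the device.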

The task migration delay is obtained as follows. Assuming that the current coverage area is n and the MEC server currently responsible for the computation is m, the migration delay is expressed as:

(10)

where the migration delay takes one value when a given condition on n and m holds, and another value otherwise.

The action of the l-th time slot of the n-th MEC coverage area is expressed as:

(11)

The model can be solved in the following two modes.

Mode one:

The speed change indication is trained with a traversal algorithm to obtain a first training result, and the offload data size is trained with reinforcement learning to obtain a second training result;

from the first training result, the speed change indication value of the n-th MEC coverage area is selected as the speed change indication of that coverage area; and from the second training result, the offload data size that maximizes the Q value in each time slot of the n-th MEC coverage area is taken as the offload data size of that time slot.

Based on calculation expressions (1), (2) and (11), the speed change indication is searched by traversal, while the offloading parameters, including n, are learned with Q-learning. The specific implementation steps are as follows:

The input parameters include: whether a channel is available, the computing capability of the MEC, the computing capability of the device, the coverage distance of each MEC, the number of iterations of the training operation T₁, the state set, the action set A, the upper and lower bound values of the reward, the step size, the attenuation factor and the exploration rate.

Step A1, randomly initializing the Q values corresponding to all states and actions according to the dimensions of the states and actions.

Specifically, the initial state is obtained from the size of the offloading task, the position of the device and the moving speed, and the associated variable is initialized.

Step A2, determining the speed change indication of each of the N MEC coverage areas;

the reward is calculated according to calculation expression (2), the next state is obtained according to calculation expression (1), and the next action is calculated at the same time.

The following operations are performed for the total number of iterations T₁:

Step A1, obtaining the initial state from the offloading task size, the device position and the moving speed, and initializing the associated variable.

Step A2, determining the speed change indication information of each of the N MEC coverage areas;

taking the speed change indication information of the n-th coverage area as an example:

the speed change reward vector that records the speed change indication values is set to zero.

The following operations are performed for each time slot of the n-th coverage area:

randomly generating a value;

generating an action with the ε-greedy algorithm: if the random value meets the exploration condition, the current action is generated randomly; otherwise, the current action is chosen greedily from the Q values;

executing the current action in the current state and calculating the instantaneous offloading reward according to calculation expression (2);

if the instantaneous reward is above the upper bound value, the speed change reward value is incremented by 1;

if the instantaneous reward is below the lower bound value, the speed change reward value is decremented by 1;

if the instantaneous reward lies between the two bound values, the speed change reward value remains unchanged.

Based on the speed change reward vector accumulated within the MEC coverage area, the speed change indication for the MEC coverage area as a whole is determined. Based on the training results, the speed change indication value that occurs most frequently in each area n is selected as the speed change control for that area.
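The thresholded accumulation of instantaneous rewards into a speed change reward vector can be sketched in Python as follows. The sketch assumes that the vector holds one counter per speed change indication and that the indication with the highest counter is selected; the function names and the strict comparisons are assumptions made for illustration.

```python
def accumulate_votes(trials, upper, lower):
    """Accumulate +1/-1 votes per speed change indication.

    trials: iterable of (shift, instantaneous_reward) pairs, with shift in
    {-1, 0, 1} for deceleration, constant speed and acceleration.
    """
    votes = {-1: 0, 0: 0, 1: 0}
    for shift, reward in trials:
        if reward > upper:
            votes[shift] += 1      # reward above the upper bound
        elif reward < lower:
            votes[shift] -= 1      # reward below the lower bound
        # rewards between the two bounds leave the counter unchanged
    return votes


def select_shift(votes):
    """Pick the indication with the largest counter (first one on ties)."""
    return max(votes, key=votes.get)
```

With bounds 0.7 and 0.3, the trials `[(1, 0.9), (1, 0.8), (0, 0.9), (-1, 0.1), (0, 0.5)]` give counters of 2, 1 and -1 for acceleration, constant speed and deceleration, so acceleration is selected.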

After the speed change indications of all N MEC coverage areas have been obtained, the offload data size for each of the N MEC coverage areas is determined;

the state is reset, and the following operations are performed in each time slot to obtain the offload data size of that time slot:

when the current time slot is l, the current action is generated with the ε-greedy algorithm;

the current action is executed in the current state;

the Q value is updated using the following calculation expression:

(12)
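For orientation, the following Python sketch shows the standard tabular Q-learning update together with ε-greedy action selection, using a step size of 0.1 and an attenuation (discount) factor of 0.9 as in the simulation settings. It assumes that expression (12) is the standard update and that exploration happens when a uniform random draw falls below ε; both points, and all names, are assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily on q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))."""
    best_next = max(q[(next_state, b)] for b in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]
```

A Q table created with `defaultdict(float)` starts at zero for every state and action, so no explicit enumeration of the state set is needed in a prototype.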

With the above mode, the average time delay for completing each computation task can be obtained.

Mode two:

The speed change indication is trained with reinforcement learning (Q-learning) to obtain a third training result, and the offload data size is trained with reinforcement learning to obtain a fourth training result;

from the third training result, the speed change indication that maximizes the Q value of the n-th MEC coverage area is taken as the speed change indication of that coverage area; and from the fourth training result, the offload data size and the MEC server that maximize the Q value in each time slot of the n-th MEC coverage area are taken as the offload data size of that time slot and the target MEC of the data offloading.

Based on calculation expressions (1), (2) and (11), two Q-learning processes are used to learn the offloading parameters (including n) and the speed change indication, respectively. The specific implementation steps are as follows:

The input parameters include: whether a channel is available, the computing capability of the MEC, the computing capability of the device, the coverage distance of each MEC, the upper and lower bound values of the reward, the step size, the attenuation factor and the exploration rate.

The following operations are performed for the total number of iterations:

Step B1, obtaining the initial state from the offloading task size, the device position and the moving speed, and initializing the associated variable.

Initialization: the Q₁ values and Q₂ values corresponding to all states and actions are randomly initialized according to the dimensions of the states and actions.

Step B2, determining the speed change indication information of each of the N MEC coverage areas;

based on Q₂, the speed change indication is generated with the ε-greedy algorithm, and the speed change reward vector that records the speed change indication values is set to zero.

Based on Q₁, the current action is executed in the current state;

the instantaneous offloading reward is calculated according to calculation expression (2), and the next state is obtained according to calculation expression (1).

If the instantaneous reward is above the upper bound value, the speed change reward value is incremented by 1;

if the instantaneous reward is below the lower bound value, the speed change reward value is decremented by 1;

if the instantaneous reward lies between the two bound values, the speed change reward value remains unchanged.

Q₁ is updated according to calculation expression (12) and the corresponding reward.

After the calculation for every time slot of the n-th coverage area is complete, Q₂ is updated according to calculation expression (12) and the corresponding reward.
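The two-table procedure can be sketched as a nested loop: Q₂ chooses one speed change indication per coverage area, and Q₁ chooses an offload ratio per time slot. The Python sketch below is a toy illustration under loud assumptions: the reward function is a hypothetical stand-in for expression (2), the accumulated vote is used as the area-level reward for Q₂, the Q₁ state includes the chosen indication, and the ratio set, thresholds and exploration rate are illustrative values.

```python
import random
from collections import defaultdict

SHIFTS = (-1, 0, 1)        # decelerate, constant speed, accelerate
RATIOS = (0.0, 0.5, 1.0)   # hypothetical discretised offload ratios


def pick(q, state, actions, eps, rng):
    """epsilon-greedy choice over a tabular Q function."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Standard tabular Q-learning update."""
    target = reward + gamma * max(q[(next_state, b)] for b in actions)
    q[(state, action)] += alpha * (target - q[(state, action)])


def toy_reward(channel_ok, shift, ratio):
    """Hypothetical stand-in for the instantaneous reward of expression (2)."""
    if channel_ok:
        return ratio - 0.6 * shift    # offloading and slowing down pay off
    return -ratio + 0.6 * shift       # local computing and speeding up pay off


def train(channels, slots=4, episodes=500, alpha=0.1, gamma=0.9,
          eps=0.2, upper=0.5, lower=-0.5, seed=0):
    rng = random.Random(seed)
    q1, q2 = defaultdict(float), defaultdict(float)
    for _ in range(episodes):
        for n, channel_ok in enumerate(channels):
            shift = pick(q2, n, SHIFTS, eps, rng)   # one indication per area
            vote = 0
            for l in range(slots):
                state = (n, l, shift)
                ratio = pick(q1, state, RATIOS, eps, rng)
                r = toy_reward(channel_ok, shift, ratio)
                vote += 1 if r > upper else -1 if r < lower else 0
                nxt = (n, l + 1, shift) if l + 1 < slots else None
                update(q1, state, ratio, r, nxt, RATIOS, alpha, gamma)
            # Area-level update: the accumulated vote is the assumed reward.
            update(q2, n, shift, vote, n + 1, SHIFTS, alpha, gamma)
    return q1, q2
```

Calling `train([True, False])` models two areas, the first with an available channel and the second without; under the toy reward, Q₂ comes to prefer deceleration over acceleration in the first area and acceleration over deceleration in the second.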

With the above mode, the average time delay for completing each task is calculated.

The data offloading policy determined by either of the two modes uses the channel information of the wireless coverage areas, the offloading service requirements of the user and the moving speed of the device. It reduces the completion delay and the processing cost of the service, and improves the quality of service that the network provides to the device.

Fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in Fig. 2, in the machine communication system the system bandwidth is 20 MHz, the signal transmission power of the device is 0.2 W, and the channel gain and the channel noise power are fixed simulation parameters. The number of MEC servers (APs) is 16, and the channels of 5 randomly chosen coverage areas are unavailable when the device passes. The coverage distance of each AP (in m) is generated randomly; the computing capability of each MEC server (in GHz) is generated randomly with a spacing of 0.1 GHz; the computing capability of the device (in GHz) is generated randomly with its own spacing; the offload data size (in KB) is generated randomly with a spacing of 150 KB; the number of CPU cycles required per bit is drawn from the interval [800, 5600] with a spacing of 1600; the moving speed of the device (in m/s) is confined to a given range; the acceleration is 2 m/s²; and the offloading time slot interval is 1 s. In addition, the migration delay is taken to be the computation delay of the MEC. For reinforcement learning, the number of training runs is 500, the number of test runs is 100, the reward has an upper and a lower threshold, the step size is 0.1, the attenuation factor is 0.9, and the exploration rate is 0.995.
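The simulation settings that are stated explicitly above can be collected in a small configuration, as in the following Python sketch. The dictionary keys and the helper function are hypothetical names; parameters whose values are not given in this description are deliberately left out rather than guessed.

```python
import random

SIM = {
    "bandwidth_hz": 20e6,           # system bandwidth
    "tx_power_w": 0.2,              # signal transmission power of the device
    "num_mec": 16,                  # number of MEC servers (APs)
    "num_unavailable": 5,           # coverage areas without a usable channel
    "mec_freq_step_ghz": 0.1,       # spacing of MEC computing capability
    "data_step_kb": 150,            # spacing of offload data size
    "cycles_per_bit": list(range(800, 5601, 1600)),
    "acceleration_mps2": 2.0,
    "slot_s": 1.0,                  # offloading time slot interval
    "train_runs": 500,
    "test_runs": 100,
    "step_size": 0.1,
    "attenuation_factor": 0.9,
    "exploration_rate": 0.995,
}


def draw_channel_mask(cfg, seed=0):
    """Randomly mark which coverage areas have an available channel."""
    rng = random.Random(seed)
    unavailable = set(rng.sample(range(cfg["num_mec"]), cfg["num_unavailable"]))
    return [n not in unavailable for n in range(cfg["num_mec"])]
```

`draw_channel_mask(SIM)` returns a list of 16 booleans, 5 of which are `False`, matching the scenario in which the channels of 5 random coverage areas are unavailable.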

Under these simulation conditions, the present embodiment simulates the average delay of each offloading task as the device passes sequentially through a plurality of AP coverage areas, and compares the two data offloading policies provided in the embodiments of the present application with the average delay at constant speed. The comparison is shown in Fig. 3: each speed sampling point has a group of 3 bars showing, in order, the related art, mode one and mode two. Fig. 3 shows that both proposed algorithms effectively reduce the average delay of the offloading service, and that the reinforcement learning algorithm based on dual Q-learning achieves the lower delay, which significantly improves the processing speed of the offloading service.

Fig. 4 is a diagram comparing the effects of the two modes provided by the embodiment of the present application. As shown in Fig. 4, each value of the device computing capability has a group of 2 bars showing, in order, mode one and mode two. Fig. 4 shows that mode two provides better quality of service than mode one in most cases.

In summary, compared with the constant-speed offloading decision scheme of the related art, the scheme provided in the embodiments of the present application uses the channel information of multiple wireless coverage areas and the controllable speed of the mobile machine to design a process-oriented offloading decision optimization method for a single machine and multiple MECs. The gain brought by speed control is thereby effectively exploited, the completion delay of the offloading service is significantly reduced, and the service efficiency of the machine communication system is improved. In addition, the scheme has strong real-time performance: once the moving speed of the machine, the offloading service requirements and the wireless channel information of the areas served by the MECs are known, it adapts the offloading decision scheme of the machine by adaptively controlling the moving speed of the machine.

An embodiment of the present application provides a storage medium, in which a computer program is stored, wherein the computer program is configured to perform the method described in any one of the above when the computer program runs.

An embodiment of the application provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method described in any one of the above.

An embodiment of the application provides a control device for data offloading, which is provided with the above electronic device.

It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.

Claims

1. A control method for data offloading oriented to machine communication, characterized in that a device passes sequentially through the coverage areas of N mobile edge computing (MEC) servers, the method comprising:

establishing a data offloading model for the l-th time slot of the n-th MEC coverage area, wherein the input parameter of the data offloading model is description information of a data offloading task, the output parameters are a variable speed indication of the travel speed of the device, the offload data size and the target MEC of data offloading, and the data offloading model determines the output parameter values corresponding to the input parameter values by using a decision parameter; wherein the decision parameter comprises at least one of: the processing power of the MEC, the processing power of the device and the quality of service of the channel;

performing a data offloading operation on the device according to the data offload policy;

wherein n ≤ N, l ≤ L_n, and n, N, l and L_n are all integers greater than 0.

2. The method of claim 1, wherein:

the data offloading model comprises a state expression, a reward expression and an action expression;

an identifier of the MEC coverage area; whether a channel is available; the offload data size in the l-th time slot of the n-th MEC coverage area; the speed of the device in the l-th time slot of the n-th MEC coverage area; the processing power of the device; the processing power of the n-th MEC server;

wherein the reward of the n-th MEC coverage area is obtained by:

obtaining the instantaneous reward value generated by offloading in each time slot of the n-th MEC coverage area, wherein the total number of time slots of the n-th MEC coverage area is determined from the coverage range of the n-th MEC and the travel speed of the device;
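By way of illustration only, the dependence of the total number of time slots on the coverage range and the travel speed can be sketched as follows. The function name `slots_in_area`, the fixed slot duration, the constant speed inside one coverage area and the rounding up of a partial final slot are all assumptions made for this sketch and are not taken from the claims.

```python
import math


def slots_in_area(coverage_m: float, speed_mps: float, slot_s: float) -> int:
    """Number of time slots the device spends in one MEC coverage area.

    Hypothetical model: the device crosses a coverage range of `coverage_m`
    metres at a constant speed of `speed_mps` metres per second, and time is
    divided into slots of `slot_s` seconds. A partial final slot counts as
    one whole slot.
    """
    if coverage_m <= 0 or speed_mps <= 0 or slot_s <= 0:
        raise ValueError("coverage, speed and slot duration must be positive")
    return math.ceil(coverage_m / (speed_mps * slot_s))
```

Under these assumptions a faster device spends fewer time slots in the same coverage area, which is why the speed decision and the per-slot offloading decision are coupled.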

3. The method of claim 2, wherein obtaining the instantaneous reward value generated by offloading in each time slot of the n-th MEC coverage area comprises:

the communication delay of the l-th time slot of the n-th MEC coverage area.

4. The method of claim 2, wherein the data offload policy is derived by:

training the variable speed indication by using a traversal algorithm to obtain a first training result; training the offload data size by using reinforcement learning (Q-learning) to obtain a second training result;

directly selecting, from the first training result, the n-th variable speed indication value as the variable speed indication of the n-th MEC coverage area; and determining, from the second training result, the offload data size and the selected MEC server that are used when the Q value is maximum in each time slot of the n-th MEC coverage area, and taking them as the offload data size of that time slot and the target MEC of data offloading.
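By way of illustration only, selecting the offload data size and target MEC with the maximum Q value in each time slot can be sketched as follows. The layout of the Q table (one dictionary per time slot, keyed by a pair of offload data size and MEC server index), the function name `greedy_offload_policy` and the sample values are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical action: (offload data size, index of the target MEC server).
Action = Tuple[int, int]


def greedy_offload_policy(q_tables: List[Dict[Action, float]]) -> List[Action]:
    """For each time slot, return the action whose Q value is largest."""
    policy = []
    for slot, q in enumerate(q_tables):
        if not q:
            raise ValueError(f"empty Q table for time slot {slot}")
        # max over the keys, ranked by their stored Q value
        policy.append(max(q, key=q.get))
    return policy


# Illustrative Q values only; they are not simulation results.
q_tables = [
    {(0, 1): 0.2, (4, 1): 0.9, (4, 2): 0.5},
    {(0, 1): 0.7, (8, 2): 0.1},
]
print(greedy_offload_policy(q_tables))  # [(4, 1), (0, 1)]
```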

5. The method of claim 4, wherein the training result of the n-th MEC coverage area is obtained by a first training operation and a second training operation, wherein:

the first training operation comprises:

performing the following operations for each time slot:

obtaining the instantaneous reward value generated by offloading in each time slot of the n-th MEC coverage area;

comparing the instantaneous reward value of each time slot with a reward upper limit value and a reward lower limit value respectively to obtain a comparison result;

if the comparison result is that the instantaneous reward value is less than the reward lower limit value, subtracting one from the variable speed reward value; if the comparison result is that the instantaneous reward value is greater than or equal to the reward lower limit value and less than or equal to the reward upper limit value, leaving the variable speed reward value unchanged; if the comparison result is that the instantaneous reward value is greater than the reward upper limit value, adding one to the variable speed reward value;

after the above calculation has been completed for all time slots of the n-th MEC coverage area, determining the variable speed indication used when the variable speed reward value is maximum, and taking it as the variable speed indication obtained by this training operation;

the second training operation comprises:

counting the occurrences of each variable speed indication obtained for the n-th MEC coverage area in the first training operation, and taking the indication that occurs most often as the final variable speed indication of the n-th MEC coverage area.
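By way of illustration only, the two training operations can be sketched as follows: a per-slot threshold comparison that raises or lowers a counter, a traversal over the candidate indications, and a majority vote over repeated runs. The function names, the candidate labels such as "accelerate" and the threshold values are assumptions made for this sketch, and the instantaneous reward values would come from the data offloading model rather than being supplied by hand.

```python
from collections import Counter
from typing import Dict, Sequence


def variable_speed_reward(instant_rewards: Sequence[float],
                          lower: float, upper: float) -> int:
    """Add one per slot above `upper`, subtract one per slot below `lower`."""
    score = 0
    for reward in instant_rewards:
        if reward < lower:
            score -= 1
        elif reward > upper:
            score += 1
        # rewards inside [lower, upper] leave the score unchanged
    return score


def first_training_operation(rewards_by_indication: Dict[str, Sequence[float]],
                             lower: float, upper: float) -> str:
    """Traverse every candidate indication and keep the best-scoring one."""
    scores = {ind: variable_speed_reward(rewards, lower, upper)
              for ind, rewards in rewards_by_indication.items()}
    return max(scores, key=scores.get)


def second_training_operation(results: Sequence[str]) -> str:
    """Return the indication that occurs most often across training runs."""
    return Counter(results).most_common(1)[0][0]


# Illustrative per-slot rewards for one coverage area (hypothetical values).
candidates = {
    "accelerate": [0.9, 0.8, 0.2],
    "hold": [0.5, 0.5, 0.5],
    "decelerate": [0.1, 0.2, 0.5],
}
best = first_training_operation(candidates, lower=0.3, upper=0.7)
final = second_training_operation([best, "hold", best])
```

In this sketch ties are resolved in favour of the candidate encountered first, which is a property of `max` and `Counter.most_common` and not something the claims specify.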

6. The method of claim 2, wherein the data offload policy is derived by:

training the variable speed indication by using reinforcement learning to obtain a third training result; training the offload data size by using reinforcement learning to obtain a fourth training result;

determining, from the third training result, the variable speed indication used when the Q value of the n-th MEC coverage area is maximum, and taking it as the variable speed indication of that coverage area; and determining, from the fourth training result, the offload data size and the selected MEC server that are used when the Q value is maximum in each time slot of the n-th MEC coverage area, and taking them as the offload data size of that time slot and the target MEC of data offloading.

7. The method of claim 6, wherein training the variable speed indication by using reinforcement learning to obtain a third training result comprises:

obtaining the variable speed reward value of the n-th MEC coverage area by performing the following operations on each time slot:

if the comparison result is that the instantaneous reward value is less than the reward lower limit value, subtracting one from the variable speed reward value; if the comparison result is that the instantaneous reward value is greater than or equal to the reward lower limit value and less than or equal to the reward upper limit value, leaving the variable speed reward value unchanged; if the comparison result is that the instantaneous reward value is greater than the reward upper limit value, adding one to the variable speed reward value;

after the variable speed reward value is obtained, performing reinforcement learning of the variable speed indication by using the variable speed reward value, and updating a Q table of the variable speed indication.
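By way of illustration only, updating a Q table of the speed indication from the variable speed reward value can be sketched with the standard tabular Q-learning rule. The choice of that rule, the use of the coverage area index as the state, the learning rate `alpha`, the discount factor `gamma` and the function name `update_speed_q` are assumptions made for this sketch; the claims state only that a Q table is updated by reinforcement learning.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

# Hypothetical Q table: (coverage area index, speed indication) -> Q value.
QTable = DefaultDict[Tuple[int, str], float]


def update_speed_q(q: QTable, area: int, indication: str,
                   speed_reward: float, next_area: int,
                   indications: Sequence[str],
                   alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular Q-learning step and return the new Q value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)),
    where r is the variable speed reward value of the current area.
    """
    best_next = max(q[(next_area, a)] for a in indications)
    td_target = speed_reward + gamma * best_next
    q[(area, indication)] += alpha * (td_target - q[(area, indication)])
    return q[(area, indication)]


indications = ["accelerate", "hold", "decelerate"]
q: QTable = defaultdict(float)
first = update_speed_q(q, 1, "accelerate", 2.0, 2, indications, alpha=0.5)
```

After training, the indication with the largest Q value for each coverage area would be read out of the table, as described for the third training result.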

8. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 7 when executed.

9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 7.

10. A control device for data offloading oriented to machine communication, characterized in that it is provided with the electronic device according to claim 9.