CN110427261A

CN110427261A - An edge computing task allocation method based on deep Monte Carlo tree search

Info

Publication number: CN110427261A
Application number: CN201910741439.2A
Authority: CN
Inventors: 陈杰男; 陈思宇; 李帅; 王琪
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2019-08-12
Filing date: 2019-08-12
Publication date: 2019-11-08

Abstract

The application discloses an edge computing task allocation method based on deep Monte Carlo tree search, which supports the edge server in optimizing resource allocation. The edge server takes the state of the mobile edge computing system as input, the resource scheduling module of the edge server outputs an optimal resource allocation scheme through a deep reinforcement learning algorithm, and the mobile device terminal offloads tasks according to the optimal resource allocation scheme and executes them together with the edge server. The deep reinforcement learning algorithm is carried out jointly by a DNN, an MCTS and an LSTM. Compared with greedy search and the DQN algorithm, the proposed algorithm is substantially better at optimizing both service latency and the service energy consumption of the mobile terminal.

Description

An edge computing task allocation method based on deep Monte Carlo tree search

Technical field

The present invention relates to the field of intelligent communication, and in particular to an edge computing task allocation method based on deep Monte Carlo tree search.

Background art

Several algorithms have already been applied to the optimized allocation of mobile edge computing resources. The first method uses a linear programming algorithm to optimize computing resources and bandwidth resources, raising the maximum system throughput and reducing the service response delay, and thereby improving the performance of the mobile edge system; however, this method cannot adjust the task offloading rate. The second method is based on Lyapunov optimization and dynamically adjusts the offloading rate of computing tasks. It can shorten the completion time of computing tasks, but it can only handle tasks of low complexity and cannot handle more complex allocation tasks. Moreover, the linear programming and Lyapunov algorithms used in these two resource allocation optimization methods are both heuristic and need human experience to guide them. In addition, in 5G Internet of Things scenarios the number of mobile devices increases substantially, the computing demands of mobile user terminals become diverse, and the optimization problem becomes complex; existing methods struggle to handle optimization problems of such high complexity.

Summary of the invention

The object of the present invention is to overcome the above deficiencies of the prior art by providing an edge computing task allocation method based on deep Monte Carlo tree search that can still optimize the allocation of environmental resources when the computing demands of mobile user terminals are diverse and the optimization problem becomes complex. To achieve this object, the present invention provides the following technical solution:

When a mobile user terminal generates a computing task, the edge server updates the mobile edge computing system state information. The mobile edge computing system comprises mobile device terminals, a wireless communication base station and an edge server. The mobile edge computing system state information comprises the computing capability of the edge server, the wireless bandwidth resources of the wireless communication base station, and the task request information of the mobile devices. The task request information comprises the historical channel gain information between each mobile device terminal and the base station, the data size of the task currently to be processed, the number of CPU clock cycles needed to complete the current task, and the local CPU clock frequency of the mobile device terminal;
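As a minimal sketch, the state information listed above could be held in two small record types. The class names, field names and units below are illustrative assumptions, not terms defined by the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskRequest:
    """One mobile device's task request (field names are illustrative)."""
    channel_gain_history: List[float]  # historical channel gains to the base station
    data_size_bits: float              # data size of the task to be processed
    cpu_cycles: float                  # CPU clock cycles needed to finish the task
    local_cpu_hz: float                # local CPU clock frequency of the device


@dataclass
class MecState:
    """Snapshot of the mobile edge computing system state."""
    edge_compute_hz: float             # computing capability of the edge server
    bandwidth_hz: float                # wireless bandwidth of the base station
    tasks: List[TaskRequest] = field(default_factory=list)


state = MecState(edge_compute_hz=10e9, bandwidth_hz=20e6)
state.tasks.append(TaskRequest([1e-6, 1.2e-6], 8e6, 1e9, 1e9))
```

Keeping the task requests as a list inside the state mirrors the way the text nests the task request information inside the system state information.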

The edge server passes the mobile edge computing system state information to the DNN (deep neural network), the MCTS (Monte Carlo tree search) and the LSTM (long short-term memory network). The LSTM predicts the future channel gain from the channel gain between the mobile device terminal and the wireless communication base station, and sends the resulting channel gain prediction data to the MCTS and the DNN. From the mobile edge computing system state information and the channel gain prediction data, the DNN obtains the prior probabilities of the resource allocation actions and sends them to the MCTS;

The MCTS combines the mobile edge computing system state information, the channel gain prediction data and the resource allocation action prior probabilities to search for the optimal resource allocation scheme. After the MCTS search, the optimal resource allocation scheme is sent to the mobile device terminal, the mobile device terminal offloads the task to the mobile edge computing system, and the execution module of the mobile edge computing system carries out the optimal resource allocation behavior according to the optimal resource allocation scheme.

The MCTS finds the optimal resource allocation scheme by simulated search over the task state. The optimal resource allocation schemes obtained by the MCTS search are stored in an experience pool. The experience pool has a fixed size: when it is full, the earliest stored data are deleted and the new data are then stored. At regular intervals the DNN is trained on the data in the experience pool to improve its prediction accuracy, so that the updated resource allocation action prior probabilities it outputs guide the MCTS search better, thereby optimizing the edge computing task allocation method. In turn, better MCTS search results update the DNN better and make its predictions more accurate.
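The fixed-size experience pool described above behaves like a first-in, first-out buffer. The sketch below shows one way to get that behavior with Python's `collections.deque`; the class name `ExperiencePool`, its methods and the string placeholders for states and allocations are hypothetical.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-size FIFO buffer: the oldest sample is dropped once the pool is full."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def add(self, state, allocation):
        # deque(maxlen=...) discards the oldest entry automatically when full.
        self._buffer.append((state, allocation))

    def sample(self, batch_size, rng=random):
        # Draw a training batch without replacement (at most the pool size).
        batch_size = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


pool = ExperiencePool(capacity=3)
for step in range(5):
    pool.add(state=f"s{step}", allocation=f"a{step}")
```

After the loop the pool holds only the three most recent entries, which is the deletion order the text describes. Sampling a random batch for the periodic DNN training is an assumption; the text only says that the pool data are used for training at regular intervals.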

The MCTS search flow provided by the present invention is as follows:

S1: initialize the root node of the MCTS according to the mobile edge computing system state;

S2: set the root node as the search starting point and begin the next search;

S3: judge whether the predetermined number of searches has been completed; if so, go to step S9; if not, go to step S4;

S4: judge whether the current node is a leaf node; if so, go to step S5; if not, go to step S6;

S5: after a leaf node is reached, i.e. once the computing resource allocation is complete, evaluate the resource allocation scheme, return the reward, and update the state of every node on the path according to the reward;

S6: judge whether the current node is fully expanded; if not, go to step S7; if so, go to step S8;

S7: expand all child nodes of the current node according to the prior probabilities output by the DNN, and select the next node according to the formula [formula not reproduced in this text], where Q(v′_k) is the cumulative reward of node v′_k, N(v′_k) is the number of visits to node v′_k, e is the coefficient that balances exploitation and exploration, and p(v′_k | s_k) is the prior probability of the next node;

S8: from all possible resource allocation actions, choose the action with the highest search value and execute it to enter the node in the next layer;

S9: output the most visited path, which is the optimal resource allocation scheme.

The above optimal allocation scheme also serves as the training set of the DNN, to further improve search performance.
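The selection formula of step S7 is not reproduced in this text, so the following sketch only illustrates how the quantities it names (cumulative reward Q, visit count N, coefficient e and DNN prior p) are commonly combined. The score below is a PUCT-style rule chosen as an assumption; it is not necessarily the patent's formula, and `select_child` and the dictionary keys are hypothetical names.

```python
import math


def select_child(children, parent_visits, e=1.0):
    """Pick the child with the highest score.

    Each child is a dict with cumulative reward "Q", visit count "N" and
    DNN prior "p". The score form is an assumption (PUCT-style).
    """
    def score(child):
        # Exploitation term: mean reward of the child so far.
        mean_value = child["Q"] / child["N"] if child["N"] > 0 else 0.0
        # Exploration term: favors high-prior, rarely visited children.
        exploration = e * child["p"] * math.sqrt(parent_visits) / (1 + child["N"])
        return mean_value + exploration

    return max(children, key=score)


children = [
    {"name": "offload 25%", "Q": 3.0, "N": 5, "p": 0.2},
    {"name": "offload 50%", "Q": 0.0, "N": 0, "p": 0.7},
]
best = select_child(children, parent_visits=5, e=1.0)
greedy = select_child(children, parent_visits=5, e=0.0)
```

With e = 1 the unvisited child with the larger prior is chosen; with e = 0 the rule degenerates to picking the best mean reward, which shows how e trades exploration against exploitation.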

The reward value r is obtained as follows:

[formula not reproduced in this text] The optimal execution time t_best is the minimum time taken to complete the task by any historical resource allocation scheme, and its initial value is infinity; t denotes the time the current resource allocation scheme needs to complete the task, and σ is a preset value with σ > 1.

The principle by which the present invention uses the LSTM to predict the channel gain is as follows:

Fig. 1 shows the LSTM network structure:

The LSTM network predicts the channel gain at the next time step, h_{τ+1}, from the channel gains at the historical time steps h_{τ-p+1}, h_{τ-p+2}, ..., h_τ. The LSTM network is defined as

h_{τ+1} = g_θ(h_{τ-p+1}, h_{τ-p+2}, ..., h_τ)

Here, θ denotes the weight parameters of the LSTM network. The LSTM network uses cell units to store the long-term state, which is controlled mainly by three gates: the input gate, the forget gate and the output gate. Fig. 2 shows the LSTM cell structure:

Each gate lets information through selectively; this is implemented by a neural layer consisting of a sigmoid function and a pointwise multiplication. The main components can be summarized as follows.

Input gate: it determines how much of the current network input is saved to the cell state. The implementation of the input gate is given in equations (1) and (2) [equations not reproduced in this text]. The current input h_τ and the previous LSTM cell state are used as the inputs of the input gate; the result of multiplying them by the weight matrices determines which information needs to be updated, and is finally fed into a sigmoid layer or a tanh layer.

Forget gate: it determines how much of the current input to the network is forgotten; the remaining output is saved to the current cell state. The gate takes information from the current input h_τ and the previous LSTM cell state, and outputs a probability between 0 and 1, where 1 means keep completely and 0 means discard completely. The relevant equation is as follows: [equation not reproduced in this text]

Output gate: it outputs the new LSTM cell state. First, a sigmoid layer determines which parts of the cell state need to be output. The cell state is then passed through a tanh layer to produce values in [-1, 1]. Finally, these values are multiplied by the output of the sigmoid layer. The relevant equations are as follows: [equations not reproduced in this text]
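Because the gate equations are not reproduced in this text, the sketch below uses the standard textbook LSTM cell rather than the patent's own equations, reduced to scalar inputs and states for readability. The function name, the parameter layout and the weight values are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_cell_step(x, h_prev, c_prev, params):
    """One step of a standard scalar LSTM cell.

    x is the current input (here, a channel gain sample), h_prev the previous
    output and c_prev the previous cell state. params maps each gate name to
    (weight_for_h_prev, weight_for_x, bias); the values are placeholders.
    """
    def pre(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x + b

    i = sigmoid(pre("input"))          # how much of the candidate to store
    f = sigmoid(pre("forget"))         # how much of the old cell state to keep
    g = math.tanh(pre("candidate"))    # candidate cell content in [-1, 1]
    c = f * c_prev + i * g             # new cell state
    o = sigmoid(pre("output"))         # how much of the cell state to expose
    h = o * math.tanh(c)               # new output
    return h, c


params = {name: (0.5, 0.5, 0.0) for name in ("input", "forget", "candidate", "output")}
h, c = 0.0, 0.0
for gain in [0.8, 0.9, 1.0]:
    h, c = lstm_cell_step(gain, h, c, params)
```

The sigmoid outputs act as the pass-through fractions between 0 and 1 that the text describes, and the final multiplication of the output gate with tanh(c) matches the last step of the output gate description.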

The LSTM parameters θ are trained by optimizing the objective function J(θ): [formula not reproduced in this text]

Here the data label is obtained from the mobile edge computing network, and the regularization term ξ‖θ‖² prevents overfitting. Channel gains at p+1 consecutive time steps are collected from the MEC network; the first p values are used as the input of the LSTM, and the (p+1)-th value is used as the label to train the LSTM network.
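The construction of training samples described above (p consecutive gains as input, the next gain as label) can be sketched as a sliding window over a recorded channel gain series. The function name `make_training_pairs` and the sample values are hypothetical.

```python
def make_training_pairs(gains, p):
    """Split a channel gain series into (p inputs, 1 label) samples."""
    if p < 1 or len(gains) < p + 1:
        raise ValueError("need p >= 1 and at least p + 1 samples")
    pairs = []
    for start in range(len(gains) - p):
        window = gains[start:start + p]   # h_{tau-p+1}, ..., h_tau
        label = gains[start + p]          # h_{tau+1}
        pairs.append((window, label))
    return pairs


pairs = make_training_pairs([0.1, 0.2, 0.3, 0.4, 0.5], p=3)
```

Sliding the window by one step reuses every run of p+1 consecutive samples; whether the patent reuses overlapping windows in this way is not stated, so that choice is an assumption.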

The DNN used by the present invention is trained in advance, as follows. A simulated environment is generated from the state of the mobile edge computing system, and the Monte Carlo tree search first performs an equal-proportion search, meaning that the candidate schemes for the resource ratio or the task offloading ratio allocated to the same task are equally spaced; the specific spacing is set according to the situation. The search process is similar to the resource allocation method above, except that during training the search schemes are only simulated in the simulated environment, the search results are used only as the training set of the DNN, and the search results are not executed in the real mobile edge computing system. After the search, the search results are sent to the DNN, which uses them as its training set. Because the DNN can fit functions, it can also output prior probabilities for states for which the MCTS has produced no output. Therefore, when mobile edge computing system state information and channel gain prediction data are input, the DNN can output resource allocation action prior probabilities to the MCTS.

At this point, the MCTS can search in the real environment according to the prior probabilities and obtain the optimal resource allocation scheme (the search result). The DNN is then retrained with the search results that the MCTS obtains in the real environment. Through this real-time training on MCTS search results, the DNN is continually updated and optimized and therefore outputs more accurate prior probabilities to the MCTS, which in turn keeps optimizing the resource allocation schemes it outputs according to the updated prior probabilities.

Fig. 3 shows the DNN structure of the present invention. The last several layers of the deep neural network are separated to build a neural network with sublayers, so as to output the resource allocation behaviors of multiple tasks. The DNN receives the task state and outputs the prior probabilities of the behaviors a_x = {a_{x,l}} (l = 0, 1, ..., q-1) that allocate each resource. The DNN comprises an input layer H^i, n shared hidden layers {H_1, ..., H_n} and q sublayers, each sublayer containing m sub-hidden layers. The neuron count and the parameters of each layer carry indices in which i denotes the input layer, s denotes a sub-hidden layer and O denotes the output layer; W denotes weights and b denotes biases. In the training stage, the training set of the DNN is generated by the MCTS, and the parameters of the DNN are trained with the RMSProp (root mean square propagation) algorithm. Each RMSProp optimizer has its own independent loss function, in which Δ‖θ‖² is the parameter regularization term that prevents overfitting. The q RMSProp optimizers solve for the DNN parameters that minimize the loss functions. For example, the l-th optimizer updates the parameters of the DNN according to its resource allocation policy; the (l+1)-th optimizer then starts from the updated θ_l and updates θ_{l+1} according to its own policy. The parameters (W_{1:n}, b_{1:n}) of the shared hidden layers are therefore updated by the behavior labels of all sublayers. In the prediction stage, the DNN takes the environmental resources and the task information as input and outputs the prior probability distribution of each sublayer, where x denotes the x-th task.
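The shared-trunk, multi-head layout described above can be illustrated with a small forward pass in plain Python. This is a simplified sketch under stated assumptions: each head is reduced to a single output layer instead of m sub-hidden layers, ReLU and softmax are assumed activations, the weights are random rather than trained, and all function names are hypothetical.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(state, shared_layers, heads):
    """Shared hidden layers followed by q independent softmax heads."""
    hidden = state
    for weights, biases in shared_layers:
        hidden = relu(dense(hidden, weights, biases))
    return [softmax(dense(hidden, weights, biases)) for weights, biases in heads]


rng = random.Random(0)
state = [0.5, 0.2, 0.9, 0.1]                          # toy task/resource state
shared_layers = [random_layer(8, 4, rng), random_layer(8, 8, rng)]
heads = [random_layer(5, 8, rng) for _ in range(3)]   # q = 3 heads, 5 actions each
priors = forward(state, shared_layers, heads)
```

Each entry of `priors` is a probability distribution over one head's candidate actions. Because every head reads the same shared hidden activations, a gradient from any head would update the shared parameters, which is the coupling between sublayers that the text describes.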

The edge servers complete the computing tasks of mobile user terminals in a cooperative mode: multiple neighboring edge servers share computing resources, and while handling local tasks an edge server offloads part of its tasks to idle edge servers in neighboring areas.

Compared with the prior art, the beneficial effects of the present invention are as follows:

The edge computing task allocation method of the present invention can learn the resource allocation policy without human intervention (or manual labeling);

The invention defines a variable task offloading ratio. The mobile edge server determines the proportion of the computing task it completes for the user according to the computing capability of the mobile user terminal and the remaining computing resources of each edge server. The mobile user terminal uploads the part of the computing task that the server agrees to process, and handles the rest itself.

Through the cooperation of the DNN, the MCTS and the LSTM, the edge computing task allocation method of the present invention can still optimize the allocation of environmental resources when the computing demands of mobile user terminals are diverse and the optimization problem becomes complex. The DNN guides the MCTS search with prior probabilities and prunes the MCTS so that it performs fewer low-return searches. The MCTS can thus reach, with fewer searches, the search performance it would have without the DNN, or even exceed it; performance improves by about 50% compared with DQN. A channel prediction module based on the LSTM is designed to predict the user's channel gain at future times. Because the movement of a mobile user is continuous and regular, the MCTS uses the predicted channel gain when searching over tasks at future times, which makes the obtained reward more accurate.

The DNN used by the present invention is a multi-task neural network. The multiple subtasks are interrelated, and using one separate neural network to output the state of each subtask cannot learn the relationships among the subtasks; the multi-task neural network used here therefore improves the convergence performance of the algorithm.

The algorithm used by the present invention is flexible: if the optimization objective changes, only the setting of the reward function needs to change to achieve the new objective, and the algorithm need not be redesigned. (For example, if the previous objective was to minimize latency and the objective is now to minimize energy consumption, only the reward function needs to be reset.)

After the algorithm used by the present invention has been trained offline, training data can also be collected during online operation to further improve search performance.

Brief description of the drawings:

Fig. 1 shows the LSTM network structure.

Fig. 2 shows the LSTM cell structure.

Fig. 3 shows the DNN structure used by the invention.

Fig. 4 is a schematic diagram of the edge computing task allocation method based on deep Monte Carlo tree search of the present invention.

Fig. 5 is a flow chart of the MCTS search of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to test examples and specific embodiments. This should not be understood as limiting the scope of the above subject matter of the present invention to the following embodiments; all techniques realized on the basis of the content of the present invention fall within the scope of the present invention.

Embodiment 1:

An edge task allocation method based on deep reinforcement learning. The overall allocation method is shown in Fig. 4 and comprises the following steps:

The edge server refreshes the mobile edge computing system state information. The mobile edge computing system state information comprises the computing resource status of the edge server, the communication resource status of the wireless communication base station, and the task request information of the mobile devices. The task request information comprises the historical channel gain information between each mobile device terminal and the base station, the data size of the task currently to be processed, the number of CPU clock cycles needed to complete the current task, and the local CPU clock frequency of the mobile device terminal;

The edge server passes the mobile edge computing system state information to the DNN, the MCTS and the LSTM. The edge server receives N task requests T_c = {T₀, T₁, ..., T_{N-1}} from N mobile device terminals and at the same time randomly generates M virtual tasks T_v = {T₀, T₁, ..., T_{M-1}} for the future time steps τ+1, τ+2, ...; the M virtual tasks reserve resources for future tasks. Resources are then allocated to the X = M + N tasks T = {T₀, ..., T_N, ..., T_{X-1}}. The LSTM predicts the channel gains of the M future tasks from the channel gains between the mobile device terminals and the base station, and sends the resulting channel gain prediction data to the MCTS and the DNN. The DNN then obtains the resource allocation action prior probabilities from the mobile edge computing system state information and the channel gain prediction data, and sends them to the MCTS;

The MCTS combines the mobile edge computing system state information and the channel gain prediction data to generate a simulated environment. A node of the Monte Carlo tree represents the allocation made by the previous allocation action, where the previous action completes the allocation of the task offloading ratio or of a resource ratio for the task. The MCTS searches using the prior probabilities output by the DNN;

The search flow is shown in Fig. 5; the specific steps are as follows:

S1: the root node of the MCTS is initialized according to the mobile edge computing system state. The root node information is expressed as s₀ = (F^e, B, T) together with the channel gain prediction data, where F^e denotes the computing resource status of the edge server, B denotes the communication resource status of the mobile edge computing system, and T denotes the task request information. T contains the channel gain h between the mobile device terminal and the base station, the data size d of the task, the number of CPU cycles c the mobile device terminal needs to complete the task, and the CPU frequency f^l;

S2: the root node is set as the search starting point and the next search begins. The search first finds the ratio of the first task offloaded to the edge server, then the communication resource ratio allocated to the first task, and then the computing resource ratio allocated to the first task, which completes the allocation for the first task. The second task is searched in the same order as the first, and so on until all X tasks have been allocated resources. The search depth, i.e. the layer of the path's leaf node, is therefore max_depth = 3*X;

S3: judge whether i > max_search, i.e. whether the predetermined number of searches has been completed; if so, go to step S9; if not, go to step S4. Here i is the number of searches performed so far and max_search is the preset number of searches;

S4: judge whether k > max_depth, i.e. whether the current node is a leaf node; if so, go to step S5; if not, go to step S6. Here k is the layer of the current node;

S5: after a leaf node is reached, i.e. once the computing resource allocation is complete, the resource allocation scheme is sent to the simulated environment for execution. The reward r is obtained from the time the scheme takes to complete the tasks, and the state of every node on the path is updated according to r. The update formulas are Q(s_k, a_k) = (Q(s_k, a_k)*N(s_k, a_k) + r)/(N(s_k, a_k) + 1) and N(s_k, a_k) = N(s_k, a_k) + 1, where s denotes a state, a denotes an action, N(s_k, a_k) is the number of times the edge (s_k, a_k) has been searched, and Q(s_k, a_k) is the value of the state-action edge (s_k, a_k);

S6: judge whether the current node is fully expanded; if not, go to step S7; if so, go to step S8;

S7: expand all child nodes of the current node according to the prior probabilities output by the DNN, and select the next node according to the formula [formula not reproduced in this text];

S8: from all possible allocation actions, choose the action with the highest search value according to the formula [formula not reproduced in this text], and execute the chosen action to enter the node in the next layer;

S9: output the most visited path, i.e. the optimal resource allocation scheme.
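Reading the update in step S5 as a running mean of the rewards observed on each edge, the backup along a searched path can be sketched as follows. The function name `backup`, the dictionary layout and the edge labels are hypothetical.

```python
def backup(path, r, stats):
    """Update every edge (s_k, a_k) on the searched path with reward r.

    stats maps an edge to a dict holding its visit count "N" and value "Q".
    """
    for edge in path:
        entry = stats.setdefault(edge, {"N": 0, "Q": 0.0})
        # Running mean: Q <- (Q * N + r) / (N + 1), computed before N is incremented.
        entry["Q"] = (entry["Q"] * entry["N"] + r) / (entry["N"] + 1)
        entry["N"] += 1


stats = {}
path = [("s0", "offload=0.5"), ("s1", "bandwidth=0.25")]
backup(path, r=1.0, stats=stats)
backup(path, r=0.0, stats=stats)
```

After two passes with rewards 1 and 0, each edge on the path has N = 2 and Q = 0.5. Computing Q before incrementing N keeps the mean exact, which is why the order of the two updates matters.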

Because the most searched path is the path with the highest value, after many selection cycles the resource ratios corresponding to this path are the optimal allocation ratios; that is, this path is the optimal resource allocation scheme. The scheme specifies the ratio of computing resources the edge server allocates to each computing task, the ratio of communication resources used for wireless communication between the edge server and the mobile device terminal, and the ratio of the mobile device terminal's computing task that is offloaded to the edge server.
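Since each task contributes three consecutive decisions to a path (offloading ratio, communication resource ratio, computing resource ratio, in the order of step S2), a root-to-leaf action sequence of length 3*X can be grouped back into a per-task plan. The names `DECISIONS` and `decode_path` and the numeric ratios are illustrative assumptions.

```python
DECISIONS = ("offload_ratio", "bandwidth_ratio", "compute_ratio")


def decode_path(actions, num_tasks):
    """Group a root-to-leaf action sequence into per-task allocations."""
    if len(actions) != len(DECISIONS) * num_tasks:
        raise ValueError("expected 3 decisions per task")
    plan = []
    for task in range(num_tasks):
        chunk = actions[task * 3:(task + 1) * 3]
        plan.append(dict(zip(DECISIONS, chunk)))
    return plan


plan = decode_path([0.5, 0.25, 0.5, 1.0, 0.75, 0.5], num_tasks=2)
```

In this toy plan the two tasks share the bandwidth 0.25/0.75 and the edge CPU 0.5/0.5, so each shared resource sums to one across tasks.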

The edge server in Embodiment 1 above refers to the local edge server, and the mobile device terminal refers to the local mobile device terminal.
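Step S5 evaluates a scheme by the time it takes to complete the task in the simulated environment, but the text does not give the timing model. The sketch below uses a common partial-offloading model purely as an assumption: the uplink rate follows the Shannon formula, and the local and offloaded parts run in parallel. The symbols d, c and f_local correspond to d, c and f^l in the text; the transmit power, the noise power and the function name are hypothetical.

```python
import math


def completion_time(d, c, f_local, f_edge, bandwidth, h,
                    offload, bw_ratio, cpu_ratio,
                    tx_power=0.1, noise=1e-9):
    """Estimated task completion time under partial offloading (seconds)."""
    local_time = (1 - offload) * c / f_local
    if offload == 0:
        return local_time
    if bw_ratio <= 0 or cpu_ratio <= 0:
        return math.inf  # offloading without any allocated resource never finishes
    # Assumed Shannon-rate uplink on the allocated share of the bandwidth.
    rate = bw_ratio * bandwidth * math.log2(1 + tx_power * h / noise)
    upload_time = offload * d / rate
    edge_time = offload * c / (cpu_ratio * f_edge)
    # Assumption: the local part and the offloaded part run in parallel.
    return max(local_time, upload_time + edge_time)


t_local = completion_time(d=8e6, c=1e9, f_local=1e9, f_edge=1e10, bandwidth=2e7,
                          h=1e-6, offload=0.0, bw_ratio=0.5, cpu_ratio=0.5)
t_half = completion_time(d=8e6, c=1e9, f_local=1e9, f_edge=1e10, bandwidth=2e7,
                         h=1e-6, offload=0.5, bw_ratio=0.5, cpu_ratio=0.5)
```

With these toy numbers, offloading half of the task shortens the completion time because the local half becomes the bottleneck. A reward function would then compare such a time against t_best.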

Embodiment 2:

Multiple neighboring edge servers can complete the computing tasks of mobile user terminals in a cooperative mode, comprising the following steps:

The local edge server refreshes the mobile edge computing system state information. The mobile edge computing system state information comprises the computing resource status of the local edge server, the computing resource status of the cooperating edge servers, the addresses of the cooperating edge servers, the wireless bandwidth resources of the wireless communication base station, and the task request information of the mobile devices. The task request information comprises the historical channel gain information between each mobile device terminal and the base station, the data size of the task currently to be processed, the number of CPU clock cycles needed to complete the current task, and the local CPU clock frequency of the mobile device terminal;

The local edge server passes the mobile edge computing system state information to the DNN, the MCTS and the LSTM. The edge server receives N task requests T_c = {T₀, T₁, ..., T_{N-1}} from N mobile device terminals and at the same time randomly generates M virtual tasks T_v = {T₀, T₁, ..., T_{M-1}} for the future time steps τ+1, τ+2, ...; the M virtual tasks reserve resources for future tasks. Resources are then allocated to the X = M + N tasks T = {T₀, ..., T_N, ..., T_{X-1}}. The LSTM predicts the channel gains of the M future tasks from the channel gains between the mobile device terminals and the base station, and sends the resulting channel gain prediction data to the MCTS and the DNN. The DNN then obtains the resource allocation action prior probabilities from the mobile edge computing system state information and the channel gain prediction data, and sends them to the MCTS;

The MCTS combines the mobile edge computing system state information and the channel gain prediction data to generate a simulated environment. A node of the Monte Carlo tree represents the allocation made by the previous allocation action, where the previous action completes the selection of the cooperating edge server, or the allocation of the task offloading ratio or of a resource ratio, for the task. The MCTS searches using the prior probabilities output by the DNN;

The search flow is shown in Fig. 5; the specific steps are as follows:

S1: the root node of the MCTS is initialized according to the mobile edge computing system state. The root node information is expressed as s₀ = (F^e, B, T) together with the channel prediction data, where F^e denotes the computing capability and resource status of the edge servers (the local edge server and the cooperating edge servers), B denotes the state of each wireless channel, such as fading characteristics and noise level, and T denotes the task request information. T contains the communication resource h between the mobile device terminal and the base station, the data size d of the task, the number of CPU cycles c the mobile device terminal needs to complete the task, and the CPU frequency f^l;

S2: the root node is set as the search starting point and the next search begins. For the first task, the search finds in turn: the ratio of the task offloaded to the local edge server; the communication resource ratio allocated to the task; the computing resource ratio the local edge server allocates to the task; the address of the cooperating edge server; the ratio of the task offloaded to the cooperating edge server; and the computing resource ratio the cooperating edge server allocates to the task. This completes the allocation for the first task. The second task is searched in the same order. When the address of the cooperating edge server is searched for the second task, the previously selected cooperating edge server continues to be used if the earlier tasks have not used up its resources; if its resources have all been allocated, a new cooperating edge server is searched. The search proceeds in this order until all X tasks have been allocated resources, so the search depth, i.e. the layer of the path's leaf node, is max_depth = 6*X;

S3: judge whether i >= max_search, i.e. whether the predetermined number of searches has been completed; if so, go to step S9; if not, go to step S4. Here i is the number of searches performed so far and max_search is the preset number of searches;

S4: judge whether k > max_depth, i.e. whether the current node is a leaf node; if so, go to step S5; if not, go to step S6. Here k is the layer of the current node and max_depth is the layer of the leaf nodes;

S5: after a leaf node is reached, i.e. once the resource allocation is complete, the reward r is obtained from the time the resource allocation scheme takes to complete the tasks, and the state of every node on the path is updated according to r. The update formulas are Q(s_k, a_k) = (Q(s_k, a_k)*N(s_k, a_k) + r)/(N(s_k, a_k) + 1) and N(s_k, a_k) = N(s_k, a_k) + 1, where s denotes a state, a denotes an action, N(s_k, a_k) is the number of times the edge (s_k, a_k) has been searched, and Q(s_k, a_k) is the value of the state-action edge (s_k, a_k);

Because the most searched path has the highest confidence, after many selection cycles the resource ratios corresponding to this path constitute the optimal resource allocation scheme. The optimal resource allocation scheme comprises the task ratio offloaded to the local edge server, the communication resource ratio, the computing resource ratio allocated by the local server, the address of the cooperating edge server, the task ratio offloaded to the cooperating edge server, and the computing resource ratio allocated by the cooperating edge server.

The above embodiments only illustrate the present invention and do not limit the technical solutions it describes. Although this specification describes the present invention in detail with reference to the above embodiments, the present invention is not limited to these specific embodiments. Therefore, any modification or equivalent replacement of the present invention, and all technical solutions and improvements that do not depart from the spirit and scope of the invention, shall fall within the scope of the claims of the present invention.

Claims

1. An edge computing task allocation method based on deep Monte Carlo tree search, characterized in that the method comprises:

Step 1: the edge server updates the mobile edge computing system state information; the mobile edge computing system state information comprises: the computing resource status of the edge server, the communication resource status of the wireless communication base station and the mobile terminals, and the task request information of the mobile devices; the task request information comprises: the channel gain information between each mobile device terminal and the wireless communication base station, the data size of the task currently to be processed, the number of CPU clock cycles the mobile device terminal needs to complete the current task, and the CPU clock frequency of the mobile device terminal;

Step 2: the edge server passes the mobile edge computing system state information to the DNN, the MCTS and the LSTM; the LSTM predicts the future channel gain from the channel gain between the mobile device terminal and the wireless communication base station, and sends the resulting channel gain prediction data to the MCTS and the DNN; the DNN obtains the resource allocation action prior probabilities from the mobile edge computing system state information and the channel gain prediction data, and sends the resulting resource allocation action prior probabilities to the MCTS;

Step 3: the MCTS searches according to the mobile edge computing system state information, the channel gain prediction data and the resource allocation action prior probabilities, and obtains the optimal resource allocation scheme; the optimal resource allocation scheme is sent to the mobile device terminal, the mobile device terminal offloads the task to the execution module of the mobile edge computing system, and the execution module of the mobile edge computing system carries out the optimal resource allocation behavior according to the optimal resource allocation scheme.

2. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the DNN is trained in advance so that, when mobile edge computing system state information and channel gain prediction data are input, it outputs resource allocation action prior probabilities to the MCTS.

3. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the search in Step 3 comprises the following steps:

S1: The MCTS initializes the root node according to the mobile edge computing system state;

S2: The root node is set as the search starting point and the next search is performed;

S5: After a leaf node is entered, that is, after the computing resource allocation is complete, the resource allocation scheme is evaluated, a reward is returned, and the states of all nodes on the path are updated according to the reward;

S7: All child nodes of the current node are expanded according to the resource allocation action prior probabilities output by the DNN, and the next node is selected according to the formula [formula missing in source], where Q(v'_k) is the cumulative reward value of node v'_k, N(v'_k) is the number of visits to node v'_k, e is the proportionality coefficient that balances exploitation and exploration, p(v'_k | s_k) is the prior probability of the next node, and k is the layer index of the current node;

S8: The action with the highest search value is chosen from all possible resource allocation actions, and the chosen action is executed to enter the node in the next layer.
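The listed search steps can be sketched on a toy problem as follows. The selection formula of S7 is not reproduced in the text, so the sketch substitutes a PUCT-style score built from the same quantities (Q, N, e, and the prior p); this substitution, the fixed tree depth, and all function and parameter names are assumptions made for illustration only.

```python
import math
from typing import Callable, Dict, List, Optional


class Node:
    def __init__(self, prior: float, parent: Optional["Node"] = None):
        self.prior = prior                      # p(v'_k | s_k), from the DNN
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.Q = 0.0                            # cumulative reward Q(v'_k)
        self.N = 0                              # visit count N(v'_k)

    def score(self, e: float) -> float:
        # ASSUMED PUCT-style rule; the claimed formula is missing in the text.
        mean = self.Q / self.N if self.N else 0.0
        parent_n = self.parent.N if self.parent else 0
        return mean + e * self.prior * math.sqrt(parent_n + 1) / (1 + self.N)


def search(
    prior_fn: Callable[[List[int]], List[float]],
    reward_fn: Callable[[List[int]], float],
    depth: int,
    n_actions: int,
    iterations: int = 200,
    e: float = 1.0,
) -> List[int]:
    root = Node(prior=1.0)                      # S1: initialise the root node
    for _ in range(iterations):
        node, path = root, []                   # S2: start each search at the root
        for _ in range(depth):
            if not node.children:               # S7: expand using the priors
                priors = prior_fn(path)
                for a in range(n_actions):
                    node.children[a] = Node(priors[a], node)
            # S8: take the action with the highest search value
            a = max(node.children, key=lambda i: node.children[i].score(e))
            node = node.children[a]
            path.append(a)
        r = reward_fn(path)                     # S5: evaluate the finished scheme
        while node is not None:                 # S5: update every node on the path
            node.N += 1
            node.Q += r
            node = node.parent
    # Read out the most visited action at each layer as the chosen scheme.
    best, node = [], root
    while node.children:
        a = max(node.children, key=lambda i: node.children[i].N)
        best.append(a)
        node = node.children[a]
    return best
```

For example, with uniform priors over three actions per layer and a reward of 1 only for the action sequence `[2, 1]`, the call `search(lambda p: [1/3] * 3, lambda p: 1.0 if p == [2, 1] else 0.0, depth=2, n_actions=3)` concentrates its visits on that sequence.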

4. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the optimal resource allocation scheme found by the MCTS search in Step 3 is used to train the DNN, improving the prediction accuracy of the DNN, so that the DNN outputs updated resource allocation action prior probabilities that better guide the MCTS search, thereby optimizing the edge computing task allocation method.

5. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the DNN comprises one input layer H^i, n shared hidden layers {H_1, …, H_n}, and q split sub-layers, each split sub-layer comprising m sub-hidden layers [expression missing in source]; the number of neurons in each layer is expressed as [expression missing in source]; and the parameters of each layer of the neural network are [expression missing in source], where i denotes the input layer, s denotes a sub-hidden layer, O denotes the output layer, W denotes the weights, and b denotes the biases.
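A network with this shape (one input, n shared hidden layers, and q split sub-layers of m sub-hidden layers each) can be sketched in plain Python as below. The layer widths, the ReLU activations, the random initialisation, and the assumption that each split sub-layer ends in a softmax over one part of the allocation action are illustrative choices that the claim does not state; the sketch also assumes n ≥ 1.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]          # (weights W, biases b)


def dense(x: Vector, W: Matrix, b: Vector, relu: bool = True) -> Vector:
    """One fully connected layer: y = W x + b, optionally followed by ReLU."""
    y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return [max(0.0, v) for v in y] if relu else y


def softmax(z: Vector) -> Vector:
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class SplitDNN:
    def __init__(self, n_in: int, hidden: int, n: int, q: int, m: int,
                 n_out: int, seed: int = 0):
        rng = random.Random(seed)
        sizes = [n_in] + [hidden] * n
        # n shared hidden layers H_1 .. H_n
        self.shared = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        # q split sub-layers, each with m sub-hidden layers and an output layer
        self.heads = []
        for _ in range(q):
            sub = [init_layer(hidden, hidden, rng) for _ in range(m)]
            out = init_layer(hidden, n_out, rng)
            self.heads.append((sub, out))

    def forward(self, x: Vector) -> List[Vector]:
        """Return one probability vector per split sub-layer."""
        for W, b in self.shared:
            x = dense(x, W, b)
        outputs = []
        for sub, (Wo, bo) in self.heads:
            h = x
            for W, b in sub:
                h = dense(h, W, b)
            outputs.append(softmax(dense(h, Wo, bo, relu=False)))
        return outputs
```

Sharing the first n layers lets every output branch reuse the same features of the system state, while the split sub-layers specialise per output.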

6. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the LSTM predicts the channel gain at the next time step, h_{τ+1}, from the channel gains at past time steps h_{τ-p+1}, h_{τ-p+2}, …, h_τ, the LSTM network being defined as

h_{τ+1} = g_θ(h_{τ-p+1}, h_{τ-p+2}, …, h_τ)

where θ denotes the weight parameters of the LSTM; the LSTM network stores the long-term state in a memory cell that is controlled by three gates: the input gate, the forget gate, and the output gate.
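The mapping g_θ can be illustrated with a single scalar LSTM cell, shown below as a minimal sketch. The standard gate equations are used; the scalar hidden size, the parameter names inside `theta`, and the linear read-out that turns the final hidden state into a gain estimate are assumptions, and the weights in any example call are untrained placeholders.

```python
import math
from typing import Dict, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, hid: float, cell: float,
              theta: Dict[str, float]) -> Tuple[float, float]:
    """One LSTM step with scalar input x, hidden state hid, and cell state."""
    i = sigmoid(theta["wi"] * x + theta["ui"] * hid + theta["bi"])    # input gate
    f = sigmoid(theta["wf"] * x + theta["uf"] * hid + theta["bf"])    # forget gate
    o = sigmoid(theta["wo"] * x + theta["uo"] * hid + theta["bo"])    # output gate
    g = math.tanh(theta["wg"] * x + theta["ug"] * hid + theta["bg"])  # candidate
    cell_new = f * cell + i * g        # the cell carries the long-term state
    hid_new = o * math.tanh(cell_new)
    return hid_new, cell_new


def predict_next_gain(history: Sequence[float], theta: Dict[str, float]) -> float:
    """g_theta: map the last p gains h_{tau-p+1}..h_tau to an estimate of h_{tau+1}."""
    hid, cell = 0.0, 0.0
    for x in history:
        hid, cell = lstm_step(x, hid, cell, theta)
    # ASSUMPTION: a linear read-out converts the hidden state into a gain value.
    return theta["wy"] * hid + theta["by"]
```

The forget gate `f` decides how much of the previous cell state survives, the input gate `i` decides how much of the new candidate is written, and the output gate `o` decides how much of the cell state is exposed as the hidden state.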

7. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 3, characterized in that the reward value r in step S5 of the search is obtained as follows:

[formula missing in source]

where t_best is the minimum time taken to complete the task under any previous resource allocation scheme, with an initial value of infinity; t is the time the current resource allocation scheme needs to complete the task; and σ is a preset value with σ > 1.

8. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the optimal resource allocation scheme comprises: the proportion of the mobile device terminal's computing task that is offloaded to the local edge server, the proportion of computing resources that the local edge server allocates to the computing task, and the proportion of communication resources used for wireless communication between the edge server and the mobile device terminal.

9. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the edge servers complete the computing tasks of mobile user terminals in a cooperative mode: multiple neighboring edge servers share computing resources, and an edge server, while processing local tasks, can offload part of the tasks to idle edge servers in other neighboring areas.

10. The edge computing task allocation method based on deep Monte Carlo tree search according to claim 1, characterized in that the optimal resource allocation scheme comprises: the proportion of the task offloaded to the local edge server, the communication resource proportion, the proportion of computing resources allocated by the local server, the address of the cooperating edge server, the proportion of the task offloaded to the cooperating edge server, and the proportion of computing resources allocated by the cooperating edge server.
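The components of the resource allocation scheme listed in claims 8 and 10 can be held in a small record such as the one sketched below. The field names are hypothetical, and the validation rests on two assumptions that the claims do not state: every proportion lies in [0, 1], and both offload proportions are fractions of the whole task, with the remainder executed on the mobile device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationScheme:
    local_offload_ratio: float   # share of the task offloaded to the local edge server
    comm_ratio: float            # share of communication resources used
    local_compute_ratio: float   # share of local server computing resources
    coop_server_address: str     # address of the cooperating edge server
    coop_offload_ratio: float    # share of the task offloaded to the cooperating server
    coop_compute_ratio: float    # share of cooperating server computing resources

    def __post_init__(self) -> None:
        ratios = (
            self.local_offload_ratio,
            self.comm_ratio,
            self.local_compute_ratio,
            self.coop_offload_ratio,
            self.coop_compute_ratio,
        )
        if any(not 0.0 <= r <= 1.0 for r in ratios):
            raise ValueError("every ratio must lie in [0, 1]")
        # ASSUMPTION: both offload ratios are fractions of the whole task.
        if self.local_offload_ratio + self.coop_offload_ratio > 1.0:
            raise ValueError("offloaded shares exceed the whole task")

    def device_ratio(self) -> float:
        """Share of the task left to run on the mobile device (assumed)."""
        return 1.0 - self.local_offload_ratio - self.coop_offload_ratio
```

Rejecting inconsistent proportions at construction time means a search procedure that enumerates candidate schemes can discard invalid ones before evaluating them.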