CN110233763A - Virtual network embedding algorithm based on temporal difference learning - Google Patents

Virtual network embedding algorithm based on temporal difference learning Download PDF

Info

Publication number
CN110233763A
Authority
CN
China
Prior art keywords
vne
node
state
vnr
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910527020.7A
Other languages
Chinese (zh)
Other versions
CN110233763B (en)
Inventor
王森
张标
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910527020.7A priority Critical patent/CN110233763B/en
Publication of CN110233763A publication Critical patent/CN110233763A/en
Application granted granted Critical
Publication of CN110233763B publication Critical patent/CN110233763B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 - Discovery or management of network topologies
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a virtual network embedding algorithm based on temporal difference learning. The method models the VNE problem as a Markov decision process (MDP) and builds a neural network to approximate the value function of VNE states. On this basis, an algorithm named VNE-TD, based on temporal difference learning (a reinforcement learning method), is proposed. In VNE-TD, multiple embedding candidates for the node mapping are generated probabilistically, and TD learning is used to evaluate the long-term potential of each candidate. Extensive simulation results show that the VNE-TD algorithm substantially outperforms previous algorithms in terms of both blocking ratio and revenue.

Description

Virtual network embedding algorithm based on temporal difference learning
Technical field
The present invention relates to computer networks, and in particular to a virtual network embedding algorithm based on temporal difference learning.
Background technique
In recent years, network virtualization has received extensive attention from both the research community and industry because it offers a very promising solution for future networks. It is regarded as a tool that can overcome the current Internet's resistance to fundamental change. In addition, network virtualization is a key enabler of cloud computing. The principal entity of network virtualization is the virtual network (VN). As shown in Fig. 1, a VN is a combination of virtual nodes and links on top of a substrate network (SN), where the numbers next to the nodes and links denote node capacity and link bandwidth, respectively. Virtual nodes are interconnected by virtual links that traverse one or more SN paths. By virtualizing the node and link resources of one SN, multiple VNs with widely different characteristics can be hosted simultaneously on the same physical hardware. Given a set of virtual network requests (VNRs), each with certain resource requirements on nodes and links, the problem of finding a specific subset of nodes and links in an SN to satisfy each VNR is called the virtual network embedding (VNE) problem. In most practical settings, the VNE problem must be handled as an online problem. That is, the VNRs are not known in advance; instead, they arrive at the system dynamically and may stay in the SN for some period of time. In practice, a VNE algorithm must handle VNRs as they arrive, rather than processing a batch of VNRs at once (offline VNE). When making online embedding decisions for VNRs, the usual goal is to maximize the long-term revenue, which makes the VNE problem even more challenging for the infrastructure provider (InP, usually the owner of the SN).
Summary of the invention
The technical problem to be solved by the present invention is to achieve a better balance between performance and computational complexity in virtual network embedding.
To achieve the above object, the present invention adopts the following technical scheme: a virtual network embedding algorithm based on temporal difference learning, comprising the following steps:
S101: Establish the VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links; each substrate node v_s ∈ V_s has a computing capacity c(v_s), and each substrate link e_s ∈ E_s has a bandwidth b(e_s);
VNR_k is modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links; each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k);
S102: Define the state
S102a: Define a reward function for VNE_k as in formula (1), where VNE_k denotes the processing of the k-th VNR:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where c(v) denotes the node capacity of node v, b(e) denotes the link bandwidth of link e, η denotes the unit price of computing resources, and β denotes the unit price of bandwidth resources; the immediate reward after processing VNR_k is therefore naturally defined as Rvn(k), i.e. r_k = Rvn(k);
S102b: Define the operation set of VNE: the set of all possible node mappings is defined as the operation set of VNE;
S102c: Define a Markov state for VNE:
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as shown in formula (3):
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov;
If the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
S103: Model VNE as a Markov decision process (MDP);
S103a: Define the policy and the value function: the policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s. Given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S; V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue, and is used to measure the quality of the current state; it is defined as in formula (8):
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR_k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards;
S103b: Define the optimal value function:
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy that obtains the maximum reward in the long run;
π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S;
The optimal value function is defined as V*(s) = max_π V^π(s);
The optimal value function V*(s) satisfies the following iterative expression:
V*(s) = max_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (9)
S104: Approximate the optimal value function V*(s), i.e. the value function under the optimal policy, using a neural network:
A standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s); layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function; the input of the neural network is the state s, as shown in formula (3); through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s);
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate;
S105: In VNE, given a VNR, the possible operations and the corresponding next states are known; therefore the transition probabilities and expected rewards are deterministic and known. The matchings obtained by traversing the node mappings form the operation set; the result states obtained by simulating the embedding of each operation are fed as input to the neural network of S104, yielding the value of the optimal value function for each candidate. Since the optimal policy π*(s) can be expressed as:
π*(s) = argmax_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (12)
that is, the candidate with the maximum value satisfies the optimal policy,
S106: The matching corresponding to the maximum value of the optimal value function is selected to actually embed the VNR, and then the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link.
As an improvement, in S105, before traversing the matchings of the node mapping, the following reduction is applied:
a probabilistic method is used to generate multiple node mapping candidates; using the RW metric and a uniform value, node mapping candidates are generated with RW-based and uniform selection probabilities.
Compared with the prior art, the present invention has at least the following advantages:
1. A neural network is used to approximate the value function of VNE states; for the VNE problem with its huge state space, the neural network helps generalize from previously experienced states to states that have never been seen.
2. Based on temporal difference learning, the passive balancing approach is abandoned in favor of active learning and decision-making based on previous experience, which overcomes the contradiction between online embedding decisions and the long-term objective, solves the resource allocation problem more effectively, and improves resource utilization.
Description of the drawings
Fig. 1 is an example of a VNE problem.
Fig. 2 is a sample topology.
Fig. 3 is an example of a VNE problem.
Fig. 4 shows the embedding result of the example.
Fig. 5 illustrates the VNE process in terms of RL concepts.
Fig. 6 shows the neural network that approximates the optimal value function.
Fig. 7(a) is the relation between blocking ratio and parameter d for the different algorithms; Fig. 7(b) is the relation between revenue per second and parameter d for the different algorithms.
Fig. 8(a) is the relation between blocking ratio and time for the different algorithms; Fig. 8(b) is the relation between revenue per second and time for the different algorithms; Fig. 8(c) is the relation between WAPL and time for the different algorithms.
Fig. 9 is the relation between the loss and the number of training iterations.
Fig. 10(a) is the relation between blocking ratio and workload for the different algorithms; Fig. 10(b) is the relation between revenue per second and workload for the different algorithms; Fig. 10(c) is the relation between WAPL and workload for the different algorithms.
Fig. 11(a) shows the influence of the number of node mapping candidates on the blocking ratio; Fig. 11(b) shows the influence of the number of node mapping candidates on the revenue per second.
Fig. 12(a) is the relation between blocking ratio and VNR link connectivity for the different algorithms; Fig. 12(b) is the relation between revenue per second and VNR link connectivity for the different algorithms.
Specific embodiment
The invention is described in further detail below.
The main challenge of the VNE problem is the contradiction between online decision-making and the pursuit of a long-term objective. The prior art attempts to overcome this challenge by balancing the SN workload, in the hope of being able to accommodate more future VNRs. However, the problem here is that the connection capability of a node is related to that of other nodes: consuming the connection capability of one node does not necessarily reduce only its own capability. In Fig. 3, the VNR needs to be embedded in the SN. Take the node-ranking metric of the prior art (named GRC) as an example. With the parameter d set to 0.85, the GRC values of the SN nodes are shown in Fig. 4 as "Original". In order to balance the SN workload, GRC-VNE selects the two nodes with the strongest combined GRC metric and connection capability, namely node B and node G, to match the two nodes in the VNR (node a and node b). The remaining GRC values are then shown in Fig. 4 as "After VNR embedded by GRC-VNE", and the variance of these values is 0.0032. In contrast, the VNE-TD algorithm proposed by the present invention selects node B and node C. The remaining GRC values are shown in Fig. 4 as "After VNR embedded by VNE-TD", and the variance of these values is 0.0016. This shows that the basic assumption of the prior art, namely that balancing the SN workload is beneficial, is problematic: it brings neither a more balanced workload nor more remaining resources.
A virtual network embedding algorithm based on temporal difference learning comprises the following steps:
S101: Establish the VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links; each substrate node v_s ∈ V_s has a computing capacity c(v_s) (e.g. CPU cycles), and each substrate link e_s ∈ E_s has a bandwidth b(e_s). The bottom of Fig. 1 gives an example of an SN; the numbers next to the nodes and links are their available resources.
VNR_k is modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links; each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k).
The top of Fig. 1 gives an example of a VNR. For VNR k, t_k is the arrival time of the VNR and a bounded value l_k is the lifetime of the VNR.
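To make the notation concrete, the following minimal Python sketch represents an SN and a VNR as weighted undirected graphs with node CPU capacities and link bandwidths, as in the model above. It is an illustration only, not the patent's implementation; the use of networkx and the attribute names "cpu" and "bw" are assumptions.

```python
# Minimal sketch of the VNE data model, assuming networkx as the graph library
# and 'cpu' / 'bw' as attribute names (illustrative choices, not from the patent).
import networkx as nx

def make_substrate_network():
    """Substrate network G_s(V_s, E_s): nodes carry CPU capacity, edges carry bandwidth."""
    sn = nx.Graph()
    sn.add_node("A", cpu=50.0)
    sn.add_node("B", cpu=40.0)
    sn.add_node("C", cpu=60.0)
    sn.add_edge("A", "B", bw=100.0)
    sn.add_edge("B", "C", bw=80.0)
    sn.add_edge("A", "C", bw=60.0)
    return sn

def make_vnr(arrival_time, lifetime):
    """Virtual network request G_k(V_k, E_k) with arrival time t_k and lifetime l_k."""
    vnr = nx.Graph(t_k=arrival_time, l_k=lifetime)
    vnr.add_node("a", cpu=5.0)
    vnr.add_node("b", cpu=8.0)
    vnr.add_edge("a", "b", bw=10.0)
    return vnr
```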
S102: Define the state
S102a: Define a reward function for VNE_k as in formula (1), where VNE_k denotes the processing of the k-th VNR:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where c(v) denotes the node capacity of node v, b(e) denotes the link bandwidth of link e, η denotes the unit price of computing resources, and β denotes the unit price of bandwidth resources;
The objective of the reward function is to maximize the long-term time-averaged revenue of the InP, as follows:
maximize lim_{T→∞} ( Σ_{k∈K_T} Rvn(k) ) / T   (2)
where K_T = {k | 0 < t_k < T} denotes the set of VNRs that arrive before time instant T;
The reward function is intended to provide an immediate measure of the benefit of a certain action in a given state. From formula (2), the goal of the VNE problem is to maximize the long-term time-averaged revenue of the InP. Therefore, the immediate reward after processing VNR k is naturally defined as Rvn(k), i.e. r_k = Rvn(k).
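As an illustration of formula (1), the following sketch computes Rvn(k) for a VNR built with the helpers assumed above; the function names are hypothetical, η and β default to 1 as in the simulation settings described later, and treating a blocked request as contributing zero reward is an assumption of this sketch.

```python
# Sketch of the revenue/reward of formula (1): Rvn(k) = eta * sum(node CPU) + beta * sum(link bandwidth).
def revenue(vnr, eta=1.0, beta=1.0):
    cpu_revenue = sum(attrs["cpu"] for _, attrs in vnr.nodes(data=True))
    bw_revenue = sum(attrs["bw"] for _, _, attrs in vnr.edges(data=True))
    return eta * cpu_revenue + beta * bw_revenue

def immediate_reward(vnr, embedded):
    # r_k = Rvn(k) when VNR k is embedded; a blocked VNR is assumed here to yield 0.
    return revenue(vnr) if embedded else 0.0
```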
S102b: Define the operation set of VNE: the set of all possible node mappings is defined as the operation set of VNE;
S102c: Define a Markov state for VNE:
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as shown in formula (3):
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov;
For a Markov state, all that matters is the current state signal; its significance is independent of the path or history of signals that led to it. More generally, in the most general causal case, the response of the environment may depend on everything that happened before. In most RL problems the transition function is probabilistic, and in that case the dynamics can only be specified by the complete probability distribution:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k, r_k, s_{k-1}, a_{k-1}, ..., r_1, s_0, a_0}   (4)
On the other hand, if the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
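The ordered state vector of formula (3) can be sketched as follows; this is an illustration only, the normalization constants and the "cpu"/"bw" residual-resource attribute names are assumptions carried over from the earlier sketch.

```python
# Sketch of the Markov state of formula (3): an ordered vector of normalized
# remaining node capacities followed by normalized remaining link bandwidths.
import numpy as np

def state_vector(sn, cpu_max, bw_max, node_order, edge_order):
    node_part = [sn.nodes[v]["cpu"] / cpu_max for v in node_order]
    edge_part = [sn.edges[e]["bw"] / bw_max for e in edge_order]
    return np.asarray(node_part + edge_part, dtype=np.float32)

# node_order and edge_order are fixed once (e.g. sorted lists of V_s and E_s)
# so that s_k is an *ordered* set, as formula (3) requires.
```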
S103: Model VNE as a Markov decision process (MDP);
S103a: Define the policy and the value function:
A reinforcement learning task that satisfies the Markov property is called a Markov decision process. Since the VNE state provided by the present invention is a Markov state, the decision process of the VNE problem can be ideally modeled as an MDP.
In an MDP, given any state s and action a, the probability of each possible next state s′ is expressed as:
P^a_{ss′} = Pr{s_{k+1} = s′ | s_k = s, a_k = a}   (6)
These quantities are called transition probabilities; likewise, the expected value of the next reward is denoted:
R^a_{ss′} = E[r_{k+1} | s_k = s, a_k = a, s_{k+1} = s′]   (7)
From the RL point of view, the goal of VNE is to find an optimal policy that selects the optimal action at any time and in any state;
The policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s; the policy and the corresponding probability are denoted π and π(s, a);
Almost all reinforcement learning algorithms are based on estimating a value function, a function of state, to estimate how good it is for the agent to be in a given state;
Given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S; V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue; its formal definition is given in formula (8):
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards.
S103b: Define the optimal value function:
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy that obtains the maximum reward in the long run;
π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S;
The optimal value function is defined as V*(s) = max_π V^π(s);
The optimal value function V*(s) satisfies the following iterative expression:
V*(s) = max_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (9)
S104: Solve for V(s) with a neural network so that V(s) approaches the optimal value function V*(s):
A standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s), as shown in Fig. 6. Layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function. The input of the neural network is the state s, as shown in formula (3); through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s).
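A minimal sketch of such a value network is shown below, assuming TensorFlow/Keras (which the evaluation section states was used) and a state dimension of |V_s| + |E_s|; the layer names and default H are illustrative.

```python
# Sketch of the value network of Fig. 6: two fully connected hidden layers (fc1, fc2)
# of H nodes with ReLU (rectifier) activation, and a single linear output V(s).
import tensorflow as tf

def build_value_network(state_dim, H=300):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(state_dim,)),
        tf.keras.layers.Dense(H, activation="relu", name="fc1"),
        tf.keras.layers.Dense(H, activation="relu", name="fc2"),
        tf.keras.layers.Dense(1, name="value"),  # scalar estimate of V*(s)
    ])
```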
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate;
S105: In VNE, given a VNR, the possible operations and the corresponding next states are known. Therefore P^a_{ss′} and R^a_{ss′} are deterministic and known. The matchings obtained by traversing the node mappings form the operation set; the result states obtained by simulating the embedding of each operation are fed as input to the neural network of S104, yielding the value of the optimal value function for each candidate. Since the optimal policy π*(s) can be expressed as:
π*(s) = argmax_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (12)
that is, the candidate with the maximum value satisfies the optimal policy.
S106: The node mapping corresponding to the maximum value of the value function is selected to actually embed the VNR, and then the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link.
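One possible sketch of the link mapping of S106 is given below. The patent text only specifies a shortest path over links with sufficient bandwidth; filtering the SN by residual bandwidth and taking a hop-count shortest path with networkx is an assumption of this illustration.

```python
# Sketch of bandwidth-constrained shortest-path link mapping (single-path case).
import networkx as nx

def map_virtual_link(sn, src, dst, bw_demand):
    """Return a shortest SN path from src to dst using only links with enough residual bandwidth."""
    feasible = nx.Graph()
    feasible.add_nodes_from(sn.nodes)
    feasible.add_edges_from(
        (u, v) for u, v, attrs in sn.edges(data=True) if attrs["bw"] >= bw_demand
    )
    try:
        return nx.shortest_path(feasible, src, dst)  # hop-count shortest path
    except nx.NetworkXNoPath:
        return None  # the VNR is blocked if any virtual link cannot be mapped
```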
The present invention uses an RL method, namely temporal difference (abbreviated TD) learning, to update the estimate of the optimal value function and to make embedding decisions according to that estimate. Specifically, TD learning updates its estimate of V*(s) as follows:
V*(s_k) ← V*(s_k) + α·[r_{k+1} + γ·V*(s_{k+1}) − V*(s_k)]   (13)
As mentioned above, V*(s) is approximated by the neural network; combining this with the TD algorithm, the parameter update of formula (11) becomes:
θ ← θ + α·[r_{k+1} + γ·V(s_{k+1}, θ) − V(s_k, θ)]·∇_θ V(s_k, θ)   (14)
According to the above update rule, V*(s) and V(s) are in the TD process and the supervised learning process respectively, and the two proceed simultaneously.
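A single update of formula (14) can be sketched as follows; the helper is hypothetical, it assumes the Keras value network sketched above, and it performs gradient descent on the squared TD error, whose gradient direction matches the update rule of (14) (the factor of 2 is absorbed into α).

```python
# Sketch of one TD(0) parameter update (formula (14)): the TD target r + gamma * V(s')
# is treated as a fixed label and the squared TD error is minimized by gradient descent.
import tensorflow as tf

def td_update(value_net, optimizer, s, r, s_next, gamma=1.0):
    s = tf.convert_to_tensor(s[None, :])            # batch of one state s_k
    s_next = tf.convert_to_tensor(s_next[None, :])  # result state s_{k+1}
    target = tf.stop_gradient(r + gamma * value_net(s_next))  # no gradient through the target
    with tf.GradientTape() as tape:
        v = value_net(s)
        loss = tf.reduce_mean(tf.square(target - v))  # squared TD error
    grads = tape.gradient(loss, value_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, value_net.trainable_variables))
    return float(loss)
```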
The algorithm VNE-TD is the function that makes the embedding decision when a VNR arrives. As shown in algorithm VNE-TD, the states fed to the neural network are the result states obtained by simulating the embedding of each node mapping candidate, and the candidate with the maximum value is selected to actually embed the VNR. After the node mapping is established, the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link. If splittable flows are allowed, the virtual links are mapped with the same multi-commodity flow algorithm as in [12]. According to expression (12), the matching j that maximizes r + γ·V(s_j^n) should be selected; because the reward (r = Rvn(VNR)) is identical for all candidates, the matching j that maximizes V(s_j^n) can be selected. At the end of a VNR's lifetime, it leaves the SN and releases the resources previously allocated to it, so the state of the SN changes. However, the parameters of the neural network are not updated when a VNR leaves, only when one arrives.
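The decision step of VNE-TD can be sketched as follows; this is an illustrative outline, and the candidate generation, simulated embedding and state encoding are the hypothetical helpers sketched elsewhere in this description rather than the patent's exact implementation.

```python
# Sketch of the VNE-TD embedding decision: evaluate V(s) on the result state of each
# node mapping candidate and embed the VNR with the candidate of maximum value.
def vne_td_decide(sn, vnr, value_net, generate_candidates, simulate_embedding, encode_state):
    best = None
    for mapping in generate_candidates(sn, vnr):           # L probabilistic candidates
        result_sn = simulate_embedding(sn, vnr, mapping)   # tentative node + link mapping
        if result_sn is None:                              # candidate cannot be embedded
            continue
        value = float(value_net(encode_state(result_sn)[None, :]))
        if best is None or value > best[0]:
            best = (value, mapping, result_sn)
    if best is None:
        return None          # all candidates failed: the VNR is blocked
    return best[1], best[2]  # chosen node mapping and resulting SN state
```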
As an improvement, in S105, when determining the maximum of the optimal value function, the possible operation set is too large to traverse, so the operation set is first reduced as follows: a probabilistic method is used to generate multiple node mapping candidates; using the RW metric and a uniform value, node mapping candidates are generated with RW-based and uniform selection probabilities.
The method of the present invention is described in detail as follows:
Table 1: symbols and notation used in the present invention
1.1 VNE model
The SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links. Each substrate node v_s ∈ V_s has a computing capacity c(v_s) (e.g. CPU cycles), and each substrate link e_s ∈ E_s has a bandwidth b(e_s). The bottom of Fig. 1 gives an example of an SN; the numbers next to the nodes and links are their available resources.
1.1.1 Virtual network requests
A VNR k is likewise modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links. Each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k). The top of Fig. 1 gives an example of a VNR. For VNR k, t_k is the arrival time of the VNR and a bounded value l_k is the lifetime of the VNR.
2.2 The VNE process
For VNR k, the VNE process consists of the following two key components, namely node mapping and link mapping.
2.2.1 Node mapping
Node mapping can be described as a one-to-one mapping M_N: V_k → V_s such that, for M_N(v_k) = v_s, v_k ∈ V_k and v_s ∈ V_s, the following two conditions must be satisfied: (1) if v_k ≠ v_k′ then M_N(v_k) ≠ M_N(v_k′); (2) c(v_k) does not exceed the remaining capacity of M_N(v_k). The first constraint ensures that any two nodes of a VNR are mapped to two different SN nodes; the second constraint requires that each VN node is mapped to an SN node with sufficient node capacity.
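The two node-mapping constraints can be checked with a short sketch such as the following; the helper name is hypothetical and the attribute names follow the earlier assumed data model.

```python
# Sketch of the node-mapping feasibility check: the mapping must be injective (constraint 1)
# and every virtual node must fit on its substrate node's remaining capacity (constraint 2).
def node_mapping_feasible(sn, vnr, mapping):
    substrate_targets = list(mapping.values())
    if len(substrate_targets) != len(set(substrate_targets)):
        return False  # two virtual nodes mapped onto the same SN node
    return all(
        vnr.nodes[vk]["cpu"] <= sn.nodes[vs]["cpu"] for vk, vs in mapping.items()
    )
```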
2.2.2 Link mapping
In the link mapping stage, for a virtual link in the VNR, a set of paths between the two mapped nodes must be found in the SN whose total available bandwidth is no less than the requirement of the virtual link. In the present invention, only single-path mapping is considered, i.e. a virtual link can only be mapped to a single SN path. In the case of single-path mapping, link mapping can be represented by a mapping M_L: E_k → P_s, where P_s is the set of all paths of G_s. For M_L(e_k) = p_s, every link on the path p_s must have remaining bandwidth no less than b(e_k).
The VNE problem must be handled as an online problem: VNRs arrive at the system dynamically, and the VNE algorithm must handle them as they arrive.
2.3 VNE revenue model and objective
The VNE revenue model is similar to those commonly used; the revenue generated for the InP is given by:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where η and β denote the unit prices of computing resources and bandwidth resources, respectively.
The objective is to maximize the long-term time-averaged revenue of the InP, as follows:
maximize lim_{T→∞} ( Σ_{k∈K_T} Rvn(k) ) / T   (2)
where K_T = {k | 0 < t_k < T} denotes the set of VNRs that arrive before time instant T.
3. Fitting VNE into the RL model
RL studies how a learning agent maps situations to actions so as to maximize a numerical reward signal. As shown in Fig. 5, the agent is the subject of learning and the environment is the object of learning. The agent is able to perform actions; performing an action may leave the agent in the current state or cause a transition to another state in the state space. The transition function may be probabilistic or deterministic. As a result of the agent's action, the environment produces a reward for the agent. Normally, the value of the reward is computed by a preset reward function, which is used to drive the reinforcement process of the agent.
The purpose of the reward is to provide an immediate measure of the benefit of a certain action in a particular state. The reward for each action depends on whether the new state is better than the current state. Over time, the agent tries to learn the optimal action to perform in each particular state, i.e. the action that maximizes the long-term total return. In RL this involves a value function, which accumulates the relevant immediate returns over a bounded horizon to indicate what is best in the long run.
As described in more detail below, on the one hand the goal of the VNE problem is to maximize the long-term time-averaged revenue of the InP; on the other hand, an embedding decision must be made immediately when a VNR appears, according to the current situation and previous experience. The VNE problem thus combines a long-term objective with online decision-making, which provides a good setting for the involvement of RL. Fig. 5 illustrates how the VNE problem is fitted into the RL model. For the VNE problem, from the RL perspective the SN and the continuously arriving VNRs together constitute the environment. In the VNE problem, the processing of one VNR forms one RL step. For VNR k+1, the VNE agent provides an embedding decision a_k based on the current state s_k (which may include all previous states), previous experience and rewards. After taking action a_k, the environment provides the result state s_{k+1} and the reward r_{k+1}.
3.2 Defining a reward function for VNE
As mentioned above, the reward function is intended to provide an immediate measure of the benefit of a certain action in a given state. From formula (2), the goal of the VNE problem is to maximize the long-term time-averaged revenue of the InP. Therefore, the immediate reward after processing VNR k is naturally defined as Rvn(k), i.e. r_k = Rvn(k).
Obviously, this reward function can easily be adapted to other VNE objectives, which means that solving the VNE problem with RL is very flexible. For example, if the objective of VNE is to minimize the blocking ratio, the reward can be set to 1 if the VNR is embedded successfully and 0 otherwise.
3.3 Defining an operation set and a Markov state for VNE
How states and actions are defined is the key factor affecting RL performance. In the present invention, the operation set of VNE is defined as the set of all possible node mappings. If embedding according to a node mapping action is unsuccessful, the VNR is blocked and no operation is performed on the SN.
In the VNE problem, the current VNR is known but the next one is not. Therefore, before the next VNR arrives, the next state of the environment cannot be determined if the state representation contains the VNR. Hence, although the environment of the VNE problem includes the SN and multiple VNRs, as shown in Fig. 5, only the state of the SN is used to represent the environment.
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as follows:
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov.
For a Markov state, all that matters is the current state signal; its significance is independent of the path or history of signals that led to it. More generally, in the most general causal case, the response of the environment may depend on everything that happened before. In most RL problems the transition function is probabilistic, and in that case the dynamics can only be specified by the complete probability distribution:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k, r_k, s_{k-1}, a_{k-1}, ..., r_1, s_0, a_0}   (4)
On the other hand, if the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
3.4 Modeling VNE as a Markov decision process
A reinforcement learning task that satisfies the Markov property is called a Markov decision process (MDP). Since the VNE state given by the present invention is a Markov state, the decision process of the VNE problem can be ideally modeled as an MDP.
In an MDP, given any state s and action a, the probability of each possible next state s′ is expressed as:
P^a_{ss′} = Pr{s_{k+1} = s′ | s_k = s, a_k = a}   (6)
These quantities are called transition probabilities. Likewise, the expected value of the next reward is denoted:
R^a_{ss′} = E[r_{k+1} | s_k = s, a_k = a, s_{k+1} = s′]   (7)
From the RL point of view, the goal of VNE is to find an optimal policy that selects the optimal action at any time and in any state.
Definition: the policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s. The policy and the corresponding probability are denoted π and π(s, a).
Definition: given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S. V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue. Its formal definition is as follows:
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards.
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy π that obtains the maximum reward in the long run.
Definition: π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S.
Definition: the optimal value function is defined as V*(s) = max_π V^π(s).
Proposition: for the optimal value function V*(s), the following iterative expression holds:
V*(s) = max_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (9)
Proof:
Formula (9) expresses the relationship between the optimal value of the current state and the optimal values of the possible next states, and shows how the optimal value function yields the optimal action.
3.5 Approximating the optimal value function
In the present invention, a standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s), as shown in Fig. 6. Layers fc1 and fc2 have the same number of nodes, denoted H. The rectifier, probably the most commonly used activation function for deep neural networks as of 2018, is used as the activation function. The input of the neural network is the state s, as shown in formula (3). Through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s).
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate.
3.6 Solving the VNE problem with TD learning
In the learning process, V*(s) is computed through the neural network approximation. In VNE, given a VNR, the possible operations and the corresponding next states are known. Therefore P^a_{ss′} and R^a_{ss′} are deterministic and known, and the optimal action π*(s) can be computed by:
π*(s) = argmax_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (12)
However, the possible operation set is too large to traverse, so the search space must be reduced significantly. As shown in the algorithm GC_GRC below, a node-ranking metric (named GRC) is used to develop a probabilistic method that generates multiple node mapping candidates. The proposed algorithm is, however, independent of the GRC metric; two other metrics are also considered, namely the RW metric and a uniform value. The two algorithms that generate node mapping candidates with RW-based and uniform selection probabilities are GC_RW and GC_UNI, respectively. In algorithm GC_GRC, the parameter L is the number of node mapping candidates to generate.
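The common structure of the GC_GRC, GC_RW and GC_UNI candidate generators can be sketched as follows; this is an assumption-level illustration in which the ranking metric is passed in as a function, since the GRC and RW formulas themselves are not reproduced in this text, and the sampling-with-weights scheme is the illustrative part.

```python
# Sketch of probabilistic node-mapping candidate generation: for each virtual node,
# a substrate node is sampled with probability proportional to a ranking metric
# (GRC, RW, or a uniform value), restricted to nodes with enough remaining capacity.
import random

def generate_candidates(sn, vnr, metric, L=40):
    candidates = []
    for _ in range(L):
        mapping, used = {}, set()
        for vk, vk_attrs in vnr.nodes(data=True):
            feasible = [v for v, a in sn.nodes(data=True)
                        if v not in used and a["cpu"] >= vk_attrs["cpu"]]
            if not feasible:
                mapping = None
                break
            weights = [metric(sn, v) for v in feasible]
            vs = random.choices(feasible, weights=weights, k=1)[0]
            mapping[vk] = vs
            used.add(vs)
        if mapping:
            candidates.append(mapping)
    return candidates

def uniform_metric(sn, v):  # GC_UNI variant: every feasible node equally likely
    return 1.0
```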
In the present invention, an RL method, namely temporal difference (abbreviated TD) learning, is used to update the estimate of the optimal value function and to make embedding decisions according to that estimate. Specifically, TD learning updates its estimate of V*(s) as follows:
V*(s_k) ← V*(s_k) + α·[r_{k+1} + γ·V*(s_{k+1}) − V*(s_k)]   (13)
V*(s) is approximated by the neural network; combining this with the TD algorithm, the parameter update of formula (11) becomes:
θ ← θ + α·[r_{k+1} + γ·V(s_{k+1}, θ) − V(s_k, θ)]·∇_θ V(s_k, θ)   (14)
According to the above update rule, V*(s) and V(s) are in the TD process and the supervised learning process respectively, and the two proceed simultaneously.
The algorithm VNE-TD is the function that makes the embedding decision when a VNR arrives. In VNE-TD, the neural network parameters θ are initialized from a normal distribution. As shown in algorithm VNE-TD, the states fed to the neural network are the result states obtained by simulating the embedding of each node mapping candidate, and the candidate with the maximum value is selected to actually embed the VNR. After the node mapping is established, the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link. If splittable flows are allowed, the virtual links are mapped with a multi-commodity flow algorithm. According to expression (12), the matching j that maximizes r + γ·V(s_j^n) should be selected; because the reward (r = Rvn(VNR)) is identical for all candidates, the matching j that maximizes V(s_j^n) can be selected. After embedding the VNR, algorithm VNE-TD stores the triple <s_c, r, s_n> in memory, as shown in line 26. The maximum number of triples the memory can store is set to 1000, and the memory follows a FIFO (first in, first out) replacement rule. To make the training of the neural network smoother and better optimized, the parameters θ are updated in batches rather than in the single-step manner of expression (14): VNE-TD randomly samples a batch of triples from memory and trains the neural network with them. As in formula (14), the training error of one triple <s_c, r, s_n> is r + γ·V_k(s_n) − V_k(s_c), and the goal of the batch training process is to minimize the mean squared error (the loss) of the batch. As shown in line 2, VNE-TD can use any of the three algorithms GC_GRC, GC_RW or GC_UNI; the resulting algorithms are named VNE-TD-GRC, VNE-TD-RW and VNE-TD-UNI, respectively.
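The experience memory and batch update described above can be sketched like this; the FIFO capacity of 1000 and the batch size of 50 follow the text, while the helper names and the use of the Keras value network from the earlier sketch are assumptions.

```python
# Sketch of the FIFO experience memory and batch TD training used by VNE-TD:
# store <s_c, r, s_n> triples, sample a batch, and minimize the mean squared TD error.
import random
from collections import deque

import numpy as np
import tensorflow as tf

memory = deque(maxlen=1000)  # FIFO replacement once 1000 triples are stored

def remember(s_current, reward, s_next):
    memory.append((s_current, reward, s_next))

def train_batch(value_net, optimizer, gamma=1.0, batch_size=50):
    if len(memory) < batch_size:
        return None
    batch = random.sample(memory, batch_size)
    s_c = np.stack([b[0] for b in batch])
    r = np.array([b[1] for b in batch], dtype=np.float32)[:, None]
    s_n = np.stack([b[2] for b in batch])
    target = tf.stop_gradient(r + gamma * value_net(s_n))  # batched TD targets
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(target - value_net(s_c)))  # batch loss
    grads = tape.gradient(loss, value_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, value_net.trainable_variables))
    return float(loss)
```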
At the end of a VNR's lifetime, it leaves the SN and releases the resources previously allocated to it, so the state of the SN changes. However, the parameters of the neural network are not updated when a VNR leaves, only when one arrives.
Evaluation
1. Benchmarks and performance metrics
VNE-TD is compared with prior-art algorithms.
Three performance metrics are mainly used to compare VNE-TD with the other algorithms: (1) the blocking ratio, which is the number of blocked VNRs divided by the total number of VNRs; (2) the revenue per second, which is the total revenue obtained so far divided by the number of seconds elapsed; (3) the weighted average path length (abbreviated WAPL), which is the sum of all bandwidth actually allocated in the SN divided by the sum of the bandwidths of all VNR links, i.e. the weighted average length of all paths obtained by VNR link mapping.
2. Simulation settings
An event-driven simulation environment is implemented in Python. The neural network and its training are implemented with TensorFlow, a popular open-source software library for machine learning applications such as neural networks. In the simulations, the GT-ITM tool is used to randomly generate the topologies of the SN and the VNs. The SN has 60 nodes and 150 links. The number of VN nodes is uniformly distributed between 2 and 20, and the link connectivity between any two VN nodes is 0.2. 4000 VNRs need to be embedded in the SN. For both the SN and the VNs, the initial node capacities and link bandwidths are drawn randomly from uniform distributions with the same mean; the average node capacity and link bandwidth of the SN are 40 times those of the VNs. VNRs arrive one by one, forming a Poisson process with an average arrival rate of one request per second. The lifetime of the VNRs follows an exponential distribution with mean μ = 70 seconds. The parameters η and β in expression (1) of the revenue model are set to 1. The discount rate in formula (8) is set to 1, because setting γ = 1 was found to make the convergence of the neural network more stable and faster. For the neural network, the number of nodes H in the hidden layers is set to 300, the same size as the input of the neural network. The batch size assessed in the subsection below is empirically set to 50. The number of node mapping candidates (i.e. L) is set to 40. Unless otherwise stated, these parameters are not changed in the following subsections.
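The VNR arrival process and lifetimes of this setup can be generated with a short sketch like the following; it is illustrative only, GT-ITM topology generation is omitted, and the use of numpy for the random draws is an assumption.

```python
# Sketch of the simulated VNR workload: Poisson arrivals at 1 request per second
# (exponential inter-arrival times) and exponentially distributed lifetimes (mean 70 s).
import numpy as np

def generate_workload(num_vnrs=4000, arrival_rate=1.0, mean_lifetime=70.0, seed=0):
    rng = np.random.default_rng(seed)
    inter_arrivals = rng.exponential(1.0 / arrival_rate, size=num_vnrs)
    arrival_times = np.cumsum(inter_arrivals)                  # t_k for each VNR
    lifetimes = rng.exponential(mean_lifetime, size=num_vnrs)  # l_k for each VNR
    return list(zip(arrival_times, lifetimes))
```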
Except for subsection 4, each simulation series in the following subsections is run three times, each time using the same SN and VNR topologies described above but with different random sets of node capacities and link bandwidths. The standard deviation of the three runs is shown as error bars in the simulation results below.
1. Robustness to the GRC parameter d
In general, the computation of GRC is based on two factors, namely the node capacity and the connection capability with other nodes, and the GRC parameter d balances these two factors. Fig. 7(a) shows the blocking ratio of the different algorithms, and Fig. 7(b) shows the revenue per second. It can be seen from Fig. 7 that VNE-TD-GRC is insensitive to the parameter d, whereas the performance of GRC-VNE clearly depends on it. In addition, the deviation of GRC-VNE is very large when d is relatively small, while the deviation of VNE-TD-GRC is small and stable. Under the congested conditions of the simulation settings, the demand for link bandwidth is larger and more critical than that for node capacity. Therefore, for GRC-VNE, the parameter d needs to be tuned close to 1.00 to favor the connection-capability factor and almost ignore the node-capacity factor. In contrast, VNE-TD-GRC only uses the GRC metric to help narrow the search range and relies on the value function to make the final node mapping decision. This is why VNE-TD-GRC is insensitive to the parameter d compared with GRC-VNE. Obviously, this is a desirable property of VNE-TD-GRC, because VNRs are not known in advance and may change considerably over time.
Therefore, in the present invention the parameter d is set to 0.95 for VNE-TD-GRC and 0.995 for GRC-VNE.
2. Influence of TD learning
To show the influence of TD learning, the Rand-GRC algorithm (random selection with GRC) is compared with VNE-TD-GRC. Similar to VNE-TD-GRC, Rand-GRC uses algorithm GC-GRC to probabilistically generate L node mapping candidates. The difference is that it does not select the candidate with the maximum value V(s); instead, it randomly selects one candidate from all candidates that can be embedded successfully. This means that, compared with VNE-TD-GRC, Rand-GRC loses the ability to learn. In the simulations of this subsection, L is set to 10.
As can be seen from Fig. 8(a), although the node mapping is probabilistic, the blocking ratio of Rand-GRC is better than that of GRC-VNE thanks to having multiple candidates. This means that, even during training, VNE-TD-GRC can still perform better than GRC-VNE. Moreover, when TD learning is involved in selecting the best of the multiple candidates, the blocking ratio improves significantly, by 67.2% at 3900 compared with GRC-VNE. From Fig. 8(b), compared with GRC-VNE, the VNE-TD-GRC algorithm increases the revenue per second by 13.9% at 3900. Interestingly, Rand-GRC is almost as good as GRC-VNE in terms of revenue per second, although it is better in terms of blocking ratio; it seems that Rand-GRC is only good at embedding VNRs with lower revenue that are relatively easy to handle. From Fig. 8(c), due to the probabilistic node mapping, Rand-GRC noticeably worsens the WAPL compared with GRC-VNE, while VNE-TD-GRC effectively overcomes this drawback. This means that using TD learning helps improve the revenue per second by keeping both the blocking ratio and the WAPL low.
Fig. 9 shows how the loss changes as the number of training iterations increases. The loss is the mean squared error of the training batch, which is the minimization target of the training process. It can be seen from Fig. 9 that the loss converges to a local optimum at around the 700th training iteration, i.e. after the 700th VNR has been processed. At the local optimum the loss is about 400 (an error of about 20). The average reward is about 92, so the loss at the local optimum is relatively small, which may indicate that the proposed neural network achieves a good approximation.
3. Influence of workload
The influence of workload is shown by changing the mean lifetime of the VNRs from 40 seconds to 100 seconds. The algorithm LC-GRC (lowest GRC cost; in contrast, our algorithm selects the maximum value) is also added: it uses algorithm GC-GRC to generate L node mapping candidates and selects the candidate with the lowest cost in the SN.
It can be seen from Fig. 10 that, compared with the other algorithms, the three proposed VNE-TD algorithms show a consistent improvement in blocking ratio and revenue per second as the workload increases. In particular, compared with GRC-VNE and RW-MM-SP, the revenue per second of VNE-TD-GRC under the highest workload increases by 24.8% and 17.1%, respectively.
Algorithm VNE-TD-GRC performs best among the three versions of VNE-TD, while VNE-TD-UNI performs worst and has the largest deviation of the three. This means that the two metrics GRC and RW help VNE-TD focus on a more promising search region, although the improvement is not large. It also shows the potential of combining VNE-TD with other VNE algorithms.
4. Influence of parameter L
Fig. 11(a) and (b) show the influence of the number of node mapping candidates, i.e. the parameter L. They show that, compared with GRC-VNE, the improvements of VNE-TD-GRC in blocking ratio and revenue per second grow from 79.6% and 17.4% to 82.3% and 18.3%, respectively, as L increases from 40 to 60. According to the computational complexity of VNE-TD in section 3.7, increasing L from 40 to 60 does not lead to an unacceptable increase in computation time.
5. Influence of topology attributes
Fig. 12 shows the influence of the VN link connectivity. As the link connectivity increases, the degree of the VN nodes also increases, which means that embedding becomes more difficult. It can be seen from Fig. 12 that VNE-TD-GRC works better than GRC-VNE when the link connectivity is higher; when the link connectivity is 0.5, the revenue per second of VNE-TD-GRC is 23.1% higher than that of GRC-VNE.
Finally, it is noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the invention, and all such modifications should be covered by the scope of the claims of the present invention.

Claims (2)

1. A virtual network embedding algorithm based on temporal difference learning, characterized by comprising the following steps:
S101: Establish the VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links; each substrate node v_s ∈ V_s has a computing capacity c(v_s), and each substrate link e_s ∈ E_s has a bandwidth b(e_s);
VNR_k is modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links; each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k);
S102: Define the state
S102a: Define a reward function for VNE_k as in formula (1), where VNE_k denotes the processing of the k-th VNR:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where c(v) denotes the node capacity of node v, b(e) denotes the link bandwidth of link e, η denotes the unit price of computing resources, and β denotes the unit price of bandwidth resources; the immediate reward after processing VNR_k is therefore naturally defined as Rvn(k), i.e. r_k = Rvn(k);
S102b: Define the operation set of VNE: the set of all possible node mappings is defined as the operation set of VNE;
S102c: Define a Markov state for VNE:
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as shown in formula (3):
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov;
If the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
S103: Model VNE as a Markov decision process (MDP);
S103a: Define the policy and the value function: the policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s; given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S; V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue, and is used to measure the quality of the current state; it is defined as in formula (8):
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR_k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards;
S103b: Define the optimal value function:
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy that obtains the maximum reward in the long run;
π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S;
The optimal value function is defined as V*(s) = max_π V^π(s);
The optimal value function V*(s) satisfies the following iterative expression:
V*(s) = max_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (9)
S104: Approximate the optimal value function V*(s), i.e. the value function under the optimal policy, using a neural network:
A standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s); layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function; the input of the neural network is the state s, as shown in formula (3); through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s);
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate;
S105: In VNE, given a VNR, the possible operations and the corresponding next states are known; therefore the transition probabilities and expected rewards are deterministic and known; the matchings obtained by traversing the node mappings form the operation set; the result states obtained by simulating the embedding of each operation are fed as input to the neural network of S104, yielding the value of the optimal value function for each candidate; since the optimal policy π*(s) can be expressed as:
π*(s) = argmax_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (12)
that is, the candidate with the maximum value satisfies the optimal policy,
S106: The matching corresponding to the maximum value of the optimal value function is selected to actually embed the VNR, and then the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link.
2. The algorithm according to claim 1, characterized in that: in S105, before traversing the matchings of the node mapping, the following reduction is applied:
a probabilistic method is used to generate multiple node mapping candidates; using the RW metric and a uniform value, node mapping candidates are generated with RW-based and uniform selection probabilities.
CN201910527020.7A 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning Expired - Fee Related CN110233763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910527020.7A CN110233763B (en) 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910527020.7A CN110233763B (en) 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning

Publications (2)

Publication Number Publication Date
CN110233763A true CN110233763A (en) 2019-09-13
CN110233763B CN110233763B (en) 2021-06-18

Family

ID=67859663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910527020.7A Expired - Fee Related CN110233763B (en) 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning

Country Status (1)

Country Link
CN (1) CN110233763B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113193999A (en) * 2021-04-29 2021-07-30 东北大学 Virtual network mapping method based on depth certainty strategy gradient
WO2022186808A1 (en) * 2021-03-05 2022-09-09 Havelsan Hava Elektronik San. Ve Tic. A.S. Method for solving virtual network embedding problem in 5g and beyond networks with deep information maximization using multiple physical network structure

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103259744A (en) * 2013-03-26 2013-08-21 北京航空航天大学 Method for mapping mobile virtual network based on clustering
CN103457752A (en) * 2012-05-30 2013-12-18 中国科学院声学研究所 Virtual network mapping method
US20150195178A1 (en) * 2014-01-09 2015-07-09 Ciena Corporation Method for resource optimized network virtualization overlay transport in virtualized data center environments
CN108650191A (en) * 2018-04-20 2018-10-12 重庆邮电大学 The decision-making technique of mapping policy in a kind of virtualization network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103457752A (en) * 2012-05-30 2013-12-18 中国科学院声学研究所 Virtual network mapping method
CN103259744A (en) * 2013-03-26 2013-08-21 北京航空航天大学 Method for mapping mobile virtual network based on clustering
US20150195178A1 (en) * 2014-01-09 2015-07-09 Ciena Corporation Method for resource optimized network virtualization overlay transport in virtualized data center environments
CN108650191A (en) * 2018-04-20 2018-10-12 重庆邮电大学 The decision-making technique of mapping policy in a kind of virtualization network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022186808A1 (en) * 2021-03-05 2022-09-09 Havelsan Hava Elektronik San. Ve Tic. A.S. Method for solving virtual network embedding problem in 5g and beyond networks with deep information maximization using multiple physical network structure
CN113193999A (en) * 2021-04-29 2021-07-30 东北大学 Virtual network mapping method based on depth certainty strategy gradient
CN113193999B (en) * 2021-04-29 2023-12-26 东北大学 Virtual network mapping method based on depth deterministic strategy gradient

Also Published As

Publication number Publication date
CN110233763B (en) 2021-06-18

Similar Documents

Publication Publication Date Title
Seghir et al. A hybrid approach using genetic and fruit fly optimization algorithms for QoS-aware cloud service composition
Marden et al. Game theory and distributed control
CN108282587A (en) Mobile customer service dialogue management method under being oriented to strategy based on status tracking
CN104636801A (en) Transmission line audible noise prediction method based on BP neural network optimization
Chen et al. ALBRL: Automatic Load‐Balancing Architecture Based on Reinforcement Learning in Software‐Defined Networking
CN110233763A (en) A kind of virtual network embedded mobile GIS based on Timing Difference study
CN109067583A (en) A kind of resource prediction method and system based on edge calculations
CN110247795A (en) A kind of cloud net resource service chain method of combination and system based on intention
CN108898300A (en) The construction method of supply chain network risk cascade model
Dalgkitsis et al. Dynamic resource aware VNF placement with deep reinforcement learning for 5G networks
CN116390161A (en) Task migration method based on load balancing in mobile edge calculation
He et al. A-DDPG: Attention mechanism-based deep reinforcement learning for NFV
CN111510334B (en) Particle swarm algorithm-based VNF online scheduling method
Mguni et al. Ligs: Learnable intrinsic-reward generation selection for multi-agent learning
CN116669068A (en) GCN-based delay service end-to-end slice deployment method and system
CN112036651A (en) Electricity price prediction method based on quantum immune optimization BP neural network algorithm
Cheng et al. VNE-HRL: A proactive virtual network embedding algorithm based on hierarchical reinforcement learning
Dalgkitsis et al. Schema: Service chain elastic management with distributed reinforcement learning
Liu et al. Learning-based adaptive data placement for low latency in data center networks
Shefu et al. Fruit fly optimization algorithm for network-aware web service composition in the cloud
KR20220150126A (en) Coded and Incentive-based Mechanism for Distributed Training of Machine Learning in IoT
Liu et al. Contextual learning for content caching with unknown time-varying popularity profiles via incremental clustering
CN115022231B (en) Optimal path planning method and system based on deep reinforcement learning
CN113543160A (en) 5G slice resource allocation method and device, computing equipment and computer storage medium
Liu et al. Multi-objective robust workflow offloading in edge-to-cloud continuum

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210618