CN110233763A - Virtual network embedding algorithm based on temporal difference learning - Google Patents

Virtual network embedding algorithm based on temporal difference learning Download PDF

Info

Publication number
CN110233763A
Authority
CN
China
Prior art keywords
vne
node
state
vnr
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910527020.7A
Other languages
Chinese (zh)
Other versions
CN110233763B (en)
Inventor
王森
张标
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910527020.7A priority Critical patent/CN110233763B/en
Publication of CN110233763A publication Critical patent/CN110233763A/en
Application granted granted Critical
Publication of CN110233763B publication Critical patent/CN110233763B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 - Discovery or management of network topologies
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a virtual network embedding algorithm based on temporal difference learning. The method models the VNE problem as a Markov decision process (MDP) and builds a neural network to approximate the value function of VNE states. On this basis, an algorithm named VNE-TD, based on temporal difference learning (a reinforcement learning method), is proposed. In VNE-TD, multiple embedding candidates for the node mapping are generated probabilistically, and TD learning is used to evaluate the long-term potential of each candidate. Extensive simulation results show that the VNE-TD algorithm substantially outperforms previous algorithms in terms of both blocking ratio and revenue.

Description

Virtual network embedding algorithm based on temporal difference learning
Technical field
The present invention relates to computer networks, and in particular to a virtual network embedding algorithm based on temporal difference learning.
Background technique
In recent years, network virtualization has received extensive attention from both the research community and industry because it offers a very promising solution for future networks. It is regarded as a tool that can overcome the current Internet's resistance to fundamental change. In addition, network virtualization is a key enabler of cloud computing. The principal entity of network virtualization is the virtual network (VN). As shown in Fig. 1, a VN is a combination of virtual nodes and links on top of a substrate network (SN), where the numbers next to the nodes and links denote node capacity and link bandwidth, respectively. Virtual nodes are interconnected by virtual links that traverse one or more SN paths. By virtualizing the node and link resources of one SN, multiple VNs with widely different characteristics can be hosted simultaneously on the same physical hardware. Given a set of virtual network requests (VNRs), each with certain resource requirements on nodes and links, the problem of finding a specific subset of nodes and links in an SN to satisfy each VNR is called the virtual network embedding (VNE) problem. In most practical settings, the VNE problem must be handled as an online problem. That is, the VNRs are not known in advance; instead, they arrive at the system dynamically and may stay in the SN for some period of time. In practice, a VNE algorithm must handle VNRs as they arrive, rather than processing a batch of VNRs at once (offline VNE). When making online embedding decisions for VNRs, the usual goal is to maximize the long-term revenue, which makes the VNE problem even more challenging for the infrastructure provider (InP, usually the owner of the SN).
Summary of the invention
The technical problem to be solved by the present invention is to achieve a better balance between performance and computational complexity in virtual network embedding.
To achieve the above object, the present invention adopts the following technical scheme: a virtual network embedding algorithm based on temporal difference learning, comprising the following steps:
S101: Establish the VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links; each substrate node v_s ∈ V_s has a computing capacity c(v_s), and each substrate link e_s ∈ E_s has a bandwidth b(e_s);
VNR_k is modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links; each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k);
S102: Define the state
S102a: Define a reward function for VNE_k as in formula (1), where VNE_k denotes the processing of the k-th VNR:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where c(v) denotes the node capacity of node v, b(e) denotes the link bandwidth of link e, η denotes the unit price of computing resources, and β denotes the unit price of bandwidth resources; the immediate reward after processing VNR_k is therefore naturally defined as Rvn(k), i.e. r_k = Rvn(k);
S102b: Define the operation set of VNE: the set of all possible node mappings is defined as the operation set of VNE;
S102c: Define a Markov state for VNE:
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as shown in formula (3):
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov;
If the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
S103: Model VNE as a Markov decision process (MDP);
S103a: Define the policy and the value function: the policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s. Given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S; V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue, and is used to measure the quality of the current state; it is defined as in formula (8):
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR_k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards;
S103b: Define the optimal value function:
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy that obtains the maximum reward in the long run;
π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S;
The optimal value function is defined as V*(s) = max_π V^π(s);
The optimal value function V*(s) satisfies the following iterative expression:
V*(s) = max_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (9)
S104: Approximate the optimal value function V*(s), i.e. the value function under the optimal policy, using a neural network:
A standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s); layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function; the input of the neural network is the state s, as shown in formula (3); through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s);
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate;
S105: In VNE, given a VNR, the possible operations and the corresponding next states are known; therefore the transition probabilities and expected rewards are deterministic and known. The matchings obtained by traversing the node mappings form the operation set; the result states obtained by simulating the embedding of each operation are fed as input to the neural network of S104, yielding the value of the optimal value function for each candidate. Since the optimal policy π*(s) can be expressed as:
π*(s) = argmax_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (12)
that is, the candidate with the maximum value satisfies the optimal policy,
S106: The matching corresponding to the maximum value of the optimal value function is selected to actually embed the VNR, and then the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link.
As an improvement, in S105, before traversing the matchings of the node mapping, the following reduction is applied:
a probabilistic method is used to generate multiple node mapping candidates; using the RW metric and a uniform value, node mapping candidates are generated with RW-based and uniform selection probabilities.
Compared with the prior art, the present invention has at least the following advantages:
1. A neural network is used to approximate the value function of VNE states; for the VNE problem with its huge state space, the neural network helps generalize from previously experienced states to states that have never been seen.
2. Based on temporal difference learning, the passive balancing approach is abandoned in favor of active learning and decision-making based on previous experience, which overcomes the contradiction between online embedding decisions and the long-term objective, solves the resource allocation problem more effectively, and improves resource utilization.
Description of the drawings
Fig. 1 is an example of a VNE problem.
Fig. 2 is a sample topology.
Fig. 3 is an example of a VNE problem.
Fig. 4 shows the embedding result of the example.
Fig. 5 illustrates the VNE process in terms of RL concepts.
Fig. 6 shows the neural network that approximates the optimal value function.
Fig. 7(a) is the relation between blocking ratio and parameter d for the different algorithms; Fig. 7(b) is the relation between revenue per second and parameter d for the different algorithms.
Fig. 8(a) is the relation between blocking ratio and time for the different algorithms; Fig. 8(b) is the relation between revenue per second and time for the different algorithms; Fig. 8(c) is the relation between WAPL and time for the different algorithms.
Fig. 9 is the relation between the loss and the number of training iterations.
Fig. 10(a) is the relation between blocking ratio and workload for the different algorithms; Fig. 10(b) is the relation between revenue per second and workload for the different algorithms; Fig. 10(c) is the relation between WAPL and workload for the different algorithms.
Fig. 11(a) shows the influence of the number of node mapping candidates on the blocking ratio; Fig. 11(b) shows the influence of the number of node mapping candidates on the revenue per second.
Fig. 12(a) is the relation between blocking ratio and VNR link connectivity for the different algorithms; Fig. 12(b) is the relation between revenue per second and VNR link connectivity for the different algorithms.
Specific embodiment
The invention is described in further detail below.
The main challenge of the VNE problem is the contradiction between online decision-making and the pursuit of a long-term objective. The prior art attempts to overcome this challenge by balancing the SN workload, in the hope of being able to accommodate more future VNRs. However, the problem here is that the connection capability of a node is related to that of other nodes: consuming the connection capability of one node does not necessarily reduce only its own capability. In Fig. 3, the VNR needs to be embedded in the SN. Take the node-ranking metric of the prior art (named GRC) as an example. With the parameter d set to 0.85, the GRC values of the SN nodes are shown in Fig. 4 as "Original". In order to balance the SN workload, GRC-VNE selects the two nodes with the strongest combined GRC metric and connection capability, namely node B and node G, to match the two nodes in the VNR (node a and node b). The remaining GRC values are then shown in Fig. 4 as "After VNR embedded by GRC-VNE", and the variance of these values is 0.0032. In contrast, the VNE-TD algorithm proposed by the present invention selects node B and node C. The remaining GRC values are shown in Fig. 4 as "After VNR embedded by VNE-TD", and the variance of these values is 0.0016. This shows that the basic assumption of the prior art, namely that balancing the SN workload is beneficial, is problematic: it brings neither a more balanced workload nor more remaining resources.
A virtual network embedding algorithm based on temporal difference learning comprises the following steps:
S101: Establish the VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links; each substrate node v_s ∈ V_s has a computing capacity c(v_s) (e.g. CPU cycles), and each substrate link e_s ∈ E_s has a bandwidth b(e_s). The bottom of Fig. 1 gives an example of an SN; the numbers next to the nodes and links are their available resources.
VNR_k is modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links; each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k).
The top of Fig. 1 gives an example of a VNR. For VNR k, t_k is the arrival time of the VNR and a bounded value l_k is the lifetime of the VNR.
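To make the notation concrete, the following minimal Python sketch represents an SN and a VNR as weighted undirected graphs with node CPU capacities and link bandwidths, as in the model above. It is an illustration only, not the patent's implementation; the use of networkx and the attribute names "cpu" and "bw" are assumptions.

```python
# Minimal sketch of the VNE data model, assuming networkx as the graph library
# and 'cpu' / 'bw' as attribute names (illustrative choices, not from the patent).
import networkx as nx

def make_substrate_network():
    """Substrate network G_s(V_s, E_s): nodes carry CPU capacity, edges carry bandwidth."""
    sn = nx.Graph()
    sn.add_node("A", cpu=50.0)
    sn.add_node("B", cpu=40.0)
    sn.add_node("C", cpu=60.0)
    sn.add_edge("A", "B", bw=100.0)
    sn.add_edge("B", "C", bw=80.0)
    sn.add_edge("A", "C", bw=60.0)
    return sn

def make_vnr(arrival_time, lifetime):
    """Virtual network request G_k(V_k, E_k) with arrival time t_k and lifetime l_k."""
    vnr = nx.Graph(t_k=arrival_time, l_k=lifetime)
    vnr.add_node("a", cpu=5.0)
    vnr.add_node("b", cpu=8.0)
    vnr.add_edge("a", "b", bw=10.0)
    return vnr
```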
S102: Define the state
S102a: Define a reward function for VNE_k as in formula (1), where VNE_k denotes the processing of the k-th VNR:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where c(v) denotes the node capacity of node v, b(e) denotes the link bandwidth of link e, η denotes the unit price of computing resources, and β denotes the unit price of bandwidth resources;
The objective of the reward function is to maximize the long-term time-averaged revenue of the InP, as follows:
maximize lim_{T→∞} ( Σ_{k∈K_T} Rvn(k) ) / T   (2)
where K_T = {k | 0 < t_k < T} denotes the set of VNRs that arrive before time instant T;
The reward function is intended to provide an immediate measure of the benefit of a certain action in a given state. From formula (2), the goal of the VNE problem is to maximize the long-term time-averaged revenue of the InP. Therefore, the immediate reward after processing VNR k is naturally defined as Rvn(k), i.e. r_k = Rvn(k).
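As an illustration of formula (1), the following sketch computes Rvn(k) for a VNR built with the helpers assumed above; the function names are hypothetical, η and β default to 1 as in the simulation settings described later, and treating a blocked request as contributing zero reward is an assumption of this sketch.

```python
# Sketch of the revenue/reward of formula (1): Rvn(k) = eta * sum(node CPU) + beta * sum(link bandwidth).
def revenue(vnr, eta=1.0, beta=1.0):
    cpu_revenue = sum(attrs["cpu"] for _, attrs in vnr.nodes(data=True))
    bw_revenue = sum(attrs["bw"] for _, _, attrs in vnr.edges(data=True))
    return eta * cpu_revenue + beta * bw_revenue

def immediate_reward(vnr, embedded):
    # r_k = Rvn(k) when VNR k is embedded; a blocked VNR is assumed here to yield 0.
    return revenue(vnr) if embedded else 0.0
```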
S102b: Define the operation set of VNE: the set of all possible node mappings is defined as the operation set of VNE;
S102c: Define a Markov state for VNE:
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as shown in formula (3):
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov;
For a Markov state, all that matters is the current state signal; its significance is independent of the path or history of signals that led to it. More generally, in the most general causal case, the response of the environment may depend on everything that happened before. In most RL problems the transition function is probabilistic, and in that case the dynamics can only be specified by the complete probability distribution:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k, r_k, s_{k-1}, a_{k-1}, ..., r_1, s_0, a_0}   (4)
On the other hand, if the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
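The ordered state vector of formula (3) can be sketched as follows; this is an illustration only, the normalization constants and the "cpu"/"bw" residual-resource attribute names are assumptions carried over from the earlier sketch.

```python
# Sketch of the Markov state of formula (3): an ordered vector of normalized
# remaining node capacities followed by normalized remaining link bandwidths.
import numpy as np

def state_vector(sn, cpu_max, bw_max, node_order, edge_order):
    node_part = [sn.nodes[v]["cpu"] / cpu_max for v in node_order]
    edge_part = [sn.edges[e]["bw"] / bw_max for e in edge_order]
    return np.asarray(node_part + edge_part, dtype=np.float32)

# node_order and edge_order are fixed once (e.g. sorted lists of V_s and E_s)
# so that s_k is an *ordered* set, as formula (3) requires.
```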
S103: Model VNE as a Markov decision process (MDP);
S103a: Define the policy and the value function:
A reinforcement learning task that satisfies the Markov property is called a Markov decision process. Since the VNE state provided by the present invention is a Markov state, the decision process of the VNE problem can be ideally modeled as an MDP.
In an MDP, given any state s and action a, the probability of each possible next state s′ is expressed as:
P^a_{ss′} = Pr{s_{k+1} = s′ | s_k = s, a_k = a}   (6)
These quantities are called transition probabilities; likewise, the expected value of the next reward is denoted:
R^a_{ss′} = E[r_{k+1} | s_k = s, a_k = a, s_{k+1} = s′]   (7)
From the RL point of view, the goal of VNE is to find an optimal policy that selects the optimal action at any time and in any state;
The policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s; the policy and the corresponding probability are denoted π and π(s, a);
Almost all reinforcement learning algorithms are based on estimating a value function, a function of state, to estimate how good it is for the agent to be in a given state;
Given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S; V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue; its formal definition is given in formula (8):
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards.
S103b: Define the optimal value function:
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy that obtains the maximum reward in the long run;
π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S;
The optimal value function is defined as V*(s) = max_π V^π(s);
The optimal value function V*(s) satisfies the following iterative expression:
V*(s) = max_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (9)
S104: Solve for V(s) with a neural network so that V(s) approaches the optimal value function V*(s):
A standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s), as shown in Fig. 6. Layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function. The input of the neural network is the state s, as shown in formula (3); through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s).
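A minimal sketch of such a value network is shown below, assuming TensorFlow/Keras (which the evaluation section states was used) and a state dimension of |V_s| + |E_s|; the layer names and default H are illustrative.

```python
# Sketch of the value network of Fig. 6: two fully connected hidden layers (fc1, fc2)
# of H nodes with ReLU (rectifier) activation, and a single linear output V(s).
import tensorflow as tf

def build_value_network(state_dim, H=300):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(state_dim,)),
        tf.keras.layers.Dense(H, activation="relu", name="fc1"),
        tf.keras.layers.Dense(H, activation="relu", name="fc2"),
        tf.keras.layers.Dense(1, name="value"),  # scalar estimate of V*(s)
    ])
```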
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate;
S105: In VNE, given a VNR, the possible operations and the corresponding next states are known. Therefore P^a_{ss′} and R^a_{ss′} are deterministic and known. The matchings obtained by traversing the node mappings form the operation set; the result states obtained by simulating the embedding of each operation are fed as input to the neural network of S104, yielding the value of the optimal value function for each candidate. Since the optimal policy π*(s) can be expressed as:
π*(s) = argmax_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (12)
that is, the candidate with the maximum value satisfies the optimal policy.
S106: The node mapping corresponding to the maximum value of the value function is selected to actually embed the VNR, and then the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link.
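One possible sketch of the link mapping of S106 is given below. The patent text only specifies a shortest path over links with sufficient bandwidth; filtering the SN by residual bandwidth and taking a hop-count shortest path with networkx is an assumption of this illustration.

```python
# Sketch of bandwidth-constrained shortest-path link mapping (single-path case).
import networkx as nx

def map_virtual_link(sn, src, dst, bw_demand):
    """Return a shortest SN path from src to dst using only links with enough residual bandwidth."""
    feasible = nx.Graph()
    feasible.add_nodes_from(sn.nodes)
    feasible.add_edges_from(
        (u, v) for u, v, attrs in sn.edges(data=True) if attrs["bw"] >= bw_demand
    )
    try:
        return nx.shortest_path(feasible, src, dst)  # hop-count shortest path
    except nx.NetworkXNoPath:
        return None  # the VNR is blocked if any virtual link cannot be mapped
```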
The present invention uses an RL method, namely temporal difference (abbreviated TD) learning, to update the estimate of the optimal value function and to make embedding decisions according to that estimate. Specifically, TD learning updates its estimate of V*(s) as follows:
V*(s_k) ← V*(s_k) + α·[r_{k+1} + γ·V*(s_{k+1}) − V*(s_k)]   (13)
As mentioned above, V*(s) is approximated by the neural network; combining this with the TD algorithm, the parameter update of formula (11) becomes:
θ ← θ + α·[r_{k+1} + γ·V(s_{k+1}, θ) − V(s_k, θ)]·∇_θ V(s_k, θ)   (14)
According to the above update rule, V*(s) and V(s) are in the TD process and the supervised learning process respectively, and the two proceed simultaneously.
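A single update of formula (14) can be sketched as follows; the helper is hypothetical, it assumes the Keras value network sketched above, and it performs gradient descent on the squared TD error, whose gradient direction matches the update rule of (14) (the factor of 2 is absorbed into α).

```python
# Sketch of one TD(0) parameter update (formula (14)): the TD target r + gamma * V(s')
# is treated as a fixed label and the squared TD error is minimized by gradient descent.
import tensorflow as tf

def td_update(value_net, optimizer, s, r, s_next, gamma=1.0):
    s = tf.convert_to_tensor(s[None, :])            # batch of one state s_k
    s_next = tf.convert_to_tensor(s_next[None, :])  # result state s_{k+1}
    target = tf.stop_gradient(r + gamma * value_net(s_next))  # no gradient through the target
    with tf.GradientTape() as tape:
        v = value_net(s)
        loss = tf.reduce_mean(tf.square(target - v))  # squared TD error
    grads = tape.gradient(loss, value_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, value_net.trainable_variables))
    return float(loss)
```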
The algorithm VNE-TD is the function that makes the embedding decision when a VNR arrives. As shown in algorithm VNE-TD, the states fed to the neural network are the result states obtained by simulating the embedding of each node mapping candidate, and the candidate with the maximum value is selected to actually embed the VNR. After the node mapping is established, the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link. If splittable flows are allowed, the virtual links are mapped with the same multi-commodity flow algorithm as in [12]. According to expression (12), the matching j that maximizes r + γ·V(s_j^n) should be selected; because the reward (r = Rvn(VNR)) is identical for all candidates, the matching j that maximizes V(s_j^n) can be selected. At the end of a VNR's lifetime, it leaves the SN and releases the resources previously allocated to it, so the state of the SN changes. However, the parameters of the neural network are not updated when a VNR leaves, only when one arrives.
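The decision step of VNE-TD can be sketched as follows; this is an illustrative outline, and the candidate generation, simulated embedding and state encoding are the hypothetical helpers sketched elsewhere in this description rather than the patent's exact implementation.

```python
# Sketch of the VNE-TD embedding decision: evaluate V(s) on the result state of each
# node mapping candidate and embed the VNR with the candidate of maximum value.
def vne_td_decide(sn, vnr, value_net, generate_candidates, simulate_embedding, encode_state):
    best = None
    for mapping in generate_candidates(sn, vnr):           # L probabilistic candidates
        result_sn = simulate_embedding(sn, vnr, mapping)   # tentative node + link mapping
        if result_sn is None:                              # candidate cannot be embedded
            continue
        value = float(value_net(encode_state(result_sn)[None, :]))
        if best is None or value > best[0]:
            best = (value, mapping, result_sn)
    if best is None:
        return None          # all candidates failed: the VNR is blocked
    return best[1], best[2]  # chosen node mapping and resulting SN state
```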
As an improvement, in S105, when determining the maximum of the optimal value function, the possible operation set is too large to traverse, so the operation set is first reduced as follows: a probabilistic method is used to generate multiple node mapping candidates; using the RW metric and a uniform value, node mapping candidates are generated with RW-based and uniform selection probabilities.
The method of the present invention is described in detail as follows:
Table 1: symbols and notation used in the present invention
1.1 VNE model
The SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links. Each substrate node v_s ∈ V_s has a computing capacity c(v_s) (e.g. CPU cycles), and each substrate link e_s ∈ E_s has a bandwidth b(e_s). The bottom of Fig. 1 gives an example of an SN; the numbers next to the nodes and links are their available resources.
1.1.1 Virtual network requests
A VNR k is likewise modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links. Each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k). The top of Fig. 1 gives an example of a VNR. For VNR k, t_k is the arrival time of the VNR and a bounded value l_k is the lifetime of the VNR.
2.2 The VNE process
For VNR k, the VNE process consists of the following two key components, namely node mapping and link mapping.
2.2.1 Node mapping
Node mapping can be described as a one-to-one mapping M_N: V_k → V_s such that, for M_N(v_k) = v_s, v_k ∈ V_k and v_s ∈ V_s, the following two conditions must be satisfied: (1) if v_k ≠ v_k′ then M_N(v_k) ≠ M_N(v_k′); (2) c(v_k) does not exceed the remaining capacity of M_N(v_k). The first constraint ensures that any two nodes of a VNR are mapped to two different SN nodes; the second constraint requires that each VN node is mapped to an SN node with sufficient node capacity.
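The two node-mapping constraints can be checked with a short sketch such as the following; the helper name is hypothetical and the attribute names follow the earlier assumed data model.

```python
# Sketch of the node-mapping feasibility check: the mapping must be injective (constraint 1)
# and every virtual node must fit on its substrate node's remaining capacity (constraint 2).
def node_mapping_feasible(sn, vnr, mapping):
    substrate_targets = list(mapping.values())
    if len(substrate_targets) != len(set(substrate_targets)):
        return False  # two virtual nodes mapped onto the same SN node
    return all(
        vnr.nodes[vk]["cpu"] <= sn.nodes[vs]["cpu"] for vk, vs in mapping.items()
    )
```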
2.2.2 Link mapping
In the link mapping stage, for a virtual link in the VNR, a set of paths between the two mapped nodes must be found in the SN whose total available bandwidth is no less than the requirement of the virtual link. In the present invention, only single-path mapping is considered, i.e. a virtual link can only be mapped to a single SN path. In the case of single-path mapping, link mapping can be represented by a mapping M_L: E_k → P_s, where P_s is the set of all paths of G_s. For M_L(e_k) = p_s, every link on the path p_s must have remaining bandwidth no less than b(e_k).
The VNE problem must be handled as an online problem: VNRs arrive at the system dynamically, and the VNE algorithm must handle them as they arrive.
2.3 VNE revenue model and objective
The VNE revenue model is similar to those commonly used; the revenue generated for the InP is given by:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where η and β denote the unit prices of computing resources and bandwidth resources, respectively.
The objective is to maximize the long-term time-averaged revenue of the InP, as follows:
maximize lim_{T→∞} ( Σ_{k∈K_T} Rvn(k) ) / T   (2)
where K_T = {k | 0 < t_k < T} denotes the set of VNRs that arrive before time instant T.
3. Fitting VNE into the RL model
RL studies how a learning agent maps situations to actions so as to maximize a numerical reward signal. As shown in Fig. 5, the agent is the subject of learning and the environment is the object of learning. The agent is able to perform actions; performing an action may leave the agent in the current state or cause a transition to another state in the state space. The transition function may be probabilistic or deterministic. As a result of the agent's action, the environment produces a reward for the agent. Normally, the value of the reward is computed by a preset reward function, which is used to drive the reinforcement process of the agent.
The purpose of the reward is to provide an immediate measure of the benefit of a certain action in a particular state. The reward for each action depends on whether the new state is better than the current state. Over time, the agent tries to learn the optimal action to perform in each particular state, i.e. the action that maximizes the long-term total return. In RL this involves a value function, which accumulates the relevant immediate returns over a bounded horizon to indicate what is best in the long run.
As described in more detail below, on the one hand the goal of the VNE problem is to maximize the long-term time-averaged revenue of the InP; on the other hand, an embedding decision must be made immediately when a VNR appears, according to the current situation and previous experience. The VNE problem thus combines a long-term objective with online decision-making, which provides a good setting for the involvement of RL. Fig. 5 illustrates how the VNE problem is fitted into the RL model. For the VNE problem, from the RL perspective the SN and the continuously arriving VNRs together constitute the environment. In the VNE problem, the processing of one VNR forms one RL step. For VNR k+1, the VNE agent provides an embedding decision a_k based on the current state s_k (which may include all previous states), previous experience and rewards. After taking action a_k, the environment provides the result state s_{k+1} and the reward r_{k+1}.
3.2 Defining a reward function for VNE
As mentioned above, the reward function is intended to provide an immediate measure of the benefit of a certain action in a given state. From formula (2), the goal of the VNE problem is to maximize the long-term time-averaged revenue of the InP. Therefore, the immediate reward after processing VNR k is naturally defined as Rvn(k), i.e. r_k = Rvn(k).
Obviously, this reward function can easily be adapted to other VNE objectives, which means that solving the VNE problem with RL is very flexible. For example, if the objective of VNE is to minimize the blocking ratio, the reward can be set to 1 if the VNR is embedded successfully and 0 otherwise.
3.3 Defining an operation set and a Markov state for VNE
How states and actions are defined is the key factor affecting RL performance. In the present invention, the operation set of VNE is defined as the set of all possible node mappings. If embedding according to a node mapping action is unsuccessful, the VNR is blocked and no operation is performed on the SN.
In the VNE problem, the current VNR is known but the next one is not. Therefore, before the next VNR arrives, the next state of the environment cannot be determined if the state representation contains the VNR. Hence, although the environment of the VNE problem includes the SN and multiple VNRs, as shown in Fig. 5, only the state of the SN is used to represent the environment.
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as follows:
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov.
For a Markov state, all that matters is the current state signal; its significance is independent of the path or history of signals that led to it. More generally, in the most general causal case, the response of the environment may depend on everything that happened before. In most RL problems the transition function is probabilistic, and in that case the dynamics can only be specified by the complete probability distribution:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k, r_k, s_{k-1}, a_{k-1}, ..., r_1, s_0, a_0}   (4)
On the other hand, if the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
3.4 Modeling VNE as a Markov decision process
A reinforcement learning task that satisfies the Markov property is called a Markov decision process (MDP). Since the VNE state given by the present invention is a Markov state, the decision process of the VNE problem can be ideally modeled as an MDP.
In an MDP, given any state s and action a, the probability of each possible next state s′ is expressed as:
P^a_{ss′} = Pr{s_{k+1} = s′ | s_k = s, a_k = a}   (6)
These quantities are called transition probabilities. Likewise, the expected value of the next reward is denoted:
R^a_{ss′} = E[r_{k+1} | s_k = s, a_k = a, s_{k+1} = s′]   (7)
From the RL point of view, the goal of VNE is to find an optimal policy that selects the optimal action at any time and in any state.
Definition: the policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s. The policy and the corresponding probability are denoted π and π(s, a).
Definition: given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S. V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue. Its formal definition is as follows:
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards.
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy π that obtains the maximum reward in the long run.
Definition: π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S.
Definition: the optimal value function is defined as V*(s) = max_π V^π(s).
Proposition: for the optimal value function V*(s), the following iterative expression holds:
V*(s) = max_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (9)
Proof:
Formula (9) expresses the relationship between the optimal value of the current state and the optimal values of the possible next states, and shows how the optimal value function yields the optimal action.
3.5 Approximating the optimal value function
In the present invention, a standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s), as shown in Fig. 6. Layers fc1 and fc2 have the same number of nodes, denoted H. The rectifier, probably the most commonly used activation function for deep neural networks as of 2018, is used as the activation function. The input of the neural network is the state s, as shown in formula (3). Through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s).
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate.
3.6 Solving the VNE problem with TD learning
In the learning process, V*(s) is computed through the neural network approximation. In VNE, given a VNR, the possible operations and the corresponding next states are known. Therefore P^a_{ss′} and R^a_{ss′} are deterministic and known, and the optimal action π*(s) can be computed by:
π*(s) = argmax_a Σ_{s′} P^a_{ss′} [R^a_{ss′} + γ·V*(s′)]   (12)
However, the possible operation set is too large to traverse, so the search space must be reduced significantly. As shown in the algorithm GC_GRC below, a node-ranking metric (named GRC) is used to develop a probabilistic method that generates multiple node mapping candidates. The proposed algorithm is, however, independent of the GRC metric; two other metrics are also considered, namely the RW metric and a uniform value. The two algorithms that generate node mapping candidates with RW-based and uniform selection probabilities are GC_RW and GC_UNI, respectively. In algorithm GC_GRC, the parameter L is the number of node mapping candidates to generate.
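The common structure of the GC_GRC, GC_RW and GC_UNI candidate generators can be sketched as follows; this is an assumption-level illustration in which the ranking metric is passed in as a function, since the GRC and RW formulas themselves are not reproduced in this text, and the sampling-with-weights scheme is the illustrative part.

```python
# Sketch of probabilistic node-mapping candidate generation: for each virtual node,
# a substrate node is sampled with probability proportional to a ranking metric
# (GRC, RW, or a uniform value), restricted to nodes with enough remaining capacity.
import random

def generate_candidates(sn, vnr, metric, L=40):
    candidates = []
    for _ in range(L):
        mapping, used = {}, set()
        for vk, vk_attrs in vnr.nodes(data=True):
            feasible = [v for v, a in sn.nodes(data=True)
                        if v not in used and a["cpu"] >= vk_attrs["cpu"]]
            if not feasible:
                mapping = None
                break
            weights = [metric(sn, v) for v in feasible]
            vs = random.choices(feasible, weights=weights, k=1)[0]
            mapping[vk] = vs
            used.add(vs)
        if mapping:
            candidates.append(mapping)
    return candidates

def uniform_metric(sn, v):  # GC_UNI variant: every feasible node equally likely
    return 1.0
```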
In the present invention, an RL method, namely temporal difference (abbreviated TD) learning, is used to update the estimate of the optimal value function and to make embedding decisions according to that estimate. Specifically, TD learning updates its estimate of V*(s) as follows:
V*(s_k) ← V*(s_k) + α·[r_{k+1} + γ·V*(s_{k+1}) − V*(s_k)]   (13)
V*(s) is approximated by the neural network; combining this with the TD algorithm, the parameter update of formula (11) becomes:
θ ← θ + α·[r_{k+1} + γ·V(s_{k+1}, θ) − V(s_k, θ)]·∇_θ V(s_k, θ)   (14)
According to the above update rule, V*(s) and V(s) are in the TD process and the supervised learning process respectively, and the two proceed simultaneously.
The algorithm VNE-TD is the function that makes the embedding decision when a VNR arrives. In VNE-TD, the neural network parameters θ are initialized from a normal distribution. As shown in algorithm VNE-TD, the states fed to the neural network are the result states obtained by simulating the embedding of each node mapping candidate, and the candidate with the maximum value is selected to actually embed the VNR. After the node mapping is established, the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link. If splittable flows are allowed, the virtual links are mapped with a multi-commodity flow algorithm. According to expression (12), the matching j that maximizes r + γ·V(s_j^n) should be selected; because the reward (r = Rvn(VNR)) is identical for all candidates, the matching j that maximizes V(s_j^n) can be selected. After embedding the VNR, algorithm VNE-TD stores the triple <s_c, r, s_n> in memory, as shown in line 26. The maximum number of triples the memory can store is set to 1000, and the memory follows a FIFO (first in, first out) replacement rule. To make the training of the neural network smoother and better optimized, the parameters θ are updated in batches rather than in the single-step manner of expression (14): VNE-TD randomly samples a batch of triples from memory and trains the neural network with them. As in formula (14), the training error of one triple <s_c, r, s_n> is r + γ·V_k(s_n) − V_k(s_c), and the goal of the batch training process is to minimize the mean squared error (the loss) of the batch. As shown in line 2, VNE-TD can use any of the three algorithms GC_GRC, GC_RW or GC_UNI; the resulting algorithms are named VNE-TD-GRC, VNE-TD-RW and VNE-TD-UNI, respectively.
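The experience memory and batch update described above can be sketched like this; the FIFO capacity of 1000 and the batch size of 50 follow the text, while the helper names and the use of the Keras value network from the earlier sketch are assumptions.

```python
# Sketch of the FIFO experience memory and batch TD training used by VNE-TD:
# store <s_c, r, s_n> triples, sample a batch, and minimize the mean squared TD error.
import random
from collections import deque

import numpy as np
import tensorflow as tf

memory = deque(maxlen=1000)  # FIFO replacement once 1000 triples are stored

def remember(s_current, reward, s_next):
    memory.append((s_current, reward, s_next))

def train_batch(value_net, optimizer, gamma=1.0, batch_size=50):
    if len(memory) < batch_size:
        return None
    batch = random.sample(memory, batch_size)
    s_c = np.stack([b[0] for b in batch])
    r = np.array([b[1] for b in batch], dtype=np.float32)[:, None]
    s_n = np.stack([b[2] for b in batch])
    target = tf.stop_gradient(r + gamma * value_net(s_n))  # batched TD targets
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(target - value_net(s_c)))  # batch loss
    grads = tape.gradient(loss, value_net.trainable_variables)
    optimizer.apply_gradients(zip(grads, value_net.trainable_variables))
    return float(loss)
```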
At the end of a VNR's lifetime, it leaves the SN and releases the resources previously allocated to it, so the state of the SN changes. However, the parameters of the neural network are not updated when a VNR leaves, only when one arrives.
Evaluation
1. Benchmarks and performance metrics
VNE-TD is compared with prior-art algorithms.
Three performance metrics are mainly used to compare VNE-TD with the other algorithms: (1) the blocking ratio, which is the number of blocked VNRs divided by the total number of VNRs; (2) the revenue per second, which is the total revenue obtained so far divided by the number of seconds elapsed; (3) the weighted average path length (abbreviated WAPL), which is the sum of all bandwidth actually allocated in the SN divided by the sum of the bandwidths of all VNR links, i.e. the weighted average length of all paths obtained by VNR link mapping.
2. Simulation settings
An event-driven simulation environment is implemented in Python. The neural network and its training are implemented with TensorFlow, a popular open-source software library for machine learning applications such as neural networks. In the simulations, the GT-ITM tool is used to randomly generate the topologies of the SN and the VNs. The SN has 60 nodes and 150 links. The number of VN nodes is uniformly distributed between 2 and 20, and the link connectivity between any two VN nodes is 0.2. 4000 VNRs need to be embedded in the SN. For both the SN and the VNs, the initial node capacities and link bandwidths are drawn randomly from uniform distributions with the same mean; the average node capacity and link bandwidth of the SN are 40 times those of the VNs. VNRs arrive one by one, forming a Poisson process with an average arrival rate of one request per second. The lifetime of the VNRs follows an exponential distribution with mean μ = 70 seconds. The parameters η and β in expression (1) of the revenue model are set to 1. The discount rate in formula (8) is set to 1, because setting γ = 1 was found to make the convergence of the neural network more stable and faster. For the neural network, the number of nodes H in the hidden layers is set to 300, the same size as the input of the neural network. The batch size assessed in the subsection below is empirically set to 50. The number of node mapping candidates (i.e. L) is set to 40. Unless otherwise stated, these parameters are not changed in the following subsections.
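The VNR arrival process and lifetimes of this setup can be generated with a short sketch like the following; it is illustrative only, GT-ITM topology generation is omitted, and the use of numpy for the random draws is an assumption.

```python
# Sketch of the simulated VNR workload: Poisson arrivals at 1 request per second
# (exponential inter-arrival times) and exponentially distributed lifetimes (mean 70 s).
import numpy as np

def generate_workload(num_vnrs=4000, arrival_rate=1.0, mean_lifetime=70.0, seed=0):
    rng = np.random.default_rng(seed)
    inter_arrivals = rng.exponential(1.0 / arrival_rate, size=num_vnrs)
    arrival_times = np.cumsum(inter_arrivals)                  # t_k for each VNR
    lifetimes = rng.exponential(mean_lifetime, size=num_vnrs)  # l_k for each VNR
    return list(zip(arrival_times, lifetimes))
```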
Except for subsection 4, each simulation series in the following subsections is run three times, each time using the same SN and VNR topologies described above but with different random sets of node capacities and link bandwidths. The standard deviation of the three runs is shown as error bars in the simulation results below.
1. Robustness to the GRC parameter d
In general, the computation of GRC is based on two factors, namely the node capacity and the connection capability with other nodes, and the GRC parameter d balances these two factors. Fig. 7(a) shows the blocking ratio of the different algorithms, and Fig. 7(b) shows the revenue per second. It can be seen from Fig. 7 that VNE-TD-GRC is insensitive to the parameter d, whereas the performance of GRC-VNE clearly depends on it. In addition, the deviation of GRC-VNE is very large when d is relatively small, while the deviation of VNE-TD-GRC is small and stable. Under the congested conditions of the simulation settings, the demand for link bandwidth is larger and more critical than that for node capacity. Therefore, for GRC-VNE, the parameter d needs to be tuned close to 1.00 to favor the connection-capability factor and almost ignore the node-capacity factor. In contrast, VNE-TD-GRC only uses the GRC metric to help narrow the search range and relies on the value function to make the final node mapping decision. This is why VNE-TD-GRC is insensitive to the parameter d compared with GRC-VNE. Obviously, this is a desirable property of VNE-TD-GRC, because VNRs are not known in advance and may change considerably over time.
Therefore, in the present invention the parameter d is set to 0.95 for VNE-TD-GRC and 0.995 for GRC-VNE.
2. Influence of TD learning
To show the influence of TD learning, the Rand-GRC algorithm (random selection with GRC) is compared with VNE-TD-GRC. Similar to VNE-TD-GRC, Rand-GRC uses algorithm GC-GRC to probabilistically generate L node mapping candidates. The difference is that it does not select the candidate with the maximum value V(s); instead, it randomly selects one candidate from all candidates that can be embedded successfully. This means that, compared with VNE-TD-GRC, Rand-GRC loses the ability to learn. In the simulations of this subsection, L is set to 10.
As can be seen from Fig. 8(a), although the node mapping is probabilistic, the blocking ratio of Rand-GRC is better than that of GRC-VNE thanks to having multiple candidates. This means that, even during training, VNE-TD-GRC can still perform better than GRC-VNE. Moreover, when TD learning is involved in selecting the best of the multiple candidates, the blocking ratio improves significantly, by 67.2% at 3900 compared with GRC-VNE. From Fig. 8(b), compared with GRC-VNE, the VNE-TD-GRC algorithm increases the revenue per second by 13.9% at 3900. Interestingly, Rand-GRC is almost as good as GRC-VNE in terms of revenue per second, although it is better in terms of blocking ratio; it seems that Rand-GRC is only good at embedding VNRs with lower revenue that are relatively easy to handle. From Fig. 8(c), due to the probabilistic node mapping, Rand-GRC noticeably worsens the WAPL compared with GRC-VNE, while VNE-TD-GRC effectively overcomes this drawback. This means that using TD learning helps improve the revenue per second by keeping both the blocking ratio and the WAPL low.
Fig. 9 shows how the loss changes as the number of training iterations increases. The loss is the mean squared error of the training batch, which is the minimization target of the training process. It can be seen from Fig. 9 that the loss converges to a local optimum at around the 700th training iteration, i.e. after the 700th VNR has been processed. At the local optimum the loss is about 400 (an error of about 20). The average reward is about 92, so the loss at the local optimum is relatively small, which may indicate that the proposed neural network achieves a good approximation.
3. Influence of workload
The influence of workload is shown by changing the mean lifetime of the VNRs from 40 seconds to 100 seconds. The algorithm LC-GRC (lowest GRC cost; in contrast, our algorithm selects the maximum value) is also added: it uses algorithm GC-GRC to generate L node mapping candidates and selects the candidate with the lowest cost in the SN.
It can be seen from Fig. 10 that, compared with the other algorithms, the three proposed VNE-TD algorithms show a consistent improvement in blocking ratio and revenue per second as the workload increases. In particular, compared with GRC-VNE and RW-MM-SP, the revenue per second of VNE-TD-GRC under the highest workload increases by 24.8% and 17.1%, respectively.
Algorithm VNE-TD-GRC performs best among the three versions of VNE-TD, while VNE-TD-UNI performs worst and has the largest deviation of the three. This means that the two metrics GRC and RW help VNE-TD focus on a more promising search region, although the improvement is not large. It also shows the potential of combining VNE-TD with other VNE algorithms.
4. Influence of parameter L
Fig. 11(a) and (b) show the influence of the number of node mapping candidates, i.e. the parameter L. They show that, compared with GRC-VNE, the improvements of VNE-TD-GRC in blocking ratio and revenue per second grow from 79.6% and 17.4% to 82.3% and 18.3%, respectively, as L increases from 40 to 60. According to the computational complexity of VNE-TD in section 3.7, increasing L from 40 to 60 does not lead to an unacceptable increase in computation time.
5. Influence of topology attributes
Fig. 12 shows the influence of the VN link connectivity. As the link connectivity increases, the degree of the VN nodes also increases, which means that embedding becomes more difficult. It can be seen from Fig. 12 that VNE-TD-GRC works better than GRC-VNE when the link connectivity is higher; when the link connectivity is 0.5, the revenue per second of VNE-TD-GRC is 23.1% higher than that of GRC-VNE.
Finally, it is noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the invention, and all such modifications should be covered by the scope of the claims of the present invention.

Claims (2)

1. A virtual network embedding algorithm based on temporal difference learning, characterized by comprising the following steps:
S101: Establish the VNE model
The substrate network SN is modeled as a weighted undirected graph, denoted G_s(V_s, E_s), where V_s is the set of substrate nodes and E_s is the set of substrate links; each substrate node v_s ∈ V_s has a computing capacity c(v_s), and each substrate link e_s ∈ E_s has a bandwidth b(e_s);
VNR_k is modeled as an undirected graph, denoted G_k(V_k, E_k), where V_k is the set of virtual nodes and E_k is the set of virtual links; each virtual node v_k ∈ V_k has a computing requirement c(v_k), and each virtual link e_k ∈ E_k has a bandwidth requirement b(e_k);
S102: Define the state
S102a: Define a reward function for VNE_k as in formula (1), where VNE_k denotes the processing of the k-th VNR:
Rvn(k) = η·Σ_{v∈V_k} c(v) + β·Σ_{e∈E_k} b(e)   (1)
where c(v) denotes the node capacity of node v, b(e) denotes the link bandwidth of link e, η denotes the unit price of computing resources, and β denotes the unit price of bandwidth resources; the immediate reward after processing VNR_k is therefore naturally defined as Rvn(k), i.e. r_k = Rvn(k);
S102b: Define the operation set of VNE: the set of all possible node mappings is defined as the operation set of VNE;
S102c: Define a Markov state for VNE:
The state s_k is represented by the normalized remaining node capacities and link bandwidths of the SN, written ĉ(v_s) and b̂(e_s); s_k is an ordered set, as shown in formula (3):
s_k = (ĉ(v_1), ..., ĉ(v_{|V_s|}), b̂(e_1), ..., b̂(e_{|E_s|}))   (3)
In RL, a state signal that succeeds in retaining all relevant information is said to be Markov;
If the state signal has the Markov property, the environment's response at step k+1 depends only on the state and action at step k; in this case, the dynamics of the environment are fully determined by specifying only the following:
Pr{s_{k+1}=s′, r_{k+1}=r | s_k, a_k}   (5)
S103: Model VNE as a Markov decision process (MDP);
S103a: Define the policy and the value function: the policy of the VNE agent is a mapping from each state s and action a to the probability of taking action a in state s; given a policy π, the value function of VNE is a function of the VNE state, written V^π(s), s ∈ S; V^π(s) can be viewed as the potential of state s to accommodate future VNRs and generate long-term revenue, and is used to measure the quality of the current state; it is defined as in formula (8):
V^π(s) = E_π[R_k | s_k = s]   (8)
where R_k is the sum of all rewards from VNR_k onwards, R_k = Σ_{i=0}^∞ γ^i r_{k+i+1}, and γ is the discount rate that determines the present value of future rewards;
S103b: Define the optimal value function:
The purpose of studying the VNE problem from the RL perspective is to find an optimal policy that obtains the maximum reward in the long run;
π* is an optimal policy if and only if, for any given policy π, π* ≥ π, which means that V^{π*}(s) ≥ V^π(s) for all s, s ∈ S;
The optimal value function is defined as V*(s) = max_π V^π(s);
The optimal value function V*(s) satisfies the following iterative expression:
V*(s) = max_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (9)
S104: Approximate the optimal value function V*(s), i.e. the value function under the optimal policy, using a neural network:
A standard feed-forward neural network with 2 fully connected (fc) layers is used to approximate the optimal value function V*(s); layers fc1 and fc2 have the same number of nodes, denoted H, and use the rectifier as the activation function; the input of the neural network is the state s, as shown in formula (3); through its computation, the neural network takes the state s as input and outputs a value V(s) that is expected to approximate V*(s);
The supervised learning of the approximating function V(s) is a process of adjusting the neural network parameters θ, the purpose of which is to minimize the difference between V(s) and V*(s), which can be expressed as:
min_θ [V*(s) − V(s, θ)]²   (10)
As the RL process proceeds, V*(s_k) is regarded as the sample for the parallel supervised learning of the approximating function V(s); according to gradient descent, for VNR k the parameters θ are updated as follows:
θ ← θ + α·[V*(s_k) − V(s_k, θ)]·∇_θ V(s_k, θ)   (11)
where α is a positive step-size parameter that controls the learning rate;
S105: In VNE, given a VNR, the possible operations and the corresponding next states are known; therefore the transition probabilities and expected rewards are deterministic and known; the matchings obtained by traversing the node mappings form the operation set; the result states obtained by simulating the embedding of each operation are fed as input to the neural network of S104, yielding the value of the optimal value function for each candidate; since the optimal policy π*(s) can be expressed as:
π*(s) = argmax_a E[r_{k+1} + γ·V*(s_{k+1}) | s_k = s, a_k = a]   (12)
that is, the candidate with the maximum value satisfies the optimal policy,
S106: The matching corresponding to the maximum value of the optimal value function is selected to actually embed the VNR, and then the shortest path between two SN nodes with sufficient bandwidth is found to match each VN link.
2. The algorithm according to claim 1, characterized in that: in S105, before traversing the matchings of the node mapping, the following reduction is applied:
a probabilistic method is used to generate multiple node mapping candidates; using the RW metric and a uniform value, node mapping candidates are generated with RW-based and uniform selection probabilities.
CN201910527020.7A 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning Expired - Fee Related CN110233763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910527020.7A CN110233763B (en) 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910527020.7A CN110233763B (en) 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning

Publications (2)

Publication Number Publication Date
CN110233763A true CN110233763A (en) 2019-09-13
CN110233763B CN110233763B (en) 2021-06-18

Family

ID=67859663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910527020.7A Expired - Fee Related CN110233763B (en) 2019-07-19 2019-07-19 Virtual network embedding algorithm based on time sequence difference learning

Country Status (1)

Country Link
CN (1) CN110233763B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113193999A (en) * 2021-04-29 2021-07-30 东北大学 Virtual network mapping method based on depth certainty strategy gradient
WO2022186808A1 (en) * 2021-03-05 2022-09-09 Havelsan Hava Elektronik San. Ve Tic. A.S. Method for solving virtual network embedding problem in 5g and beyond networks with deep information maximization using multiple physical network structure

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103259744A (en) * 2013-03-26 2013-08-21 北京航空航天大学 Method for mapping mobile virtual network based on clustering
CN103457752A (en) * 2012-05-30 2013-12-18 中国科学院声学研究所 Virtual network mapping method
US20150195178A1 (en) * 2014-01-09 2015-07-09 Ciena Corporation Method for resource optimized network virtualization overlay transport in virtualized data center environments
CN108650191A (en) * 2018-04-20 2018-10-12 重庆邮电大学 The decision-making technique of mapping policy in a kind of virtualization network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103457752A (en) * 2012-05-30 2013-12-18 中国科学院声学研究所 Virtual network mapping method
CN103259744A (en) * 2013-03-26 2013-08-21 北京航空航天大学 Method for mapping mobile virtual network based on clustering
US20150195178A1 (en) * 2014-01-09 2015-07-09 Ciena Corporation Method for resource optimized network virtualization overlay transport in virtualized data center environments
CN108650191A (en) * 2018-04-20 2018-10-12 重庆邮电大学 The decision-making technique of mapping policy in a kind of virtualization network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022186808A1 (en) * 2021-03-05 2022-09-09 Havelsan Hava Elektronik San. Ve Tic. A.S. Method for solving virtual network embedding problem in 5g and beyond networks with deep information maximization using multiple physical network structure
CN113193999A (en) * 2021-04-29 2021-07-30 东北大学 Virtual network mapping method based on depth certainty strategy gradient
CN113193999B (en) * 2021-04-29 2023-12-26 东北大学 Virtual network mapping method based on depth deterministic strategy gradient

Also Published As

Publication number Publication date
CN110233763B (en) 2021-06-18

Similar Documents

Publication Publication Date Title
Seghir et al. A hybrid approach using genetic and fruit fly optimization algorithms for QoS-aware cloud service composition
Marden et al. Game theory and distributed control
CN108282587A (en) Mobile customer service dialogue management method under being oriented to strategy based on status tracking
CN104636801A (en) Transmission line audible noise prediction method based on BP neural network optimization
Chen et al. ALBRL: Automatic Load‐Balancing Architecture Based on Reinforcement Learning in Software‐Defined Networking
CN110233763A (en) A kind of virtual network embedded mobile GIS based on Timing Difference study
CN109067583A (en) A kind of resource prediction method and system based on edge calculations
CN110247795A (en) A kind of cloud net resource service chain method of combination and system based on intention
CN108898300A (en) The construction method of supply chain network risk cascade model
Dalgkitsis et al. Dynamic resource aware VNF placement with deep reinforcement learning for 5G networks
CN116390161A (en) Task migration method based on load balancing in mobile edge calculation
He et al. A-DDPG: Attention mechanism-based deep reinforcement learning for NFV
CN111510334B (en) Particle swarm algorithm-based VNF online scheduling method
Mguni et al. Ligs: Learnable intrinsic-reward generation selection for multi-agent learning
CN116669068A (en) GCN-based delay service end-to-end slice deployment method and system
CN112036651A (en) Electricity price prediction method based on quantum immune optimization BP neural network algorithm
Cheng et al. VNE-HRL: A proactive virtual network embedding algorithm based on hierarchical reinforcement learning
Dalgkitsis et al. Schema: Service chain elastic management with distributed reinforcement learning
Liu et al. Learning-based adaptive data placement for low latency in data center networks
Shefu et al. Fruit fly optimization algorithm for network-aware web service composition in the cloud
KR20220150126A (en) Coded and Incentive-based Mechanism for Distributed Training of Machine Learning in IoT
Liu et al. Contextual learning for content caching with unknown time-varying popularity profiles via incremental clustering
CN115022231B (en) Optimal path planning method and system based on deep reinforcement learning
CN113543160A (en) 5G slice resource allocation method and device, computing equipment and computer storage medium
Liu et al. Multi-objective robust workflow offloading in edge-to-cloud continuum

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210618