CN101835239A

CN101835239A - Multi-path delay sensing optimal route selecting method for cognitive network

Info

Publication number: CN101835239A
Application number: CN201010120758A
Authority: CN
Inventors: 盛敏; 乐天助; 史琰; 李建东; 李红艳; 龙春燕
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2010-03-09
Filing date: 2010-03-09
Publication date: 2010-09-15
Anticipated expiration: 2030-03-09
Also published as: CN101835239B

Abstract

The invention discloses a multi-path delay-aware optimal route selection method for cognitive networks, comprising the following steps: classifying services into different priority classes; establishing multiple paths through route discovery; using the end-to-end delay recorded during route discovery as the initial Q value; updating the Q value of each path with a Q-learning algorithm, incorporating estimates of node queueing delay and channel contention delay in each update; selecting an active path according to the Q values to send data packets; using the Q-learning algorithm to reduce routing control packet overhead; and restarting the process when none of the paths can meet the QoS requirement of the service. The method delivers high-priority services quickly, achieves short path delay and high routing efficiency, and increases the load the network can carry. It can be used in cognitive wireless networks.

Description

Multi-path delay sensing optimal route selecting method for cognitive network

Technical field

The invention belongs to the field of wireless communication technology and relates to a cognitive multi-path optimal route selection method for use in cognitive wireless networks.

Background technology

By adopting a suitable learning mechanism, such as a reinforcement learning algorithm, a cognitive network can sense the current state of the network even when complete environmental information is unavailable, reconfigure network parameters according to the perceived state, and thereby adapt to a constantly changing network environment and improve network performance. Q-learning is a reinforcement learning algorithm that uses rewards from the environment to find and execute optimal actions when the environment model is unknown. The paper "Cognitive Network Management with Reinforcement Learning for Wireless Mesh Networks" proposed a wireless routing method that applies Q-learning to reduce routing control overhead. The method autonomously learns and predicts the network state and configures the amount of control overhead appropriate to the perceived quality of that state, thereby reducing network overhead. However, the paths it uses are still the fixed routes of a conventional routing algorithm, so it cannot adaptively select the optimal path according to the current end-to-end delay, load, and other network conditions.

The paper "Packet Routing In Dynamically Changing Networks: A Reinforcement Learning Approach" proposed a routing method that applies Q-learning to path selection. The method is self-learning: it perceives different network loads and learns the minimum-delay path under each load, achieving relatively low delay under high load. However, its learning of delay is still coarse and does not capture the current network state accurately, so the accuracy of its sensing results is low.

Summary of the invention

The objective of the invention is to overcome the shortcomings of the above prior art by combining the advantages of the two methods, providing a multi-path delay-aware optimal route selection method for cognitive networks. The method establishes multiple paths, uses Q-learning to sense in real time the path delay for each service priority class, and dynamically selects the best route among the multiple paths for sending data packets according to the sensing results. When sensing path delay, it incorporates estimates of the channel contention delay and of the queueing delay of packets of each service class, improving the accuracy of the sensing results.

The objective of the invention is achieved as follows:

I. Definitions of terms

Cognitive Hello packet: a packet used to query a neighbor node for its Q value.

Cognitive Echo packet: a packet used to return a Q value to a neighbor node.

RREQ packet: the route request packet sent by the source node.

RREP packet: the route reply packet sent by the destination node.

Hello packet: a packet used to notify neighbor nodes of this node's existence.

Metric: the route cost.

Activated path: the path among the multiple paths that is actually used to transmit data packets.

II. Steps of the invention

The technical solution for achieving the objective of the invention comprises the following steps:

(1) classify the services in the network into different priorities according to service type: video is the highest priority, voice the second priority, and data the lowest priority;

(2) add to the routing table Q-value fields that hold the path-delay estimates for the three service priorities (video, voice, and data), and initialize them to zero;

(3) add to the routing table activation-flag fields for the three service priorities (video, voice, and data), and initialize them to "false";

(4) add to the route reply (RREP) packet a path-number field that carries the path's sequence number, and initialize it to zero;

(5) add a Cognitive Hello query packet and a Cognitive Echo reply packet for exchanging Q values between nodes;

(6) the source node in the network sends an RREQ packet by broadcast flooding and records the time at which it sent this RREQ packet;

(7) an intermediate node that receives the RREQ packet decides how to proceed according to whether it is the destination node: if it is, it replies with an RREP packet; otherwise, it forwards the RREQ packet and records the time at which it forwarded this RREQ packet;

(8) when the destination node receives an RREQ packet, it reads the source node from the packet and decides whether to reply according to the number of RREQ packets from that source node it has received in the current route discovery: if this number is greater than or equal to 3, it discards the RREQ packet; otherwise, it replies with an RREP packet;

(9) an intermediate node that receives the RREP packet reads the path-number field in the RREP packet, estimates the delay to the destination node, and records it in the routing-table Q-value field for that path number as the initial Q value for every priority;

(10) the source node that receives the RREP packet estimates the delay to the destination node, reads the path-number field in the RREP packet, and records the delay estimate in the routing-table Q-value field for that path number as the initial Q value for every priority, which completes route establishment between the source node and the destination node;

(11) each node on the established paths exchanges Q values through Cognitive Hello query packets and Cognitive Echo reply packets, and updates the Q values of the different priorities in its routing table by the following formula:

Q_{t}^{x} {(d, y)}_{i} = (1 - α) Q_{t - 1}^{x} {(d, y)}_{i} + α (W_{i} + T_{contention} + \min_{z} Q_{t - 1}^{y} {(d, z)}_{i})

where α is the learning factor, 0 < α < 1;

the subscript i of the Q value is the service priority of the packet, i = 1, 2, 3;

d denotes the destination node, x the current node, y a neighbor of x, and z a neighbor of y;

t denotes the current time and t-1 the previous time instant;

Q_{t-1}^{y}(d, z)_{i} is the Q value from neighbor y to the destination node at the previous instant, read from the Cognitive Echo packet;

Q_{t-1}^{x}(d, y)_{i} is the Q value from this node to the destination node at the previous instant, read from the routing table;

Q_{t}^{x}(d, y)_{i} is the updated Q value from this node to the destination node;

W_{i} is the estimated queueing time of a priority-i packet in a non-preemptive priority queueing system:

W_{i} = \frac{\sum_{k = 1}^{3} λ_{k} \overline{X_{k}^{2}}}{2 (1 - ρ_{1} - \cdots - ρ_{i - 1}) (1 - ρ_{1} - \cdots - ρ_{i})},

where i is the packet priority, i = 1, 2, 3 (for i = 1 the first bracket in the denominator equals 1); k is an integer, k = 1, 2, 3; λ_{k} is the arrival rate of priority-k packets; \overline{X_{k}^{2}} is the second moment of the service time of priority-k packets; and ρ_{i} is the utilization due to priority-i packets;

T_{contention} is the estimated mean time consumed by channel contention on an IEEE 802.11 channel,

T_{contention} = (1 - P_{tr}) σ + P_{tr} P_{s} T_{s} + P_{tr} (1 - P_{s}) T_{c},

where σ is the slot length; T_{s} is the time the channel is busy with a successful transmission; T_{c} is the time the channel is busy with a collision; n is the number of neighbors of this node; τ is the probability that a node transmits in a given slot; P_{tr} = 1 - (1 - τ)^{n} is the probability that at least one of the n neighbors transmits in a given slot; and P_{s} is the probability that a transmission in a given slot is successful,

P_{s} = \frac{nτ {(1 - τ)}^{n - 1}}{1 - {(1 - τ)}^{n}};

(12) according to the Q values updated in real time, the source node selects one of the multiple paths as the activated path for sending data packets;

(13) the source node uses the Q-learning algorithm to reset the hello period and the route lifetime of this route;

(14) if none of the Q values in the source node's routing table can satisfy the QoS delay requirement of the priority of the packet to be sent, return to step (6) and initiate a new route discovery.
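The queueing term W_i used in step (11) has the form of the standard mean waiting time of a non-preemptive M/G/1 priority queue. The Python sketch below evaluates it, assuming each node already has per-class estimates of the arrival rate λ_k, the second moment \overline{X_k^2}, and the mean service time. The mean service time is an assumption: the text only calls ρ_i a utilization, and the sketch forms it as ρ_k = λ_k times the mean service time. The function name and all sample numbers are hypothetical.

```python
import math
from typing import Sequence


def queueing_delay(i: int, lam: Sequence[float], x1: Sequence[float],
                   x2: Sequence[float]) -> float:
    """Mean waiting time W_i of a priority-i packet (1 = video, 2 = voice,
    3 = data) in a non-preemptive M/G/1 priority queue.

    lam[k]: arrival rate of priority-(k+1) packets (packets/s)
    x1[k]:  mean service time of priority-(k+1) packets (s); ASSUMED input,
            used only to form the utilisation rho_k = lam_k * x1_k
    x2[k]:  second moment of the service time of priority-(k+1) packets (s^2)
    """
    if not 1 <= i <= len(lam):
        raise ValueError("priority must be between 1 and the number of classes")
    rho = [l * s for l, s in zip(lam, x1)]
    # Numerator: sum over all classes of lambda_k * E[X_k^2]
    residual = sum(l * m2 for l, m2 in zip(lam, x2))
    higher = sum(rho[: i - 1])        # rho_1 + ... + rho_{i-1} (0 when i == 1)
    up_to_i = higher + rho[i - 1]     # rho_1 + ... + rho_i
    if up_to_i >= 1.0:
        return math.inf               # this class is saturated; delay unbounded
    return residual / (2.0 * (1.0 - higher) * (1.0 - up_to_i))


lam = [10.0, 20.0, 30.0]          # hypothetical arrival rates
x1 = [0.01, 0.01, 0.01]           # hypothetical mean service times
x2 = [0.0002, 0.0002, 0.0002]     # hypothetical second moments
print([round(queueing_delay(i, lam, x1, x2), 4) for i in (1, 2, 3)])
# prints [0.0067, 0.0095, 0.0214]
```

With identical service statistics, the video class waits least and the data class most, which is the ordering the priority classes of step (1) are meant to produce.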

Compared with the prior art, the invention has the following advantages:

1. Because each source-destination pair establishes and maintains multiple paths simultaneously, and the Q-learning algorithm learns and predicts the delay of these paths, the best path is selected dynamically from the multiple paths according to the learning and prediction results. The routing method can therefore adjust the path in use as the network environment changes, keeping network delay low;

2. Because the delay of data of each priority is learned and predicted separately, a suitable path can be chosen for data of each priority, so that the network allocates resources reasonably among priorities, balances network load, and improves network performance;

3. Because a prediction of node queue length is added to the Q-value update, the influence of node queue length on delay is learned and predicted, so nodes avoid nodes with long queues when selecting paths, reducing path delay;

4. Because a prediction of channel contention delay is added to the Q-value update, nodes avoid regions of intense channel contention when selecting paths, which reduces path delay, alleviates congestion to some extent, and balances the load of nodes in the network;

5. Because the Q-learning algorithm is used to predict the network state and set the route lifetime and hello period accordingly, network overhead can be effectively reduced.

Description of the drawings

Fig. 1 is a block diagram of the cognitive routing principle of the invention;

Fig. 2 is a schematic diagram of a routing table entry of the invention;

Fig. 3 is a schematic diagram of the route request (RREQ) packet of the invention;

Fig. 4 is a schematic diagram of the route reply (RREP) packet of the invention;

Fig. 5 is a schematic diagram of the Cognitive Hello query packet of the invention;

Fig. 6 is a schematic diagram of the Cognitive Echo reply packet of the invention;

Fig. 7 is a schematic diagram of an example of the invention.

Embodiment

Referring to Fig. 1, the specific implementation of the invention comprises the following steps:

Step 1: classify the services in the network into different classes.

Services in the network are classified into different priorities according to service type: video is assigned the highest priority, voice the second priority, and data the lowest priority. These services have different quality-of-service (QoS) requirements on the network.

Step 2: add Q-value fields and activation-flag fields to the routing table, add a path-number field to the RREP packet, and introduce Cognitive Hello and Cognitive Echo packets for exchanging Q values between nodes.

The Q-value field is divided into three sub-fields holding the Q values of the three service priorities (video, voice, and data), each initialized to zero;

The activation-flag field is likewise divided into three sub-fields holding this path's activation flags for the three service priorities (video, voice, and data), each initialized to "false".
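The fields added in step 2 can be pictured as the data structures below. This is a minimal Python sketch, not the layout of Figs. 2, 5 and 6: the class names, field names, and the `answer_hello` helper are hypothetical. One further assumption is labeled in the code: the replying node reports, for each priority, the smallest Q value over its own paths to the destination, because that is the quantity the min_z term of the Q-value update consumes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PRIORITIES = (1, 2, 3)  # 1 = video, 2 = voice, 3 = data


@dataclass
class PathEntry:
    """One of the multiple paths to a destination (hypothetical layout)."""
    path_no: int
    next_hop: str
    # Q-value sub-fields, one per priority, initialised to zero
    q: Dict[int, float] = field(default_factory=lambda: {p: 0.0 for p in PRIORITIES})
    # activation-flag sub-fields, one per priority, initialised to False
    active: Dict[int, bool] = field(default_factory=lambda: {p: False for p in PRIORITIES})


@dataclass
class CognitiveHello:
    src: str  # address of the querying node
    dst: str  # destination node of the path


@dataclass
class CognitiveEcho:
    src: str              # address of the replying node
    to: str               # node that sent the Cognitive Hello
    dst: str              # destination node of the path
    q: Dict[int, float]   # per-priority Q values toward dst


def answer_hello(me: str, table: Dict[str, List[PathEntry]],
                 hello: CognitiveHello) -> CognitiveEcho:
    """Build the Cognitive Echo reply to a Cognitive Hello query.

    ASSUMPTION: report, per priority, the smallest Q value over this node's
    paths to dst; the destination itself reports zero remaining delay.
    """
    if me == hello.dst:
        q = {p: 0.0 for p in PRIORITIES}
    else:
        paths = table.get(hello.dst, [])
        if not paths:
            raise KeyError(f"{me} has no route to {hello.dst}")
        q = {p: min(e.q[p] for e in paths) for p in PRIORITIES}
    return CognitiveEcho(src=me, to=hello.src, dst=hello.dst, q=q)


# Node B holds two paths to D and answers a query from S.
table = {"D": [PathEntry(0, "A"), PathEntry(1, "C")]}
table["D"][0].q.update({1: 0.04, 2: 0.05, 3: 0.06})
table["D"][1].q.update({1: 0.03, 2: 0.07, 3: 0.02})
echo = answer_hello("B", table, CognitiveHello(src="S", dst="D"))
print(echo.q)  # {1: 0.03, 2: 0.05, 3: 0.02}
```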

Step 3: the source node in the network sends an RREQ packet by broadcast flooding.

When sending the RREQ packet, the source node records the time at which it sent the RREQ; the broadcast uses the channel provided by the IEEE 802.11 standard.

Step 4: intermediate nodes forward the RREQ packet.

An intermediate node decides how to proceed according to whether it is the destination node: if it is, it replies with an RREP packet; otherwise, it forwards the RREQ packet and records the time at which it forwarded the RREQ.

Step 5: the destination node replies with RREP packets.

(5a) When the destination node receives an RREQ packet, it reads the source node from the packet and decides whether to reply according to the number of RREQ packets from that source node it has received in the current route discovery: if this number is greater than or equal to 3, it discards the RREQ packet; otherwise, it replies with an RREP packet;

(5b) When replying with an RREP, the destination node writes its current reply count into the path-number field of the RREP; this count is the number of the path established by this reply;

(5c) After replying with the RREP, the destination node increments the reply count by 1.
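Steps 5a to 5c amount to a small per-source counter at the destination. The Python sketch below is illustrative only. It assumes the counter starts at zero, so paths are numbered 0, 1, 2, which matches the RREP path-number field being initialized to zero. It also counts replies rather than received RREQs, which is equivalent when every accepted RREQ is answered. How the destination recognizes the start of a new route discovery is not stated in the text, so `start_discovery` is a hypothetical hook; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAX_REPLIES = 3  # at most three RREPs (paths) per source per route discovery


@dataclass
class RREP:
    source: str       # source node that issued the RREQ
    destination: str  # replying destination node
    path_no: int      # path-number field (step 5b)


@dataclass
class DestinationNode:
    addr: str
    # reply count per source node for the current route discovery
    replies: Dict[str, int] = field(default_factory=dict)

    def start_discovery(self, source: str) -> None:
        """Reset the counter when a new discovery from `source` begins (ASSUMED trigger)."""
        self.replies[source] = 0

    def on_rreq(self, source: str) -> Optional[RREP]:
        count = self.replies.get(source, 0)
        if count >= MAX_REPLIES:
            return None                              # step 5a: discard the RREQ
        rrep = RREP(source, self.addr, path_no=count)  # step 5b: count becomes path number
        self.replies[source] = count + 1             # step 5c: increment the reply count
        return rrep


d = DestinationNode("D")
print([d.on_rreq("S").path_no for _ in range(3)], d.on_rreq("S"))  # [0, 1, 2] None
```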

Step 6: intermediate nodes forward the RREP packet and update their routing-table entries for the destination node.

(6a) The intermediate node reads the path-number field in the RREP packet;

(6b) The node subtracts the previously recorded time at which it forwarded the corresponding RREQ from the time at which it received the RREP, and divides the result by 2 to obtain the estimated delay from the intermediate node to the destination node;

(6c) The delay estimate is recorded in the routing-table Q-value field for this path number as the initial value of the Q-value sub-field of every priority.

Step 7: the source node processes the RREP packet and updates its routing-table entry for the destination node, completing path establishment and Q-value initialization.

(7a) Read the path-number field in the RREP packet;

(7b) Increment the path count by 1 each time an RREP packet is received;

(7c) Subtract the previously recorded time at which the corresponding RREQ was sent from the time at which the RREP was received, and divide by 2 to obtain the estimated delay from the source node to the destination node;

(7d) Record the delay estimate in the routing-table Q-value field for this path number as the initial value of the Q-value sub-field of every priority.
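Steps 6 and 7 seed every priority's Q value with half of the RREQ-to-RREP round trip. The following is a minimal Python sketch, assuming timestamps in seconds taken from one local clock. `initial_q` and `SourceDiscovery` are hypothetical names, not part of the patent.

```python
from typing import Dict

PRIORITIES = (1, 2, 3)  # 1 = video, 2 = voice, 3 = data


def initial_q(t_rreq: float, t_rrep: float) -> Dict[int, float]:
    """Steps 6b-6c / 7c-7d: half the RREQ-to-RREP interval seeds every priority's Q value."""
    if t_rrep < t_rreq:
        raise ValueError("RREP cannot arrive before the RREQ was sent")
    one_way = (t_rrep - t_rreq) / 2.0
    return {p: one_way for p in PRIORITIES}


class SourceDiscovery:
    """Bookkeeping a source node might keep during one route discovery (hypothetical)."""

    def __init__(self, t_rreq: float) -> None:
        self.t_rreq = t_rreq                        # time the RREQ was broadcast (step 3)
        self.path_count = 0                         # step 7b
        self.q: Dict[int, Dict[int, float]] = {}    # path number -> per-priority Q values

    def on_rrep(self, path_no: int, t_rrep: float) -> None:
        self.path_count += 1
        self.q[path_no] = initial_q(self.t_rreq, t_rrep)


s = SourceDiscovery(t_rreq=10.0)
s.on_rrep(0, 10.04)   # first path: about 20 ms one-way estimate
s.on_rrep(1, 10.09)   # second path: about 45 ms one-way estimate
```

Halving the round trip assumes the forward and reverse legs take similar time. The estimate is only a starting point, since step 8 refines it with queueing and contention terms.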

Step 8: update the path Q values with the Q-learning algorithm.

(8a) On each established path, every node constructs a Cognitive Hello packet once per second, writes its own address into the source-node field of the packet and the path's destination address into the destination-node field, and sends the packet to the next-hop node on the path to query its Q value toward the destination. A node receiving a Cognitive Hello packet reads the packet's source node and the path's destination node, constructs a Cognitive Echo packet containing the Q values of the different priorities recorded in its local routing table for that destination, and returns the Cognitive Echo packet to the source of the Cognitive Hello packet. Finally, each node that receives a Cognitive Echo packet reads the Q values from it;

(8b) After obtaining the Q values of its neighbor, each node updates its Q values as follows:

First, estimate the queueing time of a priority-i packet in the non-preemptive priority queueing system:

W_{i} = \frac{\sum_{k = 1}^{3} λ_{k} \overline{X_{k}^{2}}}{2 (1 - ρ_{1} - \cdots - ρ_{i - 1}) (1 - ρ_{1} - \cdots - ρ_{i})}

where i is the packet priority, i = 1, 2, 3; k is an integer, k = 1, 2, 3; λ_{k} is the arrival rate of priority-k packets; \overline{X_{k}^{2}} is the second moment of the service time of priority-k packets; and ρ_{i} is the utilization due to priority-i packets;

Second, estimate the mean time consumed by channel contention on an IEEE 802.11 channel:

T_{contention} = (1 - P_{tr}) σ + P_{tr} P_{s} T_{s} + P_{tr} (1 - P_{s}) T_{c}

where σ is the slot length; T_{s} is the time the channel is busy with a successful transmission; T_{c} is the time the channel is busy with a collision; n is the number of neighbors of this node; τ is the probability that a node transmits in a given slot; P_{tr} = 1 - (1 - τ)^{n} is the probability that at least one of the n neighbors transmits in a given slot; and P_{s} is the probability that a transmission in a given slot is successful:

P_{s} = \frac{nτ {(1 - τ)}^{n - 1}}{1 - {(1 - τ)}^{n}};

Finally, substitute W_{i} and T_{contention} into the Q-value update formula and update the Q value as follows:

Q_{t}^{x} {(d, y)}_{i} = (1 - α) Q_{t - 1}^{x} {(d, y)}_{i} + α (W_{i} + T_{contention} + \min_{z} Q_{t - 1}^{y} {(d, z)}_{i})

where α is the learning factor, 0 < α < 1; the subscript i of the Q value is the packet priority, i = 1, 2, 3; d denotes the destination node; x the current node; y a neighbor of x; z a neighbor of y; t the current time; and t-1 the previous time instant. Q_{t-1}^{y}(d, z)_{i} is the Q value from neighbor y to the destination node at the previous instant, read from the Cognitive Echo packet; Q_{t-1}^{x}(d, y)_{i} is the Q value from this node to the destination node at the previous instant, read from the routing table; Q_{t}^{x}(d, y)_{i} is the updated Q value from this node to the destination node;

(8c) Write the updated Q value into the routing-table Q-value field for this path.
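Step 8b combines the contention estimate, the queueing estimate and the neighbor's echoed Q values in one Q-learning update. The Python sketch below evaluates T_contention with the 802.11 formula above and then applies the update rule. W_i is passed in as a number (it would come from the non-preemptive priority-queue formula). The function names are hypothetical, τ is assumed to be known (the text does not say how it is estimated), and the 20 µs slot and 1 ms / 1.2 ms busy times in the example are illustrative values only.

```python
from typing import Sequence


def contention_time(n: int, tau: float, slot: float, t_s: float, t_c: float) -> float:
    """Mean channel-contention time on an 802.11 channel, as in step 8b.

    n: number of neighbours of this node
    tau: probability that a node transmits in a given slot
    slot: slot length sigma (s)
    t_s, t_c: channel busy time of a successful transmission / a collision (s)
    """
    if n < 0 or not 0.0 <= tau <= 1.0:
        raise ValueError("need n >= 0 and 0 <= tau <= 1")
    p_tr = 1.0 - (1.0 - tau) ** n          # P_tr: at least one neighbour transmits
    if p_tr == 0.0:
        return slot                          # nobody transmits: only the idle term remains
    # P_s: exactly one node transmits, given that at least one does
    p_s = n * tau * (1.0 - tau) ** (n - 1) / p_tr
    return (1.0 - p_tr) * slot + p_tr * p_s * t_s + p_tr * (1.0 - p_s) * t_c


def update_q(q_prev: float, alpha: float, w_i: float, t_contention: float,
             neighbour_q: Sequence[float]) -> float:
    """One Q-learning step: (1 - alpha) * Q_{t-1} + alpha * (W_i + T_contention + min_z Q^y_{t-1})."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("learning factor must satisfy 0 < alpha < 1")
    target = w_i + t_contention + min(neighbour_q)
    return (1.0 - alpha) * q_prev + alpha * target


# Hypothetical numbers: one neighbour, tau = 0.5, 20 us slot, 1 ms success, 1.2 ms collision.
t_cont = contention_time(n=1, tau=0.5, slot=20e-6, t_s=1e-3, t_c=1.2e-3)
q_new = update_q(q_prev=0.05, alpha=0.5, w_i=0.01, t_contention=t_cont,
                 neighbour_q=[0.03, 0.02])
```

Here `neighbour_q` holds neighbor y's Q values toward d for priority i over its next hops z, so `min(neighbour_q)` plays the role of min_z Q_{t-1}^{y}(d, z)_i. A larger α weights the fresh estimate more heavily than the stored Q value.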

Step 9: the source node selects one path as the activated path according to the Q values updated in real time, and uses it to send data packets.

(9a) Initially, the path established first between the source node and the destination node is used as the activated path for transmitting data packets: all three activation flags under this path number in the routing table are set to "true", and all activation flags under the other path numbers are set to "false";

(9b) Among the established paths, each path maintains Q values for the three service priorities (video, voice, and data). The source node compares the Q values of the different paths separately for each service priority. If, over two consecutive updates, one path's Q value for a given priority remains the minimum, that path is selected as the activated path for packets of that priority: its activation flag for that priority in the source node's routing table is set to "true", and the activation flags of the other paths for that priority are set to "false";

(9c) When the source node has a data packet to send, it uses the path whose activation flag in the routing table is "true" for the packet's priority and destination node.
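The selection rule of step 9 can be sketched in Python as a per-priority "minimum on two consecutive updates" check. The sketch reads "after double renewal" as the same path being the per-priority minimum on two update rounds in a row, which is an interpretation. It also stores, for each priority, the single path whose flag is "true" instead of one flag per path; `flags()` reconstructs the per-path view. The class and method names are hypothetical.

```python
from typing import Dict, Optional, Sequence


class ActivePathSelector:
    """Per-priority activated-path choice at the source node (steps 9a-9c).

    Hypothetical helper: active[p] is the path number whose activation flag is
    "true" for priority p; every other path's flag is implicitly "false".
    """

    def __init__(self, path_nos: Sequence[int], priorities: Sequence[int] = (1, 2, 3),
                 required_streak: int = 2) -> None:
        if not path_nos or required_streak < 1:
            raise ValueError("need at least one path and a positive streak length")
        first = path_nos[0]  # ASSUMED: paths listed in order of establishment (step 9a)
        self.active: Dict[int, int] = {p: first for p in priorities}
        self._leader: Dict[int, Optional[int]] = {p: None for p in priorities}
        self._streak: Dict[int, int] = {p: 0 for p in priorities}
        self.required_streak = required_streak

    def on_update(self, q: Dict[int, Dict[int, float]]) -> None:
        """q[path_no][priority] holds the latest Q values after one update round."""
        for p in self.active:
            best = min(q, key=lambda path: q[path][p])   # lowest predicted delay
            if best == self._leader[p]:
                self._streak[p] += 1
            else:
                self._leader[p], self._streak[p] = best, 1
            if self._streak[p] >= self.required_streak:  # minimum on consecutive updates
                self.active[p] = best

    def path_for(self, priority: int) -> int:
        """Step 9c: path used for a packet of this priority."""
        return self.active[priority]

    def flags(self, path_no: int) -> Dict[int, bool]:
        """Activation-flag sub-fields of one path, as stored in the routing table."""
        return {p: self.active[p] == path_no for p in self.active}


sel = ActivePathSelector([0, 1, 2])
q = {0: {1: 0.05, 2: 0.05, 3: 0.05},
     1: {1: 0.06, 2: 0.03, 3: 0.06},
     2: {1: 0.07, 2: 0.04, 3: 0.04}}
sel.on_update(q)   # path 1 leads for voice once: no switch yet
sel.on_update(q)   # second consecutive win: voice moves to path 1
```

Requiring two consecutive wins acts as simple hysteresis, so a single noisy Q update does not make the source flip between paths.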

Step 10: the source node uses the Q-learning algorithm to reset the hello period and the route lifetime of this route according to how the Q values in its routing table change.

(10a) Each time the source node updates the routing-table Q values after receiving a Cognitive Echo packet, it averages the Q values of the three priorities (video, voice, and data) of each path to obtain the path's delay estimate T_{est};

(10b) From this delay estimate T_{est}, compute the normalized path-delay estimate γ:

γ = T_{est} / ete_{max}

where ete_{max} is the maximum end-to-end delay allowed by the network;

(10c) Using the normalized path-delay estimate γ, update the value Q_{s}, which describes network stability, and the value Q_{uns}, which describes network instability, to obtain the updated values Q_{s}[t] and Q_{uns}[t]:

Q_{s}[t] = ∂ Q_{s}[t - 1] + (1 - ∂) γ

Q_{uns}[t] = ∂ Q_{uns}[t - 1] + \frac{1 - ∂}{γ}

where Q_{s}[t-1] and Q_{s}[t] are the node's stability value Q_{s} at times t-1 and t, and Q_{uns}[t-1] and Q_{uns}[t] are the node's instability value Q_{uns} at times t-1 and t;

∂ is the learning factor, with range 0 ≤ ∂ < 1;

(10d) The source node acts according to the update result: when Q_{s}[t] > Q_{uns}[t], it perceives the network state as unstable and decreases the route lifetime and hello period of this route; when Q_{s}[t] < Q_{uns}[t], it perceives the network state as stable and increases the route lifetime and hello period of this route.

Step 11: when a node sends a data packet, if none of the Q values in the source node's routing table can satisfy the QoS delay requirement of the packet's priority, a new route discovery is initiated.
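Steps 10a to 10d can be sketched in Python as a small controller that smooths the normalized delay γ into the two values Q_s and Q_uns. The text does not say by how much the route lifetime and hello period change, nor how Q_s and Q_uns are initialized. The multiplicative `scale` factor and the zero initial values below are therefore hypothetical, as is the class name.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict


@dataclass
class LifetimeController:
    ete_max: float          # maximum end-to-end delay allowed by the network (s)
    d: float                # learning factor ∂, 0 <= d < 1
    route_lifetime: float   # current route lifetime (s)
    hello_period: float     # current hello period (s)
    scale: float = 2.0      # HYPOTHETICAL adjustment factor; not given in the patent
    q_s: float = 0.0        # HYPOTHETICAL initial stability value
    q_uns: float = 0.0      # HYPOTHETICAL initial instability value

    def __post_init__(self) -> None:
        if self.ete_max <= 0 or not 0.0 <= self.d < 1.0 or self.scale <= 1.0:
            raise ValueError("need ete_max > 0, 0 <= d < 1 and scale > 1")

    def on_echo(self, q: Dict[int, float]) -> str:
        """Run steps 10a-10d after a Cognitive Echo has refreshed one path's Q values."""
        t_est = mean(q.values())                 # 10a: average over the three priorities
        gamma = t_est / self.ete_max             # 10b: normalised delay estimate
        if gamma <= 0.0:
            raise ValueError("delay estimate must be positive")
        self.q_s = self.d * self.q_s + (1.0 - self.d) * gamma        # 10c
        self.q_uns = self.d * self.q_uns + (1.0 - self.d) / gamma
        if self.q_s > self.q_uns:                # 10d: unstable, shorten both timers
            self.route_lifetime /= self.scale
            self.hello_period /= self.scale
            return "unstable"
        if self.q_s < self.q_uns:                # 10d: stable, lengthen both timers
            self.route_lifetime *= self.scale
            self.hello_period *= self.scale
            return "stable"
        return "unchanged"


ctrl = LifetimeController(ete_max=0.1, d=0.5, route_lifetime=10.0, hello_period=1.0)
state = ctrl.on_echo({1: 0.2, 2: 0.2, 3: 0.2})   # delay twice the allowed maximum
```

For a steady γ, Q_s settles at γ and Q_uns at 1/γ, so the comparison in step 10d effectively asks whether the smoothed path delay sits above or below ete_max.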
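The rediscovery trigger of step 11 reduces to checking whether any path's Q value for the packet's priority is within that priority's delay budget. The Python sketch below assumes "satisfy" means Q ≤ budget. The per-priority budgets are hypothetical placeholders, since the text gives no values, and the function name is likewise hypothetical.

```python
from typing import Mapping

# HYPOTHETICAL per-priority QoS delay budgets (s); the patent gives no values.
QOS_DELAY_BUDGET = {1: 0.15, 2: 0.15, 3: 1.0}


def needs_rediscovery(path_q: Mapping[int, Mapping[int, float]], priority: int,
                      budget: Mapping[int, float] = QOS_DELAY_BUDGET) -> bool:
    """Step 11: start a new route discovery when no path's Q value for this
    priority meets the delay requirement (also true when no path exists)."""
    limit = budget[priority]
    return all(q[priority] > limit for q in path_q.values())


paths = {0: {1: 0.20, 2: 0.18, 3: 0.30},
         1: {1: 0.16, 2: 0.12, 3: 0.25}}
print(needs_rediscovery(paths, 1), needs_rediscovery(paths, 2))  # True False
```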

Following the implementation steps above, an example is given below:

Referring to Fig. 7, the small arrows on the paths represent the exchange of Cognitive Hello query packets and Cognitive Echo reply packets between nodes to transfer Q values; the labels CH and CE on the arrows denote Cognitive Hello and Cognitive Echo packets, respectively. The time axis in Fig. 7 marks how the selected path and the route lifetime change over time. The process is described as follows:

Suppose that at some moment source node S has priority-2 voice traffic to send to destination node D. Source node S queries its routing table, finds no route entry for D, and initiates route discovery. Through route discovery, the source node successively establishes three paths to D: S-A-D, S-B-C-D, and S-E-F-D. On these three paths, each node exchanges Cognitive Hello and Cognitive Echo packets with its neighbors once per second to transfer Q values, and then updates the video, voice, and data Q values with the update formula, obtaining the updated video value Q1, voice value Q2, and data value Q3. During 0 to 1 s, path S-A-D is used as the initial activated path. If, from 0 to 2 s, the Q2 value of path S-B-C-D remains the minimum, node S selects S-B-C-D as the activated path for sending voice packets. Suppose that during this period the average (Q1+Q2+Q3)/3 of the Q values of path S-A-D increases; the network environment is then judged to have deteriorated, so the route lifetime and Hello period are reduced, as shown in Fig. 7 where the route lifetime drops from T_{(0,2)} to T_{(2,4)}. Over time, both the route lifetime and the activated path used by the invention adjust dynamically to changes in the network environment.

Claims

1. A multi-path delay-aware optimal route selection method for a cognitive network, comprising the steps of:

Q_{t}^{x} {(d, y)}_{i} = (1 - α) Q_{t - 1}^{x} {(d, y)}_{i} + α (W_{i} + T_{contention} + \min_{z} Q_{t - 1}^{y} {(d, z)}_{i})

where α is the learning factor, 0 < α < 1;

the subscript i of the Q value is the service priority of the packet, i = 1, 2, 3;

t denotes the current time and t-1 the previous time instant;

W_{i} = \frac{\sum_{k = 1}^{3} λ_{k} \overline{X_{k}^{2}}}{2 (1 - ρ_{1} - \cdots - ρ_{i - 1}) (1 - ρ_{1} - \cdots - ρ_{i})},

P_{s} = \frac{nτ {(1 - τ)}^{n - 1}}{1 - {(1 - τ)}^{n}};

2. The cognitive network routing method according to claim 1, wherein the exchange of Q values in step (11), in which nodes on the established multiple paths exchange Cognitive Hello query packets and Cognitive Echo reply packets, is carried out by the following steps:

(2a) Each node on the path constructs a Cognitive Hello packet once per second, writes its own address into the source-node field of the packet and the path's destination address into the destination-node field, and sends the Cognitive Hello packet to the next-hop node on the path to query its Q value toward the destination;

(2b) A node receiving a Cognitive Hello packet reads the packet's source node and the path's destination node, constructs a Cognitive Echo packet containing the Q values of the different priorities recorded in its local routing table for that destination, and returns the Cognitive Echo packet to the source of the Cognitive Hello packet;

(2c) Each node that receives a Cognitive Echo packet reads the Q values from it.

3. The cognitive network routing method according to claim 1, wherein the selection in step (12), in which the source node selects one path as the activated path according to the Q values updated in real time for sending data packets, is carried out by the following steps:

(3a) Initially, the path established first between the source node and the destination node is used as the activated path for transmitting data packets: all three activation flags under this path number in the routing table are set to "true", and all activation flags under the other path numbers are set to "false";

(3b) Among the established paths, each path maintains Q values for the three service priorities (video, voice, and data). The source node compares the Q values of the different paths separately for each service priority. If, over two consecutive updates, one path's Q value for a given priority remains the minimum, that path is selected as the activated path for packets of that priority: its activation flag for that priority in the source node's routing table is set to "true", and the activation flags of the other paths for that priority are set to "false";

(3c) When the source node has a data packet to send, it uses the path whose activation flag in the routing table is "true" for the packet's priority and destination node.

4. The cognitive network routing method according to claim 1, wherein the resetting in step (13), in which the source node uses the Q-learning algorithm to reset the hello period and the route lifetime of this route, is carried out by the following steps:

(4a) Each time the source node updates the routing-table Q values after receiving a Cognitive Echo packet, it averages the Q values of the three priorities (video, voice, and data) of each path to obtain the path's delay estimate T_{est};

(4b) From this delay estimate T_{est}, compute the normalized path-delay estimate γ:

γ = T_{est} / ete_{max}

(4c) Using the normalized path-delay estimate γ, update the value Q_{s}, which describes network stability, and the value Q_{uns}, which describes network instability, to obtain the updated values Q_{s}[t] and Q_{uns}[t]:

Q_{s}[t] = ∂ Q_{s}[t - 1] + (1 - ∂) γ

Q_{uns}[t] = ∂ Q_{uns}[t - 1] + \frac{1 - ∂}{γ}

where ∂ is the learning factor, with range 0 ≤ ∂ < 1;

(4d) The source node acts according to the update result: when Q_{s}[t] > Q_{uns}[t], it perceives the network state as unstable and decreases the route lifetime and hello period of this route; when Q_{s}[t] < Q_{uns}[t], it perceives the network state as stable and increases the route lifetime and hello period of this route.