CN109660375B - High-reliability self-adaptive MAC (media Access control) layer scheduling method - Google Patents

High-reliability self-adaptive MAC (Media Access Control) layer scheduling method

Info

Publication number
CN109660375B
CN109660375B (application number CN201710946487.6A)
Authority
CN
China
Prior art keywords
action
denotes
probability
feedback
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710946487.6A
Other languages
Chinese (zh)
Other versions
CN109660375A (en)
Inventor
刘元安
张洪光
王怡浩
范文浩
吴帆
谢刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201710946487.6A
Publication of CN109660375A
Application granted
Publication of CN109660375B
Legal status: Active
Anticipated expiration

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 - Network analysis or design
    • H04L41/145 - Network analysis or design involving simulating, designing, planning or modelling of a network
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 - Network management architectures or arrangements
    • H04L41/044 - Network management architectures or arrangements comprising hierarchical management structures
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 - Supervisory, monitoring or testing arrangements
    • H04W24/06 - Testing, supervising or monitoring using simulated traffic
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 - Network topologies
    • H04W84/18 - Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a high-reliability self-adaptive MAC layer scheduling method. The method mainly solves the problem that idle listening by cluster head nodes in a wireless sensor network causes a large amount of energy consumption. The method comprises the following steps: establishing a model of the wireless sensor network; generating a specific frame format and embedding the queue occupancy rate and delay in the frame control field; initializing an action set, an action selection probability set and a feedback set; the coordinator interacts with the surrounding environment using the learning automaton method and updates its action and state; the whole learning process is divided into three stages, and a corresponding search strategy is adopted in the initial stage, the exploration stage and the greedy stage; evaluating the interaction between the action and the environment, and updating the feedback and selection probability sets; and selecting, based on the feedback set, the relevant parameters that determine the duty cycle, thereby realizing self-adaptive MAC layer scheduling. The embodiment of the invention enables the duty cycle of the node to be adjusted adaptively during operation, minimizes the power consumption and has wide application value.

Description

High-reliability self-adaptive MAC (Media Access Control) layer scheduling method
Technical Field
The invention belongs to the technical field of wireless sensor networks, and particularly relates to a high-reliability self-adaptive MAC (media access control) layer scheduling method.
Background
Wireless Sensor Network (WSN) nodes are typically battery powered, and in many deployment environments replacing batteries or charging them electromagnetically is expensive or even infeasible. Therefore, low power consumption is considered the most important requirement for wireless sensor network communication protocols. In particular, a node does not know when other nodes will transmit data, so its transceiver stays in receive mode even when the node is idle. Idle listening is therefore considered one of the major sources of energy waste.
Currently, the most widely adopted IEEE 802.15.4 standard defines several different types of nodes: a Full Function Device (FFD), also known as a beacon-enabled device, may operate as a personal area network coordinator, cluster head or router, while a Reduced Function Device (RFD), also known as a non-beacon device, may only operate as an end device. When an FFD acts as a cluster head, it quickly drains its energy, because it cannot predict when other sensor nodes will send their data and therefore needs to stay in receive mode all the time to receive all the collected information. To overcome this problem, the standard defines a beacon-enabled mode. This mode supports the transmission of beacon frames from the coordinator to the end devices, which allows node synchronization. All devices can then sleep between coordinator transmissions, which helps reduce idle listening and thus extends the network lifetime.
In recent years, many duty cycle adjustment algorithms have been proposed for such situations. For example, one approach modifies the reserved frame control field in the MAC frame header and collects information such as the transmission queue occupancy and the end-to-end delay of the collection node to select the duty cycle. Another solution uses reinforcement learning, whose main goal is to find the optimal duty cycle, and adjusts the sleep time of the SMAC protocol in the WSN environment, taking the number of frames queued for transmission as the state and the reserved active time as the action. However, this means that a large number of state-action pairs need to be stored, which is undesirable in wireless sensor nodes with limited memory resources. Recently, an extension of the CAP has been proposed, based on a busy tone emitted by a device at the end of the standard CAP. A busy tone is sent only when a device fails to send all of its data frames, and the CAP is extended if any device still has real-time data in its transmission queue at the end of the CAP. However, these extensions do not conform to the standard and require modification of the superframe structure.
Disclosure of Invention
The embodiment of the invention provides a high-reliability self-adaptive MAC layer scheduling method, which adaptively adjusts the duty cycle during operation without human intervention, so as to minimize the power consumption while balancing the probability of successful data transmission against the delay constraints of the application.
In order to achieve the above object, an embodiment of the present invention provides a highly reliable adaptive MAC layer scheduling method, which is applied to a coordinator device in a wireless sensor network, and the method includes:
the method comprises the steps of establishing a model according to the wireless sensor network environment, wherein the wireless sensor network environment model is represented by a triple E = (α, β, p), in which α represents the action set, i.e. the input, of the node's automatic learning and, in the invention, the set of candidate duty cycles of the node; β represents the feedback signal output by the node after selecting a duty cycle and interacting with the environment.
Specifically, the environment can be divided into a P-model and a Q-model according to the type of the β value: in the P-model the feedback signal is Boolean (0 or 1), while in the Q-model it is a continuous random variable in [0,1]. The P-model is adopted in the invention because this control model is simple and easy to use. p = {p1, p2, ..., pr} denotes a series of reward and punishment probabilities, and each learning automaton action α_i has a corresponding p_i.
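By way of illustration only, the following minimal Python sketch shows one possible representation of this environment triple under the P-model; the class name, field names and the example penalty probabilities are assumptions made for the example and are not part of the invention.

    import random
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PModelEnvironment:
        """Environment triple E = (alpha, beta, p) under the P-model (Boolean feedback)."""
        actions: List[float]        # alpha: candidate duty cycles of the node
        penalty_prob: List[float]   # p: reward/punishment probability per action
        last_feedback: int = 0      # beta: Boolean feedback (0 = reward, 1 = penalty)

        def respond(self, action_index: int, rng: random.Random) -> int:
            """Return the Boolean feedback for the chosen action."""
            self.last_feedback = 1 if rng.random() < self.penalty_prob[action_index] else 0
            return self.last_feedback

    # Example: three candidate duty cycles with assumed penalty probabilities
    env = PModelEnvironment(actions=[1.0, 0.5, 0.25], penalty_prob=[0.1, 0.3, 0.6])
    feedback = env.respond(1, random.Random(0))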
The node generates a specific frame structure format and embeds parameters such as the queue occupancy rate and the queuing delay in the reserved bits of the frame control field.
Specifically, to avoid introducing any additional overhead, each terminal device embeds the queue occupancy O and the queuing delay D in the frame control structure of each data frame transmitted, using the 3 reserved bits of the frame control field as shown in fig. 3.
It should be noted that each sender uses two bits to represent 4 different levels of queue occupancy o_i, and one bit to divide the queuing delay d_i into 2 levels.
The coordinator (FFD) performs traffic estimation to generate a traffic-adaptive duty cycle set.
It should be noted that the present invention assumes that the wireless sensor network has a star topology, and the coordinator collects the data sent by the terminal devices. Each coordinator estimates the incoming traffic by computing idle listening, packet accumulation, and delay in the terminal device transmit queues.
The coordinator initializes its action set, action selection probability set and feedback set.
Specifically, the learning automaton is a probability-based learning tool that selects its activities through a random activity probability vector P_i(t); the activity probability vector is the main building block of the learning automaton and must therefore be kept updated at all times.
In the initial stage, in order to prevent a large amount of data from being lost when the data traffic of the wireless sensor network is heavy, the action is set to the maximum duty cycle, that is, the coordinator is always in the receiving state, and the corresponding action selection probability is also 1, which ensures that the coordinator can collect more information about the network in the early stage.
The coordinator (FFD) interacts with the surrounding environment using the learning automata (LA) method.
Specifically, the variable-structure learning automaton model may be represented by a triple LA = (α, β, p), where α = {α1, α2, ..., αr} denotes the action set of the learning automaton, β = {β1, β2, ..., βr} denotes the set of feedback signals given by the environment, and p = {p1, p2, ..., pr} denotes the set of action probabilities, satisfying
Σ_{i=1}^{r} p_i(n) = 1
where p_i(n) represents the action probability corresponding to α_i after the n-th round of the learning process.
Selecting an exploration strategy: different exploration strategies are selected in different periods.
Specifically, the whole learning process is divided into three stages: an initial stage, an exploration stage and a greedy stage;
in the initial stage, all actions in the set are explored in a deterministic manner using a cyclic search strategy: the node selects the highest duty cycle at the start and slowly reduces it until the minimum duty cycle is reached, ensuring that every duty cycle in the set is tried.
In the exploration stage, if the received reward has increased, actions with a higher duty cycle than the current selection are explored at random; otherwise, if the reward remains the same or decreases, actions with a lower duty cycle are explored at random.
In the greedy stage, after learning with the exploration strategy for a period of time, the node has become sufficiently familiar with the environment and can begin to select actions autonomously.
Evaluating the influence of the action on data transmission after interaction with the environment, updating the feedback set, and updating the action selection probability set
In particular, the coordinator updates the reward for each beacon interval by using feedback received from the sender during the last activity duration.
Selecting an action: the BO (beacon order) and SO (superframe order) standard parameters that determine the duty cycle are selected based on the feedback set, realizing adaptive MAC scheduling.
After selecting the action value, the BO and SO standard parameters that determine the duty cycle are adjusted.
In order to achieve the above object, an embodiment of the present invention provides a highly reliable adaptive MAC layer scheduling apparatus, which is applied to a coordinator device in a wireless sensor network, and the apparatus includes:
a generation unit: generating a specific frame control structure format, and embedding parameters such as queue occupancy rate, queue delay and the like into reserved bits of a frame control field;
a transmission unit: each sensor node fills in the generated frame format according to its own condition and sends it to the coordinator;
a receiving unit: used for receiving the data frames sent by each sensor node after the sensor node accesses the channel; each data frame contains at least the queue occupancy rate and queuing delay parameters;
an evaluation unit: evaluating the selection probability of the action according to the parameters sent by the sensor nodes and the working state of the coordinator;
an autonomous learning unit: updating the action set, the action selection probability set and the feedback set of the node by means of the learning automaton method;
a policy selection unit: determining which stage the learning process is in and adopting the corresponding strategy, namely a cyclic exploration strategy in the initial stage, a random strategy in the exploration stage, and a greedy strategy in the final greedy stage;
a self-adaptive adjusting unit: after the action is selected, adjusting the parameters BO and SO based on the feedback set and the action set to complete the adaptive MAC scheduling.
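As a purely illustrative sketch of how these units might map onto code, the following Python skeleton groups them into a single coordinator class; all class and method names, and the placeholder bodies, are assumptions made for the example rather than part of the invention.

    class AdaptiveMacCoordinator:
        """Skeleton grouping the units described above; method bodies are placeholders."""

        def generate_frame_format(self, occupancy_level: int, delay_level: int) -> dict:
            # generation unit: embed queue occupancy / queuing delay in reserved bits
            return {"occupancy": occupancy_level & 0b11, "delay": delay_level & 0b1}

        def receive(self, frames: list) -> list:
            # receiving unit: collect the data frames sent by the sensor nodes
            return [f for f in frames if "occupancy" in f and "delay" in f]

        def evaluate(self, frames: list) -> float:
            # evaluation unit: derive a feedback value from the reported parameters
            if not frames:
                return 0.0
            return sum(f["occupancy"] for f in frames) / (3.0 * len(frames))

        def learn(self, feedback: float) -> None:
            # autonomous learning unit: update action set, selection probabilities, feedback set
            pass

        def select_strategy(self, stage: str) -> str:
            # policy selection unit: cyclic search, random exploration, or greedy selection
            return {"initial": "cyclic", "exploration": "random"}.get(stage, "greedy")

        def adjust_duty_cycle(self, bo: int, so: int) -> float:
            # adaptive adjustment unit: duty cycle follows from BO and SO (2 ** (SO - BO))
            return 2.0 ** (so - bo)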
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a high-reliability adaptive MAC layer scheduling method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a model of a learning automaton according to an embodiment of the present invention;
FIG. 3 is a block diagram of a frame control format according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a high-reliability adaptive MAC layer scheduling apparatus according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a MAC layer scheduling node transmission collision according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical scheme of the invention is specifically explained according to the attached drawings.
The high-reliability self-adaptive MAC layer scheduling method comprises the following steps:
and S101, establishing a model of the wireless sensor network.
Specifically, the wireless sensor network environment model is represented by a triple E = (α, β, p), where α represents the action set, i.e. the input, of the node's automatic learning and, in the invention, the set of candidate duty cycles of the node; β represents the feedback signal output by the node after selecting a duty cycle and interacting with the environment.
Specifically, the environment can be divided into a P-model and a Q-model according to the type of the β value: in the P-model the feedback signal is Boolean (0 or 1), while in the Q-model it is a continuous random variable in [0,1], which suits practical control applications; the P-model is widely applied in wireless sensor network research because this control model is simple and easy to use. p = {p1, p2, ..., pr} denotes a series of reward and punishment probabilities, and each learning automaton action α_i has a corresponding p_i. In the invention, the P-model is adopted to model the wireless sensor network environment.
S102, the node generates a specific frame structure format, and embeds parameters such as queue occupancy rate, queue delay and the like by using reserved bits of a frame control field.
Specifically, to avoid introducing any additional overhead, each terminal device embeds the queue occupancy O and the queuing delay D in the frame control structure of each data frame transmitted, using the 3 reserved bits of the frame control field as shown in fig. 3.
It should be noted that each sender uses two bits to represent 4 different levels of queue occupancy o_i, and one bit to divide the queuing delay d_i into 2 levels. With this information, the coordinator can estimate the queue occupancy O and the queuing delay D. The queue occupancy O is defined as follows:
O = 1, if any node reaches or exceeds the maximum number of frames that can be stored in its queue; otherwise, O equals the average of the occupancies reported at the start of the contention access period (CAP), i.e. the highest occupancy accumulated during the inactive period.   (1)
Expressing the queue occupancy O with 2 bits saves space and reduces the fluctuation range of the value.
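To illustrate this embedding, the short Python sketch below packs a 2-bit occupancy level and a 1-bit delay flag into three reserved bits; the exact bit positions within the frame control field are an assumption made only for the example.

    def pack_reserved_bits(occupancy_level: int, delay_flag: int, shift: int = 7) -> int:
        # Pack the 2-bit occupancy level (0-3) and the 1-bit delay flag into
        # 3 reserved bits starting at the assumed bit position `shift`.
        assert 0 <= occupancy_level <= 3 and delay_flag in (0, 1)
        return ((occupancy_level << 1) | delay_flag) << shift

    def unpack_reserved_bits(frame_control: int, shift: int = 7) -> tuple:
        # Recover (occupancy_level, delay_flag) from the frame control field.
        bits = (frame_control >> shift) & 0b111
        return bits >> 1, bits & 0b1

    fc = pack_reserved_bits(occupancy_level=2, delay_flag=1)
    assert unpack_reserved_bits(fc) == (2, 1)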
S103, the coordinator (FFD) carries out traffic estimation to generate a traffic-adaptive duty cycle set.
It should be noted that the present invention assumes that the wireless sensor network has a star topology, and the coordinator collects the data sent by the terminal devices. Each coordinator estimates the incoming traffic by computing idle listening, packet accumulation, and delay in the terminal device transmit queues. The expression for the idle listening IL is as follows:
IL = 1.0 - SF_u   (2)
where SF_u, the superframe utilization, is the ratio of the time the terminal devices occupy the superframe to the total time available for data communication, defined as:
SF_u = (T_b + T_c + T_r) / SD   (3)
where SD is the superframe duration, T_b is the time taken by the coordinator for beacon transmission, T_c is the time a device occupies the channel due to frame collisions, and T_r is the time for data reception.
Illustratively, in type 1 (C1), the sender node under consideration (node A) ends its transmission first, while the transmissions of the other nodes still continue, see fig. 5. In type 2 (C2), sender A completes its transmission after the collision occurs. Finally, in type 3 (C3), both nodes end their transmissions at the same time. To detect C1 and C2, A or B can listen to the channel to detect the other transmission if they are within range of each other. The sender therefore concludes that a collision has occurred if, while listening to the channel after its transmission, it detects a busy channel and does not receive an acknowledgement frame. On the other hand, to detect C3, the receiver senses that the received energy rises above its CCA threshold but is not synchronized with a start frame delimiter.
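A minimal Python sketch of this traffic estimate is given below; it assumes, as a simplification, that the superframe utilization is the fraction of the superframe duration spent on beacon transmission, collisions and data reception, and the numeric values are arbitrary examples.

    def superframe_utilization(t_beacon: float, t_collision: float, t_receive: float,
                               superframe_duration: float) -> float:
        # SF_u: share of the superframe actually used for communication (assumed form)
        return min(1.0, (t_beacon + t_collision + t_receive) / superframe_duration)

    def idle_listening(sf_u: float) -> float:
        # IL = 1.0 - SF_u, as in equation (2)
        return 1.0 - sf_u

    sf_u = superframe_utilization(t_beacon=0.96, t_collision=1.5, t_receive=5.0,
                                  superframe_duration=15.36)
    il = idle_listening(sf_u)   # a higher IL means more energy wasted in idle mode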
S104, the coordinator initializes its action set, action selection probability set and feedback set.
In particular, the learning automaton is a probability-based learning tool that selects its activities through a random activity probability vector P_i(t); the activity probability vector is the main building block of the learning automaton and must therefore be kept updated at all times. P_i(t) denotes the probability that node n_i selects a certain duty cycle at time t; in the invention, it is set to the expected value of the total feedback return of the corresponding duty cycle.
In order to prevent a large amount of data from being lost when the data traffic of the wireless sensor network is heavy, the action is initially set to the maximum duty cycle, that is, the coordinator is always in the receiving state, and the corresponding action selection probability is also 1, which ensures that the coordinator can collect more information about the network in the early stage.
S105, the coordinator (FFD) interacts with the surrounding environment using the learning automata (LA) method.
It should be noted that the variable-structure learning automaton model may be represented by a triple LA = (α, β, p), where α = {α1, α2, ..., αr} denotes the action set of the learning automaton, β = {β1, β2, ..., βr} denotes the set of feedback signals given by the environment, and p = {p1, p2, ..., pr} denotes the set of action probabilities, satisfying
Σ_{i=1}^{r} p_i(n) = 1
where p_i(n) represents the action probability corresponding to α_i after the n-th round of the learning process; the probabilities satisfy the update formula p(n+1) = T(α(n), β(n), p(n)), where T denotes the learning algorithm. The general learning mechanism of the learning automaton, when action α_i is selected in round n, is defined as follows:
p_j(n+1) = p_j(n) - g_j(p(n)) for all j ≠ i, and p_i(n+1) = p_i(n) + Σ_{j≠i} g_j(p(n)), when the environment rewards the action   (6)
p_j(n+1) = p_j(n) + h_j(p(n)) for all j ≠ i, and p_i(n+1) = p_i(n) - Σ_{j≠i} h_j(p(n)), when the environment penalizes the action   (7)
wherein a(n) and b(n) are the weight coefficients of the linear functions g_i and h_i; they may be defined as linear functions or constants, depending on the specific application. A P-environment model is adopted, in which the feedback signal takes the value 0 or 1; when the feedback signal is 0, the environment gives a reward signal. When the feedback signal takes 0, the corresponding probability update, for the selected action α_i, is represented as follows:
p_i(n+1) = p_i(n) + a(n)·(1 - p_i(n)), and p_j(n+1) = (1 - a(n))·p_j(n) for all j ≠ i   (8)
When the feedback signal takes 1, the corresponding probability update is expressed as follows:
p_i(n+1) = (1 - b(n))·p_i(n), and p_j(n+1) = b(n)/(r - 1) + (1 - b(n))·p_j(n) for all j ≠ i   (9)
where r is the number of actions in the action set.
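The Python sketch below shows a linear reward-penalty update of this kind; the constant learning rates a and b are assumptions chosen only for the example.

    def update_probabilities(p: list, chosen: int, feedback: int,
                             a: float = 0.1, b: float = 0.05) -> list:
        # Linear reward-penalty update of the action probability vector.
        # feedback == 0 rewards the chosen action, feedback == 1 penalizes it.
        r = len(p)
        q = p.copy()
        for j in range(r):
            if feedback == 0:    # reward
                q[j] = p[j] + a * (1.0 - p[j]) if j == chosen else (1.0 - a) * p[j]
            else:                # penalty
                q[j] = (1.0 - b) * p[j] if j == chosen else b / (r - 1) + (1.0 - b) * p[j]
        return q

    p = [0.25, 0.25, 0.25, 0.25]
    p = update_probabilities(p, chosen=2, feedback=0)   # the probabilities still sum to 1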
s106, selecting an exploration strategy: different exploration strategies are selected at different periods.
It should be noted that although an action selection probability is maintained, relying on it alone would make the coordinator adjust slowly and fail to reflect changes in the environment in time.
Specifically, the exploration strategy is divided into three stages: an initial stage, an exploration stage and a greedy stage;
in the initial stage, all actions in the set are explored in a deterministic manner using a cyclic search strategy: the node selects the highest duty cycle at the start and slowly reduces it until the minimum duty cycle is reached, ensuring that every duty cycle in the set, i.e. the entire action set of the learning automaton, has been tried.
During the exploration phase, once all actions have been selected, the following strategy is adopted: if the reward received for the current action has increased, an action with a higher duty cycle than the current selection is explored at random; otherwise, if the reward remains the same or decreases, an action with a lower duty cycle is explored at random.
In the greedy stage, after learning with the exploration strategy for a period of time, the node has become sufficiently familiar with the environment and can begin to select actions autonomously, adopting the following strategy:
the greedy strategy selects the action with the best P value within the subset of actions with lower action values, in other words an action with a higher duty cycle than the one selected at the previous moment. If several actions in the selected subset have the same P value, the action with the lowest duty cycle (highest action value) is selected. This means that the best action with the lowest duty cycle is chosen: if the reward is equal to or lower than the reward received in the previous phase, a better P value is selected. Therefore, under steady conditions, a minimum duty cycle is preferred. Once an action is selected, the exploration probability of the node is increased if the new action value differs from that of the previous stage.
S107, evaluating the influence of the action on data transmission after interaction with the environment, updating the feedback set, and updating the action selection probability set.
It is noted that the coordinator updates the reward per beacon interval by using the feedback received from the sender during the last activity duration. The reward function is defined as follows:
β = -1, if O > O_max; β = -IL, otherwise   (12)
where β represents the penalty (negative) value for the performance of the duty cycle selected in this phase. As can be seen from the above equation, the best reward is a zero value (no penalty), because it means no idle listening and no overflow of the transmit queue.
Specifically, the reward is based on a comparison between the queue occupancy O and the threshold O_max. If the queue occupancy is higher than the upper threshold O_max, the reward signal is negative (-1): the further the occupancy exceeds O_max, the greater the chance that the end device must drop packets, and therefore the lower the reward it receives. The choice of the threshold O_max indicates how sensitive the coordinator is to frame loss; this parameter can be set according to the reliability requirements of the application and may be set to 0.8 in the normal case. If the queue occupancy O is less than the threshold O_max, the feedback signal is defined as a negative value equal to the amount of idle listening, since idle listening is one of the main causes of energy consumption, so the lower it is, the better. The maximum reward of zero (no penalty) can only be reached when idle listening is zero and the queue occupancy O indicates no data frame loss. This means that the goal of an optimal trade-off between bandwidth utilization and energy consumption is achieved.
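A minimal Python sketch of this reward computation is shown below; the threshold value 0.8 is taken from the text above, and everything else is an assumption of the example.

    def compute_reward(queue_occupancy: float, idle_listening: float,
                       o_max: float = 0.8) -> float:
        # Penalty-style reward: 0 is best (no idle listening, no queue overflow).
        if queue_occupancy > o_max:
            return -1.0            # imminent frame loss: strongest penalty
        return -idle_listening     # otherwise penalize wasted idle listening

    # Example: moderate occupancy, 30% of the superframe spent in idle listening
    beta = compute_reward(queue_occupancy=0.5, idle_listening=0.3)   # -> -0.3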
S108, selecting an action: the BO and SO standard parameters that determine the duty cycle are selected based on the feedback set, realizing adaptive MAC scheduling.
After selecting the action value, the BO and SO standard parameters that determine the duty cycle are adjusted. The adjustment is defined as follows:
BO = max(4, |A|), such that (BI - SD) < Δ   (13)
SO ← max(0, BO - α_t)   (14)
where A is the learning automaton and |A| the size of its action set, BI is the current beacon interval, SD is the superframe duration, Δ is the delay experienced by the data frames, and α_t is the selected action.
it should be noted that the selection is based on the delay experienced by the data frames, and the parameter values BO and SO are embedded in the beacon frames broadcast to the terminal devices for synchronization.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (9)

1. A high-reliability self-adaptive MAC layer scheduling method is characterized by comprising the following steps:
the first step is modeling according to the wireless sensor network environment, applying the learning automaton method to the wireless sensor network environment, and representing the sensor network environment model by a triple E = (α, β, p), wherein α = {α1, α2, ..., αr} denotes the finite action set, i.e. the input, of the node's automatic learning and denotes the set of candidate duty cycles of the node, β = {β1, β2, ..., βm} denotes the feedback signal output by the node after selecting a duty cycle and interacting with the environment, and p = {p1, p2, ..., pr} denotes a series of reward and punishment probabilities, each punishment probability p_i being related to a given input variable α_i;
secondly, the node generates a specific frame structure format, and embeds queue occupancy rate and queuing delay parameter by using reserved bits of a frame control field, specifically, each terminal device embeds queue occupancy rate O and queuing delay D in the frame control structure of each data frame to be transmitted, and the information is embedded by using 3 reserved bits of the frame control field;
thirdly, the coordinators (FFD) perform traffic estimation to generate a traffic-adaptive duty cycle set, and each coordinator estimates the incoming traffic by calculating idle listening, packet accumulation and delay in the terminal device transmit queues, wherein the expression for the idle listening IL is as follows:
IL = 1.0 - SF_u   (1)
where SF_u represents the superframe utilization, i.e. the ratio of the time the terminal devices occupy the superframe to the total time available for data communication;
fourthly, initializing an action set of the coordinator, and selecting a probability set and a feedback set for the action;
fifthly, the coordinator (FFD) interacts with the surrounding environment by using the Learning Automata (LA) method, a P-environment model is adopted, the value of the feedback signal is 0 or -1, and if the feedback signal is 0, the probability update is defined as follows:
p_i(n+1) = p_i(n) + a(n)·(1 - p_i(n)), and p_j(n+1) = (1 - a(n))·p_j(n) for all j ≠ i   (2)
if the feedback signal takes -1, the probability update is defined as follows:
p_i(n+1) = (1 - b(n))·p_i(n), and p_j(n+1) = b(n)/(r - 1) + (1 - b(n))·p_j(n) for all j ≠ i   (3)
wherein r is the number of actions in the action set of the learning automaton, α_i is the selected action, and a(n) and b(n) are weight coefficients of the linear functions, which can be defined as linear functions or constants depending on the specific application;
and sixthly, selecting an exploration strategy: selecting different exploration strategies at different periods, dividing the whole learning process into three stages, wherein a cyclic search strategy is adopted in the initial stage, a random search strategy is adopted in the exploration stage, and a greedy strategy is adopted in the greedy stage;
seventhly, evaluating the influence of the action on data transmission after the action interacts with the environment, updating a feedback set, and updating an action selection probability set, wherein a reward function for evaluation is defined as follows:
β = -1, if O > O_max; β = -IL, otherwise   (4)
wherein β denotes the penalty value, i.e. the negative value, for the performance of the duty cycle selected in this phase, O denotes the queue occupancy, and O_max denotes the upper occupancy threshold; as can be seen from the above equation, the best reward is a zero value, i.e. no penalty, since it indicates no idle listening and no overflow of the transmission queue;
and an eighth step of selecting action, namely selecting BO and SO standard parameters for determining the duty ratio based on the feedback set, and selecting the optimal duty ratio, wherein the BO parameters are defined as follows:
BO = max(4, |A|), such that (BI - SD) < Δ   (5)
where A denotes the learning automaton and |A| the size of its action set, BI denotes the current beacon interval, SD denotes the superframe duration, and Δ denotes the delay experienced by the data frame.
2. The method as claimed in claim 1, wherein the network environment model is established such that the wireless sensor network environment model is represented by a triple E = (α, β, p), where α = {α1, α2, ..., αn} represents the finite action set, i.e. the input, of the node's automatic learning and represents the set of candidate duty cycles of the node, β = {β1, β2, ..., βm} denotes the feedback signal output by the node after selecting a duty cycle and interacting with the environment, and p = {p1, p2, ..., pn} denotes a series of reward and punishment probabilities, each punishment probability p_i being related to a given input variable α_i; the environment can be divided into 3 types according to the feedback signal β, namely P-type, Q-type and S-type environments; the wireless sensor network environment is modeled by the P-model, in which the feedback signal is a Boolean value, i.e. β is described by binary 0 and 1, wherein α_i (α_i ∈ α) represents the selected activity of the learning automaton, P(t) represents the probability vector at time t, P_reward denotes the reward factor and P_penalty denotes the penalty factor, the probability of an activity being increased or decreased by these two factors respectively; if the activity is penalized by the random environment, the activity probability vector P(t) is updated as follows:
P_i(t+1) = (1 - P_penalty)·P_i(t), and P_j(t+1) = P_penalty/(n - 1) + (1 - P_penalty)·P_j(t) for all j ≠ i   (6)
if the activity is rewarded by the random environment, the activity probability vector P(t) is updated as follows:
P_i(t+1) = P_i(t) + P_reward·(1 - P_i(t)), and P_j(t+1) = (1 - P_reward)·P_j(t) for all j ≠ i   (7)
3. A highly reliable adaptive MAC layer scheduling method as claimed in claim 1, characterized in that the node generates a specific frame structure format; in particular, each terminal device embeds the queue occupancy O and the queuing delay D in the frame control structure of each data frame transmitted, this information being embedded using 3 reserved bits of the frame control field; specifically, each transmitting node uses two bits to represent 4 different levels of queue occupancy o_i, and the queuing delay d_i is divided into 2 levels; from this information the coordinator can estimate the queue occupancy O and the queuing delay D, the queue occupancy O being defined as follows:
O = 1, if any node device reaches or exceeds the maximum number of frames that can be stored in its queue; otherwise, O equals the average of the queue occupancies reported in the first messages received in the packet-accumulation contention access period, i.e. the CAP   (8)
it should be noted that, through the information of the 3 reserved bits, the coordinator can estimate the queue occupancy O and the queuing delay D; if any node device reaches or exceeds the maximum number of frames that can be stored in its queue, the queue occupancy O estimated by the coordinator is equal to 1, otherwise it is equal to the average value of the queue occupancies of the first messages received in the packet-accumulation contention access period, i.e. the CAP, where the first message received in the CAP is the message with the highest queue occupancy during the inactive period; it should also be noted that the queuing delay bit D_i of each terminal device i indicates whether the delay experienced in the current beacon interval BI is below a defined minimum delay threshold D_th: if it is below the threshold, the queuing delay bit D_i is '0', otherwise it is '1'; the coordinator takes the minimum delay threshold D_th as the maximum delay of node device transmissions, which is done to ensure that any node can still transmit data when the queuing delay is above the threshold.
4. The method as claimed in claim 1, wherein the coordinators (FFD) perform traffic estimation; specifically, each coordinator estimates the incoming traffic by calculating idle listening, packet accumulation and delay in the sending queue of the terminal device, and the expression for the idle listening IL is as follows:
IL = 1.0 - SF_u   (9)
where SF_u is the superframe utilization, i.e. the ratio of the time the terminal devices occupy the superframe to the total time available for data communication, defined as:
SF_u = (T_b + T_c + T_r) / SD   (10)
where SD is the superframe duration, T_b indicates the time taken by the coordinator for beacon transmission, T_c indicates the time a device occupies the channel due to frame collisions, T_r is the time for data reception, and T_s is defined as follows:
T_s = T_CCA + T_DATA + T_IFS + T_ACK   (11)
where T_CCA indicates the channel assessment time during each frame transmission, T_DATA indicates the data transmission time, T_IFS indicates the inter-frame space, and T_ACK indicates the time for acknowledgement reception.
5. The method as claimed in claim 1, wherein the action set, the action selection probability set and the feedback set are initialized; the learning automaton is a probability-based learning tool that selects its activities through a random activity probability vector P_i(t); the activity probability vector is the main component of the learning automaton, so it must be kept updated at all times; P_i(t) denotes the probability that node n_i selects a certain duty cycle at time t, and this probability is expressed as the expected value of the overall feedback return of the corresponding duty cycle, which is assumed to follow a normal distribution with a corresponding probability density; it should be noted that the action of the coordinator is initially selected as the maximum duty cycle, that is, the coordinator is always in the receiving state, and the corresponding action selection probability is also 1, so that the coordinator can collect more information about the network in the early stage.
6. A highly reliable adaptive MAC layer scheduling method as claimed in claim 1, characterized in that the coordinator (FFD) uses the Learning Automata (LA) method to interact with the surrounding environment; in particular, the learning automaton model can be represented by a triple LA = (α, β, p), where α = {α1, α2, ..., αr} denotes the action set of the learning automaton, β = {β1, β2, ..., βr} denotes the set of feedback signals given by the environment, and p = {p1, p2, ..., pr} denotes the set of action probabilities, satisfying
Σ_{i=1}^{r} p_i(n) = 1   (14)
wherein p_i(n) represents the action probability corresponding to α_i after the n-th round of the learning process, satisfying the probability update formula p(n+1) = T(α(n), β(n), p(n)), where T represents the learning algorithm;
specifically, a P-environment model is adopted, the feedback signal takes the value 0 or 1, and when the feedback signal takes 0 the environment gives a reward signal; when the feedback signal takes 0 or 1, the corresponding probability updates are respectively expressed as follows:
when the feedback signal takes 0:
p_i(n+1) = p_i(n) + a(n)·(1 - p_i(n)), and p_j(n+1) = (1 - a(n))·p_j(n) for all j ≠ i   (15)
when the feedback signal takes 1:
p_i(n+1) = (1 - b(n))·p_i(n), and p_j(n+1) = b(n)/(r - 1) + (1 - b(n))·p_j(n) for all j ≠ i   (16)
it should be noted that, in the process of adjusting the duty ratio by using the learning automaton method, a feedback β of the environment is continuously received, and the total received feedback can be understood as the sum of the immediate feedback and the future feedback, as shown below:
R_t = β_t + γ·β_{t+1} + γ²·β_{t+2} + ... = Σ_{k=0}^{∞} γ^k·β_{t+k}   (17)
where γ is a discount factor, γ ∈ [0,1], representing a weight for future feedback.
7. The high-reliability adaptive MAC layer scheduling method of claim 1, wherein different exploration strategies are selected at different time periods, specifically, the whole part of the exploration strategies are divided into 3 stages: initial phase, exploration phase and greedy phase:
in the initial stage, all actions in the set are explored in a deterministic manner by adopting a cyclic search strategy, the node selects the highest duty ratio at the initial stage, and the duty ratio is slowly reduced until the minimum duty ratio is reached, so that all duty ratio sets are ensured to be tried, namely the action set of the learning automaton is listed completely;
in the exploration phase, once all actions have been selected, actions with a higher duty cycle than the one selected in the current phase are explored at random if the corresponding β_i^t in the feedback set β has increased, indicating that the duty cycle represented by action α_i is better; otherwise, if the feedback set β remains unchanged or the corresponding β_i^t decreases, actions with a lower duty cycle are explored at random;
in the greedy stage, after the exploration strategy has been applied for a period of time, the node is sufficiently familiar with the environment, and the greedy strategy is then used to find the optimal action value: when the corresponding β_i^t in the feedback set β is higher than that of the previous stage, to account for increased traffic, the greedy strategy selects from the subset of actions with lower action values, i.e. selects a higher duty cycle; if the corresponding β_i^t in the feedback set β is lower than or equal to that of the previous stage, the greedy strategy selects from the subset of actions with higher action values, i.e. selects a lower duty cycle; therefore, under stable conditions, a minimum duty cycle is preferred; the exploration probability is increased when the duty cycle selected in the next stage differs from that selected in the present stage, otherwise the learning and exploration probabilities are reduced to avoid oscillation once the optimal action is selected;
where β represents the combination of negative values, i.e. penalty values, for the performance of the duty cycle selection in this phase; it can be seen that the best reward is a zero value, i.e. no penalty, since it represents no idle listening and no overflow of the transmission queue; it should be noted that if the new action is equal to the last action selected, the learning and exploration rates are reduced to avoid oscillation around the best action.
8. The method according to claim 1, wherein the selecting action selects the BO and SO standard parameters that determine the duty cycle based on the feedback set to implement the adaptive MAC scheduling; specifically, after the action value is selected, the adjustment formulas are defined as follows:
BO = max(4, |A|), such that (BI - SD) < Δ   (19)
SO ← max(0, BO - α_t)   (20)
where A denotes the learning automaton and |A| the size of its action set, BI denotes the current beacon interval, SD denotes the superframe duration, and Δ denotes the delay experienced by the data frame; it is noted that the selection is based on the delay experienced by the data frames, and the parameter values BO and SO are embedded in the beacon frames broadcast to the terminal devices for synchronization.
9. An apparatus for implementing a highly reliable adaptive MAC layer scheduling method, comprising:
a model establishing unit for establishing a model according to the wireless sensor network environment, applying the learning automaton method to the environment of the wireless sensor network, and representing the sensor network environment model by a triple E = (α, β, p), wherein α = {α1, α2, ..., αr} denotes the finite action set, i.e. the input, of the node's automatic learning and denotes the set of candidate duty cycles of the node, β = {β1, β2, ..., βm} denotes the feedback signal output by the node after selecting a duty cycle and interacting with the environment, and p = {p1, p2, ..., pr} denotes a series of reward and punishment probabilities, each punishment probability p_i being related to a given input variable α_i;
a generating unit, configured to generate a specific frame structure format by a node, embed, using reserved bits of a frame control field, a queue occupancy rate and a queuing delay parameter, specifically, embed, by each terminal device, a queue occupancy rate O and a queuing delay D in a frame control structure of each data frame to be transmitted, where the information is embedded using 3 reserved bits of the frame control field;
a traffic estimation unit, configured to perform traffic estimation by the coordinator (FFD) and generate a traffic-adaptive duty cycle set, where each coordinator estimates the incoming traffic by calculating idle listening, packet accumulation and delay in the terminal device transmit queues, and the expression for the idle listening IL is as follows:
IL = 1.0 - SF_u   (21)
where SF_u represents the superframe utilization, i.e. the ratio of the time the terminal devices occupy the superframe to the total time available for data communication;
the coordinator initialization unit is used for initializing an action set of the coordinator, and selecting a probability set and a feedback set for the action;
an environment interaction unit, configured to interact with the surrounding environment by using the Learning Automata (LA) method through the coordinator (FFD), where a P-environment model is adopted, the value of the feedback signal is 0 or -1, and if the feedback signal is 0, the probability update is defined as follows:
p_i(n+1) = p_i(n) + a(n)·(1 - p_i(n)), and p_j(n+1) = (1 - a(n))·p_j(n) for all j ≠ i   (22)
if the feedback signal takes -1, the probability update is defined as follows:
p_i(n+1) = (1 - b(n))·p_i(n), and p_j(n+1) = b(n)/(r - 1) + (1 - b(n))·p_j(n) for all j ≠ i   (23)
wherein r is the number of actions in the action set of the learning automaton, and a(n) and b(n) are weight coefficients of the linear functions, which can be defined as linear functions or constants depending on the specific application;
an exploration strategy selection unit for selecting an exploration strategy: selecting different exploration strategies at different periods, dividing the whole learning process into three stages, wherein a cyclic search strategy is adopted in the initial stage, a random search strategy is adopted in the exploration stage, and a greedy strategy is adopted in the greedy stage;
the interaction evaluation unit is used for evaluating the influence of the action on data transmission after interaction with the environment, updating the feedback set and updating the action selection probability set, and the reward function used for evaluation is defined as follows:
β = -1, if O > O_max; β = -IL, otherwise   (24)
wherein β denotes the penalty value, i.e. the negative value, for the performance of the duty cycle selected in this phase, O denotes the queue occupancy, and O_max denotes the upper occupancy threshold; as can be seen from the above equation, the best reward is a zero value, i.e. no penalty, since it indicates no idle listening and no overflow of the transmission queue;
and an action selection unit for selecting an action, namely selecting BO and SO standard parameters for determining the duty ratio based on the feedback set, and selecting the optimal duty ratio, wherein the BO parameters are defined as follows:
BO = max(4, |A|), such that (BI - SD) < Δ   (25)
where A denotes the learning automaton and |A| the size of its action set, BI denotes the current beacon interval, SD denotes the superframe duration, and Δ denotes the delay experienced by the data frame.
CN201710946487.6A 2017-10-11 2017-10-11 High-reliability self-adaptive MAC (media Access control) layer scheduling method Active CN109660375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710946487.6A CN109660375B (en) 2017-10-11 2017-10-11 High-reliability self-adaptive MAC (media Access control) layer scheduling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710946487.6A CN109660375B (en) 2017-10-11 2017-10-11 High-reliability self-adaptive MAC (media Access control) layer scheduling method

Publications (2)

Publication Number Publication Date
CN109660375A CN109660375A (en) 2019-04-19
CN109660375B true CN109660375B (en) 2020-10-02

Family

ID=66108497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710946487.6A Active CN109660375B (en) 2017-10-11 2017-10-11 High-reliability self-adaptive MAC (media Access control) layer scheduling method

Country Status (1)

Country Link
CN (1) CN109660375B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110856264A (en) * 2019-11-08 2020-02-28 山东大学 Distributed scheduling method for optimizing information age in sensor network
CN111542070B (en) * 2020-04-17 2023-03-14 上海海事大学 Efficient multi-constraint deployment method for industrial wireless sensor network
CN114666880B (en) * 2022-03-16 2024-04-26 中南大学 Method for reducing end-to-end delay in delay-sensitive wireless sensor network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103260229A (en) * 2013-06-04 2013-08-21 东北林业大学 Wireless sensor network MAC protocol based on forecast and feedback

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103260229A (en) * 2013-06-04 2013-08-21 东北林业大学 Wireless sensor network MAC protocol based on forecast and feedback

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Traffic Adaptive Duty Cycle MAC Protocol for Wireless Sensor Networks; Chen Hao et al.; IEEE; 2008-12-31; full text *
无线传感器网络自适应MAC协议 (Adaptive MAC protocol for wireless sensor networks); 范清峰 et al.; 计算机工程与应用 (Computer Engineering and Applications); 2010-12-31; full text *

Also Published As

Publication number Publication date
CN109660375A (en) 2019-04-19

Similar Documents

Publication Publication Date Title
JP6498818B2 (en) Communication system, access network node and method for optimizing energy consumed in a communication network
de Paz Alberola et al. Duty cycle learning algorithm (DCLA) for IEEE 802.15. 4 beacon-enabled wireless sensor networks
JP6266773B2 (en) Communication system and method for determining optimal duty cycle to minimize overall energy consumption
Oliveira et al. A duty cycle self-adaptation algorithm for the 802.15. 4 wireless sensor networks
Jeon et al. DCA: Duty-cycle adaptation algorithm for IEEE 802.15. 4 beacon-enabled networks
Lee et al. Adaptive duty-cycle based congestion control for home automation networks
CN109660375B (en) High-reliability self-adaptive MAC (media Access control) layer scheduling method
WO2005060604A2 (en) Wireless network with improved sharing of high power consumption tasks
Hassan et al. Traffic differentiation and dynamic duty cycle adaptation in IEEE 802.15. 4 beacon enabled WSN for real-time applications
Jagannath et al. A hybrid MAC protocol with channel-dependent optimized scheduling for clustered underwater acoustic sensor networks
Cheng et al. An opportunistic routing in energy-harvesting wireless sensor networks with dynamic transmission power
Siddiqui et al. Towards dynamic polling: Survey and analysis of Channel Polling mechanisms for Wireless Sensor Networks
US8320269B2 (en) Scalable delay-power control algorithm for bandwidth sharing in wireless networks
Zhou et al. An efficient adaptive mac frame aggregation scheme in delay tolerant sensor networks
KR101557588B1 (en) Apparatus for packet retransmission in wireless sensor network
Perillo et al. ASP: An adaptive energy-efficient polling algorithm for Bluetooth piconets
Afroz et al. QX-MAC: Improving QoS and Energy Performance of IoT-based WSNs using Q-Learning
Han et al. Multi-agent reinforcement learning for green energy powered IoT networks with random access
Shrestha et al. A Markov decision process (MDP)-based congestion-aware medium access strategy for IEEE 802.15. 4
CN111432505B (en) Wireless networking transmission system based on WiFi
Nefzi et al. SCSP: An energy efficient network-MAC cross-layer design for wireless sensor networks
de Paz et al. Dcla: A duty-cycle learning algorithm for ieee 802.15. 4 beacon-enabled wsns
El Rachkidy et al. Queue-exchange mechanism to improve the QoS in a multi-stack architecture
Koren et al. Requirements and challenges in wireless network's performance evaluation in ambient assisted living environments
Poulose Jacob et al. Channel adaptive MAC protocol with traffic-aware distributed power management in wireless sensor networks-some performance issues

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant