CN111356198B

CN111356198B - Clustering cross-layer communication processing method and system based on geographic position and Q learning

Info

Publication number: CN111356198B
Application number: CN202010085552.2A
Authority: CN
Inventors: 何先灯; 邱熠凡; 陈南; 易运晖; 权东晓; 朱畅华; 赵楠
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-02-10
Filing date: 2020-02-10
Publication date: 2022-02-08
Anticipated expiration: 2040-02-10
Also published as: CN111356198A

Abstract

The invention belongs to the technical field of communication, and discloses a clustering cross-layer communication processing method and a clustering cross-layer communication processing system based on geographical positions and Q learning. In a neighbor node discovery stage, neighbor nodes are discovered and a cluster head is selected through the division of geographical positions and time slots combined with Q learning; in the data transmission stage, the cluster head sends TC messages to discover the whole network topology, and when a data transmission request exists, the nodes select relay nodes to forward data by using the geographical position, recorded in the routing table, of the cluster head of the cluster where the target node is located, combined with a greedy principle. The invention overcomes defects of conventional clustering routing algorithms such as inaccurate collection of neighbor node information by the cluster head and frequent cluster head replacement, which are caused by collisions among a large number of broadcast packets. The cluster head reconstructs its not-yet-sent TC packet by analyzing the TC messages sent by other cluster heads in the cluster, thereby reducing the size of the TC messages and the topology overhead in the network.

Description

Clustering cross-layer communication processing method and system based on geographic position and Q learning

Technical Field

The invention belongs to the technical field of communication, and particularly relates to a clustering cross-layer communication processing method and system based on geographic position and Q learning.

Background

China has a wide sea area, many ports and abundant ocean resources. It is a major producer and trader of aquatic products, and marine aquaculture and marine fishing are important industries in its coastal areas. Because the offshore operating environment is complex and changeable, fishing in offshore areas takes place under relatively harsh production conditions. How to meet the basic requirements of safe offshore operation has therefore always been a concern of governments and fishery administration departments at all levels. The lack of communication infrastructure at sea means that mature land-based wireless communication technology cannot be applied directly to marine communication systems, so it is necessary to study marine communication technology in light of the particularities of the marine communication environment. Ad Hoc networking is one of the major technologies for enabling marine communication. In a communication system, a MAC protocol specifies when nodes in a wireless network access the channel, coordinating the nodes so that they share a bandwidth-limited wireless channel efficiently and fairly. There are many classic MAC protocols, which can be classified by channel access mode as follows. In a contention-type protocol, such as CSMA, nodes access the channel by direct contention; if a data packet collision occurs, they contend for the channel again and retransmit the data packet. An allocation-type MAC protocol, such as TDMA, pre-allocates a certain amount of channel resources to the nodes in the network, so that the nodes can complete data transmission on their allocated channels without interference. In a reservation-type MAC protocol, a node uses a short reservation packet to reserve the channel and can then transmit data without collision on the reserved sub-channel.

Since an Ad Hoc network can transmit data by multi-hop forwarding, a routing protocol is required to establish a reachable path from a source node to a destination node. The route discovery mode and the route selection strategy are the core problems of such a protocol. According to the route establishment process, routing protocols can be divided into the following categories. In a proactive routing protocol, such as OLSR, nodes periodically broadcast routing packets to obtain link information between the nodes in the network; then, based on the topology information and a suitable routing algorithm, the nodes establish routes to all nodes in the network and update the routing table in real time according to the routing information received. An on-demand routing protocol, such as AODV, obtains the required route through route discovery when a source node needs to send data but has no route to the destination node. In a geographic-position-based routing protocol, such as GPSR, the nodes in the network first acquire their positioning information, then obtain the position of a target node through a location service protocol and the specific geographical positions of neighbor nodes through beacon exchange; the data packet is forwarded using the node positions combined with a corresponding routing algorithm.

Since communication nodes in an Ad Hoc network have mobility, a distributed control structure is generally adopted, and the distributed control structure can be generally divided into two types: planar structures and layered structures. In a network with a hierarchical structure, the nodes in the network are subjected to hierarchical topology control through a reasonable clustering algorithm. In the network, the sub-network formed by the cluster head nodes selected by the cluster head election algorithm is stable, the influence of the change of the topological structure on the routing protocol can be reduced, and the topology of the large-scale network can be managed conveniently. Regarding the election of the cluster head, a typical clustering algorithm is as follows: a minimum ID clustering algorithm, a highest node degree clustering algorithm, a lowest mobility clustering algorithm, a load balancing node ID clustering algorithm, a load balancing node degree clustering algorithm and a combined weighted clustering algorithm.

At present, the first closest prior art reduces the collisions that occur during neighbor node discovery by clustering on geographical location and dividing time slots, so as to improve the performance of the neighbor node discovery process. On this basis, the second closest prior art further proposes a cross-layer protocol based on geographical location and clustering. The protocol broadcasts TC packets through the cluster heads, so that nodes in the network can acquire the node distribution information in each cellular cluster and the specific geographic positions of the cluster heads. When a data transmission request exists, the nodes select relay nodes to forward data by using the geographical position, recorded in the routing table, of the cluster head of the cluster where the target node is located, combined with the greedy principle, so that the routing overhead is reduced and the method achieves higher throughput and lower packet loss rate and delay. The third closest prior art proposes a cross-layer protocol based on clustering and hybrid MAC access, which optimizes routing overhead and latency by establishing routes through route request messages (RREQ) and route reply messages (RREP) sent only between cluster heads.

These closest existing clustering algorithms usually rely on information exchange between neighboring nodes, but few studies address the incompleteness of information collection caused by collisions during broadcasting, which degrades the performance of cluster head election. How to design a reasonable MAC-layer channel access mechanism and network-layer routing protocol for a mobile Ad Hoc network has become a hot issue in current Ad Hoc network research. Layered Ad Hoc networking is widely applied because its better scalability makes it more suitable for large-scale networks. It is therefore necessary to design a protocol suitable for fishery Ad Hoc networks by focusing on the MAC-layer channel access mechanism, the network-layer routing protocol, and the principles and key technologies of clustering algorithms in layered Ad Hoc networks.

In summary, the problems of the prior closest technologies are as follows:

(1) in the prior art, the cluster head is replaced frequently, the information depends entirely on the interaction between nodes in the current round, and information errors caused by collisions are not considered.

The difficulty of solving the technical problems is as follows:

due to the fact that the wireless communication system nodes autonomously compete for the channel, the broadcast packet collision probability is high, and the accuracy of the collected information is difficult to guarantee only by means of the data packets collected in the current round.

The significance of solving the technical problems is as follows:

the method breaks through the limitation caused by the traditional algorithm through the Q learning intelligent algorithm, and accumulates the previous information, so that the incompleteness and inaccuracy of data packet collection caused by collision are reduced, the stability and robustness of the cluster head are improved, and the overall performance of the network is improved.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a clustering cross-layer communication processing method and system based on geographic position and Q learning.

The invention is realized in such a way that a clustering cross-layer communication processing method based on geographic position and Q learning comprises the following steps:

step one, in a neighbor node discovery stage, neighbor node discovery is carried out through the division of geographic positions and time slots and the combination of Q learning, and a cluster head is selected;

step two, in the data transmission stage, the cluster head sends TC messages to discover the topology of the whole network, and when a data transmission request exists, the nodes select relay nodes to forward data by using the geographical position, recorded in the routing table, of the cluster head of the cluster where the target node is located, combined with a greedy principle.

Further, the clustering cross-layer communication processing method based on geographic position and Q learning adopts a TDMA mechanism: time is periodically divided into time slices of 2 s, and each time slice is divided into a neighbor discovery period and a data transmission period; the neighbor discovery period occupies 0.09 s, each small time slot occupies 0.01 s, and the remaining time is the data transmission period.
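The timing above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `phase_at` and the use of integer milliseconds are assumptions made here to avoid floating-point rounding; only the three durations come from the text.

```python
# Durations taken from the text, expressed in integer milliseconds.
SLICE_MS = 2000       # one TDMA time slice (2 s)
DISCOVERY_MS = 90     # neighbor discovery period (0.09 s)
SMALL_SLOT_MS = 10    # one small time slot (0.01 s)


def phase_at(t_ms):
    """Classify an absolute time (ms) into a phase of the current time slice.

    Returns ("discovery", small_slot_index) during neighbor discovery,
    or ("data", None) during the data transmission period.
    """
    offset = t_ms % SLICE_MS
    if offset < DISCOVERY_MS:
        return "discovery", offset // SMALL_SLOT_MS
    return "data", None
```

For example, `phase_at(35)` falls in small slot 3 of the discovery period, while `phase_at(90)` is already in the data transmission period.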

Further, the neighbor node discovery phase includes: dividing the cells; if the cell radius is R, a circle with a radius of R/2 is marked out at the center of each cell, and the rest of each cluster is equally divided into six parts; the circles at the cell centers are marked alternately with the indexes 1 to 3 in sequence, and the other areas of each cell are marked with the indexes 4 to 9 in sequence; meanwhile, a neighbor node discovery period is equally divided into 9 time frames, numbered sequentially from 1 to 9, and each time frame is allocated to the regions with the same index value in the regular hexagonal clusters, each time frame consisting of n small time slots; the wireless transmission distance is 3/2 times the cell radius, so that when a node is located in the cluster center circle, that node can communicate with all nodes in the cell, and one cluster head can achieve full coverage of the nodes in the cluster.

Further, in the neighbor node discovery phase, Q learning seeks the policy π: S → A under which the cumulative reward that the actions selected by the system obtain from the environment is maximal; the Q value is updated as shown in formula (1):

where α represents the learning rate, γ is the discount factor, and r_i is the reward function; the tendency of the system to produce an action is mainly determined by the reward value from the environment, i.e. the reward function: a positive reward value strengthens the tendency and a negative reward value weakens it.
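For orientation, the textbook one-step Q-learning update, written with the symbols defined above (learning rate α, discount factor γ, reward r), is given below. It is the standard form from the reinforcement learning literature and is offered here only as an assumption about the general shape of formula (1), not as a quotation of it.

```latex
Q_{k+1}(s_k, a_k) = (1 - \alpha)\, Q_k(s_k, a_k)
  + \alpha \left[ r_k + \gamma \max_{a \in A} Q_k(s_{k+1}, a) \right]
```

With γ = 0 this reduces to an exponentially weighted average of past rewards, which has the same shape as the Qval update in formula (2).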

Further, the Q-learning cluster head selection method comprises the following steps: when the first round of neighbor node discovery completes, the cluster head is initially selected, and the node closest to the cluster center becomes the cluster head; starting from the second round of the neighbor node discovery phase, each node locally stores its own Q table in a distributed manner; the Q table is an array of structures, each structure storing an in-cluster neighbor node ID, Qval and Qflag; Qval is the Q value of the neighbor node; Qflag = 1 indicates that a HELLO message of the node has been received in the current round, and Qflag = 0 indicates that no HELLO message of the node has been received in the current round; in the HELLO stage, when a HELLO message of an in-cluster neighbor node is received, the Qval corresponding to that neighbor node is updated and its Qflag is set to 1; when the HELLO timer expires, the Qflag of every node whose ID is in the Q table but whose HELLO message was not received in the current round is set to 0; if the HELLO message of a node stored in the Q table is not received for two consecutive rounds, that is, the Qflag of the neighbor node is 0 for two consecutive rounds, the node is deleted from the Q table; no information is added to the Q table for neighbor nodes that are not in the cluster; if i is the current node and i has m in-cluster neighbor nodes, node i locally stores a Q table with m + 1 columns, one column per node; one of the columns is the current node itself, whose Qval is constantly 0 and whose Qflag is constantly 1.
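The Q table bookkeeping described above can be sketched as a small Python class. The names `QEntry`, `QTable`, `on_hello` and `on_hello_timer_expired`, and the helper counter `missed`, are hypothetical and introduced only for illustration; the new Qval is passed in by the caller because its computation is defined separately by formula (2).

```python
from dataclasses import dataclass


@dataclass
class QEntry:
    qval: float = 0.0   # Q value of the neighbor
    qflag: int = 0      # 1: HELLO received this round, 0: not received
    missed: int = 0     # consecutive rounds with Qflag == 0 (helper, an assumption)


class QTable:
    def __init__(self, own_id):
        self.own_id = own_id
        # Own column: Qval is constantly 0 and Qflag is constantly 1.
        self.entries = {own_id: QEntry(qval=0.0, qflag=1)}
        self._heard = set()   # neighbors heard in the current round

    def on_hello(self, neighbor_id, in_cluster, new_qval):
        """Record a HELLO; neighbors outside the cluster are never stored."""
        if not in_cluster:
            return
        entry = self.entries.setdefault(neighbor_id, QEntry())
        entry.qval = new_qval
        entry.qflag = 1
        entry.missed = 0
        self._heard.add(neighbor_id)

    def on_hello_timer_expired(self):
        """Clear Qflag for silent neighbors; drop them after two silent rounds."""
        for nid in list(self.entries):
            if nid == self.own_id or nid in self._heard:
                continue
            entry = self.entries[nid]
            entry.qflag = 0
            entry.missed += 1
            if entry.missed >= 2:
                del self.entries[nid]
        self._heard.clear()
```

Keeping an explicit `missed` counter is one simple way to detect "Qflag is 0 for two consecutive rounds"; storing the previous round's flag would work equally well.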

Further, the Qval value is updated according to equation (2).

$$Q_{k+1}(s_k, a_k) = (1-\alpha)\,Q_k(s_k, a_k) + \alpha r_k \tag{2}$$

The reward function is defined as equation (3):

$$r_k = w_1 (d_j - d_i) + w_2 (s_j - s_i) - w_3 (D_j - D_i) \tag{3}$$

in formula (3), d_i is the number of in-cluster neighbor nodes of the current node i, and d_j is the number of in-cluster neighbor nodes of the neighbor node j; s_i is the stability of the current node i, and s_j is the stability of the neighbor node j; D_i is the distance from the current node i to the cluster center, and D_j is the distance from the neighbor node j to the cluster center; when r is greater than 0, the Q value obtains positive feedback, which favors handing the cluster head role over to node j; when r is less than 0, the Q value obtains negative feedback, which favors keeping the current cluster head;
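Formulas (2) and (3) can be sketched in a few lines of Python. The weight values `w1`, `w2`, `w3` and the learning rate `alpha` below are placeholders chosen for illustration, since the text does not fix them; the update uses the previous Q value on the right-hand side, which is the usual reading of formula (2).

```python
def reward(d_i, d_j, s_i, s_j, dist_i, dist_j, w1=1.0, w2=1.0, w3=1.0):
    """Reward of formula (3) for neighbor j as seen from current node i.

    d_*    : number of in-cluster neighbors
    s_*    : node stability
    dist_* : distance to the cluster center (D_i, D_j in the text)
    The default weights are illustrative placeholders, not values from the text.
    """
    return w1 * (d_j - d_i) + w2 * (s_j - s_i) - w3 * (dist_j - dist_i)


def update_qval(q_old, r, alpha=0.5):
    """Qval update of formula (2): blend the old Q value with the new reward."""
    return (1 - alpha) * q_old + alpha * r
```

A neighbor with more in-cluster neighbors, higher stability and a shorter distance to the cluster center than the current node yields a positive reward, so repeated HELLO rounds steadily raise its Qval.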

the stability S of a node is defined as follows:

wherein N_x is the current neighbor node set of node x,

and the other set in the definition is the neighbor node set obtained after the previous round of HELLO messages; each node locally keeps the neighbor node set from the previous round; when the HELLO timer expires, the node calculates S from the kept neighbor node set and the current neighbor node set; relatively stable nodes have higher S values; for a static network, S of every node is 1; a common node, such as node j, only updates its Q table, and each update is equivalent to the feedback obtained after a hypothetical action, that is, the Q table is updated with the feedback value that switching or keeping the cluster head would yield if node j were the cluster head; a cluster head node, after neighbor node discovery finishes, traverses the Qval of the nodes whose Qflag is 1 in its Q table and selects the ID of the node with the maximum Q value; if i is the current cluster head node and the Q value corresponding to i is the maximum, i continues to act as the cluster head; if the Q value corresponding to a neighbor node j is the maximum, i transfers the cluster head role to node j, namely, the optimal selection strategy:

wherein A_k denotes the set of all actions that a_k can take; the strategy is the process of selecting the action a_k for which the Q value is maximum; if the cluster head performs the cluster head transfer action, a cluster head transfer packet is sent at the beginning of the data transmission time slot; when the Q value corresponding to an in-cluster neighbor node j is the maximum for two consecutive rounds, the cluster head i carries out the cluster head handover; the action with the maximum Q value is selected as the global optimal solution; a node is equivalent to a state, and each node samples all its neighbors using HELLO messages.
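The selection rule for a cluster head can be sketched as below: take the entry with the largest Qval among those with Qflag = 1, and hand over only when the same neighbor wins in two consecutive rounds. The class and function names are hypothetical, the Q table is simplified to a dict of `(qval, qflag)` tuples, and breaking ties in favor of the first entry (the cluster head's own, if listed first) is an assumption made here.

```python
def best_candidate(q_table):
    """Return the node id with the largest Qval among entries with Qflag == 1.

    q_table maps node id -> (qval, qflag). On ties the first entry wins,
    which is an assumption of this sketch.
    """
    heard = {nid: qval for nid, (qval, qflag) in q_table.items() if qflag == 1}
    return max(heard, key=heard.get)


class ClusterHead:
    def __init__(self, own_id):
        self.own_id = own_id
        self.last_best = own_id   # winner of the previous round

    def decide(self, q_table):
        """Return the id to hand the role over to, or None to keep it."""
        best = best_candidate(q_table)
        # Hand over only if the same neighbor was also best in the previous round.
        handover = best != self.own_id and best == self.last_best
        self.last_best = best
        return best if handover else None
```

Requiring two consecutive wins is what damps out a single round in which collisions happened to distort the collected HELLO information.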

Further, a time slot preemption mechanism is adopted in the data transmission stage: after the HELLO stage finishes, a 1 ms channel preemption time slot is reserved at the start of the data transmission time slot; if a cluster head needs to perform cluster head handover, it preempts this 1 ms slot, during which all other nodes listen; if the slot is preempted, the other nodes remain silent for the next 10 ms, leaving a clear channel for the cluster head to perform the handover; if the candidate cluster head does not respond, the cluster head retransmits once, and if the candidate cluster head fails to respond twice, the suboptimal node is selected for the cluster head handover; if the 1 ms slot is not occupied by any node, TC message flooding and data transmission follow immediately;

in the data transmission stage, all cluster heads in the network periodically generate TC (topology control) messages, which contain the IDs of all nodes in the cluster and the geographical position information of the cluster head; after receiving TC messages generated by other cluster heads, a cluster head forwards them;

after cluster head node A receives a TC message, if the TC message was sent by another cluster head B in the same cluster and node A's own TC packet has not yet been sent, node A compares the cluster node set of node B with its own cluster node set:

(1) if the cluster node set of node B contains node A and the whole cluster node set of node A, node A is demoted to a common node and deletes the pending TC message from its queue;

(2) if the cluster node set of node B does not completely contain node A and the cluster node set of node A, node A deletes the pending TC message from its queue, removes the nodes contained in the received TC message from its own cluster node set, and regenerates a new TC message.
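The two cases can be sketched with set operations. This is one reading of the rule, under the assumption that in case (2) node A prunes its own node set by the nodes node B already advertised; the function name and the return convention are hypothetical.

```python
def reconcile_tc(own_id, own_members, other_members):
    """Decide what cluster head `own_id` does with its pending TC message.

    own_members   : ids in this cluster head's own cluster node set
    other_members : ids listed in the TC message received from the other
                    cluster head in the same cluster
    Returns ("demote", None) when the other head already covers this head
    and all its members, otherwise ("resend", remaining_ids).
    """
    own = set(own_members)
    other = set(other_members)
    if (own | {own_id}) <= other:
        # Case (1): fully covered, become a common node and drop the TC.
        return "demote", None
    # Case (2): advertise only the nodes the other head did not list.
    return "resend", own - other
```

Because each head only advertises what the other did not, the TC messages of cluster heads in the same cluster complement each other instead of repeating the same node IDs.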

Further, the method also comprises:

when a cluster head sends a TC packet:

(1) the back-off time of a node inside the center circle is the fixed time DIFS plus a random back-off time, the random back-off time being any value in (0, t1);

(2) the back-off time of a node outside the center circle is the fixed time DIFS + t1 plus a random back-off time t2;
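The two back-off rules can be sketched as follows. The sketch assumes that the random part for out-of-circle nodes is drawn uniformly from (0, t2), which the text does not state explicitly, and that all times share one unit; `tc_backoff` is a hypothetical name.

```python
import random


def tc_backoff(in_center_circle, difs, t1, t2, rng=random):
    """Back-off before a cluster head sends its TC packet.

    Center-circle nodes wait DIFS plus a random time up to t1.
    Other nodes wait DIFS + t1 plus a random time up to t2 (assumed uniform),
    so they never transmit before a center-circle node would.
    """
    if in_center_circle:
        return difs + rng.uniform(0.0, t1)
    return difs + t1 + rng.uniform(0.0, t2)
```

Since the out-of-circle window starts where the in-circle window ends, a cluster head in the center circle always gets the first chance to send its TC packet.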

after the common node A receives the TC message, if the TC message is sent by a cluster head B in the same cluster:

(1) if the intra-cluster node in the TC message contains the node A, the node A changes the cluster head of the node A into the node B;

(2) if the intra-cluster nodes in the TC messages received in two consecutive rounds do not contain the node A, the node A can become a cluster head by itself, and the nodes contained in the received TC messages are deleted from the adjacent nodes in the cluster by itself, and then a new TC message is generated;

if a common node does not receive the TC message of its cluster during one round of the data transmission time slot, it enters the initial cluster head election mechanism in the next round, and the node closest to the cluster center is selected as the new cluster head; in the route discovery process, a greedy forwarding principle is adopted: the data packet is forwarded hop by hop, each hop selecting as relay node the neighbor node closest to the cluster head of the destination node.
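The greedy relay choice can be sketched as below. Positions are plain 2-D coordinates, and the rule that a relay must be strictly closer to the target cluster head than the current node (returning `None` otherwise) is the usual greedy-forwarding convention and an assumption here, not something the text spells out.

```python
import math


def greedy_next_hop(self_pos, neighbors, target_head_pos):
    """Pick the neighbor closest to the destination's cluster head.

    neighbors maps node id -> (x, y). Returns the chosen id, or None if no
    neighbor is closer to the target than the current node itself.
    """
    best_id = None
    best_dist = math.dist(self_pos, target_head_pos)
    for node_id, pos in neighbors.items():
        d = math.dist(pos, target_head_pos)
        if d < best_dist:
            best_id, best_dist = node_id, d
    return best_id
```

The routing table only needs the cluster head position of the destination's cluster, which is exactly what the TC messages distribute.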

Another object of the present invention is to provide an application program in a communication terminal, the program causing the terminal to execute steps comprising:

Another object of the present invention is to provide a clustering cross-layer communication processing system implementing the clustering cross-layer communication processing method based on geographic position and Q learning, the system including:

the neighbor node discovery stage module is used for discovering neighbor nodes and selecting cluster heads by dividing geographic positions and time slots and combining Q learning;

and the cluster head TC message sending module is used for carrying out topology discovery of the whole network, and when a data transmission request exists, the nodes select the relay nodes to carry out data forwarding by utilizing the cluster head geographical positions of the clusters where the target nodes are located in the routing table and combining a greedy principle.

In summary, the advantages and positive effects of the invention are as follows. By dividing geographical positions and time slots, and by alternately numbering the cluster center circles according to geographical position, a cluster node receiving the HELLO message of a node in its own cluster center circle is not subject to collisions with HELLO messages sent in the same time slot by nodes in the center circles of other clusters; this raises the probability that cluster nodes receive the HELLO messages of the cluster center nodes, so that a better cluster head can be selected in the first round of initial cluster head selection. Cluster head selection comprehensively considers the number of in-cluster neighbor nodes of a node, the distance between the node and the cluster center, and the stability of the node. With Q learning introduced into cluster head election, the cluster head traverses its Q table after each round of neighbor node discovery; if the Q value of an in-cluster neighbor node is larger than the Q value of the cluster head, the cluster head performs cluster head transfer and sends a cluster head transfer packet; otherwise the cluster head identity remains unchanged. With Q learning, the information stored by a node covers not only the current round but also previous experience, which reduces the randomness caused by broadcast packet collisions in the network while still following the dynamic changes of the network. All nodes maintain the Q table locally, and when a HELLO message from an in-cluster neighbor node is received, the Q value entry corresponding to that neighbor node is updated.
The advantage of a common node also maintaining a Q table locally is that, when the old cluster head hands the role over to it and it becomes the new cluster head, it also has past experience to guide its subsequent actions. The cluster head reconstructs its not-yet-sent TC message by analyzing the TC messages sent by other cluster heads in the cluster, so that the TC messages sent by cluster heads in the same cluster complement each other and the topology overhead is reduced. When TC messages are sent, different back-off intervals give priority to the TC messages of nodes in the cluster center circle, which further helps the nodes better suited to be cluster head become the cluster head.

The invention protects the HELLO messages of nodes in the cluster center circle by allocating time slots separately to those nodes, which avoids collisions between the HELLO messages of nodes in the center circle of one cluster and those of nodes in the center circles of other clusters, and at the same time raises the probability that nodes in the cluster center circle are discovered by the other nodes in the cluster. Compared with the documents [Yifang Qiu, Xianding He and Qingcai Wang, "Cluster and Time Slot Based Cross-layer Protocol for Ad Hoc Network", 14th EAI International Conference on Communications and Networking in China, 2019] and [Longchano Wang, Xianding He, Qingcai Wang, Heping Yao and Yifang Qiu, "A Cross-layer neighbor Discovery interaction in Ad Hoc Network Based on Hexagocluster and GPS", 14th EAI International Conference on Communications and Networking in China, 2019], the occurrence of two or three cluster heads in one cluster, caused by the HELLO message of a node in the cluster center circle not being received, is reduced.
Unlike those two documents, in which a node that appears closer to the cluster center after a round of neighbor node discovery becomes the new cluster head, the present invention combines Q learning with previously accumulated experience when electing the cluster head, which overcomes defects such as inaccurate collection of neighbor node information and frequent cluster head replacement caused by collisions among a large number of broadcast packets; the cluster head reconstructs its not-yet-sent TC packet by analyzing the TC messages sent by other cluster heads in the cluster, so that the size of the TC messages is reduced, and the topology overhead in the network is reduced in turn.

Drawings

Fig. 1 is a flowchart of a clustering cross-layer communication processing method based on geographic location and Q learning according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a clustering cross-layer communication processing system based on geographic location and Q learning according to an embodiment of the present invention;

in the figure: 1. neighbor node discovery phase module; 2. cluster head TC message sending module.

Fig. 3 is a QLCT time frame allocation diagram according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of a time slot allocation based on a geographical location in a cell cluster according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of time slot allocation of a cluster center node according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of a Q learning model according to an embodiment of the present invention.

Fig. 7 is a packet loss rate performance diagram according to an embodiment of the present invention.

Fig. 8 is a graph of average end-to-end delay provided by an embodiment of the present invention.

Fig. 9 is a graph of throughput performance provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides a clustering cross-layer communication processing method and system based on geographic position and Q learning, and the invention is described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the clustering cross-layer communication processing method based on geographic location and Q learning according to the embodiment of the present invention includes the following steps:

S101: in the neighbor node discovery stage, neighbor node discovery is carried out through the division of geographic positions and time slots combined with Q learning, and a cluster head is selected;

S102: in the data transmission stage, the cluster head sends TC messages to discover the whole network topology, and when a data transmission request exists, the nodes select relay nodes to forward data by using the geographical position, recorded in the routing table, of the cluster head of the cluster where the target node is located, combined with a greedy principle.

As shown in fig. 2, the clustering cross-layer communication processing system based on geographic location and Q learning according to the embodiment of the present invention includes:

The neighbor node discovery phase module 1 is used for discovering neighbor nodes and selecting cluster heads through the division of geographic positions and time slots combined with Q learning.

The cluster head TC message sending module 2 is used for topology discovery of the whole network; when a data transmission request exists, the nodes select relay nodes to forward data by using the geographical position, recorded in the routing table, of the cluster head of the cluster where the target node is located, combined with a greedy principle.

The technical solution of the present invention is further described below with reference to the accompanying drawings.

The terms used in the embodiments of the present invention are defined as follows.

Packet loss rate: the ratio of the total amount of data not successfully received by the destination node within a certain time to the total amount of data sent by the source node within that time.

Average end-to-end delay: the average time required for a data packet to be transmitted from a source node to a destination node.

Throughput: the total amount of data successfully transmitted in the network per unit time.

Reinforcement learning: a process of continuously learning a mapping from environment states to behaviors, so that the behavior of the system maximizes the cumulative reward obtained from the environment.

Q learning: currently one of the most widely used reinforcement learning methods; it supports online learning and optimization and has the advantage of reducing the computational complexity of the iterative process. By trial and error, the method selects the action that maximizes the Q value as the optimal action, thereby maximizing the return function r and obtaining the optimal selection strategy.

Ad Hoc: a wireless mobile ad hoc network. An Ad Hoc network has a dynamically changing topology and does not depend on fixed network equipment; it is decentralized, self-organizing and quick to deploy, and data packets can be transmitted over multiple hops.

MAC: media access control, which defines how data frames are transmitted over the medium.

Routing protocol: a set of rules that specifies how data is transmitted from a source node to a destination node.

GPSR: Greedy Perimeter Stateless Routing, a routing protocol for wireless networks that makes packet forwarding decisions using the positions of the routing nodes and of the destination node of the data packet.

The Q-learning-based clustering cross-layer protocol provided by the invention is divided into two stages. The first is the neighbor node discovery stage, in which neighbor nodes are discovered and a cluster head is selected through the division of geographic positions and time slots combined with Q learning. The second is the data transmission stage, in which the cluster heads send TC messages to discover the whole network topology; when there is a data transmission request, a node selects relay nodes to forward the data using the geographic position, stored in the routing table, of the cluster head of the cluster in which the target node is located, combined with a greedy principle.

As shown in fig. 3, the present invention employs a TDMA scheme in which time is periodically divided into time slices of 2 s, and each time slice is divided into a neighbor discovery period and a data transmission period. In the present invention, the neighbor discovery period occupies 0.09 s, each small time slot is 0.01 s, and the remainder of the time slice is the data transmission period.
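The time slice layout can be sketched as follows. This is a minimal illustration, assuming that the nine discovery frames share the 0.09 s discovery period equally (10 ms each); the function name `phase_at` and the use of integer milliseconds are choices made for this sketch and are not taken from the patent.

```python
SLICE_MS = 2000      # one periodic time slice: 2 s (from the text)
DISCOVERY_MS = 90    # neighbor discovery period: 0.09 s (from the text)
FRAME_MS = 10        # assumed length of one discovery frame: 0.01 s


def phase_at(t_ms: int):
    """Map an absolute time in milliseconds to (phase, frame_index).

    frame_index is 1..9 during neighbor discovery and None during
    the data transmission period.
    """
    offset = t_ms % SLICE_MS
    if offset < DISCOVERY_MS:
        return "discovery", offset // FRAME_MS + 1
    return "data", None
```

For example, `phase_at(2045)` falls 45 ms into the second time slice and therefore lies in discovery frame 5.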

(I) neighbor node discovery stage:

In the present invention, each cell is further subdivided. As shown in fig. 4, if the cell radius is R, a circle of radius R/2 is marked out at the center of each cell, and the remaining part of each cluster is divided equally into six parts. The center circles of the cells are indexed alternately with 1 to 3, and the other regions of each cell are indexed 4 to 9 in turn. The invention likewise divides a neighbor node discovery period equally into 9 time frames, numbered 1 to 9, and assigns each time frame to the regions with the same index value in the regular hexagonal clusters; each time frame consists of n small time slots. With this time slot division, nodes with different indexes in the same cluster avoid communication collisions, while nodes in different clusters with the same index can transmit HELLO messages simultaneously, achieving space division multiplexing.

In the invention, the wireless transmission distance is 3/2 times the cell radius. A node located in the cluster center circle can therefore communicate with all nodes in the cell (the farthest point of the cell is at most R/2 + R = 3R/2 away), so a single cluster head can cover every node in the cluster, as shown in fig. 5. Time slots 1 to 3 are assigned alternately to the center circles of the cells, so that when other nodes in a cluster receive the HELLO message of a node in their own cluster's center circle, it does not collide with HELLO messages from nodes in the center circles of other clusters. In fig. 5, the dotted lines are the wireless transmission ranges of node A and node B, respectively. When node A and node B send HELLO messages simultaneously, the nodes in the cluster of node A can receive the HELLO message of node A without interference from the HELLO message sent by node B, and the same holds for node B. When nodes that share a time slot within a cluster send HELLO messages, random timing jitter is added to further reduce collisions with adjacent nodes, so that nodes contend for the channel fairly and effectively.
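The region indexing described above can be sketched as follows. The sketch makes several assumptions that the text does not state: cells are addressed by axial hexagon coordinates (`cell_q`, `cell_r`), the alternating 1 to 3 indexing of center circles is realized as the three-coloring `(cell_q - cell_r) mod 3`, and the six outer sectors are numbered counter-clockwise starting from the positive x axis.

```python
import math


def region_index(dx: float, dy: float, cell_q: int, cell_r: int,
                 radius: float) -> int:
    """Slot index (1..9) of a node at offset (dx, dy) from its cell center."""
    if math.hypot(dx, dy) <= radius / 2:
        # Center circle: indices 1..3 alternate between cells so that
        # no two adjacent cells share the same index.
        return (cell_q - cell_r) % 3 + 1
    # Outer part: six equal 60-degree sectors indexed 4..9.
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return 4 + int(angle // 60.0) % 6
```

With axial coordinates, each of the six neighbors of a cell differs from it by a non-zero amount in `(cell_q - cell_r) mod 3`, which is what keeps adjacent center circles in different time frames.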

To solve the above problems, Q learning is introduced into cluster head election, so that the election is guided by the experience of multiple rounds of information exchange among nodes. Q learning is a reinforcement learning algorithm. Reinforcement learning continuously learns a mapping from environment states to behaviors so that the accumulated reward obtained when the system's behavior acts on the environment is maximized. Its basic model is shown in fig. 6: the agent senses the current state of the environment and takes a corresponding action; under that action the environment transitions to a new state; on entering the new state the correctness of the action is evaluated; and the agent records and updates the Q value after receiving this reward information (Reward). For subsequent actions to always yield the maximum accumulated return, the agent must learn from this delayed, indirect return. Reinforcement learning is thus an adaptive machine learning method based on environmental feedback, which discovers the optimal behavior strategy by trial and error. The reinforcement learning system responds to the received environment state s according to its internal working mechanism and outputs the corresponding action a. Under action a the environment changes to a new state s', and the system receives the environment's immediate reward or penalty feedback r. The main goal of the Q learning system is to find the policy π: S → A under which the actions selected by the system maximize the accumulated reward obtained from the environment. The Q value is updated as shown in formula (1):

In other words, the system seeks to maximize equation (1), where α represents the learning rate, γ is the discount factor, and r_i is the reward function. The tendency of the system to produce an action is mainly determined by the reward value of the environment, i.e. the reward function: a positive reward value strengthens the tendency, and a negative reward value weakens it.
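Formula (1) is not reproduced in this text. For reference, the textbook Q-learning update written with the symbols defined above (α, γ, r_i) is given below; that it coincides exactly with the patent's formula (1) is an assumption.

$$Q(s_i,a_i)\leftarrow(1-\alpha)\,Q(s_i,a_i)+\alpha\left[r_i+\gamma\max_{a'}Q(s_{i+1},a')\right]$$

Setting γ = 0 in this expression leaves a single-step update of the same form as the one used for Qval in formula (2).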

The invention provides a cluster head selection algorithm based on Q learning by summarizing the advantages and disadvantages of the conventional clustering algorithm. The frame structure of the HELLO message is as follows.

Table 1 frame structure of HELLO packet

When the first round of neighbor node discovery completes, the cluster head is initially selected: the node closest to the cluster center becomes the cluster head. The time slot allocation of fig. 4 prevents the HELLO messages of nodes in a cluster's center circle from colliding, at the other nodes of that cluster, with HELLO messages from nodes in the center circles of other clusters. This raises the probability that nodes in the center circle are discovered by the other nodes of the cluster, reduces the cases in which a cluster ends up with two or three cluster heads after the first HELLO round because some nodes failed to receive the center-circle nodes' HELLO messages, and so improves the cluster heads initially selected in the first round.

Starting from the second round of the neighbor discovery phase, each node maintains its own Q table locally, in a distributed manner. The Q table is an array of structures, each storing an intra-cluster neighbor node ID, Qval and Qflag. Qval is the Q value of the neighbor node. Qflag = 1 indicates that a HELLO message from the node has been received in the current round, and Qflag = 0 indicates that none has been received. In the HELLO phase, when a HELLO message from an intra-cluster neighbor node is received, the Qval of that neighbor is updated and its Qflag is set to 1. When the HELLO timer expires, Qflag is set to 0 for every node whose ID is in the Q table but whose HELLO message was not received in the current round. If no HELLO message is received from a node stored in the Q table for two consecutive rounds, that is, its Qflag is 0 in two consecutive rounds, the node is deleted from the Q table. No information is added to the Q table for neighbor nodes outside the cluster. If i is the current node and has m intra-cluster neighbor nodes, node i locally stores the Q table shown below. The Q table has m + 1 columns, one of which is for the current node. The Qval of the current node is always 0 and its Qflag is always 1. Each column corresponds to one action (keeping the cluster head role, or replacing the cluster head with that node).

Table 2 Q table structure

Node ID n₁	Node ID n₂	Node ID n₃	……	Current node ID n_i
Qval	Qval	Qval	……	Qval
Qflag	Qflag	Qflag	……	Qflag

The Qval value is updated according to equation (2).

$$Q_{k+1}(s_k,a_k)=(1-\alpha)\,Q_k(s_k,a_k)+\alpha\,r_k \tag{2}$$

The reward function is defined as follows:

$$r_k=w_1(d_j-d_i)+w_2(S_j-S_i)-w_3(D_j-D_i) \tag{3}$$

In formula (3), d_i is the number of intra-cluster neighbor nodes of the current node i, and d_j is the number of intra-cluster neighbor nodes of the neighbor node j. S_i is the stability of the current node i, and S_j is the stability of the neighbor node j. D_i is the distance from the current node i to the cluster center, and D_j is the distance from the neighbor node j to the cluster center. When r > 0, the Q value receives positive feedback, which favors handing the cluster head role over to node j. When r < 0, the Q value receives negative feedback, which favors keeping the current cluster head.
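Formulas (2) and (3) can be sketched in a few lines. The weights `w1`, `w2`, `w3` and the learning rate `alpha` are given placeholder defaults here; the text does not specify their values, so they are assumptions of this sketch, as are the function and parameter names.

```python
def reward(d_i: float, d_j: float, s_i: float, s_j: float,
           dist_i: float, dist_j: float,
           w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    """Formula (3): reward for handing the cluster head role from i to j.

    d_*: number of intra-cluster neighbors, s_*: stability,
    dist_*: distance to the cluster center.
    """
    return w1 * (d_j - d_i) + w2 * (s_j - s_i) - w3 * (dist_j - dist_i)


def update_qval(q_old: float, r: float, alpha: float = 0.5) -> float:
    """Formula (2): blend the previous Q value with the new reward."""
    return (1 - alpha) * q_old + alpha * r
```

A neighbor with more intra-cluster neighbors, higher stability and a shorter distance to the cluster center than the current node yields a positive reward, which raises its Qval over successive rounds.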

The stability S of a node is defined as follows:

where N_x is the current neighbor node set of node x, and the other set in the definition is the neighbor node set obtained after the previous round of HELLO messages. Each node locally retains its neighbor node set from the previous round. When the HELLO timer expires, the node computes S from the retained neighbor node set and the current neighbor set. A relatively stable node has a higher S value; in a static network, S is 1 for every node.

An ordinary node j only updates its Q table. Each update corresponds to the feedback of a hypothetical action: the node updates the table as if it were the cluster head and had either handed over or kept the role, using the resulting feedback value. It performs actual action selection only after another cluster head has handed the cluster head task over to it. A cluster head node updates its Q table continuously and performs action selection at the end of each HELLO phase: it traverses the Qval of the nodes whose Qflag is 1 in the Q table and selects the ID of the node with the maximum Q value. Let i be the current cluster head. If the Q value corresponding to i is the maximum, i continues to act as cluster head. If the Q value corresponding to a neighbor node j is the maximum, i hands the cluster head role over to node j. This is the optimal selection strategy:

$$a_k=\arg\max_{a\in A_k}Q_k(s_k,a)$$

where A_k denotes the set of actions from which a_k can be selected. The formula expresses selecting the action a_k that attains the maximum Q value. If the cluster head performs the cluster head handover action, a cluster head handover packet is sent at the beginning of the data transmission time slot.
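The Q table maintenance and the selection step can be sketched together. Two points are assumptions of this sketch: the stability S is computed as the Jaccard similarity of the two neighbor sets (the patent's definition of S is not reproduced in this text; this form merely satisfies the stated property that S = 1 in a static network), and the names `QEntry`, `on_hello`, `on_hello_timer_expired` and `select_cluster_head` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QEntry:
    qval: float = 0.0
    qflag: int = 0  # 1: HELLO received this round, 0: not received


def stability(current: set, previous: set) -> float:
    """Assumed form of S: Jaccard similarity of the two neighbor sets."""
    union = current | previous
    return len(current & previous) / len(union) if union else 1.0


def on_hello(table: dict, node_id, r: float, alpha: float = 0.5) -> None:
    """HELLO from an intra-cluster neighbor: update Qval, set Qflag to 1."""
    entry = table.setdefault(node_id, QEntry())
    entry.qval = (1 - alpha) * entry.qval + alpha * r
    entry.qflag = 1


def on_hello_timer_expired(table: dict, heard: set, self_id) -> None:
    """Clear Qflag of silent neighbors; drop those silent for two rounds."""
    for node_id in list(table):
        if node_id == self_id or node_id in heard:
            continue
        if table[node_id].qflag == 0:
            del table[node_id]        # second consecutive silent round
        else:
            table[node_id].qflag = 0  # first silent round


def select_cluster_head(table: dict):
    """Return the ID with the largest Qval among entries with Qflag == 1."""
    candidates = {n: e.qval for n, e in table.items() if e.qflag == 1}
    return max(candidates, key=candidates.get)
```

The table is expected to contain the current node's own entry with Qval 0 and Qflag 1, so a handover is chosen only when some neighbor has a positive Qval. Ties resolve to whichever entry was inserted first.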

To further avoid chance effects caused by HELLO message collisions, and given the low node speed and slowly changing topology of the fishing ad hoc network, the invention specifies that cluster head i performs handover only when the Q value corresponding to an intra-cluster neighbor node j has been the maximum for two consecutive rounds. In Q learning, the solution may become trapped in a local optimum; the usual remedy is to select other actions at random in search of the global optimum, a process called exploration. In the present invention, each node updates its own Q table after receiving the HELLO messages of its neighbor nodes, and HELLO messages are sent periodically, which is equivalent to exploration. The action with the maximum Q value is therefore selected directly as the global optimal solution. Q learning is guaranteed to converge to the optimal solution provided that every action is sampled repeatedly in every state and the action values are discrete. The algorithm of the invention satisfies these convergence conditions: a node is equivalent to a state, each node samples all of its neighbors through HELLO messages, and the action values (Q values) are discrete. It can therefore be shown that the algorithm proposed by the invention converges to an optimal value.
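The two-consecutive-rounds rule amounts to a simple hysteresis check. A minimal sketch, in which the function name and the convention of passing the best node of the previous round are hypothetical:

```python
def should_hand_over(best_this_round, best_last_round, self_id) -> bool:
    """Hand over only if the same neighbor had the maximum Q value
    in both the previous round and the current round."""
    return best_this_round != self_id and best_this_round == best_last_round
```

A single round in which a neighbor tops the Q table, for instance because another node's HELLO message collided, therefore never triggers a handover on its own.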

In the invention, clustering by geographic position and time slot greatly reduces broadcast collisions in the neighbor node discovery phase and improves the efficiency and accuracy of neighbor discovery. Protecting the time slot of the cluster's central nodes raises the probability that a central node becomes the cluster head; that is, the number of cluster heads is reduced by improving each cluster head's coverage of the nodes in its cluster. Through the Q learning algorithm, each node locally maintains a Q table and updates the Q value of an intra-cluster neighbor node each time that neighbor's HELLO message is received. Q learning thus lets the information stored by a node reflect previous experience as well as the current round, which reduces the chance effects caused by broadcast packet collisions in the network. After each round of neighbor node discovery, the cluster head traverses its Q table and chooses either to hand over or to keep the cluster head role. Ordinary nodes also update their Q tables dynamically, so that a node has past experience on which to base subsequent selections once the cluster head role is handed over to it. In this way, cluster head changes caused by inaccurate information collection in the current round are reduced and cluster head stability is improved.

(II) data transmission stage:

To improve the reliability of the cluster head handover packet, the invention adopts a time slot preemption mechanism. When the HELLO phase ends, a 1 ms channel preemption slot is reserved at the start of the data transmission time slot. A cluster head that needs to perform handover preempts this 1 ms slot, and all other nodes listen during it. If a node preempts the slot, the other nodes stay silent for the following 10 ms, leaving a clear channel for the cluster head to perform the handover. If the candidate cluster head does not respond, the cluster head retransmits once; if there is still no response after the two attempts, the next-best node is selected for the handover. If no node preempts the 1 ms slot, TC message flooding and data transmission proceed in the time that follows.
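The retry behavior of the handover can be sketched as follows. The callback `send_and_wait_ack` stands in for transmitting the handover packet during the reserved 10 ms and waiting for the candidate's response; it, the function name, and the ordered candidate list are hypothetical, and the sketch does not model the channel itself.

```python
def hand_over(candidates, send_and_wait_ack, max_attempts: int = 2):
    """Try candidates in descending Q order, two attempts each.

    Returns the ID of the node that accepted the handover, or None
    if no candidate responded.
    """
    for candidate in candidates:
        for _ in range(max_attempts):
            if send_and_wait_ack(candidate):
                return candidate
    return None
```

Passing only the best and the next-best node as `candidates` reproduces the behavior described in the text; whether further candidates are tried after the next-best one is not stated there.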

All cluster heads in the network periodically generate TC messages. A TC message contains the IDs of all nodes in the cluster and the geographic position of the cluster head. A cluster head that receives TC messages generated by other cluster heads forwards them, so that they are broadcast to the entire network. So that a TC message generated by a cluster head can be successfully broadcast to the adjacent clusters, the invention follows an earlier method: the information rate is reduced to 1/4, which reduces the transmission bandwidth to 1/4 and thereby doubles the communication range between cluster heads.
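The factor of 2 can be checked with a short derivation. It assumes free-space propagation (received power proportional to d^{-2}) and noise power proportional to the bandwidth B; neither assumption is stated in the text, and P_t and N_0 denote transmit power and noise power spectral density.

$$\mathrm{SNR}(d,B)\propto\frac{P_t}{d^{2}\,N_0B},\qquad \mathrm{SNR}\!\left(d',\tfrac{B}{4}\right)=\mathrm{SNR}(d,B)\;\Rightarrow\;d'^{2}\cdot\frac{B}{4}=d^{2}B\;\Rightarrow\;d'=2d$$

Under a general path-loss exponent n the same argument gives d' = 4^{1/n} d, so the doubling holds specifically for n = 2.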

To reduce the redundancy of TC messages when a cluster contains several cluster heads, the invention adopts the following method. After cluster head node A receives a TC message, if the TC message was sent by another cluster head B in the same cluster and node A's own TC packet has not yet been sent, node A compares the cluster node set of node B with its own cluster node set:

(i) If the cluster node set of B contains node A and the whole cluster node set of node A, node A demotes itself to an ordinary node and removes the pending TC message from its queue (this situation arises from incomplete information collection caused by HELLO message collisions).

(ii) If the cluster node set of B does not completely contain node A and the cluster node set of node A, node A deletes the TC message in its queue, removes the nodes contained in the received TC message from its own cluster node set, and generates a new TC message.

In this way, the TC messages of the cluster heads within a cluster complement one another, which reduces the size of the TC messages.
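Cases (i) and (ii) can be sketched as a single set comparison. The function name and the convention that `own_members` excludes node A itself are assumptions of this sketch.

```python
def reconcile_tc(own_id, own_members: set, other_members: set):
    """Decide what cluster head A does with its pending TC message.

    Returns (still_cluster_head, members_for_new_tc).
    own_members: A's intra-cluster nodes (excluding A itself).
    other_members: node IDs listed in the TC message received from B.
    """
    if (own_members | {own_id}) <= other_members:
        # Case (i): B covers A and all of A's nodes; A steps down
        # and drops its pending TC message.
        return False, set()
    # Case (ii): A keeps only the nodes that B does not already report.
    return True, own_members - other_members
```

For example, if A has members {x, y, w} and B's TC message lists {x, y, B}, A remains a cluster head and its regenerated TC message lists only {w}.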

Building on the above scheme, the invention adds a further optimization to avoid, as far as possible, ending up with two or three cluster heads in a cluster that a single cluster head could fully cover, which happens when a node outside the cluster center circle sends its TC first:

when a cluster head sends a TC packet:

(1) For a node inside the center circle, the backoff time is the fixed slot DIFS plus a random backoff time, the random backoff time taking any value in (0, t1).

(2) For a node outside the center circle, the backoff time is the fixed slot (DIFS + t1) plus a random backoff time, the random backoff time taking any value in (0, t2).

In this way, TC messages sent by nodes inside the circle get higher priority, which further protects the nodes best suited to be cluster head.
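The two backoff rules can be sketched as below. The function name and the injected random generator are hypothetical, and the text gives no values for DIFS, t1 or t2, so the caller supplies them.

```python
import random


def tc_backoff(in_center_circle: bool, difs: float, t1: float, t2: float,
               rng=random) -> float:
    """Backoff before a cluster head sends its TC packet, in seconds."""
    if in_center_circle:
        # Fixed DIFS plus a random part drawn from (0, t1).
        return difs + rng.uniform(0.0, t1)
    # Fixed (DIFS + t1) plus a random part drawn from (0, t2).
    return difs + t1 + rng.uniform(0.0, t2)
```

Because the fixed part for out-of-circle nodes already equals the largest possible backoff of an in-circle node, an in-circle cluster head that has a TC packet queued always gets to transmit first.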

In order to solve the problem that the existing cluster head cannot realize the full cluster node coverage when the network topology changes, the invention additionally generates a new cluster head in the following way. After the common node A receives the TC message, if the TC message is sent by a cluster head B in the same cluster:

(i) If the intra-cluster node list in the TC message contains node A, node A sets its cluster head to node B.

(ii) If the intra-cluster node lists in the TC messages received in two consecutive rounds do not contain node A, node A becomes a cluster head, removes the nodes contained in the received TC messages from its own intra-cluster neighbor nodes, and then generates a new TC message.
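The handling of a TC message at an ordinary node can be sketched per round. The function name, the explicit `missed_rounds` counter and the tuple return value are hypothetical conveniences of this sketch.

```python
def process_tc_round(node_id, current_head, neighbors: set,
                     tc_sender, tc_members: set, missed_rounds: int):
    """Return (cluster_head, new_tc_members, missed_rounds).

    new_tc_members is None unless the node promotes itself.
    """
    if node_id in tc_members:
        # Case (i): covered by cluster head B; adopt B as cluster head.
        return tc_sender, None, 0
    missed_rounds += 1
    if missed_rounds >= 2:
        # Case (ii): uncovered for two consecutive rounds; become a
        # cluster head and report only nodes B does not already list.
        return node_id, neighbors - tc_members, 0
    return current_head, None, missed_rounds
```

The counter is reset whenever the node finds itself in a received TC message, so only two consecutive uncovered rounds lead to a new cluster head.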

To handle the case in which the cluster head moves out of the cluster before handover has been carried out, or a new node enters a new cluster, so that no cluster head exists in the cluster, the invention adopts the following scheme: if an ordinary node receives no TC message of its cluster during one round of the data transmission time slot, it enters the original cluster head election mechanism in the next round, and the node closest to the center of the cluster is selected as the new cluster head.

In the route discovery process, a greedy forwarding principle is used: the data packet is forwarded hop by hop, each time selecting the neighbor node closest to the cluster head of the target node as the relay node, so that the selected path uses as few hops as possible. A smaller end-to-end delay can thus be achieved. In addition, because a node only needs to maintain information about its adjacent nodes and the geographic positions of all cluster heads in the network, performance does not degrade when links break frequently. Routing overhead and route establishment time in the network are thereby reduced.
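Greedy relay selection can be sketched as follows. The requirement that the chosen neighbor be strictly closer to the target cluster head than the current node is borrowed from GPSR-style greedy forwarding and is an assumption here; the text only says that the closest neighbor is chosen. The function name and the position dictionary are hypothetical.

```python
import math


def next_hop(self_pos, neighbors: dict, target_head_pos):
    """Pick the neighbor closest to the target node's cluster head.

    neighbors maps node ID -> (x, y). Returns None if no neighbor is
    closer to the target cluster head than the current node.
    """
    best_id, best_dist = None, math.dist(self_pos, target_head_pos)
    for node_id, pos in neighbors.items():
        d = math.dist(pos, target_head_pos)
        if d < best_dist:
            best_id, best_dist = node_id, d
    return best_id
```

Only neighbor positions and the positions of cluster heads are needed for this decision, which is why no per-destination route has to be established or repaired.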

The technical solution of the present invention is further described below with reference to experiments.

To evaluate the overall performance of the cross-layer protocol QLCT provided by the invention more clearly, the QLCT protocol is evaluated in simulation using three performance indexes: packet loss rate, average end-to-end delay and throughput. The simulation compares it with the network performance of the OLSR and AODV routing protocols, each running over an IEEE 802.11 MAC layer.

The simulation scenario is as follows. The 60 nodes in the simulation are evenly distributed within a rectangle of 866 m × 833 m. The wireless transmission distance is 250 m, the carrier sense range cr is 550 m, the channel rate B is 1 Mbps, and the side length used for cellular clustering is 166.7 m. The minimum slot interval in the network is 20 μs, the period for generating HELLO and TC messages in the protocol is 2 s, and parameters not listed take their default values in NS-2. To verify the performance of the protocol under different network loads, the CBR data flow rate is increased from 25 kbps to 350 kbps in steps of 25 kbps, gradually increasing the network traffic load.

Fig. 7 shows packet loss rate performance from a source node to a destination node as a data generation rate increases. Fig. 8 shows the average end-to-end delay performance from the source node to the destination node as the data generation rate increases. Fig. 9 shows the throughput performance from the source node to the destination node as the data generation rate increases. Simulation results show that compared with AODV and OLSR, the QLCT protocol has better network throughput performance, lower packet loss rate and average end-to-end delay and can transmit data more efficiently.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD-ROM or DVD-ROM, on programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A clustering cross-layer communication processing method based on geographic position and Q learning is characterized by comprising the following steps:

in the data transmission stage, the cluster head sends TC information to perform topology discovery of the whole network, and when a data transmission request exists, the nodes select relay nodes to perform data transmission by using the cluster head geographical position of a cluster where a target node is located in a routing table and combining a greedy principle;

the Q learning in the neighbor node discovery phase seeks the policy under which the actions selected by the system maximize the accumulated reward obtained from the environment, namely π: S → A; the Q value is updated as shown in formula (1):

where α represents the learning rate, γ is a discount factor, and r_i is a return function; the tendency of the system to generate an action is mainly determined by the environmental reward value, namely the reward function: the tendency becomes stronger if the reward value is positive and weaker if the reward value is negative;

the Q learning cluster head selection method comprises the following steps: when the first round of neighbor node discovery completes, the cluster head is initially selected, and the node closest to the cluster center becomes the cluster head; starting from the second round of the neighbor node discovery phase, each node locally stores its own Q table in a distributed manner; the Q table is an array of structures, each storing an intra-cluster neighbor node ID, Qval and Qflag; Qval is the Q value of the neighbor node; Qflag being 1 represents that the HELLO message of the node has been received in the current round, and Qflag being 0 represents that the HELLO message of the node has not been received in the current round; in the HELLO phase, when a HELLO message of an intra-cluster neighbor node is received, the Qval corresponding to the neighbor node is updated and Qflag is set to 1; when the HELLO timer expires, the Qflag is set to 0 for each node whose ID is in the Q table but whose HELLO message has not been received in the current round; if the HELLO message of a node stored in the Q table is not received in two consecutive rounds, namely the Qflag of the neighbor node is 0 in two consecutive rounds, the node is deleted from the Q table; no information is added to the Q table for neighbor nodes that are not in the cluster; if i is the current node and i has m intra-cluster neighbor nodes, node i locally stores a Q table with m + 1 columns, one of which is for the current node, the Qval of the current node being constantly 0 and its Qflag constantly 1; each column corresponds to an action;

the Qval value is updated according to equation (2);

$$Q_{k+1}(s_k,a_k)=(1-\alpha)\,Q_k(s_k,a_k)+\alpha\,r_k \tag{2}$$

the reward function is defined by equation (3):

$$r_k=w_1(d_j-d_i)+w_2(S_j-S_i)-w_3(D_j-D_i) \tag{3}$$

in formula (3), d_i is the number of intra-cluster neighbor nodes of the current node i, and d_j is the number of intra-cluster neighbor nodes of the neighbor node j; S_i is the stability of the current node i, and S_j is the stability of the neighbor node j; D_i is the distance from the current node i to the cluster center, and D_j is the distance from the neighbor node j to the cluster center; when r is larger than 0, the Q value obtains positive feedback, which favors handing the cluster head over to node j; when r is less than 0, the Q value obtains negative feedback, which favors keeping the cluster head;

the stability S of a node is defined as follows:

wherein N_x is the current neighbor node set of node x, and the other set in the definition is the neighbor node set obtained after the previous round of HELLO messages; each node locally retains its neighbor node set from the previous round; when the HELLO timer expires, the node calculates S from the retained neighbor node set and the current neighbor set; relatively stable nodes have higher S values; for a static network, S of each node is 1; every ordinary node, such as node j, only updates its Q table, each update being equivalent to the feedback obtained after a hypothetical action, namely, the Q table is updated with the feedback value that handing over or keeping the cluster head role would yield if node j were the cluster head; every cluster head node, after the neighbor node discovery phase ends, traverses the Qval of the nodes whose Qflag is 1 in the Q table and selects the ID of the node with the maximum Q value; i being the current cluster head node, when the Q value corresponding to i is the maximum, i continues to act as the cluster head; when the Q value corresponding to a neighbor node j is the maximum, i hands the cluster head role over to node j, namely, the optimal selection strategy:

wherein A_k denotes the set of actions from which a_k can be selected; the formula expresses selecting the action a_k that attains the maximum Q value; if the cluster head performs the cluster head handover action, a cluster head handover packet is sent at the beginning of the data transmission time slot; the cluster head i performs cluster head handover when the Q value corresponding to an intra-cluster neighbor node j is the maximum for two consecutive rounds; the action with the maximum Q value is selected as the global optimal solution; a node is equivalent to a state, and each node samples all of its neighbors using HELLO messages.

2. The method for processing clustered cross-layer communication based on geographical location and Q learning according to claim 1, wherein the method employs a TDMA mechanism to periodically divide time into time slices of 2 s, and divides each time slice into a neighbor discovery period and a data transmission period; the neighbor discovery period occupies 0.09 s, each small time slot occupies 0.01 s, and the rest of the time slice is the data transmission period.

3. The method of claim 1, wherein the neighbor node discovery phase comprises: dividing the honeycomb, if the radius of the honeycomb is R, dividing a circle with the radius of 1/2R at the center of each honeycomb, and equally dividing the rest part of each cluster into six parts; the indexes from 1 to 3 are marked on the circle at the center of the honeycomb in sequence, and the indexes from 4 to 9 are marked on other areas of each honeycomb in sequence; meanwhile, dividing a neighbor node discovery period into 9 time frames equally, numbering the time frames sequentially from 1 to 9, and distributing the time frames to regions with the same index value in a regular hexagon cluster, wherein each time frame consists of n small time slots; the wireless transmission distance is 3/2 times of the radius of the honeycomb, when a node is arranged in a cluster center circle, the node in the circle can communicate with all nodes in the honeycomb, and the full coverage of the node in the cluster is realized by one cluster head.

4. The method according to claim 3, wherein a time slot preemption mechanism is adopted in the data transmission phase: after the HELLO phase ends, a 1 ms channel preemption time slot is reserved at the start of the data transmission time slot; if a cluster head needs to perform cluster head handover, it preempts the 1 ms channel preemption time slot, and all other nodes listen during the 1 ms; if a node preempts the slot, the other nodes are silent for the next 10 ms, reserving a clear channel for the cluster head to perform cluster head handover; when the candidate cluster head does not respond, the cluster head retransmits once, and if there is no response to either attempt, a suboptimal node is selected for cluster head handover; if the 1 ms is not preempted by any node, TC message flooding and data transmission are carried out next.

5. The method of claim 4, further comprising:

when a cluster head sends a TC packet:

6. The clustered cross-layer communication processing system based on the geographic position and Q learning, which is used for implementing any one of claims 1-5, is characterized by comprising: