CN107094321B - Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method - Google Patents


Info

Publication number
CN107094321B
CN107094321B (application CN201710205247.0A)
Authority
CN
China
Prior art keywords
vehicle
action
learning
node
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710205247.0A
Other languages
Chinese (zh)
Other versions
CN107094321A (en)
Inventor
Zhao Haitao
Yu Hongsu
Shen Ruoyi
Du Aiqian
Zhu Hongbo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN201710205247.0A priority Critical patent/CN107094321B/en
Publication of CN107094321A publication Critical patent/CN107094321A/en
Application granted granted Critical
Publication of CN107094321B publication Critical patent/CN107094321B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W74/00 Wireless channel access
    • H04W74/08 Non-scheduled access, e.g. ALOHA
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a vehicle-mounted communication MAC layer channel access method based on multi-agent Q learning, in which each vehicle node constructs its own joint state-action mapping relation and joint strategy in the VANETs environment. It is then judged whether a new vehicle node has joined the VANET network. If so, the newly joined vehicle node quickly acquires the action space, state space and reward function through transfer learning, and each vehicle node then updates its joint state-action mapping relation and joint strategy. If not, it is judged whether the current vehicle node has data to send. If there are data to send, the action-strategy solutions satisfying the correlated equilibrium are determined according to the eEQ algorithm; the action that enables the multi-agent system to finally reach the correlated equilibrium is selected from the action set; the CW value is determined and data are sent by accessing the wireless channel with that CW value. The invention raises the probability of successful data transmission, reduces the number of back-offs, and effectively improves the packet reception rate, the end-to-end transmission delay, and so on.

Description

Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method
Technical Field
The invention belongs to the technical field of Internet of things, and relates to a method for realizing MAC layer channel access based on multi-agent Q learning in vehicle-mounted communication.
Background
Since motor vehicles were invented in the second industrial revolution, and with the rapid development of the automotive field, automobiles have become an indispensable part of modern life. As the pace of daily life accelerates, the use of vehicles such as buses and private cars has become increasingly common. While the automobile makes daily travel convenient, it also causes many problems such as traffic congestion, environmental pollution and traffic accidents. Traffic congestion has become a serious social problem: it burdens road users and causes a great deal of fuel and time to be wasted every year. Not only do people waste a large amount of time on the road, but the haze caused by wasted fuel and exhaust emissions seriously threatens human health, and traffic accidents have become one of the biggest threats to human life. In view of this, future vehicular travel needs to be safer, greener (e.g., with lower exhaust emissions), fully automated, and able to offer passengers a more comfortable entertainment experience. To make the traffic infrastructure safer and more efficient, the traffic system must be sufficiently intelligent. ITS (Intelligent Transportation Systems) have been developed to improve road traffic safety, alleviate traffic congestion, reduce automobile fuel consumption and protect the environment, and have received extensive attention in both academia and industry. ITS aims to improve the quality, efficiency and safety of future traffic systems using information and communication technologies. More advanced ITS technology will be deployed in the future to manage urban traffic effectively and to improve highway and road safety. In addition, access to broadband networks via ITS technology is expected to revolutionize entertainment applications and the QoE (quality of experience) of passengers and drivers. The vehicular ad hoc network (VANET) can support ITS applications; as an important component of the ITS it improves traffic safety and traffic efficiency, reduces fuel consumption by relieving congestion, protects the environment and provides a safe and comfortable experience for passengers, enabling many novel applications (such as mobile infotainment) to come into service. VANET applications can be divided into the following categories: safety-related applications, traffic management and traffic efficiency applications, user entertainment services, and network connectivity applications, among others. These applications place varying demands on the VANET network: safety messages must be guaranteed fast access and short transmission delay, and such messages are only valid for a short time, whereas entertainment services carry large data volumes and have strict synchronization requirements. As VANETs are expected to serve a wide variety of applications, the network must support a wide variety of needs. Safety applications should be able to broadcast warning messages wirelessly between adjacent vehicles in order to inform drivers of dangerous situations quickly. To ensure efficiency, safety applications should preferably transmit data with a bounded, periodic delay, and the MAC (Media Access Control) protocol plays a crucial role in enabling VANETs to provide efficient data transmission. The MAC protocol is located at the data link layer; it must not only ensure fairness in channel access but also provide multi-channel cooperation and error control.
It is therefore necessary to design efficient and reliable MAC protocols for VANET.
At present, various VANET MAC protocols have been proposed. The WAVE standard adopts IEEE 802.11p for the MAC layer, which is based on CSMA/CA. However, when the backoff counters of multiple vehicles decrement to zero and the vehicles access the channel simultaneously, CSMA-based protocols inevitably collide, especially in high-density scenarios, causing unbounded growth of the access delay and serious packet loss. Besides the CSMA protocol, many researchers prefer TDMA-based access mechanisms in VANETs, especially for safety applications. A TDMA protocol allocates different time slots to nearby vehicles, so it has deterministic channel-access delay, good scalability and little transmission interference. However, owing to the high-speed mobility of the vehicular environment and the dynamic nature of the network density, distributed slot scheduling in VANETs becomes very difficult. In addition, some works improve the conventional backoff algorithm: the MILD and EIED algorithms have been studied and compared on the basis of the conventional binary exponential backoff algorithm, and both improve network performance after optimization. On the basis of the newMILD algorithm, a backoff algorithm based on counting statistics was then proposed: after a vehicle node successfully accesses the wireless channel and sends data, the contention window is reduced, but the algorithm sets a threshold to increase the opportunity for vehicle nodes that failed to transmit data to access the wireless channel. When the number of consecutive times a node accesses the wireless channel and successfully sends data exceeds the threshold, the node's contention window is set to the maximum value; similarly, when the number of consecutive failures to access the wireless channel and send data exceeds the threshold, the node's contention window is set to the minimum value. Simulations show that this algorithm effectively reduces the influence of hidden nodes on network performance and improves the fairness of node access to the wireless channel. Another work proposes a minimum contention window adjustment algorithm based on estimating the number of neighbor nodes, the Adaptive CWmin algorithm, which changes the adjustment rule of the minimum CW (Contention Window) and dynamically adjusts CWmin according to channel usage. The relation between the CW value and the number of nodes is derived from an IEEE 802.11 broadcast backoff Markov model, and the minimum CW value is adjusted dynamically by estimating the number of neighbor nodes; simulations show that the algorithm outperforms other methods in improving the broadcast reception rate. In addition, after a node successfully sends data, the optimal CWmin value adapted to the vehicular network conditions is calculated according to this function. The algorithm proposed in that work selects a reasonable CW after packet retransmission, shortens the time that contending nodes wait to retransmit, and increases network throughput.
However, the prior art above builds on the BEB algorithm, in which, in general, the CW value is doubled when a collision forces a backoff and is reset to 15 after data are transmitted successfully; if several nodes finish transmitting data at the same time, their CW values are all reset to 15, and when they transmit again they collide once more. The network load is given little consideration, so these methods are not suitable for networks with different load levels, i.e., they do not scale to traffic flows of different densities, and channel-access fairness is not effectively improved.
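For reference, the BEB behaviour criticized above can be summarized in a few lines. The sketch below is illustrative only: the constants follow the CW set used later in this document, the update rule is the standard doubling rule, and the function name beb_update is a hypothetical label rather than code from any cited work.

# Minimal sketch of standard binary exponential backoff (BEB): CW doubles on a
# collision and snaps back to the minimum (15) after a successful transmission,
# so nodes that finished at the same time pick small CWs again and re-collide.

CW_MIN, CW_MAX = 15, 1023

def beb_update(cw: int, success: bool) -> int:
    """Return the next contention window under binary exponential backoff."""
    if success:
        return CW_MIN                      # reset regardless of network load
    return min(2 * (cw + 1) - 1, CW_MAX)   # 15 -> 31 -> 63 -> ... -> 1023

if __name__ == "__main__":
    cw = CW_MIN
    for outcome in [False, False, True, False]:
        cw = beb_update(cw, outcome)
        print(cw)                          # 31, 63, 15, 31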
Disclosure of Invention
Aiming at these problems in the prior art, the invention provides a method for vehicle-mounted communication MAC layer channel access based on multi-agent Q learning: an IEEE 802.11p MAC layer data transmission method based on multi-agent Q learning, the QL-CW^Multi-Agent algorithm. It is completely different from the traditional BEB algorithm: in the VANET network environment, each vehicle node uses the Q-learning algorithm to learn the surrounding environment through continuous interaction. Vehicle nodes repeatedly try and err in the VANETs environment and dynamically adjust the contention window (CW) according to the feedback signals (i.e., reward values) obtained from the surrounding environment, while vehicle nodes newly joining the VANET network environment learn the network environment more quickly by means of transfer learning. A vehicle node must learn not only its own state-action mapping relation from the environment but also the state-action relations of the other vehicle nodes in the network environment, so as to construct for itself a joint state-action relation constrained by the other vehicle nodes and finally obtain its joint strategy. According to the joint strategy, it selects a CW value that also enables the other vehicle nodes to obtain the highest reward value, so that the node always accesses the channel with the optimal CW (i.e., the CW value selected when the reward value obtained from the surrounding environment is maximum), thereby reducing the data-frame collision rate and the transmission delay and improving the fairness of node access to the channel.
Therefore, the technical scheme adopted by the invention is a vehicle-mounted communication MAC layer channel access method based on multi-agent Q learning, comprising the following steps:
Step 1: in the VANETs environment, each vehicle node constructs its own joint state-action mapping relation and joint strategy according to the current network environment and the other vehicle nodes;
Step 2: judge whether a new vehicle node has joined the VANET network;
Step 3: if so, the newly joined vehicle node quickly acquires the action space, state space and reward function through transfer learning, and each vehicle node then updates its joint state-action mapping relation and joint strategy;
Step 4: if not, judge whether the current vehicle node has data to send;
Step 5: if there are data to send, determine the action-strategy solutions satisfying the correlated equilibrium according to the eEQ algorithm;
Step 6: select, from the action set {I, K, R}, the action that enables the multi-agent system to finally reach the correlated equilibrium;
Step 7: determine the CW value after the action has been executed, and access the wireless channel to send data with that CW value;
Step 8: check whether the current vehicle node still has a message to send; if not, end; if so, return to step 2.
Further, in step 3, if a new vehicle node is added into the VANET, the newly added node can quickly acquire a state space, an action space and a reward function through transfer learning, and construct a joint state-action pair mapping relationship and a joint strategy that are constrained by other vehicle nodes.
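To make the control flow of steps 4 to 8 concrete, the following minimal Python sketch walks one node through a few transmissions; the class, the method names and the stubbed eEQ policy are illustrative assumptions, not identifiers taken from the patent, and the CW transition rule simply moves one step through the state set defined later.

import random

CW_VALUES = [15, 31, 63, 127, 255, 511, 1023]   # state space S from the text
ACTIONS = ["I", "K", "R"]                        # Increase / Keep / Reduce CW

class VehicleNode:
    def __init__(self, name):
        self.name = name
        self.cw_index = 0                        # start at CW = 15

    def solve_eEQ_policy(self):
        # Placeholder for the correlated-equilibrium (eEQ) solution of step 5.
        return {a: 1 / len(ACTIONS) for a in ACTIONS}

    def pick_action(self, policy):
        # Step 6: sample an action from the equilibrium action distribution.
        return random.choices(ACTIONS, weights=[policy[a] for a in ACTIONS])[0]

    def apply_action(self, action):
        # Step 7: map the chosen action onto the next contention-window value.
        if action == "I":
            self.cw_index = min(self.cw_index + 1, len(CW_VALUES) - 1)
        elif action == "R":
            self.cw_index = max(self.cw_index - 1, 0)
        return CW_VALUES[self.cw_index]

def run(node, pending_messages):
    while pending_messages:                      # steps 4 and 8
        policy = node.solve_eEQ_policy()         # step 5
        action = node.pick_action(policy)        # step 6
        cw = node.apply_action(action)           # step 7
        print(f"{node.name}: action={action}, access channel with CW={cw}")
        pending_messages -= 1

run(VehicleNode("vehicle-1"), pending_messages=3)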
Compared with the prior art, the invention has the beneficial effects that:
1. The vehicle node of the invention interacts continuously with the surrounding environment using the Q-learning algorithm and dynamically adjusts the contention window according to the reward signal fed back by the network environment, so that the node can always access the channel with the optimal CW value the next time it sends data; this raises the probability of successful transmission, reduces the number of back-offs, and effectively improves the packet reception rate, the end-to-end transmission delay, and so on.
2. A vehicle node newly joining the network environment quickly learns the state-action mapping relation and obtains the joint strategy through transfer learning. A communication node running the proposed QL-CW^Multi-agent algorithm can adapt quickly to an unknown environment, effectively improving the packet reception rate and the packet transmission delay; more importantly, the QL-CW^Multi-agent algorithm provides higher fairness for node channel access and is suitable for network environments with different load levels.
3. The invention reduces the data-frame collision rate and the transmission delay and improves the fairness of node access to the channel. Different vehicle nodes perform Q learning in the VANET and access the wireless channel with different CW values according to the learning results. If a vehicle node's information is sent successfully, the CW value is not simply reset to 15; instead, the CW value is reduced gradually through Q learning and continuous exploration, while the opportunity for other vehicle nodes to access the wireless channel is also taken into account, so the fairness of vehicle-node access to the wireless channel in the vehicular ad hoc network is markedly improved. The algorithm remains applicable no matter how many vehicle nodes are in the network, i.e., the wireless-channel access method provided by the invention scales to different network-load scenarios.
Drawings
Fig. 1 is a flow chart of a vehicle node accessing a wireless channel by using the invention in vehicle-mounted communication.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
As shown in fig. 1, the method of the present invention comprises the steps of:
Step 1: in the VANETs environment, each vehicle node constructs its own joint state-action mapping relation and joint strategy according to the current network environment and the other vehicle nodes;
Step 2: judge whether a new vehicle node has joined the VANET network;
Step 3: if so, the newly joined vehicle node quickly acquires the action space, state space and reward function through transfer learning, and each vehicle node then updates its joint state-action mapping relation and joint strategy;
Step 4: if not, judge whether the current vehicle node has data to send;
Step 5: if there are data to send, determine the action-strategy solutions satisfying the correlated equilibrium according to the eEQ algorithm;
Step 6: select, from the action set {I, K, R}, the action that enables the multi-agent system to finally reach the correlated equilibrium;
Step 7: determine the CW value after the action has been executed, and access the wireless channel to send data with that CW value;
Step 8: check whether the current vehicle node still has a message to send; if not, end; if so, return to step 2.
The QL-CW^Multi-agent algorithm comprises the following contents:
The number of vehicles in the whole vehicular ad hoc network is N, i.e., the agent set in the multi-agent Q-learning system is N = {1, 2, ..., N}. A_n ∈ {I, K, R} denotes the discrete set of actions that vehicle n can perform during back-off while accessing the channel in the vehicular ad hoc network, namely increasing (Increase) the contention window, keeping (Keep) the contention window unchanged, or reducing (Reduce) the contention window. At a given time, vehicle n selects from A_n the action to execute, denoted a_n. The joint action set from which the N vehicles select contention-window values during back-off is then A = A_1 × A_2 × ... × A_N. The contention-window value used by a vehicle to access the wireless channel at a given time, i.e., the discrete set of environment states, is S = {15, 31, 63, 127, 255, 511, 1023}. R_n denotes the reward function by which vehicle n obtains a reward from the network environment when it successfully sends data while accessing the channel; since the reward value of the multi-agent system depends on the joint action of all vehicles, it is a mapping S × A → R. At time t, vehicle n adopts a fixed one-step strategy π_n(t), and the joint strategy is denoted π.
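The spaces just defined can be encoded directly. The snippet below is a hedged illustration: the number of vehicles and the reward shaping in reward_n are made-up placeholders, since the patent only states that R_n maps S × A to a real-valued reward.

# Illustrative encoding of the per-vehicle action set A_n = {I, K, R}, the
# joint action set A = A_1 x ... x A_N, and the contention-window state set S.

from itertools import product

N = 3                                            # number of vehicles (example)
A_n = ("I", "K", "R")                            # Increase / Keep / Reduce CW
S = (15, 31, 63, 127, 255, 511, 1023)            # contention-window states

joint_actions = list(product(A_n, repeat=N))     # A = A_1 x A_2 x ... x A_N
print(len(joint_actions))                        # 3**N = 27 joint actions

def reward_n(state: int, joint_action: tuple) -> float:
    """Hypothetical reward R_n: S x A -> R. Here, successful low-delay access
    (small CW, few other vehicles increasing contention) scores higher."""
    return 1.0 / state - 0.1 * joint_action.count("I")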
In the back-off process in which vehicle nodes in the vehicular ad hoc network need to send data and access the wireless channel, the action model, state space and reward function of any two vehicle nodes are the same. Therefore, when a new vehicle joins the vehicular ad hoc network, the knowledge already learned by an existing vehicle node can be used to reinforce the learning of other vehicle nodes, which improves their learning speed and efficiency: the new vehicle node can learn directly from the other vehicle nodes by means of transfer learning, so as to quickly learn its own state-action mapping relation and the Q-value iteration method for updating the Q table. The final aim is that a vehicle node newly joining the vehicular ad hoc network can quickly learn to adapt to the environment and solve its task using the least prior knowledge learned from the other vehicle nodes. Knowledge transfer can therefore be performed among the agents of the multi-agent system, and newly joined vehicle nodes learn the network environment more quickly by using transfer learning. The transfer learning process is as follows:
what is migrated: the action space, the state space and the reward function of any two vehicle nodes in the Q learning process are the same, so that the Q table obtained by the vehicle nodes in the vehicle-mounted self-organizing network through the Q learning can be migrated to the vehicle node newly added into the vehicle-mounted self-organizing network through the migration learning, and only the first Q maximum items in the Q table are migrated (sorted according to the Q values) in consideration of communication overhead.
How to migrate: the learned information is broadcast upon request using broadcast communication.
When to migrate: the migration is performed when a new vehicle node joins the vehicular ad hoc network.
The specific migration process is as follows: when a new vehicle node joins the vehicular ad hoc network, it broadcasts a migration request message. Each vehicle node that receives the message starts a timer whose value is inversely proportional to the inter-vehicle distance. The vehicle whose timer expires first broadcasts the q largest entries of its Q table. Once the newly joined vehicle node receives the migration information, it updates its own Q table accordingly, thereby accelerating the learning process.
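A minimal sketch of this migration step follows. The function names, the timer constant k and the merge rule are assumptions; the text only specifies a timer inversely proportional to distance and the broadcast of the q largest Q-table entries.

def migration_timer(distance_m: float, k: float = 100.0) -> float:
    """Timer inversely proportional to distance: the vehicle whose timer
    expires first answers the newcomer's migration request."""
    return k / max(distance_m, 1e-6)

def top_q_entries(q_table: dict, q: int = 5) -> dict:
    """Only the q largest Q values are broadcast, to limit communication overhead."""
    return dict(sorted(q_table.items(), key=lambda kv: kv[1], reverse=True)[:q])

def merge_migrated(own_q: dict, migrated: dict) -> dict:
    """The newcomer seeds its (empty or sparse) Q table with the migrated entries."""
    merged = dict(own_q)
    for key, value in migrated.items():
        merged[key] = max(value, merged.get(key, float("-inf")))
    return merged

# Usage: neighbours at 20 m and 80 m receive the request; their timers are
# 100/20 = 5.0 and 100/80 = 1.25, so the 80 m vehicle answers first and
# broadcasts its two largest Q-table entries to the newcomer.
donor_q = {((15,), ("R",)): 0.9, ((31,), ("K",)): 0.4, ((63,), ("I",)): -0.2}
newcomer_q = merge_migrated({}, top_q_entries(donor_q, q=2))
print(newcomer_q)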
The Q-learning algorithm depends largely on the action-value function, i.e., the Q function. In single-agent Q learning, the strategy selected by the agent (namely the mapping from a state to the probability of selecting each action) is π*(s), and the Q-value function Q(s, a) is the expected reward the agent obtains from the environment after performing action a in state s; the agent then follows the policy π*(s') = argmax_{a'} Q(s', a') to execute the action of the next state. In a multi-agent system, the Q-value function Q_n of vehicle n depends on the joint action a of all agents and is constrained by the joint policy π, and is expressed as follows:
Q_n(s(t), a_1(t), ..., a_N(t)) = R_n(s(t), a_1(t), ..., a_N(t)) + γ · Σ_{s(t+1) ∈ S} T(s(t), a_1(t), ..., a_N(t), s(t+1)) · Σ_{a(t+1) ∈ A} π(s(t+1), a(t+1)) · Q_n(s(t+1), a_1(t+1), ..., a_N(t+1))    (Formula 1)
where s(t+1) denotes the next state, i.e., the contention-window value used when vehicle n, having performed action a_n(t), again needs to access the wireless channel to send data. T: S × A × S → [0, 1] is the state-transition probability function, so T(s(t), a_1(t), a_2(t), ..., a_N(t), s(t+1)) is the probability of moving from state s(t) to state s(t+1). The sum over a(t+1) represents the reward value Q_n(s(t+1), a_1(t+1), ..., a_N(t+1)) obtained after each agent performs its action a_n(t+1) according to the strategy π_n, i.e., the reward obtainable from the network environment given the CW value s(t+1) with which vehicle n re-accesses the wireless channel after executing the I/K/R action (increase CW / keep CW unchanged / reduce CW). γ ∈ [0, 1) is the discount factor; a larger γ places more weight on subsequent reward values, while a smaller γ emphasizes the current reward value. Formula 1 states that when vehicle n has data to send at time t and accesses the wireless channel through contention window s(t), while the other vehicles respectively choose to execute actions a_1 through a_N (each action being increase CW / keep CW unchanged / reduce CW), the vehicles keep learning interactively in the vehicular ad hoc network environment according to the strategy, so that whenever a vehicle needs to access the wireless channel to send data, it can perform its back-off with an optimal CW value and then access the wireless channel to transmit.
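One sample-based way to realize Formula 1 in a tabular learner is sketched below; the learning rate ALPHA, the function name q_update and the tabular layout are assumptions, since the patent states the Bellman relation but not an explicit update schedule.

from collections import defaultdict

GAMMA = 0.9      # discount factor, gamma in [0, 1)
ALPHA = 0.1      # learning rate (assumed)

Q = defaultdict(float)   # keyed by (state, joint_action) for one vehicle n

def q_update(state, joint_action, reward, next_state_value):
    """One sample-based update of Q_n(s, a_1..a_N) toward
    R_n + gamma * V_n(s'), where V_n(s') is the policy-weighted value of the
    next contention-window state."""
    key = (state, joint_action)
    target = reward + GAMMA * next_state_value
    Q[key] += ALPHA * (target - Q[key])
    return Q[key]

# Example: vehicle n used CW = 127, the joint action was ("R", "K", "R"), the
# transmission succeeded (reward 1.0), and the next state's value is 0.5.
print(q_update(127, ("R", "K", "R"), 1.0, 0.5))   # approx. 0.145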
The final goal of reinforcement learning is that each agent finds the optimal strategy and selects the action with the maximum value function. In a cooperative game, a correlated equilibrium is a probability distribution over the joint action space. The Q-learning method that finally reaches the correlated equilibrium defines a state-value function as a linear combination of Q functions under the correlated action strategy, defined as follows:
V_n^k(s_k) = Σ_{a ∈ A} π_n*(s_k, a) · Q_n^(k-1)(s_k, a)    (Formula 3)
where V_n^k(s_k) denotes the state-value function of agent n in state s_k at the k-th iteration, which expresses the degree of correlated-equilibrium cooperation of the multi-agent system in that state; a = [a_1, ..., a_n, ..., a_N], where a_n is the action performed by the n-th agent and N is the number of agents in the multi-agent system; A is the set of joint actions available to the multi-agent system in state s_k; Q_n^(k-1)(s_k, a) denotes the Q-value function of agent n for executing joint action a in state s_k during the (k-1)-th iteration; and π_n*(s_k, a) is a probability distribution vector over the joint action set A, representing the optimal correlated-equilibrium action strategy of agent n in state s_k.
In multi-agent reinforcement learning, the joint action strategy of each agent takes the decisions and Q-value functions of the other agents into account, so that the accumulated reward values of all agents increase. For state s_k, the action assigned to the n-th agent by the joint action strategy can be determined as a correlated-equilibrium action strategy through the following inequality constraints:
Σ_{a_-n ∈ A_-n} π_n(s_k, (a_-n, a_n)) · [ Q_n^(k-1)(s_k, (a_-n, a_n)) - Q_n^(k-1)(s_k, (a_-n, a_n')) ] ≥ 0   for all a_n' ∈ A_n,
A_-n = Π_{m≠n} A_m,   a_-n = Π_{m≠n} a_m,   a = (a_-n, a_n)    (Equation 4)
where A_n denotes the action set of the n-th agent, A_-n denotes the joint action set of the agents other than agent n, a_n ∈ A_n is the action of the n-th agent, a_-n ∈ A_-n is the joint action of the agents other than agent n, and a_n' is any action in the action set of agent n; π_n denotes a feasible solution of action strategies (i.e., action probabilities) for the n-th agent that satisfies the correlated-equilibrium constraints above. Equation 4 defines a set of linear inequality constraints for solving the optimal correlated-equilibrium point, in which π_n is the unknown variable and the Q-value function is the known variable.
After the action-strategy solutions satisfying the correlated equilibrium are determined according to Equation 4, π_n is obtained by the eEQ (correlated-equilibrium Q-learning) algorithm, i.e., by maximizing the minimum reward over all agents, and the action that always maximizes the system state-value function is determined for each agent according to Formula 3, so that the multi-agent system finally reaches the correlated equilibrium.
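The eEQ step (choose the correlated-equilibrium distribution that maximizes the minimum expected reward, subject to the Equation 4 constraints) can be posed as a linear program. The sketch below uses made-up Q values and scipy's linprog purely as an illustration of that formulation; it is not the patent's own implementation.

import numpy as np
from itertools import product
from scipy.optimize import linprog

ACTIONS = ["I", "K", "R"]
N = 2
JOINT = list(product(ACTIONS, repeat=N))            # 9 joint actions
rng = np.random.default_rng(0)
Q = {n: {a: float(rng.uniform(-1, 1)) for a in JOINT} for n in range(N)}

# Variables: x = [pi(a) for a in JOINT] + [t]; minimize -t, i.e. maximize t.
num_pi = len(JOINT)
c = np.zeros(num_pi + 1); c[-1] = -1.0

A_ub, b_ub = [], []
# (1) t <= sum_a pi(a) * Q_n(a) for every agent n  ->  -sum(...) + t <= 0
for n in range(N):
    row = np.zeros(num_pi + 1)
    for j, a in enumerate(JOINT):
        row[j] = -Q[n][a]
    row[-1] = 1.0
    A_ub.append(row); b_ub.append(0.0)
# (2) correlated-equilibrium constraints (Equation 4), rewritten as <= 0
for n in range(N):
    for a_n in ACTIONS:
        for a_alt in ACTIONS:
            if a_alt == a_n:
                continue
            row = np.zeros(num_pi + 1)
            for j, a in enumerate(JOINT):
                if a[n] != a_n:
                    continue
                a_dev = list(a); a_dev[n] = a_alt
                row[j] = -(Q[n][a] - Q[n][tuple(a_dev)])
            A_ub.append(row); b_ub.append(0.0)

A_eq = [np.ones(num_pi + 1)]; A_eq[0][-1] = 0.0     # probabilities sum to 1
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=[1.0],
              bounds=[(0, 1)] * num_pi + [(None, None)])
pi = dict(zip(JOINT, res.x[:num_pi]))
print("max-min value:", -res.fun)
print("most likely joint action:", max(pi, key=pi.get))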
In the VANETs environment, vehicle nodes use the Q-learning algorithm to try and err repeatedly in the surrounding environment and to keep learning interactively with it, and the contention window (CW) is dynamically adjusted during the node back-off process according to the feedback signal given by the VANETs environment, so that a node can always access the channel with the optimal CW (the CW selected when the reward value obtained from the surrounding environment is maximum).
The invention applies the multi-agent Q-learning algorithm to the vehicle-mounted communication MAC channel access method, and derives the joint action set of multiple vehicle nodes in the Q-learning process and the iterative Q-value expression constrained by the joint strategy π. When a vehicle node accesses the wireless channel using the Q-learning method in the vehicular ad hoc network, it chooses to execute a joint action related to the other vehicle nodes in order to reduce contention with them. Meanwhile, transfer learning is introduced into the multi-agent Q-learning system, which speeds up the learning of a vehicle node newly joining the vehicular ad hoc network and greatly reduces the delay for that node to access the wireless channel and send data. Finally, to allow the multi-agent system to reach the correlated equilibrium, the optimal action-strategy solution is computed according to the eEQ algorithm (maximizing the minimum reward over all agents, i.e., maximizing the number of times vehicle nodes successfully access the wireless channel and send data), and actions that always maximize the reward value are then assigned to the vehicle nodes according to the optimal action strategy, so that each vehicle node can access the wireless channel with the optimal CW value and succeed in sending data to the greatest possible extent, markedly improving the fairness with which the vehicle nodes access the wireless channel.

Claims (2)

1. A vehicle-mounted communication MAC layer channel access method based on multi-agent Q learning is characterized by comprising the following steps:
step 1: in the VANETs environment, each vehicle node constructs its own joint state-action mapping relation and joint strategy according to the current network environment and the other vehicle nodes;
step 2: judging whether a new vehicle node has joined the VANET network;
step 3: if so, the newly joined vehicle node quickly acquires the action space, state space and reward function through transfer learning, and each vehicle node then updates its joint state-action mapping relation and joint strategy;
step 4: if not, judging whether the current vehicle node has data to send;
step 5: if there are data to send, determining the action-strategy solutions satisfying the correlated equilibrium according to the eEQ algorithm;
step 6: selecting, from the action set {I, K, R}, the action that enables the multi-agent system to finally reach the correlated equilibrium;
step 7: determining the CW value after the action has been executed, and accessing the wireless channel to send data with that CW value;
step 8: checking whether the current vehicle node still has a message to send; if not, ending; if so, returning to execute step 2;
the QL-CW^Multi-agent algorithm comprises the following contents:
the number of vehicles in the whole vehicular ad hoc network is N, i.e., the agent set in the multi-agent Q-learning system is N = {1, 2, ..., N}; A_n ∈ {I, K, R} denotes the discrete set of actions that vehicle n can perform during back-off while accessing the channel in the vehicular ad hoc network, namely increasing (Increase) the contention window, keeping (Keep) the contention window unchanged, or reducing (Reduce) the contention window; at a given time, vehicle n selects from A_n the action to execute, denoted a_n; the joint action set from which the N vehicles select contention-window values during back-off is A = A_1 × A_2 × ... × A_N; the contention-window value used by a vehicle to access the wireless channel at a given time, i.e., the discrete set of environment states, is S = {15, 31, 63, 127, 255, 511, 1023}; R_n denotes the reward function by which vehicle n obtains a reward from the network environment when it successfully sends data while accessing the channel, and since the reward value depends on the joint action of all vehicles, it is a mapping S × A → R; at time t, vehicle n adopts a fixed one-step strategy π_n(t), and the joint strategy is denoted π;
in the back-off process in which vehicle nodes in the vehicular ad hoc network need to send data and access the wireless channel, the action model, state space and reward function of any two vehicle nodes are the same; therefore, when a new vehicle joins the vehicular ad hoc network, the knowledge already learned by an existing vehicle node can be used to reinforce the learning of other vehicle nodes and improve their learning speed and efficiency; the new vehicle node learns directly from the other vehicle nodes by means of transfer learning so as to quickly learn its state-action mapping relation and the Q-value iteration method for updating the Q table, the final aim being that a vehicle node newly joining the vehicular ad hoc network can quickly learn to adapt to the environment and solve its task using the least prior knowledge learned from the other vehicle nodes; knowledge transfer is therefore performed among the agents of the multi-agent system, and the newly joined vehicle node learns the network environment more quickly by using transfer learning; the transfer learning process is as follows:
what is migrated: the action space, state space and reward function of any two vehicle nodes in the Q-learning process are the same, so the Q table obtained by a vehicle node in the vehicular ad hoc network through Q learning is migrated, via transfer learning, to the vehicle node newly joining the vehicular ad hoc network, and in view of the communication overhead only the first q largest entries of the Q table (sorted by Q value) are migrated;
how to migrate: the learned information is broadcast upon request using broadcast communication;
when to migrate: the migration is performed when a new vehicle node joins the vehicular ad hoc network;
the specific migration process is as follows: when a new vehicle node joins the vehicular ad hoc network, it broadcasts a migration request message; each vehicle node receiving the message starts a timer whose value is inversely proportional to the inter-vehicle distance; the vehicle whose timer expires first broadcasts the q largest entries of its Q table; and once the newly joined vehicle node receives the migration information, it updates its own Q table accordingly, thereby accelerating the learning process.
2. The multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method according to claim 1, characterized in that in step 3, if a new vehicle node joins in VANET, the newly joined node can rapidly acquire a state space, an action space and a reward function through transfer learning, and construct a joint state-action mapping relation and a joint strategy constrained by other vehicle nodes.
CN201710205247.0A 2017-03-31 2017-03-31 Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method Active CN107094321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710205247.0A CN107094321B (en) 2017-03-31 2017-03-31 Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710205247.0A CN107094321B (en) 2017-03-31 2017-03-31 Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method

Publications (2)

Publication Number Publication Date
CN107094321A CN107094321A (en) 2017-08-25
CN107094321B true CN107094321B (en) 2020-04-28

Family

ID=59646410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710205247.0A Active CN107094321B (en) 2017-03-31 2017-03-31 Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method

Country Status (1)

Country Link
CN (1) CN107094321B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108924944B (en) * 2018-07-19 2021-09-14 重庆邮电大学 LTE and WiFi coexistence competition window value dynamic optimization method based on Q-learning algorithm
CN109582022B (en) * 2018-12-20 2021-11-02 驭势科技(北京)有限公司 Automatic driving strategy decision system and method
CN110488781B (en) * 2019-08-26 2021-09-21 华南理工大学 Production system scheduling method based on migration reinforcement learning
CN113347596B (en) * 2021-05-21 2022-09-20 武汉理工大学 Internet of vehicles MAC protocol optimization method for neighbor quantity detection and Q learning
CN114375066B (en) * 2022-01-08 2024-03-15 山东大学 Distributed channel competition method based on multi-agent reinforcement learning


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103490413A (en) * 2013-09-27 2014-01-01 华南理工大学 Intelligent electricity generation control method based on intelligent body equalization algorithm
CN104967670A (en) * 2015-06-01 2015-10-07 南京邮电大学 Vehicle network-accessing method based on IEEE 802.11p
CN105306176A (en) * 2015-11-13 2016-02-03 南京邮电大学 Realization method for Q learning based vehicle-mounted network media access control (MAC) protocol
CN106026084A (en) * 2016-06-24 2016-10-12 华南理工大学 AGC power dynamic distribution method based on virtual generation tribe

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on channel access technology based on Q learning in vehicular communication; Du Aiqian; Zhao Haitao; Liu Nanjie; Computer Technology and Development; 2017-03-10; abstract, Section 0 and Section 2 *

Also Published As

Publication number Publication date
CN107094321A (en) 2017-08-25

Similar Documents

Publication Publication Date Title
CN107094321B (en) Multi-agent Q learning-based vehicle-mounted communication MAC layer channel access method
Stanica et al. Enhancements of IEEE 802.11 p protocol for access control on a VANET control channel
CN102244683B (en) Method for improving service quality of mixed businesses in vehicular networking application
CN107864028B (en) Self-adaptive frame aggregation method in vehicle self-organizing network
CN105306176A (en) Realization method for Q learning based vehicle-mounted network media access control (MAC) protocol
CN109905921B (en) Multi-channel environment Internet of vehicles V2R/V2V cooperative data transmission scheduling method
CN109474897B (en) Hidden Markov model-based vehicle networking safety message single-hop cooperative broadcasting method
Alcaraz et al. Control-based scheduling with QoS support for vehicle to infrastructure communications
Feukeu et al. Dynamic broadcast storm mitigation approach for VANETs
Nguyen et al. Joint offloading and IEEE 802.11 p-based contention control in vehicular edge computing
CN104967670A (en) Vehicle network-accessing method based on IEEE 802.11p
CN111132083A (en) NOMA-based distributed resource allocation method in vehicle formation mode
CN110691349B (en) Adaptive control method for safe application-oriented combined power and competition window in Internet of vehicles
Deng et al. Implementing distributed TDMA using relative distance in vehicular networks
Facchina et al. Speed based distributed congestion control scheme for vehicular networks
Ali Shah et al. Coverage differentiation based adaptive tx-power for congestion and awareness control in vanets
CN113423087B (en) Wireless resource allocation method facing vehicle queue control requirement
CN109257830B (en) QoS-based vehicle-mounted network self-adaptive back-off method
CN106851765A (en) A kind of method for optimizing of the transmission trunking node of In-vehicle networking emergency safety message
Lu et al. Predictive contention window-based broadcast collision mitigation strategy for vanet
Lee et al. Back-off improvement by using q-learning in ieee 802.11 p vehicular network
CN108934081B (en) Wireless vehicle-mounted network channel access method
Gopinath et al. Channel status based contention algorithm for non-safety applications in IEEE802. 11p vehicular network
CN114916087A (en) Dynamic spectrum access method based on India buffet process in VANET system
Şahin et al. Scheduling out-of-coverage vehicular communications using reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant