CN111132083B

CN111132083B - A NOMA-based distributed resource allocation method in vehicle formation mode

Info

Publication number: CN111132083B
Application number: CN201911214993.1A
Authority: CN
Inventors: 郭彩丽; 许世琳; 冯春燕; 王兆丰
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2019-12-02
Filing date: 2019-12-02
Publication date: 2021-10-22
Anticipated expiration: 2039-12-02
Also published as: CN111132083A

Abstract

The invention discloses a NOMA-based distributed resource allocation method in vehicle formation mode, which belongs to the field of wireless communication. The method first decouples the resource allocation problem into two parts, power allocation and sub-channel allocation, and then proposes a power allocation scheme based on the driving state of the vehicle formation and a spectrum allocation scheme based on distributed multi-agent reinforcement learning (RL). In the power allocation part, compared with a fixed power allocation scheme, the proposed scheme, which takes the safety distance into account, provides fairer communication performance for vehicle formations in different lanes. In the spectrum allocation part, the proposed scheme makes full use of the self-learning ability of reinforcement learning and converges faster by using a neighborhood iteration order based on formation position in multi-agent Q-learning. While guaranteeing vehicle-to-infrastructure (V2I) communication, the invention maximizes the total throughput of the vehicle-to-multi-vehicle (V2mV) links through NOMA-based distributed resource allocation and improves the communication performance of the system.

Description

NOMA-based distributed resource allocation method in vehicle formation mode

Technical Field

The invention belongs to the field of wireless communication, relates to a Non-Orthogonal Multiple Access (NOMA) communication system, and particularly relates to a distributed resource allocation method in a vehicle formation mode in an internet of vehicles.

Background

With the advent of the automated driving era, the way automobiles are driven will change greatly. To reduce driving cost, environmental pollution, and traffic accidents, vehicle formation travel will become one of the important driving modes of that era^[1]. In each formation, vehicles need to share surrounding traffic and road condition information, rich entertainment application information, and the like. Specifically, the vehicles with rich resources in the formation share information with the other vehicles, which maintains a stable driving mode for the whole formation and a high-quality experience for drivers and passengers. However, this process is vehicle-to-multi-vehicle (V2mV) communication, which conventional vehicle-to-vehicle (V2V) communication cannot provide. Facing an increasingly serious shortage of spectrum resources, and to meet the V2mV communication requirement within a vehicle formation, the invention introduces NOMA into the Internet of Vehicles. NOMA lets users access the same channel non-orthogonally, mainly through power-domain^[2] or code-domain^[3] multiplexing, and the receiver demodulates the received signal using successive interference cancellation (SIC). NOMA can therefore greatly improve system throughput while reducing the dependence on large amounts of spectrum, and can meet the large-scale connectivity requirement of the vehicle formation scenario.

Research on NOMA-based resource allocation in the Internet of Vehicles has only just begun. Existing work on NOMA spectrum resources is mainly centralized, and distributed resource allocation schemes are scarce. Boya Di et al.^[4] proposed a matching-theory-based spectrum resource allocation algorithm to support NOMA-based vehicle-to-everything (V2X) communication. Yiyi Xu et al.^[5] proposed a centralized NOMA-based spectrum allocation scheme by classifying and grouping vehicles in the Internet of Vehicles. Chen et al.^[6] studied NOMA-based resource allocation using interference hypergraphs and graph coloring theory. Although centralized schemes have been studied extensively, they suffer from incomplete channel state information (CSI), delays between communication requests and responses, and similar drawbacks, and therefore cannot meet the high-reliability and low-latency requirements of vehicular communication. A distributed approach to NOMA-based resource allocation is therefore needed.

[1] Huang, D. Chu, C. Wu, and Y. He, IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 959-974, 2018.

[2] Y. Saito, Y. Kishiyama, A. Benjebbour, T. Nakamura, A. Li, and K. Higuchi, "Non-orthogonal multiple access (NOMA) for cellular future radio access," in 2013 IEEE 77th Vehicular Technology Conference (VTC Spring), pp. 1-5, IEEE, 2013.

[3] L. Dai, B. Wang, Y. Yuan, S. Han, I. Chih-Lin, and Z. Wang, "Non-orthogonal multiple access for 5G: solutions, challenges, opportunities, and future research trends," IEEE Communications Magazine, vol. 53, no. 9, pp. 74-81, 2015.

[4] B. Di, L. Song, Y. Li, and G. Y. Li, "Non-orthogonal multiple access for high-reliable and low-latency V2X communications in 5G systems," IEEE Journal on Selected Areas in Communications, vol. 35, no. 10, pp. 2383-2397, 2017.

[5] Xu and X. Gu, "Resource allocation for NOMA-based V2V system," in 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC), pp. 239-243, IEEE, 2018.

[6] Chen, B. Wang, and R. Zhang, "Interference hypergraph-based resource allocation (IHG-RA) for NOMA-integrated V2X networks," IEEE Internet of Things Journal, vol. 6, no. 1, pp. 161-170, 2018.

Disclosure of Invention

To solve the above problems, the invention uses NOMA (non-orthogonal multiple access) technology to reuse the same resources under a distributed resource allocation principle, and provides a NOMA-based distributed resource allocation method for the vehicle formation mode in the Internet of Vehicles. The invention considers an Internet of Vehicles scenario in which vehicle-to-infrastructure (V2I) links and V2mV links coexist, maximizes the total throughput of the V2mV links, and guarantees normal communication on the V2I links.

To achieve this technical effect, the NOMA-based distributed resource allocation method in vehicle formation mode comprises the following steps:

Step one: establish a channel model that accounts for the large-scale and small-scale fading of the wireless channel in the system model;

Step two: maximize the transmission rate of the V2mV links while protecting normal communication on the V2I links, with the maximum total throughput of the V2mV links as the optimization objective;

Step three: considering how frequency reuse between the V2I and V2mV links affects normal V2I communication, characterize the transmission rate of the V2I link under interference and express it as a constraint;

Step four: with the maximum total throughput of the V2mV links as the optimization objective, and the V2I transmission rate threshold, the power allocation constraints, and the sub-channel allocation constraints as the constraints of the optimization problem, construct a NOMA-based distributed resource allocation model under vehicle formation, and decouple the optimization problem into two parts, power allocation and sub-channel allocation;

Step five: adopt a power allocation scheme based on lane conditions;

Step 501: analyze and derive the channel state of the V2mV links;

Step 502: generate the inter-NOMA power allocation scheme according to the channel states of the V2mV links on different lanes;

Step six: characterize sub-channel allocation with a distributed multi-agent Q-learning algorithm, and accelerate convergence by using a neighborhood iteration order based on formation position;

Step 601: construct the multi-agent Q-learning framework;

Step 602: update the Q table and the policy;

Step 603: determine the sub-channel allocation scheme.

The invention has the advantages that:

(1) without affecting the basic communication quality of V2I, the V2I links and the V2mV links share spectrum resources, which relieves the spectrum shortage;

(2) NOMA is introduced into the Internet of Vehicles, allowing users to access the same channel non-orthogonally through power-domain and code-domain multiplexing, which greatly improves system throughput while reducing the dependence on large amounts of spectrum;

(3) NOMA-based resource allocation is implemented in a distributed manner, which maximizes the total throughput of the V2mV links while maintaining V2I communication, and meets the high-reliability and low-latency requirements of vehicular communication.

Drawings

FIG. 1: schematic diagram of the NOMA-based V2mV communication system model in vehicle formation mode in the Internet of Vehicles, according to an embodiment of the invention;

FIG. 2: flowchart of the NOMA-based distributed resource allocation method in vehicle formation mode, according to an embodiment of the invention;

FIG. 3: comparison of the average throughput of the V2mV links on different lanes under the power allocation scheme of the invention and under the fixed power allocation scheme mentioned in the summary of the invention;

FIG. 4: comparison of the cumulative distribution function of V2I link throughput for the invention and other resource allocation schemes;

FIG. 5: comparison of the total V2mV link throughput for the invention and other resource allocation schemes;

FIG. 6: comparison of the average run time of the invention and other resource allocation schemes;

FIG. 7: comparison of the convergence performance of the invention and other resource allocation schemes.

Detailed Description

In order that the technical principles of the present invention may be more clearly understood, embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The communication system model of the invention is shown in FIG. 1. It comprises an autonomous driving road section with U unidirectional lanes, in which V2mV links coexist with V2I links, and different lanes specify different driving speeds {v₁, …, v_U} and safety distances {d₁, …, d_U}. In the model, SV_k and SV_k' (k, k' ∈ {1, 2, …, K}, k ≠ k') denote two of the K individually traveling vehicles, and PV_n and PV_m (n, m ∈ {1, 2, …, N}, n ≠ m) denote two of the N formations of autonomous vehicles. The formations travel in order at the speeds V = {v₁, …, v_N} specified for their lanes and with the corresponding vehicle safety distances D = {d₁, …, d_N}. Vehicle formations n and m comprise the vehicle sets Ψ_n and Ψ_m, respectively. Each formation has one sending vehicle; the remaining vehicles, indexed by v ∈ {2, …, |Ψ_n|} in formation n and by w in formation m, are the v-th and w-th receiving vehicles.

Two communication modes, V2I and V2mV, are used in this scenario. In V2I communication, the base station communicates with SV_k and SV_k', and each of these links has its own channel gain. In V2mV communication, the sending vehicle of formation n communicates with its receiving vehicles and the sending vehicle of formation m communicates with its receiving vehicles, and each of these links again has its own channel gain. In addition, the model accounts for three kinds of interference: the interference of the base station on the receiving vehicles of a formation, the interference of the sending vehicle of one formation on the receiving vehicles of another formation, and the interference of the sending vehicle of a formation on an individually traveling vehicle.

For the V2I link, each individually traveling vehicle receives information from the base station through Orthogonal Frequency Division Multiple Access (OFDMA). To alleviate the spectrum shortage, the invention assumes that the V2mV links reuse the spectrum resources allocated to the V2I links in the underlay mode of a Cognitive Radio (CR) network. For convenience, the invention refers to NOMA-based intra-formation communication collectively as V2mV.

Referring to FIG. 2, the NOMA-based distributed resource allocation method in vehicle formation mode according to the invention comprises the following steps:

Step one, characterizing the channel model (S1): the system model mainly considers the large-scale fading caused by path loss and the small-scale fading caused by the Doppler effect. The large-scale fading G_L is defined in terms of the distance d and the path loss exponent γ, where G₀ is the attenuation at the reference distance d₀, G_rx and G_tx are the antenna gains, and λ is the wavelength determined by the carrier frequency f_c and the speed of light c. Because of the Doppler effect caused by relative velocity, fast fading can be described with a Rayleigh channel model. Based on statistical distribution theory and the law of large numbers, the impulse response h(t, τ) of the channel follows a complex Gaussian distribution, and its amplitude x = |h_i(t)| obeys the Rayleigh distribution:

$$f(x) = \frac{x}{\sigma^{2}} \exp\left(-\frac{x^{2}}{2\sigma^{2}}\right), \quad x \ge 0$$

where σ is a constant and σ > 0.
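
The channel model of step one can be illustrated with a short Python sketch. It assumes the common log-distance form G_L = G₀ (d/d₀)^(−γ) with G₀ = G_tx G_rx (λ / (4π d₀))², which may differ in detail from the expression used by the invention, and it draws the Rayleigh amplitude as the magnitude of a zero-mean complex Gaussian sample. The function names and the default values (for example a 5.9 GHz carrier) are hypothetical.

```python
import math
import random

C_LIGHT = 299_792_458.0  # speed of light in m/s


def large_scale_gain(d, d0=1.0, gamma=3.0, g_tx=1.0, g_rx=1.0, f_c=5.9e9):
    """Large-scale gain (linear scale) under an ASSUMED log-distance model."""
    wavelength = C_LIGHT / f_c
    # Assumed reference gain at distance d0 (free-space form).
    g0 = g_tx * g_rx * (wavelength / (4.0 * math.pi * d0)) ** 2
    return g0 * (d / d0) ** (-gamma)


def rayleigh_amplitude(sigma, rng):
    """Draw |h| as the magnitude of a zero-mean complex Gaussian sample."""
    re = rng.gauss(0.0, sigma)
    im = rng.gauss(0.0, sigma)
    return math.hypot(re, im)


# Example: combined power gain of one link at 50 m.
rng = random.Random(0)
gain = large_scale_gain(d=50.0) * rayleigh_amplitude(1.0, rng) ** 2
```

With γ = 3, doubling the distance reduces the large-scale gain by a factor of eight, which is why path loss dominates the fast-fading term in the later power allocation step.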

Step two, characterizing the optimization objective (S2): the invention uses a reference transmission rate as the threshold for judging whether a V2I link is interrupted, and maximizes the transmission rate of the V2mV links while protecting V2I communication, so as to meet the information-sharing requirement within formations. The optimization objective is therefore to maximize the total throughput of the V2mV links. The invention studies the case of two receiving users per formation, which generalizes to formations with more than two receiving users. Three interference terms are defined: the interference inside a V2mV link, the mutual interference caused by other V2mV links reusing the same channel l as V2mV n, and the interference caused by the base station. From these terms, the throughputs of user v and user w in formation n are obtained.

Here Ω_l = {1, 2, …, L} denotes the set of available spectrum resources, L is the number of spectrum resource blocks that can be allocated, and l ∈ Ω_l is the frequency band allocated to V2mV n. μ_v and μ_w are the power allocation factors of users v and w, respectively. Under the NOMA power multiplexing rule, assuming that the channel gain of user v in the formation is lower than that of user w, μ_v > μ_w and μ_v + μ_w = 1. P_n and P_m are the transmission powers of V2mV n and m, respectively, P_l^c and B_l are the base station transmission power and the bandwidth at frequency band l, respectively, and N₀ is the power spectral density of additive white Gaussian noise (AWGN).
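
As an illustration of the two-receiver case, the following Python sketch computes the rates of a weak user v and a strong user w under textbook power-domain NOMA. It is a simplified sketch, not the exact expressions of the invention: the function name, the gain arguments `g_v` and `g_w`, and the lumped external interference terms `i_v` and `i_w` (standing in for inter-formation and base station interference) are assumptions.

```python
import math


def noma_pair_rates(p_n, mu_v, g_v, g_w, bandwidth, n0, i_v=0.0, i_w=0.0):
    """Return (rate_v, rate_w) in bit/s for a two-receiver power-domain NOMA link.

    Assumes g_v < g_w, so v is the weak user and receives the larger share mu_v.
    """
    mu_w = 1.0 - mu_v
    if not (mu_v > mu_w > 0.0):
        raise ValueError("expected mu_v > mu_w > 0 with mu_v + mu_w = 1")
    noise = n0 * bandwidth
    # Weak user v treats the signal intended for w as interference.
    sinr_v = mu_v * p_n * g_v / (mu_w * p_n * g_v + i_v + noise)
    # Strong user w removes v's signal by SIC before decoding its own.
    sinr_w = mu_w * p_n * g_w / (i_w + noise)
    return (bandwidth * math.log2(1.0 + sinr_v),
            bandwidth * math.log2(1.0 + sinr_w))


rate_v, rate_w = noma_pair_rates(p_n=1.0, mu_v=0.8, g_v=1.0, g_w=4.0,
                                 bandwidth=1.0, n0=0.1)
```

The sum `rate_v + rate_w` is the per-formation contribution that the optimization objective adds up over all V2mV links.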

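The optimization objective is to maximize the total throughput of the V2mV links, that is, the throughput of the receiving users summed over all formations, where P = {P_n, P_m, P_l^c} collects the transmission powers and N is the total number of V2mV links.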

Step three, characterizing the interference and rate constraint (S3): since the V2mV links in the set Ω_k share the same frequency band with V2I link k, V2I link k suffers interference from them. The corresponding rate of V2I link k under this interference depends on the interference of each V2mV n on SV_k.

In the spectrum-reuse vehicle formation model, to guarantee the communication quality of the V2I links, the throughput of V2I link k should exceed a predetermined threshold with probability p₀, where K is the number of cellular users in the model.
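
The probabilistic rate constraint can be checked empirically, as in the Python sketch below. It assumes the constraint reads Pr{R_k > R_th} ≥ p₀ and estimates the probability from rate samples (for example, rates computed over random fading draws). All function and parameter names are hypothetical.

```python
import math


def v2i_rate(p_bs, g_bk, interference, bandwidth, n0):
    """Rate of one V2I link given the aggregate V2mV interference on it."""
    sinr = p_bs * g_bk / (interference + n0 * bandwidth)
    return bandwidth * math.log2(1.0 + sinr)


def satisfaction_probability(rate_samples, r_threshold):
    """Fraction of sampled rates that exceed the reference rate."""
    hits = sum(1 for r in rate_samples if r > r_threshold)
    return hits / len(rate_samples)


def meets_constraint(rate_samples, r_threshold, p0):
    """True if the empirical probability reaches p0 (assumed form of the constraint)."""
    return satisfaction_probability(rate_samples, r_threshold) >= p0
```

A candidate sub-channel or power assignment that fails `meets_constraint` for some V2I link k would be infeasible under this reading of the constraint.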

Step four, establishing the optimization model (S4): with the throughput of each V2mV link as the optimization variable, the maximum total throughput of the V2mV links as the optimization objective, and the conditions that spectrum reuse must satisfy together with the maximum power limits of the V2mV and V2I links as constraints, an optimization model of the NOMA-based resource allocation problem is established.

The first constraint requires the throughput of V2I link k to exceed a predetermined threshold with probability p₀. In the second constraint, S_{n,l} = 1 indicates that frequency band l has been allocated to V2mV n, and S_{n,l} = 0 indicates that it has not. The third and fourth constraints bound the number of times sub-channel l is reused, where L_max is the maximum reuse number of a frequency band. The fifth and sixth constraints involve the received power of the v-th vehicle in V2mV n and limit the maximum power of V2mV n and of the base station, respectively.

Because the channel allocation variables are discrete and the power allocation variables are continuous, the optimization problem is a non-convex mixed-integer nonlinear programming (MINLP) problem. Exhaustive search has extremely high computational complexity, so obtaining a global solution that way is impractical. Therefore, as in other resource allocation schemes, the invention decouples the resource allocation problem into two sub-problems: power allocation and sub-channel allocation.
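
To give a feel for why exhaustive search is impractical, the Python sketch below counts the feasible sub-channel assignments for a small instance. It assumes, for illustration only, that each V2mV link occupies exactly one band and that no band may be reused by more than L_max links; the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def count_feasible_assignments(n_links, n_bands, l_max):
    """Count assignments of one band per link with at most l_max links per band."""
    count = 0
    for assignment in product(range(n_bands), repeat=n_links):
        # Reject assignments in which some band is reused too often.
        if max(Counter(assignment).values()) <= l_max:
            count += 1
    return count


small = count_feasible_assignments(n_links=3, n_bands=2, l_max=2)
```

The enumeration visits `n_bands ** n_links` candidates before any power optimization is even attempted, so the discrete part alone grows exponentially with the number of formations.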

Step five, characterizing the power allocation (S5): power allocation divides into power allocation between V2mV links (inter-V2mV) and power allocation inside a V2mV link (intra-V2mV). Intra-V2mV power allocation is essentially NOMA power multiplexing, which has already been studied extensively, so the invention adopts the power multiplexing scheme proposed by Zhiguo Ding for it. The invention then focuses on the inter-V2mV power allocation problem, that is, the optimization problem with the first, fifth, and sixth constraints, and proposes a power allocation scheme based on lane conditions.

Step 501, channel state analysis and derivation (S51): because path loss generally affects the useful signal far more than fast fading in a conventional channel model, the invention adjusts the power allocation according to the safety distances of the different lanes, thereby reducing the throughput differences among NOMA formations on different lanes. Based on reasonable assumptions and theoretical derivation, the invention provides an effective inter-V2mV power allocation scheme. The throughput R_n of V2mV n is derived from the throughputs of its receiving users.

Consider two arbitrary receiving vehicles v and v+1 (2 ≤ v < v+1 ≤ |Ψ_n|) whose channel gains are ordered. Following the successive interference cancellation (SIC) demodulation principle of NOMA, the content intended for receiving vehicle v+1 is demodulated at both vehicle v+1 and vehicle v, and the corresponding signal-to-interference-plus-noise ratios (SINRs) satisfy an inequality that is equivalent to the ordering of the channel gains. Likewise, the content intended for receiving vehicle v is demodulated at vehicle v+1 and at vehicle v, and the corresponding SINRs satisfy an analogous inequality.

From these relations, an approximation of the throughput R_n of V2mV n is obtained, which holds under a condition equivalent to condition Δ with v = |Ψ_n| and w = 1. The invention assumes that this approximation holds as long as SIC can be performed successfully in each formation.

Step 502, generating the inter-NOMA power allocation scheme (S52): consider formations n and m on different lanes, with throughputs R_n and R_m, respectively. Their difference R_n − R_m is a function of the formation transmission powers P_n and P_m. Under a further assumption, this difference yields a relation between P_n and P_m in terms of d_n and d_m, the safety distances between vehicles in formations n and m, respectively.

In this way, a reference power is introduced for the NOMA-enabled V2mV links. The reference power is assigned to the lane with the smallest safety distance, and the power allocation of the NOMA-based vehicle formations on the other lanes follows from the relation above.
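
The reference-power idea can be sketched in Python as follows. The scaling rule here is an assumption made for illustration, not the relation derived by the invention: it equalizes the path-loss-scaled received power P·d^(−γ) across lanes, so that P_n = P_ref (d_n / d_min)^γ, optionally capped at a maximum power. The function name and defaults are hypothetical.

```python
def lane_powers(safety_distances, p_ref, gamma=3.0, p_max=None):
    """Scale a reference power from the tightest lane to the other lanes.

    ASSUMED rule: keep p * d**(-gamma) equal across lanes, capped at p_max.
    """
    d_min = min(safety_distances)  # this lane receives the reference power
    powers = []
    for d in safety_distances:
        p = p_ref * (d / d_min) ** gamma
        if p_max is not None:
            p = min(p, p_max)
        powers.append(p)
    return powers


powers = lane_powers([10.0, 20.0], p_ref=1.0, gamma=2.0)
```

Under this rule, lanes with larger safety distances transmit with more power, which compensates their larger path loss and narrows the throughput gap between lanes.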

Step six, characterizing the sub-channel allocation (S6): because Q-learning has strong autonomous learning capability in complex, unfamiliar environments, the invention introduces a Q-learning-based reinforcement learning framework to solve the sub-channel allocation problem with the first, second, third, and fourth constraints. Unlike conventional Q-learning, the proposed algorithm decomposes the globally optimal solution into several approximately optimal local solutions. In this process, each agent considers only the states and actions of its neighboring agents, so the state space and action space of each agent shrink to a relatively small scale and its convergence improves markedly. Because path loss dominates the channel gain, the states of neighboring agents affect an agent more strongly than those of distant agents, so solving for local solutions instead of a global solution is feasible. The invention assumes that each agent can receive the states of its neighboring agents and makes decisions based on them, ignoring agents that are farther away, which reduces the dimension of the feasible solution space while preserving solution quality.
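
A minimal Python sketch of the neighborhood restriction is shown below. It assumes that agents are described by a one-dimensional position along the road and that each agent observes its φ nearest agents; the function name and the tie-breaking rule (lower index first) are hypothetical.

```python
def nearest_neighbors(positions, agent, phi):
    """Return the indices of the phi agents closest to `agent` along the road."""
    others = [j for j in range(len(positions)) if j != agent]
    # Sort by distance to the agent; break ties by index for determinism.
    others.sort(key=lambda j: (abs(positions[j] - positions[agent]), j))
    return others[:phi]


neighbors = nearest_neighbors([0.0, 10.0, 25.0, 27.0], agent=2, phi=2)
```

Each agent then builds its state only from the agents in this list, which keeps the size of its Q table independent of the total number of formations.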

Step 601, constructing the Q-learning framework (S61): the proposed Q-learning framework comprises five basic components: a) agent, b) action, c) state, d) reward, and e) iteration order. The algorithm is characterized by considering the states of neighboring agents and the iteration order. The components are defined as follows:

a) Agent: each agent corresponds to one V2mV link n ∈ {1, 2, …, N}, so the proposed reinforcement learning framework contains multiple agents.

b) Action: the action set A = {1, 2, …, L} is the set of sub-channels, from which the agent selects in a uniformly distributed manner; each action a ∈ A corresponds to one spectrum resource block l.

c) State: the state of each agent is defined as S = {V, W, P, Ω}, s ∈ S, where V, W, P, and Ω represent the relative velocity, position, power allocation, and sub-channel allocation of the agent, respectively, given the limited number of neighboring agents.

d) Reward: for agent n, given its previous state and the selected action, the reward function Re_n(s, a) is defined as the throughput of agent n, evaluated with l = a.

e) Iteration order: the Q-learning order of the V2mV links is determined by the distance from each formation's position to the start of the road section, sorted in descending order.
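
The iteration order in e) amounts to a descending sort by distance from the start of the road section. The Python sketch below assumes a dictionary that maps a formation index to that distance; the names are hypothetical.

```python
def iteration_order(distance_from_start):
    """Return formation indices sorted by distance from the road start, farthest first."""
    return sorted(distance_from_start,
                  key=lambda n: distance_from_start[n],
                  reverse=True)


order = iteration_order({1: 120.0, 2: 340.0, 3: 50.0})
```

In this example formation 2, which is farthest along the road section, updates first, followed by formations 1 and 3.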

Step 602, updating the Q table and the policy (S62): to obtain the optimal sub-channel allocation of each agent under the iteration order, the proposed algorithm uses a Q table to store the reward values resulting from different states and actions. The optimal Q value of an agent is defined according to the Bellman optimality equation, where p_ss' is the transition probability from state s to s', r(s, a) is the reward obtained by action a in state s, γ is the discount factor, φ is the number of neighboring agents, Z is the set of integers, and a' is the action performed in state s'. The Q table is updated at each iteration, where α is the learning rate, a^* is the optimal action in state s, that is, the action with the largest Q value in s, and s' is the next state reached after taking action a in state s.
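
For reference, the standard single-agent tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], can be sketched in Python as follows. This is the textbook rule and not necessarily the exact multi-agent update of the invention, which also conditions on the states of the φ neighboring agents; the function name and the dictionary-based Q table are assumptions.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new value."""
    # Value of the best action available in the next state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
first = q_update(q, "s0", 1, reward=10.0, next_state="s1",
                 actions=[0, 1], alpha=0.5, gamma=0.9)
```

In the multi-agent setting described above, each agent would hold its own table and apply such an update in the iteration order determined in step 601.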

The action selection policy π_a, by which the agent selects action a and then updates the Q table, is determined by the probability of selecting action a and the exploration probability ε, where |A| is the total number of actions available to the agent.
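
A common policy of this kind is ε-greedy selection, sketched below in Python under the assumption that the best action receives probability 1 − ε + ε/|A| and every other action receives ε/|A|. Whether the invention uses exactly this form is not confirmed by the text; the function names are hypothetical.

```python
import random


def action_probabilities(q_values, epsilon):
    """Epsilon-greedy probabilities over actions, given their Q values."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n      # exploration mass spread uniformly
    probs[best] += 1.0 - epsilon   # remaining mass goes to the greedy action
    return probs


def select_action(q_values, epsilon, rng):
    """Sample an action index according to the epsilon-greedy probabilities."""
    probs = action_probabilities(q_values, epsilon)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


probs = action_probabilities([0.1, 0.5, 0.2], epsilon=0.3)
```

A larger ε makes the agent try more sub-channels early on, while ε = 0 reduces the policy to always picking the sub-channel with the highest Q value.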

Step 603, determining the sub-channel allocation scheme (S63): a converged Q table is obtained through S62. The optimal action in each state is then selected from the converged Q table, which determines the optimal sub-channel allocation scheme.
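
Reading the allocation out of converged tables can be sketched in Python as below. The sketch assumes one dictionary-based Q table per agent, keyed by (state, action) pairs with actions numbered 1 to L, and picks the action with the largest Q value in each agent's current state. All names are hypothetical.

```python
def greedy_allocation(q_tables, states, n_bands):
    """Map each agent to the sub-channel with the largest Q value in its state."""
    allocation = {}
    for agent, q in q_tables.items():
        s = states[agent]
        # Missing entries are treated as 0.0 (never visited).
        allocation[agent] = max(range(1, n_bands + 1),
                                key=lambda a: q.get((s, a), 0.0))
    return allocation


allocation = greedy_allocation(
    {1: {("s", 1): 0.2, ("s", 2): 0.9}, 2: {("s", 1): 0.7}},
    states={1: "s", 2: "s"},
    n_bands=2,
)
```

Here agent 1 is assigned sub-channel 2 and agent 2 is assigned sub-channel 1.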

FIG. 3 verifies the effectiveness of the proposed power allocation scheme by simulating the average throughput of the V2mV links on different lanes. In the figure, "power allocation" denotes the safety-distance-based inter-NOMA power allocation scheme, and "no power allocation" denotes assigning the same power to all V2mV links. Compared with the case without power allocation, the average throughput of the V2mV links is distributed more evenly across the lanes, so the V2mV links on every lane obtain fair communication service.

FIGS. 4, 5, 6, and 7 show simulation results for the V2I links and the V2mV links under the proposed inter-NOMA power allocation scheme, covering performance indicators such as average total throughput, run time, and convergence. The proposed algorithm is named the distributed NOMA resource allocation algorithm based on multi-agent Q-learning, and is compared with other distributed and centralized algorithms. The distributed baseline is a distributed V2V resource allocation algorithm based on multi-agent Q-learning, which divides each V2mV link of the NOMA scheme into |Ψ_n| − 1 D2D-based V2V links and keeps all other parameters the same as in the NOMA scheme. The centralized schemes include a group theory algorithm, a greedy algorithm, and a random algorithm. The server used for simulation has the following software and hardware configuration: Windows Server 2019, Intel(R) Xeon(R) 2.6 GHz processor, 16 GB RAM.

FIG. 4 compares the cumulative distribution function of V2I link throughput for the invention and various other resource allocation schemes. The proposed scheme outperforms the V2V scheme. Its performance is slightly inferior to that of the greedy and random resource allocation algorithms, but more than 90% of the V2I links still reach the reference rate.

FIG. 5 compares the total throughput of the V2mV links for the invention and various other resource allocation schemes. Exhaustive search achieves only slightly better performance than the proposed scheme, at the cost of huge computational complexity. Compared with the V2V scheme, the proposed scheme is generally better except when the number of formations is small: when the spectrum is not congested, the V2V scheme can use more spectrum resources than the NOMA scheme, and the advantage of the proposed algorithm emerges gradually as the number of V2mV links increases. Furthermore, the proposed algorithm clearly outperforms the centralized algorithms described above.

FIGS. 6 and 7 compare the average run time and the convergence performance of the invention with other resource allocation schemes, respectively. Compared with the traditional Q-learning resource allocation algorithm, the agents of the proposed algorithm can update their Q tables simultaneously, so the algorithm converges in less time. The superior convergence verifies the effectiveness of considering the iteration order and the states of neighboring agents. As shown in FIG. 6, the proposed algorithm consumes less run time than the V2V scheme, which FIG. 7 confirms through its smaller number of iterations. FIG. 6 also shows that the proposed algorithm consumes more run time than the centralized schemes, but remains within an acceptable range.

In summary, by implementing the NOMA-based distributed resource allocation method in vehicle formation mode according to the embodiment of the invention, fairer and more efficient communication can be realized for the V2mV links on different lanes, and the transmission rate of the V2mV links is greatly increased while V2I link communication is guaranteed.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A NOMA-based distributed resource allocation method in vehicle formation mode, characterized in that the resource allocation method comprises:

Step 1: Consider the influence of large-scale fading and small-scale fading of the wireless channel in the system model, and establish a channel model;

Step 2: Maximize the transmission rate of the vehicle to multi-vehicle (V2mV) link while protecting the normal communication of the V2I link, and set the optimization goal to maximize the total throughput of the V2mV link;

First, characterize the internal interference of V2mV n, the mutual interference caused by other V2mV links reusing the same channel l as V2mV n, and the interference caused by the base station,

where Ψ_n is the set of vehicles in vehicle formation n; μ_v and μ_w are the power allocation factors of receiving vehicles v and w, respectively; P_n and P_m are the transmit powers of the sending vehicles in V2mV n and m, respectively; P_l^c is the transmit power of the base station at frequency band l; the remaining quantities are the channel gain of the communication between the sending vehicle and the v-th receiving vehicle in vehicle formation n, the interference of the sending vehicle in vehicle formation m on the v-th receiving vehicle in formation n, and the interference of the base station on the v-th receiving vehicle in vehicle formation n; Ω_l = {1, 2, …, L} is the set of available spectrum resources; L is the number of spectrum resource blocks that can be allocated; and l ∈ Ω_l is the frequency band allocated to V2mV n;

Second, characterize the throughputs of user v and user w in formation n, where B_l is the base station transmit bandwidth at frequency band l, N₀ is the power spectral density of additive white Gaussian noise, μ_v and μ_w are the power allocation factors of users v and w, respectively, and the remaining quantity is the channel gain of the communication between the sending vehicle and the w-th receiving vehicle in vehicle formation n;

Finally, characterize the optimization objective as maximizing the total throughput of the V2mV links, where P = {P_n, P_m, P_l^c} and N is the total number of autonomous vehicle formations;

Step 3: Consider the impact of frequency reuse of V2I and V2mV links on the normal communication of V2I links, characterize the V2I link transmission rate considering interference, and characterize it with constraints;

Characterize the throughput of V2I link k in terms of the interference received by V2I link k, the interference of V2mV n on the k-th individually traveling vehicle, and the channel gain of the communication from the base station to the k-th individually traveling vehicle;

In the spectrum-reuse vehicle formation model, to guarantee the communication quality of the V2I links, the throughput of V2I link k should exceed the established threshold with probability p₀, where K is the total number of individually traveling vehicles;

Step 4: Taking maximization of the total throughput of the V2mV links as the optimization objective, and the V2I link transmission rate threshold, the power allocation constraints and the sub-channel allocation constraints as the constraints of the optimization problem, construct a NOMA-based distributed resource allocation model in vehicle formation mode, and decouple the optimization problem into two parts, power allocation and sub-channel allocation:

where the first constraint represents the throughput of V2I link k

should be greater than the established threshold with probability p ₀

S _{n,l} = 1 in the second constraint means that frequency band l is allocated to V2mV n, and S _{n,l} = 0 means that frequency band l is not allocated to V2mV n; the third and fourth constraints give the maximum reuse number of sub-channel l, where L _max is defined as the maximum reuse number of a frequency band; in the fifth and sixth constraints,

is the received power of the vth vehicle in V2mV n,

and

limit the maximum power of V2mV n and of the base station, respectively;
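The sub-channel allocation constraints can be illustrated with a simple feasibility check. The constraint expressions themselves are not reproduced in this text, so the sketch assumes one plausible reading: every S _{n,l} is binary, each formation occupies at most one frequency band, and each band is reused by at most L _max formations. The function name is hypothetical.

```python
def allocation_is_feasible(s, l_max):
    """Check a sub-channel allocation matrix s[n][l] (N formations x L bands).

    Assumed constraints (one plausible reading, not the patent's formulas):
      * every entry is 0 or 1;
      * each formation uses at most one band;
      * each band is shared by at most l_max formations.
    """
    if any(v not in (0, 1) for row in s for v in row):
        return False
    if any(sum(row) > 1 for row in s):
        return False
    n_bands = len(s[0]) if s else 0
    return all(sum(row[l] for row in s) <= l_max for l in range(n_bands))
```

For example, three formations placed on the same band are infeasible when the reuse limit is two, but become feasible once one of them is moved to another band.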

Step 5: Adopt a power allocation scheme based on lane conditions, adjusting the power allocation according to the safety distances of the different lanes, thereby reducing the difference in throughput between NOMA formations on different lanes:

First, the calculation of the throughput R _n of the n-th V2mV formation is analyzed, the inequality relating the signal-to-interference-plus-noise ratios within the formation is derived, and an approximate formula for R _n is obtained. Second, to reduce the difference between NOMA formations on different lanes, the difference R _n − R _m between the throughputs of formations n and m on different lanes is computed; this difference is a function of the formation transmit powers P _n and P _m :

where Ψ _n and Ψ _m are the sets of vehicles in vehicle formations n and m,

is the channel gain of the communication between the transmitting vehicle in the n-th vehicle formation and its |Ψ _n |-th vehicle,

is the channel gain of the communication between the transmitting vehicle in the m-th vehicle formation and its |Ψ _m |-th vehicle;

Finally, to reduce the variation in throughput, a reference power is introduced for the NOMA-enabled V2mV links; the reference power is allocated to the lane with the smallest safety distance, and the relation

is used to complete the NOMA-based vehicle formation power allocation in the other lanes, where d _n and d _m are the safety distances between vehicles in vehicle formations n and m, respectively, and γ is the path loss exponent;
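The relation that maps the reference power to the other lanes is not reproduced in this text. Purely as an illustration, the sketch below assumes a simple path-loss compensation rule in which a lane's transmit power grows with the ratio of its safety distance to the reference lane's safety distance raised to the exponent γ. This rule, the function name and the cap `p_max` are assumptions and may differ from the formula of the invention.

```python
def lane_powers(p_ref, distances, gamma, p_max=None):
    """Assign a transmit power to each lane from its safety distance.

    Assumed rule (illustrative, not the patent's formula):
        P_lane = p_ref * (d_lane / d_min) ** gamma
    where d_min is the smallest safety distance, whose lane receives the
    reference power p_ref. If p_max is given, powers are capped at p_max.
    """
    d_min = min(distances)
    powers = [p_ref * (d / d_min) ** gamma for d in distances]
    if p_max is not None:
        powers = [min(p, p_max) for p in powers]
    return powers
```

Under this assumed rule, lanes with larger inter-vehicle spacing transmit at higher power, which offsets their larger path loss and narrows the throughput gap between lanes.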

Step 6: Model the sub-channel allocation with a distributed multi-agent Q-learning algorithm, and speed up convergence by adopting a neighborhood iteration order based on formation position:

First, the Q-learning framework is constructed by defining its agent, action, state, reward and iteration order: the agent is a V2mV formation; the action is the sub-channel that the agent selects, drawn from a uniform distribution; the state consists of the relative speed, position, power allocation and sub-channel state; the reward is the throughput of the agent; and the iteration order determines the order in which the V2mV links perform Q-learning;

Second, the optimal sub-channel allocation of each agent is obtained following the iteration order: a Q table stores the reward values obtained for the different states and actions, the optimal Q value is obtained from the Bellman optimality equation, and the Q table is updated according to the action selection policy π _a , which is:

where the values of the policy π _a are the probabilities of selecting action a, ε is the exploration probability, A is the set of actions available to the agent, |A| is the total number of actions available to the agent, and the Q value is that of the agent in state s for action a;

Finally, the converged Q table is obtained, and the optimal action and state are selected according to the converged Q table to determine the optimal sub-channel allocation scheme.
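The sub-channel learning step can be sketched as follows. This is a simplified illustration, not the algorithm of the invention: it uses the standard ε-greedy rule (the greedy action receives probability 1 − ε + ε/|A|, every other action ε/|A|), a standard Q-learning update, and a single-state Q table per agent. The function names, the `reward_fn` callback and the default learning rate and discount factor are all hypothetical.

```python
import random


def epsilon_greedy_probs(q_row, epsilon):
    """Standard epsilon-greedy selection probabilities over the actions."""
    n = len(q_row)
    best = q_row.index(max(q_row))   # greedy action (first index on ties)
    probs = [epsilon / n] * n        # exploration mass, spread uniformly
    probs[best] += 1.0 - epsilon     # remaining mass goes to the greedy action
    return probs


def q_update(q_row, action, reward, max_q_next, alpha=0.1, discount=0.9):
    """One Q-learning step toward reward + discount * max_q_next."""
    q_row[action] += alpha * (reward + discount * max_q_next - q_row[action])


def sweep(q_tables, choices, order, reward_fn, epsilon, rng):
    """One pass in which agents act and update one after another in `order`.

    q_tables[i] is agent i's single-state Q row (a simplifying assumption);
    choices[i] is its current sub-channel; reward_fn(agent, choices) returns
    the agent's throughput for the joint choice (supplied by the caller).
    """
    for agent in order:
        row = q_tables[agent]
        probs = epsilon_greedy_probs(row, epsilon)
        action = rng.choices(range(len(row)), weights=probs)[0]
        choices[agent] = action
        reward = reward_fn(agent, choices)
        q_update(row, action, reward, max(row))
```

Calling `sweep` repeatedly, with `order` listing the formations by position, mimics the sequential neighborhood iteration: each agent observes the most recent choices of the agents updated before it. After the Q rows stop changing, each agent's sub-channel is the index of the largest entry in its row.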