CN112616131B

CN112616131B - Internet of vehicles resource allocation method based on video content priority

Info

Publication number: CN112616131B
Application number: CN202011457896.8A
Authority: CN
Inventors: 冯春燕; 陈九九; 郭彩丽; 刘芳芳; 杨洋
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2022-05-13
Anticipated expiration: 2040-12-11
Also published as: CN112616131A

Abstract

The invention discloses an Internet of Vehicles resource allocation method based on video content priority, and belongs to the field of Internet of Vehicles communication. First, a video content priority evaluation method is designed: the content priorities of different videos are quantized, and the weights of videos with different content priorities in the resource allocation process are calculated from the inter-frame difference. Second, the video transmission distortion rate and bit error rate under the dynamic channel are calculated, and a resource allocation optimization model based on video content priority is constructed with maximization of the effective information amount as the objective. Then, according to the optimization model, the agents, state space, action space and environment feedback of a distributed multi-agent reinforcement Q learning algorithm model for Internet of Vehicles resource allocation are constructed. Finally, the distributed multi-agent reinforcement Q learning algorithm model is trained to solve the Internet of Vehicles resource allocation optimization problem based on video content priority. The invention takes into account both the content differences between videos and the dynamic channel conditions of the Internet of Vehicles, jointly optimizes the bandwidth and power allocation from the vehicle end to the edge server end, and can effectively improve the video content understanding performance at the edge server end.

Description

Internet of vehicles resource allocation method based on video content priority

Technical Field

The invention belongs to the field of vehicle networking communication, relates to a video transmission and video content understanding system in a vehicle networking scene, and particularly relates to a vehicle networking resource allocation method based on video content priority.

Background

In an Internet of Vehicles scenario, vehicles are typically equipped with multiple high-definition cameras. According to statistics, more than 90% of driving environment information can be acquired through cameras. Meanwhile, computer vision technology, which adopts artificial intelligence algorithms represented by deep learning, can understand and analyze the content of the large amount of video data acquired by vehicle-mounted cameras, which helps complete intelligent decision processing for various services in automatic driving. However, computer vision is currently treated as a separate research area, and it is generally assumed that video content understanding is performed on raw video data, ignoring the information loss caused by video compression and transmission during communication. In practice, the large number of tasks based on video content understanding, such as target detection and tracking and vehicle abnormal behavior analysis, poses a great challenge to the computing power of the vehicle end. Due to hardware and power limitations, it is nearly impossible for a vehicle to perform all of the intelligent tasks. One possible solution is to offload some of the tasks to an edge server (a unit typically deployed at the roadside) for collaborative computing with the help of the Internet of Vehicles. However, transmitting large amounts of video data from the vehicle to the edge server places tremendous pressure on limited communication resources such as bandwidth and power. Therefore, there is a need to develop an effective resource allocation method that improves the resource utilization of the Internet of Vehicles, so as to better complete the video content understanding task.

Traditional vehicular network resource allocation methods fall mainly into two types: methods based on Quality of Service (QoS) and methods based on Quality of Experience (QoE). The QoS-based approach mainly optimizes efficiency metrics such as network throughput, transmission rate, delay and jitter through reasonable allocation of communication resources. The QoE-based approach mainly allocates communication resources to meet the various subjective needs of the receiving user.

The existing resource allocation methods mainly consider video transmission efficiency or user experience, and do not consider the transmitted video content. Thus, when the transmitted video is used to accomplish the video content understanding task, the conventional method is no longer optimal. In this case, it is necessary to consider the importance of different video contents and allocate sufficient and reliable resources to the video with higher priority to improve the accuracy of the subsequent video content understanding task. Therefore, there is a need to develop more efficient resource allocation methods for video content in an internet of vehicles scenario.

Disclosure of Invention

In order to solve the above problems, the invention provides an Internet of Vehicles resource allocation method based on video content priority by combining traditional communication theory with reinforcement learning theory. The method jointly optimizes the power and spectrum allocation from the vehicle end to the edge server end, and maximizes the effective information amount of video transmission.

The method comprises the following specific steps:

firstly, constructing a vehicle networking communication system model based on video content priority;

the Internet of Vehicles communication model comprises a mobile edge computing (MEC) server and M mobile intelligent vehicles equipped with cameras. The communication and computation process of the whole system is as follows: 1) a vehicle initiates a video transmission request, preprocesses the collected video, calculates the priority of the video content using the inter-frame difference method, and uploads the result to the edge server, while the edge server acquires the Channel State Information (CSI); 2) the edge server obtains the optimal resource allocation result by maximizing the effective information amount of video transmission; 3) the vehicle encodes and compresses the video according to the resource allocation result and transmits it to the edge server through a wireless channel; 4) the edge server decodes the received video, performs video content understanding (such as a traffic accident detection task), and feeds the result back to each vehicle.

secondly, designing a video content priority evaluation method;

considering the difference of video contents, it is necessary to first define how to calculate the content priority of the video frames.

The content priority evaluation method based on the video interframe difference comprises the following specific steps:

step 201, preprocessing videos acquired by a vehicle, including graying and Gaussian filtering;

step 202, performing frame-by-frame differential calculation on the processed video;

step 203, binarization and exponential smoothing of calculation results;

and step 204, normalization processing.

The priority rating result of video X can be expressed as

f(X)＝{f₁,f₂,…,f_K}

where f(X) represents the priority evaluation result of the entire video clip, f₁,f₂,…,f_K respectively represent the priority evaluation result of each frame of the video, and K represents the maximum frame number of the video clip.

thirdly, calculating the video transmission distortion rate and bit error rate under the Internet of Vehicles channel, and constructing a resource allocation optimization model based on video content priority, with maximization of the effective information amount of video transmission as the optimization target;

the effective information is defined as the information transmitted without distortion or bit errors, and the effective information amount of each video frame, denoted here by $\tilde{I}_{m,k}$, can be expressed as:

$$\tilde{I}_{m,k} = I_{m,k}\,(1-q_m)\,(1-p_m)$$

where I_m,k represents the pixel-level amount of original video information, q_m represents the distortion rate, and p_m represents the bit error rate. The subscript m denotes the vehicle number and k denotes the frame number.

Step 301, calculating a distortion rate of a transmission video;

step 302, calculating the error rate of the transmission video;

step 303, constructing a resource allocation optimization model based on video content priority

The optimization aim is to reasonably allocate bandwidth and power to each vehicle through priority weighting under the resource constraint condition, so that the effective information quantity of the transmitted video is maximized. Thus, the optimization model can be expressed as:

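$$\max_{\{B_m,\,P_m\}} \; \sum_{m=1}^{M} \sum_{k=1}^{K} f_{m,k}\,\tilde{I}_{m,k}, \qquad \tilde{I}_{m,k} = I_{m,k}\,(1-q_m)\,(1-p_m)$$

subject to

$$\text{C1:}\quad B_m \ge B_{\min}, \qquad \sum_{m=1}^{M} B_m \le B_{\max}$$

$$\text{C2:}\quad P_m \ge P_{\min}, \qquad \sum_{m=1}^{M} P_m \le P_{\max}$$

C3:q_m(B_m,P_m)∈(0,1]

C4:p_m(B_m,P_m)＜e_min

$$\text{C5:}\quad QP \in [0, 51]$$

where M is the total number of vehicles and K is the total number of video frames. f_m,k represents the priority weight of the video frame content, reflecting that the different video frames of each vehicle have different importance in the video content understanding task. $\tilde{I}_{m,k}$ indicates the amount of effective information. The optimization variables B_m and P_m are the bandwidth and power allocated to each vehicle, respectively.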

Constraint C1 represents the bandwidth constraint: the bandwidth B_m allocated to each vehicle must be no less than the minimum threshold B_min, and the sum of the bandwidth resources allocated to all vehicles must not exceed the available bandwidth resource B_max;

Constraint C2 represents the power constraint: the power P_m allocated to each vehicle must be no less than the minimum threshold P_min, and the sum of the power resources allocated to all vehicles must not exceed the available power resource P_max;

Constraint C3 represents the distortion rate constraint: the distortion rate q_m must be greater than 0 and less than or equal to 1, which is a condition for guaranteeing video quality;

Constraint C4 represents the bit error rate constraint: the bit error rate p_m must be less than the threshold e_min to ensure communication quality;

Constraint C5 represents the constraint on the video coding quantization parameter: the QP value ranges over [0,51], which is the general value range of the quantization parameter for video encoding and decoding with H.265.
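As an illustration of how the objective and constraints C1 to C5 fit together, the following minimal Python sketch evaluates the priority-weighted effective information and checks feasibility for a candidate allocation. The function names and argument layout are hypothetical, and the distortion rates q and bit error rates p are taken as given inputs rather than computed from B_m and P_m.

```python
def effective_information(i_frames, q_m, p_m):
    """Per-frame effective information: I_{m,k} * (1 - q_m) * (1 - p_m)."""
    return [i * (1.0 - q_m) * (1.0 - p_m) for i in i_frames]


def weighted_objective(info, weights, q, p):
    """Sum over vehicles m and frames k of f_{m,k} * effective information.

    info[m][k] is I_{m,k}, weights[m][k] is f_{m,k}; q[m] and p[m] are the
    distortion rate and bit error rate of vehicle m (assumed precomputed).
    """
    total = 0.0
    for i_m, f_m, q_m, p_m in zip(info, weights, q, p):
        eff = effective_information(i_m, q_m, p_m)
        total += sum(f * e for f, e in zip(f_m, eff))
    return total


def is_feasible(b, pw, q, p, qp, b_min, b_max, p_min, p_max, e_min):
    """Check constraints C1-C5 for one candidate allocation."""
    c1 = all(x >= b_min for x in b) and sum(b) <= b_max    # bandwidth
    c2 = all(x >= p_min for x in pw) and sum(pw) <= p_max  # power
    c3 = all(0.0 < x <= 1.0 for x in q)                    # distortion rate
    c4 = all(x < e_min for x in p)                         # bit error rate
    c5 = all(0 <= x <= 51 for x in qp)                     # H.265 QP range
    return c1 and c2 and c3 and c4 and c5
```

An allocation that fails `is_feasible` would simply be rejected, or penalized, by whatever solver is placed on top of this evaluation.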

fourthly, on the basis of the resource allocation optimization model based on video content priority, constructing and training a distributed multi-agent reinforcement Q learning algorithm model;

step 401, constructing an agent;

step 402, constructing an action space;

step 403, constructing a state space;

step 404, constructing environment feedback;

step 405, setting hyper-parameters;

and step 406, training the distributed multi-agent reinforcement Q learning algorithm model to obtain a model that solves the resource allocation optimization problem based on video content priority.

The resource allocation method jointly allocates bandwidth and power resources to each vehicle, so that the effective information amount of the unit vehicle for transmitting the video is maximized, and the precision of the edge server for completing the video content understanding task is improved.

The invention has the advantages that:

(1) a vehicle networking resource allocation method based on video content priority jointly optimizes power and spectrum allocation from a vehicle end to an edge server end, and achieves maximization of effective information quantity;

(2) a video content priority-based vehicle networking resource allocation method designs a video content priority evaluation method, quantifies the content priority of different videos from the importance of video content, and obtains the weight when video resources with different content priorities are allocated;

(3) a distributed multi-agent reinforcement Q learning algorithm model is introduced, so that the problem of resource allocation for a video content understanding task in the Internet of vehicles is solved, good training convergence performance is obtained, and the accuracy of video content understanding is improved.

Drawings

FIG. 1 is a video content priority based vehicle networking communication system model constructed in accordance with the present invention;

FIG. 2 is a schematic diagram of a video content priority evaluation process according to the present invention;

FIG. 3 is a graph of the convergence performance of the distributed multi-agent reinforcement Q learning algorithm of the present invention;

FIG. 4 is a graph showing a relationship between the number of vehicles and performance according to the resource allocation method of the present invention;

FIG. 5 is a diagram illustrating the evaluation result of the video content priority under the real data according to the present invention;

fig. 6 is a graph of video content understanding accuracy performance under real data according to the present invention.

Detailed Description

In order that the technical principles of the present invention may be more clearly understood, embodiments of the present invention are described below in detail with reference to the accompanying drawings.

A vehicle networking Resource Allocation Method (A Joint Resource Allocation Method for Internet of Vehicles Based on Video Content Priority) is applied to a vehicle networking communication system of a vehicle end and an edge server end; firstly, designing a video content priority evaluation method, quantizing the content priorities of different videos, and obtaining the weight when video resources with different content priorities are distributed;

secondly, constructing a resource allocation optimization model based on video content priority, with maximization of the effective information amount as the objective and subject to the following constraints: the sum of the bandwidths (or powers) allocated to all vehicles does not exceed the total bandwidth (or total power) limit; the bandwidth (or power) allocated to each vehicle is not less than the minimum bandwidth (or power) value; and the video distortion rate, the bit error rate and the video quantization parameter lie within their actual value ranges;

then, according to the optimization model, constructing an agent, a state space, an action space and an environment feedback of a distributed multi-agent reinforcement Q learning algorithm model for vehicle networking resource allocation;

and finally, training the distributed multi-agent reinforcement Q learning algorithm model according to the instantaneous CSI from the vehicle to the edge server, the bandwidth allocation result of the vehicle in the previous time slot, the bandwidth increase or decrease action selected for the corresponding vehicle in the previous time slot, and the environment feedback value calculated in the previous time slot, thereby obtaining a model that solves the Internet of Vehicles resource allocation optimization problem based on video content priority.

The whole process comprises four steps: establishing a system model; evaluating the priority of video content; formulating the resource allocation optimization problem and establishing the optimization model; and establishing the distributed multi-agent reinforcement Q learning algorithm model and executing training. The video content priority evaluation process comprises video preprocessing, difference calculation, exponential smoothing and normalization. The process of establishing the optimization model comprises calculating the distortion rate, calculating the bit error rate, and constructing the constrained resource allocation optimization model. Establishing the distributed multi-agent reinforcement Q learning algorithm model and executing training comprises designing the agents, constructing the action space, constructing the state space, constructing the environment feedback, setting the hyper-parameters, and training the model.

The method comprises the following specific steps:

step one, constructing a vehicle networking system model of video semantic communication between a vehicle end and an edge server end;

as shown in fig. 1, the system model includes an edge server MEC and M intelligent networked cars equipped with cameras.

The communication and calculation process of the whole system is as follows:

1) a vehicle sends a video transmission request to the edge server; it first preprocesses the collected video, calculates the priority of the video content using the inter-frame difference method, and uploads the result to the edge server, while the edge server obtains the global CSI; 2) the edge server obtains the optimal resource allocation result using the distributed multi-agent reinforcement Q learning algorithm, with maximization of the effective information amount of video transmission as the target; 3) the vehicle encodes and compresses the video according to the resource allocation result and transmits it to the edge server through a wireless channel; 4) the edge server decodes the received video, performs video content understanding (for example, a traffic accident detection task), and feeds the content understanding result back to each vehicle; the server can also store the video for use in other tasks.

step two, designing a video content priority evaluation method;

in consideration of the differences in video content, it is necessary to first define how to calculate the content priority of the video frames. Taking the traffic accident detection task as an example: in the videos collected at the vehicle end, not every frame is important for the accident detection result, and traffic accident video clips are often accompanied by large changes in inter-frame content. The video frames with larger content changes are the ones that matter for the detection result, so the difference between video frames can be used to define the content priority of the frames.

As shown in fig. 2, the method for evaluating the priority of content based on the difference between video frames specifically comprises the following steps:

step 201, preprocessing videos acquired by a vehicle, including graying and Gaussian filtering, so as to reduce the influence of noise on the capturing process of the original videos;

step 202, performing frame-by-frame difference calculation on the processed video, which may be expressed as:

d_k＝x_k-x_{k-1}

where k represents the frame number, d_k represents the difference between two consecutive frames of the video, x_k represents the k-th frame of the video, and x_{k-1} represents the (k-1)-th frame of the video;

step 203, binarization and exponential smoothing of the calculation result to eliminate the influence of glitches and frame anomalies, which can be expressed as:

where the three smoothed terms respectively represent the smoothed difference values of the (k+1)-th frame, the k-th frame and the (k-1)-th frame, and α represents the smoothing parameter.

Step 204, normalization processing: the smoothed difference value is normalized to obtain the priority score of each video frame, which may be represented as:

$$f_k = \mathrm{Norm}(s_k)$$

where f_k indicates the priority weight of the k-th frame, s_k denotes the smoothed difference value of the k-th frame, and Norm(·) denotes the normalization function.

The priority rating result of the entire video X can be expressed as

f(X)＝{f₁,f₂,…,f_K}
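Steps 202 to 204 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frames are taken to be already grayscale and Gaussian-filtered (step 201 is omitted), the per-frame difference is reduced to the fraction of pixels whose absolute change exceeds a hypothetical `threshold`, the smoothing is simple first-order exponential smoothing with a hypothetical `alpha`, and Norm is assumed to be min-max normalization. None of these choices is confirmed by the text.

```python
def frame_priorities(frames, threshold=25, alpha=0.5):
    """Return one priority score per frame transition (K frames -> K-1 scores).

    frames: list of equally sized 2-D lists of grayscale values (0-255),
    assumed to be preprocessed already (graying, Gaussian filtering).
    """
    changed = []
    for prev, cur in zip(frames, frames[1:]):
        # Step 202 + binarization: count pixels whose absolute
        # inter-frame difference exceeds the threshold.
        count = sum(
            1
            for row_p, row_c in zip(prev, cur)
            for a, b in zip(row_p, row_c)
            if abs(b - a) > threshold
        )
        changed.append(count / (len(cur) * len(cur[0])))
    if not changed:
        return []

    # Step 203: first-order exponential smoothing (assumed form).
    smoothed = []
    for c in changed:
        s = c if not smoothed else alpha * c + (1 - alpha) * smoothed[-1]
        smoothed.append(s)

    # Step 204: min-max normalization to [0, 1] (assumed form of Norm).
    lo, hi = min(smoothed), max(smoothed)
    if hi == lo:
        return [0.0 for _ in smoothed]
    return [(s - lo) / (hi - lo) for s in smoothed]
```

In this sketch, a clip whose content changes abruptly at one transition receives the highest score at that transition, and the smoothing lets the score decay gradually over the following frames rather than dropping to zero at once.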

Through step two, the different priorities of the video contents are obtained. To obtain a better video content understanding result at the edge server, more reliable resources need to be allocated during transmission to the videos with higher content priority, so as to preserve the original video information and reduce distortion and bit errors. Under limited resources, bandwidth and power are jointly allocated to each vehicle to maximize the delivery of effective information. The effective information is defined as the information transmitted without distortion or bit errors, and the effective information amount of each video frame, denoted here by $\tilde{I}_{m,k}$, can be expressed as:

$$\tilde{I}_{m,k} = I_{m,k}\,(1-q_m)\,(1-p_m)$$

where I_m,k represents the pixel-level amount of original video information, q_m represents the distortion rate, and p_m represents the bit error rate. The subscript m denotes the vehicle number and k denotes the frame number.

Step 301, calculating a distortion rate of a transmission video;

video data is encoded and compressed using the h.265 video encoding compression standard, and in the h.265 encoding scheme, the distortion rate is related to the value of a Quantization Parameter (QP), and the QP is related to the transmission rate. Thus, the relation between the distortion rate and the transmission rate can be derived as:

where α₁,α₂,α₃,β₁,β₂ are known model parameters, n₀ represents the noise power spectrum, h_m represents the channel gain, and m is the vehicle number. B_m and P_m are the bandwidth and power allocated to the m-th vehicle, respectively.

The channel gain h_m includes the path loss h_pl, the shadow fading h_sd, and the small-scale fading.

The three loss and fading models are: 1) path loss h_pl＝148.1+37.6log₁₀(d_m) (dB), where d_m (km) is the distance between the m-th vehicle and the edge server; 2) the shadow fading h_sd obeys a lognormal distribution with a standard deviation of 8 and a mean of 0; 3) the small-scale fading coefficients follow a Rayleigh distribution with unit variance and zero mean. To account for the time-varying characteristics of small-scale fading, the time-varying Rayleigh coefficients are modeled as independent first-order autoregressive processes, in which t_e is the time interval during which the channel remains in a steady state and e_h is the process noise, whose distribution depends on the channel autocorrelation function ρ_m(t_e)＝J₀(2πv_m t_e/λ_c). Here J₀(·) is the zero-order Bessel function of the first kind, λ_c is the center carrier wavelength, and v_m is the traveling speed of the m-th vehicle.
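The channel model described above can be sketched as follows. Only the path loss formula, the 8 dB shadowing deviation and the Bessel-function autocorrelation come from the text. The exact autoregressive recursion and the noise variance 1 − ρ² are assumptions (a commonly used form for time-correlated Rayleigh fading), as is the way the three components are combined into one linear power gain. J₀ is evaluated from its integral representation so that only the standard library is needed.

```python
import math
import random


def path_loss_db(d_km):
    """Path loss in dB: 148.1 + 37.6 * log10(d), with d in km."""
    return 148.1 + 37.6 * math.log10(d_km)


def bessel_j0(x, steps=2000):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin t) dt (midpoint rule)."""
    h = math.pi / steps
    total = sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(steps))
    return total * h / math.pi


def correlation(v_mps, t_e, wavelength_m):
    """Channel autocorrelation rho = J0(2 * pi * v * t_e / lambda_c)."""
    return bessel_j0(2.0 * math.pi * v_mps * t_e / wavelength_m)


def next_fading(h_prev, rho, rng):
    """One AR(1) step: h(t) = rho * h(t-1) + e, with e ~ CN(0, 1 - rho^2).

    The noise variance is an assumption chosen to keep unit average power.
    """
    sigma = math.sqrt(max(0.0, 1.0 - rho * rho) / 2.0)
    noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return rho * h_prev + noise


def channel_gain(d_km, h_small, rng, shadow_std_db=8.0):
    """Linear power gain: |h_small|^2 scaled by path loss and shadowing (assumed)."""
    loss_db = path_loss_db(d_km) + rng.gauss(0.0, shadow_std_db)
    return abs(h_small) ** 2 * 10.0 ** (-loss_db / 10.0)
```

A faster vehicle or a longer interval t_e lowers ρ, so consecutive fading samples become less correlated and the allocation has to track a more rapidly changing channel.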

Step 302, calculating the error rate of the transmission video;

by means of quadrature modulation and demodulation, the bit error rate at the receiving end can be expressed as:

where N is the modulation order, Q(·) denotes the Q function, and E[·] denotes the expectation.
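To make the roles of N, Q(·) and E[·] concrete, the sketch below computes a bit error rate from bandwidth, power and channel gain. The closed-form expression is not shown in the text, so this sketch substitutes a common textbook approximation for Gray-coded square N-QAM and should not be read as the formula of the invention. The SNR definition P_m·h_m/(n₀·B_m) and all function names are assumptions as well.

```python
import math


def q_function(x):
    """Gaussian tail probability: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def snr(p_m, h_m, n0, b_m):
    """Received SNR, assuming noise power n0 * B_m over the allocated band."""
    return p_m * h_m / (n0 * b_m)


def qam_ber_approx(snr_linear, n_order):
    """Textbook BER approximation for square N-QAM with Gray coding."""
    bits = math.log2(n_order)
    scale = (4.0 / bits) * (1.0 - 1.0 / math.sqrt(n_order))
    return scale * q_function(math.sqrt(3.0 * snr_linear / (n_order - 1)))


def average_ber(snr_samples, n_order):
    """Empirical expectation E[.] over a set of fading realizations."""
    return sum(qam_ber_approx(s, n_order) for s in snr_samples) / len(snr_samples)
```

Under this approximation, raising P_m lowers the bit error rate, while widening B_m at fixed power spreads the same power over more noise and raises it, which is one reason bandwidth and power have to be chosen jointly.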

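Step 303, constructing a resource allocation optimization model based on video content priority;

the optimization aim is to allocate bandwidth and power to each vehicle through priority weighting under the resource constraints, so that the effective information amount of the transmitted video is maximized. The optimization model can be expressed as:

$$\max_{\{B_m,\,P_m\}} \; \sum_{m=1}^{M} \sum_{k=1}^{K} f_{m,k}\, I_{m,k}\,(1-q_m)\,(1-p_m)$$

subject to

$$\text{C1:}\quad B_m \ge B_{\min}, \qquad \sum_{m=1}^{M} B_m \le B_{\max}$$

$$\text{C2:}\quad P_m \ge P_{\min}, \qquad \sum_{m=1}^{M} P_m \le P_{\max}$$

C3:q_m(B_m,P_m)∈(0,1]

C4:p_m(B_m,P_m)＜e_min

$$\text{C5:}\quad QP \in [0, 51]$$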

step 401, constructing an agent;

The M vehicles in the Internet of Vehicles serve as the agents in the multi-agent Q-learning algorithm.

Step 402, constructing an action space;

the action space consists of two parts, bandwidth resource allocation and power resource allocation. With an action space of size L, all actions of the m-th vehicle can be expressed as:

A_m＝[B_m1,B_m2,…,B_mL；P_m1,P_m2,…,P_mL]

where A_m represents the action space of the m-th vehicle, B_m1,B_m2,…,B_mL indicate the bandwidth resource selection actions, and P_m1,P_m2,…,P_mL indicate the power resource selection actions.
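One simple way to realize such an action space is to discretize each resource into L levels. The sketch below assumes evenly spaced levels between a minimum and a maximum value. The spacing rule and the function name `build_action_space` are illustrative assumptions, since the text does not state how the L candidate values are chosen.

```python
def build_action_space(b_min, b_max, p_min, p_max, levels):
    """Return L candidate bandwidth values and L candidate power values."""

    def grid(lo, hi):
        # Evenly spaced levels from lo to hi inclusive (assumed spacing).
        if levels == 1:
            return [lo]
        step = (hi - lo) / (levels - 1)
        return [lo + i * step for i in range(levels)]

    return {"bandwidth": grid(b_min, b_max), "power": grid(p_min, p_max)}
```

For example, `build_action_space(1.0, 5.0, 0.0, 2.0, 5)` gives five bandwidth levels from 1.0 to 5.0 and five power levels from 0.0 to 2.0.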

Step 403, constructing a state space;

the system state consists of two parts, s＝{max, rem}, where s represents the current state of the system, max is the maximum value of the effective information amount, and rem is the bandwidth and power resources currently available to the entire system.

Step 404, constructing environment feedback;

Considering that the optimization objective is to maximize the weighted effective information and that the original video information I_m,k of each vehicle is known, the reward function can be designed as R_m＝∑(1-q_m)(1-p_m)f_m,k when the constraints are satisfied. When a constraint is not satisfied, the reward is set to a negative value, such as R_m＝-1. The environment feedback thus carries a penalty that discourages the agent from selecting actions that violate the constraints.
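The environment feedback described here amounts to a short function. In the sketch below the sum is taken over the frames k of vehicle m, the feasibility flag is assumed to be computed elsewhere from constraints C1 to C5, and the names are hypothetical.

```python
def reward(q_m, p_m, frame_weights, feasible, penalty=-1.0):
    """Environment feedback R_m for one vehicle.

    q_m, p_m: distortion rate and bit error rate under the chosen action.
    frame_weights: priority weights f_{m,k} of the vehicle's frames.
    feasible: whether the chosen action satisfies all constraints.
    """
    if not feasible:
        return penalty  # negative feedback steers the agent away
    return sum((1.0 - q_m) * (1.0 - p_m) * f for f in frame_weights)
```

Because I_m,k is known and fixed, leaving it out of the reward does not change which frames are favored: the priority weights and the two loss factors still determine the ranking of actions.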

Step 405, setting hyper-parameters;

the training effect of the reinforcement Q learning algorithm depends strongly on its main parameters, which are set as follows: 1) the learning rate α is 0.1, so the algorithm keeps 90% of the historical training result and pays more attention to past experience; 2) the discount factor γ is 0.9, so the algorithm takes 90% of the benefit of the next step into account and places more weight on long-term benefit; 3) the ε in the ε-greedy strategy is 0.8, so the algorithm retains 20% of its choices for exploration and avoids being trapped in a local optimum.

For each training round of multi-agent reinforcement Q learning, the specific steps are as follows:

(1) initializing Q_m(s,a), where Q_m represents the Q table of the m-th vehicle, s represents the current state of the system, and a represents the current action selection;

(2) generating a random number random and selecting action a according to the ε-greedy policy: if random is smaller than ε, selecting the action a with the maximum Q value according to the Q value table; if random is larger than ε, randomly selecting an action a;

(3) performing the selected action a, and observing the next bandwidth allocation state and the environment feedback value R_m(s,a);

(4) updating the Q value table: Q_m(s,a)←max{Q_m(s,a),Q'_m(s,a)}, where

Q'_m(s,a)＝Q_m(s,a)+α[R_m(s,a)+γ max_a' Q_m(s',a')-Q_m(s,a)]；

(5) returning to step (2) until convergence.
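Steps (1) to (5) can be illustrated for a single agent with the following minimal sketch. It uses a toy single-state environment in which each action has a fixed, made-up feedback value, so it only demonstrates the ε-greedy selection and the max-based table update with the hyper-parameters given above. It is not a model of the vehicular system, and the function name `train` is hypothetical.

```python
import random


def train(rewards, episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.8, seed=0):
    """Tabular Q learning on a toy single-state problem.

    rewards[a] is the (assumed, fixed) environment feedback for action a.
    Returns the learned Q values, one per action.
    """
    rng = random.Random(seed)
    q = [0.0] * len(rewards)  # step (1): initialize the Q table
    for _ in range(episodes):
        # Step (2): exploit with probability epsilon, otherwise explore.
        if rng.random() < epsilon:
            a = max(range(len(q)), key=q.__getitem__)
        else:
            a = rng.randrange(len(q))
        r = rewards[a]  # step (3): observe the environment feedback
        # Step (4): keep the larger of the old value and the candidate.
        candidate = q[a] + alpha * (r + gamma * max(q) - q[a])
        q[a] = max(q[a], candidate)
    return q  # step (5) is the loop itself
```

With `train([0.1, 0.9, 0.5])`, the entry for the second action ends up largest, so the greedy choice after training is the action with the highest feedback. In the distributed setting, each of the M vehicles would keep its own table of this kind.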

Fig. 3 illustrates the convergence of the reward function of the reinforcement Q learning algorithm with the number of iterations. As can be seen from the figure, the solution of the algorithm tends to be stable after about 3000 iterations, which demonstrates the convergence of the proposed resource allocation algorithm based on distributed multi-agent reinforcement Q learning: a stable Q value table can be obtained through a limited number of iterations of online training. In addition, the total reward increases as the available power or bandwidth increases, because sufficient resources bring better performance;

Fig. 4 compares the resource allocation method based on reinforcement Q learning with the resource allocation methods based on average allocation and on traversal search. As can be seen from the figure, the performance of all schemes decreases as the number of vehicles increases, because more vehicles intensify the competition for resources. In addition, the method based on reinforcement Q learning proposed by the invention achieves performance close to that of the traversal search but with lower complexity: the time complexity of the proposed method is O(L), whereas that of the traversal search is O(L^M);

A large number of experiments on public data sets such as Caltech and Waymo verify the feasibility of the video frame content priority evaluation method and the improvement that the scheme brings to traffic accident detection on the transmitted video:

fig. 5 shows the evaluation result of the priority of the content of the video frame on the real data set, and the evaluation result of the priority is highly consistent with the real situation of the traffic accident video clip, thus proving the effectiveness of the method provided by the invention. This is because the priority evaluation method based on the inter-frame difference can capture the content difference between frames, and particularly for traffic accident videos, the difference between frames is very large;

Fig. 6 shows the traffic accident detection ROC (Receiver Operating Characteristic) curves of videos transmitted under different resource allocation methods. For an ROC curve, the closer it lies to the coordinate point (0,1), the better the detection performance. Compared with the other resource allocation methods, the method provided by the invention clearly improves the detection performance. Specifically, the correct detection rate of the proposed scheme is 0.8791 and its false alarm rate is 0.0673, while those of the average allocation scheme are 0.8279 and 0.1178, respectively. The method of the invention therefore improves the performance of the video content understanding task (such as traffic accident detection) by about 5 percentage points.

In conclusion, implementing the Internet of Vehicles resource allocation method based on video content priority optimizes the bandwidth and power allocation from the vehicle end to the edge server end, maximizes the effective information amount, and improves the traffic accident detection performance at the edge server end. Compared with QoE-based and QoS-based resource allocation algorithms, the method takes the transmitted video content into consideration, provides more reliable resources for video frames with higher priorities, and improves the accuracy of the subsequent video content understanding. In addition, the method quantifies the priority of the video content, and takes into account both the content differences between videos and the unstable channel conditions of the Internet of Vehicles.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A vehicle networking resource allocation method based on video content priority is characterized by comprising the following specific steps:

the communication model of the Internet of Vehicles comprises a mobile edge computing (MEC) server and M mobile intelligent vehicles equipped with cameras, and the communication and computation process of the whole system is as follows: 1) a vehicle initiates a video transmission request, preprocesses the collected video, calculates the priority of the video content using the inter-frame difference method, and uploads the result to the edge server, while the edge server acquires the Channel State Information (CSI); 2) the edge server obtains the optimal resource allocation result by maximizing the effective information amount of video transmission; 3) the vehicle encodes and compresses the video according to the resource allocation result and transmits it to the edge server through a wireless channel; 4) the edge server decodes the received video, performs video content understanding, and feeds the result back to each vehicle;

designing a video content priority evaluation method;

the method for evaluating the content priority based on the video frame difference comprises the following specific steps:

step 203, binarization and exponential smoothing of calculation results;

step 204, normalization processing, namely normalizing the smoothed difference value to obtain the content priority evaluation result of each video frame;

calculating the video transmission distortion rate and the error rate under the Internet of vehicles channel, and constructing a resource allocation optimization model based on the video content priority in the Internet of vehicles by taking the effective information quantity of the maximized video transmission as an optimization target;

under the condition of limited resources, bandwidth and power are jointly allocated to each vehicle so as to transmit as much effective information as possible; the effective information is defined as the information transmitted without distortion and without error, and the effective information amount of each video frame is expressed as:

wherein I_{m,k} represents the pixel-level amount of original video information, q_m represents the distortion rate, p_m represents the error rate, the subscript m represents the vehicle number, and k represents the frame number;
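One reading consistent with the definition above (information transmitted without distortion and without error) and with the reward function of step 404 is sketched below. The left-hand symbol \tilde{I}_{m,k} is a placeholder assumed for illustration and is not taken from the claim text; the product form is likewise an inference, not a quotation.

```latex
% Assumed form: original information scaled by the undistorted and error-free fractions.
\tilde{I}_{m,k} \;=\; I_{m,k}\,(1 - q_m)\,(1 - p_m)
```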

step 301, calculating a transmission video distortion rate based on an H.265 video coding compression standard;

the video data is encoded and compressed using the H.265 video coding compression standard; in the H.265 coding scheme, the distortion rate is related to the value of the quantization parameter (QP), and the QP is related to the transmission rate; the distortion rate is calculated as:

wherein α₁, α₂, α₃, β₁, β₂ are model parameters, n₀ represents the noise power spectrum, h_m represents the channel gain, and B_m and P_m respectively represent the bandwidth and power allocated to the m-th vehicle;

step 302, calculating the error rate of the transmitted video based on orthogonal modulation and demodulation;

using orthogonal modulation and demodulation, the bit error rate at the receiving end is expressed as:

wherein N is the modulation order, Q(·) denotes the Q-function, E[·] denotes expectation, B_m and P_m respectively represent the bandwidth and power allocated to the m-th vehicle, h_m represents the channel gain of the m-th vehicle, n₀ represents the noise power spectrum, P represents the vehicle transmission power, and h represents the channel gain;
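As a minimal illustration of two quantities the bit-error-rate expression is stated in terms of, the sketch below evaluates the Gaussian Q-function through the standard identity Q(x) = 0.5 * erfc(x / sqrt(2)) and a received signal-to-noise ratio of the assumed form P_m * h_m / (n₀ * B_m). The SNR form and both function names are assumptions for illustration; the sketch does not reproduce the claimed bit-error-rate formula.

```python
import math


def q_function(x: float) -> float:
    """Gaussian Q-function: tail probability of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def received_snr(power: float, gain: float, noise_psd: float, bandwidth: float) -> float:
    """Assumed SNR form P*h / (n0*B); this form is not taken from the claim text."""
    if bandwidth <= 0 or noise_psd <= 0:
        raise ValueError("bandwidth and noise_psd must be positive")
    return power * gain / (noise_psd * bandwidth)
```

Under this assumed form, allocating more bandwidth at fixed power lowers the SNR, which is one reason bandwidth and power have to be chosen jointly.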

step 303, constructing a resource allocation optimization model based on video content priority;

the optimization goal is to allocate bandwidth and power to each vehicle with priority weighting under the resource constraints, so as to maximize the effective information amount of the transmitted video; the optimization model is therefore expressed as:

C1:

C2:

C3:q_m(B_m,P_m)∈(0,1]

C4:p_m(B_m,P_m)<e_min

C5:

wherein M is the total number of vehicles, K is the total number of target categories, f_{m,k} represents the priority weight of the video frame content, i.e. different video frames of each vehicle have different importance in the video content understanding task, the effective information amount of each video frame is as defined above, the optimization variables B_m and P_m are respectively the bandwidth and power allocated to each vehicle, β₁ and β₂ represent model parameters, h_m represents the channel gain, and n₀ represents the noise power spectrum;

constraint condition C5 represents the constraint on the video coding quantization parameter: the QP takes values in [0, 51], which is the quantization parameter range for video coding and decoding using H.265;

step 401, constructing an agent;

taking the M vehicles in the Internet of Vehicles as the agents of the multi-agent Q-learning algorithm;

step 402, constructing an action space;

the action space is composed of two parts, namely bandwidth resource allocation and power resource allocation; with the size of the action space being L, all actions of the m-th vehicle are expressed as:

A_m = [B_m1, B_m2, …, B_mL; P_m1, P_m2, …, P_mL]

wherein A_m represents the action space of the m-th vehicle, B_m1, B_m2, …, B_mL represent the bandwidth resource selection actions, and P_m1, P_m2, …, P_mL represent the power resource selection actions;
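A minimal sketch of how such an action space could be enumerated, assuming the L bandwidth levels and the L power levels are uniformly spaced fractions of a per-vehicle maximum. The uniform spacing, the function name `build_action_space` and the two maximum-value arguments are assumptions for illustration, not details stated in the claim.

```python
def build_action_space(b_max: float, p_max: float, levels: int):
    """Return ([B_m1..B_mL], [P_m1..P_mL]) as uniformly spaced levels (assumed spacing)."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    bandwidths = [b_max * (i + 1) / levels for i in range(levels)]
    powers = [p_max * (i + 1) / levels for i in range(levels)]
    return bandwidths, powers


# Illustrative values only: 5 levels up to 10.0 bandwidth units and 1.0 power units.
bandwidth_actions, power_actions = build_action_space(b_max=10.0, p_max=1.0, levels=5)
```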

step 403, constructing a state space;

the system state is composed of two parts, s = [max, rem], where s denotes the current state of the system, max is the maximum value of the effective information amount, and rem is the bandwidth resource and power resource currently available to the whole system;

step 404, constructing environment feedback;

considering that the optimization goal is to maximize the weighted effective information and that the original video information I_{m,k} of each vehicle is known, when the constraints are satisfied the reward function is designed as R_m = ∑ (1 - q_m)(1 - p_m) f_{m,k}; when the constraints are not satisfied, the reward R_m is set to a penalty value, so that the environment feedback discourages further iteration of action selection in that direction;
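The reward rule of step 404 can be sketched as follows. The sum is assumed to run over the frames k of vehicle m, and the penalty value of -1.0 is a placeholder assumption, since the text here does not give the actual penalty; the function and argument names are likewise hypothetical.

```python
def reward(q_m: float, p_m: float, priorities, feasible: bool, penalty: float = -1.0) -> float:
    """R_m = sum_k (1 - q_m)(1 - p_m) f_{m,k} when constraints hold, else a penalty.

    The default penalty of -1.0 is an assumed placeholder, not a value from the claim.
    """
    if not feasible:
        return penalty
    return sum((1.0 - q_m) * (1.0 - p_m) * f for f in priorities)
```

For example, with q_m = 0.5, p_m = 0 and frame priorities [0.2, 0.8], the reward under this sketch is 0.5.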

step 405, setting hyper-parameters;

the training performance of the reinforcement Q-learning algorithm depends on its main parameters, which are set as follows: 1) the learning rate α is 0.1, so the algorithm retains 90% of the historical training result and places more weight on past experience; 2) the discount factor γ is 0.9, so the algorithm takes 90% of the next-step return into account and places more weight on long-term return; 3) ε in the ε-greedy strategy is 0.8, leaving the algorithm 20% room for exploration so as to avoid getting trapped in a local optimum.
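A minimal tabular sketch of how the hyper-parameters of step 405 enter a standard Q-learning update. It assumes ε is the probability of taking the greedy action (which matches the 20% trial space described above) and uses the textbook update rule; the table size, function names and single-agent layout are illustrative assumptions, not the claimed distributed multi-agent implementation.

```python
import random

ALPHA = 0.1    # learning rate from step 405
GAMMA = 0.9    # discount factor from step 405
EPSILON = 0.8  # assumed: probability of the greedy choice, leaving 20% for exploration


def choose_action(q_row, rng=random):
    """Pick the greedy action with probability EPSILON, otherwise a uniformly random one."""
    if rng.random() < EPSILON:
        return max(range(len(q_row)), key=lambda a: q_row[a])
    return rng.randrange(len(q_row))


def q_update(q_table, state, action, reward, next_state):
    """Textbook update: Q <- (1 - alpha) * Q + alpha * (r + gamma * max_a' Q(s', a'))."""
    target = reward + GAMMA * max(q_table[next_state])
    q_table[state][action] = (1 - ALPHA) * q_table[state][action] + ALPHA * target
    return q_table[state][action]


# Toy table: 2 states x 3 actions, all zeros (sizes are illustrative only).
q_table = [[0.0] * 3 for _ in range(2)]
updated = q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
```

With α = 0.1 the old estimate keeps a weight of 0.9 in every update, which is the 90% retention of historical results that the claim describes.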

2. The Internet of Vehicles resource allocation method based on video content priority as claimed in claim 1, wherein the video content priority evaluation method in step two defines the content priority of a frame based on video preprocessing and video inter-frame difference calculation: step 1, preprocessing the video captured by the vehicle, including graying and Gaussian filtering, so as to reduce the influence of noise introduced during capture of the original video; step 2, performing frame-by-frame difference calculation on the processed video according to d_k = x_k - x_{k-1}, wherein k represents the frame number, d_k represents the difference between two consecutive frames of the video, x_k represents the k-th frame of the video, and x_{k-1} represents the (k-1)-th frame of the video; step 3, performing exponential smoothing on the calculation result to eliminate the influence of spikes and frame anomalies, wherein the exponential smoothing formula is

wherein the terms of the formula respectively represent the difference smoothing values of the (k+1)-th frame, the k-th frame and the (k-1)-th frame, and α represents the smoothing parameter; and step 4, normalizing the smoothed difference value to obtain the priority evaluation result of each video frame.
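Steps 2 to 4 can be sketched as below, assuming the frames have already been grayed and Gaussian filtered (step 1). Three choices in the sketch are assumptions, because the text here does not specify them: the frame difference is reduced to a scalar by its mean absolute value, the smoothing uses the simple first-order form s_k = α * d_k + (1 - α) * s_{k-1} in place of the claimed formula, and normalization is min-max. The function name and the flat-list frame layout are hypothetical.

```python
def frame_priorities(frames, alpha=0.5):
    """Per-frame content priority from inter-frame differences.

    frames: equally sized grayscale frames, each a flat list of pixel values.
    Returns one priority in [0, 1] for every frame after the first.
    """
    if len(frames) < 2:
        return []
    # Step 2: d_k = x_k - x_{k-1}, reduced to a scalar by mean absolute value (assumption).
    diffs = []
    for prev, cur in zip(frames, frames[1:]):
        diffs.append(sum(abs(c - p) for c, p in zip(cur, prev)) / len(cur))
    # Step 3: first-order exponential smoothing (assumed form, not the claimed formula).
    smoothed = [diffs[0]]
    for d in diffs[1:]:
        smoothed.append(alpha * d + (1 - alpha) * smoothed[-1])
    # Step 4: min-max normalization (assumption).
    lo, hi = min(smoothed), max(smoothed)
    if hi == lo:
        return [0.0 for _ in smoothed]
    return [(s - lo) / (hi - lo) for s in smoothed]


# Four tiny 4-pixel frames: the scene changes once, between the 2nd and 3rd frame.
frames = [[0, 0, 0, 0], [0, 0, 0, 0], [8, 8, 8, 8], [8, 8, 8, 8]]
priorities = frame_priorities(frames, alpha=0.5)
```

In this toy input the frame at the scene change receives the highest priority, and smoothing lets part of that priority carry over to the following frame.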

3. The method according to claim 1, wherein the distortion rate of the transmitted video in step three is calculated under dynamic Internet of Vehicles channel conditions, with a video coding scheme based on H.265: the video data is encoded and compressed using the H.265 video coding compression standard; in the H.265 coding scheme, the distortion rate is related to the value of the quantization parameter (QP), and the QP is related to the transmission rate; the distortion rate is calculated as:

wherein α₁, α₂, α₃, β₁, β₂ are model parameters, n₀ represents the noise power spectrum, h_m represents the channel gain, m is the vehicle number, B_m and P_m respectively represent the bandwidth and power allocated to the m-th vehicle, and the channel gain h_m in the formula includes path loss h_pl, shadow fading h_sd and small scale fading