CN113992706A - Method and device for requesting content placement in Internet of Vehicles scene and electronic equipment

Publication number: CN113992706A (application granted as CN113992706B)
Application number: CN202111054937.3A
Authority: CN (China)
Legal status: Active (granted)
Original language: Chinese (zh)
Inventors: 陈莹 (Chen Ying), 马腾 (Ma Teng), 陈昕 (Chen Xin)
Applicant and assignee: Beijing Information Science and Technology University


Classifications

    • H04L 67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H04L 67/1097 - Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G06F 18/214 - Pattern recognition; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting

Abstract

The invention discloses a method and a device for requesting content placement in an Internet of Vehicles scene and electronic equipment, wherein the method comprises the following steps: acquiring request content and current environment state information, inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content, and placing the request content on the corresponding roadside unit according to the action description information, wherein the trained content placement model is obtained by training with different environment state information. The trained content placement model is deployed in a controller for global control, and a request content placement decision can be made in real time according to the state information of the vehicles and the state information of the roadside units, thereby reducing decision delay and improving system operation efficiency.

Description

Method and device for requesting content placement in Internet of vehicles scene and electronic equipment
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for requesting content placement in an Internet of Vehicles scenario and electronic equipment.
Background
With the popularization of 5G technology, the Internet of Vehicles has become a key technology of smart cities; it is expected to relieve traffic congestion and to reduce traffic accidents caused by improper driving. The development of in-vehicle intelligence has also transformed the vehicle from a mere means of transport into an intelligent terminal, and many in-vehicle applications, such as in-vehicle maps and in-vehicle video, have come online on a large scale.
In an Internet of Vehicles scenario, in-vehicle applications often require low latency, and applying edge caching technology to the acquisition of vehicle-requested content is an effective way to meet this requirement. However, on the one hand, unlike traditional relatively static scenarios, the Internet of Vehicles tends to be highly dynamic and random owing to the high-speed mobility of vehicles. On the other hand, autonomous vehicles generate a large amount of data transmission per hour, which poses a challenge to the data processing and transmission energy consumption of the base station.
In summary, a method for requesting content placement in an Internet of Vehicles scenario is needed to solve the above problems in the prior art.
Disclosure of Invention
In view of the problems of the existing methods, the invention provides a method and a device for requesting content placement in an Internet of Vehicles scenario, electronic equipment, and a storage medium.
In a first aspect, the present invention provides a method for requesting content placement in an Internet of Vehicles scenario, comprising:
acquiring request content and current environment state information; the current environment state information comprises state information of a plurality of vehicles, state information of a plurality of roadside units and state information of a plurality of channels;
inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content;
placing the request content on a corresponding roadside unit according to the action description information;
the trained content placement model is obtained by training with different environment state information.
Further, the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit comprises the residual cache capacity and the content cache state of the roadside unit; the state information of the channel includes a communication rate.
Further, the content placement model includes a value network, a policy network, an actor target network, and a critic target network, and before the current environmental state information is input to the trained content placement model and the action description information for placing the requested content is obtained, the method further includes:
acquiring a preset number of training sample sets; each group of training samples comprises first environment state information, action description information, second environment state information and an action reward; the action description information is obtained by inputting the first environment state information into the policy network; the first environment state information is the environment state information before the action corresponding to the action description information is executed; the second environment state information is the environment state information after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environmental state information and the action description information into the value network to obtain a first function value;
inputting the second environment state information into the actor target network to obtain next action description information;
inputting the second environment state information and the next action description information into the critic target network to obtain a second function value;
determining a loss function according to the first function value and the second function value;
and updating the parameters of the content placement model according to the loss function to obtain the trained content placement model.
Further, the specific calculation formula of the remaining cache capacity is as follows:

$$C_r(t) = C_r(t-1) + k_r(t-1) - \sum_{f \in \mathcal{F}} x_r^f(t-1)\, z_f^{seg}$$

wherein $k_r(t-1)$ represents the cache capacity deleted by roadside unit $r$ in the last time slot, $x_r^f(t)$ represents the cache decision variable, $s_f$ indicates the number of coded segments needed to decode content $f$ (so that $0 \le x_r^f(t) \le s_f$), and $z_f^{seg}$ indicates the size of each coded segment of content $f$.
Further, the acquiring a preset number of training sample sets includes:
acquiring the first environment state information and the action description information; the first environment state information comprises a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;
determining the action reward according to the action description information;
and determining a task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information.
Further, the determining the action reward according to the action description information includes:
acquiring transmission energy consumption and delivery time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission energy consumption and the delivery time delay.
Further, the determining the task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information includes:
determining a communication time distribution decision according to a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;
and determining the task backlog queue corresponding to the second environment state information according to the task backlog queue corresponding to the first environment state information and the communication time distribution decision.
Further, the determining a communication time allocation decision according to the content cache status corresponding to the first environmental status information and the backlog queue of the task corresponding to the first environmental status information includes:
obtaining a target function;
determining the content of the request to be distributed according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;
determining the unit value of the request content to be distributed according to the objective function;
sequencing the content of the request to be distributed according to the unit value to obtain a sequencing result;
and distributing the communication time for the content of the request to be distributed according to the sequencing result to obtain the communication time distribution decision.
In a second aspect, the present invention provides an apparatus for requesting content placement in an Internet of Vehicles scenario, comprising:
the acquisition module is used for acquiring request content and current environment state information; the current environment state information comprises state information of a plurality of vehicles, state information of a plurality of roadside units and state information of a plurality of channels;
the processing module is used for inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.
Further, the processing module is specifically configured to:
the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit comprises the residual cache capacity and the content cache state of the roadside unit; the state information of the channel includes a communication rate.
Further, the content placement model includes a value network, a policy network, an actor target network, and a critic target network, the processing module further to:
acquiring a preset number of training sample sets before inputting the current environment state information into the trained content placement model to obtain the action description information for placing the request content; each group of training samples comprises first environment state information, action description information, second environment state information and an action reward; the action description information is obtained by inputting the first environment state information into the policy network; the first environment state information is the environment state information before the action corresponding to the action description information is executed; the second environment state information is the environment state information after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environmental state information and the action description information into the value network to obtain a first function value;
inputting the second environment state information into the actor target network to obtain next action description information;
inputting the second environment state information and the next action description information into the critic target network to obtain a second function value;
determining a loss function according to the first function value and the second function value;
and updating the parameters of the content placement model according to the loss function to obtain the trained content placement model.
Further, the processing module is specifically configured to calculate the remaining cache capacity as follows:

$$C_r(t) = C_r(t-1) + k_r(t-1) - \sum_{f \in \mathcal{F}} x_r^f(t-1)\, z_f^{seg}$$

wherein $k_r(t-1)$ represents the cache capacity deleted by roadside unit $r$ in the last time slot, $x_r^f(t)$ represents the cache decision variable, $s_f$ indicates the number of coded segments needed to decode content $f$, and $z_f^{seg}$ indicates the size of each coded segment of content $f$.
Further, the processing module is specifically configured to:
acquiring the first environment state information and the action description information; the first environment state information comprises a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;
determining the action reward according to the action description information;
and determining a task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information.
Further, the processing module is specifically configured to:
acquiring transmission energy consumption and delivery time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission energy consumption and the delivery time delay.
Further, the processing module is specifically configured to:
determining a communication time distribution decision according to a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;
and determining the task backlog queue corresponding to the second environment state information according to the task backlog queue corresponding to the first environment state information and the communication time distribution decision.
Further, the processing module is specifically configured to:
obtaining a target function;
determining the content of the request to be distributed according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;
determining the unit value of the request content to be distributed according to the objective function;
sequencing the content of the request to be distributed according to the unit value to obtain a sequencing result;
and distributing the communication time for the content of the request to be distributed according to the sequencing result to obtain the communication time distribution decision.
In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for requesting content placement in an Internet of Vehicles scenario as described in the first aspect.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for requesting content placement in an Internet of Vehicles scenario as described in the first aspect.
According to the technical scheme, in the method, the device and the electronic equipment for requesting content placement in an Internet of Vehicles scenario provided by the invention, the trained content placement model is deployed in the controller for global control, and a request content placement decision can be made in real time according to the state information of the vehicles and the state information of the roadside units, thereby reducing decision delay and improving system operation efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a system framework for a method of requesting content placement in a vehicle networking scenario provided by the present invention;
FIG. 2 is a schematic flow chart illustrating a method for requesting content placement in an Internet of vehicles scenario provided by the present invention;
FIG. 3 is a schematic flow chart of a training content placement model provided by the present invention;
FIG. 4 is a schematic structural diagram of a device for requesting content placement in a vehicle networking scenario according to the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
In the embodiment of the present invention, the base station holds all request contents required by users. The $R$ roadside units (RSUs) with caching and computing functions and the $N$ vehicle users in a driving state are represented by the sets $\mathcal{R} = \{1, 2, \ldots, r, \ldots, R\}$ and $\mathcal{N} = \{1, 2, \ldots, n, \ldots, N\}$, respectively. The length of each time slot in the discrete-time system $\mathcal{T} = \{1, 2, \ldots, T\}$ is denoted by $\tau$.
Further, before the start of each time slot, the vehicle user generates the request content $q_f(t) = \{p_f(t), z_f(t)\}$, where $p_f(t)$ is the probability that content $f$ is requested and $z_f(t)$ is the size of content $f$.
It should be noted that the request probability $p_f(t)$ of an arbitrary content $f$ obeys a Zipf distribution:

$$p_f(t) = \frac{f^{-\varepsilon}}{\sum_{f'=1}^{F} f'^{-\varepsilon}}$$

where $\varepsilon$ is the request factor.
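To make the request model concrete, the following minimal Python sketch draws per-slot content requests from the Zipf distribution above; it is an illustration only, and the catalogue size F and request factor eps are example values, not parameters fixed by the patent:

    import numpy as np

    def zipf_probabilities(F: int, eps: float) -> np.ndarray:
        # p_f = f^(-eps) / sum_{f'} f'^(-eps), for f = 1..F
        ranks = np.arange(1, F + 1)
        weights = ranks ** (-eps)
        return weights / weights.sum()

    rng = np.random.default_rng(0)
    F, eps = 100, 0.8                                 # assumed example values
    p = zipf_probabilities(F, eps)
    requested = rng.choice(F, size=10, p=p) + 1       # ten sampled content indices
    print(p[:5], requested)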
Due to the limited cache capacity of the roadside units, each roadside unit cannot store all the content that may be requested by users. In the embodiment of the invention, coded segments of the relevant contents are dynamically cached in the roadside units according to the state information of the vehicles; when a vehicle has collected enough coded segments of the relevant content, it can decode them locally to obtain the complete content.
In a possible implementation manner, the coding manner is Maximum-Distance-Separable (MDS) coding, and may also be other coding manners, which is not specifically limited in this embodiment of the present invention.
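As a hedged illustration of the MDS property relied on here (any $s_f$ distinct coded segments of content $f$ suffice for decoding), the sketch below only tracks segment counts; it does not implement a real erasure code, and the class and field names are inventions of this illustration:

    from dataclasses import dataclass, field

    @dataclass
    class CodedContent:
        f: int                 # content index
        s_f: int               # segments needed to decode content f
        collected: set = field(default_factory=set)   # ids of coded segments held

        def add_segment(self, seg_id: int) -> None:
            self.collected.add(seg_id)

        def decodable(self) -> bool:
            # MDS property: any s_f distinct coded segments reconstruct f
            return len(self.collected) >= self.s_f

    c = CodedContent(f=3, s_f=4)
    for seg in (0, 2, 5, 7):
        c.add_segment(seg)
    print(c.decodable())   # True: four distinct segments collected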
The method for placing content provided by the embodiment of the present invention may be applied to a system architecture as shown in fig. 1, where the system architecture includes a content placement model 100, a controller 200, a base station 300, a roadside unit 400, and a vehicle 500.
Specifically, the content placement model 100 is used to obtain the action description information of the placement request content after inputting the current environment state information.
It should be noted that the current environmental state information includes state information of the vehicle 500, state information of the roadside unit 400, and state information of the channel.
Further, the controller 200 places the requested contents of the base station 300 on the corresponding roadside unit 400 according to the action description information.
In one possible implementation, a software-defined networking (SDN) controller is employed.
It should be noted that, the trained content placement model 100 is obtained by training with different tasks and different environmental status information.
It should be noted that fig. 1 is only an example of a system architecture according to the embodiment of the present invention, and the present invention is not limited to this specifically.
Based on the system architecture illustrated above, fig. 2 is a schematic flow diagram of the method for requesting content placement in an Internet of Vehicles scenario provided by the embodiment of the present invention. As shown in fig. 2, the method includes:
step 201, obtaining the request content and the current environment state information.
It should be noted that the current environmental state information includes state information of a plurality of vehicles and state information of a plurality of roadside units.
In the embodiment of the invention, the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit includes a remaining buffer capacity, a content buffer state, and a communication rate of the roadside unit.
Further, $\mathcal{R}_n(t)$ denotes the set of RSUs associated with vehicle $n$ in time slot $t$, and $\mathcal{N}_r(t)$ denotes the set of vehicles within the service range of RSU $r$ in time slot $t$.
In the embodiment of the invention, the motion information of the vehicle comprises the position information, the speed magnitude and the speed direction of the vehicle.
In the embodiment of the present invention, $x_r^f(t)$ is the cache decision variable, i.e., the number of coded segments of content $f$ that the base station caches to RSU $r$ in time slot $t$, satisfying

$$0 \le x_r^f(t) \le s_f$$

wherein $s_f$ indicates the number of coded segments needed to decode content $f$.
The embodiment of the invention adopts a Least Recently Used (LRU) strategy as the cache updating strategy. Before the start of the current time slot $t$, the remaining cache capacity of RSU $r$ is calculated as follows:

$$C_r(t) = C_r(t-1) + k_r(t-1) - \sum_{f \in \mathcal{F}} x_r^f(t-1)\, z_f^{seg}$$

wherein $k_r(t-1)$ represents the cache capacity deleted by RSU $r$ in the last time slot based on the LRU policy, and $z_f^{seg}$ indicates the size of each coded segment of content $f$.
Further, due to the limitation of the RSU cache capacity, the cache decision needs to satisfy

$$\sum_{f \in \mathcal{F}} x_r^f(t)\, z_f^{seg} \le C_r(t), \quad \forall r \in \mathcal{R}.$$
After caching is finished, the content cache state in RSU $r$ is updated as follows:

$$c_r^f(t) = c_r^f(t-1) + x_r^f(t)$$

wherein $c_r^f(t)$ indicates the number of coded segments of content $f$ cached in RSU $r$.
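A minimal sketch of this cache bookkeeping follows; it is illustrative only, the class name is an invention of this sketch, and the freed capacity k_r is supplied by an assumed LRU layer rather than computed here:

    class RSUCache:
        def __init__(self, capacity: float):
            self.capacity_left = capacity   # C_r(t)
            self.segments = {}              # c_r^f: content f -> cached segment count

        def start_slot(self, lru_freed: float) -> None:
            # add back k_r(t-1), the capacity freed by LRU eviction last slot
            self.capacity_left += lru_freed

        def cache(self, f: int, x: int, z_seg: float) -> bool:
            # cache x coded segments of content f, each of size z_seg
            if x * z_seg > self.capacity_left:
                return False                # would violate the capacity constraint
            self.capacity_left -= x * z_seg
            self.segments[f] = self.segments.get(f, 0) + x
            return True

    rsu = RSUCache(capacity=100.0)
    rsu.cache(f=1, x=3, z_seg=10.0)
    print(rsu.segments, rsu.capacity_left)   # {1: 3} 70.0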
Further, the vehicle user generates a content request task before the start of each time slot, and the limited communication time of each time slot may leave tasks unfinished, thereby producing a task backlog. Let $Q_n^f(t)$ represent the backlog of vehicle $n$ with respect to content $f$ at time $t$; then $Q_n^f(t)$ is updated as follows:

$$Q_n^f(t+1) = \max\{Q_n^f(t) - e_n^f(t),\ 0\} + a_n^f(t)$$

wherein $e_n^f(t)$ is the amount of coded segments of content $f$ obtained by vehicle $n$ in time slot $t$ and $a_n^f(t)$ is the newly arrived request amount.
In the embodiment of the present invention, $\bar{Q}_n^f$ is the average queue length of vehicle $n$ with respect to content $f$. In order to stabilize the long-term task queue, the following formula needs to be satisfied:

$$\bar{Q}_n^f = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[Q_n^f(t)\right] \le \epsilon$$

where $\epsilon$ is the constraint value that keeps the task queue stable.
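The queue dynamics can be sketched in a few lines of Python; the service and arrival amounts below are example inputs of this illustration, not values from the patent:

    def update_backlog(q: float, served: float, arrived: float) -> float:
        # Q(t+1) = max(Q(t) - e(t), 0) + a(t)
        return max(q - served, 0.0) + arrived

    q = 0.0
    for served, arrived in [(2.0, 3.0), (1.0, 0.5), (4.0, 2.0)]:
        q = update_backlog(q, served, arrived)
    print(q)   # backlog after three slots: 2.0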
Further, the communication rate between vehicle $n$ and RSU $r_n$ is calculated as follows:

$$r_{n,r_n}(t) = B_n(t)\,\log_2\!\left(1 + \frac{p_{r_n} H(t)}{\sigma^2}\right)$$

where $H(t)$ represents the channel gain, obeying an exponential distribution with mean $g_0 (d_0/d)^4$, $g_0$ is the path loss exponent, $d_0$ and $d$ reflect the reference distance and the real distance of the vehicle mobility, $p_{r_n}$ is the transmission power, and $\sigma^2$ is the noise power.
According to the scheme, the content placement decision is made in real time according to the motion information of the current vehicle, the task backlog queue, the residual cache capacity of the roadside unit, the content cache state and the communication rate, and the system operation efficiency is improved.
In particular, the bandwidth allocated to vehicle $n$ is

$$B_n(t) = \frac{B^{max}}{|\mathcal{N}_{r_n}(t)|}$$

where $B^{max}$ is the maximum available bandwidth of the RSU and $|\mathcal{N}_{r_n}(t)|$ is the number of vehicles served by RSU $r_n$.
Based on this, the current environment state information can be represented as the following state space:

$$s(t) = \left\{ i_n(t),\ v_n(t),\ \eta_n(t),\ Q_n^f(t),\ C_r(t),\ c_r^f(t),\ r_{n,r_n}(t) \right\}_{n \in \mathcal{N},\, r \in \mathcal{R},\, f \in \mathcal{F}}$$

wherein $i_n$ represents the position coordinates of the vehicle, $v_n$ represents the speed of the vehicle, and the set of selectable directions of the vehicle at each moment is $\eta_n \in \{E, S, W, N\}$, representing the four directions east, south, west and north, respectively.
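For concreteness, one way to flatten such a state into a vector for the networks described below is sketched here; the patent does not prescribe an encoding, so the dimensions and the one-hot direction scheme are assumptions of this illustration:

    import numpy as np

    def build_state(pos, speed, direction, backlogs, capacities, cache_counts, rates):
        # direction one-hot over {E, S, W, N}
        onehot = np.zeros(4)
        onehot["ESWN".index(direction)] = 1.0
        parts = [pos, [speed], onehot, backlogs, capacities, cache_counts, rates]
        return np.concatenate([np.atleast_1d(np.asarray(p, float)) for p in parts])

    s = build_state(pos=(12.0, 3.5), speed=15.0, direction="N",
                    backlogs=[2.0, 0.0], capacities=[70.0],
                    cache_counts=[3, 0], rates=[5.1])
    print(s.shape)   # (13,)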
Further, the vehicle waits at an intersection with a certain movement probability. The movement probability of vehicle $n$ at an intersection is determined by the intersection density $\rho_{int}$ in the scene, the driving speed $v_n$ of vehicle $n$, the waiting probability $p_n^{wait}$ of vehicle $n$, and the maximum waiting time $t_n^{max}$ of vehicle $n$; the probability that vehicle $n$ stays at the intersection is computed from these same quantities.
step 202, inputting the current environment state information into the trained content placement model to obtain the action description information for placing the requested content.
In the embodiment of the present invention, the action description information is expressed as the set of cache decisions:

$$a(t) = \left\{ x_r^f(t) \right\}_{r \in \mathcal{R},\, f \in \mathcal{F}}$$

i.e., the number of coded segments of each content to be placed on each roadside unit.
and step 203, placing the request content on the corresponding roadside unit according to the action description information.
It should be noted that, the trained content placement model is obtained by training with different environmental state information.
According to the scheme, the trained content placement model is deployed in the controller for global control, and a request content placement decision can be made in real time according to the state information of the vehicles and the state information of the roadside units, thereby reducing decision delay and improving system operation efficiency.
Further, the content placement model of the embodiment of the present invention includes a value network, a policy network, an actor target network, and a critic target network, and before step 202, the flow of steps is as shown in fig. 3, specifically as follows:
step 301, a preset number of training sample sets are obtained.
It should be noted that each set of training samples includes first environmental status information, action description information, second environmental status information, and action reward.
Specifically, the action description information is obtained by inputting the first environment state information into the policy network; the first environment state information is the environment state information before the action corresponding to the action description information is executed; the second environment state information is the environment state information after the action corresponding to the action description information is executed; the action reward is the reward value for executing the action corresponding to the action description information.
In the embodiment of the present invention, the generation process of the training sample set is as follows:
$$S_0 \to A_0 \to R_0 \to S_1 \to \cdots \to S_{t-1} \to A_{t-1} \to R_{t-1} \to S_t \to \cdots$$

wherein $S_{t-1}$ denotes the environment state information at time $t-1$, $A_{t-1}$ denotes the action taken at time $t-1$, and $R_{t-1}$ denotes the action reward obtained at time $t-1$.
Specifically, first environment state information and action description information are obtained;
it should be noted that the first environment state information includes a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information.
Determining an action reward according to the action description information;
and determining a task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information.
Specifically, acquiring transmission energy consumption and delivery time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission energy consumption and the delivery time delay.
In the embodiment of the invention, before the beginning of each time slot $t$, the time at which vehicle $n$ drives out of the service range of RSU $r_n$ can be calculated according to the motion information of the vehicle. The communication time of vehicle $n$ with RSU $r_n$ in time slot $t$ is expressed as $t_{n,r_n}(t)$, bounded by the slot length $\tau$ and by the remaining time within the service range.
Further, due to the task backlog, a new content request task generated in time slot $t$ can be executed only after the remaining tasks in the task queue are completed. During time slot $t$, vehicle $n$ obtains the required coded segments by searching the cached contents in its associated RSUs.
For content $f$, the amount of coded segments obtained by vehicle $n$ in time slot $t$ is denoted $e_n^f(t)$, determined by the communication rate and the communication time allocated to content $f$. The delivery delay $d_n^f(t)$ of vehicle $n$ with respect to content $f$ is the time required for the vehicle to obtain the requested data, determined by the amount of data to be delivered and the communication rate $r_{n,r_n}(t)$.
further, during data transmission, the specific calculation formula of the transmission energy consumption of the RSU for the vehicle n and the content f is as follows:
Figure BDA0003254223970000134
wherein the content of the first and second substances,
Figure BDA0003254223970000135
is the power.
In one possible implementation mode, the action reward corresponding to the action description information is obtained by weighted summation of the transmission energy consumption and the delivery delay.
In the embodiment of the invention, the specific formula for obtaining the action reward corresponding to the action description information by weighted summation of the transmission energy consumption and the delivery delay is as follows:

$$r(t) = -\left( \alpha_1 \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} e_{r_n}^{n,f}(t) + \alpha_2 \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} d_n^f(t) \right)$$

wherein $\alpha_1$ and $\alpha_2$ are the weight coefficients of the transmission energy consumption and the delivery delay, respectively.
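A hedged sketch of this reward follows; the negative sign (reward as negated cost), the per-slot aggregation, and the weights are assumptions of this illustration:

    def action_reward(energies, delays, a1=0.5, a2=0.5):
        # r(t) = -(a1 * total transmission energy + a2 * total delivery delay)
        return -(a1 * sum(energies) + a2 * sum(delays))

    # transmission energy e = p * t for three served requests, plus their delays
    energies = [2.0 * 0.3, 2.0 * 0.1, 1.5 * 0.2]   # power * allocated time
    delays = [0.4, 0.15, 0.25]
    print(action_reward(energies, delays))   # -0.95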
According to the scheme, the action reward based on transmission energy consumption and delivery time delay is set, so that the request content is reasonably placed.
Step 302, inputting the first environmental status information and the action description information into the value network to obtain a first function value.
Step 303, inputting the second environment status information into the actor target network to obtain the next action description information.
And step 304, inputting the second environment state information and the next action description information into the critic target network to obtain a second function value.
In the embodiment of the present invention, the second function value is specifically calculated as follows:

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left\{ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s,\ A_t = a \right\}$$

where $\gamma$ is the discount (attenuation) factor of the Markov process.
Specifically, the calculation formula of the target value in the embodiment of the present invention is as follows:

$$y(t) = r(t) + \gamma\, Q^*\!\left( s(t+1),\ \mu^*\!\left( s(t+1) \mid \theta^{\mu^*} \right) \mid \theta^{Q^*} \right)$$

where $r$ is the action reward, $\theta^{Q^*}$ is the network parameter of the critic target network, and $\theta^{\mu^*}$ is the network parameter of the actor target network.
And 305, determining a loss function according to the first function value and the second function value.
Specifically, the loss function is determined according to the first function value, the second function value and the target value.
The calculation formula of the loss function in the embodiment of the invention is specifically as follows:

$$L(\theta^Q) = \frac{1}{M} \sum_{i=1}^{M} \left( y(i) - Q\!\left( s(i), a(i) \mid \theta^Q \right) \right)^2$$

wherein $\theta^Q$ is the network parameter of the value network and $y$ represents the target value.
And step 306, updating parameters of the content placement model according to the loss function to obtain the trained content placement model.
In particular, the parameters are updated by

$$\theta^{Q^*} \leftarrow \delta\, \theta^{Q} + (1 - \delta)\, \theta^{Q^*}$$

and

$$\theta^{\mu^*} \leftarrow \delta\, \theta^{\mu} + (1 - \delta)\, \theta^{\mu^*}$$

where $\delta$ is the update coefficient and $\theta^{\mu}$ represents the network parameters of the policy network. A soft updating mode is adopted, so that the target network parameters change only slightly at each update.
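A small sketch of this soft update rule (illustrative only; delta is an example value, and the linear layers stand in for the actual, unspecified network architectures):

    import torch.nn as nn

    delta = 0.005
    value_net = nn.Linear(12, 1)       # theta_Q
    critic_target = nn.Linear(12, 1)   # theta_Q*

    # theta_Q* <- delta * theta_Q + (1 - delta) * theta_Q*
    for p, p_t in zip(value_net.parameters(), critic_target.parameters()):
        p_t.data.mul_(1 - delta).add_(delta * p.data)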
According to the scheme, the action reward based on transmission energy consumption and delivery delay is set, so that the request content is placed reasonably; meanwhile, offline content placement is adopted, and no actual interaction with the environment is needed during training. The trained content placement model is then deployed in the controller for global control.
Further, in step 301, the embodiment of the present invention determines a communication time allocation decision according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;
specifically, a target function is obtained;
in the embodiment of the present invention, the objective function is expressed as:

$$r_n(t) = \arg\max_{r \in \mathcal{R}_n(t)} t_{n,r}(t)$$

namely, the RSU with the largest remaining communication time is selected as the data acquisition source.
Determining the content of the request to be distributed according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;
determining the unit value of the content of the request to be distributed according to the objective function;
specifically, the unit value $u_n^f(t)$ of a request to be allocated is the reduction of the objective obtained per unit of communication time allocated to content $f$.
ordering the contents of the requests to be distributed according to the unit value to obtain an ordering result;
and distributing the communication time for the content of the request to be distributed according to the sequencing result to obtain a communication time distribution decision.
According to the embodiment of the invention, the transmission energy consumption of the roadside unit is reduced by job scheduling within the limited communication time.
And determining a task backlog queue corresponding to the second environment state information according to the task backlog queue corresponding to the first environment state information and the communication time distribution decision.
According to the scheme, on the premise that the stability of each content queue of each vehicle is guaranteed, communication time is allocated to each content queue of a vehicle user, and therefore transmission energy consumption of the base station is reduced.
Further, before step 202, in the embodiment of the present invention, the training process of the content placement model is specifically as follows:
S1: Initialize the network parameter $\theta^Q$ of the value network and the network parameter $\theta^{\mu}$ of the policy network.
S2: Initialize the network parameters of the critic target network and the actor target network: $\theta^{Q^*} \leftarrow \theta^Q$, $\theta^{\mu^*} \leftarrow \theta^{\mu}$.
S3: Initialize an experience replay pool $R$ of capacity $C$.
S4: Perform steps S5 through S7 and S20 in each round $k \in K$ until all rounds terminate.
S5: Initialize the random exploration noise $\mathcal{N}$.
S6: Initialize the current environment state information to obtain the initial environment state information $s(1)$.
S7: Perform steps S8 to S19 in each time slot $t \in \mathcal{T}$ until all time slots terminate.
S8: Carry out the caching decision, and take the content cache state $c_r^f(t)$ of each roadside unit and the task backlog queue $Q_n^f(t)$ of each vehicle as input to the job scheduling algorithm.
S9: Obtain the communication time allocation decision according to the objective function and the job scheduling algorithm.
Specifically, the objective function is expressed as the minimization of the weighted sum of transmission energy consumption and delivery delay over the allocated communication times:

$$\min \sum_{n \in \mathcal{N}} \sum_{f \in \mathcal{F}} \left( \alpha_1\, e_{r_n}^{n,f}(t) + \alpha_2\, d_n^f(t) \right)$$
s10: and (4) performing task management in parallel aiming at each roadside unit, namely allocating communication time for each content request task of the vehicle user in the service range.
S11: Calculate the action reward according to the weighted-sum reward formula $r(t)$ given above.
S12: Update the coordinates of the vehicle to obtain the next time slot state $s(t+1)$.
S13: Store the quadruple $\{s(t), a(t), r(t), s(t+1)\}$ into the experience replay pool $R$.
S14: Randomly extract a mini-batch of $M$ tuples $\{s(i), a(i), r(i), s(i+1)\}$ from the experience replay pool $R$.
S15: Calculate the target values as follows:

$$y(i) = r(i) + \gamma\, Q^*\!\left( s(i+1),\ \mu^*\!\left( s(i+1) \mid \theta^{\mu^*} \right) \mid \theta^{Q^*} \right)$$

where $r$ is the action reward, $\theta^{Q^*}$ is the network parameter of the critic target network, and $\theta^{\mu^*}$ is the network parameter of the actor target network.
S16: Train the value network by minimizing the loss function:

$$L(\theta^Q) = \frac{1}{M} \sum_{i=1}^{M} \left( y(i) - Q\!\left( s(i), a(i) \mid \theta^Q \right) \right)^2$$

wherein $\theta^Q$ is the network parameter of the value network and $y$ represents the target value.
S17: Train the policy network according to the policy gradient:

$$\nabla_{\theta^{\mu}} J \approx \frac{1}{M} \sum_{i=1}^{M} \nabla_a Q\!\left( s, a \mid \theta^Q \right)\Big|_{s = s(i),\, a = \mu(s(i))}\ \nabla_{\theta^{\mu}} \mu\!\left( s \mid \theta^{\mu} \right)\Big|_{s = s(i)}$$
S18: Update the network parameters of the critic target network and the actor target network by the soft updates

$$\theta^{Q^*} \leftarrow \delta\, \theta^{Q} + (1 - \delta)\, \theta^{Q^*}, \qquad \theta^{\mu^*} \leftarrow \delta\, \theta^{\mu} + (1 - \delta)\, \theta^{\mu^*}.$$

S19: $t \leftarrow t + 1$.
S20: $k \leftarrow k + 1$.
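Putting steps S1 to S20 together, the following PyTorch skeleton sketches the training loop. It is a hedged illustration: the environment step, all dimensions, the linear layers, and the hyperparameter values are stand-ins, since the patent specifies the procedure rather than code:

    import random
    from collections import deque
    import torch
    import torch.nn as nn

    S_DIM, A_DIM, M, GAMMA, DELTA = 8, 4, 32, 0.99, 0.005

    actor, critic = nn.Linear(S_DIM, A_DIM), nn.Linear(S_DIM + A_DIM, 1)
    actor_t = nn.Linear(S_DIM, A_DIM); actor_t.load_state_dict(actor.state_dict())
    critic_t = nn.Linear(S_DIM + A_DIM, 1); critic_t.load_state_dict(critic.state_dict())
    opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
    opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)                       # experience replay pool R (S3)

    def env_step(s, a):                                 # stand-in environment (S8-S12)
        return torch.randn(S_DIM), -a.abs().sum().item()    # next state, reward

    for episode in range(3):                            # rounds k (S4)
        s = torch.randn(S_DIM)                          # initial state s(1) (S6)
        for t in range(20):                             # time slots (S7)
            a = actor(s).detach() + 0.1 * torch.randn(A_DIM)   # action + noise (S5)
            s2, r = env_step(s, a)
            replay.append((s, a, torch.tensor([r]), s2))       # S13
            if len(replay) < M:
                s = s2; continue
            batch = random.sample(replay, M)            # S14
            bs, ba, br, bs2 = (torch.stack(x) for x in zip(*batch))
            with torch.no_grad():                       # S15: target values
                y = br + GAMMA * critic_t(torch.cat([bs2, actor_t(bs2)], 1))
            loss_c = ((y - critic(torch.cat([bs, ba], 1))) ** 2).mean()   # S16
            opt_c.zero_grad(); loss_c.backward(); opt_c.step()
            loss_a = -critic(torch.cat([bs, actor(bs)], 1)).mean()        # S17
            opt_a.zero_grad(); loss_a.backward(); opt_a.step()
            for net, tgt in ((critic, critic_t), (actor, actor_t)):       # S18
                for p, pt in zip(net.parameters(), tgt.parameters()):
                    pt.data.mul_(1 - DELTA).add_(DELTA * p.data)
            s = s2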
Further, in step S9, according to the embodiment of the present invention, the objective function needs to satisfy the following constraints:

$$\sum_{f \in \mathcal{F}} t_n^f(t) \le t_{n,r_n}(t), \qquad t_n^f(t) \ge 0, \quad \forall f \in \mathcal{F}$$

wherein $t_n^f(t)$ is the communication time allocated to the request of vehicle $n$ for content $f$. It can be seen that the above optimization problem belongs to the one-dimensional linear knapsack problem, where $t_{n,r_n}(t)$ can be regarded as the capacity of the knapsack and $u_n^f(t)$ is the unit value of each item. For the knapsack problem, the optimal solution is to put the items into the knapsack in decreasing order of non-negative unit value until the knapsack is full or all items of non-negative value have been placed.
Specifically, the flow of step S9 is as follows:
S9.1: Obtain the remaining travel time $t_{n,r}^{res}(t)$ of each vehicle $n$ within each roadside unit $r$.
S9.2: Compute $r_n(t) = \arg\max_{r \in \mathcal{R}_n(t)} t_{n,r}^{res}(t)$, i.e., select the RSU with the largest remaining travel time, and carry out the data downloading from it.
S9.3: Perform steps S9.4 to S9.8 for each vehicle.
S9.4: Initialize the data acquisition time length $t_n^f(t) = 0$ for each content request.
S9.5: Acquire the queue lengths and related information of all content requests.
S9.6: Sort all content requests in descending order of unit value, i.e., $u_n^{f_1}(t) \ge u_n^{f_2}(t) \ge \cdots$.
S9.7: According to the ordering of the current content requests, allocate a download time to each content request starting from the first one: if the remaining allocable download time is sufficient to complete the request, allocate the time required to complete it; otherwise, allocate all the remaining download time.
S9.8: Update the current remaining allocable download time by subtracting the time just allocated.
S9.9: Output the communication time allocation decisions of all vehicles.
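The greedy allocation of steps S9.1 to S9.9 can be sketched as follows; the unit values, time demands, and budget are example inputs of this illustration, not values prescribed by the patent:

    def allocate_time(budget: float, requests: list[tuple[str, float, float]]):
        """requests: (content id, unit value u, time needed to complete)."""
        # S9.6: sort pending requests by unit value, descending
        alloc, remaining = {}, budget
        for cid, u, need in sorted(requests, key=lambda r: r[1], reverse=True):
            if remaining <= 0 or u < 0:
                break
            give = min(need, remaining)   # S9.7: full demand if it fits, else the rest
            alloc[cid] = give
            remaining -= give             # S9.8: update remaining allocable time
        return alloc

    # one vehicle with a 1.0 s budget and three pending content requests
    print(allocate_time(1.0, [("f1", 3.0, 0.4), ("f2", 5.0, 0.5), ("f3", 1.0, 0.6)]))
    # {'f2': 0.5, 'f1': 0.4, 'f3': 0.1}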
According to the scheme, the transmission energy consumption of the roadside unit is reduced by job scheduling within the limited communication time. Based on the delivery delay and the transmission energy consumption of the roadside unit, the length of the request content queue is shortened while the time for the vehicle user to obtain the requested content and the transmission energy consumption of the roadside unit are reduced, so that decision delay is reduced and system operation efficiency is improved.
Based on the same inventive concept, fig. 4 exemplarily shows a device for requesting content placement in an Internet of Vehicles scenario, which can execute the flow of the method for requesting content placement in an Internet of Vehicles scenario according to the embodiment of the present invention.
The apparatus, comprising:
an obtaining module 401, configured to obtain request content and current environment state information; the current environmental state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units;
a processing module 402, configured to input the current environment state information into a trained content placement model, so as to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.
Further, the processing module 402 is specifically configured to:
the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit includes a remaining buffer capacity, a content buffer state, and a communication rate of the roadside unit.
Further, the content placement model includes a value network, a policy network, an actor target network, and a critic target network, the processing module further to:
acquiring a preset number of training sample sets before inputting the current environment state information into the trained content placement model to obtain the action description information for placing the request content; each group of training samples comprises first environment state information, action description information, second environment state information and an action reward; the action description information is obtained by inputting the first environment state information into the policy network; the first environment state information is the environment state information before the action corresponding to the action description information is executed; the second environment state information is the environment state information after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environmental state information and the action description information into the value network to obtain a first function value;
inputting the second environment state information into the actor target network to obtain next action description information;
inputting the second environment state information and the next action description information into the critic target network to obtain a second function value;
determining a loss function according to the first function value and the second function value;
and updating the parameters of the content placement model according to the loss function to obtain the trained content placement model.
Further, the processing module is specifically configured to calculate the remaining cache capacity as follows:

$$C_r(t) = C_r(t-1) + k_r(t-1) - \sum_{f \in \mathcal{F}} x_r^f(t-1)\, z_f^{seg}$$

wherein $k_r(t-1)$ represents the cache capacity deleted by roadside unit $r$ in the last time slot, $x_r^f(t)$ represents the cache decision variable, $s_f$ indicates the number of coded segments needed to decode content $f$, and $z_f^{seg}$ indicates the size of each coded segment of content $f$.
Further, the processing module 402 is specifically configured to:
acquiring the first environment state information and the action description information; the first environment state information comprises a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;
determining the action reward according to the action description information;
and determining a task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information.
Further, the processing module 402 is specifically configured to:
acquiring transmission energy consumption and delivery time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission energy consumption and the delivery time delay.
Further, the processing module 402 is specifically configured to:
determining a communication time distribution decision according to a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;
and determining the task backlog queue corresponding to the second environment state information according to the task backlog queue corresponding to the first environment state information and the communication time distribution decision.
Further, the processing module 402 is specifically configured to:
obtaining a target function;
determining the content of the request to be distributed according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;
determining the unit value of the request content to be distributed according to the objective function;
sequencing the content of the request to be distributed according to the unit value to obtain a sequencing result;
and distributing the communication time for the content of the request to be distributed according to the sequencing result to obtain the communication time distribution decision.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 5: a processor 501, a memory 502, a communication interface 503, and a communication bus 504;
the processor 501, the memory 502 and the communication interface 503 complete mutual communication through the communication bus 504; the communication interface 503 is used for implementing information transmission between the devices;
the processor 501 is configured to call the computer program in the memory 502, and the processor implements all the steps of the above method for requesting content placement in an Internet of Vehicles scenario when executing the computer program, for example, the processor implements the following steps when executing the computer program: acquiring request content and current environment state information; the current environment state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units; inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; placing the request content on the corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.
Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium, having a computer program stored thereon, which when executed by a processor implements all the steps of the above method for requesting content placement in an internet of vehicles scenario, for example, the processor implements the following steps when executing the computer program: acquiring request content and current environment state information; the current environmental state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units; inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a device requesting content placement in an internet-of-vehicles scenario, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially implemented in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a device for requesting content placement in an internet of vehicles scenario, or a network device, etc.) to execute the method for requesting content placement in an internet of vehicles scenario described in the embodiments or some parts of the embodiments.
In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for requesting content placement in an internet of vehicles scenario, comprising:
acquiring request content and current environment state information; the current environment state information comprises state information of a plurality of vehicles, state information of a plurality of roadside units and state information of a plurality of channels;
inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content;
placing the request content on a corresponding roadside unit according to the action description information;
the trained content placement model is obtained by training with different environment state information.
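By way of illustration, the following is a minimal, self-contained sketch of the inference step in this claim, assuming a trained policy that maps a flattened environment-state vector to per-roadside-unit placement scores. The state layout, network shape, and all dimensions are illustrative assumptions, not the claim's concrete design.

```python
import torch
import torch.nn as nn

NUM_VEHICLES, NUM_RSUS, NUM_CHANNELS = 4, 3, 3
# Per vehicle: motion + task backlog; per RSU: residual capacity + cache state;
# per channel: communication rate (layout assumed for illustration).
STATE_DIM = NUM_VEHICLES * 2 + NUM_RSUS * 2 + NUM_CHANNELS

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_RSUS))

def place_request(state: torch.Tensor) -> int:
    """Return the index of the roadside unit that should cache the requested content."""
    with torch.no_grad():
        scores = policy(state)      # the "action description information"
    return int(scores.argmax())     # highest-scoring RSU hosts the content

state = torch.randn(STATE_DIM)      # stand-in for the observed environment state
print("place requested content on RSU", place_request(state))
```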
2. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 1, wherein the state information of the vehicle comprises motion information of the vehicle and a task backlog queue; the state information of the roadside unit comprises the residual cache capacity and the content cache state of the roadside unit; and the state information of the channel comprises a communication rate.
3. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 1, wherein the content placement model comprises a value network, a policy network, an actor target network and a critic target network, and wherein, before the inputting of the current environment state information into the trained content placement model to obtain the action description information for placing the request content, the method further comprises:
acquiring a preset number of training sample sets; each group of training samples comprises first environment state information, action description information, second environment state information and action rewards; the action description information is obtained after the strategy network inputs the first environment state information; the first environment state information is environment state information before executing the action corresponding to the action description information; the second environment state information is environment state information after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;
inputting the first environmental state information and the action description information into the value network to obtain a first function value;
inputting the second environment state information into the actor target network to obtain next action description information;
inputting the second environment state information and the next action description information into the critic target network to obtain a second function value;
determining a loss function according to the first function value and the second function value;
and updating the parameters of the content placement model according to the loss function to obtain the trained content placement model.
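This claim reads as an actor-critic update in the style of DDPG. The sketch below follows that reading: the "first function value" is Q(s, a) from the value (critic) network, and the "second function value" is the bootstrapped target r + γ·Q′(s′, μ′(s′)) built from the two target networks. The mean-squared-error loss, network sizes, discount factor, and soft-update rate are all assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 3
GAMMA, TAU = 0.99, 0.005        # discount factor and soft-update rate (assumed)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

critic        = mlp(STATE_DIM + ACTION_DIM, 1)   # value network
actor         = mlp(STATE_DIM, ACTION_DIM)       # policy network
actor_target  = mlp(STATE_DIM, ACTION_DIM)       # actor target network
critic_target = mlp(STATE_DIM + ACTION_DIM, 1)   # critic target network
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

def train_step(s, a, r, s2):
    q1 = critic(torch.cat([s, a], dim=1))         # first function value Q(s, a)
    with torch.no_grad():
        a2 = actor_target(s2)                     # next action description information
        q2 = r + GAMMA * critic_target(torch.cat([s2, a2], dim=1))  # second value
    loss = nn.functional.mse_loss(q1, q2)         # loss from the two function values
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Soft-update both target networks toward the learned networks.
    for tgt, src in ((actor_target, actor), (critic_target, critic)):
        for p_t, p_s in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p_s.data)
    return float(loss)

batch = 32
print(train_step(torch.randn(batch, STATE_DIM), torch.randn(batch, ACTION_DIM),
                 torch.randn(batch, 1), torch.randn(batch, STATE_DIM)))
```

For brevity the sketch updates only the critic; a full DDPG-style trainer would also update the policy network by ascending the critic's value of its actions.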
4. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 2, wherein the residual cache capacity is calculated as:

$C_r(t) = C_r(t-1) + k_r(t-1) - \sum_{f \in \mathcal{F}} a_{r,f}(t)\, s_f\, l_f$

wherein $C_r(t)$ denotes the residual cache capacity of roadside unit $r$ in slot $t$, $k_r(t-1)$ denotes the cache capacity freed by the deletions of roadside unit $r$ in the last slot, $a_{r,f}(t)$ denotes the cache decision variable indicating whether roadside unit $r$ caches content $f$, $s_f$ denotes the number of encoded segments needed to decode content $f$, and $l_f$ denotes the size of each encoded segment of content $f$.
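Under the reconstruction above, the bookkeeping can be illustrated numerically as follows; the helper name residual_capacity and all concrete numbers are made up for illustration.

```python
def residual_capacity(prev_capacity, freed_last_slot, cache_decisions, s, l):
    """cache_decisions[f] is 1 if content f is newly cached on this RSU, else 0."""
    # Each newly cached content f occupies s[f] encoded segments of size l[f].
    consumed = sum(a_f * s[f] * l[f] for f, a_f in enumerate(cache_decisions))
    return prev_capacity + freed_last_slot - consumed

s = [4, 6, 2]          # encoded segments needed to decode each content
l = [1.5, 1.0, 2.0]    # size (MB) of each encoded segment
print(residual_capacity(prev_capacity=20.0, freed_last_slot=3.0,
                        cache_decisions=[1, 0, 1], s=s, l=l))  # -> 13.0
```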
5. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 3, wherein said obtaining a preset number of training sample sets comprises:
acquiring the first environment state information and the action description information; the first environment state information comprises a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;
determining the action reward according to the action description information;
and determining a task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information.
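A minimal sketch of assembling and sampling the training tuples of this claim — (first environment state, action description, action reward, second environment state), each state bundling the content cache state with the task backlog queues — assuming a simple replay buffer; all field names and numbers are hypothetical.

```python
import random
from collections import deque, namedtuple

Sample = namedtuple("Sample", ["state", "action", "reward", "next_state"])
replay = deque(maxlen=10_000)   # holds the preset number of training samples

def store(cache_state, backlog, action, reward, next_cache_state, next_backlog):
    # Each state bundles the RSU content-cache status with the task backlog queues.
    replay.append(Sample((cache_state, backlog), action, reward,
                         (next_cache_state, next_backlog)))

store(cache_state=[1, 0, 1], backlog=[3, 5], action=2, reward=-1.7,
      next_cache_state=[1, 1, 1], next_backlog=[2, 4])
batch = random.sample(list(replay), k=min(32, len(replay)))
print(batch[0])
```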
6. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 5, wherein said determining the action reward according to the action description information comprises:
acquiring transmission energy consumption and delivery time delay corresponding to the action description information;
and determining the action reward corresponding to the action description information according to the transmission energy consumption and the delivery time delay.
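This claim fixes only the two inputs to the action reward: transmission energy consumption and delivery time delay. The negated weighted sum below is one plausible combination, not the claimed formula; the weights are assumptions.

```python
def action_reward(energy_joules: float, delay_seconds: float,
                  w_energy: float = 0.5, w_delay: float = 0.5) -> float:
    """Reward for one placement action: lower energy and delay are better."""
    return -(w_energy * energy_joules + w_delay * delay_seconds)

print(action_reward(energy_joules=2.0, delay_seconds=0.8))  # -> -1.4
```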
7. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 5, wherein the determining of the task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information comprises:
determining a communication time allocation decision according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;
and determining the task backlog queue corresponding to the second environment state information according to the task backlog queue corresponding to the first environment state information and the communication time allocation decision.
8. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 7, wherein the determining of a communication time allocation decision according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information comprises:
acquiring an objective function;
determining the requested content to be allocated according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;
determining the unit value of the requested content to be allocated according to the objective function;
sorting the requested content to be allocated according to the unit value to obtain a sorting result;
and allocating communication time to the requested content to be allocated according to the sorting result to obtain the communication time allocation decision.
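One way to read claims 7-8 is as a greedy budgeted allocation: rank the pending requests by the unit value the objective function assigns to them, then grant communication time in that order until the slot budget is exhausted. The sketch below follows that reading; the unit-value function, time budget, and all names are illustrative assumptions.

```python
def allocate_time(requests, unit_value, budget):
    """requests: list of (request_id, time_needed); returns {request_id: time}."""
    decision, remaining = {}, budget
    # Sort the pending requests by unit value, best first (the sorting step above).
    for rid, need in sorted(requests, key=lambda r: unit_value(r[0]), reverse=True):
        grant = min(need, remaining)
        if grant > 0:
            decision[rid] = grant       # communication time granted to this request
            remaining -= grant
    return decision

values = {"f1": 3.0, "f2": 5.0, "f3": 1.0}           # unit value per requested content
pending = [("f1", 0.4), ("f2", 0.3), ("f3", 0.5)]    # (content, time needed in slot)
print(allocate_time(pending, values.get, budget=0.6))  # f2 first, then part of f1
```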
9. An apparatus for requesting content placement in an internet of vehicles scenario, comprising:
the acquisition module is used for acquiring request content and current environment state information; the current environmental state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units;
the processing module is used for inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.
10. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 8.
11. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
CN202111054937.3A 2021-09-09 2021-09-09 Method and device for placing request content in Internet of vehicles scene and electronic equipment Active CN113992706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111054937.3A CN113992706B (en) 2021-09-09 2021-09-09 Method and device for placing request content in Internet of vehicles scene and electronic equipment

Publications (2)

Publication Number Publication Date
CN113992706A true CN113992706A (en) 2022-01-28
CN113992706B CN113992706B (en) 2023-05-23

Family

ID=79735493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111054937.3A Active CN113992706B (en) 2021-09-09 2021-09-09 Method and device for placing request content in Internet of vehicles scene and electronic equipment

Country Status (1)

Country Link
CN (1) CN113992706B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170331670A1 (en) * 2016-05-13 2017-11-16 Telefonaktiebolaget Lm Ericsson (Publ) Network Architecture, Methods, and Devices for a Wireless Communications Network
CN111385734A (en) * 2020-02-19 2020-07-07 重庆邮电大学 Internet of vehicles content caching decision optimization method
CN112995950A (en) * 2021-02-07 2021-06-18 华南理工大学 Resource joint allocation method based on deep reinforcement learning in Internet of vehicles
CN113094982A (en) * 2021-03-29 2021-07-09 天津理工大学 Internet of vehicles edge caching method based on multi-agent deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
TENG MA: "Deep Reinforcement Learning for Pre-caching and Task Allocation in Internet of Vehicles" *
NING ZHAOLONG: "Collaborative service caching and computation offloading in the Internet of Vehicles based on multi-agent meta reinforcement learning" *
TANG YUANYUAN; ZHU QI; HU HAN: "Forwarding strategy for VANETs based on content and retention time" *

Also Published As

Publication number Publication date
CN113992706B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Ndikumana et al. Deep learning based caching for self-driving cars in multi-access edge computing
CN112291793B (en) Resource allocation method and device of network access equipment
CN111835827A (en) Internet of things edge computing task unloading method and system
EP3732628A1 (en) Learning data augmentation policies
CN111049903B (en) Edge network load distribution algorithm based on application perception prediction
JP2007317068A (en) Recommending device and recommending system
CN114764471A (en) Recommendation method, recommendation device and storage medium
CN113364854A (en) Privacy protection dynamic edge cache design method based on distributed reinforcement learning in mobile edge computing network
CN112905312A (en) Workflow scheduling method based on deep Q neural network in edge computing environment
CN116112563A (en) Dual-strategy self-adaptive cache replacement method based on popularity prediction
CN116166690A (en) Mixed vector retrieval method and device for high concurrency scene
Shingne et al. Heuristic deep learning scheduling in cloud for resource-intensive internet of things systems
CN114240506A (en) Modeling method of multi-task model, promotion content processing method and related device
Iqbal et al. Intelligent multimedia content delivery in 5G/6G networks: a reinforcement learning approach
CN113052253A (en) Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN113992706B (en) Method and device for placing request content in Internet of vehicles scene and electronic equipment
CN113411826A (en) Edge network equipment caching method based on attention mechanism reinforcement learning
CN116915869A (en) Cloud edge cooperation-based time delay sensitive intelligent service quick response method
CN108770014B (en) Calculation evaluation method, system and device of network server and readable storage medium
CN111104951A (en) Active learning method and device and terminal equipment
CN112101729B (en) Mobile edge computing system energy distribution method based on deep double Q learning
CN112949850B (en) Super-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN114339879A (en) Service migration method based on reinforcement learning in mobile edge calculation
CN114090239A (en) Model-based reinforcement learning edge resource scheduling method and device
CN113676519B (en) Combined optimization method and device for vehicle content pre-caching and broadband distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant