CN113992706A

CN113992706A - Method and device for requesting content placement in Internet of vehicles scene and electronic equipment

Info

Publication number: CN113992706A
Application number: CN202111054937.3A
Authority: CN
Inventors: 陈莹; 马腾; 陈昕
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2021-09-09
Filing date: 2021-09-09
Publication date: 2022-01-28
Anticipated expiration: 2041-09-09
Also published as: CN113992706B

Abstract

The invention discloses a method and a device for requesting content placement in a car networking scene and electronic equipment, wherein the method comprises the following steps: the method comprises the steps of obtaining request content and current environment state information, inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content, and placing the request content on a corresponding roadside unit according to the action description information, wherein the trained content placement model is obtained after training is carried out by utilizing different environment state information. The trained content placement model is input into the controller for global control, and a requested content placement decision can be made in real time according to the state information of the vehicle and the state information of the roadside units, so that the decision time delay is reduced, and the system operation efficiency is improved.

Description

Method and device for requesting content placement in Internet of vehicles scene and electronic equipment

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for requesting content placement in a car networking scene and electronic equipment.

Background

With the popularization of the 5G technology, the Internet of vehicles becomes a key technology of intelligent cities, traffic jam is hopeful to be solved, and traffic accidents caused by improper driving are reduced. The development of vehicle-mounted intelligence also enables a vehicle to be converted into an intelligent terminal from a vehicle, and a plurality of vehicle-mounted applications, such as vehicle-mounted maps, vehicle-mounted videos and the like, are also on line on a large scale.

In a car networking scenario, an on-board application often has a requirement of low time delay, and in order to meet the requirement, applying an edge caching technology to the acquisition of the content requested by the vehicle is a significant method. On the one hand, however, unlike the traditional scenario of being relatively static, the internet of vehicles tends to be highly dynamic and random due to the high speed mobility of the vehicles. On the other hand, autonomous vehicles generate a large amount of data transmission per hour, which poses a challenge to data processing and transmission energy consumption of the base station.

In summary, there is a need for a method for requesting content placement in an internet of vehicles scenario to solve the above-mentioned problems in the prior art.

Disclosure of Invention

Because the existing method has the problems, the invention provides a method and a device for requesting content placement in an internet of vehicles scene, electronic equipment and a storage medium.

In a first aspect, the present invention provides a method for requesting content placement in a car networking scenario, including:

acquiring request content and current environment state information; the current environment state information comprises state information of a plurality of vehicles, state information of a plurality of roadside units and state information of a plurality of channels;

inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content;

placing the request content on a corresponding roadside unit according to the action description information;

the trained content placement model is obtained by training with different environment state information.

Further, the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit comprises the residual cache capacity and the content cache state of the roadside unit; the state information of the channel includes a communication rate.

Further, the content placement model includes a value network, a policy network, an actor target network, and a critic target network, and before the current environmental state information is input to the trained content placement model and the action description information for placing the requested content is obtained, the method further includes:

acquiring a preset number of training sample sets; each group of training samples comprises first environment state information, action description information, second environment state information and action rewards; the action description information is obtained after the strategy network inputs the first environment state information; the first environment state information is environment state information before executing the action corresponding to the action description information; the second environment state information is environment state information after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;

inputting the first environmental state information and the action description information into the value network to obtain a first function value;

inputting the second environment state information into the actor target network to obtain next action description information;

inputting the second environment state information and the next action description information into the critic target network to obtain a second function value;

determining a loss function according to the first function value and the second function value;

and updating the parameters of the content placement model according to the loss function to obtain the trained content placement model.

Further, the specific calculation formula of the remaining buffer capacity is as follows:

wherein k is_r(t-1) represents the buffer capacity of the last slot roadside unit r deletion,

representing a cache decision variable, s_fIndicates the number of encoded segments needed to decode the content f,

indicating the size of each encoded segment of the content f.

Further, the acquiring a preset number of training sample sets includes:

acquiring the first environment state information and the action description information; the first environment state information comprises a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;

determining the action reward according to the action description information;

and determining a task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information.

Further, the determining the action reward according to the action description information includes:

acquiring transmission energy consumption and delivery time delay corresponding to the action description information;

and determining the action reward corresponding to the action description information according to the transmission energy consumption and the delivery time delay.

Further, the determining the task backlog queue corresponding to the second environment state information according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information includes:

determining a communication time distribution decision according to a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information;

and determining the task backlog queue corresponding to the second environment state information according to the task backlog queue corresponding to the first environment state information and the communication time distribution decision.

Further, the determining a communication time allocation decision according to the content cache status corresponding to the first environmental status information and the backlog queue of the task corresponding to the first environmental status information includes:

obtaining a target function;

determining the content of the request to be distributed according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information;

determining the unit value of the request content to be distributed according to the objective function;

sequencing the content of the request to be distributed according to the unit value to obtain a sequencing result;

and distributing the communication time for the content of the request to be distributed according to the sequencing result to obtain the communication time distribution decision.

In a second aspect, the present invention provides an apparatus for requesting content placement in a car networking scenario, including:

the acquisition module is used for acquiring request content and current environment state information; the current environment state information comprises state information of a plurality of vehicles, state information of a plurality of roadside units and state information of a plurality of channels;

the processing module is used for inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.

Further, the processing module is specifically configured to:

the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit comprises the residual cache capacity and the content cache state of the roadside unit; the state information of the channel includes a communication rate.

Further, the content placement model includes a value network, a policy network, an actor target network, and a critic target network, the processing module further to:

acquiring a preset number of training sample sets before inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; each group of training samples comprises first environment state information, action description information, second environment state information and action rewards; the action description information is obtained after the strategy network inputs the first environment state information; the first environment state information is environment state information before executing the action corresponding to the action description information; the second environment state information is environment state information after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information;

Further, the processing module is specifically configured to: the specific calculation formula of the remaining cache capacity is as follows:

indicating the size of each encoded segment of the content f.

Further, the processing module is specifically configured to:

determining the action reward according to the action description information;

Further, the processing module is specifically configured to:

obtaining a target function;

In a third aspect, the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for requesting content placement in the scenario of internet of vehicles as described in the first aspect.

In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for requesting content placement in a car networking scenario as described in the first aspect.

According to the technical scheme, the method, the device and the electronic equipment for requesting content placement in the Internet of vehicles scene, provided by the invention, have the advantages that the trained content placement model is input into the controller for global control, a request content placement decision can be made in real time according to the state information of the vehicles and the state information of the roadside units, the decision delay is reduced, and the system operation efficiency is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a system framework for a method of requesting content placement in a vehicle networking scenario provided by the present invention;

FIG. 2 is a schematic flow chart illustrating a method for requesting content placement in an Internet of vehicles scenario provided by the present invention;

FIG. 3 is a schematic flow chart of a training content placement model provided by the present invention;

FIG. 4 is a schematic structural diagram of a device for requesting content placement in a vehicle networking scenario according to the present invention;

fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.

In the embodiment of the present invention, the base station includes all request contents required by users, and R roadside units (RSUs) having a buffering and calculating function and N vehicle users in a driving state are respectively represented by sets R ═ 1, 2,. multidrug, R,. multidrug, and N ═ 1, 2,. multidrug, N,. multidrug, and N ═ N. The length of each timestamp in the discrete-time system T {1, 2.,. T } is denoted by τ.

Further, before the start of each time slot, the vehicle user generates the request content q_f(t)＝{p_f(t)，z_f(t) }, in which p_f(t) is the probability that content f is requested, z_f(t) is the size of the content f.

It should be noted that the request probability p of an arbitrary content f_f(t) obeying a Zipf distribution with

Where epsilon is the request factor.

Due to the limited buffer capacity of the roadside units, each roadside unit cannot store all content that may be requested by a user. According to the embodiment of the invention, the coding segments of the related contents are cached in the roadside unit dynamically according to the state information of the vehicle, and when the vehicle collects the coding segments of the related contents, the coding segments can be decoded locally to obtain the complete contents.

In a possible implementation manner, the coding manner is Maximum-Distance-Separable (MDS) coding, and may also be other coding manners, which is not specifically limited in this embodiment of the present invention.

The method for placing content provided by the embodiment of the present invention may be applied to a system architecture as shown in fig. 1, where the system architecture includes a content placement model 100, a controller 200, a base station 300, a roadside unit 400, and a vehicle 500.

Specifically, the content placement model 100 is used to obtain the action description information of the placement request content after inputting the current environment state information.

It should be noted that the current environmental state information includes state information of the vehicle 500, state information of the roadside unit 400, and state information of the channel.

Further, the controller 200 places the requested contents of the base station 300 on the corresponding roadside unit 400 according to the action description information.

In one possible implementation, a software defined network SDN controller is employed.

It should be noted that, the trained content placement model 100 is obtained by training with different tasks and different environmental status information.

It should be noted that fig. 1 is only an example of a system architecture according to the embodiment of the present invention, and the present invention is not limited to this specifically.

Based on the above illustrated system architecture, fig. 2 is a schematic flow diagram corresponding to a method for requesting content placement in an internet of vehicles scenario provided by the embodiment of the present invention, as shown in fig. 2, the method includes:

step 201, obtaining the request content and the current environment state information.

It should be noted that the current environmental state information includes state information of a plurality of vehicles and state information of a plurality of roadside units.

In the embodiment of the invention, the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit includes a remaining buffer capacity, a content buffer state, and a communication rate of the roadside unit.

Further, with R_n(t) to represent the set of RSUs associated with vehicle n in time slot t. With N_r(t) indicates the presence of an RSU in time slot t_rA collection of vehicles within service range.

In the embodiment of the invention, the motion information of the vehicle comprises the position information, the speed magnitude and the speed direction of the vehicle.

In the embodiment of the present invention, the first and second substrates,

for buffering decision variables, i.e. base station to RSU in t time slot_rThe number of the coding segments of the cache content f is

Wherein s is_fIndicating the number of encoded segments needed to decode the content f.

The embodiment of the invention adopts a Least Recently Used (LRU) strategy as a cache updating strategy. The RSU before the start of the current time slot t_rThe specific calculation formula of the remaining buffer capacity is as follows:

wherein k is_r(t-1) represents the last time slot RSU_rBased on the cache capacity deleted by the LRU policy,

indicating the size of each encoded segment of the content f.

Further, due to the limitation of RSU buffer capacity, it needs to be satisfied

After the buffer is finished, the RSU_rThe content cache state in (1) is updated as follows:

wherein the content of the first and second substances,

indicating content f in RSU_rThe number of the code segments buffered in the buffer.

Further, the vehicle user generates a content request task before the start of each time slot, and the limited communication time of each time slot may result in the task being incomplete, thereby generating a backlog of tasks. By using

Representing the backlog of the vehicle n at time t, then

The update method of (1) is as follows:

in the embodiment of the present invention, the first and second substrates,

for the average queue length of vehicle n with respect to content f, in order to stabilize the long-term task queue, the following formula needs to be satisfied:

where ε is the constraint value that keeps the task queue stable.

Further, the vehicle n and the RSUr_nThe specific calculation formula of the communication rate between the two is as follows:

where H (t) represents the channel gain, obeying to g₀(d₀/d)⁴Index distribution of (1), g₀Is a path loss exponent, d₀And d reflects the reference distance and the real distance of the vehicle mobility.

According to the scheme, the content placement decision is made in real time according to the motion information of the current vehicle, the task backlog queue, the residual cache capacity of the roadside unit, the content cache state and the communication rate, and the system operation efficiency is improved.

In particular, the method comprises the following steps of,

indicates the bandwidth allocated to the vehicle n,

is the maximum available bandwidth of the RSU,

is RSUr_nThe number of vehicles served.

Based on this, the current environmental state information can be represented as the following state space:

wherein i_NRepresenting the position coordinates of the vehicle, v_nRepresenting the speed of the vehicle, the set eta for the selectable directions of the vehicle at each moment_nAnd the symbols are expressed as { E, S, W, N } and represent four directions of southeast, northwest, respectively.

Further, the vehicle waits at the intersection with a probability of movement. The probability of movement of the vehicle n at the intersection is expressed as,

where ρ is_intRepresenting intersection density, v, in a scene_nWhich indicates the traveling speed of the vehicle n,

the probability of waiting for the vehicle n is indicated,

representing the maximum waiting time for vehicle n. The calculation formula of the probability that the vehicle n stays at the intersection is as follows:

step 202, inputting the current environment state information into the trained content placement model to obtain the action description information for placing the requested content.

In the embodiment of the present invention, the action description information is expressed as:

and step 203, placing the request content on the corresponding roadside unit according to the action description information.

It should be noted that, the trained content placement model is obtained by training with different environmental state information.

According to the scheme, the trained content placement model is input into the controller to be subjected to global control, and a requested content placement decision can be made in real time according to the state information of the vehicle and the state information of the roadside units, so that the decision delay is reduced, and the system operation efficiency is improved.

Further, the content placement model of the embodiment of the present invention includes a value network, a policy network, an actor target network, and a critic target network, and before step 202, the flow of steps is as shown in fig. 3, specifically as follows:

step 301, a preset number of training sample sets are obtained.

It should be noted that each set of training samples includes first environmental status information, action description information, second environmental status information, and action reward.

Specifically, the action description information is obtained after the policy network inputs the first environment state information; the first environment state information is environment state information before the action corresponding to the action description information is executed; the second environment state information is environment state information after the action corresponding to the action description information is executed; the action reward is a reward value for executing the action corresponding to the action description information.

In the embodiment of the present invention, the generation process of the training sample set is as follows:

S₀→A₀→R₀→S₁→…→S_t-1→A_t-1→R_t-1→S_t→…

wherein S is_t-1Indicating the environmental status information at time t-1, A_t-1Representing the action description taken at time t-1, R_t-1Representing the action award acquired at time t-1.

Specifically, first environment state information and action description information are obtained;

it should be noted that the first environment state information includes a content cache state corresponding to the first environment state information and a task backlog queue corresponding to the first environment state information.

Determining an action reward according to the action description information;

Specifically, acquiring transmission energy consumption and delivery time delay corresponding to the action description information;

In the embodiment of the invention, before the beginning of each time slot t, the driving-out RSUr of the vehicle n can be calculated according to the motion information of the vehicle_nTime of service range. Vehicle n and RSUr in t time slot_nExpressed as communication time of

Further, due to the occurrence of the backlog of tasks, the new content request task generated at the time slot t may be executed only after the remaining tasks in the task queue are completed. During the time slot t, the vehicle n obtains the required code segment by searching the cache content in its associated RSU.

For content f, the code segment acquisition amount of vehicle n in t time slot is

The specific calculation formula of the delivery delay of the vehicle n with respect to the content f is as follows:

further, during data transmission, the specific calculation formula of the transmission energy consumption of the RSU for the vehicle n and the content f is as follows:

wherein the content of the first and second substances,

is the power.

In one possible implementation mode, the action reward corresponding to the action description information is obtained by carrying out weighted summation on transmission energy consumption and delivery time delay.

In the embodiment of the invention, a specific formula for obtaining the action reward corresponding to the action description information by weighting and summing transmission energy consumption and delivery time delay is as follows:

wherein alpha is₁、α₂Respectively, the transmission energy consumption and the delivery time delayA weight coefficient.

According to the scheme, the action reward based on transmission energy consumption and delivery time delay is set, so that the request content is reasonably placed.

Step 302, inputting the first environmental status information and the action description information into the value network to obtain a first function value.

Step 303, inputting the second environment status information into the actor target network to obtain the next action description information.

And step 304, inputting the second environment state information and the next action description information into the critic target network to obtain a second function value.

In the embodiment of the present invention, the second function value is specifically calculated as follows:

Q_π(s，a)＝E_π{R_t+1+γR_t+2+γ²R_t+3+…|S_t＝s，A_t＝a}

where γ is the attenuation factor of the Markov process.

Specifically, the calculation formula of the target value in the embodiment of the present invention is specifically as follows:

y(t)＝r(t)+γQ^*(s(t+1)，μ^*(s(t+1)|θ^μ*)θ^Q*)

where r is the action reward, θ^Q*Network parameters, θ, for the critic's target network^μ*Network parameters for the actor target network.

And 305, determining a loss function according to the first function value and the second function value.

Specifically, the loss function is determined according to the first function value, the second function value and the target value.

The calculation formula of the loss function in the embodiment of the invention is specifically as follows:

wherein, theta^QY represents a target value for a network parameter of the value network.

And step 306, updating parameters of the content placement model according to the loss function to obtain the trained content placement model.

In particular, by

And

and updating the parameters.

Where δ is an update coefficient, θ^μThe network parameters representing the policy network are updated less each time by adopting a soft updating mode.

According to the scheme, the action reward based on transmission energy consumption and delivery time delay is set, so that reasonable placement of the request content is achieved, meanwhile, offline content placement is adopted, and actual interaction with the environment is not needed in the training process. And inputting the trained content placement model into a controller for global control.

Further, in step 301, according to the content cache state corresponding to the first environment state information and the task backlog queue corresponding to the first environment state information, the embodiment of the present invention determines a communication time allocation decision;

specifically, a target function is obtained;

in the embodiment of the present invention, the objective function is expressed as:

namely, the RSU with the largest remaining communication time is selected as the data acquisition source.

determining the unit value of the content of the request to be distributed according to the objective function;

specifically, the calculation formula of the unit value is as follows:

ordering the contents of the requests to be distributed according to the unit value to obtain an ordering result;

and distributing the communication time for the content of the request to be distributed according to the sequencing result to obtain a communication time distribution decision.

According to the embodiment of the invention, the transmission energy consumption of the roadside unit is reduced by scheduling the job in the limited communication time.

And determining a task backlog queue corresponding to the second environment state information according to the task backlog queue corresponding to the first environment state information and the communication time distribution decision.

According to the scheme, on the premise that the stability of each content queue of each vehicle is guaranteed, communication time is allocated to each content queue of a vehicle user, and therefore transmission energy consumption of the base station is reduced.

Further, before step 202, in the embodiment of the present invention, the training process of the content placement model is specifically as follows:

s1: initializing a network parameter θ of a value network^QAnd a network parameter θ of the policy network^μ。

S2: initializing network parameters of a reviewer's target network

Network parameters of actor target network

S3: an empirical playback pool R of capacity C is initialized.

S4: steps S5 through S7 and S20 are performed at each round K ∈ K until all rounds expire.

S5: initializing random noise

S6: and initializing the current environment state information to obtain initial environment state information s (1).

S7: steps S8 to S19 are performed at each time slot T e T until all time slots terminate.

S8: carrying out caching decision, and caching the content of each roadside unit

And a backlog queue of tasks for each vehicle

As input to the job scheduling algorithm.

S9: and obtaining a communication time distribution decision according to the objective function and the job scheduling algorithm.

Specifically, the objective function is expressed as:

s10: and (4) performing task management in parallel aiming at each roadside unit, namely allocating communication time for each content request task of the vehicle user in the service range.

S11: according to

An action reward is calculated.

S12: and updating the coordinates of the vehicle to obtain the next time slot state s (t + 1).

S13: store the quadruple { s (t), (a), (t), (R), (t), s (t +1) } to the empirical replay pool R.

S14: randomly extracting M small-batch tuples { s (i), (a) (i), R (i), s (i +1) }fromthe empirical playback pool R.

S15: the target values were calculated as follows:

wherein, r is the action reward,

to be a network parameter of the critic's target network,

network parameters for the actor target network.

S16: the value network is trained by minimizing a loss function.

The specific calculation formula of the loss function is as follows:

S17: and training the strategy network according to the strategy gradient.

The specific calculation formula of the strategy gradient is as follows:

s18: by passing

And

and updating the network parameters of the critic target network and the actor target network.

S19: time slot t + 1.

S20: round k + 1.

Further, in step S9, according to the embodiment of the present invention, the objective function needs to satisfy the following constraint:

wherein the content of the first and second substances,

it can be seen that the above optimization problem belongs to the one-dimensional linear knapsack problem. Wherein the content of the first and second substances,

can be considered as the capacity of the backpack,

is the unit value of each item. For the knapsack problem, the optimal solution is to sequentially select the items with higher value and non-negative value to be put into the knapsack until the knapsack is filled or the items with non-negative value are all put out.

Specifically, the flow of step S9 is as follows:

s9.1: acquiring the remaining travel time of each vehicle in each roadside unit

S9.2: computing

Selecting the largest remaining travel time

And carrying out data downloading.

S9.3: steps S9.4 to S9.8 are performed for each vehicle.

S9.4: initializing a data acquisition time length for each content request

S9.5: and acquiring information such as the queue length of all content requests.

S9.6: for all content requests in accordance with

In descending order, i.e.

S9.7: according to the ordering of the current content requests, each content request is allocated a download time starting from the first content request, i.e. if

Then

On the contrary, the method can be used for carrying out the following steps,

s9.8: updating the current remaining allocable download time, i.e.

S9.9: and outputting communication time distribution decisions of all vehicles.

According to the scheme, the transmission energy consumption of the roadside unit is reduced by scheduling the job in the limited communication time. Based on delivery delay and roadside unit transmission energy consumption, the length of a request content queue is reduced under the condition that the time for a vehicle user to obtain the request content and the transmission energy consumption of the roadside unit are reduced, decision delay is reduced, and the system operation efficiency is improved.

Based on the same inventive concept, fig. 4 exemplarily shows a device for requesting content placement in a car networking scenario, which may be a flow of a method for requesting content placement in a car networking scenario according to an embodiment of the present invention.

The apparatus, comprising:

an obtaining module 401, configured to obtain request content and current environment state information; the current environmental state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units;

a processing module 402, configured to input the current environment state information into a trained content placement model, so as to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.

Further, the processing module 402 is specifically configured to:

the state information of the vehicle comprises the motion information of the vehicle and a task backlog queue; the state information of the roadside unit includes a remaining buffer capacity, a content buffer state, and a communication rate of the roadside unit.

Further, the processing module 402 is specifically configured to: the specific calculation formula of the remaining cache capacity is as follows:

the table does not contain the size of each coded segment.

Further, the processing module 402 is specifically configured to:

determining the action reward according to the action description information;

Further, the processing module 402 is specifically configured to:

obtaining a target function;

Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 5: a processor 501, a memory 502, a communication interface 503, and a communication bus 504;

the processor 501, the memory 502 and the communication interface 503 complete mutual communication through the communication bus 504; the communication interface 503 is used for implementing information transmission between the devices;

the processor 501 is configured to call a computer program in the memory 502, and the processor implements all the steps of the method for requesting content placement in the car networking scenario when executing the computer program, for example, the processor implements the following steps when executing the computer program: acquiring request content and current environment state information; the current environmental state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units; inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.

Based on the same inventive concept, yet another embodiment of the present invention provides a non-transitory computer-readable storage medium, having a computer program stored thereon, which when executed by a processor implements all the steps of the above method for requesting content placement in an internet of vehicles scenario, for example, the processor implements the following steps when executing the computer program: acquiring request content and current environment state information; the current environmental state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units; inputting the current environment state information into a trained content placement model to obtain action description information for placing the request content; placing the request content on a corresponding roadside unit according to the action description information; the trained content placement model is obtained by training with different environment state information.

In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a device requesting content placement in an internet-of-vehicles scenario, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially implemented in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a device for requesting content placement in an internet of vehicles scenario, or a network device, etc.) to execute the method for requesting content placement in an internet of vehicles scenario described in the embodiments or some parts of the embodiments.

In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Furthermore, in the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for requesting content placement in a vehicle networking scenario, comprising:

2. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 1, wherein the status information of the vehicle comprises motion information of the vehicle and a backlog queue of tasks; the state information of the roadside unit comprises the residual cache capacity and the content cache state of the roadside unit; the state information of the channel includes a communication rate.

3. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 1, wherein the content placement model comprises a value network, a policy network, an actor target network and a critic target network, and further comprises, before the inputting the current environmental status information into the trained content placement model to obtain the action description information for placing the requested content:

4. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 2, wherein the specific calculation formula of the remaining buffer capacity is as follows:

indicating the size of each encoded segment of the content f.

5. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 3, wherein said obtaining a preset number of training sample sets comprises:

determining the action reward according to the action description information;

6. The method for requesting content placement in an internet of vehicles scenario as claimed in claim 5, wherein said determining the action reward according to the action description information comprises:

7. The method for requesting content placement in an internet of vehicles scenario according to claim 5, wherein determining the task backlog queue corresponding to the second environment status information according to the content caching status corresponding to the first environment status information and the task backlog queue corresponding to the first environment status information comprises:

8. The method for requesting content placement in an internet of vehicles scenario of claim 7, wherein the determining a communication time allocation decision according to the content caching status corresponding to the first environment status information and the task backlog queue corresponding to the first environment status information comprises:

obtaining a target function;

9. An apparatus for requesting content placement in a vehicle networking scenario, comprising:

the acquisition module is used for acquiring request content and current environment state information; the current environmental state information comprises state information of a plurality of vehicles and state information of a plurality of roadside units;

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the processor executes the program.

11. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.