CN113271338B - Intelligent preloading method for mobile augmented reality scene - Google Patents

Intelligent preloading method for mobile augmented reality scene

Info

Publication number
CN113271338B
CN113271338B (application CN202110445941.6A)
Authority
CN
China
Prior art keywords
user
content
sub
state
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110445941.6A
Other languages
Chinese (zh)
Other versions
CN113271338A (en)
Inventor
吴俊
韩雨琪
胡蝶
刘典
徐跃东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202110445941.6A priority Critical patent/CN113271338B/en
Publication of CN113271338A publication Critical patent/CN113271338A/en
Application granted granted Critical
Publication of CN113271338B publication Critical patent/CN113271338B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 - Network services
    • H04L 67/55 - Push-based network services
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/10 - Protocols in which an application is distributed across nodes in the network
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 - Protocols
    • H04L 67/131 - Protocols for games, networked simulations or virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention belongs to the technical field of wireless transmission, and particularly relates to an intelligent preloading algorithm for mobile augmented reality scenes. With this intelligent preloading algorithm, the edge server side learns the trajectory of the user and pushes the corresponding file to the user before the user reaches a given piece of holographic content. The push algorithm uses idle bandwidth to transmit holographic content, improving the transmission efficiency of the edge base station. When the motion trajectory of the user is not known in advance, the intelligent preloading algorithm treats the motion trajectory as a Markov decision process and adaptively learns the optimal preloading strategy. The mobile device selectively stores the received content in its own cache space to serve future requests. In particular, to solve the problem of non-convergent learning caused by sparsely deployed holographic content in the scene, a state-dependent Q learning algorithm is provided.

Description

Intelligent preloading method for mobile augmented reality scene
Technical Field
The invention belongs to the technical field of wireless transmission, and particularly relates to a cache deployment method for an Augmented Reality (AR) scene.
Background
While edge computing is generally considered useful for optimizing the AR experience of multiple users, specific solutions for such settings are still lacking. Because multi-user mobile AR applications require holographic content of high resolution and great diversity, solving the holographic content transmission problem of mobile AR devices becomes necessary. Currently, most mobile devices store all holographic content on the device when the AR application is downloaded, which on the one hand requires sufficiently large storage space on the user device and on the other hand makes it difficult to cope with real-time updates of three-dimensional AR content. To enable the continued growth of mobile AR, existing work utilizes edge servers so that the correct hologram is loaded onto the user device at the appropriate time. An edge server located near the mobile AR devices assists the AR experience: it stores and provides the user with the holographic content that the user may need within a particular area.
Disclosure of Invention
The invention aims to provide an intelligent preloading algorithm for mobile Augmented Reality (AR) scenes with high transmission efficiency and low computational complexity.
The user obtains an immersive virtual viewing experience through the mobile AR device. However, due to practical wireless transmission bandwidth limitations, when the number of user requests is large the base station cannot deliver the three-dimensional holographic content to every user on demand. As shown in Fig. 1, users follow different paths and access different holograms; before a user accesses a hologram, the edge base station may send the holographic content to the device side in advance, a step referred to as the preloading procedure.
In the cache deployment for mobile AR scenes, an intelligent preloading algorithm is adopted: the edge server side pushes the file to the user before the user reaches a given piece of holographic content. This preloading utilizes spare bandwidth to transmit holographic content, improving the transmission efficiency of the edge base stations. With this intelligent preloading strategy, in scenarios where the motion trajectory of the user is not known in advance, the motion trajectory is treated as a Markov decision process so that the optimal preloading strategy is learned adaptively.
The intelligent preloading algorithm formulates the problem as a Markov decision model; to learn in sparsely deployed AR scenes, the invention provides a state-dependent Q learning algorithm with the following specific steps:
(1) establishing a motion model of a user by sampling a motion track of the user and utilizing a Markov decision model;
(2) sampling the user motion model to obtain the current position of the user, and predicting the position of the user at the next moment according to the transition probability after continuously accumulating knowledge;
(3) discretizing the current position of the user into a plurality of sub-regions, each sub-region representing a state s_t, which is also a node of the Markov model; an edge represents the transition probability between two adjacent sub-regions, i.e., two states;
(4) when receiving the reward value, updating the Q table by using the reward value;
(5) when the reward value is not received, updating the Q value of the current state by using the state transition probabilities to the adjacent sub-regions, i.e., the Markov decision model.
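For illustration, a minimal Python sketch of steps (1)-(3) is given below: the motion model is estimated empirically from sampled trajectory steps, with a visit counter and a transition estimate named after the quantities L_{m,d,i} and P(m, d, i, j) used later in this description. The frequency-count estimator and the class interface are assumptions made for the example, since the patent gives the actual transition-probability formula only as an image.

```python
from collections import defaultdict

class MotionModel:
    """Empirical Markov motion model of one user over discretized sub-regions.

    L[(d, i)]    : number of times the user entered sub-region i moving in direction d
    C[(d, i)][j] : how many of those visits were followed by a move to sub-region j
    """

    def __init__(self):
        self.L = defaultdict(int)
        self.C = defaultdict(lambda: defaultdict(int))

    def observe(self, d, i, j):
        """Record one sampled step: moving in direction d, the user went from sub-region i to j."""
        self.L[(d, i)] += 1
        self.C[(d, i)][j] += 1

    def P(self, d, i, j):
        """Estimated transition probability to the adjacent sub-region j (0 until knowledge accumulates)."""
        visits = self.L[(d, i)]
        return self.C[(d, i)][j] / visits if visits > 0 else 0.0

    def predict_next(self, d, i):
        """Most likely sub-region at the next moment, or None if (d, i) has never been observed."""
        if self.L[(d, i)] == 0:
            return None
        return max(self.C[(d, i)], key=self.C[(d, i)].get)
```

In this sketch a state in the sense of step (3) is simply the sub-region index i; the travel direction d is carried alongside it so that the estimates line up with the P(m, d, i, j) notation of the later steps.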
The invention provides an intelligent preloading algorithm for a mobile augmented reality scene, which comprises the following specific steps:
(1) the edge server side learns the user's trajectory, treats the motion trajectory of the user as a Markov decision process, and adaptively learns the optimal preloading strategy. The whole area is divided into a plurality of sub-regions, and the state s_t of user m at time slot t is determined by the current sub-region i and the set of cacheable content. The behavior trace and direction of the user are recorded, and the transition probability P(m, d, i, j) of moving to the new state is calculated:
[transition probability formula, shown in the original as image BDA0003036903810000023]
(2) if the user receives pushed content, the user selects one item among the pushed content and all cached contents to discard; action a_t represents the holographic content discarded at each moment, and a_t = n denotes that file n is discarded;
(3) when the user equipment displays holographic content, the corresponding caching action obtains a reward; if holographic content n is displayed on the screen, the action a_t = n (discarding file n) should not be selected;
(4) updating the Q table: when the user device receives the reward r(s_t, a_t), then given the attenuation factor γ, the action-value function of the device is updated as:
[Q-value update formula, shown in the original as image BDA0003036903810000021]
ε-greedy exploration is adopted to avoid sub-optimal behavior caused by insufficient knowledge: given 0 < ε < 1, the device selects a random strategy with probability ε and the current best strategy with probability 1 - ε;
if no reward is received, the Q table is updated according to the current sub-region position i and each cached content n, using the Q values and the transition probabilities P(m, d, i, j) of the adjacent sub-regions j, where d denotes the direction of travel and L_{m,d,i} denotes the number of times that user equipment m has entered i from the current direction; the formula for updating the Q value is:
[Q-value update formula for the no-reward case, shown in the original as image BDA0003036903810000022]
here a dependency factor b, 0 < b < 1, models how strongly the Q table is updated from the Q tables of the neighboring sub-regions.
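A minimal Python sketch of the two update rules in step (4) follows. The reward branch uses the textbook Q-learning update with a learning rate alpha, which the patent does not name explicitly (only the attenuation factor γ is mentioned), and the no-reward branch blends the Q values of the adjacent sub-regions through the transition probabilities and the dependency factor b; since both formulas appear only as images in the original, the exact expressions here are illustrative assumptions.

```python
import random

# Q is assumed to be a nested dict: Q[state][action] -> float.

def select_action(Q, state, actions, eps):
    """epsilon-greedy exploration: random action with probability eps, otherwise the current best."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[state][a])

def update_with_reward(Q, s, a, r, s_next, actions, alpha, gamma):
    """Reward received: standard Q-learning update with attenuation (discount) factor gamma."""
    best_next = max(Q[s_next][a2] for a2 in actions)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

def update_without_reward(Q, i, a, neighbours, P, b):
    """No reward received: borrow knowledge from the adjacent sub-regions j, weighted by the
    transition probabilities P[(i, j)] and the dependency factor 0 < b < 1 (assumed blending rule)."""
    borrowed = sum(P[(i, j)] * Q[j][a] for j in neighbours)
    Q[i][a] = (1.0 - b) * Q[i][a] + b * borrowed
```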
The innovation points of the invention are as follows: in real scenes AR content is often deployed sparsely, so a traditional reinforcement learning algorithm fails to converge, and because the Q table is updated only when user m obtains a reward, the learning speed drops. The present invention therefore proposes a state-dependent Q learning algorithm: for a content n, the Q value of n can be obtained from the Q values of the adjacent sub-regions even when m has gathered no knowledge of its own. Once the available knowledge of sub-region i is sufficient, the Q value can be updated when the user enters sub-region i without borrowing from the neighboring sub-regions. The invention uses the dependency factor b to model how strongly the Q table is updated from the Q tables of the adjacent sub-regions.
Drawings
Figure 1 shows a multi-user mobile AR system. The users in the figure follow different paths and access different holograms. Before a user accesses a hologram, the edge base station may send the holographic content to the device side in advance; this process is called the preloading process.
FIG. 2 is an embodiment environment illustration.
Figure 3 is a graphical representation of the performance of the present invention.
Detailed Description
The invention applies the intelligent preloading algorithm for mobile augmented reality scenes to carry out the caching strategy, and provides a state-dependent Q learning algorithm to solve the non-convergence problem caused by sparse cache deployment. The motion model of the user is established by sampling the user's motion trajectory and using a Markov decision model; the current position of the user is obtained through sampling, and the position of the user at the next moment is predicted from the transition probabilities. At the same time, the user's current position is discretized into a plurality of sub-regions, each sub-region representing a state s_t, which is also a node of the Markov model; an edge represents the transition probability between two adjacent sub-regions, i.e., two states.
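As a small illustration of the discretization just described, a state can be represented as the pair of the sub-region index and the set of currently cached contents; the 1 × 1 cell size and 10 × 10 grid anticipate the embodiment below, and the helper name is hypothetical.

```python
def make_state(x, y, cached, cell=1.0, grid_w=10):
    """State s_t = (sub-region index, cached-content set) for a continuous position (x, y).
    Grid and cell sizes follow the embodiment; the flat row-major index is an assumption."""
    col = min(int(x // cell), grid_w - 1)
    row = min(int(y // cell), grid_w - 1)
    return (row * grid_w + col, frozenset(cached))
```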
When a reward value is received, the state-dependent Q learning algorithm updates the Q table using that reward value.
When no reward value is received, the state-dependent Q learning algorithm updates the Q value of the current state using the adjacent sub-regions, i.e., the state transition probabilities of the Markov decision model.
Embodiment:
(1) Assume a 10 × 10 mobile AR area in which each 1 × 1 cell is treated as a sub-region; at each time t the user moves to another sub-region, with 9 possible directions: up, down, left, right, upper-left, upper-right, lower-left, lower-right, and staying in place. There are 50 users and 8 AR contents in the network. As shown in Fig. 2, the black portions represent obstacles such as walls in the simulated real scene. User m determines its state s_t at time slot t; the state s_t is determined by the current sub-region i and the set of cacheable content. The numbers 1-8 in the figure represent the indices of the AR contents.
(2) The user's cache space is set to 2. If the user receives pushed content, the user selects one item among the current pushed content and all cached contents to discard. Action a_t represents the holographic content discarded at each instant rather than the content cached from time period t. Due to limited wireless bandwidth, device m will not receive holographic content at every moment; if m receives no holographic content at time t, it keeps the currently cached content and performs no discard action.
(3) When the user equipment displays holographic content, the corresponding caching action obtains a reward; the reward can be set according to parameters such as the number of holographic contents in the scene and the importance of each holographic content. If holographic content n is displayed on the screen, the action a_t = n (discarding file n) should not be selected.
(4) Updating the Q table: when the user device receives the reward r(s_t, a_t), the action-value function of the device is updated as follows:
[Q-value update formula, shown in the original as image BDA0003036903810000041]
ε-greedy exploration is adopted to avoid sub-optimal behavior caused by insufficient knowledge: given 0 < ε < 1, the device selects a random strategy with probability ε and the current best strategy with probability 1 - ε.
Otherwise, the Q table is updated as follows:
[Q-value update formula for the no-reward case, shown in the original as image BDA0003036903810000042]
The dependency factor b, 0 < b < 1, models how strongly the Q table is updated from the Q tables of the neighboring sub-regions.
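To make the embodiment concrete, the following Python sketch mocks up the environment described in steps (1)-(3) above: a 10 × 10 grid with obstacles, 8 holographic contents pinned to cells, a device cache of size 2, and a reward when the displayed content is already cached. The random content placement, the reward value of 1, and the helper names are illustrative assumptions; the actual layout is fixed in Fig. 2 and the reward is a tunable parameter.

```python
import random

# The 9 movement directions of the embodiment: the 8 neighbouring cells plus staying in place.
DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

class AREnvironment:
    """Toy version of the embodiment environment (assumed layout, not the one in Fig. 2)."""

    def __init__(self, size=10, n_contents=8, obstacles=()):
        self.size = size
        self.obstacles = set(obstacles)
        free = [(x, y) for x in range(size) for y in range(size) if (x, y) not in self.obstacles]
        # Pin each of the 8 contents (indices 1..8) to a random free cell.
        self.content_at = {cell: n for n, cell in enumerate(random.sample(free, n_contents), start=1)}

    def step(self, pos, direction):
        """Move one sub-region in the given direction, staying put at borders and obstacles."""
        x = min(max(pos[0] + direction[0], 0), self.size - 1)
        y = min(max(pos[1] + direction[1], 0), self.size - 1)
        return pos if (x, y) in self.obstacles else (x, y)

    def reward(self, pos, cache):
        """Reward 1 if the content displayed at this sub-region is already cached (value is an assumption)."""
        n = self.content_at.get(pos)
        return 1.0 if n is not None and n in cache else 0.0

def apply_push(cache, pushed, discard):
    """Device-side rule with cache space 2: on receiving a push, one item among the pushed
    content and the cached contents (the chosen action `discard`) is dropped; no push, no change."""
    new_cache = set(cache) | {pushed}
    new_cache.discard(discard)
    return new_cache
```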
The performance is shown in Figure 3. The preloading algorithm is compared under different numbers of iterations by evaluating the cumulative number of responded users Acc. The JPC strategy [1], the nearest-distance push (NDT) strategy, and the LRUC strategy [2] are used as benchmark algorithms; the preloading algorithm provided by the invention achieves a higher number of responded users Acc.
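For completeness, a hedged sketch of how the cumulative number of responded users Acc could be tallied, reusing the environment sketch above. The random push policy and the User class are placeholders invented for the example; the learned preloading policy or the JPC, NDT, and LRUC baselines would take their place.

```python
import random
from dataclasses import dataclass, field

@dataclass
class User:
    pos: tuple = (0, 0)
    cache: set = field(default_factory=set)

def random_push(n_contents=8, push_prob=0.3):
    """Placeholder push policy: occasionally push a random content index."""
    return random.randint(1, n_contents) if random.random() < push_prob else None

def evaluate_acc(env, users, n_iterations, cache_size=2):
    """Acc: cumulative count of requests that could be answered from the local cache."""
    acc = 0
    for _ in range(n_iterations):
        for u in users:
            u.pos = env.step(u.pos, random.choice(DIRECTIONS))
            if env.reward(u.pos, u.cache) > 0:
                acc += 1  # the displayed content was cached, so this user counts as responded
            pushed = random_push()
            if pushed is not None and pushed not in u.cache:
                u.cache.add(pushed)
                if len(u.cache) > cache_size:
                    # discard one of the previously cached items to respect the cache limit
                    u.cache.remove(random.choice(list(u.cache - {pushed})))
    return acc
```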
References:
[1] W. Chen and H. V. Poor, "Content Pushing With Request Delay Information," IEEE Transactions on Communications, 2017.
[2] D. Lee et al., "LRFU: a spectrum of policies that subsumes the least recently used and least frequently used policies," IEEE Transactions on Computers, 2001.

Claims (1)

1. An intelligent preloading method for a mobile augmented reality scene, characterized in that the edge server side pushes a file to the user before the user reaches a certain piece of holographic content; in order to learn the AR sparse deployment scene, a state-dependent Q learning algorithm is adopted, comprising the following steps:
(1) establishing a motion model of a user by sampling a motion track of the user and utilizing a Markov decision model;
(2) sampling the user motion model to obtain the current position of the user, and predicting the position of the user at the next moment according to the transition probability after continuously accumulating knowledge;
(3) discretizing the current position of the user into a plurality of sub-regions, each sub-region representing a state s_t, which is also a node of the Markov model; an edge represents the transition probability between two adjacent sub-regions, i.e., two states;
(4) when receiving the reward value, updating the Q table by using the reward value;
(5) when the reward value is not received, updating the Q value of the current state by using the adjacent sub-regions, i.e., the state transition probabilities in the Markov decision model;
the method comprises the following specific steps:
(1) the edge server side learns the user's trajectory, treats the motion trajectory of the user as a Markov decision process, and adaptively learns the optimal preloading strategy; the whole area is divided into a plurality of sub-regions, and the state s_t of user m at time slot t is determined by the current sub-region i and the cacheable content set; the behavior trace and direction of the user are recorded, and the transition probability P(m, d, i, j) of moving to the new state is calculated:
[transition probability formula, shown in the original as image FDA0003434248010000011]
(2) if the user receives pushed content, the user selects one item among the pushed content and all cached contents to discard; action a_t represents the holographic content discarded at each moment, and a_t = n denotes that file n is discarded;
(3) when the user equipment displays holographic content, the corresponding caching action obtains a reward; if holographic content n is displayed on the screen, the action a_t = n (discarding file n) should not be selected;
(4) updating the Q table: when the user device receives the reward r(s_t, a_t), then given the attenuation factor γ, the action-value function of the device is updated as:
[Q-value update formula, shown in the original as image FDA0003434248010000012]
ε-greedy exploration is adopted: given 0 < ε < 1, the device selects a random strategy with probability ε and the current optimal strategy with probability 1 - ε;
if no reward is received, updating the Q table according to the current sub-region position i and each cached content n, and updating the Q value by using the Q values and the transition probabilities P(m, d, i, j) of the adjacent sub-regions j according to the following formula:
[Q-value update formula for the no-reward case, shown in the original as image FDA0003434248010000013]
wherein d represents the direction of travel, L_{m,d,i} represents the number of times that user equipment m has entered i from the current direction, and β is a dependency factor, 0 < β < 1, modeling how strongly the Q table is updated from the Q tables of the adjacent sub-regions.
CN202110445941.6A 2021-04-25 2021-04-25 Intelligent preloading method for mobile augmented reality scene Active CN113271338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110445941.6A CN113271338B (en) 2021-04-25 2021-04-25 Intelligent preloading method for mobile augmented reality scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110445941.6A CN113271338B (en) 2021-04-25 2021-04-25 Intelligent preloading method for mobile augmented reality scene

Publications (2)

Publication Number Publication Date
CN113271338A CN113271338A (en) 2021-08-17
CN113271338B true CN113271338B (en) 2022-04-12

Family

ID=77229392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110445941.6A Active CN113271338B (en) 2021-04-25 2021-04-25 Intelligent preloading method for mobile augmented reality scene

Country Status (1)

Country Link
CN (1) CN113271338B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569443A (en) * 2019-03-11 2019-12-13 北京航空航天大学 Self-adaptive learning path planning system based on reinforcement learning
CN110989614A (en) * 2019-12-18 2020-04-10 电子科技大学 Vehicle edge calculation transfer scheduling method based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102199093B1 (en) * 2017-02-10 2021-01-06 닛산 노쓰 아메리카, 인크. Self-driving vehicle operation management, including operating a partially observable Markov decision process model instance
US11586974B2 (en) * 2018-09-14 2023-02-21 Honda Motor Co., Ltd. System and method for multi-agent reinforcement learning in a multi-agent environment
CN109587519B (en) * 2018-12-28 2021-11-23 南京邮电大学 Heterogeneous network multipath video transmission control system and method based on Q learning
CN110968816B (en) * 2019-12-23 2023-11-28 广东技术师范大学 Content caching method and device based on reinforcement learning and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569443A (en) * 2019-03-11 2019-12-13 北京航空航天大学 Self-adaptive learning path planning system based on reinforcement learning
CN110989614A (en) * 2019-12-18 2020-04-10 电子科技大学 Vehicle edge calculation transfer scheduling method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Samrat Nath, "Deep reinforcement learning for dynamic computation offloading and resource allocation in cache-assisted mobile edge computing systems," IEEE, 2020-11-30, full text. *
臧兆祥 et al., "Zeroth-order classifier system based on an average-reward reinforcement learning algorithm," Computer Engineering and Applications, 2016-06-17 (No. 21), full text. *

Also Published As

Publication number Publication date
CN113271338A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN113094982B (en) Internet of vehicles edge caching method based on multi-agent deep reinforcement learning
US6834329B2 (en) Cache control method and cache apparatus
CN104796439B (en) Web page push method, client, server and system
CN110430440A (en) Video transmission method, system, computer equipment and storage medium
US11373062B1 (en) Model training method, data processing method, electronic device, and program product
CN112752308B (en) Mobile prediction wireless edge caching method based on deep reinforcement learning
CN114973673B (en) Task unloading method combining NOMA and content cache in vehicle-road cooperative system
Isaacman et al. Low-infrastructure methods to improve internet access for mobile users in emerging regions
CN117221403A (en) Content caching method based on user movement and federal caching decision
CN116321307A (en) Bidirectional cache placement method based on deep reinforcement learning in non-cellular network
CN112996058A (en) User QoE (quality of experience) optimization method based on multi-unmanned aerial vehicle network, unmanned aerial vehicle and system
EP3193490B1 (en) Method and system for distributed optimal caching of content over a network
Li et al. DQN-enabled content caching and quantum ant colony-based computation offloading in MEC
CN112911614B (en) Cooperative coding caching method based on dynamic request D2D network
CN115361710A (en) Content placement method in edge cache
CN113141634B (en) VR content caching method based on mobile edge computing network
CN113271338B (en) Intelligent preloading method for mobile augmented reality scene
Wang et al. Edge Caching with Federated Unlearning for Low-latency V2X Communications
CN115904731A (en) Edge cooperative type copy placement method
CN111901833A (en) Unreliable channel transmission-oriented joint service scheduling and content caching method
JP7174372B2 (en) Data management method, device and program in distributed storage network
CN111901394A (en) Method and system for caching moving edge by jointly considering user preference and activity degree
CN112822726B (en) Modeling and decision-making method for Fog-RAN network cache placement problem
CN117939505B (en) Edge collaborative caching method and system based on excitation mechanism in vehicle edge network
Wang et al. Multi-Agent Deep Reinforcement Learning for Cooperative Edge Caching via Hybrid Communication

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant