CN110665223A - Game resource caching method, decision network training method and device


Info

Publication number: CN110665223A
Application number: CN201911064155.0A
Authority: CN (China)
Prior art keywords: game, resource, cache, resources, decision network
Legal status: Granted; currently Active
Other languages: Chinese (zh)
Other versions: CN110665223B (en)
Inventors: 胡玥, 王蒙, 陈赢峰, 范长杰
Current Assignee: Netease Hangzhou Network Co Ltd
Original Assignee: Netease Hangzhou Network Co Ltd
Events: application filed by Netease Hangzhou Network Co Ltd; priority to CN201911064155.0A; publication of CN110665223A; application granted; publication of CN110665223B


Classifications

    • A63F13/49 — Video games: controlling the progress of the video game; saving the game status; pausing or ending the game
    • A63F13/60 — Video games: generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editors
    • A63F13/77 — Video games: game security or game management aspects involving data related to game devices or game servers, e.g. configuration data, software version or amount of memory
    • G06N3/08 — Computing arrangements based on biological models: neural networks; learning methods
    • Y02D10/00 — Climate change mitigation technologies in ICT: energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a game resource caching method, a decision network training method and a decision network training device, which relate to the technical field of games. The game resource caching method comprises the following steps: responding to a resource request, checking whether the target game resource corresponding to the resource request is cached in the cache; if the target game resource is not cached in the cache, acquiring the state information of the game; determining the cache position of the target game resource based on the state information and a pre-trained decision network; and caching the target game resource to the cache position. The game resource caching method, decision network training method and device provided by the invention can determine which game resources to evict by taking the player's behavior habits into account, so that resources likely to be requested later are kept in the cache in advance. This reduces frequent replacement of resources and improves the hit rate of requested resources in the cache.

Description

Game resource caching method, decision network training method and device
Technical Field
The invention relates to the technical field of games, in particular to a game resource caching method, a decision network training method and a decision network training device.
Background
In current games, a resource request is generated whenever a player enters a new game map or a special effect is produced during combat, and the corresponding resources need to be loaded into a cache. The main working principle of the cache is that resources frequently requested by the user are placed in the cache; when the user requests such a resource again, the CPU (central processing unit) can obtain it directly from the cache without fetching data from main memory, which improves processing speed.
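For ease of understanding, the lookup principle can be sketched as follows (a minimal illustration, not part of the original disclosure; all names such as Cache and load_from_main_memory are hypothetical):

```python
# Minimal sketch of the cache lookup principle described above.
class Cache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # resource_id -> resource data

    def get(self, resource_id, load_from_main_memory):
        if resource_id in self.entries:                  # cache hit: fast path
            return self.entries[resource_id]
        resource = load_from_main_memory(resource_id)    # cache miss: slow path
        if len(self.entries) < self.capacity:
            self.entries[resource_id] = resource         # free slot available
        # When the cache is full, a replacement policy (LRU, FIFO, or the
        # decision network described later) chooses which entry to evict.
        return resource
```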
Because game scenes, special effects, and the like change constantly, the in-game cache frequently needs to load and unload resources. Since the cache capacity is fixed, some commonly used resources are repeatedly loaded and unloaded, which consumes device resources, easily causes the game to stutter, and degrades the player's game experience. Resource management within the cache is therefore a very important part.
For resources in a game, it often happens that, after some time, a player repeatedly requests the same resources that were requested earlier. With existing resource management approaches, resources that were just deleted have to be reloaded; loading the same resource multiple times consumes a large amount of device resources, and the utilization rate of the resources actually loaded in the cache is low.
Disclosure of Invention
In view of the above, the present invention provides a game resource caching method, a decision network training method and an apparatus thereof, so as to alleviate the above technical problems.
In a first aspect, an embodiment of the present invention provides a game resource caching method, which is applied to a client of a game, and the method includes: responding to a resource request aiming at a virtual object in the game, and checking whether a target game resource corresponding to the resource request is cached in a cache of a cache memory; if the target game resources are not cached in the cache, obtaining state information of the game, wherein the state information comprises current position information of the virtual object and motion path information of the virtual object; determining the cache position of the target game resource based on the state information and a pre-trained decision network; the decision network is obtained by training based on the movement behavior of the virtual object in the game; and caching the target game resource to the cache position.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the step of determining the cache location of the target game resource based on the state information and the pre-trained decision network includes: inputting the state information into the pre-trained decision network to obtain score values output by the decision network, where each score value corresponds to a position in the cache at which the target game resource could be cached; and determining the cache location of the target game resource based on the score values.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the step of caching the target game resource to a cache location includes: and if the resources are cached in the cache position, replacing the cached resources in the cache position with the target game resources.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the step of caching the target game resource to the cache location includes: and if the cache position is empty, caching the target game resource corresponding to the resource request to the cache position.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the method further includes: and if the target game resources are cached in the cache, rendering the target game resources to a graphical user interface of the client.
In a second aspect, an embodiment of the present invention further provides a method for training a decision network, where the method includes: responding to the construction operation aiming at the game simulation environment, adding static resources, dynamic resources and moving bodies on a pre-established game simulation interface, and triggering the moving bodies and the dynamic resources to move on the game simulation interface; responding to each simulation resource request of the moving body in the moving process; for each simulated resource request, performing: acquiring state information of a game simulation environment, and applying the state information to carry out reinforcement learning training on a decision network to obtain a trained decision network; wherein the state information includes current position information of the moving body and motion path information of the moving body; the decision network is used for carrying out resource management on the cache for caching game resources.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation manner of the second aspect, where the step of adding the static resource, the dynamic resource, and the moving object to the pre-established game simulation interface includes: and adding an identifier corresponding to the static resource, an identifier corresponding to the dynamic resource, an identifier of the moving body and a preset moving end identifier of the moving body on a pre-established game simulation interface.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation manner of the second aspect, where the step of triggering the moving body and the dynamic resource to move on the game simulation interface includes: setting the motion modes of the moving body and the dynamic resource; triggering the moving body and the dynamic resources to move on the game simulation interface according to the corresponding movement mode so as to construct a game simulation environment; wherein the motion mode includes a fixed transition mode and a probabilistic transition mode.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation manner of the second aspect, where the step of applying the state information to perform reinforcement learning training on the decision network includes: applying the state information to construct a reinforcement learning environment for the decision network, and performing reinforcement learning training on the decision network in the reinforcement learning environment; the reinforcement learning environment comprises a state space, an action space and a reward function.
In a third aspect, an embodiment of the present invention further provides a game resource caching device, which is disposed at a client of a game, and includes: the detection module is used for responding to a resource request aiming at a virtual object in the game and checking whether a target game resource corresponding to the resource request is cached in a cache of a cache memory; the acquisition module is used for acquiring the state information of the game if the target game resource is not cached in the cache, wherein the state information comprises the current position information of the virtual object and the motion path information of the virtual object; the determining module is used for determining the cache position of the target game resource based on the state information and a pre-trained decision network; the decision network is obtained by training based on the movement behavior of the virtual object in the game; and the cache module is used for caching the target game resources to a cache position.
In a fourth aspect, an embodiment of the present invention further provides a device for training a decision network, where the device includes: the building module is used for responding to building operation aiming at the game simulation environment, adding static resources, dynamic resources and moving bodies on a pre-built game simulation interface, and triggering the moving bodies and the dynamic resources to move on the game simulation interface; the response module is used for responding to each simulation resource request of the moving body in the moving process; an execution module, configured to execute, for each simulated resource request: acquiring state information of a game simulation environment, and applying the state information to carry out reinforcement learning training on a decision network to obtain a trained decision network; wherein the state information includes current position information of the moving body and motion path information of the moving body; the decision network is used for carrying out resource management on the cache for caching game resources.
In a fifth aspect, the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the methods in the first to second aspects.
In a sixth aspect, embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions, which, when invoked and executed by a processor, cause the processor to implement the method of the first to second aspects.
The embodiment of the invention has the following beneficial effects:
the game resource caching method and the decision network training method and device provided by the embodiments of the invention can respond to a resource request for a virtual object in a game, check whether the target game resource corresponding to the resource request is cached in the cache of a cache memory, acquire state information of the game when the target game resource is not cached, determine the cache position of the target game resource based on the state information and a pre-trained decision network, and cache the target game resource. Because the decision network is trained on the motion behavior of the virtual object in the game, after acquiring the state information it can determine the cache position of the target game resource by taking the player's behavior habits into account, so that resources likely to be requested later are kept in the cache in advance. This reduces frequent replacement of resources and improves the hit rate of requested resources in the cache.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIGS. 1 (a) and (b) are schematic diagrams illustrating the implementation process of a cache management algorithm in the prior art, respectively;
FIG. 2 is a flowchart of a method for caching game resources according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for caching game resources according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for training a decision network according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for training a decision network according to an embodiment of the present invention;
FIG. 6 is a partial schematic view of a game simulation environment according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a training process provided by an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a game resource caching apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a training apparatus for a decision network according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Currently, commonly used cache management algorithms include the LRU (Least Recently Used) algorithm and the FIFO (First In First Out) algorithm. With the LRU algorithm, when the cache is full and a new resource needs to be loaded, the resource in the cache that has not been requested for the longest time is selected for replacement. The execution process of the algorithm is shown in (a) of fig. 1: the uppermost numbers represent the labels of the requested resources, in order from left to right; the lower part represents the cache, whose capacity is 5; a newly loaded resource is put into the cache with the highest priority. When resource No. 7 is requested in the fourth step, no new resource is loaded into the cache because resource No. 7 already exists there; since resource No. 7 was just requested, its priority is changed to the highest. The seventh, ninth and tenth requests behave the same way: no new resource is added to the cache, but the priorities of the resources in the cache change. When the last resource, No. 6, is requested, the cache capacity is insufficient, so resource No. 4, which has the lowest priority, is moved out of the cache and resource No. 6 is put in.
The FIFO algorithm is a first-in first-out algorithm: the priority in the cache is determined by the order in which resources entered the cache. When the cache is full, the resource that was put into the cache first is replaced. The execution process of the algorithm is shown in (b) of fig. 1; the uppermost numbers again represent the labels of the requested resources, and the cache capacity is 5. The priority of a resource in the cache is not affected by repeated requests, so the order of the resources in the cache does not change when resources No. 1 and No. 2 are requested a second time. The final request for resource No. 6 replaces resource No. 4, which was placed in the cache first.
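The two replacement policies can be summarized in a short sketch (an illustration added to this description, not part of the original patent; a capacity of 5 matches the example in fig. 1):

```python
from collections import OrderedDict, deque

def lru_trace(requests, capacity=5):
    """LRU: a hit makes the resource most recently used; on a miss with a
    full cache, the least recently used resource is evicted."""
    cache = OrderedDict()
    for r in requests:
        if r in cache:
            cache.move_to_end(r)               # hit: raise priority to highest
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)      # evict least recently used
            cache[r] = True
    return list(cache)

def fifo_trace(requests, capacity=5):
    """FIFO: hits do not change priority; the oldest resource is evicted."""
    cache = deque()
    for r in requests:
        if r not in cache:
            if len(cache) >= capacity:
                cache.popleft()                # evict the first resource put in
            cache.append(r)
    return list(cache)
```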
The LRU algorithm and the FIFO algorithm introduced above are relatively common algorithms; most cache management schemes adopt the LRU algorithm, which is also used in current games. The LRU algorithm periodically deletes some of the resources used earlier and keeps the most recently used ones in order to load the resources requested by the player. However, when resources requested by the player earlier are requested again after a period of time, for example when the player returns to a mission point or the mission route forms a loop, a cache managed by the LRU algorithm has to reload the resources that were just deleted. Loading them multiple times consumes a large amount of device resources, and the utilization rate of the resources actually loaded in the cache is very low.
Based on this, the game resource caching method, the decision network training method and the decision network training device provided by the embodiment of the invention can alleviate the technical problems, reduce frequent replacement of resources, and improve the hit rate of requested resources in cache.
For the convenience of understanding the present embodiment, a detailed description will be first given of a method for caching game resources disclosed in the present embodiment.
In a possible implementation manner, an embodiment of the present invention provides a game resource caching method, which is applied to a client of a game. In a specific implementation, the client of the game generally refers to an intelligent terminal installed with the application (APP) corresponding to the game, for example a smartphone, a tablet computer, a desktop computer, a handheld computer, or the like; the method provided by the embodiment of the present invention may be used to cache the game resources. Specifically, fig. 2 shows a flowchart of a game resource caching method, which includes the following steps:
step S202, responding to a resource request aiming at a virtual object in a game, and checking whether a target game resource corresponding to the resource request is cached in a cache of a cache;
the virtual object is usually the object in the game controlled by the player, and the resource request is usually issued by the game client for the current virtual object each time the player controls the virtual object to enter a new game map or a special effect is generated during combat, so as to render the game scene or game environment the virtual object is currently in and display it on the display interface of the client.
Specifically, the intelligent terminal corresponding to the client may directly search whether the target game resource corresponding to the resource request has been cached in the cache through the CPU.
Step S204, if the target game resource is not cached in the cache, acquiring the state information of the game;
wherein the state information includes current position information of the virtual object and motion path information of the virtual object;
specifically, if the target game resource is not in the cache, the target game resource needs to be acquired from the main memory, and the target game resource is cached in the cache for the CPU to call. Specifically, the target game resource needs to be cached at which position in the cache, and the decision is made according to the process of step S206.
Step S206, determining the cache position of the target game resource based on the state information and a pre-trained decision network;
specifically, the decision network is obtained by training based on the movement behavior of the virtual object in the game; in the embodiment of the invention, a decision network obtained based on motion behavior training of the virtual object in the game can be combined with behavior habits of the player, such as how the player controls the motion trajectory of the virtual object, and the like, so as to provide a reasonable cache management mode, and a more reasonable cache position in the cache is decided to cache the target game resource under the condition that the target game resource is not cached in the cache, so that the resource exchange times are reduced, the hit rate of the cached game resource in the cache is improved, and the total time consumed by loading the game resource and the equipment resource consumption in the game are reduced.
Step S208, caching the target game resource to the cache position.
The game resource caching method provided by the embodiment of the invention can respond to a resource request aiming at a virtual object in a game, check whether a target game resource corresponding to the resource request is cached in a cache of a cache memory, acquire state information of the game when the target game resource is not cached in the cache, and determine the caching position of the target game resource based on the state information and a pre-trained decision network; and caching the target game resources, wherein the decision network is obtained by training the motion behavior of the virtual object in the game, so that the decision network can determine the cache position of the target game resources by combining the behavior habits of the player after acquiring the state information, so that the resources which are possibly requested later are reserved in the cache in advance, the frequent replacement of the resources can be reduced, and the hit rate of the requested resources in the cache is improved.
In a specific implementation, the process of determining the cache location of the target game resource in step S206 is to input the state information into the decision network, which makes a decision based on the state information. Building on fig. 2, fig. 3 shows a flowchart of another game resource caching method. As shown in fig. 3, the method includes the following steps:
step S302, responding to a resource request aiming at a virtual object in a game, and checking whether a target game resource corresponding to the resource request is cached in a cache of a cache;
step S304, if the target game resource is not cached in the cache, obtaining the state information of the game;
wherein the state information includes current position information of the virtual object and motion path information of the virtual object;
generally, the state information may further include game resources cached in the cache, and in the embodiment of the present invention, the subsequent steps may be executed after the current position information of the virtual object and the motion path information of the virtual object are acquired, so that the game resources cached in the cache are not limited in the embodiment of the present invention.
Step S306, inputting the state information into a decision network trained in advance to obtain a score value output by the decision network;
the scoring value is a scoring value corresponding to each position for caching the target game resource into the cache;
step S308, determining the cache position of the target game resource based on the credit value;
step S310, caching the target game resource to the cache position;
specifically, if the resources are cached in the cache position, replacing the cached resources in the cache position with the target game resources; and if the cache position is empty, directly caching the target game resource corresponding to the resource request to the cache position.
In the embodiment of the present invention, the decision network is a network with a decision function obtained by training a neural network. Its input is the state information, which includes: the current state of the cache, i.e., the game resources already cached, usually represented by their resource number information; the number of the currently requested resource, i.e., the target game resource; and the current position information of the moving body, the destination position to be reached on the current path, and the path information, i.e., which of the selectable paths the moving body is currently on. The moving body is the virtual object controlled by the player in the game, so the motion information of the moving body is the current position information and the motion path information of the virtual object.
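A hedged sketch of how such state information might be assembled into a network input is given below (the field layout and encoding are assumptions made for illustration and are not specified by the patent):

```python
import numpy as np

def build_state(cache_slots, requested_id, position, endpoint, path_id, num_paths):
    """Concatenate the quantities listed above into one flat state vector:
    cached resource numbers (one per slot, -1 for empty), the requested
    resource number, the moving body's (x, y) position, the end-point
    position of the current path, and a one-hot path indicator."""
    path_one_hot = np.zeros(num_paths, dtype=np.float32)
    path_one_hot[path_id] = 1.0
    return np.concatenate([
        np.array(cache_slots, dtype=np.float32),    # e.g. [3, 7, -1, 5, 2]
        np.array([requested_id], dtype=np.float32),
        np.array(position, dtype=np.float32),       # current (x, y)
        np.array(endpoint, dtype=np.float32),       # destination (x, y)
        path_one_hot,
    ])
```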
For a trained decision network, a score value, i.e., a Q value, can be output for each action according to the state information. Each action can be represented as 0, 1, 2, …, C−2, C−1, corresponding to the positions in the cache; when the cache is full, an action means taking out the cached resource at that position and replacing it with the currently requested target game resource. The Q value is a concept from reinforcement learning used to measure how good an action is in a given state; generally, the action with the largest Q value is selected for execution.
In the embodiment of the present invention, an action is to cache the target game resource at a certain position in the cache, or to replace the game resource cached at a certain position with the target game resource requested by the current resource request.
For convenience of understanding, the cache with the capacity of 5 in fig. 1 is taken as an example for explanation, and it is assumed that game resources are cached in each location in the cache at this time, when the decision network acquires the input state information, the decision network may output a Q value of an action when each cached game resource is replaced according to the state information, for example, a score Q1 of replacing a first cached game resource, a score Q2 of replacing a second cached game resource, a score Q3 of replacing a third cached game resource, a score Q4 of replacing a fourth cached game resource, and a score Q5 of replacing a fifth cached game resource.
At this time, the position corresponding to the highest Q value may be selected as the cache position based on Q1 to Q5, the game resource cached at the cache position is taken as the game resource to be cleared, and a replacement action is performed to perform resource replacement, for example, if the action with the highest Q value is the fifth cached game resource corresponding to Q5, the fifth cached game resource in the cache and the target game resource corresponding to the resource request are replaced. Further, if the fifth cache location corresponding to Q5 is empty, the target game resource is directly cached to the fifth cache location.
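The selection and replacement step can be sketched as follows (assuming a decision network that maps a state to one Q value per cache position; the names are illustrative, not from the patent):

```python
import numpy as np

def cache_target_resource(decision_network, state, cache_slots, target_id):
    """Pick the cache position with the largest Q value and place the
    requested resource there, replacing whatever the slot currently holds."""
    q_values = decision_network(state)      # e.g. [Q1, Q2, Q3, Q4, Q5]
    position = int(np.argmax(q_values))     # position with the highest score
    evicted = cache_slots[position]         # None if the slot was empty
    cache_slots[position] = target_id       # cache (or replace) the resource
    return position, evicted
```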
And step S312, rendering the target game resources in the cache to a graphical user interface of the client.
Specifically, after the target game resource is cached to the cache position decided by the decision network, the CPU may obtain the target game resource in the cache to perform rendering, without obtaining resource data in the main memory.
In addition, for the case that the cache location is empty, after the target game resource is cached to the cache location, the game resource number and the location number in the currently stored cache can be recorded, so that the decision network can make a decision when the cache is full.
In practical use, the motion trajectories along which most players control their virtual objects are regular and cyclic. When the cache is managed with the game resource caching method provided by the embodiment of the invention, the decision network, which is trained on the motion behavior of virtual objects in the game, can decide which game resources to evict by taking the player's behavior habits into account and keep game resources that are likely to be requested later in the cache in advance, thereby effectively reducing frequent resource replacement and improving the hit rate of requested resources in the cache.
Further, for the decision network, an embodiment of the present invention further provides a method for training a decision network, and specifically, the method may be applied to a terminal with an operation function, such as a server, a computer, a cloud platform, and the like. Fig. 4 shows a flow chart of a training method of a decision network, as shown in fig. 4, the method comprising the steps of:
step S402, responding to the construction operation aiming at the game simulation environment, adding static resources, dynamic resources and moving bodies on a pre-established game simulation interface, and triggering the moving bodies and the dynamic resources to move on the game simulation interface;
specifically, the training process of the decision network in step S402 is equivalent to the building process of the game simulation environment, and specifically, the game simulation environment needs to be close to the real game environment, so that when a corresponding game simulation interface is built, the game simulation interface needs to be added with corresponding static resources, dynamic resources and moving objects, and the moving objects and the dynamic resources are triggered to move on the game simulation interface, so as to simulate the real game environment.
Step S404, responding each simulation resource request of the moving body in the moving process;
specifically, the moving object corresponds to a player, that is, a virtual object in a game controlled by the player, and the process of step S404 is a process of requesting game resources within a range of a field of view, which can be calculated according to the manhattan distance, when the moving object moves in the simulation environment.
Step S406, for each analog resource request, performs: acquiring state information of a game simulation environment, and applying the state information to carry out reinforcement learning training on a decision network to obtain a trained decision network;
the obtained trained decision network can be used for performing resource management on a cache for caching game resources, and the state information comprises current position information of the moving body and motion path information of the moving body.
In practical use, for each simulation resource request, the training execution subject (i.e., a terminal with an arithmetic function, such as a server, a computer, a cloud platform, etc.) may have a corresponding training action, i.e., perform a reinforcement learning training of the decision network according to the execution process of step S406.
In order to make the game simulation environment closer to a real game scene for training the decision network, when static resources, dynamic resources and moving bodies are added to the game simulation interface, identifiers are usually added to represent the corresponding resources and moving bodies, thereby constructing the game simulation environment. Therefore, on the basis of fig. 4, an embodiment of the present invention further provides another training method for a decision network, as shown in the flowchart of fig. 5; the method includes the following steps:
step S502, responding to the construction operation aiming at the game simulation environment, and adding an identifier corresponding to a static resource, an identifier corresponding to a dynamic resource, an identifier of a moving body and a preset movement destination identifier of the moving body on a pre-established game simulation interface;
in order to distinguish the static resources, the dynamic resources and the moving bodies in the game simulation environment, the identifiers corresponding to the static resources, the identifiers corresponding to the dynamic resources, the identifiers of the moving bodies and the preset movement destination identifiers of the moving bodies can be represented by graphs in different shapes, and different colors can be set for different resources and moving bodies for distinguishing.
In particular, the above-described moving bodies, i.e., virtual objects in a player-controlled game, are typically represented by white circles in a simulated environment. The preset motion endpoint identifier of the moving body is generally represented by a yellow square.
Further, the static resources generally represent fixed resources in the game, such as fixed NPCs (Non-Player Characters), which may be represented by blue triangles in the game simulation environment; the dynamic resources represent resources that move in the game, such as other players or regularly moving NPCs, and are represented by orange triangles in the game simulation environment.
In a specific implementation, the shape and color of the mark and the number of the preset movement end points of the moving body may be set according to an actual use situation, which is not limited in the embodiment of the present invention.
Step S504, setting the motion mode of the moving body and the dynamic resource;
in the embodiment of the present invention, the motion mode generally includes a fixed transition mode and a probability transition mode.
Specifically, the fixed transition mode is to reach the first end point first, then transition to the second end point, and so on, taking the number of the preset motion end points of the moving body as 10 as an example, the transition probability matrix can be expressed as:
[The transition probability matrix for the fixed transition mode is shown as a figure in the original publication and is not reproduced here.]
the probability transition mode means that after the moving body reaches the first end point, it goes to the second end point with high probability but may also go to other end points. Similarly, taking the number of preset motion end points of the moving body as 10 as an example, the transition probability matrix may be represented as:
[The transition probability matrix for the probability transition mode is likewise shown as a figure in the original publication and is not reproduced here.]
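As an illustration of the two modes (the concrete matrices appear only as figures in the original publication; the probability values below are assumptions chosen for illustration, not the patent's values):

```python
import numpy as np

NUM_ENDPOINTS = 10

# Fixed transition mode: from end point i the body always moves to end point
# i + 1 (wrapping around), so each row of the matrix contains a single 1.
fixed_matrix = np.zeros((NUM_ENDPOINTS, NUM_ENDPOINTS))
for i in range(NUM_ENDPOINTS):
    fixed_matrix[i, (i + 1) % NUM_ENDPOINTS] = 1.0

# Probability transition mode: the next end point is reached with high
# probability, while every other end point keeps a small probability.
HIGH, LOW = 0.82, 0.02            # 0.82 + 9 * 0.02 = 1.0 (illustrative values)
prob_matrix = np.full((NUM_ENDPOINTS, NUM_ENDPOINTS), LOW)
for i in range(NUM_ENDPOINTS):
    prob_matrix[i, (i + 1) % NUM_ENDPOINTS] = HIGH
assert np.allclose(prob_matrix.sum(axis=1), 1.0)
```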
the motion body is simulated through the two motion modes, namely the regular motion track of the virtual object in the game controlled by the player in the game.
Further, the motion of the dynamic resources, which represent other players or regularly moving NPCs, may also be implemented with these two motion modes. In a specific implementation, the motion mode of each moving body and each dynamic resource may be preset; the modes may be the same or different and can be set according to the actual situation and the game to be simulated, so as to simulate the motion of other players in the real game and thereby construct the game simulation environment.
Step S506, triggering the moving body and the dynamic resource to move on the game simulation interface according to the corresponding movement mode to construct a game simulation environment;
step S508, respond to each analog resource request of the moving body in the course of moving;
step S510, for each analog resource request, performs: and acquiring state information of the game simulation environment, and applying the state information to carry out reinforcement learning training on the decision network to obtain the trained decision network.
In the game simulation environment, the moving body requests simulated resources within its field of view (the field of view is calculated according to the Manhattan distance) during its movement. If a simulated resource is not in the cache and the corresponding cache position is empty, the resource can simply be put into the cache; if the simulated resource is not in the cache and the cache is full, the requested simulated resource needs to replace a resource at some position in the cache. Which position, i.e., which resource is selected for replacement when the cache is full, is decided by the decision network. In order to obtain this decision network, during training the decision network is subjected to reinforcement learning training for each simulated resource request in the manner of step S510 described above.
Furthermore, if the simulated resource corresponding to the simulated resource request is already in the cache, it is considered a hit. Typically, when the game simulation environment is built, resources in the cache are represented by pink triangles, and a resource hit by the current simulated resource request is represented by a red triangle.
In addition to being represented by different shapes and colors, the dynamic resources and static resources in the game simulation environment may also be represented by shapes and sequence numbers; a resource information table may be stored in advance in the training execution subject to record whether each sequence number corresponds to a dynamic resource, a static resource, the moving body, or a preset motion end point of the moving body. For ease of understanding, fig. 6 shows a partial schematic view of the game simulation environment, where triangles with different labels identify the dynamic resources and static resources, the resources in the cache and the hit resources, and the white circle identifies the moving body. The specific labels may be set according to actual use conditions, which is not limited in the embodiment of the present invention.
Further, in step S510, the applying the state information to perform reinforcement learning training on the decision network includes: applying the state information to construct a reinforcement learning environment of the decision network, and performing reinforcement learning training on the decision network in the reinforcement learning environment; the reinforcement learning environment comprises a state space, an action space and a reward function.
Specifically, since the decision network training is performed based on reinforcement learning in the present application, modeling and constructing a state space, an action space and a reward function in the reinforcement learning environment are required in the training process to construct the decision network.
The states in the state space correspond to the state information: the current state of the cache, i.e., the game resources already cached, usually represented by their resource number information; the number of the currently requested resource, i.e., the target game resource; and the current position information of the moving body, together with the destination position and path information of the current path, i.e., which of the selectable paths the moving body is currently on.
The action space corresponds to the actions in the training process. If the current cache size is C, the actions are 0, 1, 2, …, C−2, C−1, meaning that the cached resource at the corresponding position in the cache is taken out and replaced with the currently requested target game resource; if the action is i and the i-th position of the cache is empty, the action means that the currently requested resource is placed directly at the i-th position of the cache.
Further, the design of the reward function is divided into an intermediate reward and a final reward. The intermediate reward is the reward obtained for the action taken in each state during the movement of the moving body, and is designed as: add 1 for every hit. The final reward is the reward obtained after one movement path is completed, namely the final hit rate. The reward function is formulated as follows:

R = r_mid + r_final  (1)

r_mid = hit_num  (2)

r_final = α · hit_rate  (3)

where α is a coefficient used to balance the intermediate reward and the final reward, R denotes the reward function, r_mid denotes the intermediate reward, and r_final denotes the final reward.

In formula (2), the intermediate reward is the number of hits hit_num in each time slice: one hit adds 1, two hits add 2, and a time slice refers to the process from one state to the next.

In formula (3), the final reward is the total hit rate hit_rate of the moving body completing one complete movement. For example, taking the number of preset motion end points of the moving body as 10, the final reward is the total hit rate after the moving body has moved along all ten paths, i.e., the number of hits divided by the total number of simulated resource requests.
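A direct translation of formulas (1)–(3) into code could look as follows (a sketch; the value of α is an assumption, since the patent only describes it as a balancing coefficient):

```python
ALPHA = 10.0  # assumed coefficient balancing intermediate and final rewards

def intermediate_reward(hits_in_time_slice):
    # formula (2): r_mid = hit_num, +1 per hit in the current time slice
    return float(hits_in_time_slice)

def final_reward(total_hits, total_requests):
    # formula (3): r_final = alpha * hit_rate over one complete movement
    hit_rate = total_hits / max(total_requests, 1)
    return ALPHA * hit_rate

def reward(hits_in_time_slice, total_hits=0, total_requests=0, done=False):
    # formula (1): R = r_mid + r_final (the final term only at episode end)
    r = intermediate_reward(hits_in_time_slice)
    if done:
        r += final_reward(total_hits, total_requests)
    return r
```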
Based on the game simulation environment and the reinforcement learning environment of the decision network, the decision network and the game simulation environment interact continuously. For ease of understanding, fig. 7 shows a schematic diagram of the training process. As shown in fig. 7, the game simulation environment gives the current state to the decision network; the decision network chooses an action according to the state and executes it; the game simulation environment then enters a new state and feeds back the obtained reward. The decision network, i.e., the algorithm managing the cache, can thus make a correct resource replacement strategy according to the current cache condition and keep resources that may be encountered later, so that the overall hit rate is high.
The moving body moves in the game simulation environment according to the configured fixed transition mode or probability transition mode and initiates a simulated resource request whenever it encounters a resource within its field of view. If the resource is in the cache, it is a hit; if it is not, the decision network makes a decision, determining which resource in the cache is to be replaced by the resource corresponding to the current simulated resource request. After the replacement, the game simulation environment enters the next state (the resources inside the cache change, and the moving body and the dynamic resources also move on), and the intermediate reward just obtained is fed back to the decision network. The decision network trains itself according to the {state, action, next state, reward} tuples.
Furthermore, training the decision network on the {state, action, next state, reward} tuples amounts to a process of continual trial and error by the moving body in the simulation environment; the decision network can learn the optimal cache resource replacement scheme, so that the final hit rate of the cache is higher.
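The interaction loop of fig. 7 and the training on {state, action, next state, reward} tuples could be implemented, for example, as a DQN-style update. The patent specifies reinforcement learning on Q values but not a particular algorithm, so the network layout, replay buffer, environment interface (env.reset/env.step) and hyperparameters below are all assumptions:

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    """Maps a state vector to one Q value per cache position (assumed MLP)."""
    def __init__(self, state_dim, cache_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, cache_size),
        )

    def forward(self, state):
        return self.net(state)

def train(env, state_dim, cache_size, episodes=1000, gamma=0.99,
          epsilon=0.1, batch_size=64):
    net = DecisionNetwork(state_dim, cache_size)
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
    replay = deque(maxlen=50_000)

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy choice of which cache position to replace
            if random.random() < epsilon:
                action = random.randrange(cache_size)
            else:
                with torch.no_grad():
                    q = net(torch.as_tensor(state, dtype=torch.float32))
                action = int(q.argmax())
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, done))
            state = next_state

            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                s, a, r, s2, d = map(np.array, zip(*batch))
                s = torch.as_tensor(s, dtype=torch.float32)
                s2 = torch.as_tensor(s2, dtype=torch.float32)
                idx = torch.as_tensor(a, dtype=torch.int64).view(-1, 1)
                q_sa = net(s).gather(1, idx).squeeze(1)
                with torch.no_grad():
                    next_q = net(s2).max(dim=1).values
                    not_done = 1.0 - torch.as_tensor(d, dtype=torch.float32)
                    target = torch.as_tensor(r, dtype=torch.float32) + gamma * next_q * not_done
                loss = nn.functional.mse_loss(q_sa, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return net
```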
In the game simulation environment, after training of the decision network is completed, the decision network can be tested to compare the hit rates of LRU, FIFO, a random strategy and the strategy of the reinforcement-learning-based decision network.
Specifically, Table 1 shows the results of one test experiment. The experiment in Table 1 was performed with only static resources in the game simulation environment and with the moving body using the fixed transition mode.
Table 1:
Algorithm          LRU      FIFO     Random   Optimal   Reinforcement learning
Average hit rate   0.3624   0.3840   0.3858   0.6487    0.5788
The specific game simulation environment settings are as follows: random seed: 0; size of the map corresponding to the game simulation interface: 20 by 20; number of static resources: 30; cache capacity: 10; field of view: within a Manhattan distance of 2. In the test, 10000 trials were run in the same game simulation environment and the average hit rate was calculated.
In Table 1, the optimal method means that, after the complete request list has been exported, the optimal result is computed with full knowledge of the subsequent resource requests; it serves as the highest achievable value of the average hit rate under this experimental setting. Random in the table refers to the average hit rate of random replacement. The random seed is used to generate the virtual environment and the resource distribution; to keep every environment setting identical, the random seed is fixed in the embodiment of the present invention, i.e., set to 0.
Further, Table 2 shows the results of another test experiment. To avoid interference between static resources and dynamic resources, the experiment in Table 2 was performed in a game simulation environment containing only dynamic resources, with the moving body and the dynamic resources moving according to the fixed transition mode and the probability transition mode, respectively.
Table 2:
[Table 2 appears only as an image in the original publication; the individual hit-rate values are not reproduced here.]
Specifically, the game simulation environment was set as follows: random seed: 0; size of the map corresponding to the game simulation interface: 20 by 20; number of dynamic resources: 40; cache capacity: 10; field of view: within a Manhattan distance of 2. As before, 10000 trials were run in the same game simulation environment and the average hit rate was calculated.
To sum up, the decision network training method based on reinforcement learning provided by the embodiment of the present invention includes the following steps:
(1) constructing a game simulation environment:
the game simulation environment includes static resources, dynamic resources and the moving body itself. The dynamic resources are regularly transferred between preset motion end points of each preset motion body in a fixed transfer mode or a probability transfer mode. The moving body itself regularly performs a fixed course movement or a probabilistic movement according to one of the two movement patterns.
(2) Constructing a reinforcement learning environment of a decision network:
designing a reinforcement learning environment comprising a state space, an action space and a reward function, and constructing a decision network. The state space comprises the current state of the cache, the resource number of the current request, the current position information of the moving body, and the destination position information and path information to be reached by the current path;
the action in the action space is replaced by selecting a certain resource in the cache;
the reward function is divided into an intermediate reward and a final reward; the intermediate reward adds 1 for each hit, and the final reward is the final hit rate.
(3) Training process:
the moving body moves regularly in the game simulation environment, data are collected, and the decision network is trained: the {state, action, next state, reward} tuples are recorded and used to train the decision network.
(4) Tests were conducted in a game simulation environment comparing hit rates of LRU, FIFO, random strategies and a decision network based on reinforcement learning.
As can be seen from the experimental results shown in Tables 1 and 2, the hit rate of the resource replacement strategy made by the reinforcement-learning-based decision network in the embodiment of the present invention is higher than that of the other algorithms.
In a real game environment, data from real players can be sampled in advance and the decision network trained with the method described above; the trained decision network is then used to decide which resource currently in the cache is to be replaced by the game resource corresponding to the resource request. Because it is trained on a large amount of player data, the decision network can choose behaviors with higher long-term benefit, which improves the hit rate of loaded resources and reduces resource consumption.
Corresponding to the above caching method for game resources shown in fig. 2, an embodiment of the present invention further provides a caching device for game resources, where the device is disposed at a client of a game, and as shown in fig. 8, the caching device for game resources includes:
a detection module 80, configured to respond to a resource request for a virtual object in a game, and check whether a target game resource corresponding to the resource request is cached in a cache of a cache memory;
an obtaining module 82, configured to obtain state information of the game if the target game resource is not cached in the cache, where the state information includes current position information of the virtual object and motion path information of the virtual object;
a determining module 84, configured to determine a cache location of the target game resource based on the state information and a pre-trained decision network; the decision network is obtained by training based on the movement behavior of the virtual object in the game;
the cache module 86 is configured to cache the target game resource to the cache location.
Further, corresponding to the above training method of the decision network shown in fig. 4, an embodiment of the present invention further provides a training device of the decision network, which may be disposed in a training execution subject, such as a server, a computer, a cloud platform, and the like, and as shown in fig. 9, the training device of the decision network includes:
the building module 90 is configured to add a static resource, a dynamic resource, and a moving body to a game simulation interface that is established in advance in response to a building operation for the game simulation environment, and trigger the moving body and the dynamic resource to move on the game simulation interface;
a response module 92, for responding to each simulation resource request of the moving body in the moving process;
an execution module 94, configured to, for each simulated resource request, perform: acquiring state information of a game simulation environment, and applying the state information to carry out reinforcement learning training on a decision network to obtain a trained decision network;
wherein the state information includes current position information of the moving body and motion path information of the moving body; the decision network is used for carrying out resource management on the cache for caching game resources.
An embodiment of the present invention further provides an electronic device, which includes a processor and a memory. The memory stores computer-executable instructions that can be executed by the processor, and the processor executes the computer-executable instructions to implement the above caching method for game resources or the above training method for the decision network.
Further, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when called and executed by a processor, cause the processor to implement the above caching method for game resources or the above training method for the decision network.
Fig. 10 is a schematic structural diagram of an electronic device. The electronic device includes a processor 101 and a memory 100, the memory 100 stores computer-executable instructions that can be executed by the processor 101, and the processor 101 executes the computer-executable instructions to implement the caching method for game resources or the training method for the decision network.
In the embodiment shown in fig. 10, the electronic device further comprises a bus 102 and a communication interface 103, wherein the processor 101, the communication interface 103 and the memory 100 are connected by the bus 102.
The memory 100 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one magnetic disk memory. Communication between a network element of the system and at least one other network element is realized through at least one communication interface 103 (wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, or the like may be used. The bus 102 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 102 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in fig. 10, but this does not indicate that there is only one bus or only one type of bus.
The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above methods may be performed by integrated logic circuits in hardware or by instructions in the form of software in the processor 101. The processor 101 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor 101 reads the information in the memory and, in combination with its hardware, completes the steps of the game resource caching method or the decision network training method of the foregoing embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the electronic device and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The computer program product of the game resource caching method, the decision network training method, and the apparatuses provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to execute the methods in the foregoing method embodiments; for specific implementations, reference may be made to the method embodiments, which are not repeated here.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted", "coupled", and "connected" are to be construed broadly, e.g., as a fixed connection, a detachable connection, or an integral connection; the connection may be mechanical or electrical; the elements may be connected directly or indirectly through an intermediate medium, or the connection may be an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific circumstances.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the foregoing embodiments are merely illustrative of the technical solutions of the present invention, and not restrictive; the scope of the present invention is not limited thereto. Any person skilled in the art can, within the technical scope of the present disclosure, modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (13)

1. A caching method for game resources, applied to a client of a game, the method comprising the following steps:
responding to a resource request aiming at a virtual object in the game, and checking whether a target game resource corresponding to the resource request is cached in a cache of a cache memory;
if the target game resource is not cached in the cache, state information of the game is acquired, wherein the state information comprises current position information of the virtual object and motion path information of the virtual object;
determining the cache position of the target game resource based on the state information and a pre-trained decision network; wherein the decision network is trained based on the movement behavior of the virtual object in the game;
and caching the target game resource to the cache position.
2. The method of claim 1, wherein determining the cache location of the target game resource based on the state information and a pre-trained decision network comprises:
inputting the state information into a pre-trained decision network to obtain score values output by the decision network, wherein the score values correspond to the respective positions in the cache at which the target game resource can be cached;
and determining the cache position of the target game resource based on the score values.
3. The method of claim 1, wherein caching the target game resource to the cache location comprises:
and if a resource is already cached at the cache position, replacing the resource cached at the cache position with the target game resource.
4. The method of claim 1, wherein caching the target game resource to the cache location comprises:
and if the cache position is empty, caching the target game resource corresponding to the resource request to the cache position.
5. The method of claim 1, further comprising:
and if the target game resource is cached in the cache, rendering the target game resource to a graphical user interface of the client.
6. A method of training a decision network, the method comprising:
responding to construction operation aiming at a game simulation environment, adding static resources, dynamic resources and a moving body on a pre-established game simulation interface, and triggering the moving body and the dynamic resources to move on the game simulation interface;
responding to each simulation resource request of the moving body in the moving process;
for each of the simulated resource requests, performing: acquiring state information of the game simulation environment, and applying the state information to carry out reinforcement learning training on a decision network to obtain the trained decision network;
wherein the state information includes current position information of the moving body and motion path information of the moving body; the decision network is used for carrying out resource management on the cache for caching game resources.
7. The method of claim 6, wherein the step of adding static resources, dynamic resources and moving objects on a pre-established game simulation interface comprises:
and adding the identifier corresponding to the static resource, the identifier corresponding to the dynamic resource, the identifier of the moving body and the preset movement end point identifier of the moving body on a pre-established game simulation interface.
8. The method of claim 6, wherein the step of triggering the motion of the moving body and the dynamic resource on the game simulation interface comprises:
setting a motion pattern of the moving body and the dynamic resource;
triggering the moving body and the dynamic resources to move on the game simulation interface according to corresponding movement modes so as to construct a game simulation environment;
wherein the motion patterns include a fixed transition pattern and a probabilistic transition pattern.
9. The method of claim 6, wherein applying the state information for reinforcement learning training of decision networks comprises:
applying the state information to construct a reinforcement learning environment of the decision network, and performing reinforcement learning training on the decision network in the reinforcement learning environment;
wherein the reinforcement learning environment comprises a state space, an action space and a reward function.
10. A caching device for game resources, arranged at a client of a game, the device comprising:
the detection module is used for responding to a resource request aiming at a virtual object in the game and checking whether a target game resource corresponding to the resource request is cached in a cache of a cache memory;
an obtaining module, configured to obtain state information of the game if the target game resource is not cached in the cache, where the state information includes current position information of the virtual object and motion path information of the virtual object;
the determining module is used for determining the cache position of the target game resource based on the state information and a pre-trained decision network; wherein the decision network is trained based on the movement behavior of the virtual object in the game;
and the cache module is used for caching the target game resource to the cache position.
11. An apparatus for training a decision network, the apparatus comprising:
the game simulation system comprises a building module, a game simulation interface and a control module, wherein the building module is used for responding to building operation aiming at a game simulation environment, adding static resources, dynamic resources and moving bodies on a pre-built game simulation interface, and triggering the moving bodies and the dynamic resources to move on the game simulation interface;
the response module is used for responding to each simulation resource request of the moving body in the moving process;
an execution module, configured to execute, for each of the simulated resource requests: acquiring state information of the game simulation environment, and applying the state information to carry out reinforcement learning training on a decision network to obtain the trained decision network;
wherein the state information includes current position information of the moving body and motion path information of the moving body; the decision network is used for carrying out resource management on the cache for caching game resources.
12. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of any of claims 1 to 9.
13. A computer-readable storage medium having computer-executable instructions stored thereon which, when invoked and executed by a processor, cause the processor to implement the method of any of claims 1 to 9.
CN201911064155.0A 2019-11-01 2019-11-01 Game resource caching method, decision network training method and device Active CN110665223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911064155.0A CN110665223B (en) 2019-11-01 2019-11-01 Game resource caching method, decision network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911064155.0A CN110665223B (en) 2019-11-01 2019-11-01 Game resource caching method, decision network training method and device

Publications (2)

Publication Number Publication Date
CN110665223A true CN110665223A (en) 2020-01-10
CN110665223B CN110665223B (en) 2023-04-21

Family

ID=69086090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911064155.0A Active CN110665223B (en) 2019-11-01 2019-11-01 Game resource caching method, decision network training method and device

Country Status (1)

Country Link
CN (1) CN110665223B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112817769A (en) * 2021-03-05 2021-05-18 网易(杭州)网络有限公司 Game resource dynamic caching method and device, storage medium and electronic equipment
CN113730916A (en) * 2020-05-27 2021-12-03 腾讯科技(深圳)有限公司 Resource loading method, device, equipment and medium based on virtual environment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515373A (en) * 2009-03-26 2009-08-26 浙江大学 Sports interactive animation producing method
CN104601603A (en) * 2015-02-26 2015-05-06 网易(杭州)网络有限公司 Game resource processing method and equipment
US20170017576A1 (en) * 2015-07-16 2017-01-19 Qualcomm Incorporated Self-adaptive Cache Architecture Based on Run-time Hardware Counters and Offline Profiling of Applications
US20170106283A1 (en) * 2015-10-16 2017-04-20 Microsoft Technology Licensing, Llc Automated generation of game event recordings
CN108650306A (en) * 2018-04-23 2018-10-12 何世容 A kind of game video caching method, device and computer storage media
CN109271253A (en) * 2018-09-14 2019-01-25 北京智明星通科技股份有限公司 A kind of resource allocation method, apparatus and system
CN109621431A (en) * 2018-11-30 2019-04-16 网易(杭州)网络有限公司 A kind for the treatment of method and apparatus of game action
CN110020290A (en) * 2017-09-29 2019-07-16 腾讯科技(深圳)有限公司 Web page resources caching method, device, storage medium and electronic device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515373A (en) * 2009-03-26 2009-08-26 浙江大学 Sports interactive animation producing method
CN104601603A (en) * 2015-02-26 2015-05-06 网易(杭州)网络有限公司 Game resource processing method and equipment
US20170017576A1 (en) * 2015-07-16 2017-01-19 Qualcomm Incorporated Self-adaptive Cache Architecture Based on Run-time Hardware Counters and Offline Profiling of Applications
US20170106283A1 (en) * 2015-10-16 2017-04-20 Microsoft Technology Licensing, Llc Automated generation of game event recordings
CN110020290A (en) * 2017-09-29 2019-07-16 腾讯科技(深圳)有限公司 Web page resources caching method, device, storage medium and electronic device
CN108650306A (en) * 2018-04-23 2018-10-12 何世容 A kind of game video caching method, device and computer storage media
CN109271253A (en) * 2018-09-14 2019-01-25 北京智明星通科技股份有限公司 A kind of resource allocation method, apparatus and system
CN109621431A (en) * 2018-11-30 2019-04-16 网易(杭州)网络有限公司 A kind for the treatment of method and apparatus of game action

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113730916A (en) * 2020-05-27 2021-12-03 腾讯科技(深圳)有限公司 Resource loading method, device, equipment and medium based on virtual environment
CN113730916B (en) * 2020-05-27 2024-01-26 腾讯科技(深圳)有限公司 Resource loading method, device, equipment and medium based on virtual environment
CN112817769A (en) * 2021-03-05 2021-05-18 网易(杭州)网络有限公司 Game resource dynamic caching method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110665223B (en) 2023-04-21

Similar Documents

Publication Publication Date Title
JP7296978B2 (en) Providing incentives to players to engage in competitive gameplay
US20080045335A1 (en) Replacing a Departing Player in a Game with a Waiting Player
CN110489340B (en) Game map balance testing method, device, equipment and storage medium
US20230076343A1 (en) Virtual item selection interface
CN110134375B (en) Game character behavior control method and device and readable storage medium
US11110352B2 (en) Object moving method and apparatus, storage medium, and electronic apparatus
CN110665223B (en) Game resource caching method, decision network training method and device
US20160354695A1 (en) Player model
CN113018866A (en) Map resource loading method and device, storage medium and electronic device
CN111589166A (en) Interactive task control, intelligent decision model training methods, apparatus, and media
CN111957047A (en) Checkpoint configuration data adjusting method, computer equipment and storage medium
Pittman et al. Characterizing virtual populations in massively multiplayer online role-playing games
US20240001240A1 (en) Live gameplay updates
US20230311002A1 (en) Decision model training method and apparatus
CN110263937B (en) Data processing method, device and storage medium
CN113713392A (en) Control method and device of virtual role, storage medium and electronic equipment
KR100510339B1 (en) Method and system for renewing screen using mechanics information
US20140357374A1 (en) Method and server for pvp team matching in computer games
JP6081638B1 (en) GAME CONTROL METHOD AND GAME PROGRAM
JP2020195772A (en) Team management method and system utilizing automatic recommendation for player position and trainer of sport game
CN111265871A (en) Virtual object control method and device, equipment and storage medium
US11878248B2 (en) Non-transitory computer-readable medium and video game processing system
KR102257891B1 (en) Asynchronous loading device and asynchronous loading method
US11426664B1 (en) Dynamic destruction of game objects
JP2020014715A (en) Video game processing program and video game processing system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant