CN112087489A

CN112087489A - Relay forwarding selection method and system for online mobile game network transmission

Info

Publication number: CN112087489A
Application number: CN202010776245.9A
Authority: CN
Inventors: 朱国伟; 顾维玺; 马戈; 吕衎; 黄启洋; 王青春
Original assignee: China Industrial Internet Research Institute
Current assignee: Beijing Gonglian Technology Co ltd
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2020-12-15
Anticipated expiration: 2040-08-05
Also published as: CN112087489B

Abstract

The invention discloses a relay forwarding selection method and system for online mobile game network transmission. The method comprises the following steps: S1, deploying relay forwarding servers, using geographically distributed CDN servers as forwarding nodes; S2, designing a data-driven game quality model to accurately determine the game quality of a session connection; S3, determining a relay forwarding selection strategy; S4, designing a network hierarchical control platform that collects game session connection information in real time, performs model training, user experience calculation and relay forwarding selection, and thereby guides network optimization for game distribution. The invention can analyze the network conditions in the system from a global perspective, thereby maximizing the overall user experience.

Description

Relay forwarding selection method and system for online mobile game network transmission

Technical Field

The invention relates to a relay forwarding selection method and a relay forwarding selection system for online mobile game network transmission, and belongs to the technical field of network transmission and control, network optimization, machine learning and data analysis.

Background

With the deep convergence of the mobile internet and online games, online mobile games are becoming one of the most important media entertainment services on the internet. According to statistics, by 2019 mobile games accounted for 45% of the global game market, and a large number of network applications represented by MOBA-type mobile games have emerged, including Honor of Kings, Contra, PUBG Mobile (Exhilarating Battlefield), CrossFire and the like. The growing popularity of online mobile games has caused a steep increase in the number of online users, and, compared with other streaming media services (voice, video and the like), highly interactive mobile games are more sensitive to network conditions such as latency, packet loss and jumps in latency. To provide a better user experience, relay forwarding servers can be deployed instead of relying on direct BGP connections. The relay forwarding mechanism can significantly improve a user's network performance because the relay forwarding server can provide more bandwidth, so that the user interacts with the game server over a better-performing network path.

Although the relay forwarding mechanism is a good solution for improving users' network performance, how to allocate service resources effectively under this mechanism is an important issue: it must be determined which session connections should be relayed so as to improve the overall user experience of the online mobile game. An efficient allocation policy can maximize the overall user experience.

Traditional relay forwarding selection strategies are mostly rule-based, and this approach generally cannot dynamically select a session connection for a game from a global perspective, for the following reasons:

First, rule-based selection policies cannot analyze users' network performance from a global perspective. Under rule-based policies, game operators typically derive rules from historical data. The limitation of this approach is that it does not account for the dynamic variability of network conditions: users experience different network performance over time as background network traffic patterns change.

Second, there is a complex relationship between network quality of service and game quality. Moreover, the quality-of-service metrics depend on one another; for example, when latency increases, the packet loss rate and jitter also increase. In addition, implicit environmental factors affect both network quality of service and game quality, including static factors such as the autonomous system (AS), device type, AP and base station, and time factors such as busy and idle hours.

Disclosure of Invention

The invention aims to provide a relay forwarding selection method and a relay forwarding selection system for online mobile game network transmission, which aim to solve the problems in the background technology.

The invention first provides a relay forwarding system architecture in which the relay forwarding service uses geographically distributed CDN servers as forwarding nodes. A data-driven game quality model is then designed; the model considers quality-of-service factors and environmental factors together, so that the game quality of a session connection can be accurately determined. Further, the invention converts the relay forwarding selection problem into an optimization problem that maximizes the overall game quality, and provides a relay forwarding selection strategy for online mobile game session connections. Finally, the invention designs a network layered control platform that collects information on game session connections (including quality-of-service information, client information and the like) in real time, performs model training, user experience calculation and relay forwarding selection, and thereby guides network optimization for game distribution.

The invention relates to a relay forwarding selection method for online mobile game network transmission, which comprises the following specific processes:

S1, deploying relay forwarding servers, using geographically distributed CDN servers as forwarding nodes; the specific steps are as follows:

Each game session connection selects either direct connection or forwarding as its network connection path, and all data transmission in the session connection is completed over the established path. The paths of all session connections first pass through the operator's intercity routes, and the network path between a client and a forwarding node is determined by BGP. When forwarding is used, geographically distributed CDN servers serve as forwarding nodes; because the CDN servers are interconnected by a high-speed network, the network performance of the game session connection can be effectively improved.

In each session connection, after a user starts a game, the client sends active probes to the game server over the direct connection path and the forwarding path respectively, collects network information such as latency over a certain time period (the probing period depends on the implementation difficulty of the system and is generally 3 seconds or 5 seconds), and finally reports the network information to a control server. The control server decides whether to relay the game session connection according to the data reported by the client, the model and the constraints (bandwidth resources, server load and the like). Preferably, to prevent the load on the control server from exceeding its limit, a plurality of control servers can be deployed in parallel.
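One way the parallel control servers mentioned above could share the reporting load is to map each session to a server by hashing its identifier. The following is only a minimal sketch under that assumption; the function name `pick_control_server`, the server names and the use of SHA-256 are hypothetical and are not taken from the patent.

```python
import hashlib


def pick_control_server(session_id: str, servers: list[str]) -> str:
    """Map a session to one of several parallel control servers.

    Hypothetical scheme: a stable hash of the session identifier,
    reduced modulo the number of servers.
    """
    if not servers:
        raise ValueError("at least one control server is required")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

With this scheme the same session always reports to the same control server, which keeps the probe data for one session in one place.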

S2, designing a data-driven game quality model to accurately determine the game quality of a session connection:

first, establishing a basic prediction model using a typical machine learning algorithm;

then, extracting the environmental factors that influence quality of service and game quality;

and finally, determining the important environmental factors, dividing the data set according to these factors, and training a separate game quality model of the online mobile game on each partition for optimization.

S3, determining the relay forwarding selection strategy:

S31, collecting network performance data, game quality data, client data and the like of online mobile game session connections in the system, measuring and analyzing the data, and training the game quality model of the online mobile game according to the method in S2;

S32, loading the trained game quality model of the online mobile game;

S33, setting the exploration probability ε and the threshold η, and initializing the allocation policy F to empty; for each session connection s in the set S of session connections to be allocated, calculating the game quality predicted under the direct connection path and under the forwarding path, and calculating the difference Diff between the two. When the value of Diff exceeds the pruning threshold η, the session connection s is assigned the path that minimizes its Pred value;

S34, when the value of Diff is less than the threshold η, a path is selected with probability ε using the exploration-exploitation method (the algorithm is described below), and is assigned randomly with probability 1-ε.

The exploration-exploitation selection algorithm described in step S34 is as follows: the path selection of each session connection is cast as a reinforcement learning problem of maximizing the single-step reward, in which the option set C is the set of "actions" and the game experience obtained under each selection is the "reward". With probability ε the path of a session connection is selected using the Upper Confidence Bound (UCB) (exploitation), and with probability 1-ε it is selected at random (exploration).

Selecting the path of a session connection using the upper confidence bound comprises the following steps:

S341, acquiring the session connection s to be allocated, the option set C, the allocation policy F and the user experience model Pred;

S342, initializing the minimum confidence value ucb_min to infinity, and initializing the optimal allocation choice c_best for the session connection s;

S343, calculating the average game quality w obtained by the session connection s under each selection [the formula is not reproduced in this text];

S344, for each option c in the option set C, first obtaining from the allocation policy F the set of session connections assigned to option c, S_c = { s' | F(s') = c }, and then updating the confidence value ucb of option c [the update formula is not reproduced in this text];

if the value of ucb is less than ucb_min, updating the optimal choice c_best for the session connection s, and

updating ucb_min = ucb.
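The confidence formulas for these steps are not reproduced in this text, so the loop can only be illustrated under assumptions. The sketch below assumes a UCB1-style confidence term applied as a lower bound (because a smaller lag rate is better), uses Pred(s, c) directly in place of the average quality w, and adds 1 to the counts to avoid division by zero. The function name `ucb_select` and all of these choices are hypothetical, not the patent's formula.

```python
import math


def ucb_select(s, options, assignment, pred):
    """Pick the option with the smallest confidence-adjusted predicted lag rate.

    assignment: dict mapping already-allocated sessions to options (policy F).
    pred: callable pred(s, c) -> predicted lag rate (smaller is better).
    """
    total = len(assignment)
    ucb_min = math.inf
    c_best = None
    for c in options:
        # S_c = {s' | F(s') = c}: sessions currently assigned to option c.
        n_c = sum(1 for chosen in assignment.values() if chosen == c)
        # Assumed UCB1-style bonus: rarely chosen options get a larger discount.
        bonus = math.sqrt(2.0 * math.log(total + 1) / (n_c + 1))
        ucb = pred(s, c) - bonus
        if ucb < ucb_min:
            ucb_min = ucb
            c_best = c
    return c_best
```

When no sessions have been allocated yet, the bonus is zero and the choice reduces to the option with the smaller predicted lag rate; as one option accumulates sessions, the less-used option becomes more attractive.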

S4, designing a network hierarchical control platform, collecting information of game session connection in real time, performing model training, user experience calculation and relay forwarding selection, and further guiding network optimization under game distribution.

The network layered control platform comprises a model layer, a decision layer and a sensing/executing layer;

the model layer collects data of all session connections in the system for modeling in a certain period (the period is determined according to the specific design of the system and is generally 30-60 minutes), and pushes the model to all decision instances (a data center, a front-end server and the like) after training is finished. In addition, the model layer collects global information of the network, such as bandwidth, load and the like of the server, and pushes the global information to the decision instance, so that the decision instance can conveniently make a decision according to the multivariate information.

Decision instances of the decision layer (a decision instance is a decision program that determines whether to relay) are deployed on servers at different geographical locations. When a new session connection requests a connection, it is redirected to a specific decision instance (that is, a server running the relay forwarding program) through a load balancing mechanism (an existing load balancing technology such as Nginx). The decision instance performs the calculation according to the information reported by the session connection and the model, and then selects the path for the session connection.

The sensing/execution layer has three main functions: 1) reporting information, including quality-of-service information and game quality information, to the decision layer; 2) receiving and executing commands from the decision layer, such as frame rate adjustment and relay forwarding selection; 3) automatic fault tolerance: if the sensing/execution layer cannot receive instructions from the decision layer because the network is interrupted or a server is down, it ensures that the game quality of the client does not degrade rapidly and requests a new decision instance.

The invention relates to a relay forwarding selection system for online mobile game network transmission, which comprises:

a relay forwarding physical system architecture, which uses geographically distributed CDN servers as relay forwarding nodes, the CDN servers being interconnected by a high-speed network;

a data-driven game quality model training module, which is used for accurately calculating the game quality of a session connection according to quality-of-service factors and environmental factors;

a relay forwarding selection strategy module, which is used for loading the game quality model of the online mobile game and selecting session connection paths based on a reinforcement learning algorithm;

and a network layered control platform, which is used for collecting information on game session connections in real time, performing model training, user experience calculation and relay forwarding selection, and thereby guiding network optimization for game distribution.

The invention relates to a relay forwarding selection method and a system for online mobile game network transmission, which have the advantages and effects that: network conditions in the system can be analyzed from a global perspective, so that the overall user experience is maximized.

Drawings

FIG. 1 is a block diagram of the process of the present invention.

FIG. 2 illustrates a process of a game session connection and events associated therewith, according to an embodiment of the present invention.

Fig. 3 shows an online mobile game relay forwarding architecture according to an embodiment of the present invention.

Fig. 4 is a method route diagram of a game quality prediction model of an online mobile phone game according to an embodiment of the present invention.

Fig. 5 shows the steps of the relay forwarding selection policy in the embodiment of the present invention.

Fig. 6 shows a network layered control platform architecture according to an embodiment of the present invention.

Detailed Description

The technical solution of the present invention will be further described with reference to the accompanying drawings and examples.

(I) Description of business logic

As shown in fig. 1, the process of a user taking part in one round of a multiplayer battle game includes downloading updates, logging in, forming a team, fighting the battle, reporting the result, and the like. Specifically, during the team formation and battle phases of each game, the SDK embedded in the mobile client collects rich client information and reports the result after the game is finished. Unlike video and other kinds of games, the frames of a multiplayer battle mobile game are rendered by the rendering engine of the mobile client at a certain number of frames per second, so the game picture perceived by the user is a series of frames. During the battle, the client encapsulates the user's operation instructions into data packets and transmits them to the game server in real time over the network; the game server process computes on all received instruction packets and periodically sends the results, in the form of data packets, to all users in the game. As shown in the upper part of fig. 2, when the rendering of a frame is delayed because of poor network performance or insufficient phone performance, the user perceives that the game picture "floats", which results in a poor game experience; this phenomenon is called lag.

Through the SDK embedded in the client, the records in the data set can be collected from two aspects of network quality and client information:

Network quality: during a game session connection, the client measures the RTT experienced by the user by actively sending probe packets at regular intervals. The network metrics that can be collected for a game session connection include: 1) latency; 2) jitter; 3) packet loss rate; 4) jump rate; 5) the latency distribution of the game session connection, for example the proportions of samples falling between 0 and 100 milliseconds and between 100 and 200 milliseconds.

Client information: the SDK collects client information in two respects, game quality and device information.

1) Game quality: a frame whose rendering delay is greater than 200 milliseconds is defined as a lag frame, and the number of lag frames in each game session connection is counted. The invention defines the ratio of the number of lag frames to the total number of rendered frames as the lag rate, and uses the lag rate as the measure of game quality.

2) Device information: before the session starts, the SDK collects the device type, the game start time and some network connection information (Wi-Fi, base station, AS and the like). During the battle, dynamic information such as power consumption, CPU usage and memory usage is collected periodically.
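The lag rate and the latency distribution defined above can be computed from raw samples as in the following minimal sketch. The 200 millisecond threshold and the 0-100 and 100-200 millisecond buckets come from the text; the function names, the input format (lists of millisecond values) and the open-ended last bucket are assumptions made for illustration.

```python
LAG_THRESHOLD_MS = 200.0  # rendering-delay threshold stated in the text


def lag_rate(render_delays_ms):
    """Ratio of lag frames (delay > 200 ms) to all rendered frames."""
    if not render_delays_ms:
        return 0.0
    lag_frames = sum(1 for d in render_delays_ms if d > LAG_THRESHOLD_MS)
    return lag_frames / len(render_delays_ms)


def latency_distribution(rtts_ms, edges=(0.0, 100.0, 200.0)):
    """Share of RTT samples per bucket [edge_i, edge_{i+1}); last bucket is open-ended."""
    counts = [0] * len(edges)
    for rtt in rtts_ms:
        # Walk the edges from the top to find the bucket this sample falls in.
        for i in range(len(edges) - 1, -1, -1):
            if rtt >= edges[i]:
                counts[i] += 1
                break
    n = len(rtts_ms)
    return [c / n for c in counts] if n else [0.0] * len(edges)
```

For example, a session with four frames of which two exceed 200 milliseconds has a lag rate of 0.5.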

(II) Relay forwarding System architecture

Fig. 3 shows a schematic diagram of a relay forwarding architecture of an online mobile game. The relay forwarding service uses CDN servers distributed in various places as forwarding nodes. The forwarding nodes are connected through a high-speed network, and have the service characteristics of high bandwidth and low time delay, so that the network performance can be remarkably improved by using the nodes.

Each session connection selects a "direct" path or a "forward" path as a network connection path, and all data transmission in the session connection is completed through the established path. The path of all session connections first "passes" through the carrier's inter-city routes, and the network connection path between the client and the forwarding node is determined by BGP.

In each session connection, after the user completes team formation, the client sends active probes to the game server over the direct path and the forwarding path respectively, collects network information such as latency over a certain time period, and finally reports the network information to the control server. The control server makes its decision according to the data reported by the client, the model and the constraints (bandwidth resources, server load and the like). To prevent the load on the control server from exceeding its limit, a plurality of control servers can be deployed in parallel.

(III) Game quality prediction model

Macroscopically, the game quality can be expressed as a function of the quality of service, i.e., LagRatio = f(Quality_i), where LagRatio represents the measure of game quality and Quality_i represents the network quality of service, including latency, packet loss rate, jump rate and the like.

Fig. 4 shows the method route for designing the game quality prediction model of the online mobile game. The invention first uses a typical machine learning algorithm to establish a basic prediction model. Environmental factors that affect quality of service and game quality are then extracted. Finally, the important environmental factors are determined, the data set is divided according to these factors, and a separate game quality model of the online mobile game is trained on each partition for optimization. The concrete solution is as follows:

1. Resolving the complex relationships and interdependencies between factors: the invention employs a machine learning algorithm as the functional mapping between quality of service and game quality.

2. Determining important environmental factors: important environmental factors include static and dynamic environmental factors. The static environmental factors mainly include time, network connection, operator and the like; the dynamic environmental factors mainly refer to factors that influence quality of service and game quality, such as CPU usage, memory usage and power consumption.
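The idea of dividing the data set by an environmental factor and training one model per partition can be sketched as follows. The patent does not name the learning algorithm, so a one-variable least-squares line (lag rate as a function of latency) is used here purely as a stand-in; the function names and the record layout are assumptions for illustration.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; falls back to the mean if x is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return mean_y, 0.0
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cov / var_x
    return mean_y - b * mean_x, b


def train_per_segment(records):
    """records: iterable of (segment_key, latency_ms, lag_rate) tuples.

    Returns one (intercept, slope) model per environmental segment.
    """
    groups = defaultdict(lambda: ([], []))
    for key, latency, lag in records:
        groups[key][0].append(latency)
        groups[key][1].append(lag)
    return {key: fit_line(xs, ys) for key, (xs, ys) in groups.items()}


def predict(models, key, latency_ms):
    """Predicted lag rate for a session in the given segment."""
    a, b = models[key]
    return a + b * latency_ms
```

Here the segment key could be, for example, the network connection type; sessions on different connection types then get different latency-to-lag mappings.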

(IV) Relay Forwarding selection strategy

The problem to be solved is to assign a connection path, either a "direct" or a "forward" path, to each game session connection. The invention uses the symbol S to denote the set of session connections to be allocated and the symbol C to denote the option set (direct connection or forwarding); accordingly, s ∈ S and c ∈ C denote a particular session connection and a particular path choice. Q(s, c) denotes the user experience expected when the session connection s selects option c; the user experience model is described later. Since the lag rate of a session connection is used as the metric of online mobile game user experience, a smaller value of Q(s, c) represents a better user experience.

The purpose of the relay forwarding selection policy is to allocate a connection path to each s ∈ S. The invention uses F: S → C to denote the allocation rule output by the algorithm, where F(s) denotes the connection path allocated to s ∈ S. The optimization goal of the invention is to solve for an optimal allocation rule (the formula is not reproduced in this text).

Thus, the problem is an optimization problem with a minimization objective, and the goal of the algorithm is to find the optimal solution within its solution space.
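The objective formula itself does not appear in this text. One form that is consistent with the definitions of S, C, F and Q(s, c) above, and with the stated minimization goal, is given below; aggregating the per-session values by summation is an assumption.

```latex
F^{*} = \arg\min_{F : S \to C} \; \sum_{s \in S} Q\bigl(s, F(s)\bigr)
```

Under this reading, minimizing the total lag rate over all session connections corresponds to maximizing the overall game quality.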

Because the difference of the network performance of the session connection in the matching stage and the fighting stage is small, the indexes collected in the matching stage can be used for predicting the game quality. Since the client actively probes each candidate path ("direct" or "forward") in parallel during the matching phase of each session connection, the game quality can be predicted at each selection.

The invention provides a method that combines prediction with exploration-exploitation. Fig. 5 shows the procedure of the relay forwarding selection policy; the specific logic is:

1. obtaining the most recently trained game quality prediction model Pred, the exploration probability ε and the threshold η;

2. initializing the allocation policy F to empty;

3. for each session connection s in the set S of session connections to be allocated, using the game quality prediction model Pred to calculate the game quality under the "direct" path and under the "forward" path, and calculating the difference Diff between the two;

4. when the value of Diff exceeds the pruning threshold η, assigning the session connection s the path that minimizes its Pred value;

5. when the value of Diff is smaller than the threshold η, applying an ε-greedy scheme: with probability ε, selecting a path based on the Upper Confidence Bound (UCB), and with probability 1-ε, assigning a path randomly.

Throughout the process, game quality modeling is performed periodically (usually every 30-60 minutes), while data reporting and relay forwarding selection are performed online in real time.
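The five steps above can be sketched as a single allocation loop. This is a minimal illustration under stated assumptions: the UCB-based chooser is passed in as a callable (`ucb_choose`, a hypothetical name), the case where Diff equals η exactly is treated like the "smaller than" case because the text does not specify it, and the option labels "direct" and "forward" are illustrative strings.

```python
import random


def allocate(sessions, pred, epsilon, eta, ucb_choose, rng=None):
    """Return the allocation policy F as a dict: session -> "direct" or "forward".

    pred(s, c) is the predicted lag rate of session s on path c (smaller is better).
    ucb_choose(s, options, policy, pred) is a caller-supplied UCB-based chooser.
    """
    if rng is None:
        rng = random.Random()
    options = ("direct", "forward")
    policy = {}  # allocation policy F, initially empty
    for s in sessions:
        quality = {c: pred(s, c) for c in options}
        diff = abs(quality["direct"] - quality["forward"])
        if diff > eta:
            # Pruning: the prediction gap is large, take the smaller lag rate.
            policy[s] = min(options, key=quality.get)
        elif rng.random() < epsilon:
            # With probability epsilon, defer to the UCB-based chooser.
            policy[s] = ucb_choose(s, options, policy, pred)
        else:
            # With probability 1 - epsilon, assign a path at random.
            policy[s] = rng.choice(options)
    return policy
```

Passing the random generator explicitly makes the loop reproducible in tests; a production system would also have to apply the resource constraints before committing the allocation.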

The Upper Confidence Bound (UCB) based selection algorithm is as follows:

1. acquiring the session connection s to be allocated, the option set C, the allocation policy F and the user experience model Pred; |C| denotes the number of connection path choices;

2. initializing the minimum confidence value ucb_min to infinity, and initializing the optimal allocation choice c_best for the session connection s;

3. calculating the average game quality w obtained by the session connection s under each selection [the formula is not reproduced in this text];

4. for each option c in the option set C, first obtaining from the allocation policy F the set of session connections assigned to option c, S_c = { s' | F(s') = c }, and then updating the confidence value ucb of option c [the update formula is not reproduced in this text];

updating ucb_min = ucb.

In summary, the invention relies on prediction-based pruning: to reduce the search space of the exploration algorithm and improve decision efficiency, the search space is pruned according to the model's predictions. A threshold η is first defined. When the absolute difference between the predicted game quality of the two choices is greater than η, the connection path with the smaller predicted value is selected (a smaller predicted value represents higher game quality); when the absolute difference is less than η, path selection is performed by the reinforcement learning method.

Exploration-exploitation based selection algorithm: the path selection of each session connection is cast as a reinforcement learning problem of maximizing the single-step reward, in which the option set C is the set of "actions" and the game experience obtained under each selection is the "reward". In the exploration-exploitation method, with probability ε the path of a session connection is selected by the Upper Confidence Bound (UCB) algorithm (exploitation), and with probability 1-ε it is selected at random (exploration). This is done because the game quality of each session connection changes dynamically.

Constraints: in a real system environment, server and bandwidth resources are limited, so at most a certain proportion of session connections, for example 10%, can be relayed. To satisfy these constraints, the session connections with the largest game quality improvement can be selected for forwarding, for example the top 10% of session connections by game quality improvement.
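The resource constraint can be illustrated with a small selection routine that keeps only the sessions with the largest predicted improvement, up to a fixed share of all sessions. This is a sketch under assumptions: the function name `select_for_relay`, the input format (a mapping from session to predicted lag-rate reduction) and the rule that only sessions with a positive improvement are eligible are all illustrative choices.

```python
def select_for_relay(improvements, max_fraction=0.10):
    """Choose which sessions to relay under a capacity limit.

    improvements: dict mapping session -> predicted lag-rate reduction
    obtained by forwarding instead of connecting directly.
    max_fraction: largest share of all sessions that may be relayed.
    """
    # Small epsilon guards against floating-point error in the product.
    budget = int(len(improvements) * max_fraction + 1e-9)
    # Assumption: only sessions that actually benefit are candidates.
    candidates = [(gain, s) for s, gain in improvements.items() if gain > 0]
    candidates.sort(key=lambda item: item[0], reverse=True)
    return {s for _, s in candidates[:budget]}
```

With 20 sessions and a 10% limit, for instance, only the two sessions with the largest predicted improvement are forwarded and the rest stay on the direct path.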

(V) network layered control platform

To calculate user experience in real time and run the relay forwarding strategy to optimize game quality, the network hierarchical control platform needs to meet the following three requirements: 1) to cope with the dynamic changes in the network performance of game session connections, the control platform needs global awareness of the network conditions in the system; 2) the control platform needs to respond in real time to the establishment and state (latency, packet loss and the like) of session connections and make decisions quickly; 3) facing tens of millions of online users, the platform must be scalable.

To meet these requirements, a network hierarchical control platform for online mobile games is provided; its control logic is shown in fig. 6. The underlying idea is that the platform can achieve global awareness, real-time response and scalability while making decisions with high accuracy, even when the decisions are not based on the very latest data.

The model layer periodically collects data on all session connections in the system for modeling, and pushes the model to all decision instances (data centers, front-end servers and the like) after training is finished. In addition, the model layer collects global network information, such as server bandwidth and load, and pushes it to the decision instances so that they can make decisions based on multiple sources of information. Because the data center network has high bandwidth and low latency, information exchange between the model layer and the decision layer is fast and close to real time.

Decision instances of the decision layer are deployed on servers at different geographic locations. When a new session connection requests a connection, it is redirected to a specific decision instance through a load balancing mechanism. The decision instance performs the calculation according to the information reported by the session connection and the model, and then selects the path for the session connection.

The sensing/execution layer has three main functions: 1) reporting information, including quality-of-service information and game quality information, to the decision layer; 2) receiving and executing commands from the decision layer, such as frame rate adjustment and relay forwarding selection; 3) automatic fault tolerance: if the sensing/execution layer cannot receive instructions from the decision layer because the network is interrupted or a server is down, it ensures that the game quality of the client does not degrade rapidly and requests a new decision instance.
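The automatic fault tolerance of the sensing/execution layer can be pictured as a client-side fallback: try another decision instance when one is unreachable, and keep the current path if none responds. The sketch below is an illustration under assumptions; the function name `request_path`, the representation of decision instances as callables, and the use of `ConnectionError` and `TimeoutError` to signal failure are all hypothetical.

```python
def request_path(decision_instances, session_report, current_path="direct"):
    """Ask decision instances in turn; keep the current path if none answers.

    decision_instances: list of callables that take the session report and
    return "direct" or "forward", or raise ConnectionError / TimeoutError
    when the instance cannot be reached.
    """
    for ask in decision_instances:
        try:
            return ask(session_report)
        except (ConnectionError, TimeoutError):
            # This instance is unreachable; request a new decision instance.
            continue
    # No instance reachable: keep the current path so quality does not drop abruptly.
    return current_path
```

Keeping the existing path as the default means a control-plane outage does not by itself interrupt an ongoing game session.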

Claims

1. A relay forwarding selection method for online mobile game network transmission is characterized in that: the method comprises the following specific processes:

S1, deploying relay forwarding servers, using geographically distributed CDN servers as forwarding nodes;

S2, designing a data-driven game quality model to accurately determine the game quality of a session connection;

finally, determining the important environmental factors, dividing the data set according to the environmental factors, and training game quality models of the online mobile game separately for optimization;

S3, determining the relay forwarding selection strategy:

S31, collecting network performance data, game quality data and client data of online mobile game session connections in the system, measuring and analyzing the data, and training the game quality model of the online mobile game according to the method in S2;

S32, loading the trained game quality model of the online mobile game;

S33, setting the exploration probability ε and the threshold η, and initializing the allocation policy F to empty; for each session connection s in the set S of session connections to be allocated, respectively calculating the game quality under the direct connection path and under the forwarding path, and calculating the difference Diff between the two; when the value of Diff exceeds the pruning threshold η, assigning the session connection s the path whose Pred value is the minimum;

S34, when the value of Diff is less than the threshold η, selecting a path with probability ε using the exploration-exploitation algorithm, and assigning a path randomly with probability 1-ε;

2. The method of claim 1, characterized in that the specific process of step S1 is as follows:

each game session connection selects "direct connection" or "forwarding" as its network connection path, all data transmission in the session connection is completed over the established connection path, the paths of all session connections first pass through the operator's intercity routes, and the network path between a client and a forwarding node is determined by BGP;

in each session connection, after a user starts a game, the client sends active probes to the game server over the direct connection path and the forwarding path respectively, collects network information over a certain time period, and finally reports the network information to a control server; the control server decides whether to relay the game session connection according to the data reported by the client, the model and the constraints; preferably, to prevent the load on the control server from exceeding its limit, a plurality of control servers can be deployed in parallel.

3. The method of claim 1, characterized in that the exploration-exploitation algorithm described in step S34 is as follows: the path selection of each session connection is converted into a reinforcement learning problem of maximizing the single-step reward, wherein the option set C is the set of actions and the game experience obtained under each selection is the reward; with probability ε the path of a session connection is selected using the upper confidence bound, and with probability 1-ε it is selected using a random method.

4. The method of claim 3, characterized in that selecting the path of a session connection using the upper confidence bound comprises the following steps:

updating ucb_min = ucb.

5. The method of claim 1, characterized in that the network hierarchical control platform of step S4 includes a model layer, a decision layer and a sensing/execution layer;

the model layer periodically collects data on all session connections in the system for modeling, and pushes the model to all decision instances after training is finished; in addition, the model layer also collects global network information, such as server bandwidth and load, and pushes it to the decision instances so that they can make decisions based on multiple sources of information;

decision instances of the decision layer are deployed on servers at different geographic locations, a decision instance being a decision program that determines whether to perform relay forwarding; when a new session connection requests a connection, the session connection is redirected through a load balancing mechanism to a specific decision instance, namely a server running the relay forwarding program; the decision instance performs the calculation according to the information reported by the session connection and the model, and then selects the path for the session connection;

the sensing/execution layer has three main functions: 1) reporting information, including quality-of-service information and game quality information, to the decision layer; 2) receiving and executing commands from the decision layer; 3) automatic fault tolerance: if the sensing/execution layer cannot receive instructions from the decision layer because the network is interrupted or a server is down, it ensures that the game quality of the client does not degrade rapidly and requests a new decision instance.

6. A relay forwarding selection system for online mobile game network transmission is characterized in that: the system comprises: