CN112494938A - Game resource distribution method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112494938A (application CN202011419023.8A)
- Authority
- CN
- China
- Prior art keywords
- game
- resource
- accounts
- trained
- account
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- A63F13/45—Video games; controlling the progress of the video game
- A63F13/79—Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
- G06N3/08—Neural networks; learning methods
- A63F2300/5546—Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history
- A63F2300/63—Methods for processing data by generating or executing the game program for controlling the execution of the game in time
Abstract
The present disclosure relates to a game resource distribution method, apparatus, electronic device, and storage medium. The method comprises: acquiring current game state information of a plurality of game accounts in a current round of game play; performing feature extraction on the current game state information to obtain account features corresponding to the game accounts; determining, according to the account features corresponding to the plurality of game accounts, a resource configuration feature corresponding to the current round of game play; querying a game resource combination library according to the resource configuration feature to obtain a target game resource combination, wherein the plurality of game accounts have the highest predicted probability of participating in the next round of game play after obtaining the target game resource combination; and distributing the target game resource combination to the plurality of game accounts. The method and apparatus improve the utilization rate of game server resources.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for distributing game resources, an electronic device, and a storage medium.
Background
For example, in a card game such as a landlord (Fight the Landlord) game, the related art merely selects a card group from a curated card library according to a fixed card-control rule when dealing, and cannot control the deal according to each player's state, such as emotion or win rate. The resulting deals are poorly matched to the players, so game accounts participating in game play through the game server play for only a short time and quit game play frequently, and the resource utilization of the game server is therefore low.
Disclosure of Invention
The present disclosure provides a game resource distribution method and apparatus, an electronic device, and a storage medium, so as to at least solve the problem of low game server resource utilization in the related art. The technical solutions of the present disclosure are as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a game resource distribution method, the method including:
acquiring current game state information of a plurality of game accounts in a current round of game play;
performing feature extraction on the current game state information to obtain account features corresponding to the plurality of game accounts;
determining, according to the account features corresponding to the plurality of game accounts, a resource configuration feature corresponding to the current round of game play;
querying a game resource combination library according to the resource configuration feature to obtain a target game resource combination; wherein the plurality of game accounts have the highest predicted probability of participating in a next round of game play after obtaining the target game resource combination;
and distributing the target game resource combination to the plurality of game accounts.
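As a minimal illustration only (the helper names `extract_features`, `predict_config`, and `query_library` are hypothetical, not from the disclosure), the five steps of the first aspect can be sketched as:

```python
from typing import Callable, Dict, List

def distribute_game_resources(
    state_info: List[Dict],       # current game state info, one dict per account (S210)
    extract_features: Callable,   # feature-extraction step (S220)
    predict_config: Callable,     # maps account features to a resource configuration feature (S230)
    query_library: Callable,      # lookup in the game resource combination library (S240)
):
    """Sketch of steps S210-S250: state info -> features -> config -> combination."""
    account_features = [extract_features(info) for info in state_info]
    config_feature = predict_config(account_features)
    target_combination = query_library(config_feature)
    return target_combination     # then distributed to the accounts (S250)
```

Any concrete model can be plugged in for the three callables; the later implementation manners describe a prediction network or a reinforcement-learning agent for `predict_config`.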
In a possible implementation manner, the querying a game resource combination library according to the resource configuration feature to obtain a target game resource combination includes:
determining, according to the resource configuration feature, a target game resource combination sub-library among a plurality of game resource combination sub-libraries of the game resource combination library; wherein the similarity among the plurality of candidate game resource combinations in the target game resource combination sub-library meets a preset similarity threshold, and the feature similarity between the resource configuration feature and the resource combination features of the candidate game resource combinations is smaller than a preset feature similarity threshold;
and taking one candidate game resource combination in the target game resource combination sub-library as the target game resource combination.
In one possible implementation manner, the determining, as the target game resource combination, one of the candidate game resource combinations in the target game resource combination sub-library includes:
determining similarity between resource combination features of the plurality of sets of candidate game resource combinations and the resource configuration features;
and taking the candidate game resource combination with the highest similarity as the target game resource combination.
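The disclosure does not fix a similarity measure; assuming cosine similarity for illustration, selecting the candidate combination whose resource combination feature is most similar to the resource configuration feature can be sketched as:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_target_combination(config_feature, sub_library):
    """sub_library: list of (combination, combination_feature) pairs.
    Returns the candidate whose feature is most similar to the
    resource configuration feature."""
    best, _ = max(
        ((combo, cosine_similarity(config_feature, feat)) for combo, feat in sub_library),
        key=lambda pair: pair[1],
    )
    return best
```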
In one possible implementation manner, the determining, according to the account characteristics corresponding to each of the plurality of game accounts, the resource configuration characteristics corresponding to the current round of game play includes:
inputting account characteristics corresponding to the game accounts and preset candidate resource configuration characteristics into a pre-trained prediction network to obtain the prediction probability of the game accounts participating in the next round of game play after obtaining game resource combinations corresponding to the candidate resource configuration characteristics;
and taking the candidate resource configuration characteristic with the highest prediction probability as the resource configuration characteristic corresponding to the game play of the current round.
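This enumerate-and-score step can be sketched as follows; the `prediction_network` callable stands in for the pre-trained prediction network and is an assumption:

```python
def select_config_feature(account_features, candidate_configs, prediction_network):
    """Score each preset candidate resource configuration feature with the
    pre-trained prediction network, and keep the candidate with the highest
    predicted probability of the accounts joining the next round."""
    scores = [
        prediction_network(account_features, candidate)
        for candidate in candidate_configs
    ]
    best_index = max(range(len(candidate_configs)), key=scores.__getitem__)
    return candidate_configs[best_index]
```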
In one possible implementation manner, the determining, according to the account characteristics corresponding to each of the plurality of game accounts, the resource configuration characteristics corresponding to the current round of game play includes:
inputting a plurality of account characteristics corresponding to the plurality of game accounts into a pre-trained agent; the policy function of the pre-trained agent is to generate a first action responsive to the plurality of account characteristics; the first action is an action that the pre-trained agent determines the resource configuration characteristics corresponding to the current round of game play in preset candidate resource configuration characteristics; the plurality of game accounts have the highest prediction probability of participating in the next round of game play after obtaining the game resource combination corresponding to the resource configuration characteristics;
and outputting the first action by using the pre-trained agent to obtain the resource configuration characteristics corresponding to the current round of game play.
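A toy sketch of such an agent follows. The linear policy over concatenated account features and the greedy choice among candidate configuration indices are assumptions; the disclosure only specifies that the policy function generates the first action in response to the account features.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

class Agent:
    """Toy agent: a linear layer over the concatenated account features
    produces one logit per candidate resource configuration feature."""
    def __init__(self, weights):
        self.weights = weights  # one weight vector per candidate action

    def act(self, account_features):
        flat = [x for feats in account_features for x in feats]
        logits = [sum(w * x for w, x in zip(row, flat)) for row in self.weights]
        probs = softmax(logits)
        # greedy "first action": index of the chosen candidate configuration
        return max(range(len(probs)), key=probs.__getitem__)
```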
In one possible implementation manner, before the step of inputting the plurality of account features corresponding to the plurality of game accounts to the pre-trained agent, the method further includes:
acquiring first account characteristics corresponding to the plurality of game accounts in a first game play;
inputting a plurality of first account characteristics corresponding to the plurality of game accounts into an agent to be trained; the policy function of the agent to be trained is to generate a second action responsive to the plurality of first account characteristics; the second action is an action of determining the resource configuration characteristics corresponding to the first game play in the candidate resource configuration characteristics by the agent to be trained; the second game play is a next game play of the first game play;
acquiring reward data obtained after the agent to be trained outputs the second action; the reward data is determined according to the predicted probability that the plurality of game accounts participate in the second game play after obtaining the game resource combination corresponding to the resource configuration feature;
and training the intelligent agent to be trained according to the reward data to obtain the pre-trained intelligent agent.
In one possible implementation manner, the training the agent to be trained according to the reward data to obtain the pre-trained agent includes:
after the agent to be trained outputs the second action, acquiring, by the agent to be trained, second account features responding to the second action from the current game environment as new account features; the second account features are the account features corresponding to the plurality of game accounts after the first game play ends;
and updating the policy function of the agent to be trained based on the new account features and the reward data until the policy function of the agent to be trained converges, to obtain the pre-trained agent.
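A deterministic, state-free simplification of this policy update (gradient ascent on the expected reward of a softmax policy over the candidate configurations; the hyperparameters and the omission of the account-feature state are assumptions made for brevity):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def train_policy(rewards, iterations=500, lr=0.5):
    """rewards[i]: reward data for choosing candidate configuration i.
    Repeatedly updates the policy parameters along the gradient of the
    expected reward, d E[r]/d theta_i = p_i * (r_i - E[r]), until the
    policy concentrates on the highest-reward configuration."""
    theta = [0.0] * len(rewards)  # policy parameters (logits)
    for _ in range(iterations):
        probs = softmax(theta)
        expected = sum(p * r for p, r in zip(probs, rewards))
        theta = [t + lr * p * (r - expected) for t, p, r in zip(theta, probs, rewards)]
    return softmax(theta)
```

A full implementation would sample actions, condition the policy on the new account features, and use the probability prediction network's output as the reward, as the claim describes.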
In one possible implementation manner, the acquiring reward data obtained after the agent to be trained outputs the second action includes:
inputting the resource configuration characteristics corresponding to the first game play and the first account characteristics corresponding to the game accounts into a pre-trained probability prediction network to obtain the probability of the game accounts participating in the second game play;
deriving the reward data based on a probability of the plurality of gaming accounts participating in the second game round.
In one possible implementation, the method further includes:
obtaining game data of multiple rounds of historical game play; the game data of each round of historical game play comprises account characteristics of a plurality of historical game accounts, resource combination characteristics of historical resource combinations obtained by the plurality of historical game accounts, and game results corresponding to whether the plurality of historical game accounts participate in the next round of game play of the historical game play;
inputting the account features of the plurality of historical game accounts and the resource combination features of the historical resource combinations obtained by the historical game accounts into a probability prediction network to be trained, to obtain the predicted probability that the plurality of historical game accounts participate in the next round of game play of the historical game play;
and training the model parameters of the probability prediction network to be trained according to the difference between the predicted probability and the game result, until a trained probability prediction network is obtained as the pre-trained probability prediction network.
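A logistic-regression stand-in for this supervised training (the actual network in Fig. 4 is a neural network, which this sketch simplifies; the training signal, a cross-entropy-style difference between the predicted probability and the game result, is the same idea):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_probability_network(samples, epochs=300, lr=0.5):
    """samples: list of (x, y) pairs, where x concatenates the historical
    account features with the historical resource-combination feature, and
    y is 1 if the accounts joined the next round of that historical play.
    Trained by gradient descent on the binary cross-entropy between the
    predicted probability and the game result. Returns a predictor."""
    dim = len(samples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of BCE w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```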
In a possible implementation manner, after the step of distributing the target game resource combination to the plurality of game accounts, the method further includes:
obtaining game results of a plurality of game accounts after the game play of the current round is finished;
and updating each set of candidate game resource combination in the game resource combination library according to the game result.
According to a second aspect of the embodiments of the present disclosure, there is provided a game resource distribution apparatus including:
an acquisition unit configured to perform acquisition of current game state information of a plurality of game accounts in a current round of game play;
the extracting unit is configured to perform feature extraction on the current game state information to obtain account features corresponding to the game accounts;
the determining unit is configured to determine resource configuration characteristics corresponding to the game play in the current round according to account characteristics corresponding to the game accounts;
the query unit is configured to execute query in a game resource combination library according to the resource configuration characteristics to obtain a target game resource combination; the plurality of game accounts have the highest predicted probability of participating in the next round of game play after obtaining the target game resource combination;
a distribution unit configured to perform distribution of the target game resource combination to the corresponding plurality of game accounts.
In a possible implementation manner, the querying unit is specifically configured to determine a target game resource combination sub-library from a plurality of game resource combination sub-libraries of the game resource combination library according to the resource configuration feature; wherein, the similarity among a plurality of candidate game resource combinations in the target game resource combination sub-library meets a preset similarity threshold; the feature similarity between the resource allocation feature and the resource combination feature of the candidate game resource combinations is smaller than a preset feature similarity threshold; and taking one candidate game resource combination in the target game resource combination sub-library as the target game resource combination.
In a possible implementation manner, the query unit is specifically configured to perform determining a similarity between a resource combination feature of the plurality of sets of candidate game resource combinations and the resource configuration feature; and taking the candidate game resource combination with the highest similarity as the target game resource combination.
In a possible implementation manner, the determining unit is specifically configured to perform inputting, to a pre-trained prediction network, account characteristics corresponding to the game accounts and preset candidate resource configuration characteristics, so as to obtain a prediction probability that the game accounts participate in a next round of game play after obtaining a game resource combination corresponding to the candidate resource configuration characteristics; and taking the candidate resource configuration characteristic with the highest prediction probability as the resource configuration characteristic corresponding to the game play of the current round.
In one possible implementation, the determining unit is specifically configured to perform inputting a plurality of account features corresponding to the plurality of game accounts into a pre-trained agent; the policy function of the pre-trained agent is to generate a first action responsive to the plurality of account characteristics; the first action is an action that the pre-trained agent determines the resource configuration characteristics corresponding to the current round of game play in preset candidate resource configuration characteristics; the plurality of game accounts have the highest prediction probability of participating in the next round of game play after obtaining the game resource combination corresponding to the resource configuration characteristics; and outputting the first action by using the pre-trained agent to obtain the resource configuration characteristics corresponding to the current round of game play.
In a possible implementation manner, the determining unit is specifically configured to perform acquiring first account features corresponding to the plurality of game accounts in a first game play; inputting the plurality of first account features corresponding to the plurality of game accounts into an agent to be trained; the policy function of the agent to be trained is to generate a second action responsive to the plurality of first account features; the second action is an action of the agent to be trained determining, among the candidate resource configuration features, the resource configuration feature corresponding to the first game play; the second game play is a next game play of the first game play; acquiring reward data obtained after the agent to be trained outputs the second action; the reward data is determined according to the predicted probability that the plurality of game accounts participate in the second game play after obtaining the game resource combination corresponding to the resource configuration feature; and training the agent to be trained according to the reward data to obtain the pre-trained agent.
In a possible implementation manner, the determining unit is specifically configured to perform, after the agent to be trained outputs the second action, acquiring, by the agent to be trained, second account features responding to the second action from the current game environment as new account features; the second account features are the account features corresponding to the plurality of game accounts after the first game play ends; and updating the policy function of the agent to be trained based on the new account features and the reward data until the policy function of the agent to be trained converges, to obtain the pre-trained agent.
In a possible implementation manner, the determining unit is specifically configured to perform inputting the resource configuration feature corresponding to the first game play and the first account features corresponding to the plurality of game accounts into a pre-trained probability prediction network, to obtain the probability that the plurality of game accounts participate in the second game play; and deriving the reward data based on the probability of the plurality of game accounts participating in the second game play.
In one possible implementation manner, the game resource distribution apparatus further includes: a game data acquisition unit configured to perform acquisition of game data of a plurality of rounds of historical game play; the game data of each round of historical game play comprises account characteristics of a plurality of historical game accounts, resource combination characteristics of historical resource combinations obtained by the plurality of historical game accounts, and game results corresponding to whether the plurality of historical game accounts participate in the next round of game play of the historical game play; an input unit configured to perform input of the account characteristics of the historical game accounts and the resource combination characteristics of the historical resource combinations obtained by the historical game accounts into a probability prediction network to be trained, so as to obtain the predicted probability of the historical game accounts participating in the next round of the historical game play; and the training unit is configured to train model parameters of the probability prediction network to be trained according to the difference between the prediction probability and the game result until a trained probability prediction network is obtained and used as the pre-trained probability prediction network.
In one possible implementation manner, the game resource distribution apparatus further includes: the result acquisition unit is configured to execute the game result acquisition of a plurality of game accounts after the game play in the current round is finished; and the updating unit is configured to update each set of candidate game resource combination in the game resource combination library according to the game result.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the game resource distribution method according to the first aspect or any one of the possible implementation manners of the first aspect when executing the computer program.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements a game resource distribution method as set forth in the first aspect or any one of the possible implementation manners of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, the program product comprising a computer program, the computer program being stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, so that the device performs the game resource distribution method of any one of the possible implementations of the first aspect.
The technical solutions provided by the embodiments of the present disclosure bring at least the following beneficial effects. Current game state information of a plurality of game accounts in the current round of game play is obtained; feature extraction is performed on the current game state information to obtain account features corresponding to the game accounts; a resource configuration feature corresponding to the current round of game play is determined according to those account features; a target game resource combination is obtained by querying a game resource combination library according to the resource configuration feature, the plurality of game accounts having the highest predicted probability of participating in the next round of game play after obtaining the target game resource combination; and finally the target game resource combination is distributed to the corresponding game accounts. In this way, each game account obtains suitable game resources, the predicted probability that each game account participates in the next round of game play is maximized, the frequency with which game accounts quit game play is reduced, and the time for which they participate is prolonged, so the server does not need to continuously replace the game accounts in game play and its resource utilization is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram illustrating an application environment for a method of game resource distribution, according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of game resource distribution, according to an example embodiment.
FIG. 3 is a flow diagram illustrating a method of game resource distribution, according to an example embodiment.
Fig. 4 is a network diagram illustrating a probabilistic predictive network in accordance with an exemplary embodiment.
FIG. 5 is a flow diagram illustrating a method of game resource distribution, according to an example embodiment.
FIG. 6 is a block diagram illustrating a game resource distribution apparatus according to an example embodiment.
Fig. 7 is an internal block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure.
The game resource distribution method provided by the present disclosure can be applied to the application environment shown in fig. 1. The server 110 obtains current game state information of a plurality of game accounts in the current round of game play; then, the server 110 performs feature extraction on the current game state information to obtain account features corresponding to the plurality of game accounts; then, the server 110 determines the resource configuration characteristics corresponding to the game play of the current round according to the account characteristics corresponding to the game accounts; finally, the server 110 distributes the resources to be distributed according to the resource configuration characteristics, so that the prediction probability of the game accounts participating in the next round of game play after obtaining the game resource combination matched with the resource configuration characteristics is maximized. In practical applications, the server 110 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.
Fig. 2 is a flow chart illustrating a game resource distribution method according to an exemplary embodiment. In practical applications, game resources may refer to the resources used by the respective game accounts in a game play. For example, in a landlord (Fight the Landlord) card game, the game resources may be playing cards; in a mahjong game, the game resources may be mahjong tiles.
As shown in fig. 2, the method is used in the server 110 of fig. 1 and includes the following steps.
In step S210, current game state information of a plurality of game accounts in the current round of game play is acquired.
In step S220, feature extraction is performed on the current game state information, and account features corresponding to the plurality of game accounts are obtained.
The current game state information may refer to information corresponding to a game state of the plurality of game accounts before participating in the current round of game play. The current game state information may include, but is not limited to, recent (e.g., current day) win or loss information of each game account in the historical game information, player level information, card style information, current emotional state information, and the like.
In a specific implementation, taking a Fight the Landlord game as an example, when the server determines that the current round of game play is a card game, the server acquires the current game state information of the plurality of game accounts in the current round, such as recent win/loss information, player level information, card-playing style information, and current emotional state information. Then, the server performs feature extraction on the current game state information to obtain account features corresponding to the game accounts.
In step S230, the resource allocation feature corresponding to the current round of game play is determined according to the account feature corresponding to each of the plurality of game accounts.
In a specific implementation, after the server obtains the account features corresponding to the multiple game accounts, the server determines the resource configuration features corresponding to the current round of game play according to those account features. Specifically, the server may input the account features corresponding to each of the plurality of game accounts into a trained neural network together, and the trained neural network outputs resource configuration features suited to the current round of game play, such as deck characteristics.
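The mapping from concatenated account features to resource configuration (deck) features can be sketched as a small fully connected network. This is a minimal numpy illustration, not the patent's actual network: the layer sizes, feature count, and random weights are assumptions standing in for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def deck_features_from_accounts(account_feats, w1, b1, w2, b2):
    """Map concatenated per-account features to a deck-feature vector
    with a two-layer fully connected network (ReLU hidden layer)."""
    x = np.concatenate(account_feats)        # 3 accounts x 4 features -> (12,)
    h = np.maximum(0.0, x @ w1 + b1)         # ReLU hidden layer
    return h @ w2 + b2                       # deck-feature vector

# Toy weights standing in for a trained network.
w1 = rng.normal(size=(12, 8)); b1 = np.zeros(8)
w2 = rng.normal(size=(8, 3));  b2 = np.zeros(3)

# Three game accounts, four features each (e.g. win rate, level, style, mood).
accounts = [rng.random(4) for _ in range(3)]
deck_feats = deck_features_from_accounts(accounts, w1, b1, w2, b2)
```

The output vector plays the role of the "deck characteristics" that later drive the library query.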
In step S240, according to the resource allocation feature, a target game resource combination is obtained by querying in the game resource combination library.
In step S250, the target game resource combination is distributed to the plurality of game accounts, so that the predicted probability of each game account participating in the next round of game play after obtaining the target game resource combination is maximized.
Wherein the game resource combination may refer to a set of playing cards.
The game resource combination library comprises a plurality of sets of candidate game resource combinations corresponding to the resources to be distributed. In practical applications, the game resource combination library may be a card-control library in a Fight the Landlord game.
In a specific implementation, after the server determines the resource configuration features corresponding to the current round of game play, the server may distribute the resources to be distributed to each game account based on those features, so that the predicted probability of each game account participating in the next round of game play after obtaining the target game resource combination matched with the resource configuration features is maximized.
Taking a Fight the Landlord game as an example, after the server determines deck characteristics suitable for the current round of game play, the server may distribute playing cards to the game players based on those deck characteristics. Specifically, the server can select a group of cards meeting the requirements from the card-control library based on the deck characteristics and deal favorable hands to the game players, so that specific decks are distributed to players who are inclined to leave the game play, improving those players' probability of winning the current round and thereby maximizing the probability that each player participates in the next round after the current round ends.
To facilitate understanding by those skilled in the art, fig. 3 provides a flow diagram of a game resource distribution method. The server inputs the account features corresponding to the game accounts into the trained neural network together, and the trained neural network outputs deck characteristics suited to the current round of game play. The server then selects a deck meeting the requirements from the card-control library based on those deck characteristics and deals favorable hands to the game players.
In the game resource distribution method, current game state information of a plurality of game accounts in the current round of game play is acquired; features are extracted from the current game state information to obtain account features corresponding to the plurality of game accounts; resource configuration features corresponding to the current round of game play are determined according to those account features; a target game resource combination is obtained by querying the game resource combination library according to the resource configuration features, the target combination being the one that maximizes the predicted probability of the plurality of game accounts participating in the next round of game play after obtaining it; finally, the target game resource combination is distributed to the corresponding game accounts. In this way, each game account obtains suitable game resources and the predicted probability of each game account participating in the next round of game play is maximized, which reduces how often game accounts exit game play, extends the time game accounts spend in game play, and spares the server from continuously replacing game accounts in a game play, thereby improving the server's resource utilization.
In an exemplary embodiment, according to the resource allocation feature, querying the game resource combination library to obtain the target game resource combination includes: according to the resource allocation characteristics, determining a target game resource combination sub-library in a plurality of game resource combination sub-libraries of the game resource combination library; and taking one candidate game resource combination in the target game resource combination sub-library as a target game resource combination.
The similarity among the plurality of candidate game resource combinations in the target game resource combination sub-library satisfies a preset similarity threshold.
The feature similarity between the resource configuration feature and the resource combination features of the candidate game resource combinations is smaller than a preset feature similarity threshold.
In practical applications, the game resource combination library may be a card-control library in a Fight the Landlord game.
In practical applications, the server may extract deck characteristics such as card-strength contrast, landlord-hand strength, and excitement from historical game data in advance.
The server can then build a card-control library organized by these deck characteristics, containing decks with different characteristics. Thresholds are set according to the distribution of each characteristic, and the whole library is divided into different sub-libraries to meet the card-control needs of different scenarios.
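The threshold-based split into sub-libraries can be illustrated with a short sketch. The feature name ("excitement"), the deck records, and the threshold value are all illustrative assumptions, not values from the patent.

```python
# Hypothetical deck records, each tagged with one extracted feature.
decks = [
    {"id": 1, "excitement": 0.9},
    {"id": 2, "excitement": 0.2},
    {"id": 3, "excitement": 0.7},
]

THRESHOLD = 0.5  # would be set from the observed distribution of the feature

# Split the whole library into sub-libraries around the threshold.
sub_libraries = {"high_excitement": [], "low_excitement": []}
for deck in decks:
    key = "high_excitement" if deck["excitement"] >= THRESHOLD else "low_excitement"
    sub_libraries[key].append(deck)
```

In practice one threshold per feature would be applied, yielding one sub-library per region of feature space.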
In a specific implementation, the process by which the server queries the game resource combination library according to the resource configuration features to obtain the target game resource combination specifically includes:
the server may determine a target game resource combination sub-library among the plurality of game resource combination sub-libraries of the game resource combination library based on the resource configuration features. The feature similarity between the resource combination features of the candidate game resource combinations in the target sub-library and the resource configuration features is smaller than a preset feature similarity threshold.
Then, the server takes one of the candidate game resource combinations in the target game resource combination sub-library as the target game resource combination. Specifically, the server may determine a similarity between resource combination characteristics and resource configuration characteristics of a plurality of sets of candidate game resource combinations; and finally, taking the candidate game resource combination with the highest similarity as the target game resource combination.
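The highest-similarity selection can be sketched with cosine similarity as the (assumed) similarity measure; the deck identifiers and feature vectors are illustrative.

```python
import numpy as np

def pick_target_combination(config_feat, candidates):
    """Return the candidate combination whose feature vector has the
    highest cosine similarity to the resource configuration feature."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = [cosine(config_feat, c["features"]) for c in candidates]
    return candidates[int(np.argmax(sims))]

config = np.array([1.0, 0.0, 1.0])           # resource configuration feature
sub_library = [
    {"id": "deck_a", "features": np.array([0.9, 0.1, 0.8])},
    {"id": "deck_b", "features": np.array([0.1, 0.9, 0.1])},
]
target = pick_target_combination(config, sub_library)
```

Here `deck_a` is chosen because its feature vector points in nearly the same direction as the configuration feature.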
According to the technical scheme of this embodiment, a target game resource combination sub-library is determined among the plurality of sub-libraries of the game resource combination library according to the resource configuration features; the server then determines the similarity between the resource combination features of the multiple sets of candidate game resource combinations and the resource configuration features, and takes the candidate with the highest similarity as the target game resource combination. As a result, each game account can acquire suitable game resources, the predicted probability of each game account participating in the next round of game play is maximized, the frequency with which game accounts exit game play is reduced, and the server does not need to continuously replace game accounts in a game play, avoiding excessive consumption of server processing resources.
In an exemplary embodiment, determining resource configuration characteristics corresponding to the current round of game play according to account characteristics corresponding to each of a plurality of game accounts comprises: inputting account characteristics corresponding to a plurality of game accounts and preset candidate resource configuration characteristics into a pre-trained prediction network to obtain the prediction probability of participating in the next round of game play after a game resource combination corresponding to the candidate resource configuration characteristics is obtained by the plurality of game accounts; and taking the candidate resource configuration characteristic with the highest prediction probability as the resource configuration characteristic corresponding to the game play of the current round.
In a specific implementation, take a Fight the Landlord game, where the game resources are playing cards, as an example. In the card game, the server first obtains a pre-trained prediction network. The server then inputs the account features corresponding to the game accounts and preset candidate deck characteristics into the pre-trained prediction network, which maps the account features to retention rates corresponding to the candidate deck characteristics, yielding the predicted probability of the game accounts participating in the next round of game play after obtaining the decks corresponding to those candidate characteristics. The server then takes the candidate deck characteristics with the highest predicted probability as the deck characteristics for the current round of game play, matches the most suitable deck in the existing card library based on those characteristics, and performs the corresponding dealing operation.
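The select-by-highest-predicted-probability step can be sketched as follows. The logistic scorer is a deliberately simple stand-in for the pre-trained prediction network, and all weights and feature values are illustrative assumptions.

```python
import numpy as np

def retention_probability(account_feats, deck_feat, w):
    """Toy stand-in for the pre-trained prediction network: a logistic
    score over the concatenated account and candidate deck features."""
    x = np.concatenate([np.concatenate(account_feats), deck_feat])
    return 1.0 / (1.0 + np.exp(-float(x @ w)))

accounts = [np.array([0.2, 0.5]), np.array([0.8, 0.1]), np.array([0.4, 0.9])]
candidate_deck_feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
w = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 0.5])  # toy trained weights

# Score every candidate and keep the one with the highest predicted
# probability of the accounts joining the next round.
probs = [retention_probability(accounts, d, w) for d in candidate_deck_feats]
best = candidate_deck_feats[int(np.argmax(probs))]
```

The chosen `best` vector corresponds to the candidate deck characteristics used for the current round's dealing.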
According to the technical scheme of the embodiment, the account characteristics corresponding to the game accounts and the preset candidate resource configuration characteristics are input into a pre-trained prediction network, so that the prediction probability of the game accounts participating in the next round of game play after obtaining the game resource combination corresponding to the candidate resource configuration characteristics is obtained; and the candidate resource configuration characteristic with the highest prediction probability is used as the resource configuration characteristic corresponding to the game play of the current round, so that the resource configuration characteristic matched with the current game play can be accurately determined in each candidate resource configuration characteristic, the prediction probability of the plurality of game accounts participating in the next game play after obtaining the game resource combination matched with the resource configuration characteristic is maximized, and the game duration of the game accounts participating in the game play is prolonged.
In an exemplary embodiment, determining resource configuration features corresponding to the current round of game play according to the account features corresponding to each of the plurality of game accounts comprises: inputting the account features corresponding to the plurality of game accounts into a pre-trained agent, where the policy function of the pre-trained agent is used to generate a first action responsive to the plurality of account features, and the first action is an action by which the pre-trained agent determines, among the candidate resource configuration features, the resource configuration features corresponding to the current round of game play such that the predicted probability of the plurality of game accounts participating in the next round of game play after obtaining a game resource combination matching those features is maximized; and outputting the first action with the pre-trained agent to obtain the resource configuration features corresponding to the current round of game play.
Wherein the strategy function of the pre-trained agent may be a neural network.
In a specific implementation, the process by which the server maps the account features to retention rates corresponding to the candidate resource configuration features, and takes the candidate with the highest retention rate as the resource configuration features for the current round of game play, specifically includes the following. The server inputs the account features corresponding to the plurality of game accounts into the pre-trained agent, whose policy function generates a first action responsive to those account features. Specifically, the first action may be an action in which the pre-trained agent determines, among the candidate resource configuration features, the resource configuration features corresponding to the current round of game play, so that the predicted probability of the plurality of game accounts participating in the next round after obtaining a game resource combination matching those features is maximized. Finally, the server outputs the first action with the pre-trained agent and thereby accurately obtains the resource configuration features corresponding to the current round of game play.
In an exemplary embodiment, before the step of inputting the account features corresponding to the plurality of game accounts into the pre-trained agent, the method further includes: acquiring first account features corresponding to the plurality of game accounts in a first game play; inputting the first account features into an agent to be trained, where the policy function of the agent to be trained is used to generate a second action responsive to the first account features, the second action being an action by which the agent to be trained determines, among the candidate resource configuration features, the resource configuration features corresponding to the first game play such that the predicted probability of the plurality of game accounts participating in a second game play after obtaining a game resource combination matching those features is maximized, the second game play being the game play following the first; acquiring reward data obtained after the agent outputs the second action, the reward data being determined according to the predicted probability of the plurality of game accounts participating in the second game play after obtaining the matching game resource combination; and performing reinforcement learning training on the agent to be trained according to the reward data to obtain the pre-trained agent.
For example, the server may set up an RL (reinforcement learning) framework to maximize long-term benefit (e.g., user retention), with the account features as the state and the deck characteristics as the action. The RL framework has two important components: the state of the environment, and the interaction between the environment and the decision maker. In a changing environment, at a given time the environment is in a certain state, and the decision maker must make a decision based on that state and take a certain action. Each action changes the environment, producing a new state that in turn influences the decision maker's next action. Under the RL framework, different states and different actions generate rewards (which may be negative). As the environment and the decision maker continue to interact, RL seeks a policy, that is, a rule for what to decide in each state, that maximizes the long-term accumulated reward. In the card-dealing model, the players' states are the environment state that drives the dealing decision, the dealt cards are the action that influences the next state, and the policy is how to deal cards according to the players' states.
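The state-action-reward loop described above can be sketched with a deliberately simplified example: a two-action bandit-style agent that learns which deck type better retains players. The action names, retention probabilities, and environment dynamics are toy assumptions, not values from the patent.

```python
import random

random.seed(0)

ACTIONS = ["balanced_deck", "exciting_deck"]  # hypothetical deck-feature actions
q = {a: 0.0 for a in ACTIONS}                 # action-value estimates
counts = {a: 0 for a in ACTIONS}

def env_reward(action):
    """Toy environment: reward 1 if the players stay for the next round."""
    p_stay = 0.8 if action == "exciting_deck" else 0.4
    return 1.0 if random.random() < p_stay else 0.0

for _ in range(500):
    # epsilon-greedy policy: mostly exploit the best-known action
    if random.random() < 0.1:
        action = random.choice(ACTIONS)
    else:
        action = max(q, key=q.get)
    reward = env_reward(action)
    counts[action] += 1
    q[action] += (reward - q[action]) / counts[action]  # incremental mean
```

After enough rounds the agent's value estimates reflect that the "exciting" deck yields higher retention, which is exactly the signal the patent's policy function is trained on.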
In an exemplary embodiment, training reinforcement learning for an agent to be trained according to reward data to obtain a pre-trained agent comprises: after the pre-trained agent outputs a second action, acquiring a second account characteristic responding to the second action from the current game environment by using the agent to be trained as a new account characteristic; and updating the strategy function of the intelligent agent to be trained based on the new account characteristics and the reward data until the strategy function of the intelligent agent to be trained is converged to obtain the pre-trained intelligent agent.
The second account characteristics are account characteristics corresponding to the game accounts after the first game is played.
In a specific implementation, the process by which the server performs reinforcement learning training on the agent to be trained according to the reward data, obtaining the pre-trained agent, specifically includes the following. After the second action is output, the server uses the agent to be trained to acquire, from the current game environment, a second account feature responsive to the second action as a new account feature, and updates the policy function of the agent to be trained based on the new account feature and the reward data until the policy function converges, yielding the pre-trained agent.
According to the technical scheme of this embodiment, the account features corresponding to the plurality of game accounts are input into the pre-trained agent; the policy function of the pre-trained agent generates a first action responsive to those account features; the first action is an action by which the pre-trained agent determines, among the candidate resource configuration features, the resource configuration features corresponding to the current round of game play, so that the predicted probability of the plurality of game accounts participating in the next round after obtaining a matching game resource combination is maximized; and the first action is output with the pre-trained agent to obtain the resource configuration features for the current round. Through reinforcement learning, the account features can thus be accurately mapped to the retention rates corresponding to the candidate resource configuration features, and the candidate with the highest retention rate is taken as the resource configuration features for the current round of game play.
In an exemplary embodiment, obtaining reward data obtained by a pre-trained agent after outputting the second action comprises: inputting the resource configuration characteristics corresponding to the first game play and the first account characteristics corresponding to the multiple game accounts into a pre-trained probability prediction network to obtain the probability of the multiple game accounts participating in the second game round; reward data is derived based on the probability of the plurality of game accounts participating in the second game round.
The pre-trained probability prediction network is obtained by training game data based on multiple rounds of historical game play.
The game data of each round of historical game play comprises account characteristics of a plurality of historical game accounts, resource combination characteristics of historical resource combinations obtained by the plurality of historical game accounts and game results corresponding to whether the plurality of historical game accounts participate in the next round of game play of the historical game play.
In a specific implementation, the process by which the server acquires the reward data obtained by the pre-trained agent after outputting the second action specifically includes the following. The server inputs the resource configuration features corresponding to the first game play and the first account features corresponding to the game accounts into the pre-trained probability prediction network to obtain the probability of the game accounts participating in the second game round, and derives the reward data from that probability, i.e., the probability of each game account continuing the game.
Taking a Fight the Landlord game as an example, the server may input the account features corresponding to player one, player two, and player three (i.e., the player-one, player-two, and player-three features) together with the deck characteristics corresponding to the first game play into the pre-trained probability prediction network to obtain the predicted probability of whether player one, player two, and player three continue the game.
To facilitate understanding by those skilled in the art, fig. 4 provides a network diagram of the probability prediction network. The player-one, player-two, and player-three features are each input into a corresponding first fully connected network for fully connected processing, producing processed player features. These processed features, together with the deck characteristics corresponding to the first game play, are then input into a second fully connected network or a factorization machine to obtain further processed features; the factorization machine is adopted to add interaction effects to the model and enhance its nonlinear learning ability, and a ReLU activation function is added as well. The processed features are then passed through a dropout layer, which further reduces overfitting by randomly dropping neurons. Finally, the features are passed through a Softmax output layer for multi-class processing, and the predicted probabilities of whether player one, player two, and player three continue the game are mapped into the range 0 to 1 by a Sigmoid activation function for output.
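A heavily simplified numpy sketch of the forward pass just described, assuming three players with four features each. The layer sizes and random weights are illustrative, dropout is skipped (as at inference time), the factorization machine branch is omitted, and a sigmoid alone produces each player's continue probability.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_continue(player_feats, deck_feat, w_player, w_shared, w_out):
    """Per-player fully connected layer, concatenation with the deck
    features, a shared ReLU layer, and sigmoid outputs giving each
    player's probability of continuing the game."""
    hidden = [np.maximum(0.0, f @ w_player) for f in player_feats]
    x = np.concatenate(hidden + [deck_feat])
    h = np.maximum(0.0, x @ w_shared)
    logits = h @ w_out                       # one logit per player
    return 1.0 / (1.0 + np.exp(-logits))     # probabilities in [0, 1]

w_player = rng.normal(size=(4, 5))           # per-player FC: 4 feats -> 5
w_shared = rng.normal(size=(18, 8))          # 3x5 player + 3 deck feats -> 8
w_out = rng.normal(size=(8, 3))              # 8 -> 3 players

players = [rng.random(4) for _ in range(3)]  # player one/two/three features
probs = predict_continue(players, rng.random(3), w_player, w_shared, w_out)
```

Each entry of `probs` corresponds to one player's predicted probability of joining the next round, which is what the reward data is derived from.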
Finally, the server may determine the reward data the pre-trained agent receives after outputting the second action based on the predicted probabilities of whether player one, player two, and player three continue the game.
According to the technical scheme of this embodiment, the resource configuration features corresponding to the first game play and the first account features corresponding to the plurality of game accounts are input into the pre-trained probability prediction network, so that the probability of the plurality of game accounts participating in the second game round can be accurately derived, and the reward data obtained by the agent after outputting the second action is then determined from that probability. In this way, whether the players continue playing serves as the reward data for reinforcement learning, so that the trained agent can accurately determine, among the candidate resource configuration features, the resource configuration features corresponding to the current round of game play, maximizing the predicted probability of the plurality of game accounts participating in the next round after obtaining a matching game resource combination.
In an exemplary embodiment, the method further comprises: obtaining game data of multiple rounds of historical game play; the game data of each round of historical game play comprises account characteristics of a plurality of historical game accounts, resource combination characteristics of historical resource combinations obtained by the plurality of historical game accounts, and game results corresponding to whether the plurality of historical game accounts participate in the next round of game play of the historical game play; inputting the account characteristics of the historical game accounts and the resource combination characteristics of the historical resource combinations obtained by the historical game accounts into a probability prediction network to be trained to obtain the prediction probability of the next round of game play of the historical game accounts participating in the historical game play; and training the model parameters of the probability prediction network to be trained according to the difference between the prediction probability and the game result until the trained probability prediction network is obtained and is used as a pre-trained probability prediction network.
In specific implementation, the server can obtain a pre-trained probability prediction network by using the probability prediction network to be trained based on game data of multiple rounds of historical game play. Specifically, the server may obtain game data for a plurality of historical game plays; the game data of each round of historical game play comprises account characteristics of a plurality of historical game accounts, resource combination characteristics of historical resource combinations obtained by the plurality of historical game accounts, and game results corresponding to whether the plurality of historical game accounts participate in the next round of game play of the historical game play; secondly, the server inputs the account characteristics of the historical game accounts and the resource combination characteristics of the historical resource combination obtained by the historical game accounts into a probability prediction network to be trained to obtain the prediction probability of the next round of game play of the historical game accounts participating in the historical game play; and finally, the server trains the model parameters of the probability prediction network to be trained according to the difference between the prediction probability and the game result until the trained probability prediction network is obtained and is used as the pre-trained probability prediction network.
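The training step (fit parameters by minimizing the difference between predicted probability and the actual next-round participation result) can be sketched with a logistic model trained by gradient descent on synthetic data. The data-generating rule, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic historical rounds: each row concatenates account features
# and deck features, plus a bias column; label 1 means the accounts
# joined the next round (a toy rule, not real game data).
X = np.hstack([rng.random((200, 6)), np.ones((200, 1))])
y = (X[:, 0] + X[:, 5] > 1.0).astype(float)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return float(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))

w = np.zeros(7)
initial_loss = loss(w)
for _ in range(600):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))       # predicted stay probability
    grad = X.T @ (p - y) / len(y)            # cross-entropy gradient
    w -= 0.5 * grad                          # step on the prediction/result gap
final_loss = loss(w)
```

The real network is deeper (fig. 4), but the loop is the same idea: compare predicted participation probability to the recorded game result and update the model parameters until training converges.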
According to the technical scheme of this embodiment, game data of multiple rounds of historical game play are obtained; the account features of the historical game accounts and the resource combination features of the historical resource combinations obtained by those accounts are input into the probability prediction network to be trained, yielding the predicted probability of the historical game accounts participating in the next round of game play after the historical game play; and the server trains the model parameters of the network according to the difference between the predicted probability and the actual game result until the trained probability prediction network is obtained. The trained network can then accurately output the probability of the game accounts participating in the second game round based on the resource configuration features of the first game play and the first account features of the game accounts, from which the reward data for subsequent reinforcement learning is determined.
In an exemplary embodiment, after the step of distributing the resource to be distributed according to the resource configuration feature, the method further includes: obtaining game results of a plurality of game accounts after the game play of the current round is finished; and updating each set of candidate game resource combination in the game resource combination library according to the game result.
In a specific implementation, after the server distributes the resources to be distributed according to the resource configuration features, the server may also obtain the game results of the plurality of game accounts after the current round of game play ends, and then update each set of candidate game resource combinations in the game resource combination library according to those results.
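A minimal sketch of the post-round update, assuming each library entry tracks how often its deck was dealt and how often players joined the next round afterwards. The field names and the retention-rate formula are illustrative assumptions.

```python
# Hypothetical per-deck statistics kept in the combination library.
library = {
    "deck_a": {"deals": 10, "next_round_joins": 7},
    "deck_b": {"deals": 5, "next_round_joins": 1},
}

def record_round_result(library, deck_id, players_stayed, players_total):
    """Fold one finished round's result into the dealt deck's statistics."""
    entry = library[deck_id]
    entry["deals"] += 1
    entry["next_round_joins"] += players_stayed
    # Per-player retention rate; could later drive the sub-library thresholds.
    entry["retention"] = entry["next_round_joins"] / (entry["deals"] * players_total)

# After a 3-player round with deck_a in which all three players stayed:
record_round_result(library, "deck_a", players_stayed=3, players_total=3)
```

Running this after every round keeps the library's statistics, and hence the sub-library partitions, aligned with the newest game play data.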
According to the technical scheme of this embodiment, the game results of the plurality of game accounts after the current round of game play ends are obtained, and each set of candidate game resource combinations in the game resource combination library is updated according to those results, so that the characteristics of the card library can be dynamically updated based on newly generated game play data.
Fig. 5 is a flowchart illustrating another game resource distribution method according to an exemplary embodiment, used in the server 110 of fig. 1. As shown in fig. 5, the method includes the following steps.
In step S510, current game state information of a plurality of game accounts in the current round of game play is acquired.
In step S520, feature extraction is performed on the current game state information to obtain account features corresponding to the plurality of game accounts.
In step S530, resource configuration features corresponding to the current round of game play are determined according to the account features corresponding to the plurality of game accounts.
In step S540, a target game resource combination sub-library is determined from the plurality of game resource combination sub-libraries of the game resource combination library according to the resource configuration features; the similarity among the plurality of candidate game resource combinations in the target sub-library satisfies a preset similarity threshold, and the feature similarity between the resource configuration features and the resource combination features of the candidate combinations is smaller than a preset feature similarity threshold.
In step S550, the similarity between the resource combination features of the multiple sets of candidate game resource combinations and the resource configuration features is determined.
In step S560, the candidate game resource combination with the highest similarity is taken as the target game resource combination; the plurality of game accounts have the highest predicted probability of participating in the next round of game play after obtaining the target game resource combination.
In step S570, the target game resource combination is distributed to the plurality of game accounts.
It should be noted that, for the specific limitations of the above steps, reference may be made to the above specific limitations of a game resource distribution method, which is not described herein again.
It should be understood that although the steps in the flowcharts of fig. 2 and fig. 5 are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and fig. 5 may include multiple sub-steps or stages, which are not necessarily performed at the same time or in sequence, but may be performed at different times, in turn, or alternately with other steps or with sub-steps of other steps.
FIG. 6 is a block diagram illustrating a game resource distribution apparatus according to an example embodiment. Referring to fig. 6, the apparatus includes:
an obtaining unit 610 configured to perform obtaining current game state information of a plurality of game accounts in a current round of game play;
an extracting unit 620 configured to perform feature extraction on the current game state information to obtain account features corresponding to the plurality of game accounts;
a determining unit 630, configured to perform determining, according to account characteristics corresponding to each of the plurality of game accounts, resource configuration characteristics corresponding to the current round of game play;
the query unit 640 is configured to perform query in the game resource combination library according to the resource configuration characteristics to obtain a target game resource combination; the plurality of game accounts have the highest predicted probability of participating in the next round of game play after obtaining the target game resource combination;
a distribution unit 650 configured to perform distribution of the target game resource combination to the plurality of game accounts.
In a possible implementation manner, the querying unit 640 is specifically configured to: determine a target game resource combination sub-library from a plurality of game resource combination sub-libraries of the game resource combination library according to the resource configuration feature, wherein the similarity among the candidate game resource combinations in the target game resource combination sub-library meets a preset similarity threshold, and the feature similarity between the resource configuration feature and the resource combination features of those candidate game resource combinations is smaller than a preset feature similarity threshold; and take one candidate game resource combination in the target game resource combination sub-library as the target game resource combination.
In a possible implementation manner, the querying unit 640 is specifically configured to perform determining similarity between the resource combination feature and the resource configuration feature of the plurality of sets of candidate game resource combinations; and taking the candidate game resource combination with the highest similarity as the target game resource combination.
In a possible implementation manner, the determining unit 630 is specifically configured to perform inputting, to a pre-trained prediction network, account characteristics corresponding to the game accounts and preset candidate resource configuration characteristics, so as to obtain a prediction probability that the game accounts participate in a next round of game play after obtaining a game resource combination corresponding to the candidate resource configuration characteristics; and taking the candidate resource configuration characteristic with the highest prediction probability as the resource configuration characteristic corresponding to the game play of the current round.
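The candidate-scoring variant described above (and in claim 4) can be sketched as follows; `predict_retention` is a hypothetical stand-in for the pre-trained prediction network, and its linear-plus-sigmoid form and random weights are assumptions, not the patent's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(6,))  # assumed trained weights over [account || candidate]

def predict_retention(account_feat, candidate_feat):
    # Predicted probability that the accounts join the next round of game
    # play given this candidate resource configuration feature.
    z = np.concatenate([account_feat, candidate_feat]) @ W
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid

account_feat = np.array([0.4, 0.7, 0.1])
candidates = [np.array([1.0, 0.0, 0.0]),   # preset candidate configuration
              np.array([0.0, 1.0, 0.0]),   # features (illustrative values)
              np.array([0.0, 0.0, 1.0])]

# Score every preset candidate and keep the one with the highest
# predicted probability of participation in the next round.
best = max(candidates, key=lambda c: predict_retention(account_feat, c))
```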
In a possible implementation manner, the determining unit 630 is specifically configured to perform inputting a plurality of account features corresponding to the plurality of game accounts into a pre-trained agent; the policy function of the pre-trained agent is to generate a first action responsive to the plurality of account characteristics; the first action is an action that the pre-trained agent determines the resource configuration characteristics corresponding to the current round of game play in preset candidate resource configuration characteristics; the plurality of game accounts have the highest prediction probability of participating in the next round of game play after obtaining the game resource combination corresponding to the resource configuration characteristics; and outputting the first action by using the pre-trained agent to obtain the resource configuration characteristics corresponding to the current round of game play.
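A minimal sketch of such an agent, assuming a softmax policy whose actions index the preset candidate resource configuration features; the network shape and the mean-pooling of account features are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

class Agent:
    def __init__(self, feat_dim, n_candidates):
        # Linear policy parameters (assumed already trained).
        self.W = rng.normal(scale=0.1, size=(feat_dim, n_candidates))

    def policy(self, account_feats):
        # Pool the per-account features, then score each candidate action.
        state = account_feats.mean(axis=0)
        logits = state @ self.W
        exp = np.exp(logits - logits.max())
        return exp / exp.sum()          # probability of each "first action"

    def act(self, account_feats):
        # The first action selects one candidate resource configuration.
        return int(np.argmax(self.policy(account_feats)))

agent = Agent(feat_dim=3, n_candidates=4)
feats = np.array([[0.2, 0.5, 0.1], [0.3, 0.4, 0.2]])  # two game accounts
action = agent.act(feats)   # index of the chosen configuration feature
```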
In a possible implementation manner, the determining unit 630 is specifically configured to: obtain first account characteristics corresponding to each of the plurality of game accounts in a first game play; input the plurality of first account characteristics corresponding to the plurality of game accounts into an agent to be trained, wherein the policy function of the agent to be trained is to generate a second action responsive to the plurality of first account characteristics, the second action being an action of the agent to be trained determining, among the candidate resource configuration characteristics, the resource configuration characteristic corresponding to the first game play, and a second game play being the next game play after the first game play; acquire reward data obtained after the agent to be trained outputs the second action, the reward data being determined according to the predicted probability that the plurality of game accounts participate in the second game play after obtaining the game resource combination corresponding to the resource configuration characteristic; and train the agent to be trained according to the reward data to obtain the pre-trained agent.
In a possible implementation manner, the determining unit 630 is specifically configured to: after the agent to be trained outputs the second action, acquire, by using the agent to be trained, second account characteristics responding to the second action from the current game environment as new account characteristics, the second account characteristics being the account characteristics corresponding to the plurality of game accounts after the first game play ends; and update the policy function of the agent to be trained based on the new account characteristics and the reward data until the policy function converges, to obtain the pre-trained agent.
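The training described in the two paragraphs above can be sketched with a policy-gradient update. For a deterministic illustration, the exact expected gradient of the reward is used instead of sampled episodes, and a fixed toy reward vector stands in for the probability prediction network's output; all of this is an assumption, not the patent's actual procedure.

```python
import numpy as np

rewards = np.array([0.2, 0.9, 0.4])     # assumed retention probability per action
feat_dim, n_candidates, lr = 2, 3, 1.0
state = np.array([1.0, 0.5])            # pooled first account characteristics
W = np.zeros((feat_dim, n_candidates))  # policy parameters to train

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(500):
    probs = softmax(state @ W)                     # pi(action | state)
    # Exact ascent on expected reward: dE[r]/dlogit_k = p_k * (r_k - E[r]).
    grad_logits = probs * (rewards - probs @ rewards)
    W += lr * np.outer(state, grad_logits)         # update the policy function

learned = int(np.argmax(softmax(state @ W)))       # settles on the best action
```

After training, the policy concentrates on the action whose assumed retention reward is highest, mirroring the convergence criterion on the policy function described above.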
In a possible implementation manner, the determining unit 630 is specifically configured to: input the resource configuration characteristic corresponding to the first game play and the first account characteristics corresponding to each of the plurality of game accounts into a pre-trained probability prediction network to obtain the probability that the plurality of game accounts participate in the second game play; and derive the reward data based on that probability.
In one possible implementation manner, the game resource distribution apparatus further includes: a game data acquisition unit configured to perform acquisition of game data of a plurality of rounds of historical game play; the game data of each round of historical game play comprises account characteristics of a plurality of historical game accounts, resource combination characteristics of historical resource combinations obtained by the plurality of historical game accounts, and game results corresponding to whether the plurality of historical game accounts participate in the next round of game play of the historical game play; an input unit configured to perform input of the account characteristics of the historical game accounts and the resource combination characteristics of the historical resource combinations obtained by the historical game accounts into a probability prediction network to be trained, so as to obtain the predicted probability of the historical game accounts participating in the next round of the historical game play; and the training unit is configured to train model parameters of the probability prediction network to be trained according to the difference between the prediction probability and the game result until a trained probability prediction network is obtained and used as the pre-trained probability prediction network.
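The supervised training of the probability prediction network can be sketched with a logistic-regression stand-in: features concatenate the historical account characteristics and resource combination characteristics, the label is the game result (whether the accounts joined the next round), and the loss is binary cross-entropy. The synthetic data and single-layer model here are assumptions, not the patent's network.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 400, 4
# Rows: historical game plays; columns: assumed concatenation of
# account features and resource combination features.
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5, 1.0])          # hidden "true" preference
# Game result label: 1 if the accounts joined the next round, else 0.
y = (1 / (1 + np.exp(-(X @ true_w))) > rng.random(n)).astype(float)

w = np.zeros(d)
for _ in range(300):
    p = 1 / (1 + np.exp(-(X @ w)))                # predicted retention probability
    w -= 0.1 * X.T @ (p - y) / n                  # binary cross-entropy gradient step

pred = 1 / (1 + np.exp(-(X @ w)))
accuracy = float(((pred > 0.5) == (y > 0.5)).mean())
```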
In one possible implementation manner, the game resource distribution apparatus further includes: the result acquisition unit is configured to execute the game result acquisition of a plurality of game accounts after the game play in the current round is finished; and the updating unit is configured to update each set of candidate game resource combination in the game resource combination library according to the game result.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 7 is a block diagram illustrating an electronic device 700 for performing a game resource distribution method according to an example embodiment. For example, the electronic device 700 may be a server. Referring to fig. 7, electronic device 700 includes a processing component 720 that further includes one or more processors, and memory resources, represented by memory 722, for storing instructions, such as applications, that are executable by processing component 720. The application programs stored in memory 722 may include one or more modules that each correspond to a set of instructions. Further, the processing component 720 is configured to execute instructions to perform the game resource distribution method described above.
The electronic device 700 may also include a power component 724 configured to perform power management of the electronic device 700, a wired or wireless network interface 726 configured to connect the electronic device 700 to a network, and an input/output (I/O) interface 728. The electronic device 700 may operate based on an operating system stored in the memory 722, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 722 comprising instructions, executable by a processor of the electronic device 700 to perform the above-described method is also provided. The storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (10)
1. A game resource distribution method, characterized in that the method comprises:
acquiring current game state information of a plurality of game accounts in the current round of game play;
extracting the characteristics of the current game state information to obtain account characteristics corresponding to the game accounts;
determining resource configuration characteristics corresponding to the game play of the current round according to account characteristics corresponding to the game accounts;
according to the resource configuration characteristics, querying a game resource combination library to obtain a target game resource combination; the plurality of game accounts have the highest predicted probability of participating in the next round of game play after obtaining the target game resource combination;
and distributing the target game resource combination to the plurality of game accounts.
2. The method for distributing game resources according to claim 1, wherein the querying a game resource combination library for a target game resource combination according to the resource configuration characteristics comprises:
according to the resource configuration characteristics, determining a target game resource combination sub-library among a plurality of game resource combination sub-libraries of the game resource combination library; wherein the similarity among a plurality of candidate game resource combinations in the target game resource combination sub-library meets a preset similarity threshold; and the feature similarity between the resource configuration characteristics and the resource combination features of the candidate game resource combinations is smaller than a preset feature similarity threshold;
and taking one candidate game resource combination in the target game resource combination sub-library as the target game resource combination.
3. The method for distributing game resources according to claim 2, wherein the taking one of the candidate game resource combinations in the target game resource combination sub-library as the target game resource combination comprises:
determining similarity between resource combination features of the plurality of sets of candidate game resource combinations and the resource configuration features;
and taking the candidate game resource combination with the highest similarity as the target game resource combination.
4. The method for distributing game resources according to claim 1, wherein the determining, according to the account characteristics corresponding to each of the plurality of game accounts, the resource configuration characteristics corresponding to the current round of game play comprises:
inputting account characteristics corresponding to the game accounts and preset candidate resource configuration characteristics into a pre-trained prediction network to obtain the prediction probability of the game accounts participating in the next round of game play after obtaining game resource combinations corresponding to the candidate resource configuration characteristics;
and taking the candidate resource configuration characteristic with the highest prediction probability as the resource configuration characteristic corresponding to the game play of the current round.
5. The method for distributing game resources according to claim 1, wherein the determining, according to the account characteristics corresponding to each of the plurality of game accounts, the resource configuration characteristics corresponding to the current round of game play comprises:
inputting a plurality of account characteristics corresponding to the plurality of game accounts into a pre-trained agent; the policy function of the pre-trained agent is to generate a first action responsive to the plurality of account characteristics; the first action is an action that the pre-trained agent determines the resource configuration characteristics corresponding to the current round of game play in preset candidate resource configuration characteristics; the plurality of game accounts have the highest prediction probability of participating in the next round of game play after obtaining the game resource combination corresponding to the resource configuration characteristics;
and outputting the first action by using the pre-trained agent to obtain the resource configuration characteristics corresponding to the current round of game play.
6. The method of claim 5, further comprising, prior to the step of inputting the account characteristics corresponding to the game accounts into the pre-trained agent:
acquiring first account characteristics corresponding to the plurality of game accounts in a first game play;
inputting a plurality of first account characteristics corresponding to the plurality of game accounts into an agent to be trained; the policy function of the agent to be trained is to generate a second action responsive to the plurality of first account characteristics; the second action is an action of determining the resource configuration characteristics corresponding to the first game play in the candidate resource configuration characteristics by the agent to be trained; the second game play is a next game play of the first game play;
acquiring reward data obtained after the agent to be trained outputs the second action; the reward data is determined according to the predicted probability that the plurality of game accounts participate in the second game play after obtaining the game resource combination corresponding to the resource configuration characteristics;
and training the intelligent agent to be trained according to the reward data to obtain the pre-trained intelligent agent.
7. The method for distributing game resources of claim 6, wherein the training the agent to be trained according to the reward data to obtain the pre-trained agent comprises:
after the agent to be trained outputs the second action, acquiring, by using the agent to be trained, second account characteristics responding to the second action from the current game environment as new account characteristics; the second account characteristics are the account characteristics corresponding to the plurality of game accounts after the first game play ends;
and updating the strategy function of the intelligent agent to be trained based on the new account characteristics and the reward data until the strategy function of the intelligent agent to be trained is converged to obtain the pre-trained intelligent agent.
8. A game resource distribution apparatus, comprising:
an acquisition unit configured to perform acquisition of current game state information of a plurality of game accounts in a current round of game play;
the extracting unit is configured to perform feature extraction on the current game state information to obtain account features corresponding to the game accounts;
the determining unit is configured to determine resource configuration characteristics corresponding to the game play in the current round according to account characteristics corresponding to the game accounts;
the query unit is configured to execute query in a game resource combination library according to the resource configuration characteristics to obtain a target game resource combination; the plurality of game accounts have the highest predicted probability of participating in the next round of game play after obtaining the target game resource combination;
a distribution unit configured to perform distribution of the target game resource combination to the corresponding plurality of game accounts.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the game resource distribution method of any one of claims 1 to 7.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a game resource distribution method as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011419023.8A CN112494938B (en) | 2020-12-07 | 2020-12-07 | Game resource distribution method, game resource distribution device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112494938A true CN112494938A (en) | 2021-03-16 |
CN112494938B CN112494938B (en) | 2024-01-12 |
Family
ID=74971032
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011419023.8A Active CN112494938B (en) | 2020-12-07 | 2020-12-07 | Game resource distribution method, game resource distribution device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112494938B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108671546A (en) * | 2018-05-23 | 2018-10-19 | 腾讯科技(深圳)有限公司 | Determination method and apparatus, storage medium and the electronic device of object run |
CN109508789A (en) * | 2018-06-01 | 2019-03-22 | 北京信息科技大学 | Predict method, storage medium, processor and the equipment of hands |
CN109582463A (en) * | 2018-11-30 | 2019-04-05 | Oppo广东移动通信有限公司 | Resource allocation method, device, terminal and storage medium |
CN109718558A (en) * | 2017-10-31 | 2019-05-07 | 腾讯科技(成都)有限公司 | The determination method and apparatus of game information, storage medium, electronic device |
CN110624246A (en) * | 2019-10-25 | 2019-12-31 | 网易(杭州)网络有限公司 | Virtual resource acquisition method and device, storage medium and electronic equipment |
US20200030702A1 (en) * | 2018-07-24 | 2020-01-30 | Sony Interactive Entertainment LLC | In-game resource surfacing platform |
CN111375200A (en) * | 2018-12-28 | 2020-07-07 | 广州市百果园网络科技有限公司 | Method and system for intelligently configuring game resources, computer storage medium and equipment |
CN111870959A (en) * | 2020-08-07 | 2020-11-03 | 网易(杭州)网络有限公司 | Resource recommendation method and device in game |
- 2020-12-07: Application CN202011419023.8A filed in China; granted as patent CN112494938B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN112494938B (en) | 2024-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111282267B (en) | Information processing method, information processing apparatus, information processing medium, and electronic device | |
CN109091868B (en) | Method, apparatus, computer equipment and the storage medium that battle behavior determines | |
CN111111204B (en) | Interactive model training method and device, computer equipment and storage medium | |
US20180015370A1 (en) | System and method for retaining a strategy video game player by predicting the player game satisfaction using player game behavior data | |
EP3852014A1 (en) | Method and apparatus for training learning model, and computing device | |
CN111569429A (en) | Model training method, model using method, computer device and storage medium | |
CN112619157B (en) | Game fight interaction method and device, electronic equipment, readable medium and product | |
CN112016704A (en) | AI model training method, model using method, computer device and storage medium | |
CN112138409B (en) | Game result prediction method, device and storage medium | |
US12053705B2 (en) | System and method for matching users of client applications | |
CN115577795A (en) | Policy model optimization method and device and storage medium | |
CN112494938A (en) | Game resource distribution method and device, electronic equipment and storage medium | |
CN112245936A (en) | Account matching method and device and server | |
Andras et al. | Environmental risk, cooperation, and communication complexity | |
WO2022148059A1 (en) | User matching method and apparatus, and electronic device and medium | |
CN116943204A (en) | Virtual object control method and device, storage medium and electronic equipment | |
CN113941157A (en) | Game matching method and device, electronic equipment and storage medium | |
Groth | Modeling Collusion in the Game of No-Press Diplomacy: Strategic collusion between AI agents | |
KR102633104B1 (en) | Method for determining action of bot playing champion in battle field of League of Legends game, and computing system performing the same | |
Moffatt | The experimetrics of depth-of-reasoning models | |
CN114733208A (en) | Game team recommendation method, game team recommendation device, server, and storage medium | |
CN116957833A (en) | Data processing method and related device | |
CN113082711A (en) | Game robot control method, game robot control device, server, and storage medium | |
CN113413608A (en) | Game data processing method and device, computer equipment and storage medium | |
CN117180759A (en) | Virtual object matching method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||