CN111330282A - Method and device for determining card-playing candidate items - Google Patents

Info

Publication number
CN111330282A
CN111330282A
Authority
CN
China
Prior art keywords
playing
card
target player
player
game
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010104375.8A
Other languages
Chinese (zh)
Inventor
陈杰
倪煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202010104375.8A priority Critical patent/CN111330282A/en
Publication of CN111330282A publication Critical patent/CN111330282A/en
Pending legal-status Critical Current

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 Generating or modifying game content before or while executing the game program, adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/70 Game security or game management aspects
    • A63F13/79 Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/50 Features of games using an electronically generated display having two or more dimensions characterized by details of game servers
    • A63F2300/55 Details of game data or player data management
    • A63F2300/5546 Details of game data or player data management using player registration data, e.g. identification, account, preferences, game history
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/6027 Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application provides a method for determining a card-playing candidate item, which comprises the following steps: acquiring state characteristics of a target player and combined card-playing characteristics corresponding to the target player; the joint card playing characteristics corresponding to the target player are historical card playing characteristics of each player participating in the game; obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and a pre-trained machine learning model; wherein the prize value of the first one of the card-playing candidate items is indicative of a likelihood of winning the game based on triggering of a card-playing action by the first one of the card-playing candidate items, the first one of the card-playing candidate items being any one of a set of card-playing candidate items; and determining the card-playing candidates of the target player in the current round of game according to the prize value of each card-playing candidate in the obtained card-playing candidate set. The effect of the playing candidates of the target player determined by the method is better.

Description

Method and device for determining card-playing candidate items
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for determining card-playing candidates.
Background
With the development of science and technology, intelligent card games, such as Fight the Landlord (Dou Dizhu) and mahjong, have appeared. During play, the terminal device may present card-playing candidates to the user, or, when the user has not played within a certain time, the terminal device may automatically trigger a card-playing action based on a candidate.
However, in the conventional technology, when determining card-playing candidates, the terminal device randomly selects, from the cards held by the user, cards that satisfy the card-playing requirement. Randomly determined candidates are often not the best, so the terminal device in the conventional technology is not intelligent enough in determining candidates, and the result is not effective. "Not effective" means that the likelihood of winning the game after playing the candidate is low, or the likelihood that the user would actually play the candidate is low.
Therefore, a scheme is urgently needed that makes the card-playing candidates determined by the terminal device more effective.
Disclosure of Invention
The technical problem to be solved by the present application is how to make the card-playing candidates determined by the terminal device more effective; to that end, a method and an apparatus for determining card-playing candidates are provided.
In a first aspect, an embodiment of the present application provides a method for determining a card-playing candidate, where the method includes:
acquiring state characteristics of a target player and combined card-playing characteristics corresponding to the target player; the joint card playing characteristics corresponding to the target player are historical card playing characteristics of each player participating in the game;
obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and a pre-trained machine learning model; wherein the prize value of a first one of the set of card-playing candidates is indicative of a likelihood of winning a game based on the first one of the set of card-playing candidates triggering a card-playing action;
and determining the card-playing candidate items of the target player in the current round of game according to the obtained bonus value of each card-playing candidate item in the card-playing candidate item set.
Optionally, the method further includes: determining a card-playing candidate item set corresponding to a target player, wherein the card-playing candidate item set comprises one or more card-playing candidate items; the obtaining of the bonus value of each of the poker candidates in the poker candidate set corresponding to the target player according to the state feature of the target player, the joint poker feature corresponding to the target player, and the pre-trained machine learning model includes:
inputting the state characteristics of the target player, the combined card-playing characteristics corresponding to the target player and a preset card-playing item set into the machine learning model to obtain the reward value corresponding to each card-playing item in the card-playing item set;
and obtaining the bonus value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the bonus value corresponding to the card-playing item and the card-playing candidate item set corresponding to the target player.
Optionally, the method further includes:
determining a card-playing candidate item set corresponding to a target player, wherein the card-playing candidate item set comprises one or more card-playing candidate items;
the obtaining of the award value of each of the poker candidates in the poker candidate set corresponding to the target player according to the state feature of the target player, the joint poker feature corresponding to the target player, and the pre-trained machine learning model includes:
and inputting the state characteristic of the target player, the combined playing characteristic corresponding to the target player and the playing candidate item set corresponding to the target player into the machine learning model to obtain the bonus value of each playing candidate item in the playing candidate item set corresponding to the target player.
Optionally,
the machine learning model is obtained based on sample data training;
the sample data comprises state characteristics of historical players, joint playing characteristics corresponding to the historical players and winning and losing results corresponding to the historical players; the joint playing characteristics corresponding to the historical players are the historical playing characteristics of each player participating in the game in the local game corresponding to the sample data.
Optionally, the status characteristics of the target player include any one or more of the following:
the character of the target player, the cards held by the target player, and the historical playing of each player in the game.
Optionally, the joint playing characteristics corresponding to the target player are characteristics describing face contents of historical playing of each player participating in the game and embodying playing sequence of each player.
Optionally,
the obtaining of the bonus value of each of the poker candidates in the poker candidate set corresponding to the target player according to the state feature of the target player, the joint poker feature corresponding to the target player, and a pre-trained machine learning model includes:
determining a character of the target player and determining a machine learning model corresponding to the character of the target player;
and obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and the machine learning model corresponding to the role of the target player.
Optionally, the machine learning model is a deep Q-network (DQN) reinforcement learning model.
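The patent names DQN but does not disclose its training details. As a non-authoritative sketch only, the following shows the standard Bellman target that DQN-style training would compute per transition; the function name `dqn_target` and the convention that the only reward is the end-of-game win/lose outcome are illustrative assumptions, not from the patent.

```python
def dqn_target(reward, done, next_q_values, gamma=0.99):
    # Bellman target for one transition: r + gamma * max_a' Q(s', a').
    # In a card game, `reward` might be nonzero only when the game ends
    # (e.g. win = 1, lose = 0); `next_q_values` are the model's reward
    # estimates for every legal play in the next state.
    if done:
        return reward  # terminal state: no future value to bootstrap from
    return reward + gamma * max(next_q_values)
```

The trained network's output per candidate would then approximate this target, which is what the claims call the candidate's "reward value".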
Optionally, determining the card-playing candidates of the target player in the current round of game according to the obtained bonus value of each card-playing candidate in the card-playing candidate set, including:
determining one or more card-playing candidate items whose bonus value is greater than or equal to a preset threshold value as the card-playing candidate items of the target player in the current round of game; alternatively,
and determining the card-playing candidate item with the maximum bonus value as the card-playing candidate item of the target player in the current round of game.
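The two selection rules above (keep every candidate whose reward reaches a preset threshold, or keep only the maximum-reward candidate) can be sketched as follows; the function name `select_candidates`, its signature, and the string keys are illustrative assumptions, not part of the patent.

```python
def select_candidates(reward_by_candidate, threshold=None):
    # reward_by_candidate: dict mapping each legal play (e.g. "KK") to the
    # reward value predicted by the model.
    if threshold is not None:
        # Rule 1: every candidate whose reward reaches the preset threshold.
        return [c for c, r in reward_by_candidate.items() if r >= threshold]
    # Rule 2: only the candidate with the maximum reward.
    return [max(reward_by_candidate, key=reward_by_candidate.get)]
```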
Optionally, after determining the card-playing candidates of the target player in the current round of game, the method further includes:
and executing a first operation according to the determined card-playing candidate items of the target player in the current round of game.
Optionally, the performing a first operation according to the determined card-playing candidates of the target player in the current round of game includes:
displaying the card-playing candidate items of the target player in the current round of game in a preset mode; alternatively,
and triggering card-playing behaviors aiming at the card-playing candidates of the target player in the current round of game.
In a second aspect, an embodiment of the present application provides an apparatus for determining a card-playing candidate, where the apparatus includes:
the card-playing device comprises an acquisition unit, a playing unit and a playing unit, wherein the acquisition unit is used for acquiring the state characteristics of a target player and the combined card-playing characteristics corresponding to the target player; the joint card playing characteristics corresponding to the target player are historical card playing characteristics of each player participating in the game;
the first determining unit is used for obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and a pre-trained machine learning model; wherein the prize value of a first one of the set of card-playing candidates is indicative of a likelihood of winning a game based on the first one of the set of card-playing candidates triggering a card-playing action;
and the second determining unit is used for determining the card-playing candidates of the target player in the current round of game according to the obtained bonus value of each card-playing candidate in the card-playing candidate set.
Optionally, the apparatus further comprises:
a third determining unit, configured to determine a playing candidate item set corresponding to the target player, where the playing candidate item set includes one or more playing candidate items;
the first determining unit is specifically configured to:
inputting the state characteristics of the target player, the combined card-playing characteristics corresponding to the target player and a preset card-playing item set into the machine learning model to obtain the reward value corresponding to each card-playing item in the card-playing item set;
and obtaining the bonus value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the bonus value corresponding to the card-playing item and the card-playing candidate item set corresponding to the target player.
Optionally, the apparatus further comprises:
a third determining unit, configured to determine a playing candidate item set corresponding to the target player, where the playing candidate item set includes one or more playing candidate items;
the first determining unit is specifically configured to:
and inputting the state characteristic of the target player, the combined playing characteristic corresponding to the target player and the playing candidate item set corresponding to the target player into the machine learning model to obtain the bonus value of each playing candidate item in the playing candidate item set corresponding to the target player.
Optionally,
the machine learning model is obtained based on sample data training;
the sample data comprises state characteristics of historical players, joint playing characteristics corresponding to the historical players and winning and losing results corresponding to the historical players; the joint playing characteristics corresponding to the historical players are the historical playing characteristics of each player participating in the game in the local game corresponding to the sample data.
Optionally, the status characteristics of the target player include any one or more of the following:
the character of the target player, the cards held by the target player, and the historical playing of each player in the game.
Optionally, the joint playing characteristics corresponding to the target player are characteristics describing face contents of historical playing of each player participating in the game and embodying playing sequence of each player.
Optionally,
the first determining unit is specifically configured to:
determining a character of the target player and determining a machine learning model corresponding to the character of the target player;
and obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and the machine learning model corresponding to the role of the target player.
Optionally, the machine learning model is a deep Q-network (DQN) reinforcement learning model.
Optionally, the second determining unit is specifically configured to:
determining one or more card-playing candidate items whose bonus value is greater than or equal to a preset threshold value as the card-playing candidate items of the target player in the current round of game; alternatively,
and determining the card-playing candidate item with the maximum bonus value as the card-playing candidate item of the target player in the current round of game.
Optionally, the apparatus further comprises:
and the operation unit is used for executing a first operation according to the determined card-playing candidates of the target player in the current round of game after determining the card-playing candidates of the target player in the current round of game.
Optionally, the operation unit is specifically configured to:
after the card-playing candidate items of the target player in the current round of game are determined, displaying the card-playing candidate items of the target player in the current round of game in a preset mode; alternatively,
after the card-playing candidate items of the target player in the current round of game are determined, card-playing behaviors are triggered aiming at the card-playing candidate items of the target player in the current round of game.
In a third aspect, an embodiment of the present application provides an apparatus, where the apparatus includes: a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is adapted to perform the method of any of the above first aspects in accordance with the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium for storing a computer program for executing the method of any one of the above first aspects.
Compared with the prior art, the embodiment of the application has the following advantages:
the embodiment of the application provides a method for determining a card-playing candidate item, and particularly, the currently played card is determined by considering that in practical application, when a target player plays cards, the current card is determined by combining the own state of the target player and the card-playing conditions of all players. In view of this, in the embodiment of the present application, the state characteristic of the target player and the associated card-playing characteristic corresponding to the target player (i.e. the historical card-playing characteristic describing each player participating in the game) may be combined to determine the card-playing candidates corresponding to the target player.
Specifically, the state feature of the target player and the combined card-playing feature corresponding to the target player may be obtained, and the state feature of the target player and the combined card-playing feature corresponding to the target player may be input into a pre-trained machine learning model, so as to obtain card-playing candidates of the target player in the current round of game. Wherein, the machine learning model is obtained by training based on sample data; the sample data comprises state characteristics of historical players, joint card-playing characteristics corresponding to the historical players and winning and losing results of the historical players in the local games to which the sample data belongs; the joint playing characteristics corresponding to the historical players are the historical playing characteristics of each player participating in the game in the local game corresponding to the sample data. It can be understood that, since the card-playing candidates of the target player in the current round of game are determined based on the machine learning model, if the card-playing candidates corresponding to the target player obtained based on the machine learning model are played, the probability that the target player wins the current round of game is higher. The initial purpose of a user to determine how to play a card while playing a game is to win the game in large part. Therefore, by the scheme provided by the embodiment of the application, the possibility that the determined playing candidates of the target player are selected by the user is higher, namely the determined playing candidates are better.
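The inference flow just described (obtain the two features, score each legal candidate with the trained model, pick the best) can be sketched as below. All names here (`recommend_play`, `toy_q_model`) are hypothetical; the real trained network is simply assumed to be any callable mapping a feature vector and a candidate to a scalar reward estimate.

```python
def recommend_play(state_feat, joint_feat, candidates, q_model):
    # Concatenate the state feature and the joint card-playing feature,
    # score every legal candidate, and return the highest-reward play.
    features = state_feat + joint_feat
    rewards = {c: q_model(features, c) for c in candidates}
    return max(rewards, key=rewards.get)

def toy_q_model(features, candidate):
    # Stand-in for the trained network; any scoring function works here.
    return 0.01 * sum(features) * len(candidate)
```

For example, `recommend_play([1, 0], [0, 1], ["3", "KK"], toy_q_model)` scores both legal plays and returns the one the model prefers.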
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart illustrating a method for determining a card-playing candidate according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a machine learning model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an apparatus for determining a card-playing candidate provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present application.
The inventors of the present application found through research that, in the conventional technology, when determining card-playing candidates, the terminal device randomly selects, from the cards held by the user, cards that satisfy the card-playing requirement. Randomly determined candidates are often not the best, so the candidates determined by the terminal device in the conventional technology are not effective.
To solve the above problem, embodiments of the present application provide a method for determining card-playing candidates in which the determined candidate of the target player is more likely to be selected by the user or to win the game; that is, the determined candidate is more effective.
Various non-limiting embodiments of the present application are described in detail below with reference to the accompanying drawings.
Exemplary method
Referring to fig. 1, a flowchart of a method for determining a card-playing candidate provided in an embodiment of the present application is shown.
The method for determining card-playing candidates provided by the embodiment of the present application may be executed by a terminal device. Specifically, a corresponding game application may be installed on the terminal device, or the terminal device may access a web page on which the game can be played, so that the terminal device executes the method. The embodiment does not specifically limit the terminal device: it may be a mobile terminal such as a smartphone or tablet computer, or a device such as a desktop computer or mobile workstation.
The method for determining the card-playing candidates provided by the embodiment of the application can be implemented through the following steps S101 to S103, for example.
S101: acquiring state characteristics of a target player and combined card-playing characteristics corresponding to the target player; the joint playing characteristics corresponding to the target players are historical playing characteristics of each player participating in the game.
The game mentioned in the embodiment of the present application may be a card or tile game, such as a poker-style card game or a mahjong game, for example three-player Fight the Landlord. The embodiment of the present application is not specifically limited in this respect.
For ease of understanding, the concepts of "round" and "game" used in the embodiments of the present application are first explained. A game may include one or more rounds, and within a round the game roles play cards in sequence. Taking three-player Fight the Landlord as an example, the landlord plays first, then the next player plays, and then the player after that plays; once each of the three players has acted, that constitutes one round of the game. Of course, a player may also choose not to play (to pass), which is not limited here. After several rounds, the cards run out and the game ends. In some cases, "a game" may also be referred to as "a hand".
In the embodiment of the present application, it is considered that, in practical play, when a target player plays a card, the decision about the current play combines not only the target player's own state but also the card-playing situation of every player. In view of this, the state feature of the target player and the corresponding joint card-playing feature of the target player may be obtained first, and the target player's card-playing candidates are then determined based on these two features.
It should be noted that the state feature of the target player refers to a feature describing a state of the target player in the game, and the state feature of the target player may affect the card-playing strategy of the target player, that is, may affect the target player to determine the current card played.
The embodiment of the present application does not specifically limit the state features of the target player. In practical applications, the target player has a specific role in the game. Taking three-player Fight the Landlord as an example, the target player may be the landlord or a farmer; in a mahjong game, the target player may be the dealer or an ordinary player. It can be understood that the target player's play has a certain relationship with the target player's role: in three-player Fight the Landlord, if the target player is the landlord, the target player must compete against the two farmers; if the target player is a farmer, the target player competes only against the landlord and is in a cooperative relationship with the other farmer. In other words, the role of the target player may influence the target player's card-playing strategy, so in the embodiment of the present application the state feature of the target player may include the role of the target player.
In the embodiment of the present application, it is considered that in practical play, when deciding what to play, the target player generally combines the cards he or she holds with the cards already played by every player (including the target player) in the current game. In the embodiment of the present application, the cards already played by each player in the current game are referred to as "the historical plays of each player in the current game". That is, the cards held by the target player and the historical plays of each player may, to some extent, affect the target player's card-playing strategy. Therefore, in the embodiment of the present application, the state features of the target player may include the cards held by the target player and/or the historical plays of each player in the current round of the game.
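One plausible way to turn the state features just listed (role, held cards, historical plays) into a model input is a fixed-length vector. The encoding below is a minimal sketch under stated assumptions: the rank alphabet, the role names, and the single-character card notation are all illustrative choices, not disclosed by the patent.

```python
RANKS = "3456789TJQKA2"  # simplified Fight-the-Landlord rank order; jokers omitted

def encode_state(role, hand, played):
    # One-hot role + per-rank counts of held cards + per-rank counts of
    # cards already played in the current game.
    role_vec = [1 if role == r else 0 for r in ("landlord", "farmer_a", "farmer_b")]
    hand_vec = [hand.count(r) for r in RANKS]
    played_vec = [played.count(r) for r in RANKS]
    return role_vec + hand_vec + played_vec
```

A count-per-rank encoding is a natural fit here because, unlike poker, suits do not matter in this family of games.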
Regarding the joint playing characteristics of the target player, it should be noted that, in the embodiment of the present application, the joint playing characteristics of the target player may be the historical playing characteristics of the respective players participating in the game. In practice, all of the historical plays of the individual players may affect the current playing strategy of the target player, while only some of them, such as the plays of the previous round, may also suffice to inform that strategy. In view of this, in the embodiment of the present application, the joint playing characteristics of the target player may be all of the historical playing characteristics of the players participating in the game, or only part of them (for example, the playing characteristics of the previous round of each participating player); the embodiment of the present application is not specifically limited in this respect.
It will be appreciated that, in practice, the players play in a fixed sequence; in three-player Fight the Landlord, for example, play proceeds from the landlord to the player after the landlord, then to the player before the landlord, and repeats. The target player often takes this playing sequence into account when determining the playing strategy, for example by combining the playing sequence with the target player's own role to determine the corresponding competitive and cooperative relationships. In view of this, in an implementation manner of the embodiment of the present application, the joint playing characteristics corresponding to the target player may be characteristics that describe the card face contents of the historical plays of each participating player and that represent the playing sequence of those players. That is, the joint playing characteristics corresponding to the target player can show both the card face contents of each player's historical plays and the order in which the players played; for example, the joint playing characteristic may arrange the card face contents of the players' historical plays in playing order. As for card face contents, these may be, for example, a pair of Ks, three Js, and so on.
Regarding the joint playing characteristics of the target player, take three-player Fight the Landlord as an example: suppose the target player U is a farmer, U plays immediately before the landlord player O, and the farmer player B plays immediately after O. The joint playing characteristic of the target player may then be {U1, O1, B1}, where U1 is the card face content of the previous round played by the target player U in the current game, O1 is that of the landlord player O, and B1 is that of the farmer player B, and the playing sequence is U1, O1, B1. Although the description here uses part of the historical playing characteristics of the participating players (namely the playing characteristics of the previous round) as an example, this is only for convenience of understanding, and the present invention is not limited to this example.
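As a hedged illustration of how such an ordered joint playing feature like {U1, O1, B1} might be assembled, consider the following sketch; the function name and the string encoding of card faces are hypothetical and not taken from the patent:

```python
def joint_play_feature(last_plays, play_order):
    """Concatenate each player's previous-round play in playing order,
    so the feature captures both card face content and play sequence."""
    return [last_plays[player] for player in play_order]

# Example mirroring {U1, O1, B1}: farmer U plays before landlord O,
# who plays before farmer B.
last_plays = {"U": "pair of 9s", "O": "pair of Ks", "B": "pass"}
feature = joint_play_feature(last_plays, play_order=["U", "O", "B"])
# feature == ["pair of 9s", "pair of Ks", "pass"]
```

Reordering `play_order` yields the differently ordered features used for the other roles later in the description.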
S102: and obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and a pre-trained machine learning model.
Specifically, in this step, the status characteristics of the target player and the joint playing characteristics corresponding to the target player may be input into a machine learning model corresponding to the role of the target player, so as to obtain the reward value of each card-playing candidate in the card-playing candidate set corresponding to the target player.
In the embodiment of the present application, the machine learning model is obtained by training in advance. Specifically, the machine learning model is trained based on sample data; the sample data comprises state characteristics of historical players, joint playing characteristics of the historical players and winning and losing results corresponding to the historical players; and the joint playing characteristics corresponding to the historical players are the historical playing characteristics of each player participating in the game corresponding to the sample data.
As for the historical player, it should be noted that a historical player is a player who participated in a historical game; the sample data may comprise data from several plays of historical games. In the embodiment of the present application, similar to the status characteristics of the target player, the status characteristics of a historical player include the role of the historical player, the cards held by the historical player, and the historical cards played by each player participating in the historical game corresponding to the sample data. Accordingly, the joint playing characteristics of a historical player may be all of the historical playing characteristics of the players participating in the game corresponding to the sample data, or only part of them (for example, the playing characteristics of the previous round of each participating player), and the embodiment of the present application is not specifically limited in this respect. In an implementation manner of the embodiment of the present application, the joint playing characteristics corresponding to a historical player may be characteristics that describe the card face contents of the historical plays of each player participating in the historical game corresponding to the sample data and that represent the playing sequence of those players.
In the embodiment of the present application, the win-or-lose result corresponding to a historical player refers to the result of that player in the game corresponding to the sample data, which may be either "win" or "loss". It can be understood that, since a user deciding how to play is usually trying to win the game, incorporating the win-or-lose results of the historical players when training the machine learning model makes the card-playing candidates obtained from the model more likely to lead to a win. In this way, the determined card-playing candidates of the target player are more likely to be selected by the user, and the target player is more likely to win the game.
The embodiment of the present application does not specifically limit the machine learning model. As an example, consider that, in practical applications, a game does not have a large amount of sample data at the development stage. Reinforcement learning is relatively close to the way humans learn: it does not depend on a large amount of sample data, and the model can learn by itself through rewards. However, with little sample data, training a stable model purely through reinforcement learning is inefficient. For example, suppose that early in training the model gives reward value R1 for input E1 and input A1, while later it gives reward value R2 for the same input E1 and input A1, with R2 not equal to R1; from such a training set it cannot be determined whether R1 or R2 should be output. This situation is very common in reinforcement learning, which is why obtaining a stable model purely by reinforcement learning is inefficient. Combining reinforcement learning with deep learning, that is, training with the aid of an auxiliary network (also called a Deep-Q network), improves the stability of the trained model and effectively improves the efficiency of model training. Therefore, in an implementation manner of the embodiment of the present application, the machine learning model may be a Deep-Q Learning (DQN) model.
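A minimal sketch of the value-estimation idea behind Deep-Q learning may help here: a parameterised network scores every (state, action) combination, and the play with the highest estimated value is preferred. The linear "network", the feature sizes, and the random features below are purely illustrative stand-ins, not the patent's model:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8,))  # stand-in for learned network weights

def q_value(state_feat, action_feat):
    """Estimated value of taking this action in this state
    (a linear stand-in for the Deep-Q network)."""
    return float(W @ np.concatenate([state_feat, action_feat]))

state = rng.normal(size=4)                            # status + joint features
candidates = [rng.normal(size=4) for _ in range(3)]   # encoded playing items
scores = [q_value(state, a) for a in candidates]
best = int(np.argmax(scores))                         # index of preferred play
```

In the real model the linear map would be replaced by a trained deep network, and the scores correspond to the reward values described in S102.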
In the embodiment of the present application, the cards that the target player is allowed to play in each round of the game under the game rules are referred to as card-playing candidates. Since there may be more than one such play, it is understood that in each round the target player actually corresponds to a card-playing candidate set, which may include a plurality of card-playing candidates. For convenience of description, any one card-playing candidate in the card-playing candidate set is referred to as a "first card-playing candidate".
In one possible implementation, the card-playing candidate set corresponding to the target player is determined according to the hand of the target player and/or the play made in the previous round. For example, if a pair of 3s was played in the previous round, the card-playing candidates of the target player may include pairs larger than 3s and bombs, that is, the card-playing candidates are the pairs larger than 3s and the bombs held in the target player's hand.
In the embodiment of the present application, the reward value of the first card-playing candidate is used to indicate the likelihood of winning the game if a card-playing action is triggered based on said first card-playing candidate.
In one possible implementation, it will be appreciated that, for a given game, the set of playing items that can occur during play is fixed, being determined by the game rules. Taking three-player Fight the Landlord as an example, the preset playing item set may include playing items determined by rules such as single card, pair, three-with-one, and three-with-two. Generally, the number of playing items may reach several hundred; for three-player Fight the Landlord it may reach 309. In a specific implementation of S102, a card-playing candidate set corresponding to the target player may be determined according to any one or more of the game rules, the hand of the target player, and the play made in the previous round, the card-playing candidate set including one or more card-playing candidates. Specifically, the status characteristics of the target player, the joint playing characteristics corresponding to the target player, and the preset playing item set may be input into the machine learning model to obtain the reward value corresponding to each playing item in the playing item set. After obtaining these reward values, the playing items may be matched against the card-playing candidate set corresponding to the target player, so as to obtain the reward value of each card-playing candidate in that set. The preset playing item set may be determined according to the game rules.
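The matching step described above — keeping, from the model's scores over the full preset playing item set, only the items that are legal for the target player — can be sketched as follows; the item names and reward values are illustrative:

```python
def candidate_rewards(item_rewards, candidate_set):
    """Filter the reward values over the preset playing item set down to
    the card-playing candidates the target player may legally play."""
    return {item: item_rewards[item] for item in candidate_set}

# Hypothetical model output over a (tiny) preset playing item set:
item_rewards = {"pass": 0.10, "pair 5": 0.62, "pair K": 0.48, "bomb 7": 0.91}
legal = {"pair K", "bomb 7"}          # derived from hand + previous-round play
rewards = candidate_rewards(item_rewards, legal)
# rewards == {"pair K": 0.48, "bomb 7": 0.91}
```

In a full implementation the candidate set would be computed from the game rules rather than written out by hand.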
In another possible implementation manner, the card-playing candidate set corresponding to the target player may be determined according to the game rules, the hand of the target player, and/or the play made in the previous round. When determining the playing items that satisfy the game rules from the target player's hand, the single cards, pairs, three-with-one combinations, three-with-two combinations, bombs, and so on present in the hand may be enumerated, yielding the card-playing candidate set corresponding to the target player. When the card-playing candidates are further constrained by the play of the previous round, the candidates may be determined according to the play of the preceding player: for example, if the preceding player played a pair of 3s, the card-playing candidates of the target player are the pairs larger than 3s and the bombs held in the target player's hand.
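A simplified sketch of this candidate enumeration for one case — responding to a pair — follows. The rank order and rules are abbreviated Fight the Landlord conventions, and the helper name is hypothetical:

```python
from collections import Counter

RANKS = "3456789TJQKA2"  # simplified rank order, low to high (T = 10)

def pair_and_bomb_candidates(hand, prev_pair_rank):
    """Legal responses to a pair: higher pairs from the hand, plus any
    bombs (four of a kind), per the simplified rules assumed here."""
    counts = Counter(hand)
    prev = RANKS.index(prev_pair_rank)
    pairs = [r * 2 for r in RANKS if counts[r] >= 2 and RANKS.index(r) > prev]
    bombs = [r * 4 for r in RANKS if counts[r] == 4]
    return pairs + bombs

hand = ["3", "3", "5", "5", "5", "5", "9"]
candidates = pair_and_bomb_candidates(hand, prev_pair_rank="3")
# candidates == ["55", "5555"]
```

A complete enumerator would cover all item types (singles, three-with-one, straights, and so on) in the same manner.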
After the card-playing candidate set of the target player is determined, the status characteristics of the target player, the joint playing characteristics corresponding to the target player, and the card-playing candidate set corresponding to the target player may be input into the machine learning model, so as to obtain the reward value of each card-playing candidate in the card-playing candidate set corresponding to the target player.
S103: and determining the card-playing candidates of the target player in the current round of game according to the reward value of each card-playing candidate in the obtained card-playing candidate set.
In the embodiment of the present application, the card-playing candidates of the target player in the current round of game are one or more card-playing candidates in the card-playing candidate set corresponding to the target player mentioned in S102. As before, the reward value of the first card-playing candidate indicates the likelihood of winning the game if a card-playing action is triggered based on that candidate: the greater the reward value, the greater the likelihood of winning. Since the user typically wishes to win the game, in a specific implementation of S103 the card-playing candidates with larger reward values may be determined as the card-playing candidates of the target player in the current round of game.
Specifically, in one implementation, one or more of the card-playing candidates whose reward value is greater than or equal to a preset threshold value may be determined as the card-playing candidates of the target player in the current round of game.
The preset threshold is not specifically limited in the embodiment of the present application. It can be understood that, if the reward value of every card-playing candidate is smaller than the preset threshold, this indicates that not playing any card in the current round is the more appropriate choice for the target player.
In another implementation manner of the embodiment of the present application, the card-playing candidate with the largest reward value may be determined as the card-playing candidate of the target player in the current round of game. It will be appreciated that, in this case, the likelihood of the target player winning the game based on the determined card-playing candidate is greater than the likelihood of winning based on any other card-playing candidate.
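The two selection rules just described — threshold filtering, and taking the single highest-reward candidate — can be sketched as below; an empty result from the threshold rule corresponds to "not playing is the better choice". Names and values are illustrative:

```python
def candidates_above_threshold(rewards, threshold):
    """S103, rule 1: keep every candidate whose reward meets the
    threshold; an empty result suggests passing this round."""
    return [c for c, r in rewards.items() if r >= threshold]

def best_candidate(rewards):
    """S103, rule 2: the single candidate with the largest reward value."""
    return max(rewards, key=rewards.get)

rewards = {"pair K": 0.48, "bomb 7": 0.91, "pass": 0.10}
picked = candidates_above_threshold(rewards, threshold=0.4)
top = best_candidate(rewards)   # "bomb 7"
```

Rule 2 is also what resolves ties later in the description, when several candidates clear the threshold but only one play can actually be triggered.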
As can be seen from the above description, with the solution provided in the embodiment of the present application, the probability that the determined card-playing candidate of the target player is selected by the user is higher, and the probability that the target player wins the game is higher, that is, the determined card-playing candidate is more effective.
As described above, the playing situation of the target player bears a certain relationship to the role of the target player; in other words, the role of the target player may affect the target player's card-playing strategy. In an implementation manner of the embodiment of the present application, in order to further improve the effect of the determined card-playing candidates, a corresponding machine learning model may be trained separately for each role, so that each player role has its own machine learning model. Taking three-player Fight the Landlord as an example, a machine learning model corresponding to the landlord, one corresponding to the farmer playing before the landlord, and one corresponding to the farmer playing after the landlord can each be trained.
For this case, in a specific implementation of S102, the role of the target player may first be determined, along with the machine learning model corresponding to that role. Specifically, a mapping relationship between roles and machine learning models may be established in advance; once the role of the target player is determined, the corresponding machine learning model can be found from this mapping. After the machine learning model corresponding to the role of the target player is determined, the reward value of each card-playing candidate in the card-playing candidate set corresponding to the target player is obtained according to the status characteristics of the target player, the joint playing characteristics corresponding to the target player, and that machine learning model; that is, the status characteristics and the joint playing characteristics of the target player are input into the machine learning model corresponding to the role of the target player.
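The pre-established role-to-model mapping might be as simple as a lookup table keyed by role; the role names and model handles below are illustrative placeholders, not identifiers from the patent:

```python
# Hypothetical handles for the three per-role models of three-player
# Fight the Landlord.
ROLE_TO_MODEL = {
    "landlord": "dqn_landlord",
    "farmer_before_landlord": "dqn_farmer_up",
    "farmer_after_landlord": "dqn_farmer_down",
}

def model_for_role(role):
    """Resolve the machine learning model trained for this role."""
    return ROLE_TO_MODEL[role]

model = model_for_role("landlord")
```

In practice the values would be loaded model objects rather than strings, but the dispatch logic is the same.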
In an implementation manner of the embodiment of the present application, after the card-playing candidates corresponding to the target player are determined, a first operation may be further performed according to the card-playing candidates of the target player in the current round of game. Wherein the first operation may refer to an operation for helping the target player win the game of the present hand.
As an example, in order to facilitate prompting how a user may play cards currently, when a first operation is performed according to a play candidate of a target player in a current round of game, a play candidate corresponding to the target player may be displayed in a preset manner to distinguish the play candidate corresponding to the target player from other cards held by the target player. The preset mode is not particularly limited in the embodiment of the present application, as long as the card-playing candidates displayed in the preset mode can be distinguished from other cards held by the target player. For example, if the playing candidates of the target player are originally displayed in the first display area, the playing candidates of the target player may be controlled to be displayed in a second display area different from the first display area. For another example, a prompt icon may be added to the play candidates of the target player to distinguish the play candidates of the target player from other cards held by the target player, and the description is not repeated here.
As yet another example, performing the first operation according to the card-playing candidate of the target player in the current round of game may mean triggering the card-playing behavior based on that candidate. Considering that, in practical applications, the time reserved for each player to play is generally limited in order to keep the game flowing, in an implementation manner of the embodiment of the present application, if the target player does not trigger a card-playing behavior within the reserved time, the terminal device may automatically trigger the card-playing behavior based on the card-playing candidate corresponding to the target player. Alternatively, when the target player has enabled the trusteeship (auto-play) mode during the game, that is, the mode in which the terminal device automatically triggers the card-playing behavior, the terminal device may likewise automatically trigger the card-playing behavior based on the card-playing candidate corresponding to the target player.
In addition, it is considered that in practical applications, there may be a game mode of a human-machine game, i.e., a mode in which a user plays a game with a robot. When the target player is the robot player, after the card-playing candidates of the target player are determined, the card-playing behavior can be triggered directly based on the card-playing candidates.
It will be appreciated that the number of card-playing candidates of the target player in the current round of game may be 1 or greater than 1, while a card-playing behavior can only be triggered based on a single candidate. In this case, in the embodiment of the present application, if the number of card-playing candidates of the target player in the current round of game is plural, for example because several candidates have reward values greater than or equal to the preset threshold, then, to give the target player the highest chance of winning, the card-playing behavior may be triggered based on the candidate with the largest reward value. If the machine learning model yields only one card-playing candidate, the card-playing behavior can be triggered directly based on that candidate.
In the following, a specific implementation manner of training the machine learning model (a DQN model) corresponding to each role is described, taking three-player Fight the Landlord as an example.
Referring to fig. 2, the diagram is a schematic structural diagram of a machine learning model provided in an embodiment of the present application.
As shown in fig. 2, the machine learning model includes a primary network 201 and a secondary network 202.
In the embodiment of the present application, the manner of training the machine learning model using training samples is described by taking the machine learning model corresponding to the landlord player as an example.
The primary network 201 and the auxiliary network 202 share common inputs, namely the status characteristic E of the historical player, the joint playing characteristic A corresponding to the historical player, and the preset playing item set M. The role of the historical player referred to here is the landlord.
Based on a set of training samples (namely a status characteristic E of a historical player and the corresponding joint playing characteristic A), E and A may be input into both the main network 201 and the auxiliary network 202. The main network 201 obtains an initial predicted reward value R1 for each playing item in the playing item set M based on E and A, and processes these predicted reward values based on the win-or-lose result Q of the historical player to obtain a target predicted reward value R2 for each playing item in M. Correspondingly, the auxiliary network 202 obtains a standard reward value R3 for each playing item in M based on E and A. A LOSS function LOSS is then calculated from the target predicted reward value R2 and the standard reward value R3, and the network parameters of the main network 201 are updated based on LOSS.
As mentioned above, the purpose of the auxiliary network 202 is to avoid the model giving different reward values at different times for the same input E1 and input A1 (for example, giving R1 the first time and R2 the second time). In this embodiment, the parameter updates of the auxiliary network 202 are not synchronized with those of the main network 201: after the parameters of the main network 201 have been updated several times, the parameters of the auxiliary network 202 may be updated once, based on the target reward values R2 corresponding to those updates of the main network 201. For example, the parameters of the main network 201 may be updated 10 times for every single update of the auxiliary network 202. Thus, when similar training samples are input, the standard reward values R3 obtained by the auxiliary network 202 are similar, because its parameters are not updated in step with the main network 201. This avoids the aforementioned problem of the model giving different reward values for the same input at different times, and improves the stability and efficiency of model training.
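The delayed synchronization between the main network and the auxiliary (target) network can be sketched as follows; the update interval of 10, the linear parameter vectors, and the class name are illustrative, not prescribed by the patent:

```python
import numpy as np

class DelayedTargetSketch:
    """Main-network parameters update every step; the auxiliary (target)
    network only copies them every `sync_every` steps."""

    def __init__(self, dim, sync_every=10):
        self.main = np.zeros(dim)
        self.target = np.zeros(dim)
        self.sync_every = sync_every
        self.steps = 0

    def train_step(self, grad, lr=0.1):
        self.main = self.main - lr * grad   # update main network each step
        self.steps += 1
        if self.steps % self.sync_every == 0:
            self.target = self.main.copy()  # lagged copy into the auxiliary net

net = DelayedTargetSketch(dim=3)
for _ in range(9):
    net.train_step(np.ones(3))
before_sync = net.target.copy()   # still the initial values: no sync yet
net.train_step(np.ones(3))        # 10th step triggers the copy
```

Because the target parameters change only at the sync points, the standard reward values they produce stay stable between main-network updates, which is the stabilizing effect described above.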
As described above, the joint playing characteristic corresponding to the historical player comprises the historical playing characteristics of each player participating in the game corresponding to the sample data. In the embodiment of the present application, if the players participating in the game with the historical player M (the landlord) also include the farmer player N and the farmer player P, where N plays before M and P plays after M, the joint playing characteristic of the historical player may be, for example, {M1, P1, N1}, where M1 is the card face content of the previous round played by the historical player M in the game, P1 is that of the farmer player P, and N1 is that of the farmer player N.
Note that the training manner of the machine learning models corresponding to the player before the landlord (for example, the farmer player N) and the player after the landlord (for example, the farmer player P) is similar to that of the machine learning model corresponding to the landlord (for example, the historical player M). When training the machine learning model for the player before the landlord, the inputs are the status characteristics of the farmer player N and the joint playing characteristic corresponding to N, which may be, for example, {N1, M1, P1}. Correspondingly, when training the machine learning model for the player after the landlord, the inputs are the status characteristics of the farmer player P and the joint playing characteristic corresponding to P, which may be, for example, {P1, N1, M1}. For the details of these training manners, reference may be made to the description of the training process of the machine learning model above, which is not repeated here.
It should be noted that, in the embodiment of the present application, after the training of the machine learning model is completed, the status characteristics of the target player and the joint playing characteristics corresponding to the target player may be input into the machine learning model, and the main network in the machine learning model then outputs the reward values from which the card-playing candidates corresponding to the target player are determined.
Exemplary device
Based on the method provided by the above embodiment, the embodiment of the present application further provides an apparatus, which is described below with reference to the accompanying drawings.
Referring to fig. 3, the figure is a schematic structural diagram of an apparatus for determining a card-playing candidate provided in an embodiment of the present application. The apparatus may specifically include, for example: an acquisition unit 301, a first determination unit 302, and a second determination unit 303.
The obtaining unit 301 is configured to obtain a status feature of a target player and a joint card-playing feature corresponding to the target player; the joint card playing characteristics corresponding to the target player are historical card playing characteristics of each player participating in the game;
the first determining unit 302 is configured to obtain a reward value of each card-playing candidate in the card-playing candidate set corresponding to the target player according to the status characteristics of the target player, the joint playing characteristics corresponding to the target player, and a pre-trained machine learning model; wherein the reward value of a first card-playing candidate in the set is used to indicate the likelihood of winning the game if a card-playing action is triggered based on the first card-playing candidate;
the second determining unit 303 is configured to determine the card-playing candidates of the target player in the current round of game according to the obtained reward values of the respective card-playing candidates in the card-playing candidate set.
Optionally, the apparatus further comprises:
a third determining unit, configured to determine a playing candidate item set corresponding to the target player, where the playing candidate item set includes one or more playing candidate items;
the first determining unit 302 is specifically configured to:
inputting the state characteristics of the target player, the combined card-playing characteristics corresponding to the target player and a preset card-playing item set into the machine learning model to obtain the reward value corresponding to each card-playing item in the card-playing item set;
and obtaining the bonus value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the bonus value corresponding to the card-playing item and the card-playing candidate item set corresponding to the target player.
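In this two-step variant, the model scores every action in a fixed preset card-playing set, and only the scores of actions that are legal candidates this turn are kept. A hypothetical helper sketching that restriction (names illustrative, not from the patent):

```python
def rewards_for_candidates(state, joint, preset_actions, candidates, model):
    """Score the whole preset card-playing set, then keep only the
    bonus values of the target player's legal candidates."""
    # Step 1: one bonus value per action in the fixed preset set.
    all_rewards = {a: model(state, joint, a) for a in preset_actions}
    # Step 2: restrict to the candidate set of the target player.
    return {a: all_rewards[a] for a in candidates if a in all_rewards}

def toy_model(state, joint, action):
    return float(len(action))  # stand-in for the pre-trained model

preset = [("3",), ("3", "3"), ("4",)]
legal = [("3", "3"), ("4",)]
rewards = rewards_for_candidates({}, [], preset, legal, toy_model)
print(sorted(rewards))  # [('3', '3'), ('4',)]
```

This matches a model whose output layer is fixed to the preset action set, so illegal actions are simply discarded after scoring rather than masked inside the model.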
Optionally, the apparatus further comprises:
a third determining unit, configured to determine a playing candidate item set corresponding to the target player, where the playing candidate item set includes one or more playing candidate items;
the first determining unit 302 is specifically configured to:
and inputting the state characteristic of the target player, the combined playing characteristic corresponding to the target player and the playing candidate item set corresponding to the target player into the machine learning model to obtain the bonus value of each playing candidate item in the playing candidate item set corresponding to the target player.
Optionally,
the machine learning model is obtained based on sample data training;
the sample data comprises state characteristics of historical players, joint playing characteristics corresponding to the historical players, and winning and losing results corresponding to the historical players; the joint playing characteristics corresponding to a historical player are the historical playing characteristics of each player participating in the game round corresponding to the sample data.
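The training data described above can be assembled by labelling each recorded decision with the final win/loss result of the player who made it, which serves as the reward signal. A hedged sketch, with illustrative field names not taken from the patent:

```python
def make_training_samples(game_log, winners):
    """Turn one finished game round into training samples: each decision
    is labelled with the final win/loss outcome of its player."""
    samples = []
    for step in game_log:
        samples.append({
            "state": step["state"],    # state features at decision time
            "joint": step["joint"],    # joint card-playing features
            "action": step["action"],  # the card-playing action taken
            "reward": 1.0 if step["player"] in winners else 0.0,
        })
    return samples

log = [
    {"player": "A", "state": {}, "joint": [], "action": ("3",)},
    {"player": "B", "state": {}, "joint": [], "action": ("4",)},
]
samples = make_training_samples(log, winners={"A"})
print([s["reward"] for s in samples])  # [1.0, 0.0]
```

Using a set of winners accommodates team games where more than one player shares the winning result.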
Optionally, the status characteristics of the target player include any one or more of the following:
the character of the target player, the cards held by the target player, and the historical playing of each player in the game.
Optionally, the joint playing characteristics corresponding to the target player are characteristics that describe the face contents of the historical card playing of each player participating in the game and that reflect the card-playing sequence of the players.
Optionally,
the first determining unit 302 is specifically configured to:
determining a character of the target player and determining a machine learning model corresponding to the character of the target player;
and obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and the machine learning model corresponding to the role of the target player.
Optionally, the machine learning model is a deep reinforcement learning DQN model.
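A DQN estimates one Q (bonus) value per action from the input features. Below is a minimal forward pass of such a Q-network in plain Python, assuming the state and joint features have already been encoded as numeric vectors; the layer sizes and weights are illustrative, not from the patent.

```python
import random

def q_values(state_vec, joint_vec, w1, b1, w2, b2):
    """One forward pass of a tiny Q-network: concatenated state and
    joint features in, one Q value per playable action out."""
    x = state_vec + joint_vec  # concatenate the two feature vectors
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi)
         for row, bi in zip(w1, b1)]  # ReLU hidden layer
    return [sum(wi * hi for wi, hi in zip(row, h)) + bi
            for row, bi in zip(w2, b2)]  # linear output layer

random.seed(0)
d_in, d_hidden, n_actions = 8, 6, 3
w1 = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [0.0] * d_hidden
w2 = [[random.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(n_actions)]
b2 = [0.0] * n_actions
q = q_values([0.5] * 4, [0.1] * 4, w1, b1, w2, b2)
print(len(q))  # one Q value per action: 3
```

In practice the network would be trained with the DQN loss (temporal-difference targets against a target network); only the inference shape is sketched here.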
Optionally, the second determining unit 303 is specifically configured to:
determining one or more card-playing candidate items whose bonus value is greater than or equal to a preset threshold value as the card-playing candidate items of the target player in the current round of game; or
determining the card-playing candidate item with the maximum bonus value as the card-playing candidate item of the target player in the current round of game.
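The two selection rules above can be sketched directly; the candidate names below are illustrative assumptions:

```python
def pick_by_threshold(rewards, threshold):
    """First rule: keep every candidate whose bonus value is greater
    than or equal to the preset threshold."""
    return [c for c, r in rewards.items() if r >= threshold]

def pick_best(rewards):
    """Second rule: keep only the candidate with the maximum bonus value."""
    return max(rewards, key=rewards.get)

rewards = {"pass": 0.2, "pair_of_3s": 0.6, "bomb": 0.9}
print(pick_by_threshold(rewards, 0.5))  # ['pair_of_3s', 'bomb']
print(pick_best(rewards))               # 'bomb'
```

The threshold rule suits displaying several suggestions to a human player, while the maximum rule suits automatically triggering a single card-playing action.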
Optionally, the apparatus further comprises:
and the operation unit is used for executing a first operation according to the determined card-playing candidates of the target player in the current round of game after determining the card-playing candidates of the target player in the current round of game.
Optionally, the operation unit is specifically configured to:
after the card-playing candidate items of the target player in the current round of game are determined, displaying the card-playing candidate items of the target player in the current round of game in a preset mode; or
after the card-playing candidate items of the target player in the current round of game are determined, triggering a card-playing behavior for the card-playing candidate items of the target player in the current round of game.
Since the apparatus 300 corresponds to the method provided in the above method embodiment, and the specific implementation of each unit of the apparatus 300 is the same as in that embodiment, reference may be made to the description of the method embodiment for the implementation details of each unit; they are not repeated here.
An embodiment of the present application further provides a terminal device that may be used to execute the method for determining card-playing candidates provided in the above method embodiment. The terminal device is described below from the perspective of its hardware implementation.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application. For convenience of explanation, only the parts related to the embodiments of the present application are shown; specific technical details are not disclosed. The terminal may be any terminal device, including a computer, a tablet computer, a Personal Digital Assistant (PDA), and the like. The following takes a mobile phone as an example:
fig. 4 is a block diagram illustrating a partial structure of a mobile phone related to a terminal provided in an embodiment of the present application. Referring to fig. 4, the handset includes: radio Frequency (RF) circuit 410, memory 420, input unit 430, display unit 440, sensor 450, audio circuit 460, wireless fidelity (WiFi) module 470, processor 480, and power supply 490. Those skilled in the art will appreciate that the handset configuration shown in fig. 4 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
The memory 420 may be used to store software programs and modules, and the processor 480 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 420. The memory 420 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the mobile phone. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
The processor 480 is the control center of the mobile phone: it connects the various parts of the entire phone by using various interfaces and lines, and performs the phone's functions and processes its data by running or executing the software programs and/or modules stored in the memory 420 and calling the data stored in the memory 420, thereby monitoring the mobile phone as a whole. Optionally, the processor 480 may include one or more processing units; preferably, the processor 480 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 480.
In the embodiment of the present application, the processor 480 included in the terminal is further configured to execute the steps of any implementation of the method for determining card-playing candidates provided in the embodiments of the present application.
The embodiment of the present application further provides a computer-readable storage medium for storing a computer program, where the computer program is used to execute any one implementation of the method for determining the card-playing candidates described in the foregoing embodiments.
Embodiments of the present application further provide a computer program product including instructions, which when run on a computer, cause the computer to perform any one of the embodiments of the method for determining a playing candidate described in the foregoing embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method of determining a discard candidate, the method comprising:
acquiring state characteristics of a target player and combined card-playing characteristics corresponding to the target player; the joint card playing characteristics corresponding to the target player are historical card playing characteristics of each player participating in the game;
obtaining the bonus value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player, and a pre-trained machine learning model; wherein the bonus value of a first card-playing candidate item in the card-playing candidate item set indicates the likelihood of winning the game if a card-playing action is triggered based on that first card-playing candidate item;
and determining the card-playing candidate items of the target player in the current round of game according to the obtained bonus value of each card-playing candidate item in the card-playing candidate item set.
2. The method of claim 1, further comprising:
determining a card-playing candidate item set corresponding to a target player, wherein the card-playing candidate item set comprises one or more card-playing candidate items;
the obtaining of the bonus value of each of the card-playing candidate items in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player, and the pre-trained machine learning model includes:
inputting the state characteristics of the target player, the combined card-playing characteristics corresponding to the target player and a preset card-playing item set into the machine learning model to obtain the bonus value corresponding to each card-playing item in the card-playing item set;
and obtaining the bonus value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the bonus value corresponding to the card-playing item and the card-playing candidate item set corresponding to the target player.
3. The method of claim 1,
the method further comprises the following steps: determining a card-playing candidate item set corresponding to a target player, wherein the card-playing candidate item set comprises one or more card-playing candidate items;
the obtaining of the bonus value of each of the card-playing candidate items in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player, and the pre-trained machine learning model includes:
and inputting the state characteristic of the target player, the combined playing characteristic corresponding to the target player and the playing candidate item set corresponding to the target player into the machine learning model to obtain the bonus value of each playing candidate item in the playing candidate item set corresponding to the target player.
4. The method of claim 1,
the machine learning model is obtained based on sample data training;
the sample data comprises state characteristics of historical players, joint playing characteristics corresponding to the historical players, and winning and losing results corresponding to the historical players; the joint playing characteristics corresponding to a historical player are the historical playing characteristics of each player participating in the game round corresponding to the sample data.
5. The method of claim 1, wherein the status characteristics of the target player include any one or more of:
the character of the target player, the cards held by the target player, and the historical playing of each player in the game.
6. The method as claimed in claim 1, wherein the joint card-playing characteristics corresponding to the target player are characteristics describing the face contents of the historical card playing of each player participating in the game and reflecting the card-playing sequence of each player.
7. The method of claim 1,
the obtaining of the bonus value of each of the card-playing candidate items in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player, and a pre-trained machine learning model includes:
determining a character of the target player and determining a machine learning model corresponding to the character of the target player;
and obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and the machine learning model corresponding to the role of the target player.
8. The method according to any of claims 1-7, wherein the machine learning model is a deep reinforcement learning DQN model.
9. The method of any one of claims 1 to 8, wherein the determining the card-playing candidate items of the target player in the current round of game according to the obtained bonus value of each card-playing candidate item in the card-playing candidate item set comprises:
determining one or more card-playing candidate items whose bonus value is greater than or equal to a preset threshold value as the card-playing candidate items of the target player in the current round of game; or
determining the card-playing candidate item with the maximum bonus value as the card-playing candidate item of the target player in the current round of game.
10. The method of any one of claims 1-9, wherein after determining the target player's playing candidates in the current round of play, the method further comprises:
and executing a first operation according to the determined card-playing candidate items of the target player in the current round of game.
11. The method of claim 10, wherein performing a first operation based on the determined draw candidates of the target player in the current round of game comprises:
displaying the card-playing candidate items of the target player in the current round of game in a preset mode; or
triggering a card-playing behavior for the card-playing candidate items of the target player in the current round of game.
12. An apparatus for determining a discard candidate, the apparatus comprising:
the card-playing device comprises an acquisition unit, a playing unit and a playing unit, wherein the acquisition unit is used for acquiring the state characteristics of a target player and the combined card-playing characteristics corresponding to the target player; the joint card playing characteristics corresponding to the target player are historical card playing characteristics of each player participating in the game;
the first determining unit is used for obtaining the reward value of each card-playing candidate item in the card-playing candidate item set corresponding to the target player according to the state characteristic of the target player, the combined card-playing characteristic corresponding to the target player and a pre-trained machine learning model; wherein the prize value of a first one of the set of card-playing candidates is indicative of a likelihood of winning a game based on the first one of the set of card-playing candidates triggering a card-playing action;
and the second determining unit is used for determining the card-playing candidates of the target player in the current round of game according to the obtained bonus value of each card-playing candidate in the card-playing candidate set.
13. An apparatus, characterized in that the apparatus comprises: a processor and a memory:
the memory is used for storing a computer program and transmitting the computer program to the processor;
the processor is configured to execute the method of determining a discard candidate of any of claims 1 to 11 in accordance with the computer program.
14. A computer-readable storage medium for storing a computer program for executing the method of determining a discard candidate of any of claims 1 to 11.
CN202010104375.8A 2020-02-20 2020-02-20 Method and device for determining card-playing candidate items Pending CN111330282A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010104375.8A CN111330282A (en) 2020-02-20 2020-02-20 Method and device for determining card-playing candidate items


Publications (1)

Publication Number Publication Date
CN111330282A true CN111330282A (en) 2020-06-26

Family

ID=71175810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010104375.8A Pending CN111330282A (en) 2020-02-20 2020-02-20 Method and device for determining card-playing candidate items

Country Status (1)

Country Link
CN (1) CN111330282A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012029763A (en) * 2010-07-29 2012-02-16 Konami Digital Entertainment Co Ltd Game device, game system, game control method, and program
CN102592059A (en) * 2011-01-12 2012-07-18 史克威尔艾尼克斯股份有限公司 Network game system, apparatus for game and server and computer readable record medium
CN106055339A (en) * 2016-06-08 2016-10-26 天津联众逸动科技发展有限公司 Method for determining card playing strategy of computer player in two-against-one game
CN107789833A (en) * 2017-12-01 2018-03-13 四维口袋科技(北京)有限公司 Intelligent game processing method, device and intelligent game accompany the system of beating
CN109413093A (en) * 2018-11-23 2019-03-01 北京金山云网络技术有限公司 A kind of operation processing method, device, electronic equipment, storage medium and system
CN110585732A (en) * 2019-09-12 2019-12-20 腾讯科技(深圳)有限公司 Card handling information determining method and device, server and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113384884A (en) * 2021-06-15 2021-09-14 北京达佳互联信息技术有限公司 Virtual card-based card-playing prompting method and device, electronic equipment and storage medium
CN113384884B (en) * 2021-06-15 2024-03-26 北京达佳互联信息技术有限公司 Card-playing prompting method and device based on virtual card, electronic equipment and storage medium
CN113420226A (en) * 2021-07-20 2021-09-21 网易(杭州)网络有限公司 Card recommendation method and device, electronic equipment and computer readable medium

Similar Documents

Publication Publication Date Title
US11318382B2 (en) Game control method, system, and non-transitory computer-readable recording medium
WO2019109820A1 (en) Object matching method and device, storage medium, and electronic device
CN110075532B (en) Virtual object control method and device, storage medium and electronic device
CN109794065B (en) Game role creating method and device, electronic equipment and storage medium
CN112843725A (en) Intelligent agent processing method and device
CN106102849B (en) It is created for the method and system for the multi-player video game environment that single player uses
US11918891B2 (en) Game content matching comparison to determine game effect
US20230048502A1 (en) Method and apparatus for displaying expression in virtual scene
CN111330282A (en) Method and device for determining card-playing candidate items
CN112138409A (en) Game result prediction method, device and storage medium
CN108874377B (en) Data processing method, device and storage medium
US10413831B2 (en) Game system, and control method and storage medium used in same
JP5704266B1 (en) GAME SERVER DEVICE AND GAME SERVER PROGRAM
CN113946604B (en) Staged go teaching method and device, electronic equipment and storage medium
US11517824B2 (en) Dynamic event-based ranking methods and systems
US10471359B2 (en) Game and method of playing the same
KR20150055150A (en) Game server and method for providing service fighting game of the same
JP2019217215A (en) Game program, method, and information processing device
WO2021168619A1 (en) Information pushing method and apparatus, electronic device, and computer readable medium
US11395974B1 (en) Contextually aware active social matchmaking
US11020674B2 (en) Program, information processing device, and control method for strategic operation of a game using an action point system
JP2019025300A (en) Control program, control method, and computer
WO2024060914A1 (en) Virtual object generation method and apparatus, device, medium, and program product
JP6317845B1 (en) Control program, control method, and computer
CN117323670A (en) Virtual character team forming relationship identification method, device, equipment, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination