CN113018837B

CN113018837B - Machine game playing method, system and storage medium for whipped egg playing cards

Info

Publication number: CN113018837B
Application number: CN202110151660.XA
Authority: CN
Inventors: 潘志庚; 孙亚文; 张明敏; 徐守江; 朱兆辉; 高和蓓
Original assignee: Hangzhou Normal University
Current assignee: Hangzhou Normal University
Priority date: 2021-02-03
Filing date: 2021-02-03
Publication date: 2024-04-02
Anticipated expiration: 2041-02-03
Also published as: CN113018837A

Abstract

The invention relates to a machine game playing method, system and storage medium for whipped egg playing cards, belonging to the technical field of deep learning and machine game playing. First, the energy of the dealt hand is calculated according to an energy calculation formula; the cards are then divided into three basic types of effective cards and the hand is split accordingly. Using a minimal data set, the parameter of the formula is obtained through deep learning training, the Value is calculated, and the playing strategy is adjusted in real time according to the Value. The playing rules comprise three types: leading, following and sprinting. The invention discloses a calculation strategy based on deep learning that realizes a machine algorithm for a poker game: the two partners of one side use the same method, two groups of algorithms compete against each other, and a more intelligent poker playing method is finally obtained through training.

Description

Machine game playing method, system and storage medium for whipped egg playing cards

Technical Field

The invention relates to the technical field of deep learning and machine game, in particular to a machine game playing method, a system and a storage medium of a whipped egg playing card.

Background

Since the 1940s, machine game playing has served as a touchstone for verifying computing theory and artificial intelligence theory, and systems such as Deep Blue and AlphaGo have attracted wide public attention. Computer games are an important platform for testing the level of artificial intelligence, a discipline that researches and develops theories, methods, technologies and applications for simulating, extending and expanding human intelligence (reference Yang Shanlin, Ni Zhiwei. Machine learning and intelligent decision support system [M]. Science Press, 2004.). Machine games are known as the Drosophila (fruit fly) of artificial intelligence (reference Xu Yangong. Man-machine games [J]. Artistic and design, 2016 (4): 150-151.): just as Drosophila serves genetics in biology, the many results produced by research on machine games can be applied to further fields such as chess and playing cards (reference square, Zhao Shuying, time smelling, etc. Man-machine games show: chess playing robot [J]. Robot technologies and applications, 2014 (2): 39-42.).

Whipped egg (Guandan) is a poker game popular in the Huai'an region. It developed from the local card game "fast running". The game is played by four people competing in two partnerships, uses two decks of cards and adopts a win-or-lose upgrade mode, which makes it more creative and entertaining.

The card types include: two pairs, bombs, four kings (the four jokers), straight flushes, straights, steel plates, single cards, pairs, three with two, and the like. In terms of size, the four kings rank highest, followed by bombs of six or more cards, straight flushes, five-card bombs, four-card bombs, and then the other card types. Ordinary card types can be compared only when the card type and the number of cards are the same; combined card types such as three with two and consecutive triples are compared by their highest-ranking cards. The finishing order is head game (first out), double lower (the two players of one side finishing in the last two places) and last game (last out). The playing cards rank in the sequence of king, K, A, Q, J, 10, 9, 8, 7, 6, 5, 4 and 3. The upgrading rule is determined by who plays out their cards first and who plays them out last. If the opponents are double-down, the winning side rises 3 levels; if one of the opponents finishes last, the winning side rises 2 levels; if the winner's partner (the player sitting opposite) finishes last, the winning side rises 1 level. A side that takes the head game three times rises 1 level, four times rises 2 levels, and so on; a side that finishes last three times is pushed back one level. The game ends when a side reaches level A.

Traditional methods for calculating playing cards have the following drawbacks: they occupy a large amount of memory, their card counting is not intelligent enough, their way of splitting the hand requires a combination of depth-first and breadth-first traversal, each play takes a long time to compute, and they lack generality. There has been little research on intelligent game methods for Huai'an whipped egg playing cards, and an intelligent playing strategy with a small memory footprint and a high calculation speed needs to be developed.

Disclosure of Invention

The invention aims to provide a machine game playing method, system and storage medium for whipped egg playing cards whose calculation is simple and which reasons about each specific situation as it arises, thereby realizing a playing strategy closer to the habits of human players.

To achieve the above object, in a first aspect, the present invention provides a machine game play method of whipped egg playing cards, comprising the steps of:

1) Calculating the card splitting loss E1 and the corresponding value E2 of each card splitting mode in the held playing cards, and calculating the energy value Ea=E1+E2 of each card splitting mode;

2) Comparing the energy value of each card disassembling mode to obtain minEa;

3) Calculating the value Eb, where k is a constant;

4) Playing cards according to the card type corresponding to the Eb value.

In this technical scheme, Huai'an whipped egg playing cards are taken as the research object, and the way of playing is calculated by an energy minimization algorithm, which is simple to compute. The overall value of the hand, taking into account the card splitting loss and the red heart (wild) cards, is calculated and the optimal play is selected; the optimal k value is obtained through deep learning training. The method focuses on strategy during play and reasons about each specific situation as it arises, achieving a playing strategy closer to the habits of human players.

Preferably, in step 1), the value of the card splitting loss E1 is equal to the number of playing cards held. For example, the splitting loss of "7, 9, 10, J, Q Q Q Q, K, red heart 5" is 10. As a further example, when the hand holds "8 10 J Q K A A A A" and A A A A is played as a bomb, the overall loss becomes large because too many single cards remain in the hand.

Preferably, in step 1), the value calculation mode corresponding to each card type is determined as follows:

Value of three identical cards: V3(i+1) = V3(i) + 1; for example, V3(3) = V3(2) + 1, i.e., three 3s have a value 1 greater than three 2s.

Value of two identical cards: V2(i+1) = V2(i) + 1;

Value of a bomb: V4(2) = V3(A) + 1, V4(3) = V4(2) + 1;

Value of a straight flush: V5(2) = V4(A) + 1, V5(i+1) = V5(i) + 1;

where i is the rank of the card.

When a hand is dealt, 26 cards are held, in 13 ranks from 2 to A. In whipped egg playing cards one level card is drawn, and the red heart card of that rank serves as a free (wild) card that can substitute for any card. Specifically:

When the level card of the current game is randomly drawn to be red heart 5, the values corresponding to 2-4 are 1-3, the values corresponding to 6-A are 4-12, the value of diamond 5, spade 5 and club 5 is 13, and the value of the red heart 5 is 14, and so on. The card values are defined such that two 2s are less than two 3s and five 2s are greater than four Aces.
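The value assignment above can be sketched as a small lookup. This is only an illustration: the function name `card_value`, the suit strings, and the generalization from level card 5 to any other level rank (the "and so on" in the text) are assumptions, not part of the original disclosure.

```python
# Ranks in ascending order, as listed in the text (2 through A).
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]


def card_value(rank: str, suit: str, level_rank: str) -> int:
    """Value of a single card when `level_rank` is the level card.

    Assumed generalization: ordinary ranks are numbered 1..12 in ascending
    order with the level rank skipped, level-rank cards score 13, and the
    red heart level card (the wild card) scores 14.
    """
    if rank == level_rank:
        return 14 if suit == "hearts" else 13
    ordinary = [r for r in RANKS if r != level_rank]
    return ordinary.index(rank) + 1


if __name__ == "__main__":
    # With red heart 5 as the level card: 2-4 map to 1-3 and 6-A map to 4-12.
    for rank in RANKS:
        print(rank, card_value(rank, "spades", "5"))
    print("red heart 5:", card_value("5", "hearts", "5"))
```

With level rank 5 this reproduces the single-card values used later in the worked example, such as value(7) = 5 and value(K) = 11.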

Preferably, in step 1), the playing cards held are classified into the following three categories:

First-kind effective cards C1: the number of groups can be reduced without splitting the cards. For example, when "3 3 3 2 2" is held, the cards could be played as two hands; played together as three with two, the number of groups is reduced without splitting the cards.

Second-kind effective cards C2: the number of groups cannot be reduced, but the value can be increased. For example, when the hand holds "3 3 3 3 4 5 6 7 8", the original grouping "3 3 3" and "3 4 5 6 7 8" is changed to "3 3 3 3" and "4 5 6 7 8", and "3 3 3 3" is played as a bomb; the number of groups in the whole hand is not reduced, but the value is increased.

Third-kind effective cards C3: the efficiency of C1 and C2 cannot be improved, but the cards can be used to press the upper house (the player who acts immediately before). For example, with a hand of "7 8 9 9 9 9 10 J" when the upper house plays a 9, C3 refers to a way of playing that reduces the number of groups played or reduces the value in order to help respond to the upper house's cards.

For the first kind of effective cards C1, the number of groups is what is reduced. Let Z denote the number of groups of a way of splitting the hand, where each group is one play, for example three 2s forming one group and two 3s forming one group, with unmatched cards held singly; the minimum number of groups Ez = min(Z) is required.

Preferably, in step 3), the constant k is obtained by:

The value of k is preset. When k = 2, the loss value is amplified in the calculation; when k = 1/2, the proportion of the loss value is reduced. The value of k that wins the largest number of games is found by repeatedly adjusting k in increments or decrements of 0.5. For training, a dealt hand is first fed into the deep learning neural network, and the following parameters are varied: the number of training rounds (epoch), the normalization parameter (min-max), and the learning rate, so as to obtain the value of k with the maximum probability of winning the whole game.

Each k value is trained 20 times and the winning probability for the current k is calculated. An increment of 0.5 is preset in the program as the automatic training mode; the winning probability is tested once for every increment of 0.5, and finally the k value most likely to win is selected. The value of k is continuously adjusted according to the results of play.
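The stepwise search for k described above can be sketched as a simple grid search. This is a minimal sketch under stated assumptions: the search range (`k_start`, `k_stop`), the function names, and the `toy_game` stand-in for an actual simulated game are all hypothetical; only the step of 0.5 and the 20 trials per k value come from the text.

```python
import random
from typing import Callable, Tuple


def search_k(play_game: Callable[[float, random.Random], bool],
             k_start: float = 0.5, k_stop: float = 5.0,
             step: float = 0.5, games_per_k: int = 20,
             seed: int = 0) -> Tuple[float, float]:
    """Try k = k_start, k_start + step, ..., k_stop and keep the best one.

    `play_game(k, rng)` is a hypothetical callback that plays one game
    with the given k and returns True on a win.
    """
    rng = random.Random(seed)
    best_k, best_rate = k_start, -1.0
    n_steps = int(round((k_stop - k_start) / step))
    for i in range(n_steps + 1):
        k = k_start + i * step
        wins = sum(1 for _ in range(games_per_k) if play_game(k, rng))
        rate = wins / games_per_k  # empirical winning probability for this k
        if rate > best_rate:
            best_k, best_rate = k, rate
    return best_k, best_rate


def toy_game(k: float, rng: random.Random) -> bool:
    # Hypothetical stand-in for a real game: win probability peaks at k = 2.5.
    p = max(0.0, 1.0 - 0.3 * abs(k - 2.5))
    return rng.random() < p


if __name__ == "__main__":
    print(search_k(toy_game))
```

With only 20 games per k value the estimated winning probability is noisy, so in practice the selected k may vary from run to run.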

Preferably, the deep learning neural network is a neural network based on Monte Carlo tree search.

Monte Carlo tree search expands the game tree step by step through iteration. The UCT algorithm (upper confidence bound algorithm applied to trees) is a game-tree search algorithm that combines Monte Carlo tree search with the UCB formula and has advantages in time and space when searching very large game trees. The UCT tree grows asymmetrically and its growth order cannot be predicted in advance; the direction of expansion is guided by a performance index of the child nodes, namely the UCB value. During the search, UCT exploits existing knowledge by giving more opportunities to nodes with a high win rate, while still considering sibling nodes whose win rate is temporarily low. The UCB value is calculated according to the following formula:

$$UCB = \frac{W_i}{N_i} + c\sqrt{\frac{\ln N}{N_i}}$$

wherein W_i is the number of times the child node wins; N_i is the number of times the child node participates in the simulation; N is the number of times the current node participates in the simulation; and c is a weighting coefficient. The basic flow of Monte Carlo tree search is: selecting a branch (Selection), expanding one layer at the leaf node (Expansion), simulating a game (Simulation), and feeding back the result (Backpropagation). The specific steps are as follows:

3-1) establishing a root node from the currently held cards and generating all child nodes of the root node, each possible play serving as one child node, and simulating a game for each child node;

3-2) starting from the root node constructed from the held cards, performing a best-first search;

3-3) calculating the UCB value of each child node using the UCB formula, and selecting the child node with the maximum value;

3-4) if that node is not a leaf node, repeating step 3-2) with that node as the root node;

3-5) continuing until a leaf node is encountered; if the leaf node has never been simulated, simulating a game for it; otherwise, randomly generating child nodes for the leaf node and simulating games for them;

3-6) using the gain of the simulated game (generally 1 for a win and 0 for a loss), updating the node and its ancestor nodes at each level according to the corresponding side, and increasing the visit count of every node above it;

3-7) returning to step 3-2) unless the search time for this round has ended or the preset number of iterations has been reached;

3-8) selecting, from the child nodes of the current position, the play with the highest average gain.
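The selection, backpropagation and final-choice steps can be sketched as follows. This is a minimal sketch, not the patented implementation: the `Node` class, the function names, and the default c = sqrt(2) are assumptions, and the standard UCB1 form W_i/N_i + c*sqrt(ln N / N_i) is assumed for the UCB value. Backpropagation is simplified to a single side's perspective, whereas step 3-6) updates each level according to the side it belongs to.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    move: Optional[str] = None          # the play that leads to this node
    parent: Optional["Node"] = None
    wins: float = 0.0                   # W_i
    visits: int = 0                     # N_i (or N for the parent)
    children: List["Node"] = field(default_factory=list)


def ucb(w_i: float, n_i: int, n: int, c: float = math.sqrt(2)) -> float:
    """UCB value of a child; unvisited children are tried first."""
    if n_i == 0:
        return math.inf
    return w_i / n_i + c * math.sqrt(math.log(n) / n_i)


def select_child(node: Node) -> Node:
    """Step 3-3): pick the child with the maximum UCB value."""
    return max(node.children,
               key=lambda ch: ucb(ch.wins, ch.visits, node.visits))


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Step 3-6), simplified: add the reward and one visit up to the root."""
    while node is not None:
        node.visits += 1
        node.wins += reward
        node = node.parent


def best_move(root: Node) -> Optional[str]:
    """Step 3-8): the child with the highest average gain."""
    return max(root.children,
               key=lambda ch: ch.wins / ch.visits if ch.visits else 0.0).move
```

A full search loop would alternate `select_child` down to a leaf, expand it, run a simulated game, and call `backpropagate` with the result until the time or iteration budget of step 3-7) is exhausted.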

Preferably, the network structure includes: 3 convolutional layers, one Flatten layer, 4 Dense fully connected layers, and 1 softmax layer. In the convolutional layers, the convolution kernel size is 3*3 and the stride is 1. The function of the Flatten layer is to flatten the input multidimensional data into one dimension, i.e., to compress data of shape [height, width, channel] into a one-dimensional array of length height × width × channel, which is then connected to the fully connected layers. After the Dense fully connected layers, the softmax layer maps multiple scalars into a probability distribution in which each output value lies in the range (0, 1); the softmax function is commonly used in the last layer of a neural network as the output layer for multi-class classification.
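The behaviour of the Flatten and softmax layers described above can be illustrated in plain Python, without any deep learning framework. The function names and the nested-list data layout are assumptions made for this sketch; the convolutional and Dense layers are not modelled here.

```python
import math
from typing import List


def flatten(feature_map: List[List[List[float]]]) -> List[float]:
    """Flatten [height][width][channel] data into a 1-D list
    of length height * width * channel."""
    return [v for row in feature_map for pixel in row for v in pixel]


def softmax(scores: List[float]) -> List[float]:
    """Map scalars to a probability distribution (values in (0, 1), sum 1)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    fmap = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # 2 x 3 x 4
    print(len(flatten(fmap)))          # height * width * channel
    print(softmax([1.0, 2.0, 3.0]))
```

In a multi-class setting, each softmax output can be read as the predicted probability of one candidate class, for instance one candidate play.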

In a second aspect, the present invention provides a machine game playing system for whipped egg playing cards, comprising: a memory, a processor and a computer program stored on the memory, wherein the computer program is configured to implement the steps of the above machine game playing method of whipped egg playing cards when called by the processor.

In a third aspect, the present invention provides a computer readable storage medium storing a computer program configured to implement the steps of the above machine game playing method of whipped egg playing cards when called by a processor.

Compared with the prior art, the invention has the following advantages:

The invention uses an energy minimization algorithm and a deep learning method to enable a machine to learn a playing strategy, and the optimal way of playing is found quickly by adjusting parameters. The way of splitting the hand is simple and takes little time, and the strategy is learned during training, so that an efficient and intelligent playing strategy is realized. The algorithm adopts Monte Carlo tree search, and the network comprises 3 convolutional layers, one Flatten layer, 4 Dense fully connected layers and 1 softmax layer. Finally, the most suitable way of playing is found and the machine wins the man-machine poker game.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

Fig. 2 is a network configuration diagram of the deep learning neural network shown in fig. 1.

FIG. 3 is a schematic diagram of the competition rules for four players participating in a whipped egg poker game in accordance with an embodiment of the present invention.

Detailed Description

The present invention will be further described with reference to the following examples and drawings for the purpose of making the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, based on the described embodiments, which a person of ordinary skill in the art would obtain without inventive faculty, are within the scope of the invention.

Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. As used in this specification, the word "comprising" or "comprises", and the like, means that the element or article preceding the word is meant to encompass the element or article listed thereafter and equivalents thereof without excluding other elements or articles.

Examples

In this embodiment, the machine game playing method of whipped egg playing cards is based on Monte Carlo tree search and on calculating the energy of the playing cards. First, the energy of the dealt hand is calculated according to an energy calculation formula; the cards are then divided into three basic types of effective cards and the hand is split accordingly. Using a minimal data set, the parameter of the formula is obtained through deep learning training, the Value is calculated, and the playing strategy is adjusted in real time according to the Value. The playing rules comprise leading, following and sprinting. The invention discloses a calculation strategy based on deep learning that realizes a machine algorithm for a poker game: the two partners of one side use the same method, two groups of algorithms compete against each other, and a more intelligent poker playing method is finally obtained through training. Referring to fig. 1, the method specifically includes:

Step S100: calculating the card splitting loss E1 and the value E2 corresponding to the card types of each card splitting mode in the held playing cards, and calculating the energy value Ea = E1 + E2 of each card splitting mode.

The card splitting loss E1 is calculated based on the number of playing cards held.

The value for each card type is calculated as follows:

value of three identical cards: V3(3) = V3(2) + 1;

value of two identical cards: V2(3) = V2(2) + 1;

value of a bomb: V4(2) = V3(A) + 1, V4(3) = V4(2) + 1;

value of a straight flush: V5(2) = V4(A) + 1, V5(3) = V5(2) + 1.

The playing cards held are divided into the following three categories:

first-kind effective cards C1: the number of groups can be reduced without splitting the cards;

second-kind effective cards C2: the number of groups cannot be reduced, but the value can be increased;

third-kind effective cards C3: the efficiency of C1 and C2 cannot be improved, but the cards can be used to press the upper house.

For example, for a hand of 2, 2, 3, 4, 5, 5, 6, if no straight is formed:

E1 = 1(2, 2) + 1(3) + 1(4) + 1(5, 5) + 1(6) = 5,

E2 = (1(2, 2) + 2(3) + 3(4) + 4(5, 5) + 5(6)) / 5 = 3;

if a straight is formed:

E1 = 1(2) + 1(2, 3, 4, 5, 6) + 1(5) = 3,

E2 = (1(2) + 3(2, 3, 4, 5, 6; calculated as 4) + 4(5)) / 3 = 8/3.

Here value(5, 5) = 4 changes dynamically: if all the 2s, 3s and 4s have already been played, value(5, 5) = 1.
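The arithmetic of this example can be checked with a short script. It is a sketch that follows this example only, where E1 counts the groups and E2 is the average group value; the function name `energy_terms` and the (cards, value) tuple layout are assumptions made for illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def energy_terms(groups: List[Tuple[str, int]]) -> Tuple[int, Fraction]:
    """Return (E1, E2) for a list of (cards, value) groups, following the
    example above: E1 is the number of groups, E2 the mean group value."""
    e1 = len(groups)
    e2 = Fraction(sum(value for _, value in groups), e1)
    return e1, e2


# Grouping without a straight: (2,2) 3 4 (5,5) 6
no_straight = [("2 2", 1), ("3", 2), ("4", 3), ("5 5", 4), ("6", 5)]
# Grouping with the straight 2-3-4-5-6, which is valued as its middle card 4
with_straight = [("2", 1), ("2 3 4 5 6", 3), ("5", 4)]

if __name__ == "__main__":
    for name, groups in [("no straight", no_straight),
                         ("with straight", with_straight)]:
        e1, e2 = energy_terms(groups)
        print(name, "E1 =", e1, "E2 =", e2, "Ea =", e1 + e2)
```

Using `Fraction` keeps 8/3 exact, so the two energy values can be compared without rounding error.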

Step S200: comparing the energy values of the card splitting modes to obtain minEa;

Step S300: calculating the value Eb, where k is a constant obtained as follows:

The value of k is preset. When k = 2, the loss value is amplified in the calculation; when k = 1/2, the proportion of the loss value is reduced. The value of k that wins the largest number of games is found by repeatedly adjusting k in increments or decrements of 0.5. For training, a dealt hand is first fed into the deep learning neural network, and the following parameters are varied: the number of training rounds (epoch), the normalization parameter (min-max) and the learning rate, so as to obtain the value of k with the maximum probability of winning the whole game. Each k value is trained 20 times and the winning probability for the current k is calculated. An increment of 0.5 is preset in the program as the automatic training mode; the winning probability is tested once for every increment of 0.5, and finally the k value most likely to win is selected.

Referring to fig. 2, the network architecture includes: 3 convolutional layers, one Flatten layer, 4 Dense fully connected layers, and 1 softmax layer. In the convolutional layers, the convolution kernel size is 3*3 and the stride is 1. The function of the Flatten layer is to flatten the input multidimensional data into one dimension, i.e., to compress data of shape [height, width, channel] into a one-dimensional array of length height × width × channel, which is then connected to the fully connected layers. After the Dense fully connected layers, the softmax layer maps multiple scalars into a probability distribution in which each output value lies in the range (0, 1); the softmax function is commonly used in the last layer of a neural network as the output layer for multi-class classification.

Step S400: playing cards according to the card type corresponding to the Eb value.

As shown in FIG. 3, whipped egg poker is a four-player game in which two pairs of partners compete against each other. Each pair uses one algorithm, so that two groups of algorithms play against each other. Before the competition, the competition environment is configured, including the computer configuration; the hardware comprises two computers, equipped with 2080Ti and 2080 graphics cards respectively.

For example, when a hand is dealt, k = 2.5 is obtained through pre-training and substituted into the formula. Ea = E1 + E2 is calculated first. When "7, 9, 10, J, Q Q Q Q, K, red heart 5" is held, the red heart 5 is a free (wild) card that can substitute for any card, and the following ways of playing are available:

The following values are known: value(7 8 9 10 J) = 30, value(9 10 J Q K) = 32, value(7 8 9 10 J Q K) = 40, value(Q Q Q Q) = 55, value(Q Q Q Q) = 45, value(Q Q Q) = 35, value(7) = 5, value(9) = 7, value(10) = 8, value(J) = 9, value(K) = 11, value(red heart 5) = 14.

1. The value calculated for the existing hand: E = 5 + 7 + 8 + 9 + 45 + 11 + 14 = 99.

2. Play method I: (straight: 7, 8 (red heart 5), 9, 10, J, Q, K) (bomb: Q Q Q, red heart 5)

E2 = 40 + 35 = 75

E1 = 10

Ea = 10 + 75 = 85

Play method II: (7, (9 10 J Q K), Q Q Q, red heart 5)

E2 = 5 + 32 + 14 + 35 = 86

E1 = 10

Ea = E1 + E2 = 96

The energy value of play method I is smaller than that of play method II. The value k = 2.5 is substituted into Eb, and the play with the minimum Eb is chosen. Ea and Eb are recalculated after every play, and the play with the minimum Eb is chosen again each time, until all the cards in the hand have been played.
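The comparison of the two play methods can be reproduced with a few lines of code. This is a sketch of the Ea comparison only: the dictionary keys and the function name `energy` are illustrative labels, the group values are those quoted in the example, and Eb is not computed here.

```python
from typing import Dict

E1 = 10  # card splitting loss: the number of cards held in the example

# Group values taken from the worked example above.
method_1: Dict[str, int] = {"straight 7 8 9 10 J Q K": 40, "Q Q Q": 35}
method_2: Dict[str, int] = {"7": 5, "9 10 J Q K": 32,
                            "red heart 5": 14, "Q Q Q": 35}


def energy(e1: int, groups: Dict[str, int]) -> int:
    """Ea = E1 + E2, with E2 taken as the sum of the group values."""
    return e1 + sum(groups.values())


candidates = {"method I": method_1, "method II": method_2}
energies = {name: energy(E1, groups) for name, groups in candidates.items()}
best = min(energies, key=energies.get)  # the play with the minimum Ea

if __name__ == "__main__":
    print(energies, "->", best)
```

The same minimum-selection pattern would apply to Eb once it has been computed for each candidate play.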

Claims

1. A machine game play method of whipped egg playing cards, which is characterized by comprising the following steps:

1) Calculating the card splitting loss E1 and the value E2 corresponding to the card splitting type of each card splitting mode in the held playing cards, and calculating the energy value Ea=E1+E2 of each card splitting mode;

the value of the card splitting loss E1 is equal to the number of the held playing cards;

the value for each card type is calculated as follows:

value of three identical cards: V3(i+1) = V3(i) + 1;

value of two identical cards: V2(i+1) = V2(i) + 1;

value of a bomb: V4(2) = V3(A) + 1, V4(3) = V4(2) + 1;

value of a straight flush: V5(2) = V4(A) + 1, V5(i+1) = V5(i) + 1;

i is the rank of the card;

when the level card of the current game is randomly drawn to be red heart 5, the values corresponding to 2-4 are 1-3, the values corresponding to 6-A are 4-12, the value of diamond 5, spade 5 and club 5 is 13, and the value of the red heart 5 is 14, and so on; the card values are defined such that two 2s are less than two 3s and five 2s are greater than four Aces;

the playing cards held are divided into the following three categories:

first-kind active card C1: reducing the number of cards in the group without disassembling the cards;

second-class active cards C2: the number of groups cannot be reduced, but the value is increased;

third category active card C3: the efficiency of C1 and C2 cannot be improved, but the household cards are helped;

2) Comparing the energy values of the card splitting modes to obtain minEa;

3) Calculating the value Eb, where k is a constant obtained by the following method:

the value of k is preset; when k = 2, the loss value is amplified in the calculation, and when k = 1/2, the proportion of the loss value is reduced; the value of k that wins the largest number of games is found by repeatedly adjusting k in increments or decrements of 0.5; for training, a dealt hand is first fed into a deep learning neural network, and the following parameters are varied: the number of training rounds (epoch), the normalization parameter (min-max) and the learning rate, so as to obtain the value of k with the maximum probability of winning the whole game;

4) Playing cards according to the card type corresponding to the Eb value.

2. The machine game play method of whipped egg playing cards as set forth in claim 1, wherein each k value is trained 20 times to calculate the probability of winning the current k, an increment of 0.5 is preset in the program as an automatic training mode, and each increment of 0.5 is used to test the probability of winning the one time, and finally the value of k most prone to winning is selected.

3. The machine game play method of whipped egg playing cards as defined in claim 1, wherein said deep learning neural network is constructed based on Monte Carlo tree search.

4. The machine game play method of whipped egg playing cards as set forth in claim 3, wherein said deep learning neural network structure comprises: 3 convolutional layers, one Flatten layer, 4 Dense fully connected layers, and 1 softmax layer; in the convolutional layers, the convolution kernel size is 3*3 and the stride is 1; the function of the Flatten layer is to flatten the input multidimensional data into one dimension, namely, to compress data of shape [height, width, channel] into a one-dimensional array of length height × width × channel, which is then connected to the fully connected layers; after the Dense fully connected layers, the softmax layer maps multiple scalars into a probability distribution in which each output value lies in the range (0, 1); the softmax function is commonly used in the last layer of a neural network as the output layer for multi-class classification.

5. A machine game play system for whipped egg poker, comprising: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the method of any one of claims 1 to 4 when invoked by the processor.

6. A computer-readable storage medium, characterized by: the computer readable storage medium stores a computer program configured to implement the steps of the method of any one of claims 1 to 4 when invoked by a processor.