CN113018837A

CN113018837A - Machine game playing method and system for Guandan poker, and storage medium

Info

Publication number: CN113018837A
Application number: CN202110151660.XA
Authority: CN
Inventors: 潘志庚; 孙亚文; 张明敏; 徐守江; 朱兆辉; 高和蓓
Original assignee: Hangzhou Normal University
Current assignee: Hangzhou Normal University
Priority date: 2021-02-03
Filing date: 2021-02-03
Publication date: 2021-06-25
Anticipated expiration: 2041-02-03
Also published as: CN113018837B

Abstract

The invention relates to a machine game playing method, system and storage medium for Guandan poker, and belongs to the technical field of deep learning and machine gaming. The method first computes the energy of the dealt hand according to an energy formula, divides the cards into three basic kinds of effective card combinations, matches the cards and evaluates the hand. The parameter of the formula is obtained by deep learning training on a minimal data set; Value scores are then computed, and the playing strategy is adjusted in real time according to those scores. There are three kinds of play: leading, following and sprinting. The invention discloses a computing strategy based on deep learning that implements a machine algorithm in a poker game: the two partners of a team play with the same method, and competition between the two groups of algorithms finally trains a more intelligent way of playing.

Description

Machine game playing method and system for Guandan poker, and storage medium

Technical Field

The invention relates to the technical field of deep learning and machine gaming, and in particular to a machine game playing method and system for Guandan poker and a storage medium.

Background

Since the 1940s, and especially since Deep Blue and AlphaGo drew public attention, machine gaming has been a touchstone for testing theories of computation and artificial intelligence. Computer gaming is an important platform for measuring the level of artificial intelligence. It is a discipline that studies and develops theories, methods, techniques and applications for simulating, extending and expanding human intelligence (reference: Yang Minglin, Nishiwei. Machine learning and intelligent decision support system [M]. Science Press, 2004.), and it is known as the fruit fly of artificial intelligence (reference: xuyanhong. Man-machine gaming [J]. Art and Design, 2016(4):150-). Much as the fruit fly serves genetics in biology, the many results produced by machine game research can be applied to further fields such as chess and poker (reference: Fang Yuan, Zhao Shuying, et al. Man-machine game technology shows: a chess playing robot [J]. Robot Technology and Application, 2014(2): 39-42.).

Guandan (掼蛋) is a poker game widely played in the Huai'an area. It developed from the local poker game "Run Fast", is played by four players with two decks of cards, and uses level upgrading, which makes the game more creative and entertaining.

The card types include: two pairs, bombs, four jokers, straight flush, four-card, single card, pair, triple, and three-with-two. Card types rank in the order: four jokers, bombs of six or more cards, straight flush, five-card bombs, four-card bombs, then the other card types. The finishing positions are first place, double-down (that is, both players of one team finishing in the last two places) and last place. The card ranks, from high to low, are joker, A, K, Q, J, 10, 9, 8, 7, 6, 5, 4 and 3. Under the upgrade rules, the first player to play out all cards takes first place and the last player to do so takes last place. If the opposing team is double-down, the winning team rises 3 levels; if one opponent finishes last, the winning team rises 2 levels; if the winner's own partner finishes last, the winning team rises 1 level. A side that takes first place three times rises 1 level, four times rises 2 levels, and so on; a side that finishes last three times is pushed back one level. The game ends when level A is reached.

Conventional methods of computing card plays have several problems: they require a large amount of memory, their card-type calculation is not intelligent enough, their card matching needs a combination of depth-first and breadth-first traversal, each play takes a long time, and they lack generality. Intelligent game-playing methods for Huai'an Guandan have been little studied, so a playing strategy that is intelligent, uses little memory and computes quickly is needed.

Disclosure of Invention

The invention aims to provide a machine game playing method, a system and a storage medium for Guandan poker that are computationally simple, reason about each specific situation as it arises, and achieve a playing strategy closer to the habits of human players.

In order to achieve the above object, in a first aspect, the present invention provides a machine game playing method for Guandan poker, comprising the steps of:

1) calculating, for each way of splitting the held playing cards, the card-splitting loss E1 and the value E2 of the resulting card types, and calculating the energy value of each splitting way as Ea = E1 + E2;

2) comparing the energy values of all splitting ways to obtain the minimum, min Ea;

3) calculating the value Eb, where k is a constant;

4) playing the cards according to the card type corresponding to the Eb value.

According to this technical scheme, Huai'an Guandan poker is taken as the object of study and the play is computed by an energy minimization algorithm, which is computationally simple. The best play is selected by computing the card-splitting loss and the overall value after the hearts wild card is added, and the optimal value of k is obtained by deep learning training. The method focuses on the playing strategy and reasons about each specific situation as it arises, achieving a playing strategy closer to the habits of human players.

Preferably, in step 1), the value of the card-splitting loss E1 is equal to the number of playing cards held. For instance, the splitting loss of the hand "7, 9, 10, J, Q, Q, Q, Q, K, hearts 5" is 10. As a further example, when the hand is "8 10 J Q K A A A A" and AAAA is played as a bomb, the overall loss of the remaining cards becomes large because too many single cards are left in hand.

Preferably, in step 1), the value corresponding to each card type is determined as follows:

Value of three identical cards: V3(i+1) = V3(i) + 1; for example, V3(3) = V3(2) + 1, i.e. the Value of three 3s is 1 greater than the Value of three 2s.

Value of two identical cards: V2(i+1) = V2(i) + 1;

Value of a bomb: V4(2) = V3(A) + 1, V4(3) = V4(2) + 1;

Value of a straight flush: V5(2) = V4(A) + 1, V5(i+1) = V5(i) + 1;

where i is the rank of the card.
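The recurrences above chain one table onto the next. A minimal sketch of how such tables could be built follows; the starting value V2(2) = 1 and the link V3(2) = V2(A) + 1 are assumptions made only for illustration (the text gives neither), and `chain` and `build_value_tables` are hypothetical helper names.

```python
# Ranks in ascending order, from 2 up to A.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]


def chain(start):
    """Return {rank: value} with V(i+1) = V(i) + 1 and V(2) = start."""
    return {rank: start + offset for offset, rank in enumerate(RANKS)}


def build_value_tables(v2_start=1):
    """Build the V2 (pair), V3 (triple), V4 (bomb) and V5 (straight flush) tables.

    ASSUMPTIONS, not taken from the text: V2(2) = v2_start and V3(2) = V2(A) + 1.
    The links V4(2) = V3(A) + 1 and V5(2) = V4(A) + 1 are the ones stated above.
    """
    v2 = chain(v2_start)
    v3 = chain(v2["A"] + 1)  # assumed link between pairs and triples
    v4 = chain(v3["A"] + 1)  # stated: V4(2) = V3(A) + 1
    v5 = chain(v4["A"] + 1)  # stated: V5(2) = V4(A) + 1
    return {"V2": v2, "V3": v3, "V4": v4, "V5": v5}


tables = build_value_tables()
print(tables["V3"]["3"] - tables["V3"]["2"])  # gap between three 3s and three 2s
print(tables["V4"]["2"] - tables["V3"]["A"])  # gap between a bomb of 2s and three Aces
```

Chaining the tables this way guarantees that every bomb outranks every triple and every straight flush outranks every bomb, whatever starting value is chosen.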

When a hand is dealt it contains 26 cards in total, drawn from the 13 ranks 2 to A. In Guandan one card is drawn to fix the level rank of the round, and the hearts card of that rank serves as a wild card that can stand in for any card; specifically:

When the card drawn for the round makes the hearts 5 the wild card, the values of ranks 2 to 4 are 1 to 3 respectively, the values of ranks 6 to A are 4 to 12 respectively, the diamond 5, spade 5 and club 5 each have the value 13, and the hearts 5 has the value 14, and so on. Rules such as "two 2s are lower than two 3s" and "five 2s are higher than four Aces" then fix the value of every card type.
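The single-card values for a given level rank can be sketched as below. The function name `single_card_values` and the `(rank, is_heart)` key layout are illustrative choices, and jokers are left out because the passage does not give them values.

```python
# Ranks in ascending order, from 2 up to A.
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]


def single_card_values(level_rank):
    """Map (rank, is_heart) to a single-card value for the round's level rank.

    Ordinary ranks get 1..12 in ascending order, the three non-heart cards of
    the level rank get 13, and the hearts card of the level rank (the wild
    card) gets 14, following the hearts-5 example in the text.
    """
    values = {}
    next_value = 1
    for rank in RANKS:
        if rank == level_rank:
            continue  # the level rank is valued after all ordinary ranks
        values[(rank, False)] = next_value
        values[(rank, True)] = next_value
        next_value += 1
    values[(level_rank, False)] = next_value      # diamond, spade or club of the level rank
    values[(level_rank, True)] = next_value + 1   # hearts wild card
    return values


values = single_card_values("5")
print(values[("4", False)], values[("6", False)], values[("A", False)])
print(values[("5", False)], values[("5", True)])
```

With the level rank 5 this gives 7, 9, 10, J and K the values 5, 7, 8, 9 and 11, which agrees with the single-card values used in the worked example of the embodiment.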

Preferably, in step 1), the held playing cards are divided into the following three types:

First-type valid cards C1: the number of groups to be played can be reduced without splitting cards; for example, when the hand is "33322", the cards could be played in two turns, but playing them as three-with-two reduces the number of groups without splitting any cards.

Second-type valid cards C2: the number of groups cannot be reduced, but the value can be increased; for example, when the hand is "333345678", regrouping the cards so that "3333" is played as a bomb still takes two plays, so the total number of groups does not decrease, but the value increases.

Third-type valid cards C3: the efficiency cannot be improved in the way C1 and C2 improve it, but the play helps the partner by beating cards on the table. For example, when the hand is "7 8 9 9 9 9 10 J", a 9 is split off to beat the previous player. C3 refers to plays made at some cost in group count or value, in which cards in the hand are split in order to assist the partner.

For first-type valid cards C1 the number of groups can be reduced. Let Z denote the number of groups, where one group is one play (for example, three 2s form one group and two 3s form another); without splitting cards, only the minimum number of groups Ez = min(Z) needs to be taken.

Preferably, in step 3), the constant k is obtained by:

The value of k is preset. When k is 2 the loss value is amplified in the calculation, and when k is 1/2 the weight of the loss value is reduced; k is adjusted repeatedly in increments or decrements of 0.5 to find the value that wins the most games. First, a dealt hand is fed into a deep learning neural network for training, and the following parameters are varied: the number of training rounds (epoch), the normalization parameter (min-max) and the learning rate, so as to obtain the value of k with the highest probability of winning the whole game.

Each value of k is trained 20 times and the winning probability for the current k is calculated. An increment of 0.5 is preset in the program as the automatic training mode: the winning probability is tested after each increase of 0.5, and the value of k most likely to win is finally selected. The value of k is adjusted continually according to the results of play.
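The search over k described here amounts to a grid sweep with a step of 0.5 and 20 trials per value. The sketch below shows only that sweep; `play_game` is a hypothetical callback standing in for a full simulated match, and `toy_game` is a made-up stand-in whose win probability has no connection to real results.

```python
import random


def sweep_k(play_game, k_values, trials=20, seed=0):
    """Estimate the win rate of each candidate k and return the best one.

    play_game(k, rng) is a hypothetical callback that plays one game with
    the given k and returns True on a win.
    """
    rng = random.Random(seed)
    win_rates = {}
    for k in k_values:
        wins = sum(1 for _ in range(trials) if play_game(k, rng))
        win_rates[k] = wins / trials
    best_k = max(win_rates, key=win_rates.get)
    return best_k, win_rates


def toy_game(k, rng):
    """Made-up stand-in for a real match: the win chance peaks at k = 2.5."""
    win_probability = max(0.0, 0.9 - 0.3 * abs(k - 2.5))
    return rng.random() < win_probability


k_grid = [0.5 * step for step in range(1, 9)]  # 0.5, 1.0, ..., 4.0
best_k, rates = sweep_k(toy_game, k_grid)
print(best_k, rates[best_k])
```

With only 20 trials per value the estimated win rates are noisy, so neighbouring values of k can easily swap places from one run to the next.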

Preferably, the deep learning neural network is a neural network based on Monte Carlo tree search.

Monte Carlo tree search expands the game tree step by step through iteration. The UCT algorithm (upper confidence bounds applied to trees) is a game-tree search algorithm that combines Monte Carlo tree search with the UCB formula, and it has advantages in time and space when searching very large game trees. The UCT tree grows asymmetrically and its growth order cannot be predicted; the direction of expansion is guided by a performance index of the child nodes, namely the UCB value. UCB exploits the knowledge already gained during the search, giving more opportunities to nodes with a high win rate while still exploring sibling nodes whose win rate is temporarily low. The UCB value is calculated according to the following formula:

$$UCB_i = \frac{W_i}{N_i} + C\sqrt{\frac{\ln N}{N_i}}$$

where W_i is the number of wins of the child node; N_i is the number of simulations in which the child node has taken part; N is the number of simulations in which the current node has taken part; and C is a weighting coefficient. The basic flow of Monte Carlo tree search is to select a promising branch (Selection), expand one layer at a leaf node (Expansion), simulate a game (Simulation), and propagate the result back (Backpropagation).

3-1) establishing a root node from the currently held cards, generating all child nodes of the root node with each possible play as one child node, and simulating a game for each of them;

3-2) starting from the root node constructed from the held cards, carrying out a best-first search;

3-3) calculating the UCB value of each child node with the UCB formula, and selecting the child node with the maximum value;

3-4) if this node is not a leaf node, taking it as the root node and repeating step 3-2);

3-5) continuing until a leaf node is encountered; if the leaf node has not yet been simulated, simulating a game for it; otherwise, randomly generating a child node for the leaf node and simulating a game for that child;

3-6) updating the node and its ancestors at every level, each for its own side, with the payoff of the simulated game (generally 1 for a win and 0 for a loss), and increasing the visit count of all nodes above the node;

3-7) returning to step 3-2) unless the search time for this round has run out or the preset number of iterations has been reached;

3-8) selecting, among the child nodes of the current position, the one with the highest average payoff as the best move.
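The selection, update and final-choice steps can be sketched in a few lines. This is a minimal sketch under stated assumptions: it uses the standard UCB1 form with W_i, N_i, N and C as defined above, omits expansion and game simulation, and records payoffs from a single player's point of view rather than per side; `Node`, `add_child`, `ucb`, `select_child`, `backpropagate` and `best_move` are illustrative names.

```python
import math


class Node:
    """One position in the search tree."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.wins = 0.0   # W_i: accumulated payoff
        self.visits = 0   # N_i: number of simulations through this node


def add_child(node):
    child = Node(parent=node)
    node.children.append(child)
    return child


def ucb(child, parent_visits, c=1.414):
    """UCB value of a child; unvisited children are tried first."""
    if child.visits == 0:
        return math.inf
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node, c=1.414):
    """Step 3-3): pick the child with the largest UCB value."""
    return max(node.children, key=lambda child: ucb(child, node.visits, c))


def backpropagate(node, payoff):
    """Step 3-6): add the payoff and one visit to the node and all ancestors."""
    while node is not None:
        node.visits += 1
        node.wins += payoff
        node = node.parent


def best_move(root):
    """Step 3-8): child with the highest average payoff."""
    return max(root.children,
               key=lambda child: child.wins / child.visits if child.visits else 0.0)


root = Node()
first, second = add_child(root), add_child(root)
for _ in range(3):
    backpropagate(first, 1.0)   # three simulated wins
backpropagate(second, 0.0)      # one simulated loss
print(select_child(root, c=0.0) is first, select_child(root, c=10.0) is second)
```

A small C favours the child that has won so far, while a large C pushes the search toward the rarely visited sibling, which is the exploitation and exploration balance described above.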

Preferably, the network structure includes: 3 convolutional layers, one Flatten layer, 4 Dense fully-connected layers and 1 softmax layer. In the convolutional layers, the convolution kernel size is 3 x 3 and the stride is 1. The Flatten layer converts multi-dimensional input into one dimension, that is, it compresses data of shape height x width x channel into a one-dimensional array of length height x width x channel, which is then connected to the fully-connected layers. After the Dense fully-connected layers, the softmax layer maps a set of scalars into a probability distribution, and each value output by the softmax layer lies in the range (0, 1). The softmax function is often used as the last layer of a neural network, serving as the output layer for multi-class classification.

In a second aspect, the present invention provides a machine game playing system for Guandan poker, comprising: a memory, a processor and a computer program stored in the memory, wherein the computer program is configured to implement the steps of the above machine game playing method for Guandan poker when invoked by the processor.

In a third aspect, the present invention provides a computer-readable storage medium storing a computer program configured to implement, when invoked by a processor, the steps of the above machine game playing method for Guandan poker.

Compared with the prior art, the invention has the following advantages:

The invention uses an energy minimization algorithm and a deep learning method so that the machine learns the playing strategy and quickly reaches the best play by adjusting parameters. The card matching method is simple and fast, it operates intelligently by learning during training, and it achieves an efficient and intelligent playing strategy. The algorithm uses Monte Carlo tree search together with a network of 3 convolutional layers, a Flatten layer, 4 Dense fully-connected layers and 1 softmax layer. In the end the most suitable play is found and games are won in human-machine poker play.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

Fig. 2 is a network structure diagram of the deep learning neural network shown in fig. 1.

FIG. 3 is a schematic diagram of the competition rules for four players taking part in a Guandan poker game according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described with reference to the following embodiments and accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments without any inventive step, are within the scope of protection of the invention.

Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. The use of the terms "comprising" or "including" and the like in the present invention, means that the element or item presented before the term covers the element or item listed after the term and its equivalents, without excluding other elements or items.

Examples

In this embodiment, the machine game playing method for Guandan poker is based on Monte Carlo tree search and on calculating the energy of the playing cards. The method first computes the energy of the dealt hand according to an energy formula, divides the cards into three basic kinds of effective card combinations, matches the cards and evaluates the hand. The parameter of the formula is obtained by deep learning training on a minimal data set; Value scores are then computed, and the playing strategy is adjusted in real time according to those scores. There are three kinds of play: leading, following and sprinting. The invention discloses a computing strategy based on deep learning that implements a machine algorithm in a poker game: the two partners of a team play with the same method, and competition between the two groups of algorithms finally trains a more intelligent way of playing. Referring to fig. 1, the method specifically includes:

Step S100: calculating, for each way of splitting the held playing cards, the card-splitting loss E1 and the value E2 of the resulting card types, and calculating the energy value of each splitting way as Ea = E1 + E2;

The card-splitting loss is calculated from the number of playing cards held.

The value corresponding to each card type is calculated as follows:

Value of three identical cards: V3(3) = V3(2) + 1;

Value of two identical cards: V2(3) = V2(2) + 1;

Value of a bomb: V4(2) = V3(A) + 1, V4(3) = V4(2) + 1;

Value of a straight flush: V5(2) = V4(A) + 1, V5(3) = V5(2) + 1.

The held playing cards are classified into the following three types:

First-type valid cards C1: the number of groups to be played can be reduced without splitting cards;

Second-type valid cards C2: the number of groups cannot be reduced, but the value can be increased;

Third-type valid cards C3: the efficiency cannot be improved in the way C1 and C2 improve it, but the play helps the partner by beating cards on the table.

For example, for the hand 2, 2, 3, 4, 5, 5, 6, if the cards are not matched:

E1 = 1(2, 2) + 1(3) + 1(4) + 1(5, 5) + 1(6) = 5,

E2 = (1(2, 2) + 2(3) + 3(4) + 4(5, 5) + 5(6)) / 5 = 3;

if the cards are matched:

E1 = 1(2) + 1(2, 3, 4, 5, 6) + 1(5) = 3,

E2 = (1(2) + 3(2, 3, 4, 5, 6) + 4(5)) / 3 = 8/3.

Here, Value(5, 5) = 4 changes dynamically: if all of the 2s, 3s and 4s have already been played, Value(5, 5) = 1.
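The arithmetic of this small example can be checked mechanically. The sketch below follows the convention of this particular example, in which E1 counts one unit per group and E2 is the mean group value; the group values are copied from the example, while the `energy` helper and the tuple layout are illustrative.

```python
from fractions import Fraction


def energy(groups):
    """Return (E1, E2, Ea) for one grouping of the hand.

    groups is a list of (cards, value) pairs. Following this example,
    E1 is the number of groups and E2 is the mean of the group values.
    """
    e1 = len(groups)
    e2 = Fraction(sum(value for _, value in groups), len(groups))
    return e1, e2, e1 + e2


unmatched = [(("2", "2"), 1), (("3",), 2), (("4",), 3), (("5", "5"), 4), (("6",), 5)]
matched = [(("2",), 1), (("2", "3", "4", "5", "6"), 3), (("5",), 4)]

for name, groups in (("unmatched", unmatched), ("matched", matched)):
    e1, e2, ea = energy(groups)
    print(name, e1, e2, ea)
```

Exact fractions are used so that 8/3 is not rounded; the matched grouping has both the smaller E1 and the smaller E2, and therefore the smaller Ea.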

Step S200: comparing the energy values of all splitting ways to obtain the minimum, min Ea;

Step S300: calculating the value Eb, where k is a constant; k is obtained by the following method:

The value of k is preset. When k is 2 the loss value is amplified in the calculation, and when k is 1/2 the loss value is reduced; k is adjusted repeatedly in increments or decrements of 0.5 to find the value that wins the most games. First, a dealt hand is fed into a deep learning neural network for training, and the following parameters are varied: the number of training rounds (epoch), the normalization parameter (min-max) and the learning rate, so as to obtain the value of k with the highest probability of winning the whole game. Each value of k is trained 20 times and the winning probability for the current k is calculated. An increment of 0.5 is preset in the program as the automatic training mode: the winning probability is tested after each increase of 0.5, and the value of k most likely to win is finally selected.

Referring to fig. 2, the network structure includes: 3 convolutional layers, one Flatten layer, 4 Dense fully-connected layers and 1 softmax layer. In the convolutional layers, the convolution kernel size is 3 x 3 and the stride is 1. The Flatten layer converts multi-dimensional input into one dimension, that is, it compresses data of shape height x width x channel into a one-dimensional array of length height x width x channel, which is then connected to the fully-connected layers. After the Dense fully-connected layers, the softmax layer maps a set of scalars into a probability distribution, and each value output by the softmax layer lies in the range (0, 1). The softmax function is often used as the last layer of a neural network, serving as the output layer for multi-class classification.
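To make the shapes concrete, the following framework-free sketch traces a tensor through three 3 x 3, stride-1 convolutions and a Flatten step, and implements softmax directly. The 15 x 15 x 1 input, the filter counts (32, 64, 64) and the absence of padding are assumptions chosen only for illustration; the text does not specify them, and the Dense layers are not modelled.

```python
import math


def conv_output_shape(height, width, filters, kernel=3, stride=1):
    """Shape after one unpadded convolution (no padding is an assumption)."""
    out_height = (height - kernel) // stride + 1
    out_width = (width - kernel) // stride + 1
    return out_height, out_width, filters


def flatten_length(shape):
    """Flatten: height x width x channel becomes one array of that length."""
    height, width, channels = shape
    return height * width * channels


def softmax(scores):
    """Map a list of scalars to a probability distribution."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


shape = (15, 15, 1)            # hypothetical input encoding of a hand
for filters in (32, 64, 64):   # hypothetical filter counts for the 3 conv layers
    shape = conv_output_shape(shape[0], shape[1], filters)

print(shape, flatten_length(shape))
print(softmax([1.0, 2.0, 3.0]))
```

Each unpadded 3 x 3 convolution with stride 1 trims two rows and two columns, so the input needs to be large enough to survive three such layers; with "same" padding the spatial size would instead stay fixed.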

Step S400: playing the cards according to the card type corresponding to the Eb value.

As shown in fig. 3, a Guandan game is played by four players in two teams of two partners, each team competing against the other, and the two teams use different algorithms. Before the competition the environment is configured; the experimental setup includes the computer configuration, and the hardware comprises two computers, each equipped with a 2080Ti.

For example, when a hand is dealt, pre-training gives k = 2.5, which is substituted into the formula, and Ea = E1 + E2 is calculated first. When the hand is "7, 9, 10, J, Q, Q, Q, Q, K, hearts 5", the hearts 5 is a wild card that can stand in for any card, and the following ways of playing are available:

The following values are known: Value(7 8 9 10 J) = 30, Value(9 10 J Q K) = 32, Value(7 8 9 10 J Q K) = 40, Value(QQQQQ) = 55, Value(QQQQ) = 45, Value(QQQ) = 35, Value(7) = 5, Value(9) = 7, Value(10) = 8, Value(J) = 9, Value(K) = 11, and Value(hearts 5) = 14.

1. Playing the hand as originally held gives E2 = 5 + 7 + 8 + 9 + 45 + 11 + 14 = 99.

2. First playing method: (straight: 7, 8 (hearts 5), 9, 10, J, Q, K) (Q Q Q)

E2 = 40 + 35 = 75

E1 = 10

Ea = 10 + 75 = 85

Second playing method: (7, (9 10 J Q K), Q Q Q, hearts 5)

E2 = 5 + 32 + 14 + 35 = 86

E1 = 10

Ea = E1 + E2 = 96

The energy value of the first playing method is smaller than that of the second. Substituting k = 2.5 then gives the value of Eb, and the play with the smallest Eb is chosen. After each play, Ea and Eb are recalculated and the play with the smallest Eb is chosen again, until all the cards in hand have been played.
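The Ea figures of this worked example can be reproduced as follows. This is a sketch under assumptions: the entries valued 35 and 45 are read as QQQ and QQQQ, the first method is taken to consist of the seven-card straight plus QQQ (the reading that matches E2 = 40 + 35), and the Eb step is left out because its formula is not given in the text. The dictionary keys and the `energy` helper are illustrative.

```python
# Group values taken from the example; the key names are illustrative.
VALUE = {
    "7": 5, "9": 7, "10": 8, "J": 9, "K": 11, "hearts5": 14,
    "QQQ": 35, "QQQQ": 45,
    "9-10-J-Q-K": 32,
    "7-8-9-10-J-Q-K": 40,  # the hearts 5 stands in for the 8
}
CARDS_HELD = 10  # E1 equals the number of cards held


def energy(groups):
    """Return (E1, E2, Ea), with E2 the sum of the group values."""
    e2 = sum(VALUE[group] for group in groups)
    return CARDS_HELD, e2, CARDS_HELD + e2


as_dealt = ["7", "9", "10", "J", "QQQQ", "K", "hearts5"]
methods = {
    "first": ["7-8-9-10-J-Q-K", "QQQ"],
    "second": ["7", "9-10-J-Q-K", "QQQ", "hearts5"],
}

print(energy(as_dealt)[1])                       # E2 of the hand as dealt
ea = {name: energy(groups)[2] for name, groups in methods.items()}
best = min(ea, key=ea.get)                       # the splitting with min Ea
print(ea, best)
```

Because E1 is the same for both methods here, the comparison is decided entirely by E2.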

Claims

1. A machine game playing method for Guandan poker, characterized by comprising the following steps:

2) comparing the energy values of all splitting ways to obtain the minimum, min Ea;

3) calculating the value Eb, where k is a constant;

2. The machine game playing method for Guandan poker according to claim 1, wherein in step 1) the value of the card-splitting loss E1 is equal to the number of playing cards held.

3. The machine game playing method for Guandan poker according to claim 1, wherein in step 1) the value corresponding to each card type is calculated as follows:

Value of three identical cards: V3(i+1) = V3(i) + 1;

Value of two identical cards: V2(i+1) = V2(i) + 1;

Value of a bomb: V4(2) = V3(A) + 1, V4(3) = V4(2) + 1;

where i is the rank of the card;

4. The machine game playing method for Guandan poker according to claim 1, wherein in step 1) the playing cards are classified into the following three types:

5. The machine game playing method for Guandan poker according to claim 1, wherein in step 3) the constant k is obtained by:

presetting the value of k, where the loss value is amplified in the calculation when k is 2 and reduced when k is 1/2, and adjusting k repeatedly in increments or decrements of 0.5 to find the value that wins the most games; first feeding a dealt hand into a deep learning neural network for training, and varying the following parameters: the number of training rounds (epoch), the normalization parameter (min-max) and the learning rate, so as to obtain the value of k with the highest probability of winning the whole game.

6. The machine game playing method for Guandan poker according to claim 5, wherein each value of k is trained 20 times and the winning probability for the current k is calculated; an increment of 0.5 is preset in the program as the automatic training mode, the winning probability is tested after each increase of 0.5, and the value of k most likely to win is finally selected.

7. The machine game playing method for Guandan poker according to claim 5, wherein the deep learning neural network is a neural network based on Monte Carlo tree search.

8. The machine game playing method for Guandan poker according to claim 1, wherein the network structure comprises: 3 convolutional layers, one Flatten layer, 4 Dense fully-connected layers and 1 softmax layer; in the convolutional layers, the convolution kernel size is 3 x 3 and the stride is 1; the Flatten layer converts multi-dimensional input into one dimension, that is, it compresses data of shape height x width x channel into a one-dimensional array of length height x width x channel, which is then connected to the fully-connected layers; after the Dense fully-connected layers, the softmax layer maps a set of scalars into a probability distribution, and each value output by the softmax layer lies in the range (0, 1); the softmax function is often used as the last layer of a neural network, serving as the output layer for multi-class classification.

9. A machine game playing system for Guandan poker, comprising: a memory, a processor and a computer program stored in the memory, the computer program being configured to carry out the steps of the method of any one of claims 1 to 8 when invoked by the processor.

10. A computer-readable storage medium characterized by: the computer-readable storage medium stores a computer program configured to implement the steps of the method of any of claims 1-8 when invoked by a processor.