CN106055339A

CN106055339A - Method for determining card playing strategy of computer player in two-against-one game

Info

Publication number: CN106055339A
Application number: CN201610406925.5A
Authority: CN
Inventors: 李响
Original assignee: Tianjin Lianzhong Technology Development Co Ltd
Current assignee: Tianjin Lianzhong Technology Development Co Ltd
Priority date: 2016-06-08
Filing date: 2016-06-08
Publication date: 2016-10-26

Abstract

The invention provides a method for determining the card-playing strategy of a computer player in a two-against-one game. The method comprises: creating a BP neural network; training the BP neural network with training data; and, when it is the computer player's turn to play in the current round of the two-against-one game, invoking the BP neural network to compute on the round parameters of the game so as to determine the computer player's card-playing strategy. With the invention, the computer player's card-playing strategy adapts to the actual situation of the round, which raises the computer player's level of play and its resemblance to a human player, and improves the game experience of real players who play with the computer player.

Description

Method for determining the card-playing strategy of a computer player in a two-against-one game

Technical field

The present invention relates to the field of artificial intelligence, and in particular to a method for determining the card-playing strategy of a computer player in a two-against-one game.

Background art

The two-against-one game is a three-player card game classed as a mind sport; it is the card game commonly known as "Dou Dizhu" (Fight the Landlord). Its basic rules are as follows: through bidding, one player becomes the banker, and the other two players become defenders who play against the banker. The outcome of a round is decided by which player first plays out all the cards in their hand, and scores are settled according to the card types played.

With the development of computer technology, the prior art provides online and stand-alone electronic two-against-one game programs. Furthermore, in single-player training modes and under duplicate-match rules, computer players implemented with artificial intelligence techniques have appeared. Such a computer player can imitate a real player's plays according to the rules of the two-against-one game, so that it can take the place of a real player in a game with other real players. Prior-art computer players are generally implemented as finite state machines. For example, the technical solution disclosed in the Chinese patent "Implementation method of intelligent algorithm for computer player in Doudizhu game" (application number 201010246269.X) is a typical method that combines a finite state machine with a genetic algorithm to build the logic model of a computer player in the two-against-one game.

The drawback of the prior art is that, owing to the inherent characteristics of finite state machines, the algorithm easily turns out to be incomplete even when a genetic algorithm is used to build the computer player's play logic. As a result, the computer player has a low level of intelligence and a monotonous playing style, cannot correctly recognize the complex offensive and defensive situations that arise in a round, and behaves with little realism when cooperating as a defender with a real player. Real players who play with computer players implemented as finite state machines therefore find it hard to obtain a satisfying game experience.

Summary of the invention

To overcome the above drawbacks of the prior art, the present invention provides a method for determining the card-playing strategy of a computer player in a two-against-one game. The method comprises:

creating a BP neural network;

training the BP neural network with training data;

when it is the computer player's turn to play in the round of the two-against-one game currently in progress, invoking the BP neural network to compute on the round parameters of the game, so as to determine the computer player's card-playing strategy.

According to another aspect of the present invention, before the BP neural network is trained with the training data, the method further comprises: pre-storing round information of real players in historical rounds of the two-against-one game; and generating the training data from the round information.

According to a further aspect of the invention, the training data in the method comprise:

any one, or any combination, of variables and their values describing the real player's game identity, number of cards in hand, card types in hand, active/passive playing state, card type faced when playing passively, game identity of the player whose play is faced when playing passively, card types already played, number of cards already played, global number of remaining cards in hand, card type played, and pass operation.

According to a further aspect of the invention, creating the BP neural network in the method comprises: creating an input layer, a hidden layer and an output layer of the BP neural network respectively; initializing the weights and offsets between the input layer and the hidden layer, and initializing the weights and offsets between the hidden layer and the output layer.

According to a further aspect of the invention, training the BP neural network with training data in the method comprises: the BP neural network processing the training data with a back-propagation algorithm until the error between its actual output and the desired output in the training data is less than a predetermined threshold; and/or the BP neural network processing the training data with a back-propagation algorithm until the number of corrections of its actual output reaches a predetermined number.

According to a further aspect of the invention, the step of training the BP neural network with training data is implemented by distributed computing.

According to a further aspect of the invention, the round parameters in the method comprise: variables and their values describing the computer player's game identity, number of cards in hand, card types in hand, active/passive playing state, card type faced when playing passively, and game identity of the player whose play is faced when playing passively; and variables and their values describing the card types already played by all players, the number of cards already played, and the global number of remaining cards in hand in the round of the two-against-one game currently in progress.

According to a further aspect of the invention, the card-playing strategy in the method comprises at least play information identifying the computer player's playing mode, the play information comprising variables and their values describing the card type played by the computer player or a pass operation.

The method provided by the present invention for determining the card-playing strategy of a computer player in a two-against-one game uses a trained BP neural network to compute on the round parameters of the current game and thereby determine the computer player's card-playing strategy. The computer player's strategy thus adapts to the actual situation of the round, which raises its level of play and its realism and improves the game experience of real players who play with it.

Brief description of the drawings

Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:

Fig. 1 is a flowchart of one embodiment of the method according to the present invention for determining the card-playing strategy of a computer player in a two-against-one game;

Fig. 2 is a flowchart of an optional embodiment of the method according to the present invention for determining the card-playing strategy of a computer player in a two-against-one game;

Fig. 3 is a detailed flow diagram of step S100 shown in Fig. 1 or Fig. 2.

In the drawings, identical or similar reference signs denote identical or similar steps.

Detailed description of the invention

For a better understanding and explanation of the present invention, the invention is described in further detail below with reference to the drawings.

The present invention provides a method for determining the card-playing strategy of a computer player in a two-against-one game. Referring to Fig. 1, which is a flowchart of the method according to the present invention, the method comprises the following steps:

Step S100: create a BP neural network;

Step S200: train the BP neural network with training data;

Step S300: when it is the computer player's turn to play in the round of the two-against-one game currently in progress, invoke the BP neural network to compute on the round parameters of the game, so as to determine the computer player's card-playing strategy.

The term "BP neural network" refers to a back propagation neural network. Those skilled in the art will understand that a BP neural network is a trainable multi-layer feedforward network, trained for example with the error back-propagation algorithm. A trained BP neural network can learn and store a large number of input-output mappings without requiring, as a finite state machine does, functions that describe those mappings to be specified in advance. BP neural networks can therefore be used to implement artificial intelligence, in particular to produce artificially intelligent responses to an external environment. In the present invention specifically, the BP neural network is used to determine the card-playing strategy of a computer player in the two-against-one game.

Specifically, referring to Fig. 3, which is a detailed flow diagram of step S100 shown in Fig. 1 or Fig. 2, creating the BP neural network in step S100 comprises:

Step S101: create the input layer, the hidden layer and the output layer of the BP neural network;

Step S102: initialize the weights and offsets between the input layer and the hidden layer, and initialize the weights and offsets between the hidden layer and the output layer.

Those skilled in the art will understand that the input layer of the BP neural network receives the input information and passes it to the hidden layer. The hidden layer processes and computes on the input information to generate result information and passes it to the output layer, which delivers the result information to the outside. Depending on the information-processing requirements, the hidden layer may be designed as a single layer or as multiple layers. Correspondingly, for the BP neural network to learn properly, the operation of step S102 must be performed: initializing the weights and offsets between the input layer and the hidden layer and between the hidden layer and the output layer, that is, setting the initial values of the weights and offsets. These initial values may be determined from the technical experience or experimental results of the persons implementing the invention, or chosen by conventional practice; the invention does not limit this.
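Steps S101 and S102 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a single hidden layer, sigmoid activations, and weights and offsets drawn uniformly from [-0.5, 0.5]; the class name `BPNetwork` and its parameters are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic activation (an assumed choice)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BPNetwork:
    """Minimal three-layer BP network: input -> hidden -> output.

    Layer sizes, the uniform initialisation range and the sigmoid
    activation are illustrative assumptions, not values from the patent.
    """

    def __init__(self, n_input: int, n_hidden: int, n_output: int,
                 init_range: float = 0.5, seed: int = 0) -> None:
        # Step S101: the three layers are represented by their sizes.
        self.n_input, self.n_hidden, self.n_output = n_input, n_hidden, n_output
        rng = random.Random(seed)

        def draw() -> float:
            return rng.uniform(-init_range, init_range)

        # Step S102, first half: input->hidden weights (one row per hidden unit) and offsets.
        self.w_ih = [[draw() for _ in range(n_input)] for _ in range(n_hidden)]
        self.b_h = [draw() for _ in range(n_hidden)]
        # Step S102, second half: hidden->output weights (one row per output unit) and offsets.
        self.w_ho = [[draw() for _ in range(n_hidden)] for _ in range(n_output)]
        self.b_o = [draw() for _ in range(n_output)]

    def forward(self, x):
        """Input layer hands x to the hidden layer, whose result goes to the output layer."""
        if len(x) != self.n_input:
            raise ValueError(f"expected {self.n_input} inputs, got {len(x)}")
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_ih, self.b_h)]
        return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
                for row, b in zip(self.w_ho, self.b_o)]
```

Fixing `seed` makes the initial weights reproducible, which helps when comparing training runs; `init_range` is the kind of value the description leaves to experience or experiment.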

To raise the computer player's level of intelligence, the BP neural network needs to be trained with training data in step S200. Since the BP neural network of the present invention is used to determine the card-playing strategy of the computer player in the two-against-one game, the training data may be generated from round information of real players in historical rounds of the game. Preferably, referring to Fig. 2, which is a flowchart of an optional embodiment of the method according to the present invention, the solution shown in Fig. 2 may, compared with the solution shown in Fig. 1, further comprise the following steps before step S200:

Step S400: pre-store round information of real players in historical rounds of the two-against-one game;

Step S500: generate the training data from the round information.

Specifically, because the training data are generated from the plays of real players, a BP neural network trained on them tends to make the computer player imitate real players' card-playing strategies, which serves the goal of raising the computer player's level of intelligence. To record the real players' playing strategies in historical rounds as completely as possible, the training data preferably comprise any one, or any combination, of variables and their values describing the real player's game identity, number of cards in hand, card types in hand, active/passive playing state (whether the player leads freely or must respond to a previous play), card type faced when playing passively, game identity of the player whose play is faced when playing passively, card types already played, number of cards already played, global number of remaining cards in hand, card type played, and pass operation. For example, the game-identity variable may be denoted X₁₅, where X₁₅ = 1 denotes the banker, X₁₅ = 2 denotes the defender who plays immediately after the banker, and X₁₅ = 3 denotes the defender who plays immediately before the banker. Those skilled in the art will understand that, when implementing the invention, the variables and values used to describe such round information can be chosen freely according to the training needs of the BP neural network; it is only required that each specific value of a variable correspond to exactly one piece of round information.
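As an illustration of step S500, one stored decision of a real player could be turned into an input vector and a desired output as below. The record schema (`RoundRecord`), the reduced card-type list, the one-hot encodings and the scaling constants are all assumptions made for this sketch; only the X₁₅ coding of the game identity comes from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, reduced set of card types; a real encoder would list every
# legal combination of the two-against-one game.
CARD_TYPES = ["single", "pair", "triple", "straight", "bomb", "rocket"]
PASS = "pass"
ACTIONS = CARD_TYPES + [PASS]  # desired-output layout: one unit per card type plus "pass"

# X15 coding from the description: 1 = banker, 2 = defender after the banker,
# 3 = defender before the banker.
X15 = {"banker": 1, "defender_after_banker": 2, "defender_before_banker": 3}


@dataclass
class RoundRecord:
    """One decision of a real player from a stored historical round (hypothetical schema)."""
    identity: str                   # key of X15
    hand_count: int                 # cards in hand
    hand_types: Dict[str, int]      # card type -> playable combinations in hand
    active: bool                    # True when leading, False when responding
    faced_type: Optional[str]       # card type faced when playing passively
    faced_identity: Optional[str]   # X15 key of the player whose play is faced
    played_types: Dict[str, int] = field(default_factory=dict)  # types already played by all players
    played_count: int = 0           # cards already played by all players
    remaining_total: int = 54       # global number of cards still in hands
    action: str = PASS              # what the real player did: a card type or "pass"


def one_hot(index: Optional[int], size: int) -> List[float]:
    """One-hot vector; index None yields all zeros."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def encode_features(r: RoundRecord) -> List[float]:
    """Network input vector. Scaling by 20 (banker's full hand) and 54 (deck size) is assumed."""
    identity = one_hot(X15[r.identity] - 1, 3)
    hand = [r.hand_types.get(t, 0) / 20.0 for t in CARD_TYPES]
    faced = one_hot(None if r.active else CARD_TYPES.index(r.faced_type), len(CARD_TYPES))
    faced_id = one_hot(None if r.active else X15[r.faced_identity] - 1, 3)
    played = [r.played_types.get(t, 0) / 20.0 for t in CARD_TYPES]
    return (identity + [r.hand_count / 20.0] + hand + [1.0 if r.active else 0.0]
            + faced + faced_id + played
            + [r.played_count / 54.0, r.remaining_total / 54.0])


def encode_target(r: RoundRecord) -> List[float]:
    """Desired output: one-hot over ACTIONS for the real player's actual play."""
    return one_hot(ACTIONS.index(r.action), len(ACTIONS))
```

Each unique value of a variable maps to exactly one piece of round information, which is the only constraint the description places on the encoding.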

Specifically, training the BP neural network with training data in step S200 comprises: the BP neural network processing the training data with a back-propagation algorithm until the error between its actual output and the desired output in the training data is less than a predetermined threshold; and/or the BP neural network processing the training data with a back-propagation algorithm until the number of corrections of its actual output reaches a predetermined number. Processing the training data with the back-propagation algorithm means the following: the training data are fed into the BP neural network, which computes on them using the weights and offsets and records the actual output; the error between this actual output and the desired output in the training data is computed, and the weights and offsets are updated according to that error; the training data are then computed again with the updated weights and offsets to obtain a new actual output, which is again compared with the desired output. The back-propagation algorithm is executed in a loop until the error is less than the predetermined threshold and/or the number of corrections of the actual output reaches the predetermined number; by the principle of the back-propagation algorithm, the error generally decreases step by step. The weights and offsets may be updated with a gradient descent algorithm. Note that the desired output is generally generated from the plays made by real players in the training data when facing the same situation. Once the BP neural network has been trained on the training data, its actual output tends toward and approaches the desired output; that is, the computer player's card-playing strategy tends toward and approaches that of real players.
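The training loop of step S200 can be sketched as plain batch back-propagation with gradient descent. This is a hedged illustration assuming one hidden layer, sigmoid units and a mean squared error; `train_bp`, its parameters and the two stop reasons are hypothetical names. It stops on whichever of the two criteria in the description is met first: mean error below `error_threshold`, or `max_corrections` weight updates.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float]]  # (input vector, desired output)


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_bp(samples: List[Sample], n_hidden: int, lr: float = 1.0,
             error_threshold: float = 1e-3, max_corrections: int = 10_000, seed: int = 0):
    """Return (predict, error_history, stop_reason)."""
    n_in, n_out, n = len(samples[0][0]), len(samples[0][1]), len(samples)
    rng = random.Random(seed)
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b_h = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b_o = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w_ih, b_h)]
        y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(w_ho, b_o)]
        return h, y

    history: List[float] = []
    for _ in range(max_corrections):
        g_ih = [[0.0] * n_in for _ in range(n_hidden)]
        g_bh = [0.0] * n_hidden
        g_ho = [[0.0] * n_hidden for _ in range(n_out)]
        g_bo = [0.0] * n_out
        err = 0.0
        for x, t in samples:
            h, y = forward(x)  # actual output
            err += 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))
            # Output deltas: dE/dnet_k = (y_k - t_k) * y_k * (1 - y_k)
            d_o = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
            # Hidden deltas: output deltas propagated back through w_ho
            d_h = [h[j] * (1.0 - h[j]) * sum(d_o[k] * w_ho[k][j] for k in range(n_out))
                   for j in range(n_hidden)]
            for k in range(n_out):
                g_bo[k] += d_o[k]
                for j in range(n_hidden):
                    g_ho[k][j] += d_o[k] * h[j]
            for j in range(n_hidden):
                g_bh[j] += d_h[j]
                for i in range(n_in):
                    g_ih[j][i] += d_h[j] * x[i]
        history.append(err / n)
        if history[-1] < error_threshold:  # criterion 1: error below threshold
            return (lambda x: forward(x)[1]), history, "threshold"
        # One correction: gradient-descent update of all weights and offsets
        for k in range(n_out):
            b_o[k] -= lr * g_bo[k] / n
            for j in range(n_hidden):
                w_ho[k][j] -= lr * g_ho[k][j] / n
        for j in range(n_hidden):
            b_h[j] -= lr * g_bh[j] / n
            for i in range(n_in):
                w_ih[j][i] -= lr * g_ih[j][i] / n
    # criterion 2: predetermined number of corrections reached
    return (lambda x: forward(x)[1]), history, "max_corrections"
```

In practice `samples` would be the (input vector, desired output) pairs generated in step S500; the learning rate and both stopping values are tuning choices the description leaves open.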

Since the training data may contain a very large amount of round information, step S200 is preferably implemented with distributed computing, that is, the training of the BP neural network is carried out on multiple processors. The advantage of performing step S200 with distributed computing is that it improves the computational efficiency of training the BP neural network.
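One common way to distribute step S200 is data parallelism: the training data are split into shards, each processor computes the gradient on its shard, and the summed gradients are averaged before a single weight update. The sketch below assumes this scheme (the description does not specify one) and, to stay short, uses a single sigmoid unit as a stand-in for the full BP gradient and a thread pool as a stand-in for separate processors or machines; all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Params = Tuple[List[float], float]   # (weights, offset) of the stand-in unit
Sample = Tuple[Sequence[float], float]


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shard_gradient(params: Params, shard: List[Sample]):
    """Summed gradient of E = 0.5 * (y - t)^2 over one shard for y = sigmoid(w . x + b)."""
    w, b = params
    gw, gb = [0.0] * len(w), 0.0
    for x, t in shard:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        delta = (y - t) * y * (1.0 - y)
        for i, xi in enumerate(x):
            gw[i] += delta * xi
        gb += delta
    return gw, gb, len(shard)


def distributed_gradient(params: Params, data: List[Sample], n_workers: int):
    """Each worker handles one shard; partial sums are combined, then averaged."""
    shards = [data[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda s: shard_gradient(params, s), shards))
    total = sum(count for _, _, count in parts)
    gw = [sum(p[0][i] for p in parts) / total for i in range(len(params[0]))]
    gb = sum(p[1] for p in parts) / total
    return gw, gb
```

Because per-shard sums are combined before dividing by the total sample count, the averaged gradient equals the full-batch gradient up to floating-point rounding, so the update does not depend on the number of workers.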

When a computer player is introduced into a round of the two-against-one game, step S300 is performed: when it is the computer player's turn to play in the round currently in progress, the BP neural network is invoked to compute on the round parameters of the game, so as to determine the computer player's card-playing strategy. Preferably, the round parameters comprise: variables and their values describing the computer player's game identity, number of cards in hand, card types in hand, active/passive playing state, card type faced when playing passively, and game identity of the player whose play is faced when playing passively; and variables and their values describing the card types already played by all players, the number of cards already played, and the global number of remaining cards in hand in the round currently in progress. In other words, the round parameters reflect the situation the computer player faces in the current game just before it plays. The card-playing strategy comprises at least play information identifying the computer player's playing mode, the play information comprising variables and their values describing the card type played by the computer player or a pass operation. Those skilled in the art will understand that these variables and values are determined from the actual output that the BP neural network computes from the round parameters.
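In step S300 the network's actual output must be turned into play information, i.e. a card type or a pass. A minimal sketch, assuming one output unit per card type plus one for "pass" and picking the highest-scoring admissible unit; `choose_play`, the `ACTIONS` layout and the optional `allowed` rules filter are hypothetical and not described in the patent.

```python
from typing import Optional, Sequence, Set

# Hypothetical output layout; it must match the layout used for the desired
# outputs during training: one unit per card type plus one unit for "pass".
ACTIONS = ["single", "pair", "triple", "straight", "bomb", "rocket", "pass"]


def choose_play(output: Sequence[float], active: bool,
                allowed: Optional[Set[str]] = None) -> str:
    """Map the network's actual output to play information.

    active:  True when the computer player leads; a leading player must play
             cards, so "pass" is excluded.
    allowed: optional set of actions legal for the current hand and faced play
             (an assumed rules filter, not part of the patent).
    """
    if len(output) != len(ACTIONS):
        raise ValueError("output size does not match ACTIONS")
    candidates = [
        (score, action) for action, score in zip(ACTIONS, output)
        if not (active and action == "pass")
        and (allowed is None or action in allowed)
    ]
    if not candidates:
        raise ValueError("no admissible action")
    return max(candidates)[1]  # highest output unit wins
```

Choosing the specific cards that realise the selected card type is left open by the description and is not covered by this sketch.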

It should be noted that although the operations of the method of the invention are described in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the operations shown must be performed to achieve the desired result. On the contrary, the steps depicted in the flowcharts may be performed in a different order. Additionally or alternatively, some steps may be omitted, several steps may be combined into one step, and/or one step may be split into several steps.

The method provided by the present invention for determining the card-playing strategy of a computer player in a two-against-one game can be implemented with programmable logic devices or as computer software. For example, an embodiment of the invention may be a computer program product which, when run, causes a computer to perform the method described. The computer program product comprises a computer-readable storage medium containing computer program logic or code portions for implementing the steps of the method. The computer-readable storage medium may be a built-in medium installed in a computer or a removable medium that can be detached from the computer host (for example a hot-pluggable storage device). Built-in media include but are not limited to rewritable non-volatile memory such as RAM, ROM, flash memory and hard disks. Removable media include but are not limited to optical storage media (such as CD-ROM and DVD), magneto-optical storage media (such as MO), magnetic storage media (such as tape or removable hard disks), media with built-in rewritable non-volatile memory (such as memory cards), and media with built-in ROM (such as ROM cartridges).

Those skilled in the art will understand that any computer system with suitable programming means can perform all the steps of the method of the invention contained in the program product. Although most of the specific embodiments described in this specification focus on software programs, alternative embodiments that implement the method of the invention as firmware or hardware likewise fall within the scope claimed by the invention.

It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from the spirit or essential characteristics of the invention. The embodiments should therefore be regarded in every respect as exemplary and not restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" clearly does not exclude other components, units or steps, and the singular does not exclude the plural.

What is disclosed above is merely some preferred embodiments of the present invention and cannot be used to limit the scope of rights of the invention; equivalent variations made according to the claims of the invention therefore still fall within the scope covered by the invention.

Claims

1. A method for determining the card-playing strategy of a computer player in a two-against-one game, the method comprising:

creating a BP neural network;

training the BP neural network with training data;

when it is the computer player's turn to play in the round of the two-against-one game currently in progress, invoking the BP neural network to compute on the round parameters of the game, so as to determine the computer player's card-playing strategy.

2. The method according to claim 1, wherein, before the BP neural network is trained with the training data, the method further comprises:

pre-storing round information of real players in historical rounds of the two-against-one game;

generating the training data from the round information.

3. The method according to claim 2, wherein the training data comprise:

any one, or any combination, of variables and their values describing the real player's game identity, number of cards in hand, card types in hand, active/passive playing state, card type faced when playing passively, game identity of the player whose play is faced when playing passively, card types already played, number of cards already played, global number of remaining cards in hand, card type played, and pass operation.

4. The method according to claim 1, wherein creating the BP neural network comprises:

creating an input layer, a hidden layer and an output layer of the BP neural network respectively;

initializing the weights and offsets between the input layer and the hidden layer, and initializing the weights and offsets between the hidden layer and the output layer.

5. The method according to claim 1, wherein training the BP neural network with training data comprises:

the BP neural network processing the training data with a back-propagation algorithm so that the error between its actual output and the desired output in the training data is less than a predetermined threshold; and/or

the BP neural network processing the training data with a back-propagation algorithm so that the number of corrections of its actual output reaches a predetermined number.

6. The method according to any one of claims 1 to 5, wherein:

the step of training the BP neural network with training data is implemented by distributed computing.

7. The method according to claim 1, wherein the round parameters comprise:

variables and their values describing the computer player's game identity, number of cards in hand, card types in hand, active/passive playing state, card type faced when playing passively, and game identity of the player whose play is faced when playing passively; and

variables and their values describing the card types already played by all players, the number of cards already played, and the global number of remaining cards in hand in the round of the two-against-one game currently in progress.

8. The method according to claim 1, wherein:

the card-playing strategy comprises at least play information identifying the computer player's playing mode, the play information comprising variables and their values describing the card type played by the computer player or a pass operation.