CN111667075A - Service execution method, device and related equipment - Google Patents
- Publication number
- CN111667075A CN111667075A CN202010540400.7A CN202010540400A CN111667075A CN 111667075 A CN111667075 A CN 111667075A CN 202010540400 A CN202010540400 A CN 202010540400A CN 111667075 A CN111667075 A CN 111667075A
- Authority
- CN
- China
- Prior art keywords
- game
- model
- optimization
- original
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/042—Backward inferencing
Abstract
The application discloses a service execution method, a device, a system and a computer-readable storage medium. The service execution method uses an existing supervised-learning game model to play games against itself, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples in this way, the playing level of the supervised-learning game model is gradually improved, the model precision is ensured, and the accuracy of the game service execution result is improved accordingly.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a service execution method, a service execution apparatus, a service execution system, and a computer-readable storage medium.
Background
Machine game playing has been called the "fruit fly" of artificial intelligence and has long been at the forefront of AI research. Poker games, for example, are typical imperfect-information games and a long-standing challenge in the field, and many game intelligence systems have reached advanced-player level by replicating human players' decisions through supervised learning. However, although an end-to-end game strategy model can be obtained by supervised learning with a neural network on human game data, the playing strength of such a model is limited by the quality of its training data: human player samples contain erroneous strategy decisions, so the quality of the sample data set limits further improvement of the learned network model to a certain extent. As a result, model precision is low and the accuracy of the corresponding game service execution result is reduced.
Therefore, how to effectively improve the accuracy of the game model and further improve the accuracy of the game service execution result is a problem to be solved urgently by those skilled in the art.
Disclosure of Invention
The application aims to provide a service execution method, which can effectively improve the accuracy of a game model and further improve the accuracy of a game service execution result; it is another object of the present application to provide a service execution apparatus, system and computer-readable storage medium, which also have the above-mentioned advantageous effects.
In order to solve the foregoing technical problem, in a first aspect, the present application provides a service execution method, including:
performing self-game by using an original game model to obtain a first game result;
backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
optimizing the original game model by using the game samples to obtain an optimized game model;
carrying out a model confrontation between the original game model and the optimized game model, and retaining the game model that wins the confrontation as the original game model;
judging whether the current model optimization meets preset optimization conditions, if not, returning to the step of utilizing the original game model to carry out self game to obtain a first game result for iterative optimization until the current model optimization meets the preset optimization conditions to obtain an optimal game model;
and executing the target game business by utilizing the optimal game model.
Preferably, the self-gaming by using the original gaming model to obtain the first gaming result includes:
acquiring current game data;
processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and determining a maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
Preferably, after obtaining the game sample corresponding to the second game result, the method further includes:
judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
optimizing the original game model by using the game samples to obtain an optimized game model, wherein the optimizing comprises the following steps:
and optimizing the original game model by using the preset number of game samples to obtain the optimized game model.
Preferably, the determining whether the current model optimization meets a preset optimization condition includes:
counting the optimization times of the current model;
and judging whether the current model optimization times reach preset times or not.
In a second aspect, the present application further provides a service execution apparatus, including:
the initial game module is used for carrying out self-game by utilizing the original game model to obtain a first game result;
the backtracking game module is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module is used for carrying out a model confrontation between the original game model and the optimized game model and retaining the game model that wins the confrontation as the original game model;
the iterative optimization module is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to carry out self game and obtain a first game result for iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module is used for executing the target game service by utilizing the optimal game model.
Preferably, the primary gaming module comprises:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining a maximum probability value in all the probability values and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
Preferably, the service execution apparatus further includes:
the sample counting module is used for judging whether the number of the game samples reaches a preset number or not after the game samples corresponding to the second game result are obtained; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
the model optimization module is specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
Preferably, the iterative optimization module is specifically configured to count the number of times of current model optimization; judging whether the current model optimization times reach preset times or not; if not, returning to the step of utilizing the original game model to carry out self game and obtain the first game result to carry out iterative optimization until the current model optimization meets the preset optimization condition to obtain the optimal game model.
In a third aspect, the present application further discloses a service execution system, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of any of the service execution methods described above.
In a fourth aspect, the present application also discloses a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, is adapted to carry out the steps of any of the service execution methods as described above.
The service execution method comprises the steps of utilizing an original game model to carry out a self-game to obtain a first game result; backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result; optimizing the original game model by using the game samples to obtain an optimized game model; carrying out model confrontation on the original game model and the optimized game model, and reserving a game model with successful confrontation as the original game model; judging whether the current model optimization meets preset optimization conditions, if not, returning to the step of utilizing the original game model to carry out self game to obtain a first game result for iterative optimization until the current model optimization meets the preset optimization conditions to obtain an optimal game model; and executing the target game business by utilizing the optimal game model.
Therefore, the service execution method provided by the application uses the existing supervised-learning game model to play games against itself, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples, the playing level of the supervised-learning game model is gradually improved, the model precision is ensured, and the accuracy of the game service execution result is further improved.
The service execution device, the service execution system, and the computer-readable storage medium provided by the present application all have the above beneficial effects, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the prior art and the embodiments of the present application, the drawings that are needed to be used in the description of the prior art and the embodiments of the present application will be briefly described below. Of course, the following description of the drawings related to the embodiments of the present application is only a part of the embodiments of the present application, and it will be obvious to those skilled in the art that other drawings can be obtained from the provided drawings without any creative effort, and the obtained other drawings also belong to the protection scope of the present application.
Fig. 1 is a schematic flow chart of a service execution method provided in the present application;
FIG. 2 is a flow chart of a method for optimizing game samples provided herein;
FIG. 3 is a flow chart of a method for optimizing a game model provided herein;
FIG. 4 is a diagram illustrating the trend of the confrontation results of a game model provided in the present application;
fig. 5 is a schematic structural diagram of a service execution device provided in the present application;
fig. 6 is a schematic structural diagram of a service execution system provided in the present application.
Detailed Description
The core of the application is to provide a service execution method, which can effectively improve the accuracy of a game model and further improve the accuracy of a game service execution result; another core of the present application is to provide a service execution apparatus, a system and a computer-readable storage medium, which also have the above-mentioned advantages.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of a service execution method provided in the present application, including:
S101: performing a self-game by using an original game model to obtain a first game result;
This step aims to perform a self-game with the original game model to obtain the corresponding game result, i.e. the above-mentioned first game result. The original game model is an existing supervised-learning game strategy model. A self-game platform is built around the original game model to simulate games: each game participant uses the original game model to make decisions and complete the game, yielding the first game result.
As a preferred embodiment, the self-gaming using the original gaming model to obtain the first gaming result may include: acquiring current game data; processing the current game data by using an original game model to obtain each legal game action and a probability value corresponding to each legal game action; and determining the maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is ended to obtain a first game result.
This preferred embodiment provides a specific method for obtaining the first game result. Current game data, i.e. the situation data of the current game, is first acquired and processed with the original game model to obtain the legal game actions and their corresponding probability values; the legal game action with the maximum probability value is then executed, and play continues in this way until the game ends, at which point the first game result is obtained. For example, in a card game, each participant can use the original game model to process the current game data, such as information on the cards already played and the participant's own hidden hand, to obtain the legal card-playing actions and their corresponding probability values; each participant then executes the legal action with the maximum probability value, and the participants play in turn using the original game model until the game ends and the game result is obtained.
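As an illustrative sketch only (the patent publishes no code; the toy counting game, function names and data shapes here are all hypothetical), the greedy self-play of S101 can be written as: every participant shares one policy, and at each turn the legal action with the maximum model probability is played until the game ends.

```python
def select_action(action_probs):
    """Pick the legal action with the maximum model probability (S101)."""
    return max(action_probs, key=action_probs.get)

def self_play(policy, state, players=("A", "B")):
    """Play one toy game in which every participant uses the same policy.
    `policy(state, player)` returns {legal_action: probability}.
    Returns the winner -- the 'first game result' -- and the move history."""
    turn, history = 0, []
    while state > 0:                      # toy terminal condition: counter empty
        player = players[turn % 2]
        action = select_action(policy(state, player))
        history.append((player, state, action))
        state -= action                   # toy move: remove `action` tokens
        turn += 1
    return players[(turn - 1) % 2], history   # last mover wins

```

For instance, with a toy policy that weights larger takes more heavily, `self_play(policy, 5)` ends after two moves with the second player taking the last token.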
S102: backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
This step aims to obtain, through a backtracking game, a second game result opposite to the first game result, together with the game samples corresponding to the second game result. Suppose the game is played by participant A and participant B, and in the self-game the first game result is that participant A beats participant B; then the opposite second game result, in which participant B beats participant A, can be realized by backtracking the game. Specifically, backtracking starts from the losing side and moves up to the previous decision point; at that decision point, a legal game action different from the one previously taken is selected and executed, and the game is continued with the original game model until it ends. If no legal action at that decision point can change the game result, backtracking continues further upward until a second game result opposite to the first is obtained. Furthermore, backtracking is repeated until no further improving legal actions can be found within the specified number of backtracking layers and backtracking attempts; at that point, the game samples that change the game result can be obtained from the improving actions, i.e. the optimized game samples.
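A minimal sketch of this backtracking idea on a toy counting game (hypothetical names and game rules; the patent's card-game version is more involved): walk the loser's decision points from the deepest upward, substitute each alternative legal action, replay to the end, and record the substitutions that flip the result.

```python
def play_out(state, to_move, pick):
    """Finish a toy counting game: players 0 and 1 alternately remove
    `pick(state)` tokens; whoever takes the last token wins."""
    player = to_move
    while state > 0:
        state -= pick(state)
        if state > 0:
            player = 1 - player
    return player

def backtrack_samples(history, loser, pick, max_depth=8, max_tries=400):
    """From the loser's decision points (deepest first), substitute each
    alternative legal action and replay the rest of the game; record the
    (state, action) pairs that flip the result -- S102's optimized samples.
    `history` holds (player, state, action_taken) triples."""
    samples, tries = [], 0
    points = [h for h in reversed(history) if h[0] == loser][:max_depth]
    for player, state, played in points:
        for alt in (a for a in (1, 2, 3) if a <= state and a != played):
            tries += 1
            if tries > max_tries:
                return samples
            left = state - alt
            winner = player if left == 0 else play_out(left, 1 - player, pick)
            if winner == player:          # the original loser now wins
                samples.append((state, alt))
                break                     # improvement found at this point
    return samples
```

The `max_depth` and `max_tries` caps mirror the limits on backtracking layers and backtracking attempts described above.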
S103: optimizing the original game model by using the game sample to obtain an optimized game model;
This step aims to realize model optimization: the optimized game samples are used to optimize the original game model to obtain the corresponding optimized game model. The model optimization process itself may follow the prior art and is not described further here.
As a preferred embodiment, after obtaining the game sample corresponding to the second game result, the method may further include: judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to perform self-game to obtain a first game result until the number of the game samples reaches a preset number; the optimizing the original game model by using the game samples to obtain an optimized game model may include: and optimizing the original game model by using a preset number of game samples to obtain an optimized game model.
In order to ensure the optimization effect and improve the performance of the optimized model, the number of optimized game samples can be set in advance, so that the original game model is optimized with a sufficient number of optimized samples. Therefore, after the game samples corresponding to the second game result are obtained in S102, the number of game samples is counted and compared with the preset number; if the preset number has not been reached, the process returns to S101 to repeat the self-game and backtracking game until the preset number of optimized game samples is obtained. In S103, the original game model can then be optimized with the preset number of game samples to obtain an optimized game model with higher performance. It can be understood that the specific value of the preset number does not affect the implementation of the technical scheme; it is set by the technician according to the actual situation, and the present application does not limit it.
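The accumulation loop described above can be sketched as follows (names are illustrative; `generate_game_samples` stands for one round of self-play plus backtracking):

```python
def collect_samples(generate_game_samples, preset_number):
    """Repeat self-play plus backtracking (S101-S102) until the sample
    container holds the preset number of optimized samples, then hand the
    batch to the optimization step (S103)."""
    container = []
    while len(container) < preset_number:
        container.extend(generate_game_samples())   # samples from one game
    return container[:preset_number]
```

The preset number is a free parameter, as the text notes; a larger batch trades more self-play time for a steadier optimization step.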
S104: carrying out a model confrontation between the original game model and the optimized game model, and retaining the game model that wins the confrontation as the original game model;
S105: judging whether the current model optimization meets preset optimization conditions, if not, returning to S101 for iterative optimization, and if so, executing S106;
These steps aim to realize model confrontation: the original game model and the optimized game model play a confrontation game against each other, the winning game model is retained and set as the new original game model, and cyclic iterative training is performed according to the iteration condition so that the game model with the best performance is finally obtained. The iteration condition, i.e. the preset optimization condition, is a preset condition for judging whether the model needs further iterative training; its type is not unique, and it may be a maximum number of training iterations, a requirement that certain model parameters reach certain standard values, or the like, which is not limited in the present application.
As a preferred embodiment, the above determining whether the current model optimization satisfies the preset optimization condition may include: counting the optimization times of the current model; and judging whether the optimization times of the current model reach preset times or not.
This preferred embodiment provides a specific type of preset optimization condition: a maximum number of training iterations, i.e. the preset number of times, is set in advance. After each model confrontation, the number of optimizations performed on the current model is counted and compared with the preset number; if the preset number has not been reached, iterative training continues until the number of optimizations reaches the preset number, at which point the optimal game model is obtained. The specific value of the preset number of times does not affect the implementation of the technical scheme; it is set by the technician according to the actual situation, and the present application does not limit it.
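Putting S103 to S105 together, the outer loop can be sketched as below (all names are illustrative stand-ins: `optimize` is the sample-driven optimization of S103 and `confront` the model confrontation of S104):

```python
def optimize_until_done(original, optimize, confront, preset_times):
    """Outer loop of S103-S105: optimize the current model, pit the
    candidate against the incumbent, keep whichever wins the confrontation,
    and stop once the optimization count reaches the preset number."""
    model, times = original, 0
    while times < preset_times:
        candidate = optimize(model)
        if confront(candidate, model):    # candidate wins the model confrontation
            model = candidate
        times += 1
    return model                          # the optimal game model
```

The confrontation acts as a gate: an optimization round that weakens the model is simply discarded, so model strength is monotone over iterations.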
S106: taking the original game model as the optimal game model, and executing the target game service with the optimal game model.
This step aims to realize execution of the game service: when the target game service is received, the optimal game model is called directly to play the game, and the corresponding game service execution result is obtained. The target game service is the received game service to be executed.
It should be noted that S101 to S105 constitute the training process of the optimal game model. In actual game service execution, this training process only needs to be performed once; when a game service is received again, the trained model is called directly. In addition, continual correction and optimization of the optimal game model can be carried out according to game service execution results, so as to obtain a game model with even better performance.
Therefore, the service execution method provided by the application uses the existing supervised-learning game model to play games against itself, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples, the playing level of the supervised-learning game model is gradually improved, the model precision is ensured, and the accuracy of the game service execution result is further improved.
On the basis of the above embodiments, this embodiment of the present application provides a more specific service execution method, taking the Dou Dizhu (Fight the Landlord) card game as an example. The specific implementation flow is as follows:
(1) self-gaming simulated games
Based on an existing supervised-learning Dou Dizhu card-playing strategy model p_θ, a self-game platform is built to simulate games. Each player makes decisions with p_θ: the situation data of the current game state s is input into the model, which outputs the probability distribution p_θ(a|s) over all legal actions in the current state; each player picks the legal action with the highest probability and plays it, until the game ends.
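The patent does not specify how p_θ(a|s) is restricted to the legal actions of state s; one common sketch (an assumption, not the patent's method) is to mask illegal actions out of the raw scores and renormalize with a softmax over the legal ones only:

```python
import math

def legal_action_distribution(logits, legal_mask):
    """Turn raw policy scores into p_theta(a|s): mask out illegal actions
    and softmax over the legal ones only, so the output is a probability
    distribution over exactly the legal actions of state s."""
    neg_inf = float("-inf")
    masked = [x if ok else neg_inf for x, ok in zip(logits, legal_mask)]
    m = max(x for x in masked if x != neg_inf)       # subtract max for stability
    exps = [math.exp(x - m) if x != neg_inf else 0.0 for x in masked]
    z = sum(exps)
    return [e / z for e in exps]
```

Greedy decoding as described in the text is then simply the index of the maximum entry of the returned distribution.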
(2) Performing backtracking improvement decision to generate optimized sample
Referring to fig. 2, fig. 2 is a flowchart of the game sample optimization method provided in the present application. Based on the above self-game process, backtracking starts from the losing player and moves up to that player's previous decision point; at any decision point with multiple card-playing options, a card-playing action different from the one originally taken is picked, and the game is simulated from that step onward with p_θ until it ends. If none of the card-playing actions at the decision point can change the game result, backtracking continues upward and different actions are simulated until the game result changes, and the improving action is recorded. The backtracking process is then repeated from the side that now loses, until no further improvements can be found within the specified number of backtracking layers (set to 8) and the maximum number of backtracking attempts (i.e. iterative backtracking attempts within a single game, set to 400), at which point backtracking for this game ends. Finally, based on the improving actions, a new optimized training sample is generated for each step of the game and stored in the sample container M.
(3) Generating a number of game samples
Multiple confrontation games are played based on p_θ until the sample size in the sample container reaches the set number (set to 500,000 samples for the first training and 2,500 in the subsequent iterative training process).
(4) Model training
Training of p_θ is continued on the optimized self-game samples, and after the number of model training steps reaches the specified number (set to 1000), a new supervised-learning game strategy model p'_θ is obtained; this constitutes one strategy model iteration.
(5) Model evaluation
The supervised-learning game strategy models p_θ and p'_θ play confrontation games against each other, the winning model is taken as the new p_θ, and the new p_θ is used to continue the self-game simulation; the above steps are then repeated.
The above processes are thus performed cyclically, as shown in fig. 3 (fig. 3 is a flowchart of the game model optimization method provided by the present application), until the performance of the supervised-learning game strategy model no longer improves, at which point the optimal game model is obtained.
The model evaluation can be carried out as follows: 1000 deals of Dou Dizhu (with the card distribution and hole cards of each deal known) are selected as a fixed test deal library, and the same 1000 deals are used for every model evaluation. Because Dou Dizhu has three player roles, the two strategy models are assigned to the roles in every mixed combination on each deal, giving 6 confrontation games per deal and 6000 games in total, and the win rates of the two strategy models are recorded, as shown in fig. 4; fig. 4 is a trend chart of the confrontation results of the game model provided by the application.
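One plausible reading of "6 confrontation games per deal" (an assumption, since the patent does not enumerate the pairings) is all 2³ assignments of the two models to the three Dou Dizhu seats, minus the two assignments where a single model occupies every seat:

```python
from itertools import product

def role_assignments(models=("p_theta", "p_theta_prime"), seats=3):
    """Enumerate mixed assignments of the two strategy models to the three
    Dou Dizhu seats: all 2**3 combinations minus the two in which a single
    model occupies every seat (those would not compare the models)."""
    return [a for a in product(models, repeat=seats) if len(set(a)) > 1]

def total_evaluation_games(n_deals=1000):
    """6 mixed assignments per deal over the fixed 1000-deal library."""
    return n_deals * len(role_assignments())
```

Under this reading the counts match the text: 6 games per deal and 6000 games over the fixed library.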
Therefore, the service execution method provided by this embodiment of the application uses the existing supervised-learning game model to play games against itself, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples, the playing level of the supervised-learning game model is gradually improved, the model precision is ensured, and the accuracy of the game service execution result is further improved.
To solve the above technical problem, the present application further provides a service execution device. Referring to fig. 5, fig. 5 is a schematic structural diagram of the service execution device provided in the present application, which includes:
the initial game module 1 is used for carrying out self-game by utilizing an original game model to obtain a first game result;
the backtracking game module 2 is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module 3 is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module 4 is used for carrying out a model confrontation between the original game model and the optimized game model and retaining the game model that wins the confrontation as the original game model;
the iterative optimization module 5 is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to perform self game and obtain a first game result to perform iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module 6 is used for executing the target game service by utilizing the optimal game model.
Therefore, the service execution device provided by this embodiment of the application uses the existing supervised-learning game model to play games against itself, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples, the playing level of the supervised-learning game model is gradually improved, the model precision is ensured, and the accuracy of the game service execution result is further improved.
As a preferred embodiment, the initial game module 1 described above may include:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining the maximum probability value among all the probability values and executing the legal game action corresponding to the maximum probability value until the game ends, so as to obtain the first game result.
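The action execution unit's greedy selection can be sketched as follows; the action names in the example are purely illustrative:

```python
def select_action(legal_actions, probabilities):
    """Pick the legal game action whose probability value is the
    maximum, as the action execution unit does at each step."""
    best_index = max(range(len(probabilities)),
                     key=probabilities.__getitem__)
    return legal_actions[best_index]
```

Repeating this selection on the model's output at every game state until the game ends yields the first game result.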
As a preferred embodiment, the service execution device may further include a sample statistics module, configured to judge, after the game samples corresponding to the second game result are obtained, whether the number of game samples reaches a preset number and, if not, to return to the step of performing self-game with the original game model to obtain a first game result, until the number of game samples reaches the preset number;
the model optimization module 3 may be specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
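The sample statistics module's accumulation loop can be sketched as below; `generate_samples_fn` is a hypothetical callable standing in for one round of self-game plus backtracking:

```python
def collect_samples(generate_samples_fn, preset_number):
    """Repeat sample generation until at least the preset number of
    game samples has been gathered, then hand the batch to the
    model optimization step."""
    samples = []
    while len(samples) < preset_number:
        samples.extend(generate_samples_fn())
    return samples
```

Optimizing on a batch of at least the preset number of samples, rather than on each sample individually, is what the model optimization module 3 is configured to do in this embodiment.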
As a preferred embodiment, the iterative optimization module 5 may be specifically configured to count the number of model optimizations performed so far and judge whether this number reaches a preset number of times; if not, it returns to the step of performing self-game with the original game model to obtain a first game result for iterative optimization, until the preset optimization condition is met and the optimal game model is obtained.
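The model confrontation performed by module 4 can be sketched as a head-to-head evaluation. The application does not specify how the winner is decided, so the game count and win-rate threshold below are illustrative assumptions (a gating rule of this shape is common in self-play training):

```python
def confront(original_model, optimized_model, play_game_fn,
             num_games=20, win_threshold=0.5):
    """Play the two models against each other num_games times and
    retain the optimized model only if it wins more than the
    threshold fraction of games; otherwise keep the original.
    num_games and win_threshold are illustrative, not from the
    application."""
    wins = sum(
        1 for _ in range(num_games)
        if play_game_fn(optimized_model, original_model) == "optimized"
    )
    return optimized_model if wins / num_games > win_threshold else original_model
```

Whichever model survives this confrontation becomes the original game model for the next optimization round.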
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above technical problem, the present application further provides a service execution system, please refer to fig. 6, where fig. 6 is a schematic structural diagram of the service execution system provided in the present application, and the service execution system may include:
a memory 10 for storing a computer program;
the processor 20, when executing the computer program, may implement the steps of any of the service execution methods described above.
For the introduction of the system provided by the present application, please refer to the above method embodiment, which is not described herein again.
To solve the above problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, can implement the steps of any one of the service execution methods described above.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is kept brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that those skilled in the art can make several improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the protection scope of the present application.
Claims (10)
1. A method for performing a service, comprising:
performing self-game by using an original game model to obtain a first game result;
backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
optimizing the original game model by using the game samples to obtain an optimized game model;
carrying out model confrontation on the original game model and the optimized game model, and reserving a game model with successful confrontation as the original game model;
judging whether the current model optimization meets a preset optimization condition and, if not, returning to the step of performing self-game with the original game model to obtain a first game result for iterative optimization, until the preset optimization condition is met, so as to obtain an optimal game model;
and executing the target game business by utilizing the optimal game model.
2. The service execution method according to claim 1, wherein the performing self-game with an original game model to obtain a first game result comprises:
acquiring current game data;
processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and determining a maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
3. The service execution method of claim 1, wherein after obtaining the game sample corresponding to the second game result, the method further comprises:
judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
optimizing the original game model by using the game samples to obtain an optimized game model, wherein the optimizing comprises the following steps:
and optimizing the original game model by using the preset number of game samples to obtain the optimized game model.
4. The method of claim 1, wherein the determining whether the current model optimization satisfies a predetermined optimization condition comprises:
counting the optimization times of the current model;
and judging whether the current model optimization times reach preset times or not.
5. A service execution apparatus, comprising:
the initial game module is used for carrying out self-game by utilizing the original game model to obtain a first game result;
the backtracking game module is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module is used for carrying out model confrontation on the original game model and the optimized game model and reserving a game model with successful confrontation as the original game model;
the iterative optimization module is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to carry out self game and obtain a first game result for iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module is used for executing the target game service by utilizing the optimal game model.
6. The service execution device of claim 5, wherein the initial game module comprises:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining a maximum probability value in all the probability values and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
7. The service execution apparatus of claim 5, further comprising:
the sample counting module is used for judging whether the number of the game samples reaches a preset number or not after the game samples corresponding to the second game result are obtained; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
the model optimization module is specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
8. The service execution device according to claim 5, wherein the iterative optimization module is specifically configured to count a number of times of current model optimization; judging whether the current model optimization times reach preset times or not; if not, returning to the step of utilizing the original game model to carry out self game and obtain the first game result to carry out iterative optimization until the current model optimization meets the preset optimization condition to obtain the optimal game model.
9. A service execution system, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the service execution method of any of claims 1 to 4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the service execution method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010540400.7A CN111667075A (en) | 2020-06-12 | 2020-06-12 | Service execution method, device and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111667075A true CN111667075A (en) | 2020-09-15 |
Family
ID=72387571
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010540400.7A Pending CN111667075A (en) | 2020-06-12 | 2020-06-12 | Service execution method, device and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111667075A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100048302A1 (en) * | 2008-08-20 | 2010-02-25 | Lutnick Howard W | Game of chance processing apparatus |
CN105426969A (en) * | 2015-08-11 | 2016-03-23 | 浙江大学 | Game strategy generation method of non-complete information |
CN106339582A (en) * | 2016-08-19 | 2017-01-18 | 北京大学深圳研究生院 | Method for automatically generating chess endgame based on machine game technology |
CN107050839A (en) * | 2017-04-14 | 2017-08-18 | 安徽大学 | Amazon chess game playing by machine system based on UCT algorithms |
CN109165683A (en) * | 2018-08-10 | 2019-01-08 | 深圳前海微众银行股份有限公司 | Sample predictions method, apparatus and storage medium based on federation's training |
CN109598342A (en) * | 2018-11-23 | 2019-04-09 | 中国运载火箭技术研究院 | Self-play training method and system for a decision network model |
CN109871943A (en) * | 2019-02-20 | 2019-06-11 | 华南理工大学 | Deep reinforcement learning method for three-round placement in Pineapple poker |
CN110227263A (en) * | 2019-06-11 | 2019-09-13 | 汕头大学 | Intelligent Dou Dizhu automatic game-playing method and system |
CN110404264A (en) * | 2019-07-25 | 2019-11-05 | 哈尔滨工业大学(深圳) | Method, device, system and storage medium for solving multi-player imperfect-information game strategies based on virtual self-play |
CN110555305A (en) * | 2018-05-31 | 2019-12-10 | 武汉安天信息技术有限责任公司 | Malicious application tracing method based on deep learning and related device |
CN110555517A (en) * | 2019-09-05 | 2019-12-10 | 中国石油大学(华东) | Improved chess game method based on Alphago Zero |
CN110841295A (en) * | 2019-11-07 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Data processing method based on artificial intelligence and related device |
2020-06-12: CN application CN202010540400.7A filed; published as CN111667075A, status: Pending
Non-Patent Citations (2)
Title |
---|
QIQI JIANG et al.: "DeltaDou: Expert-level Doudizhu AI through Self-play" *
YAN Tianwei: "Research and Application of Imperfect-Information Game Decision-Making Based on Deep Learning" *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109091868B (en) | Method, apparatus, computer equipment and the storage medium that battle behavior determines | |
CN107970608B (en) | Setting method and device of level game, storage medium and electronic device | |
US6468155B1 (en) | Systems and methods to facilitate games of skill for prizes played via a communication network | |
CN109513215B (en) | Object matching method, model training method and server | |
CN112274925B (en) | AI model training method, calling method, server and storage medium | |
Tesauro et al. | Analysis of watson's strategies for playing Jeopardy! | |
CN107335220B (en) | Negative user identification method and device and server | |
CN111569429B (en) | Model training method, model using method, computer device, and storage medium | |
CN109718558B (en) | Game information determination method and device, storage medium and electronic device | |
Hawkins et al. | Dynamic difficulty balancing for cautious players and risk takers | |
Larkey et al. | Skill in games | |
Liu et al. | Automatic generation of tower defense levels using PCG | |
CN111506514B (en) | Intelligent testing method and system applied to elimination game | |
CN111111193A (en) | Game control method and device and electronic equipment | |
CN110458295B (en) | Chess and card level generation method, training method and device based on artificial intelligence | |
KR102342778B1 (en) | Golf simulation device providing personalized avatar for user and operating method thereof | |
CN111507475A (en) | Game behavior decision method, device and related equipment | |
CN111667075A (en) | Service execution method, device and related equipment | |
CN108664842A (en) | A kind of construction method and system of Lip Movement Recognition model | |
WO2023155472A1 (en) | Board-game playing explanation scheme generation method and apparatus, and electronic device, storage medium and program product | |
Isaksen et al. | A statistical analysis of player improvement and single-player high scores | |
CN114870403A (en) | Battle matching method, device, equipment and storage medium in game | |
CN113230644A (en) | Artificial intelligence anti-cheating method for chess | |
CN113946604A (en) | Staged go teaching method and device, electronic equipment and storage medium | |
Perez-Liebana et al. | General video game AI as a tool for game design |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200915 |