CN111667075A - Service execution method, device and related equipment - Google Patents

Service execution method, device and related equipment

Info

Publication number
CN111667075A
CN111667075A
Authority
CN
China
Prior art keywords: game, model, optimization, original, result
Prior art date: 2020-06-12
Legal status: Pending
Application number
CN202010540400.7A
Other languages
Chinese (zh)
Inventor
史新新
宛然
魏培培
易平
姜传民
曹佳
周游
刘培锴
Current Assignee
Hangzhou Fuyun Network Technology Co ltd
Original Assignee
Hangzhou Fuyun Network Technology Co ltd
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2020-09-15
Application filed by Hangzhou Fuyun Network Technology Co ltd
Priority to CN202010540400.7A
Publication of CN111667075A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/042 Backward inferencing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a service execution method, device, system and computer-readable storage medium. The service execution method uses an existing supervised-learning game model to play against itself, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples in this way, the playing level of the supervised-learning game model is gradually improved, model accuracy is ensured, and the accuracy of game service execution results is improved accordingly.

Description

Service execution method, device and related equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a service execution method, a service execution apparatus, a service execution system, and a computer-readable storage medium.
Background
Machine gaming has been called the "fruit fly" of artificial intelligence and has long been at the forefront of artificial intelligence research. Card games such as poker, for example, are typical incomplete-information games and a long-standing challenge in the field, and many game intelligence systems have reached advanced-player level by using supervised learning to replicate the decisions of human players. However, although an end-to-end game strategy model can be obtained through supervised learning with a neural network on human game data, the playing strength of such a model is limited by the quality of its training data: human player samples contain strategically erroneous moves, so the quality of the sample data set limits, to a certain extent, further improvement of the learned network model. As a result, model accuracy is low and the accuracy of the corresponding game service execution results is reduced.
Therefore, how to effectively improve the accuracy of the game model, and thereby the accuracy of game service execution results, is a problem to be solved urgently by those skilled in the art.
Disclosure of Invention
The application aims to provide a service execution method, which can effectively improve the accuracy of a game model and further improve the accuracy of a game service execution result; it is another object of the present application to provide a service execution apparatus, system and computer-readable storage medium, which also have the above-mentioned advantageous effects.
In order to solve the foregoing technical problem, in a first aspect, the present application provides a service execution method, including:
performing self-game by using an original game model to obtain a first game result;
backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
optimizing the original game model by using the game samples to obtain an optimized game model;
carrying out model confrontation on the original game model and the optimized game model, and reserving a game model with successful confrontation as the original game model;
judging whether the current model optimization meets preset optimization conditions, if not, returning to the step of utilizing the original game model to carry out self game to obtain a first game result for iterative optimization until the current model optimization meets the preset optimization conditions to obtain an optimal game model;
and executing the target game service by utilizing the optimal game model.
Preferably, the performing self-game by using the original game model to obtain the first game result includes:
acquiring current game data;
processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and determining a maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
Preferably, after obtaining the game sample corresponding to the second game result, the method further includes:
judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
optimizing the original game model by using the game samples to obtain an optimized game model, wherein the optimizing comprises the following steps:
and optimizing the original game model by using the preset number of game samples to obtain the optimized game model.
Preferably, the determining whether the current model optimization meets a preset optimization condition includes:
counting the optimization times of the current model;
and judging whether the current model optimization times reach preset times or not.
In a second aspect, the present application further provides a service execution apparatus, including:
the initial game module is used for carrying out self-game by utilizing the original game model to obtain a first game result;
the backtracking game module is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module is used for carrying out model confrontation on the original game model and the optimized game model and reserving a game model with successful confrontation as the original game model;
the iterative optimization module is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to carry out self game and obtain a first game result for iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module is used for executing the target game service by utilizing the optimal game model.
Preferably, the initial game module comprises:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining a maximum probability value in all the probability values and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
Preferably, the service execution apparatus further includes:
the sample counting module is used for judging whether the number of the game samples reaches a preset number or not after the game samples corresponding to the second game result are obtained; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
the model optimization module is specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
Preferably, the iterative optimization module is specifically configured to count the number of times of current model optimization; judging whether the current model optimization times reach preset times or not; if not, returning to the step of utilizing the original game model to carry out self game and obtain the first game result to carry out iterative optimization until the current model optimization meets the preset optimization condition to obtain the optimal game model.
In a third aspect, the present application further discloses a service execution system, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of any of the service execution methods described above.
In a fourth aspect, the present application also discloses a computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, is adapted to carry out the steps of any of the service execution methods as described above.
The service execution method comprises the steps of utilizing an original game model to carry out a self-game to obtain a first game result; backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result; optimizing the original game model by using the game samples to obtain an optimized game model; carrying out model confrontation on the original game model and the optimized game model, and reserving a game model with successful confrontation as the original game model; judging whether the current model optimization meets preset optimization conditions, if not, returning to the step of utilizing the original game model to carry out self game to obtain a first game result for iterative optimization until the current model optimization meets the preset optimization conditions to obtain an optimal game model; and executing the target game service by utilizing the optimal game model.
Therefore, the service execution method provided by the application uses an existing supervised-learning game model for self-play, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples in this way, the playing level of the supervised-learning game model is gradually improved, model accuracy is ensured, and the accuracy of game service execution results is improved accordingly.
The service execution device, the service execution system, and the computer-readable storage medium provided by the present application all have the above beneficial effects, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the prior art and the embodiments of the present application, the drawings that are needed to be used in the description of the prior art and the embodiments of the present application will be briefly described below. Of course, the following description of the drawings related to the embodiments of the present application is only a part of the embodiments of the present application, and it will be obvious to those skilled in the art that other drawings can be obtained from the provided drawings without any creative effort, and the obtained other drawings also belong to the protection scope of the present application.
Fig. 1 is a schematic flow chart of a service execution method provided in the present application;
FIG. 2 is a flow chart of a method for optimizing game samples provided herein;
FIG. 3 is a flow chart of a method for optimizing a game model provided herein;
FIG. 4 is a diagram illustrating the trend of the confrontation results of a game model provided in the present application;
fig. 5 is a schematic structural diagram of a service execution device provided in the present application;
fig. 6 is a schematic structural diagram of a service execution system provided in the present application.
Detailed Description
The core of the application is to provide a service execution method, which can effectively improve the accuracy of a game model and further improve the accuracy of a game service execution result; another core of the present application is to provide a service execution apparatus, a system and a computer-readable storage medium, which also have the above-mentioned advantages.
In order to more clearly and completely describe the technical solutions in the embodiments of the present application, the technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of a service execution method provided in the present application, including:
s101: performing self-game by using an original game model to obtain a first game result;
this step is intended to perform a self-game using the original game model to obtain a corresponding game result, i.e. the above-mentioned first game result. The original game model is an existing supervised learning game strategy model, a self-game platform is built through the original game model to simulate a game, each game participant utilizes the original game model to make a decision and complete the game, and the first game result is obtained.
As a preferred embodiment, the self-gaming using the original gaming model to obtain the first gaming result may include: acquiring current game data; processing the current game data by using an original game model to obtain each legal game action and a probability value corresponding to each legal game action; and determining the maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is ended to obtain a first game result.
This preferred embodiment provides a specific method for obtaining the first game result. Current game data, i.e. the situation data of the game in progress, is acquired and processed by the original game model to obtain the legal game actions and their corresponding probability values; the legal game action with the maximum probability value is then executed, and play continues in this way until the game ends, at which point the first game result is obtained. For example, in a card game, for one participating player the original game model can process the current game data, such as information about cards already played, the player's own hand, and the hidden cards of the other players, to obtain the legal card-playing actions and their corresponding probability values; each player then executes the legal action with the maximum probability value, so that the players play in turn using the original game model until the game ends and the game result is obtained.
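Purely as an illustration of this greedy action-selection loop, a minimal Python sketch follows; the model and state interfaces (model.predict, is_terminal, apply, result) are hypothetical stand-ins, not interfaces specified by this disclosure.

```python
import numpy as np

def self_play_episode(model, state):
    """Play one self-play game greedily with a single policy model (S101).

    `model.predict(state)` is assumed to return (legal_actions, probs):
    the legal game actions in `state` and their probability values.
    """
    trajectory = []                                    # (state, action) per step
    while not state.is_terminal():
        legal_actions, probs = model.predict(state)
        action = legal_actions[int(np.argmax(probs))]  # max-probability action
        trajectory.append((state, action))
        state = state.apply(action)                    # advance the game
    return state.result(), trajectory                  # first game result + record
```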
S102: backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
This step aims to obtain, through backtracking play, a second game result opposite to the first game result, together with the game samples corresponding to that second result. Suppose the game is played by a participant A and a participant B, and in the self-play process the first game result indicates that participant A wins over participant B; the second, opposite game result then indicates that participant B wins over participant A, which can be realized by backtracking the game. Specifically, backtracking starts from the side that lost the game and moves up to an earlier decision point of that side; at the decision point, a legal game action different from the one previously taken is selected and executed, and the game is continued with the original game model until it ends. If no legal game action at that decision point can change the game result, backtracking continues further upward until a second game result opposite to the first is obtained. Furthermore, backtracking is repeated until no further improving legal game actions can be found within the specified number of backtracking layers and backtracking iterations; at that point, the game samples that change the game result can be generated from each improving action, and these are the optimized game samples.
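A minimal sketch of this backtracking search follows, reusing self_play_episode from the sketch above. The depth and iteration limits anticipate the 8-layer / 400-iteration example given later in this description; the result.loser, state.to_play and state.legal_actions interfaces are assumptions, and the sketch deliberately omits the repetition of backtracking from the side that newly loses.

```python
def backtrack_samples(model, trajectory, result, max_depth=8, max_tries=400):
    """Sketch of S102: from the losing side, search earlier decision points
    for an alternative legal action that flips the game result.

    Returns (state, improved_action) pairs, i.e. optimized game samples.
    """
    loser = result.loser()
    samples, tries = [], 0
    # The loser's decision points, walked from the last move upward,
    # limited to `max_depth` layers above the end of the game.
    loser_steps = [(s, a) for (s, a) in trajectory if s.to_play() == loser]
    for state, played in reversed(loser_steps[-max_depth:]):
        for action in state.legal_actions():
            if action == played:
                continue                    # must differ from the original move
            tries += 1
            if tries > max_tries:
                return samples              # backtracking budget exhausted
            # Continue the game with the original model after the new action.
            second_result, _ = self_play_episode(model, state.apply(action))
            if second_result.loser() != loser:      # game result flipped
                samples.append((state, action))     # record the improving action
                break                               # move one decision point up
    return samples
```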
S103: optimizing the original game model by using the game sample to obtain an optimized game model;
This step realizes model optimization: the optimized game samples are used to optimize the original game model and obtain the corresponding optimized game model. For the model optimization procedure itself, reference may be made to the prior art; it is not described in detail here.
As a preferred embodiment, after obtaining the game sample corresponding to the second game result, the method may further include: judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to perform self-game to obtain a first game result until the number of the game samples reaches a preset number; the optimizing the original game model by using the game samples to obtain an optimized game model may include: and optimizing the original game model by using a preset number of game samples to obtain an optimized game model.
In order to effectively ensure the optimization effect and improve the performance of the optimized model, the number of optimized game samples can be preset, so that the original game model is optimized with a sufficient number of optimized samples. Therefore, after the game samples corresponding to the second game result are obtained in S102, their number is counted and compared with the preset number; if the preset number has not been reached, the process returns to S101 to repeat the self-play and backtracking play until the preset number of optimized game samples is obtained. In S103 the original game model can then be optimized with the preset number of game samples to obtain an optimized game model with higher performance. It should be understood that the specific value of the preset number does not affect the implementation of the technical solution; it is set by the technician according to the actual situation and is not limited in this application.
S104: carrying out model confrontation on the original game model and the optimized game model, and reserving the game model with successful confrontation as the original game model;
S105: judging whether the current model optimization meets preset optimization conditions, if not, returning to S101 for iterative optimization, and if so, executing S106;
These steps realize model confrontation: the original game model and the optimized game model play against each other, and the game model that wins the confrontation is retained and set as the new original game model, so that cyclic iterative training according to the iteration condition yields the game model with the best performance. The iteration condition, i.e. the preset optimization condition, is the condition used to judge whether iterative training of the model should continue; its type is not unique, and it may be, for example, a maximum number of training iterations or a requirement that certain model parameters reach certain standard values, which is not limited in this application.
As a preferred embodiment, the above determining whether the current model optimization satisfies the preset optimization condition may include: counting the optimization times of the current model; and judging whether the optimization times of the current model reach preset times.
This preferred embodiment provides a specific type of preset optimization condition: a maximum number of training iterations, i.e. a preset number of times, is set in advance. After each model confrontation, the number of optimizations performed on the current model is counted and compared with the preset number; if the preset number has not been reached, iterative training continues until the number of current model optimizations reaches the preset number, at which point the optimal game model is obtained. The specific value of the preset number does not affect the implementation of the technical solution; it is set by the technician according to the actual situation and is not limited in this application.
S106: and taking the original game model as an optimal game model, and executing the target game service by using the optimal game model.
This step realizes execution of the game service: when the target game service, i.e. the received game service to be executed, is obtained, the optimal game model is called directly to play the game, and the corresponding game service execution result is obtained.
It should be noted that S101 to S105 constitute the training process of the optimal game model; in actual game service execution, this training process only needs to be performed once, and the model is then called directly whenever a game service is received again. In addition, correction and optimization of the optimal game model can be continued according to game service execution results, so as to obtain a game model with even better performance.
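For orientation only, a sketch stringing S101 to S106 into a single training loop is given below; new_game, optimize and arena_win_rate are hypothetical helpers named only to make the control flow concrete, and the 0.5 win-rate threshold is an assumption.

```python
def train_optimal_model(model, preset_samples, preset_rounds):
    """End-to-end sketch of S101-S105, returning the optimal model of S106."""
    for _ in range(preset_rounds):                  # preset optimization count (S105)
        samples = []
        while len(samples) < preset_samples:        # preset sample count
            result, trajectory = self_play_episode(model, new_game())  # S101
            samples += backtrack_samples(model, trajectory, result)    # S102
        candidate = optimize(model, samples)        # S103: train on optimized samples
        if arena_win_rate(candidate, model) > 0.5:  # S104: model confrontation
            model = candidate                       # keep the winning model
    return model                                    # the optimal game model
```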
Therefore, the service execution method provided by the application uses an existing supervised-learning game model for self-play, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples in this way, the playing level of the supervised-learning game model is gradually improved, model accuracy is ensured, and the accuracy of game service execution results is improved accordingly.
On the basis of the above embodiments, this embodiment of the application provides a more specific service execution method, taking the card game Dou Dizhu (Fight the Landlord) as an example; the specific implementation flow is as follows:
(1) Self-play game simulation
Based on an existing supervised-learning Dou Dizhu card-playing strategy model p_θ, a self-play platform is built to simulate games, and each player uses p_θ to make decisions: the situation data of the current game state s is input into the model, which outputs the probability distribution p_θ(a|s) over all legal actions in the current state, and each player picks the legal action with the highest probability to play, until the game ends.
(2) Backtracking to improve decisions and generate optimized samples
Referring to fig. 2, fig. 2 is a flowchart of the game sample optimization method provided in this application. Based on the above self-play process, backtracking starts from the losing player and moves up to that player's previous decision point; at a decision point with multiple card-playing options, a card-playing action different from the original one is picked, and from that step p_θ continues to simulate the game until it ends. If none of the card-playing actions at the decision point can change the game result, backtracking continues upward and different actions are tried until the game result is changed, and the improving action is recorded. The backtracking process is then repeated from the side that newly loses, until no further improvements can be found within the specified number of backtracking layers (set to 8) and the maximum number of backtracking iterations within a single game (set to 400), at which point backtracking for this game ends. Finally, based on the improving actions, a new optimized training sample is generated for each step of the game and stored in the sample container M.
(3) Generating a number of game samples
Multiple games are played in this way with p_θ until the sample container holds the set number of samples (the sample size for the first training is set to 500,000, and to 2,500 in the subsequent iterative training process).
(4) Model training
Training of p_θ then continues with the optimized self-play samples. After the number of model training iterations reaches the specified number (set to 1,000), a new supervised-learning game strategy model p'_θ is obtained; this constitutes one strategy-model iteration.
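As a purely illustrative sketch of this training step, the PyTorch fragment below continues supervised training on the optimized samples for the specified number of steps; the network shape, the 512-dimensional state encoding and the 309-action output are invented for demonstration and are not values taken from this disclosure.

```python
import torch
import torch.nn as nn

# Illustrative policy network: state features in, logits over actions out.
policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 309))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_steps(batches, num_steps=1000):
    """Run the specified number of training steps (1,000 in the example);
    `batches` is assumed to yield (states, actions) tensors encoding the
    improved moves found by backtracking."""
    for _, (states, actions) in zip(range(num_steps), batches):
        optimizer.zero_grad()
        loss = loss_fn(policy(states), actions)  # imitate the improved actions
        loss.backward()
        optimizer.step()
```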
(5) Model evaluation
The supervised-learning game strategy models p_θ and p'_θ play against each other; the winning model is determined to be the new p_θ, which is then used to continue the self-play game simulation, and the above steps are repeated.
The above process is performed cyclically, as shown in fig. 3 (fig. 3 is a flowchart of the game model optimization method provided in this application), until the performance of the supervised-learning game strategy model no longer improves, at which point the optimal game model is obtained.
Model evaluation can be carried out as follows: 1,000 deals of Dou Dizhu (the dealt hands and bottom cards of each deal being known) are selected as a fixed test deal library, and these 1,000 deals are used for every model evaluation. Because Dou Dizhu has three player roles, on each deal the two strategy models are combined by role and hand into 6 confrontation games, for 6,000 games in total, while the win rates of the two strategy models are recorded. As shown in fig. 4, fig. 4 is a trend chart of the confrontation results of the game model provided in this application.
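One plausible reading of the 6 confrontation games per deal is that the two models are assigned to the three seats in every mixed arrangement (the 2^3 = 8 assignments minus the two all-same ones); the sketch below counts wins under that assumption, with play_deal as a hypothetical helper that plays one deal with a given seat assignment and returns the winning model.

```python
from itertools import product

def evaluate(model_a, model_b, deals):
    """Sketch of the evaluation protocol: 6 mixed seat assignments per deal,
    so a fixed library of 1,000 deals yields 6,000 games in total."""
    wins = {"a": 0, "b": 0}
    for deal in deals:                      # e.g. the fixed 1,000-deal library
        for seats in product([model_a, model_b], repeat=3):
            if len(set(map(id, seats))) < 2:
                continue                    # skip the two all-same line-ups
            winner = play_deal(deal, seats)
            wins["a" if winner is model_a else "b"] += 1
    return wins                             # win counts, from which win rates follow
```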
Therefore, the service execution method provided by this embodiment of the application uses an existing supervised-learning game model for self-play, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples in this way, the playing level of the supervised-learning game model is gradually improved, model accuracy is ensured, and the accuracy of game service execution results is improved accordingly.
To solve the above technical problem, the present application further provides a service execution device, please refer to fig. 5, where fig. 5 is a schematic structural diagram of the service execution device provided in the present application, and the schematic structural diagram includes:
the initial game module 1 is used for carrying out self-game by utilizing an original game model to obtain a first game result;
the backtracking game module 2 is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module 3 is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module 4 is used for carrying out model confrontation on the original game model and the optimized game model and reserving a game model with successful confrontation as the original game model;
the iterative optimization module 5 is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to perform self game and obtain a first game result to perform iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module 6 is used for executing the target game service by utilizing the optimal game model.
Therefore, the service execution device provided by this embodiment of the application uses an existing supervised-learning game model for self-play, corrects the game model according to the game results obtained during play, and generates corresponding game samples from the corrected model for continued training of the supervised-learning game model. By optimizing the game samples in this way, the playing level of the supervised-learning game model is gradually improved, model accuracy is ensured, and the accuracy of game service execution results is improved accordingly.
As a preferred embodiment, the above-mentioned initial game module 1 may comprise:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining a maximum probability value in all the probability values and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
As a preferred embodiment, the service execution device may further include a sample statistics module, configured to determine, after the game samples corresponding to the second game result are obtained, whether the number of the game samples reaches a preset number; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
the model optimization module 3 may be specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
As a preferred embodiment, the iterative optimization module 5 may be specifically configured to count the number of times of current model optimization; judging whether the current model optimization times reach preset times or not; if not, returning to the step of utilizing the original game model to carry out self game and obtain the first game result to carry out iterative optimization until the current model optimization meets the preset optimization condition to obtain the optimal game model.
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above technical problem, the present application further provides a service execution system, please refer to fig. 6, where fig. 6 is a schematic structural diagram of the service execution system provided in the present application, and the service execution system may include:
a memory 10 for storing a computer program;
the processor 20, when executing the computer program, may implement the steps of any of the service execution methods described above.
For the introduction of the system provided by the present application, please refer to the above method embodiment, which is not described herein again.
To solve the above problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, can implement the steps of any one of the service execution methods described above.
The computer-readable storage medium may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The technical solutions provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, without departing from the principle of the present application, several improvements and modifications can be made to the present application, and these improvements and modifications also fall into the protection scope of the present application.

Claims (10)

1. A service execution method, comprising:
performing self-game by using an original game model to obtain a first game result;
backtracking according to the first game result to obtain a second game result opposite to the first game result, and obtaining a game sample corresponding to the second game result;
optimizing the original game model by using the game samples to obtain an optimized game model;
carrying out model confrontation on the original game model and the optimized game model, and reserving a game model with successful confrontation as the original game model;
judging whether the current model optimization meets preset optimization conditions, if not, returning to the step of utilizing the original game model to carry out self game to obtain a first game result for iterative optimization until the current model optimization meets the preset optimization conditions to obtain an optimal game model;
and executing the target game service by utilizing the optimal game model.
2. The service execution method according to claim 1, wherein the performing self-game by using the original game model to obtain the first game result comprises:
acquiring current game data;
processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and determining a maximum probability value in all the probability values, and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
3. The service execution method of claim 1, wherein after obtaining the game sample corresponding to the second game result, the method further comprises:
judging whether the number of the game samples reaches a preset number or not; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
optimizing the original game model by using the game samples to obtain an optimized game model, wherein the optimizing comprises the following steps:
and optimizing the original game model by using the preset number of game samples to obtain the optimized game model.
4. The service execution method of claim 1, wherein the determining whether the current model optimization satisfies the preset optimization condition comprises:
counting the optimization times of the current model;
and judging whether the current model optimization times reach preset times or not.
5. A service execution apparatus, comprising:
the initial game module is used for carrying out self-game by utilizing the original game model to obtain a first game result;
the backtracking game module is used for backtracking according to the first game result to obtain a second game result opposite to the first game result and obtain a game sample corresponding to the second game result;
the model optimization module is used for optimizing the original game model by using the game samples to obtain an optimized game model;
the model confrontation module is used for carrying out model confrontation on the original game model and the optimized game model and reserving a game model with successful confrontation as the original game model;
the iterative optimization module is used for judging whether the current model optimization meets a preset optimization condition, if not, returning to the step of utilizing the original game model to carry out self game and obtain a first game result for iterative optimization until the current model optimization meets the preset optimization condition, and obtaining an optimal game model;
and the service execution module is used for executing the target game service by utilizing the optimal game model.
6. The service execution apparatus of claim 5, wherein the initial game module comprises:
the data acquisition unit is used for acquiring current game data;
the data processing unit is used for processing the current game data by using the original game model to obtain each legal game action and a probability value corresponding to each legal game action;
and the action execution unit is used for determining a maximum probability value in all the probability values and executing legal game actions corresponding to the maximum probability value until the game is finished to obtain the first game result.
7. The service execution apparatus of claim 5, further comprising:
the sample counting module is used for judging whether the number of the game samples reaches a preset number or not after the game samples corresponding to the second game result are obtained; if not, returning to the step of utilizing the original game model to carry out self-game to obtain a first game result until the number of the game samples reaches the preset number;
the model optimization module is specifically configured to optimize the original game model by using the preset number of game samples to obtain the optimized game model.
8. The service execution device according to claim 5, wherein the iterative optimization module is specifically configured to count a number of times of current model optimization; judging whether the current model optimization times reach preset times or not; if not, returning to the step of utilizing the original game model to carry out self game and obtain the first game result to carry out iterative optimization until the current model optimization meets the preset optimization condition to obtain the optimal game model.
9. A service execution system, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the service execution method of any of claims 1 to 4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the service execution method according to any one of claims 1 to 4.
CN202010540400.7A (filed 2020-06-12; priority date 2020-06-12) Service execution method, device and related equipment. Status: Pending. Publication: CN111667075A.

Priority Applications (1)

Application Number: CN202010540400.7A; Priority Date: 2020-06-12; Filing Date: 2020-06-12; Title: Service execution method, device and related equipment

Publications (1)

Publication Number: CN111667075A; Publication Date: 2020-09-15

Family

ID=72387571

Family Applications (1)

Application Number: CN202010540400.7A; Title: Service execution method, device and related equipment; Priority Date: 2020-06-12; Filing Date: 2020-06-12

Country Status (1)

Country Link
CN (1) CN111667075A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100048302A1 (en) * 2008-08-20 2010-02-25 Lutnick Howard W Game of chance processing apparatus
CN105426969A (en) * 2015-08-11 2016-03-23 浙江大学 Game strategy generation method of non-complete information
CN106339582A (en) * 2016-08-19 2017-01-18 北京大学深圳研究生院 Method for automatically generating chess endgame based on machine game technology
CN107050839A (en) * 2017-04-14 2017-08-18 安徽大学 Amazon chess game playing by machine system based on UCT algorithms
CN110555305A (en) * 2018-05-31 2019-12-10 武汉安天信息技术有限责任公司 Malicious application tracing method based on deep learning and related device
CN109165683A (en) * 2018-08-10 2019-01-08 深圳前海微众银行股份有限公司 Sample predictions method, apparatus and storage medium based on federation's training
CN109598342A (en) * 2018-11-23 2019-04-09 中国运载火箭技术研究院 A kind of decision networks model is from game training method and system
CN109871943A (en) * 2019-02-20 2019-06-11 华南理工大学 A kind of depth enhancing learning method for big two three-wheel arrangement of pineapple playing card
CN110227263A (en) * 2019-06-11 2019-09-13 汕头大学 A kind of automatic game method of intelligence fighting landlord and system
CN110404264A (en) * 2019-07-25 2019-11-05 哈尔滨工业大学(深圳) It is a kind of based on the virtually non-perfect information game strategy method for solving of more people, device, system and the storage medium self played a game
CN110555517A (en) * 2019-09-05 2019-12-10 中国石油大学(华东) Improved chess game method based on Alphago Zero
CN110841295A (en) * 2019-11-07 2020-02-28 腾讯科技(深圳)有限公司 Data processing method based on artificial intelligence and related device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIQI JIANG et al.: "DeltaDou: Expert-level Doudizhu AI through Self-play" *
闫天伟 (Yan Tianwei): "Research and application of decision-making in incomplete-information games based on deep learning" *

Similar Documents

Publication Publication Date Title
CN109091868B Battle behavior determination method, apparatus, computer device and storage medium
CN107970608B (en) Setting method and device of level game, storage medium and electronic device
US6468155B1 (en) Systems and methods to facilitate games of skill for prizes played via a communication network
CN109513215B (en) Object matching method, model training method and server
CN112274925B (en) AI model training method, calling method, server and storage medium
Tesauro et al. Analysis of Watson's strategies for playing Jeopardy!
CN107335220B (en) Negative user identification method and device and server
CN111569429B (en) Model training method, model using method, computer device, and storage medium
CN109718558B (en) Game information determination method and device, storage medium and electronic device
Hawkins et al. Dynamic difficulty balancing for cautious players and risk takers
Larkey et al. Skill in games
Liu et al. Automatic generation of tower defense levels using PCG
CN111506514B (en) Intelligent testing method and system applied to elimination game
CN111111193A (en) Game control method and device and electronic equipment
CN110458295B (en) Chess and card level generation method, training method and device based on artificial intelligence
KR102342778B1 (en) Golf simulation device providing personalized avatar for user and operating method thereof
CN111507475A (en) Game behavior decision method, device and related equipment
CN111667075A (en) Service execution method, device and related equipment
CN108664842A Lip movement recognition model construction method and system
WO2023155472A1 (en) Board-game playing explanation scheme generation method and apparatus, and electronic device, storage medium and program product
Isaksen et al. A statistical analysis of player improvement and single-player high scores
CN114870403A (en) Battle matching method, device, equipment and storage medium in game
CN113230644A (en) Artificial intelligence anti-cheating method for chess
CN113946604A Staged Go teaching method and device, electronic equipment and storage medium
Perez-Liebana et al. General video game AI as a tool for game design

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200915)