CN112274925B - AI model training method, calling method, server and storage medium


Info

Publication number
CN112274925B
Authority
CN
China
Prior art keywords
model
agent
sample data
capability
agents
Prior art date
Legal status
Active
Application number
CN202011176373.6A
Other languages
Chinese (zh)
Other versions
CN112274925A (en)
Inventor
朱展图
周正
李宏亮
刘永升
Current Assignee
Super Parameter Technology Shenzhen Co ltd
Original Assignee
Super Parameter Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Super Parameter Technology Shenzhen Co ltd
Priority to CN202011176373.6A
Publication of CN112274925A
Application granted
Publication of CN112274925B
Status: Active


Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55 Controlling game characters or game objects based on the game progress
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F1/00 Card games
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F1/00 Card games
    • A63F2001/008 Card games adapted for being playable on a screen

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an AI model training method, a calling method, a server and a storage medium, wherein the method comprises the following steps: acquiring a plurality of groups of first sample data; inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels; randomly initializing the AI model to perform sample generation operation to obtain second sample data; and inputting the second sample data in a back propagation manner to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of inputting the second sample data in the back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels. Therefore, the accuracy of the AI model is improved.

Description

AI model training method, calling method, server and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an AI model training method, an AI model calling method, a server and a storage medium.
Background
With the rapid development of artificial intelligence (Artificial Intelligence, AI) technology, AI technology is widely applied in various fields. At present, in the field of game entertainment, games between a virtual Agent and a real user in board games can be realized through AI technology, and such an Agent can even defeat top professional players. However, card games are usually played by multiple players, and the cards held by the players are hidden from one another, so developing an AI model corresponding to a card-game Agent poses a greater challenge.
At present, an AI model is mainly realized based on a deep neural network (Deep Neural Network, DNN), and is usually trained based on data of each party alone, so that the data cannot be fully utilized, and the accuracy of the AI model is poor. Therefore, how to improve the accuracy of AI models is a current urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an AI model training method, a calling method, a server and a storage medium, which can improve the accuracy of an AI model.
In a first aspect, an embodiment of the present application provides an AI model training method, including:
acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
Inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
randomly initializing the AI model to perform sample generation operation to obtain second sample data;
and inputting the second sample data in a back propagation manner to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of inputting the second sample data in the back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels except the different levels.
In a second aspect, an embodiment of the present application further provides an AI model invoking method, including:
acquiring a first initial evaluation parameter corresponding to an Agent to be evaluated;
according to the first initial evaluation parameters, selecting AI models of a plurality of first type reference agents matched with the first initial evaluation parameters;
and calling AI models of the plurality of first-type reference agents, and controlling the to-be-evaluated agents and the plurality of first-type reference agents to execute corresponding game operations so as to evaluate the capability of the to-be-evaluated agents.
In a third aspect, an embodiment of the present application further provides a server, where the server includes a processor, a memory, and a computer program stored on the memory and executable by the processor, where the memory stores an AI model, and where the computer program when executed by the processor implements an AI model training method as described above; alternatively, the AI model invoking method as described above is implemented.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium, where the computer readable storage medium is configured to store a computer program, where the computer program when executed by a processor causes the processor to implement the AI model training method described above; alternatively, the AI model invoking method described above is implemented.
The embodiment of the application provides an AI model training method, a calling method, a server and a storage medium, wherein multiple groups of first sample data corresponding to multiple users in different levels are obtained, each group of first sample data is input into an AI model, and the AI model is subjected to iterative training based on supervised learning until the AI model converges, so that the AI model of an Agent corresponding to each level in different levels is obtained; and randomly initializing the AI model to perform sample generation operation, obtaining second sample data, reversely transmitting the second sample data to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the operation of reversely transmitting the second sample data to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model is converged, thereby obtaining the AI models of agents corresponding to other levels except different levels. By fully utilizing different data to perform AI model training, the accuracy of the AI model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the steps of an AI model training method in accordance with one embodiment of the present application;
FIG. 2 is a schematic flow chart of steps of another AI model training method provided in one embodiment of the application;
FIG. 3 is a schematic flowchart of the steps of an AI model invoking method provided in one embodiment of the application;
FIG. 4 is a schematic flowchart of the steps for controlling the agents to be evaluated and the plurality of reference agents of the first type to perform corresponding game operations according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an AI tournament assessment procedure provided in an embodiment of the present application;
FIG. 6 is a schematic flow chart of steps of another AI model invocation method provided in an embodiment of the application;
FIG. 7 is a schematic flowchart of the steps for controlling a plurality of second-type benchmark agents and the newly registered users to perform corresponding game operations according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a player user tournament assessment process provided in an embodiment of the present application;
fig. 9 is a schematic block diagram of a server provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
At present, an AI model is mainly realized based on a deep neural network (Deep Neural Network, DNN), and is usually trained based on data of each party alone, so that the data cannot be fully utilized, and the accuracy of the AI model is poor.
In order to solve the above problems, embodiments of the present application provide an AI model training method, an invocation method, a server, and a storage medium, which can improve the accuracy of an AI model. The AI model training method and the calling method can be applied to a server, and the server can be a single server or a server cluster consisting of a plurality of servers.
Referring to fig. 1, fig. 1 is a flowchart of an AI model training method according to an embodiment of the disclosure.
As shown in fig. 1, the AI model training method specifically includes steps S101 to S104.
S101, acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels.
Taking a card game as an example, the game level of each player user is uneven; for example, the game levels of a novice player user and a senior player user are obviously different. For player users of each level, data corresponding to the games of a plurality of player users at that level is acquired as sample data. For example, a plurality of player users at different levels play multiple games, and sample data corresponding to the plurality of player users at each level is obtained. For convenience of distinguishing description, sample data corresponding to a player user will hereinafter be referred to as first sample data. Based on player users of different levels, multiple sets of first sample data corresponding to player users of multiple levels are obtained.
Illustratively, the first sample data includes global information, player information, and the like, where the player information is further divided into current-player information and all-player information. The global information includes the number of players in the current game, the players' position information, the number of chips in the pot, the community cards, the current round, and so on. The current-player information includes the number of chips, the hand cards, the valid actions, the valid bets, position information, the number of chips bet, historical actions, historical bets, and so on. The all-player information includes the number of chips, position information, the number of chips bet, historical actions, historical bets, whether the player has left the table, whether the player has gone all-in, and so on.
For example, if the player data used is derived from platform A, and the levels of the player users on platform A are divided into five segments from low to high, the data of a plurality of player users in each of the five segments is screened out, and the data of the plurality of player users of each segment is taken as one set of first sample data, so that five sets of first sample data are obtained.
S102, inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain the AI model of the Agent corresponding to each of the different levels.
The AI model is implemented based on a neural network model, which may be set based on actual situations, and this application is not limited specifically. The training of the AI model mainly uses a supervised learning method, after a plurality of groups of first sample data are obtained, each group of first sample data is input into the AI model, and the AI model is iteratively trained based on the supervised learning until the AI model converges, so that the AI model of the Agent with the same user level as that corresponding to each group of first sample data is obtained. The Agent is an Agent which is in a complex dynamic environment, autonomously perceives environment information, autonomously takes action and realizes a series of preset targets or tasks.
The feature extraction is performed on each group of first sample data to obtain corresponding feature vectors, then the feature vectors corresponding to the first sample data are input into the AI model, and the AI model is iteratively trained based on supervised learning until the AI model converges.
Illustratively, the feature vectors corresponding to the first sample data include global-information feature vectors, player-information feature vectors, and the like, where the player information is further divided into current-player-information feature vectors and all-player-information feature vectors. The global-information feature vectors include a feature vector for the number of players in the game, a feature vector for the players' position information, a feature vector for the number of chips in the pot, a community-card feature vector, a current-round feature vector, and so on. The current-player-information feature vectors include a chip-count feature vector, a hand-card feature vector, a valid-action feature vector, a valid-bet feature vector, a position-information feature vector, a bet-chip-count feature vector, a historical-action feature vector, a historical-bet feature vector, and so on. The all-player-information feature vectors include a chip-count feature vector, a position-information feature vector, a bet-chip-count feature vector, a historical-action feature vector, a historical-bet feature vector, a left-table feature vector, an all-in feature vector, and so on.
For example, taking the case in which the player users on platform A are divided into five segments from low to high, five groups of first sample data are collected, each covering the effective actions of a plurality of player users in one segment (such as fold, check, raise, call and all-in) and the number of chips bet. Feature extraction is performed on each group of first sample data, the obtained feature vectors are input into an AI model, and iterative training is performed until the AI model converges, thereby obtaining an AI model representing the average-level Agent of each of the five segments.
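The text does not fix a concrete encoding for these features; the following is a minimal sketch of how such poker features might be assembled into a single input vector, in which every field name, the action set and all dimensions are illustrative assumptions rather than details taken from the original disclosure:

```python
import numpy as np

# Hypothetical encoding sizes; the real model's input layout is not given in the text.
NUM_ACTIONS = 5          # fold, check, raise, call, all-in
NUM_CARDS = 52

def one_hot_cards(cards):
    """Encode a list of card indices (0..51) as a 52-dim multi-hot vector."""
    v = np.zeros(NUM_CARDS, dtype=np.float32)
    v[list(cards)] = 1.0
    return v

def encode_sample(global_info, current_player, all_players):
    """Concatenate global, current-player and all-player features into one vector."""
    global_vec = np.concatenate([
        [global_info["num_players"], global_info["pot_chips"], global_info["round"]],
        one_hot_cards(global_info["community_cards"]),
    ])
    player_vec = np.concatenate([
        [current_player["chips"], current_player["chips_bet"]],
        one_hot_cards(current_player["hand_cards"]),
        current_player["valid_action_mask"],          # multi-hot, length NUM_ACTIONS
    ])
    others_vec = np.concatenate([
        [p["chips"] for p in all_players],
        [p["chips_bet"] for p in all_players],
        [float(p["left_table"]) for p in all_players],
        [float(p["all_in"]) for p in all_players],
    ])
    return np.concatenate([global_vec, player_vec, others_vec]).astype(np.float32)
```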
The method comprises the steps of obtaining a model loss value loss corresponding to an AI model during training, determining whether the model loss value loss is smaller than or equal to a preset loss value threshold, and determining that the AI model converges if the model loss value loss is smaller than or equal to the preset loss value threshold; otherwise, if the model loss value loss is greater than the preset loss value threshold, determining that the AI model is not converged.
Wherein, the model loss value loss is calculated according to the following formula (1):
loss = (p₁ * log(p₂)) + (1 - p₁) * log(1 - p₂) + (q₁ - q₂)²   (1)
where p₁ is the probability corresponding to the user's effective action (i.e. the user action), p₂ is the probability corresponding to the action output by the model, q₁ is the number of chips bet by the user, and q₂ is the number of chips output by the model.
That is, after each round of iterative training, the corresponding model loss value loss is calculated using the above formula; if the model loss value loss is greater than the preset loss value threshold, it is determined that the AI model has not yet converged, and iterative training of the AI model continues. The AI model is determined to have converged once the model loss value loss is less than or equal to the preset loss value threshold, i.e., the current AI model training is finished.
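Illustratively, formula (1) and the convergence check can be computed as in the following sketch; the threshold value is an assumption, and the formula is reproduced exactly as stated above:

```python
import numpy as np

LOSS_THRESHOLD = 0.01  # hypothetical preset loss value threshold

def model_loss(p1, p2, q1, q2):
    """Model loss of formula (1), reproduced as stated in the text.

    p1: probability of the user's effective action (label), p2: probability of
    the action output by the model, q1: chips bet by the user (label),
    q2: chips output by the model.
    """
    action_term = p1 * np.log(p2) + (1.0 - p1) * np.log(1.0 - p2)
    chip_term = (q1 - q2) ** 2
    return float(np.mean(action_term + chip_term))

def has_converged(loss_value, threshold=LOSS_THRESHOLD):
    """The AI model is considered converged once loss <= threshold."""
    return loss_value <= threshold
```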
S103, randomly initializing the AI model to perform sample generation operation, and acquiring second sample data.
In addition to training AI models corresponding to agents representing the average level of each segment using supervised learning, AI models corresponding to agents that exceed the average level of the player user segments are trained. To train the AI model of the Agent corresponding to the other level, the AI model is initialized at random, and a sample generation operation, such as game play, is performed, so that data generated by the sample generation operation is acquired. In order to facilitate the distinguishing description of the first sample data corresponding to the user, the data generated by the sample generation operation will be hereinafter referred to as second sample data.
By way of example, the self-play mode, that is, the Agent self-play mode, can be completely separated from the data of the user, and the corresponding sample generation operation is performed based on the AI model to generate the second sample data required by the AI model training.
S104, reversely propagating and inputting the second sample data to the AI model, carrying out iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of reversely propagating and inputting the second sample data to the AI model, and carrying out iterative training on the AI model based on reinforcement learning until the AI model is converged, so as to obtain the AI models of agents corresponding to other levels except the different levels.
In some embodiments, before inputting the second sample data back-propagation to the AI model, it may include: caching the second sample data in a Redis server; the back-propagating the second sample data to the AI model, iteratively training the AI model based on reinforcement learning, may include: and acquiring the cached second sample data from the Redis server, inputting the second sample data in a back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning.
Illustratively, the reinforcement-learning-based AI model training is divided into three parts: the Actor, the Redis store, and the Learner. The Actor part is responsible for generating, through self-play simulation, the second sample data required for training. The second sample data generated by the Actor is cached by a Redis server and waits to be consumed by the Learner. The Learner consumes the second sample data stored in the Redis server. Training continues until the AI models converge, at which point the AI models of Agents corresponding to levels other than the different levels are obtained, i.e. AI models corresponding to Agents exceeding the average level of the player user segments.
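A minimal sketch of this Actor / Redis / Learner pipeline is shown below, assuming the redis-py client and pickle serialization; the queue key, the batch size and the env/model methods (self_play, converged, reinforcement_update) are illustrative placeholders rather than names from the original text:

```python
import pickle
import redis

SAMPLE_QUEUE = "selfplay_samples"   # hypothetical Redis key for cached second sample data
BATCH_SIZE = 256                    # hypothetical training batch size

r = redis.Redis(host="localhost", port=6379)

def actor_loop(env, model, num_games):
    """Actor: self-play with the current AI model and cache generated samples in Redis."""
    for _ in range(num_games):
        trajectory = env.self_play(model)          # second sample data from one game
        r.lpush(SAMPLE_QUEUE, pickle.dumps(trajectory))

def learner_loop(model):
    """Learner: consume cached samples from Redis and update the AI model."""
    while not model.converged():
        batch = []
        while len(batch) < BATCH_SIZE:
            _, payload = r.brpop(SAMPLE_QUEUE)     # blocks until a sample is available
            batch.append(pickle.loads(payload))
        model.reinforcement_update(batch)          # e.g. one reinforcement-learning update step
```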
Illustratively, the PPO (Proximal Policy Optimization) algorithm is used for reinforcement training. The PPO algorithm iteratively optimizes the value function and the policy of the neural network through the reward signal generated by the environment, and the level of the reinforcement-learning Agent is continuously enhanced as the training duration increases. For example, in addition to the AI models corresponding to the five-segment average-level Agents in the example above, AI models corresponding to a sixth-segment Agent and a seventh-segment Agent are trained based on reinforcement learning.
Through supervised learning and reinforcement learning training, AI models corresponding to agents with multiple levels are obtained, so that users with various levels can be comprehensively covered, including users exceeding the average level of the highest section of the platform, and the comprehensiveness of evaluation can be ensured.
In some embodiments, as shown in fig. 2, the AI model training method further includes steps S105 to S107.
S105, controlling a plurality of agents to execute corresponding game operations, and obtaining a game result corresponding to each Agent.
S106, updating evaluation parameters corresponding to each Agent according to the game result corresponding to each Agent, wherein the evaluation parameters comprise capability confidence.
This embodiment calculates a capability score in order to stably evaluate each AI model trained in the above embodiment. Each user corresponds to a respective evaluation parameter, wherein the evaluation parameter includes, but is not limited to, a capability evaluation value representing the average level of capability of the user, a capability confidence representing the uncertainty of the capability of the user, and the like. Illustratively, for each user, a normal distribution is used to represent the user's ability: u represents the average ability of the user under the current assessment, i.e. the capability evaluation value, and the larger u is, the greater the chance that the user will perform well in a game; σ represents the uncertainty of the user's ability assessment, i.e. the capability confidence, and the smaller σ is, the higher the confidence in the assessment of the user. As the number of evaluations of the user increases, σ becomes progressively smaller, i.e. the confidence in the user's evaluation increases. The initial u value and σ value corresponding to each AI model are preset. For example, initial values u = 3 and σ = 1 are set for each AI model.
AI models of a preset number of agents are selected from the trained AI models corresponding to the agents, and the selected agents are controlled to execute corresponding game operations. Illustratively, AI models of a random number of agents are randomly extracted, the extracted agents are controlled to execute corresponding game operations, and a game result corresponding to each Agent is obtained. The game results include a first result (such as victory) and a second result (such as failure). For example, a 6-player card game is played; after the game is finished, the chip surplus or deficit of the 6 player users can be obtained, so that C(6,2) = 15 game results of win-lose relationships can be obtained.
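For instance, the pairwise win-lose results of one such finished game can be derived from the final chip surplus or deficit of each player, as in the following sketch (the handling of ties is an assumption, as the text does not specify it):

```python
from itertools import combinations

def pairwise_results(final_chips):
    """Derive C(n, 2) pairwise win/lose results from one finished game.

    final_chips maps each Agent (or player) to its chip surplus/deficit after
    the game; for 6 players this yields C(6, 2) = 15 results.
    """
    results = []
    for a, b in combinations(final_chips, 2):
        if final_chips[a] == final_chips[b]:
            continue  # skip ties (assumption)
        winner, loser = (a, b) if final_chips[a] > final_chips[b] else (b, a)
        results.append((winner, loser))
    return results
```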
The evaluation parameters corresponding to each Agent are then updated according to the game result corresponding to each Agent. In some embodiments, updating the evaluation parameter corresponding to each Agent according to the game result corresponding to each Agent may include: if the game result corresponding to the Agent is a first result, updating the capability assessment value corresponding to the Agent according to a first preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a first preset capability confidence coefficient updating formula; if the game result corresponding to the Agent is a second result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a second preset capability confidence coefficient updating formula.
Taking an AI model of any Agent as an example, if an Agent's result of the game operation is a first result, such as a winning result, updating a capability evaluation value corresponding to the Agent according to a first preset capability evaluation value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a first preset capability confidence coefficient updating formula, where the first preset capability evaluation value updating formula is shown in formula (2), and the first preset capability confidence coefficient updating formula is shown in formula (3):
if the game result of the Agent game operation is a second result, such as a failed result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a second preset capability confidence coefficient updating formula, wherein the second preset capability assessment value updating formula is shown as a formula (4), and the second preset capability confidence coefficient updating formula is shown as a formula (5):
in the above formulas (2) to (9):
t = u₁ - u₂   (6)
w(t) = v(t) * (v(t) + t)   (9)
wherein the N(t) function and the Φ(t) function are the PDF (probability density function) and the CDF (cumulative distribution function) of the standard normal distribution, respectively, and β is the average value of u corresponding to all users.
And S107, if the updated capability confidence is greater than a first preset threshold, returning to the step of controlling the preset number of agents to execute the corresponding game operations, until the capability confidence is less than or equal to the first preset threshold, thereby completing the capability assessment of each Agent.
According to the above formulas, the corresponding u increases when a game is won, decreases when a game is lost, and σ decreases regardless of the outcome. When, after games have been played continuously, σ is less than or equal to the first preset threshold, the evaluation of the Agent corresponding to the AI model is considered stable. For example, the first preset threshold is preset to 0.1, that is, when the updated σ is less than or equal to 0.1, the capability evaluation of the Agent corresponding to the AI model is completed. The AI model corresponding to an Agent whose capability evaluation has been completed is taken as the AI model of a reference Agent and stored for use in AI tournament evaluation and human tournament evaluation.
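The specific update formulas (2) to (5) and the definitions (7) and (8) appear only as figures in the original publication and are not reproduced in this text. Purely as an illustrative stand-in consistent with formulas (6) and (9), a TrueSkill-style update could look like the following sketch; the σ²-scaling of the update and the omission of β are assumptions, not the patented formulas:

```python
import math

def v(t):
    """v(t) = N(t) / Phi(t), with N and Phi the standard normal PDF and CDF."""
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return pdf / cdf

def w(t):
    """w(t) = v(t) * (v(t) + t), as in formula (9)."""
    return v(t) * (v(t) + t)

def update_after_game(winner, loser):
    """TrueSkill-style stand-in for formulas (2) to (5).

    winner/loser are dicts holding the capability evaluation value 'u' and the
    capability confidence 'sigma'. This only illustrates the direction of the
    update described in the text: u rises for the winner, falls for the loser,
    and sigma shrinks for both regardless of the outcome.
    """
    t = winner["u"] - loser["u"]               # formula (6)
    winner["u"] += winner["sigma"] ** 2 * v(t)
    loser["u"]  -= loser["sigma"] ** 2 * v(t)
    for p in (winner, loser):
        p["sigma"] *= math.sqrt(max(1e-6, 1.0 - p["sigma"] ** 2 * w(t)))
    return winner, loser
```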
In the existing evaluation method, the average profit and loss per game measured in big blinds (the profit or loss of each game divided by the big blind of that game) is used as the basis for judging capability. This measure is not transitive, so users cannot be compared against each other laterally; for example, if A wins 100 big blinds per game against B, and C wins 200 big blinds per game against B, it cannot be concluded that C is stronger than A. In contrast, evaluation with the u value allows users to be compared laterally, because it is transitive, and the smaller the σ value, the more accurate the evaluation of the user.
According to the AI model training method provided by the embodiment, through acquiring multiple groups of first sample data corresponding to multiple users of different levels, inputting each group of first sample data into an AI model, and carrying out iterative training on the AI model based on supervised learning until the AI model converges, so as to obtain the AI model of the Agent corresponding to each level of the different levels; and randomly initializing the AI model to perform sample generation operation, obtaining second sample data, reversely transmitting the second sample data to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the operation of reversely transmitting the second sample data to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model is converged, thereby obtaining the AI models of agents corresponding to other levels except different levels. Therefore, the AI model training is realized by fully utilizing different data, and the accuracy of the AI model is improved.
The embodiment of the application also provides an AI model calling method. The AI model may be a trained AI model in the above embodiment, and the AI model calling method may be applied to a server, so as to evaluate an Agent corresponding to the AI model by calling the trained AI model. The server may be a single server or a server cluster composed of a plurality of servers.
Referring to fig. 3, fig. 3 is a flowchart of an AI model calling method according to an embodiment of the present application.
As shown in fig. 3, the AI model invoking method includes steps S201 to S203.
S201, acquiring a first initial evaluation parameter corresponding to an Agent to be evaluated.
Taking AI tournament evaluation as an example, AI tournament evaluation refers to the process of playing games between the reference Agents and the Agent to be evaluated in order to obtain a stable capability evaluation of the Agent to be evaluated.
Illustratively, initial evaluation parameters corresponding to the Agent to be evaluated are obtained. For convenience of distinguishing description, the initial evaluation parameter corresponding to the Agent to be evaluated is hereinafter referred to as a first initial evaluation parameter. For example, the initial parameters of the Agent to be evaluated are preset as u₀ = 3 and σ₀ = 1.
S202, selecting AI models of a plurality of first type reference agents matched with the first initial evaluation parameters according to the first initial evaluation parameters.
According to the initial u₀ and σ₀ corresponding to the Agent to be evaluated, AI models of a plurality of reference Agents matched with u₀ and σ₀ are selected. For ease of distinguishing description, the AI models of the plurality of reference Agents matched with u₀ and σ₀ are hereinafter referred to as the AI models of the first-type reference Agents. Illustratively, AI models of a random number of reference Agents whose u values lie in the interval [u₀ - σ₀, u₀ + σ₀] are randomly extracted.
S203, invoking the AI models of the plurality of first-type reference agents, and controlling the to-be-evaluated Agent and the first-type reference agents to execute corresponding game operations so as to evaluate the capability of the to-be-evaluated Agent.
The AI models of the selected first-type reference Agents are invoked, and the Agent to be evaluated and the plurality of first-type reference Agents are controlled to execute corresponding game operations using a room system, in which a certain number of initial chips is distributed to each Agent and games are played in a loop until one Agent is eliminated. Compared with a single-game system, the room system allows more games to be played and more chip changes and action changes to be observed, so that the various strategies of the Agent to be evaluated are tested more thoroughly and its capability is evaluated.
In some embodiments, as shown in fig. 4, the step S203 may include a sub-step S2031 and a sub-step S2032.
S2031, controlling the to-be-evaluated Agent and the plurality of first-type reference agents to execute corresponding game operations, and obtaining game results corresponding to the to-be-evaluated Agent.
For example, taking the room system listed above as an example, after a room-system game is completed, the Agent to be evaluated obtains a plurality of game results; for example, if N first-type reference Agents are selected, N game results of win-lose relationships corresponding to the Agent to be evaluated are obtained.
S2032, updating the first initial evaluation parameter according to the game result corresponding to the Agent to be evaluated until the capability confidence coefficient in the updated first initial evaluation parameter is smaller than or equal to a second preset threshold value.
For example, the first initial evaluation parameter corresponding to the Agent to be evaluated, that is, u₀ and σ₀, may be updated according to the above formulas (2) to (9). After each update of u₀ and σ₀, it is judged whether the updated σ₀ is less than or equal to a second preset threshold. The second preset threshold may be the same as or different from the first preset threshold, which is not specifically limited here. If the updated σ₀ is greater than the second preset threshold, the above game and update process continues in a loop until the updated σ₀ is less than or equal to the second preset threshold.
For example, the second preset threshold is set to 0.1, and according to formulas (2) to (9), the first initial evaluation parameters corresponding to the Agent to be evaluated are continuously updated in a loop until the updated σ₀ is less than or equal to 0.1, at which point the evaluation of the Agent to be evaluated is completed.
The overall flow of the AI tournament assessment is illustrated in fig. 5 and sketched in code after the listed steps:
Step A1, training an AI model of the Agent to be evaluated;
Step B1, setting initial u₀ and σ₀ values;
Step C1, if the current σ₀ is smaller than 0.1, executing step G1; if not, executing step D1;
Step D1, extracting AI models of reference Agents whose u values lie in the interval [u₀ - σ₀, u₀ + σ₀];
Step E1, playing room-system games;
Step F1, updating the u₀ and σ₀ values, and returning to step C1;
Step G1, obtaining the u and σ values of the Agent to be evaluated.
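Read as a procedure, steps A1 to G1 correspond to a loop like the following sketch; play_room_game, update_params, the format of the reference-Agent pool and the number of opponents per game are illustrative placeholders rather than details taken from the original text:

```python
import random

def evaluate_agent(candidate, reference_pool, play_room_game, update_params,
                   u0=3.0, sigma0=1.0, threshold=0.1):
    """AI tournament assessment loop corresponding to steps A1-G1.

    candidate: the Agent (AI model) to be evaluated.
    reference_pool: list of (u, sigma, model) tuples of reference Agents.
    play_room_game: callable playing one room-system game, returning win/lose results.
    update_params: callable applying the update formulas to (u, sigma) given a result.
    """
    u, sigma = u0, sigma0                                  # step B1
    while sigma >= threshold:                              # step C1
        # Step D1: pick reference Agents whose u lies in [u - sigma, u + sigma].
        matched = [b for b in reference_pool if u - sigma <= b[0] <= u + sigma]
        opponents = random.sample(matched, k=min(5, len(matched)))
        results = play_room_game(candidate, [b[2] for b in opponents])   # step E1
        for res in results:                                # step F1
            u, sigma = update_params(u, sigma, res)
    return u, sigma                                        # step G1
```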
In some embodiments, as shown in fig. 6, the AI model invocation method may further include steps S204 to S206.
S204, obtaining a second initial evaluation parameter corresponding to the newly registered user.
Taking player user tournament evaluation as an example, player user tournament evaluation refers to the process of obtaining a stable capability evaluation of a newly registered user by playing games between the above-mentioned reference Agents and the newly registered user when a new player user registers.
Illustratively, initial evaluation parameters corresponding to the newly registered user are obtained. For convenience of distinguishing description, the initial evaluation parameter corresponding to the newly registered user will hereinafter be referred to as a second initial evaluation parameter. For example, the initial parameters of the newly registered user are preset as u₀ = 3 and σ₀ = 1.
S205, according to the second initial evaluation parameters, selecting AI models of a plurality of second class reference agents matched with the second initial evaluation parameters.
According to the initial u₀ and σ₀ corresponding to the newly registered user, AI models of a plurality of reference Agents matched with that u₀ and σ₀ are selected. For ease of distinguishing description, the AI models of the plurality of reference Agents matched with the u₀ and σ₀ of the newly registered user are hereinafter referred to as the AI models of the second-type reference Agents. Illustratively, AI models of N reference Agents whose u values lie in the interval [u₀ - 2σ₀, u₀] and AI models of N reference Agents whose u values lie in the interval [u₀, u₀ + 2σ₀] are randomly extracted, where N is a random number.
Compared with AI tournament evaluation, player user tournament evaluation requires an ability assessment that is as reliable as possible in as short a time as possible. Playing against reference Agents with a larger u-value gap makes the σ corresponding to the newly registered user decrease faster, and selecting the same number of higher-level Agents as lower-level Agents makes the estimated u of the newly registered user more stable.
S206, invoking the AI models of the plurality of second-type reference agents, and controlling the second-type reference agents and the new registered user to execute corresponding game operations so as to evaluate the capability of the new registered user.
The AI models of the selected second-type reference Agents are invoked, and the plurality of second-type reference Agents and the newly registered user are controlled to execute corresponding game operations. Taking a single-game system as an example, single games are played between the newly registered user and the second-type reference Agents so as to evaluate the capability of the newly registered user.
In some embodiments, as shown in fig. 7, the step S206 may include a substep S2061 and a substep S2062.
S2061, controlling the plurality of second-type reference agents and the new registered user to execute corresponding game operations, and obtaining game results corresponding to the new registered user.
Illustratively, taking the single-game system listed above as an example, after each game of the single-game system, one game result of the newly registered user, such as a winning result or a losing result, may be obtained.
S2062, updating the second initial evaluation parameters according to the game result corresponding to the new registered user until the capability confidence in the updated second initial evaluation parameters is smaller than or equal to a third preset threshold value.
For example, the second initial evaluation parameters corresponding to the newly registered user, that is, the u₀ and σ₀ corresponding to the newly registered user, may be updated according to the above formulas (2) to (9). After each update of u₀ and σ₀, it is judged whether the updated σ₀ is less than or equal to a third preset threshold. The third preset threshold may be the same as or different from the first preset threshold and/or the second preset threshold, which is not specifically limited here. If the updated σ₀ is greater than the third preset threshold, the above game and update process continues in a loop until the updated σ₀ is less than or equal to the third preset threshold.
For example, the third preset threshold is set to 0.25, and according to the above formulas (2) to (9), the second initial evaluation parameters corresponding to the newly registered user are continuously updated in a loop until the updated σ₀ is less than or equal to 0.25, at which point the capability evaluation of the newly registered user is completed. A σ₀ value less than 0.25 may be considered to correspond to a fluctuation range of the evaluation result below 20%, i.e. the result of the capability evaluation of the newly registered user is accurate and reliable.
Illustratively, the overall flow of player user tournament evaluation is shown in FIG. 8 and sketched in code after the listed steps:
Step A2, adding a newly registered user to be evaluated;
Step B2, setting initial u₀ and σ₀ values;
Step C2, if the current σ₀ is smaller than 0.25, executing step G2; if not, executing step D2;
Step D2, extracting AI models of N reference Agents whose u values lie in the interval [u₀ - 2σ₀, u₀], and extracting AI models of N reference Agents whose u values lie in the interval [u₀, u₀ + 2σ₀];
Step E2, playing single-game-system games;
Step F2, updating the u₀ and σ₀ values, and returning to step C2;
Step G2, obtaining the u and σ values of the newly registered user.
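Analogously, steps A2 to G2 can be sketched as follows, again with play_single_game, update_params, the pool format and the count n as illustrative placeholders:

```python
import random

def evaluate_new_user(play_single_game, update_params, reference_pool,
                      u0=3.0, sigma0=1.0, threshold=0.25, n=3):
    """Player-user tournament assessment loop corresponding to steps A2-G2.

    Reference Agents are drawn both below and above the user's current u so that
    the estimate stabilises quickly; n and the callables are illustrative assumptions.
    """
    u, sigma = u0, sigma0                                   # step B2
    while sigma >= threshold:                               # step C2
        # Step D2: n reference Agents from [u - 2*sigma, u] and n from [u, u + 2*sigma].
        lower = [b for b in reference_pool if u - 2 * sigma <= b[0] <= u]
        upper = [b for b in reference_pool if u <= b[0] <= u + 2 * sigma]
        opponents = (random.sample(lower, k=min(n, len(lower))) +
                     random.sample(upper, k=min(n, len(upper))))
        for _, _, model in opponents:                       # step E2: single-game matches
            result = play_single_game(model)                # game against the newly registered user
            u, sigma = update_params(u, sigma, result)      # step F2
    return u, sigma                                         # step G2
```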
After a newly registered user obtains estimated u and σ values, the u and σ values of each user can be continuously updated in subsequent games and applied when matching the user with other users or Agents: only users whose u and σ values are relatively close are matched together. Therefore, for the situation in which a newly registered user may be matched against opponents of mismatched level, causing a large psychological gap, the multi-level Agents play games against the newly registered user so that a capability assessment of the newly registered user is obtained quickly, the period of level mismatch is shortened, and the competitiveness of the games is increased, thereby improving the user experience.
Referring to fig. 9, fig. 9 is a schematic block diagram of a server according to an embodiment of the present application.
As shown in fig. 9, the server may include a processor, memory, and a network interface. The processor, memory and network interface are connected by a system bus, such as an I2C (Inter-integrated Circuit) bus.
Specifically, the processor may be a Micro-controller Unit (MCU), a central processing Unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), or the like.
Specifically, the Memory may be a Flash chip, a Read-Only Memory (ROM) disk, an optical disk, a U-disk, a removable hard disk, or the like.
The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the server to which the present application is applied, and that a particular server may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Wherein the processor is configured to run a computer program stored in the memory and to implement the following steps when the computer program is executed:
Acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
randomly initializing the AI model to perform sample generation operation to obtain second sample data;
and inputting the second sample data in a back propagation manner to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of inputting the second sample data in the back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels except the different levels.
In some embodiments, the processor, prior to effecting the inputting the second sample data back-propagation to the AI model, further effects:
caching the second sample data in a Redis server;
the processor performs the specific implementation when implementing the inputting the second sample data back-propagation to the AI model and performing iterative training on the AI model based on reinforcement learning:
And acquiring the cached second sample data from the Redis server, inputting the second sample data in a back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning.
In some embodiments, the processor is further configured to implement:
controlling a plurality of agents to execute corresponding game operations, and obtaining a game result corresponding to each Agent;
updating evaluation parameters corresponding to each Agent according to the game result corresponding to each Agent, wherein the evaluation parameters comprise capability confidence;
and if the updated capability confidence coefficient is larger than a first preset threshold value, returning to the step of controlling the preset number of agents to execute the corresponding game operations until the capability confidence coefficient is smaller than or equal to the first preset threshold value, and finishing capability assessment of each Agent.
In some embodiments, the evaluation parameters further include a capability evaluation value, and when the processor updates the evaluation parameters corresponding to each Agent according to the game result corresponding to each Agent, the processor specifically implements:
if the game result corresponding to the Agent is a first result, updating the capability assessment value corresponding to the Agent according to a first preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a first preset capability confidence coefficient updating formula;
If the game result corresponding to the Agent is a second result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a second preset capability confidence coefficient updating formula.
In some embodiments, the processor is configured to run a computer program stored in a memory and when executing the computer program to perform the steps of:
acquiring a first initial evaluation parameter corresponding to an Agent to be evaluated;
according to the first initial evaluation parameters, selecting AI models of a plurality of first type reference agents matched with the first initial evaluation parameters;
and calling AI models of the plurality of first-type reference agents, and controlling the to-be-evaluated agents and the plurality of first-type reference agents to execute corresponding game operations so as to evaluate the capability of the to-be-evaluated agents.
In some embodiments, when the processor performs the controlling the to-be-evaluated Agent to perform corresponding game operations with the plurality of first-type benchmark agents to perform capability evaluation on the to-be-evaluated Agent, the processor specifically performs:
controlling the to-be-evaluated Agent and a plurality of the first type reference agents to execute corresponding game operations, and obtaining game results corresponding to the to-be-evaluated Agent;
Updating the first initial evaluation parameters according to the game result corresponding to the Agent to be evaluated until the capability confidence coefficient in the updated first initial evaluation parameters is smaller than or equal to a second preset threshold value.
In some embodiments, the processor is further configured to implement:
acquiring a second initial evaluation parameter corresponding to the newly registered user;
according to the second initial evaluation parameters, selecting AI models of a plurality of second class reference agents matched with the second initial evaluation parameters;
and calling AI models of the second class reference agents, and controlling the second class reference agents and the new registered users to execute corresponding checking operation so as to evaluate the capacity of the new registered users.
In some embodiments, when the processor performs the controlling the plurality of second class benchmark agents to perform corresponding game operations with the new registered user to perform capability assessment on the new registered user, the processor specifically performs:
controlling a plurality of second-class reference agents and the new registered users to execute corresponding game operations to obtain game results corresponding to the new registered users;
and updating the second initial evaluation parameters according to the corresponding game result of the new registered user until the capability confidence coefficient in the updated second initial evaluation parameters is smaller than or equal to a third preset threshold value.
It should be noted that, for convenience and brevity of description, specific working processes of the server described above may refer to corresponding processes in the foregoing AI model training method and/or AI model invoking method embodiments, which are not described herein.
An embodiment of the present application further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program includes program instructions, and the processor executes the program instructions to implement the steps of the AI model training method and/or the AI model calling method provided in the foregoing embodiment. For example, the computer program is loaded by a processor, the following steps may be performed:
acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
acquiring a plurality of groups of second sample data, wherein the second sample data are randomly generated data in a sample generation operation;
And inputting each group of second sample data into the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels except the different levels.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The computer readable storage medium may be an internal storage unit of the server of the foregoing embodiment, for example, a hard disk or a memory of the server. The computer readable storage medium may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the server.
Because the computer program stored in the computer readable storage medium can execute any one of the AI model training methods and/or AI model invoking methods provided in the embodiments of the present application, the beneficial effects that any one of the AI model training methods and/or AI model invoking methods provided in the embodiments of the present application can be achieved, which are detailed in the previous embodiments and are not described herein.
The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments. While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An AI model training method, comprising:
acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
randomly initializing the AI model to perform a sample generation operation to obtain second sample data;
back-propagating the second sample data to the AI model and performing iterative training on the AI model based on reinforcement learning, using the training result as new second sample data, and cyclically executing the step of back-propagating the second sample data to the AI model and performing iterative training on the AI model based on reinforcement learning, until the AI model converges, to obtain AI models of Agents corresponding to levels other than the different levels;
wherein the method further comprises:
controlling a plurality of Agents to execute corresponding match operations, and obtaining a match result corresponding to each Agent;
updating evaluation parameters corresponding to each Agent according to the match result corresponding to each Agent, wherein the evaluation parameters comprise a capability confidence;
and if the updated capability confidence is greater than a first preset threshold, returning to execute the step of controlling the plurality of Agents to execute corresponding match operations, until the capability confidence is less than or equal to the first preset threshold, thereby completing the capability assessment of each Agent.
2. The method of claim 1, wherein before the back-propagating the second sample data to the AI model, the method further comprises:
caching the second sample data in a Redis server;
and the back-propagating the second sample data to the AI model and performing iterative training on the AI model based on reinforcement learning comprises:
acquiring the cached second sample data from the Redis server, back-propagating the second sample data to the AI model, and performing iterative training on the AI model based on reinforcement learning.
3. The method according to claim 1, wherein the evaluation parameters further comprise a capability assessment value, and the updating the evaluation parameters corresponding to each Agent according to the match result corresponding to each Agent comprises:
if the match result corresponding to the Agent is a first result, updating the capability assessment value corresponding to the Agent according to a first preset capability assessment value updating formula, and updating the capability confidence corresponding to the Agent according to a first preset capability confidence updating formula;
and if the match result corresponding to the Agent is a second result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence corresponding to the Agent according to a second preset capability confidence updating formula.
4. An AI model invoking method, comprising:
acquiring first initial evaluation parameters corresponding to an Agent to be evaluated;
selecting, according to the first initial evaluation parameters, AI models of a plurality of first-type reference Agents matched with the first initial evaluation parameters;
and calling the AI models of the plurality of first-type reference Agents, and controlling the Agent to be evaluated and the plurality of first-type reference Agents to execute corresponding match operations, so as to perform capability evaluation on the Agent to be evaluated.
5. The method of claim 4, wherein the controlling the Agent to be evaluated and the plurality of first-type reference Agents to execute corresponding match operations so as to perform capability evaluation on the Agent to be evaluated comprises:
controlling the Agent to be evaluated and the plurality of first-type reference Agents to execute corresponding match operations, and obtaining a match result corresponding to the Agent to be evaluated;
and updating the first initial evaluation parameters according to the match result corresponding to the Agent to be evaluated, until the capability confidence in the updated first initial evaluation parameters is less than or equal to a second preset threshold.
6. The method according to claim 4, further comprising:
acquiring second initial evaluation parameters corresponding to a newly registered user;
selecting, according to the second initial evaluation parameters, AI models of a plurality of second-type reference Agents matched with the second initial evaluation parameters;
and calling the AI models of the plurality of second-type reference Agents, and controlling the plurality of second-type reference Agents and the newly registered user to execute corresponding match operations, so as to perform capability evaluation on the newly registered user.
7. The method of claim 6, wherein the controlling the plurality of second-type reference Agents and the newly registered user to execute corresponding match operations so as to perform capability evaluation on the newly registered user comprises:
controlling the plurality of second-type reference Agents and the newly registered user to execute corresponding match operations, and obtaining a match result corresponding to the newly registered user;
and updating the second initial evaluation parameters according to the match result corresponding to the newly registered user, until the capability confidence in the updated second initial evaluation parameters is less than or equal to a third preset threshold.
8. A server comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, the memory storing AI models, wherein the computer program, when executed by the processor, implements the AI model training method of any one of claims 1 to 3, or implements the AI model invoking method of any one of claims 4 to 7.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, causes the processor to implement the AI model training method of any one of claims 1 to 3, or to implement the AI model invoking method of any one of claims 4 to 7.
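For illustration only, the sketch below mirrors the capability-evaluation loop recited in claims 1, 3 and 5 above: reference Agents are matched to the Agent under evaluation, each match result updates a capability assessment value and a capability confidence, and matches repeat until the confidence falls to or below a preset threshold. The Elo/Glicko-style update formulas, the Agent dataclass, and functions such as select_reference_agents and play_match are stand-in assumptions; the actual "preset updating formulas" are not disclosed in these claims.

```python
# Illustrative sketch of the claimed capability-evaluation loop; all names and
# formulas are assumptions standing in for the "preset updating formulas".
import math
import random
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    rating: float = 1500.0      # capability assessment value
    confidence: float = 350.0   # capability confidence (uncertainty of the rating)

def select_reference_agents(pool, target, k=4):
    """Pick the k reference Agents whose ratings are closest to the target's rating."""
    return sorted(pool, key=lambda a: abs(a.rating - target.rating))[:k]

def update(agent, opponent, won, k_factor=32.0, shrink=0.9):
    """One 'match result -> evaluation parameter' update (stand-in for claim 3)."""
    expected = 1.0 / (1.0 + math.pow(10.0, (opponent.rating - agent.rating) / 400.0))
    score = 1.0 if won else 0.0
    agent.rating += k_factor * (score - expected)   # capability assessment value update
    agent.confidence *= shrink                      # confidence shrinks as evidence accumulates

def evaluate(agent, pool, play_match, threshold=50.0):
    """Repeat matches until the capability confidence is at or below the threshold."""
    while agent.confidence > threshold:
        for ref in select_reference_agents(pool, agent):
            won = play_match(agent, ref)            # True if `agent` wins the match
            update(agent, ref, won)

# Usage with a random match outcome as a placeholder for an actual game:
pool = [Agent(f"ref-{i}", rating=1300 + 100 * i) for i in range(6)]
candidate = Agent("to-be-evaluated")
evaluate(candidate, pool, play_match=lambda a, b: random.random() < 0.5)
print(candidate.rating, candidate.confidence)
```

In practice the play_match callback would run an actual match between the two Agents' AI models; the random outcome above is a placeholder only.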
CN202011176373.6A 2020-10-28 2020-10-28 AI model training method, calling method, server and storage medium Active CN112274925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011176373.6A CN112274925B (en) 2020-10-28 2020-10-28 AI model training method, calling method, server and storage medium

Publications (2)

Publication Number Publication Date
CN112274925A CN112274925A (en) 2021-01-29
CN112274925B (en) 2024-02-27

Family

ID=74374058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011176373.6A Active CN112274925B (en) 2020-10-28 2020-10-28 AI model training method, calling method, server and storage medium

Country Status (1)

Country Link
CN (1) CN112274925B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112870721B (en) * 2021-03-16 2023-07-14 腾讯科技(深圳)有限公司 Game interaction method, device, equipment and storage medium
CN113052038A (en) * 2021-03-16 2021-06-29 蔡勇 Method for counting chip tray inventory by AI technology
CN113379071B (en) * 2021-06-16 2022-11-29 中国科学院计算技术研究所 Noise label correction method based on federal learning
CN114404977B (en) * 2022-01-25 2024-04-16 腾讯科技(深圳)有限公司 Training method of behavior model and training method of structure capacity expansion model
CN114629797B (en) * 2022-03-11 2024-03-08 阿里巴巴(中国)有限公司 Bandwidth prediction method, model generation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036389A (en) * 2018-08-28 2018-12-18 出门问问信息科技有限公司 The generation method and device of a kind of pair of resisting sample
CN110782004A (en) * 2019-09-26 2020-02-11 超参数科技(深圳)有限公司 Model training method, model calling equipment and readable storage medium
CN110975294A (en) * 2019-11-22 2020-04-10 珠海豹趣科技有限公司 Game fighting implementation method and terminal
CN111111204A (en) * 2020-04-01 2020-05-08 腾讯科技(深圳)有限公司 Interactive model training method and device, computer equipment and storage medium
CN111598234A (en) * 2020-05-13 2020-08-28 超参数科技(深圳)有限公司 AI model training method, use method, computer device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200265301A1 (en) * 2019-02-15 2020-08-20 Microsoft Technology Licensing, Llc Incremental training of machine learning tools

Also Published As

Publication number Publication date
CN112274925A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112274925B (en) AI model training method, calling method, server and storage medium
CN107970608B (en) Setting method and device of level game, storage medium and electronic device
CN111291890B (en) Game strategy optimization method, system and storage medium
US20200242897A1 (en) System and method for conducting a game including a computer-controlled player
CN109513215B (en) Object matching method, model training method and server
CN110443284B (en) Artificial intelligence AI model training method, calling method, server and readable storage medium
CN109091868B (en) Method, apparatus, computer equipment and the storage medium that battle behavior determines
CN112016704B (en) AI model training method, model using method, computer device and storage medium
CN108392828A (en) A kind of player's On-line matching method and system for the game of MOBA classes
CN109718558B (en) Game information determination method and device, storage medium and electronic device
CN111738294B (en) AI model training method, AI model using method, computer device, and storage medium
CN111569429B (en) Model training method, model using method, computer device, and storage medium
CN111282267A (en) Information processing method, information processing apparatus, information processing medium, and electronic device
CN106075912A (en) A kind of method that online game is helped each other and network game system
CN112926744A (en) Incomplete information game method and system based on reinforcement learning and electronic equipment
CN111729300A (en) Monte Carlo tree search and convolutional neural network based bucket owner strategy research method
CN111111193B (en) Game control method and device and electronic equipment
CN110598853A (en) Model training method, information processing method and related device
CN111507475A (en) Game behavior decision method, device and related equipment
Salge et al. Relevant information as a formalised approach to evaluate game mechanics
Dockhorn et al. A decision heuristic for Monte Carlo tree search doppelkopf agents
CN108874377B (en) Data processing method, device and storage medium
CN114580642B (en) Method, device, equipment and medium for establishing game AI model and processing data
CN112439193A (en) Game difficulty matching method and device
Kocsis et al. Universal parameter optimisation in games based on SPSA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant