CN112274925B - AI model training method, calling method, server and storage medium


Info

Publication number
CN112274925B
Authority
CN
China
Prior art keywords
model
agent
sample data
capability
agents
Prior art date
Legal status
Active
Application number
CN202011176373.6A
Other languages
Chinese (zh)
Other versions
CN112274925A (en)
Inventor
朱展图
周正
李宏亮
刘永升
Current Assignee
Super Parameter Technology Shenzhen Co ltd
Original Assignee
Super Parameter Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Super Parameter Technology Shenzhen Co ltd
Priority to CN202011176373.6A
Publication of CN112274925A
Application granted
Publication of CN112274925B
Status: Active


Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55 Controlling game characters or game objects based on the game progress
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F1/00 Card games
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F1/00 Card games
    • A63F2001/008 Card games adapted for being playable on a screen

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an AI model training method, a calling method, a server and a storage medium, wherein the method comprises the following steps: acquiring a plurality of groups of first sample data; inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels; randomly initializing the AI model to perform sample generation operation to obtain second sample data; and inputting the second sample data in a back propagation manner to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of inputting the second sample data in the back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels. Therefore, the accuracy of the AI model is improved.

Description

AI model training method, calling method, server and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an AI model training method, an AI model calling method, a server and a storage medium.
Background
With the rapid development of artificial intelligence (Artificial Intelligence, AI) technology, AI technology is widely applied in various fields. At present, in the field of game entertainment, games between a virtual Agent and a real user in board games can be realized through AI technology, and such an Agent can even defeat top professional players. However, card games are usually played by multiple players, and the cards held by the players are hidden from one another, so developing an AI model corresponding to a card-game Agent poses a greater challenge.
At present, an AI model is mainly realized based on a deep neural network (Deep Neural Network, DNN), and is usually trained based on data of each party alone, so that the data cannot be fully utilized, and the accuracy of the AI model is poor. Therefore, how to improve the accuracy of AI models is a current urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an AI model training method, a calling method, a server and a storage medium, which can improve the accuracy of an AI model.
In a first aspect, an embodiment of the present application provides an AI model training method, including:
acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
Inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
randomly initializing the AI model to perform sample generation operation to obtain second sample data;
and inputting the second sample data in a back propagation manner to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of inputting the second sample data in the back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels except the different levels.
In a second aspect, an embodiment of the present application further provides an AI model invoking method, including:
acquiring a first initial evaluation parameter corresponding to an Agent to be evaluated;
according to the first initial evaluation parameters, selecting AI models of a plurality of first type reference agents matched with the first initial evaluation parameters;
and calling AI models of the plurality of first-type reference agents, and controlling the to-be-evaluated agents and the plurality of first-type reference agents to execute corresponding game operations so as to evaluate the capability of the to-be-evaluated agents.
In a third aspect, an embodiment of the present application further provides a server, where the server includes a processor, a memory, and a computer program stored on the memory and executable by the processor, where the memory stores an AI model, and where the computer program when executed by the processor implements an AI model training method as described above; alternatively, the AI model invoking method as described above is implemented.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium, where the computer readable storage medium is configured to store a computer program, where the computer program when executed by a processor causes the processor to implement the AI model training method described above; alternatively, the AI model invoking method described above is implemented.
The embodiment of the application provides an AI model training method, a calling method, a server and a storage medium, wherein multiple groups of first sample data corresponding to multiple users in different levels are obtained, each group of first sample data is input into an AI model, and the AI model is subjected to iterative training based on supervised learning until the AI model converges, so that the AI model of an Agent corresponding to each level in different levels is obtained; and randomly initializing the AI model to perform sample generation operation, obtaining second sample data, reversely transmitting the second sample data to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the operation of reversely transmitting the second sample data to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model is converged, thereby obtaining the AI models of agents corresponding to other levels except different levels. By fully utilizing different data to perform AI model training, the accuracy of the AI model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the steps of an AI model training method in accordance with one embodiment of the present application;
FIG. 2 is a schematic flow chart of steps of another AI model training method provided in one embodiment of the application;
FIG. 3 is a schematic flowchart of the steps of an AI model invoking method provided in one embodiment of the application;
FIG. 4 is a schematic flowchart of the steps for controlling the agents to be evaluated and the plurality of reference agents of the first type to perform corresponding game operations according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an AI tournament assessment procedure provided in an embodiment of the present application;
FIG. 6 is a schematic flow chart of steps of another AI model invocation method provided in an embodiment of the application;
FIG. 7 is a schematic flowchart of the steps for controlling a plurality of second-type benchmark agents and the newly registered users to perform corresponding game operations according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a player user tournament assessment process provided in an embodiment of the present application;
fig. 9 is a schematic block diagram of a server provided in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
At present, an AI model is mainly realized based on a deep neural network (Deep Neural Network, DNN), and is usually trained based on data of each party alone, so that the data cannot be fully utilized, and the accuracy of the AI model is poor.
In order to solve the above problems, embodiments of the present application provide an AI model training method, an invocation method, a server, and a storage medium, which can improve the accuracy of an AI model. The AI model training method and the calling method can be applied to a server, and the server can be a single server or a server cluster consisting of a plurality of servers.
Referring to fig. 1, fig. 1 is a flowchart of an AI model training method according to an embodiment of the disclosure.
As shown in fig. 1, the AI model training method specifically includes steps S101 to S104.
S101, acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels.
Taking a card game as an example, the game level of each player user is uneven; for example, the game levels of a novice player user and a senior player user are obviously different. For player users of each level, data corresponding to the games of a plurality of player users at that level is acquired as sample data. For example, a plurality of player users at different levels play multiple games, and sample data corresponding to the plurality of player users at each level is obtained. For convenience of distinguishing description, sample data corresponding to a player user will hereinafter be referred to as first sample data. Based on player users of different levels, multiple sets of first sample data corresponding to player users of multiple levels are obtained.
Illustratively, the first sample data includes global information, player information, and the like, where the player information is further divided into current-player information and all-player information. The global information includes the number of players in the current game, the players' position information, the number of chips in the pot, the community cards, the current round, and so on. The current-player information includes the number of chips, the hand cards, the valid actions, the valid bets, position information, the number of chips bet, historical actions, historical bets, and so on. The all-player information includes the number of chips, position information, the number of chips bet, historical actions, historical bets, whether the player has left the table, whether the player has gone all-in, and so on.
For example, if the player data used is derived from platform A, and the levels of the player users on platform A are divided into five segments from low to high, the data of a plurality of player users in each of the five segments is screened out, and the data of the plurality of player users of each segment is taken as one set of first sample data, so that five sets of first sample data are obtained.
S102, inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain the AI model of the Agent corresponding to each of the different levels.
The AI model is implemented based on a neural network model, which may be set based on actual situations, and this application is not limited specifically. The training of the AI model mainly uses a supervised learning method, after a plurality of groups of first sample data are obtained, each group of first sample data is input into the AI model, and the AI model is iteratively trained based on the supervised learning until the AI model converges, so that the AI model of the Agent with the same user level as that corresponding to each group of first sample data is obtained. The Agent is an Agent which is in a complex dynamic environment, autonomously perceives environment information, autonomously takes action and realizes a series of preset targets or tasks.
The feature extraction is performed on each group of first sample data to obtain corresponding feature vectors, then the feature vectors corresponding to the first sample data are input into the AI model, and the AI model is iteratively trained based on supervised learning until the AI model converges.
Illustratively, the feature vectors corresponding to the first sample data include global-information feature vectors, player-information feature vectors, and the like, where the player information is further divided into current-player-information feature vectors and all-player-information feature vectors. The global-information feature vectors include a feature vector for the number of players in the game, a feature vector for the players' position information, a feature vector for the number of chips in the pot, a community-card feature vector, a current-round feature vector, and so on. The current-player-information feature vectors include a chip-count feature vector, a hand-card feature vector, a valid-action feature vector, a valid-bet feature vector, a position-information feature vector, a bet-chip-count feature vector, a historical-action feature vector, a historical-bet feature vector, and so on. The all-player-information feature vectors include a chip-count feature vector, a position-information feature vector, a bet-chip-count feature vector, a historical-action feature vector, a historical-bet feature vector, a left-table feature vector, an all-in feature vector, and so on.
For example, taking the case in which the player users on platform A are divided into five segments from low to high, five groups of first sample data are collected, each covering the effective actions of a plurality of player users in one segment (such as fold, check, raise, call and all-in) and the number of chips bet. Feature extraction is performed on each group of first sample data, the obtained feature vectors are input into an AI model, and iterative training is performed until the AI model converges, thereby obtaining an AI model representing the average-level Agent of each of the five segments.
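The text does not fix a concrete encoding for these features; the following is a minimal sketch of how such poker features might be assembled into a single input vector, in which every field name, the action set and all dimensions are illustrative assumptions rather than details taken from the original disclosure:

```python
import numpy as np

# Hypothetical encoding sizes; the real model's input layout is not given in the text.
NUM_ACTIONS = 5          # fold, check, raise, call, all-in
NUM_CARDS = 52

def one_hot_cards(cards):
    """Encode a list of card indices (0..51) as a 52-dim multi-hot vector."""
    v = np.zeros(NUM_CARDS, dtype=np.float32)
    v[list(cards)] = 1.0
    return v

def encode_sample(global_info, current_player, all_players):
    """Concatenate global, current-player and all-player features into one vector."""
    global_vec = np.concatenate([
        [global_info["num_players"], global_info["pot_chips"], global_info["round"]],
        one_hot_cards(global_info["community_cards"]),
    ])
    player_vec = np.concatenate([
        [current_player["chips"], current_player["chips_bet"]],
        one_hot_cards(current_player["hand_cards"]),
        current_player["valid_action_mask"],          # multi-hot, length NUM_ACTIONS
    ])
    others_vec = np.concatenate([
        [p["chips"] for p in all_players],
        [p["chips_bet"] for p in all_players],
        [float(p["left_table"]) for p in all_players],
        [float(p["all_in"]) for p in all_players],
    ])
    return np.concatenate([global_vec, player_vec, others_vec]).astype(np.float32)
```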
The method comprises the steps of obtaining a model loss value loss corresponding to an AI model during training, determining whether the model loss value loss is smaller than or equal to a preset loss value threshold, and determining that the AI model converges if the model loss value loss is smaller than or equal to the preset loss value threshold; otherwise, if the model loss value loss is greater than the preset loss value threshold, determining that the AI model is not converged.
Wherein, the model loss value loss is calculated according to the following formula (1):
loss = (p₁ * log(p₂)) + (1 - p₁) * log(1 - p₂) + (q₁ - q₂)²   (1)
where p₁ is the probability corresponding to the user's effective action (i.e. the user action), p₂ is the probability corresponding to the action output by the model, q₁ is the number of chips bet by the user, and q₂ is the number of chips output by the model.
That is, after each round of iterative training, the corresponding model loss value loss is calculated using the above formula; if the model loss value loss is greater than the preset loss value threshold, it is determined that the AI model has not yet converged, and iterative training of the AI model continues. The AI model is determined to have converged once the model loss value loss is less than or equal to the preset loss value threshold, i.e., the current AI model training is finished.
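Illustratively, formula (1) and the convergence check can be computed as in the following sketch; the threshold value is an assumption, and the formula is reproduced exactly as stated above:

```python
import numpy as np

LOSS_THRESHOLD = 0.01  # hypothetical preset loss value threshold

def model_loss(p1, p2, q1, q2):
    """Model loss of formula (1), reproduced as stated in the text.

    p1: probability of the user's effective action (label), p2: probability of
    the action output by the model, q1: chips bet by the user (label),
    q2: chips output by the model.
    """
    action_term = p1 * np.log(p2) + (1.0 - p1) * np.log(1.0 - p2)
    chip_term = (q1 - q2) ** 2
    return float(np.mean(action_term + chip_term))

def has_converged(loss_value, threshold=LOSS_THRESHOLD):
    """The AI model is considered converged once loss <= threshold."""
    return loss_value <= threshold
```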
S103, randomly initializing the AI model to perform sample generation operation, and acquiring second sample data.
In addition to training AI models corresponding to agents representing the average level of each segment using supervised learning, AI models corresponding to agents that exceed the average level of the player user segments are trained. To train the AI model of the Agent corresponding to the other level, the AI model is initialized at random, and a sample generation operation, such as game play, is performed, so that data generated by the sample generation operation is acquired. In order to facilitate the distinguishing description of the first sample data corresponding to the user, the data generated by the sample generation operation will be hereinafter referred to as second sample data.
By way of example, the self-play mode, that is, the Agent self-play mode, can be completely separated from the data of the user, and the corresponding sample generation operation is performed based on the AI model to generate the second sample data required by the AI model training.
S104, reversely propagating and inputting the second sample data to the AI model, carrying out iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of reversely propagating and inputting the second sample data to the AI model, and carrying out iterative training on the AI model based on reinforcement learning until the AI model is converged, so as to obtain the AI models of agents corresponding to other levels except the different levels.
In some embodiments, before inputting the second sample data back-propagation to the AI model, it may include: caching the second sample data in a Redis server; the back-propagating the second sample data to the AI model, iteratively training the AI model based on reinforcement learning, may include: and acquiring the cached second sample data from the Redis server, inputting the second sample data in a back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning.
Illustratively, the reinforcement-learning-based AI model training is divided into three parts: the Actor, the Redis store, and the Learner. The Actor part is responsible for generating, through self-play simulation, the second sample data required for training. The second sample data generated by the Actor is cached by a Redis server and waits to be consumed by the Learner. The Learner consumes the second sample data stored in the Redis server. Training continues until the AI models converge, at which point the AI models of Agents corresponding to levels other than the different levels are obtained, i.e. AI models corresponding to Agents exceeding the average level of the player user segments.
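A minimal sketch of this Actor / Redis / Learner pipeline is shown below, assuming the redis-py client and pickle serialization; the queue key, the batch size and the env/model methods (self_play, converged, reinforcement_update) are illustrative placeholders rather than names from the original text:

```python
import pickle
import redis

SAMPLE_QUEUE = "selfplay_samples"   # hypothetical Redis key for cached second sample data
BATCH_SIZE = 256                    # hypothetical training batch size

r = redis.Redis(host="localhost", port=6379)

def actor_loop(env, model, num_games):
    """Actor: self-play with the current AI model and cache generated samples in Redis."""
    for _ in range(num_games):
        trajectory = env.self_play(model)          # second sample data from one game
        r.lpush(SAMPLE_QUEUE, pickle.dumps(trajectory))

def learner_loop(model):
    """Learner: consume cached samples from Redis and update the AI model."""
    while not model.converged():
        batch = []
        while len(batch) < BATCH_SIZE:
            _, payload = r.brpop(SAMPLE_QUEUE)     # blocks until a sample is available
            batch.append(pickle.loads(payload))
        model.reinforcement_update(batch)          # e.g. one reinforcement-learning update step
```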
Illustratively, the PPO (Proximal Policy Optimization) algorithm is used for reinforcement training. The PPO algorithm iteratively optimizes the value function and the policy of the neural network through the reward signal generated by the environment, and the level of the reinforcement-learning Agent is continuously enhanced as the training duration increases. For example, in addition to the AI models corresponding to the five-segment average-level Agents in the example above, AI models corresponding to a sixth-segment Agent and a seventh-segment Agent are trained based on reinforcement learning.
Through supervised learning and reinforcement learning training, AI models corresponding to agents with multiple levels are obtained, so that users with various levels can be comprehensively covered, including users exceeding the average level of the highest section of the platform, and the comprehensiveness of evaluation can be ensured.
In some embodiments, as shown in fig. 2, the AI model training method further includes steps S105 to S107.
S105, controlling a plurality of agents to execute corresponding game operations, and obtaining a game result corresponding to each Agent.
S106, updating evaluation parameters corresponding to each Agent according to the game result corresponding to each Agent, wherein the evaluation parameters comprise capability confidence.
This embodiment calculates a capability score in order to stably evaluate each AI model trained in the above embodiment. Each user corresponds to a respective evaluation parameter, wherein the evaluation parameter includes, but is not limited to, a capability evaluation value representing the average level of capability of the user, a capability confidence representing the uncertainty of the capability of the user, and the like. Illustratively, for each user, a normal distribution is used to represent the user's ability: u represents the average ability of the user under the current assessment, i.e. the capability evaluation value, and the larger u is, the greater the chance that the user will perform well in a game; σ represents the uncertainty of the user's ability assessment, i.e. the capability confidence, and the smaller σ is, the higher the confidence in the assessment of the user. As the number of evaluations of the user increases, σ becomes progressively smaller, i.e. the confidence in the user's evaluation increases. The initial u value and σ value corresponding to each AI model are preset. For example, initial values u = 3 and σ = 1 are set for each AI model.
AI models of a preset number of agents are selected from the trained AI models corresponding to the agents, and the selected agents are controlled to execute corresponding game operations. Illustratively, AI models of a random number of agents are randomly extracted, the extracted agents are controlled to execute corresponding game operations, and a game result corresponding to each Agent is obtained. The game results include a first result (such as victory) and a second result (such as failure). For example, a 6-player card game is played; after the game is finished, the chip surplus or deficit of the 6 player users can be obtained, so that C(6,2) = 15 game results of win-lose relationships can be obtained.
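For instance, the pairwise win-lose results of one such finished game can be derived from the final chip surplus or deficit of each player, as in the following sketch (the handling of ties is an assumption, as the text does not specify it):

```python
from itertools import combinations

def pairwise_results(final_chips):
    """Derive C(n, 2) pairwise win/lose results from one finished game.

    final_chips maps each Agent (or player) to its chip surplus/deficit after
    the game; for 6 players this yields C(6, 2) = 15 results.
    """
    results = []
    for a, b in combinations(final_chips, 2):
        if final_chips[a] == final_chips[b]:
            continue  # skip ties (assumption)
        winner, loser = (a, b) if final_chips[a] > final_chips[b] else (b, a)
        results.append((winner, loser))
    return results
```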
The evaluation parameters corresponding to each Agent are then updated according to the game result corresponding to each Agent. In some embodiments, updating the evaluation parameter corresponding to each Agent according to the game result corresponding to each Agent may include: if the game result corresponding to the Agent is a first result, updating the capability assessment value corresponding to the Agent according to a first preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a first preset capability confidence coefficient updating formula; if the game result corresponding to the Agent is a second result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a second preset capability confidence coefficient updating formula.
Taking an AI model of any Agent as an example, if an Agent's result of the game operation is a first result, such as a winning result, updating a capability evaluation value corresponding to the Agent according to a first preset capability evaluation value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a first preset capability confidence coefficient updating formula, where the first preset capability evaluation value updating formula is shown in formula (2), and the first preset capability confidence coefficient updating formula is shown in formula (3):
if the game result of the Agent game operation is a second result, such as a failed result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a second preset capability confidence coefficient updating formula, wherein the second preset capability assessment value updating formula is shown as a formula (4), and the second preset capability confidence coefficient updating formula is shown as a formula (5):
in the above formulas (2) to (9):
t = u₁ - u₂   (6)
w(t) = v(t) * (v(t) + t)   (9)
wherein the N(t) function and the Φ(t) function are the PDF (probability density function) and the CDF (cumulative distribution function) of the standard normal distribution, respectively, and β is the average value of u corresponding to all users.
And S107, if the updated capability confidence is greater than a first preset threshold, returning to the step of controlling the preset number of agents to execute the corresponding game operations, until the capability confidence is less than or equal to the first preset threshold, thereby completing the capability assessment of each Agent.
According to the above formulas, the corresponding u increases when a game is won, decreases when a game is lost, and σ decreases regardless of the outcome. When, after games have been played continuously, σ is less than or equal to the first preset threshold, the evaluation of the Agent corresponding to the AI model is considered stable. For example, the first preset threshold is preset to 0.1, that is, when the updated σ is less than or equal to 0.1, the capability evaluation of the Agent corresponding to the AI model is completed. The AI model corresponding to an Agent whose capability evaluation has been completed is taken as the AI model of a reference Agent and stored for use in AI tournament evaluation and human tournament evaluation.
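The specific update formulas (2) to (5) and the definitions (7) and (8) appear only as figures in the original publication and are not reproduced in this text. Purely as an illustrative stand-in consistent with formulas (6) and (9), a TrueSkill-style update could look like the following sketch; the σ²-scaling of the update and the omission of β are assumptions, not the patented formulas:

```python
import math

def v(t):
    """v(t) = N(t) / Phi(t), with N and Phi the standard normal PDF and CDF."""
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return pdf / cdf

def w(t):
    """w(t) = v(t) * (v(t) + t), as in formula (9)."""
    return v(t) * (v(t) + t)

def update_after_game(winner, loser):
    """TrueSkill-style stand-in for formulas (2) to (5).

    winner/loser are dicts holding the capability evaluation value 'u' and the
    capability confidence 'sigma'. This only illustrates the direction of the
    update described in the text: u rises for the winner, falls for the loser,
    and sigma shrinks for both regardless of the outcome.
    """
    t = winner["u"] - loser["u"]               # formula (6)
    winner["u"] += winner["sigma"] ** 2 * v(t)
    loser["u"]  -= loser["sigma"] ** 2 * v(t)
    for p in (winner, loser):
        p["sigma"] *= math.sqrt(max(1e-6, 1.0 - p["sigma"] ** 2 * w(t)))
    return winner, loser
```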
In the existing evaluation method, the average profit and loss per game measured in big blinds (the profit or loss of each game divided by the big blind of that game) is used as the basis for judging capability. This measure is not transitive, so users cannot be compared against each other laterally; for example, if A wins 100 big blinds per game against B, and C wins 200 big blinds per game against B, it cannot be concluded that C is stronger than A. In contrast, evaluation with the u value allows users to be compared laterally, because it is transitive, and the smaller the σ value, the more accurate the evaluation of the user.
According to the AI model training method provided by the embodiment, through acquiring multiple groups of first sample data corresponding to multiple users of different levels, inputting each group of first sample data into an AI model, and carrying out iterative training on the AI model based on supervised learning until the AI model converges, so as to obtain the AI model of the Agent corresponding to each level of the different levels; and randomly initializing the AI model to perform sample generation operation, obtaining second sample data, reversely transmitting the second sample data to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the operation of reversely transmitting the second sample data to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model is converged, thereby obtaining the AI models of agents corresponding to other levels except different levels. Therefore, the AI model training is realized by fully utilizing different data, and the accuracy of the AI model is improved.
The embodiment of the application also provides an AI model calling method. The AI model may be a trained AI model in the above embodiment, and the AI model calling method may be applied to a server, so as to evaluate an Agent corresponding to the AI model by calling the trained AI model. The server may be a single server or a server cluster composed of a plurality of servers.
Referring to fig. 3, fig. 3 is a flowchart of an AI model calling method according to an embodiment of the present application.
As shown in fig. 3, the AI model invoking method includes steps S201 to S203.
S201, acquiring a first initial evaluation parameter corresponding to an Agent to be evaluated.
Taking AI tournament evaluation as an example, AI tournament evaluation refers to the process of playing games between the reference Agents and the Agent to be evaluated in order to obtain a stable capability evaluation of the Agent to be evaluated.
Illustratively, initial evaluation parameters corresponding to the Agent to be evaluated are obtained. For convenience of distinguishing description, the initial evaluation parameter corresponding to the Agent to be evaluated is hereinafter referred to as a first initial evaluation parameter. For example, the initial parameters of the Agent to be evaluated are preset as u₀ = 3 and σ₀ = 1.
S202, selecting AI models of a plurality of first type reference agents matched with the first initial evaluation parameters according to the first initial evaluation parameters.
According to the initial u₀ and σ₀ corresponding to the Agent to be evaluated, AI models of a plurality of reference Agents matched with u₀ and σ₀ are selected. For ease of distinguishing description, the AI models of the plurality of reference Agents matched with u₀ and σ₀ are hereinafter referred to as the AI models of the first-type reference Agents. Illustratively, AI models of a random number of reference Agents whose u values lie in the interval [u₀ - σ₀, u₀ + σ₀] are randomly extracted.
S203, invoking the AI models of the plurality of first-type reference agents, and controlling the to-be-evaluated Agent and the first-type reference agents to execute corresponding game operations so as to evaluate the capability of the to-be-evaluated Agent.
The AI models of the selected first-type reference Agents are invoked, and the Agent to be evaluated and the plurality of first-type reference Agents are controlled to execute corresponding game operations using a room system, in which a certain number of initial chips is distributed to each Agent and games are played in a loop until one Agent is eliminated. Compared with a single-game system, the room system allows more games to be played and more chip changes and action changes to be observed, so that the various strategies of the Agent to be evaluated are tested more thoroughly and its capability is evaluated.
In some embodiments, as shown in fig. 4, the step S203 may include a sub-step S2031 and a sub-step S2032.
S2031, controlling the to-be-evaluated Agent and the plurality of first-type reference agents to execute corresponding game operations, and obtaining game results corresponding to the to-be-evaluated Agent.
For example, taking the room system listed above as an example, after a room-system game is completed, the Agent to be evaluated obtains a plurality of game results; for example, if N first-type reference Agents are selected, N game results of win-lose relationships corresponding to the Agent to be evaluated are obtained.
S2032, updating the first initial evaluation parameter according to the game result corresponding to the Agent to be evaluated until the capability confidence coefficient in the updated first initial evaluation parameter is smaller than or equal to a second preset threshold value.
For example, the first initial evaluation parameter corresponding to the Agent to be evaluated, that is, u₀ and σ₀, may be updated according to the above formulas (2) to (9). After each update of u₀ and σ₀, it is judged whether the updated σ₀ is less than or equal to a second preset threshold. The second preset threshold may be the same as or different from the first preset threshold, which is not specifically limited here. If the updated σ₀ is greater than the second preset threshold, the above game and update process continues in a loop until the updated σ₀ is less than or equal to the second preset threshold.
For example, the second preset threshold is set to 0.1, and according to formulas (2) to (9), the first initial evaluation parameters corresponding to the Agent to be evaluated are continuously updated in a loop until the updated σ₀ is less than or equal to 0.1, at which point the evaluation of the Agent to be evaluated is completed.
The overall flow of the AI tournament assessment is illustrated in fig. 5 and sketched in code after the listed steps:
Step A1, training an AI model of the Agent to be evaluated;
Step B1, setting initial u₀ and σ₀ values;
Step C1, if the current σ₀ is smaller than 0.1, executing step G1; if not, executing step D1;
Step D1, extracting AI models of reference Agents whose u values lie in the interval [u₀ - σ₀, u₀ + σ₀];
Step E1, playing room-system games;
Step F1, updating the u₀ and σ₀ values, and returning to step C1;
Step G1, obtaining the u and σ values of the Agent to be evaluated.
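Read as a procedure, steps A1 to G1 correspond to a loop like the following sketch; play_room_game, update_params, the format of the reference-Agent pool and the number of opponents per game are illustrative placeholders rather than details taken from the original text:

```python
import random

def evaluate_agent(candidate, reference_pool, play_room_game, update_params,
                   u0=3.0, sigma0=1.0, threshold=0.1):
    """AI tournament assessment loop corresponding to steps A1-G1.

    candidate: the Agent (AI model) to be evaluated.
    reference_pool: list of (u, sigma, model) tuples of reference Agents.
    play_room_game: callable playing one room-system game, returning win/lose results.
    update_params: callable applying the update formulas to (u, sigma) given a result.
    """
    u, sigma = u0, sigma0                                  # step B1
    while sigma >= threshold:                              # step C1
        # Step D1: pick reference Agents whose u lies in [u - sigma, u + sigma].
        matched = [b for b in reference_pool if u - sigma <= b[0] <= u + sigma]
        opponents = random.sample(matched, k=min(5, len(matched)))
        results = play_room_game(candidate, [b[2] for b in opponents])   # step E1
        for res in results:                                # step F1
            u, sigma = update_params(u, sigma, res)
    return u, sigma                                        # step G1
```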
In some embodiments, as shown in fig. 6, the AI model invocation method may further include steps S204 to S206.
S204, obtaining a second initial evaluation parameter corresponding to the newly registered user.
Taking player user tournament evaluation as an example, player user tournament evaluation refers to the process of obtaining a stable capability evaluation of a newly registered user by playing games between the above-mentioned reference Agents and the newly registered user when a new player user registers.
Illustratively, initial evaluation parameters corresponding to the newly registered user are obtained. For convenience of distinguishing description, the initial evaluation parameter corresponding to the newly registered user will hereinafter be referred to as a second initial evaluation parameter. For example, the initial parameters of the newly registered user are preset as u₀ = 3 and σ₀ = 1.
S205, according to the second initial evaluation parameters, selecting AI models of a plurality of second class reference agents matched with the second initial evaluation parameters.
According to the initial u₀ and σ₀ corresponding to the newly registered user, AI models of a plurality of reference Agents matched with that u₀ and σ₀ are selected. For ease of distinguishing description, the AI models of the plurality of reference Agents matched with the u₀ and σ₀ of the newly registered user are hereinafter referred to as the AI models of the second-type reference Agents. Illustratively, AI models of N reference Agents whose u values lie in the interval [u₀ - 2σ₀, u₀] and AI models of N reference Agents whose u values lie in the interval [u₀, u₀ + 2σ₀] are randomly extracted, where N is a random number.
Compared with AI tournament evaluation, player user tournament evaluation requires an ability assessment that is as reliable as possible in as short a time as possible. Playing against reference Agents with a larger u-value gap makes the σ corresponding to the newly registered user decrease faster, and selecting the same number of higher-level Agents as lower-level Agents makes the estimated u of the newly registered user more stable.
S206, invoking the AI models of the plurality of second-type reference agents, and controlling the second-type reference agents and the new registered user to execute corresponding game operations so as to evaluate the capability of the new registered user.
The AI models of the selected second-type reference Agents are invoked, and the plurality of second-type reference Agents and the newly registered user are controlled to execute corresponding game operations. Taking a single-game system as an example, single games are played between the newly registered user and the second-type reference Agents so as to evaluate the capability of the newly registered user.
In some embodiments, as shown in fig. 7, the step S206 may include a substep S2061 and a substep S2062.
S2061, controlling the plurality of second-type reference agents and the new registered user to execute corresponding game operations, and obtaining game results corresponding to the new registered user.
Illustratively, taking the single-game system listed above as an example, after each game of the single-game system, one game result of the newly registered user, such as a winning result or a losing result, may be obtained.
S2062, updating the second initial evaluation parameters according to the game result corresponding to the new registered user until the capability confidence in the updated second initial evaluation parameters is smaller than or equal to a third preset threshold value.
For example, the second initial evaluation parameters corresponding to the newly registered user, that is, the u₀ and σ₀ corresponding to the newly registered user, may be updated according to the above formulas (2) to (9). After each update of u₀ and σ₀, it is judged whether the updated σ₀ is less than or equal to a third preset threshold. The third preset threshold may be the same as or different from the first preset threshold and/or the second preset threshold, which is not specifically limited here. If the updated σ₀ is greater than the third preset threshold, the above game and update process continues in a loop until the updated σ₀ is less than or equal to the third preset threshold.
For example, the third preset threshold is set to 0.25, and according to the above formulas (2) to (9), the second initial evaluation parameters corresponding to the newly registered user are continuously updated in a loop until the updated σ₀ is less than or equal to 0.25, at which point the capability evaluation of the newly registered user is completed. A σ₀ value less than 0.25 may be considered to correspond to a fluctuation range of the evaluation result below 20%, i.e. the result of the capability evaluation of the newly registered user is accurate and reliable.
Illustratively, the overall flow of player user tournament evaluation is shown in FIG. 8 and sketched in code after the listed steps:
Step A2, adding a newly registered user to be evaluated;
Step B2, setting initial u₀ and σ₀ values;
Step C2, if the current σ₀ is smaller than 0.25, executing step G2; if not, executing step D2;
Step D2, extracting AI models of N reference Agents whose u values lie in the interval [u₀ - 2σ₀, u₀], and extracting AI models of N reference Agents whose u values lie in the interval [u₀, u₀ + 2σ₀];
Step E2, playing single-game-system games;
Step F2, updating the u₀ and σ₀ values, and returning to step C2;
Step G2, obtaining the u and σ values of the newly registered user.
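Analogously, steps A2 to G2 can be sketched as follows, again with play_single_game, update_params, the pool format and the count n as illustrative placeholders:

```python
import random

def evaluate_new_user(play_single_game, update_params, reference_pool,
                      u0=3.0, sigma0=1.0, threshold=0.25, n=3):
    """Player-user tournament assessment loop corresponding to steps A2-G2.

    Reference Agents are drawn both below and above the user's current u so that
    the estimate stabilises quickly; n and the callables are illustrative assumptions.
    """
    u, sigma = u0, sigma0                                   # step B2
    while sigma >= threshold:                               # step C2
        # Step D2: n reference Agents from [u - 2*sigma, u] and n from [u, u + 2*sigma].
        lower = [b for b in reference_pool if u - 2 * sigma <= b[0] <= u]
        upper = [b for b in reference_pool if u <= b[0] <= u + 2 * sigma]
        opponents = (random.sample(lower, k=min(n, len(lower))) +
                     random.sample(upper, k=min(n, len(upper))))
        for _, _, model in opponents:                       # step E2: single-game matches
            result = play_single_game(model)                # game against the newly registered user
            u, sigma = update_params(u, sigma, result)      # step F2
    return u, sigma                                         # step G2
```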
After a newly registered user obtains estimated u and σ values, the u and σ values of each user can be continuously updated in subsequent games and applied when matching the user with other users or Agents: only users whose u and σ values are relatively close are matched together. Therefore, for the situation in which a newly registered user may be matched against opponents of mismatched level, causing a large psychological gap, the multi-level Agents play games against the newly registered user so that a capability assessment of the newly registered user is obtained quickly, the period of level mismatch is shortened, and the competitiveness of the games is increased, thereby improving the user experience.
Referring to fig. 9, fig. 9 is a schematic block diagram of a server according to an embodiment of the present application.
As shown in fig. 9, the server may include a processor, memory, and a network interface. The processor, memory and network interface are connected by a system bus, such as an I2C (Inter-integrated Circuit) bus.
Specifically, the processor may be a Micro-controller Unit (MCU), a central processing Unit (Central Processing Unit, CPU), a digital signal processor (Digital Signal Processor, DSP), or the like.
Specifically, the Memory may be a Flash chip, a Read-Only Memory (ROM) disk, an optical disk, a U-disk, a removable hard disk, or the like.
The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the server to which the present application is applied, and that a particular server may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Wherein the processor is configured to run a computer program stored in the memory and to implement the following steps when the computer program is executed:
Acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
randomly initializing the AI model to perform sample generation operation to obtain second sample data;
and inputting the second sample data in a back propagation manner to the AI model, performing iterative training on the AI model based on reinforcement learning, taking a training result as new second sample data, and circularly executing the step of inputting the second sample data in the back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels except the different levels.
In some embodiments, the processor, prior to effecting the inputting the second sample data back-propagation to the AI model, further effects:
caching the second sample data in a Redis server;
the processor performs the specific implementation when implementing the inputting the second sample data back-propagation to the AI model and performing iterative training on the AI model based on reinforcement learning:
And acquiring the cached second sample data from the Redis server, inputting the second sample data in a back propagation manner to the AI model, and performing iterative training on the AI model based on reinforcement learning.
In some embodiments, the processor is further configured to implement:
controlling a plurality of agents to execute corresponding game operations, and obtaining a game result corresponding to each Agent;
updating evaluation parameters corresponding to each Agent according to the game result corresponding to each Agent, wherein the evaluation parameters comprise capability confidence;
and if the updated capability confidence coefficient is larger than a first preset threshold value, returning to the step of controlling the preset number of agents to execute the corresponding game operations until the capability confidence coefficient is smaller than or equal to the first preset threshold value, and finishing capability assessment of each Agent.
In some embodiments, the evaluation parameters further include a capability evaluation value, and when the processor updates the evaluation parameters corresponding to each Agent according to the game result corresponding to each Agent, the processor specifically implements:
if the game result corresponding to the Agent is a first result, updating the capability assessment value corresponding to the Agent according to a first preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a first preset capability confidence coefficient updating formula;
If the game result corresponding to the Agent is a second result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence coefficient corresponding to the Agent according to a second preset capability confidence coefficient updating formula.
In some embodiments, the processor is configured to run a computer program stored in a memory and when executing the computer program to perform the steps of:
acquiring a first initial evaluation parameter corresponding to an Agent to be evaluated;
according to the first initial evaluation parameters, selecting AI models of a plurality of first type reference agents matched with the first initial evaluation parameters;
and calling AI models of the plurality of first-type reference agents, and controlling the to-be-evaluated agents and the plurality of first-type reference agents to execute corresponding game operations so as to evaluate the capability of the to-be-evaluated agents.
In some embodiments, when the processor performs the controlling the to-be-evaluated Agent to perform corresponding game operations with the plurality of first-type benchmark agents to perform capability evaluation on the to-be-evaluated Agent, the processor specifically performs:
controlling the to-be-evaluated Agent and a plurality of the first type reference agents to execute corresponding game operations, and obtaining game results corresponding to the to-be-evaluated Agent;
Updating the first initial evaluation parameters according to the game result corresponding to the Agent to be evaluated until the capability confidence coefficient in the updated first initial evaluation parameters is smaller than or equal to a second preset threshold value.
In some embodiments, the processor is further configured to implement:
acquiring a second initial evaluation parameter corresponding to the newly registered user;
according to the second initial evaluation parameters, selecting AI models of a plurality of second class reference agents matched with the second initial evaluation parameters;
and calling AI models of the second class reference agents, and controlling the second class reference agents and the new registered users to execute corresponding checking operation so as to evaluate the capacity of the new registered users.
In some embodiments, when the processor performs the controlling the plurality of second class benchmark agents to perform corresponding game operations with the new registered user to perform capability assessment on the new registered user, the processor specifically performs:
controlling a plurality of second-class reference agents and the new registered users to execute corresponding game operations to obtain game results corresponding to the new registered users;
and updating the second initial evaluation parameters according to the corresponding game result of the new registered user until the capability confidence coefficient in the updated second initial evaluation parameters is smaller than or equal to a third preset threshold value.
It should be noted that, for convenience and brevity of description, specific working processes of the server described above may refer to corresponding processes in the foregoing AI model training method and/or AI model invoking method embodiments, which are not described herein.
An embodiment of the present application further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program includes program instructions, and the processor executes the program instructions to implement the steps of the AI model training method and/or the AI model calling method provided in the foregoing embodiment. For example, the computer program is loaded by a processor, the following steps may be performed:
acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
acquiring a plurality of groups of second sample data, wherein the second sample data are randomly generated data in a sample generation operation;
And inputting each group of second sample data into the AI model, and performing iterative training on the AI model based on reinforcement learning until the AI model converges to obtain AI models of agents corresponding to other levels except the different levels.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The computer readable storage medium may be an internal storage unit of the server of the foregoing embodiment, for example, a hard disk or a memory of the server. The computer readable storage medium may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the server.
Because the computer program stored in the computer readable storage medium can execute any one of the AI model training methods and/or AI model invoking methods provided in the embodiments of the present application, the beneficial effects that any one of the AI model training methods and/or AI model invoking methods provided in the embodiments of the present application can be achieved, which are detailed in the previous embodiments and are not described herein.
The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments. While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. An AI model training method, comprising:
acquiring a plurality of groups of first sample data, wherein the plurality of groups of first sample data are data corresponding to a plurality of users with different levels;
inputting each group of first sample data into an AI model, and performing iterative training on the AI model based on supervised learning until the AI model converges to obtain an AI model of an Agent corresponding to each of the different levels;
randomly initializing the AI model to perform a sample generation operation to obtain second sample data;
back-propagating the second sample data to the AI model and performing iterative training on the AI model based on reinforcement learning, using the training result as new second sample data, and cyclically executing the step of back-propagating the second sample data to the AI model and performing iterative training on the AI model based on reinforcement learning, until the AI model converges, to obtain AI models of Agents corresponding to levels other than the different levels;
wherein the method further comprises:
controlling a plurality of Agents to execute corresponding match operations, and obtaining a match result corresponding to each Agent;
updating evaluation parameters corresponding to each Agent according to the match result corresponding to each Agent, wherein the evaluation parameters comprise a capability confidence;
and if the updated capability confidence is greater than a first preset threshold, returning to execute the step of controlling the plurality of Agents to execute corresponding match operations, until the capability confidence is less than or equal to the first preset threshold, thereby completing the capability assessment of each Agent.
2. The method of claim 1, wherein before the back-propagating the second sample data to the AI model, the method further comprises:
caching the second sample data in a Redis server;
and the back-propagating the second sample data to the AI model and performing iterative training on the AI model based on reinforcement learning comprises:
acquiring the cached second sample data from the Redis server, back-propagating the second sample data to the AI model, and performing iterative training on the AI model based on reinforcement learning.
3. The method according to claim 1, wherein the evaluation parameters further comprise a capability assessment value, and the updating the evaluation parameters corresponding to each Agent according to the match result corresponding to each Agent comprises:
if the match result corresponding to the Agent is a first result, updating the capability assessment value corresponding to the Agent according to a first preset capability assessment value updating formula, and updating the capability confidence corresponding to the Agent according to a first preset capability confidence updating formula;
and if the match result corresponding to the Agent is a second result, updating the capability assessment value corresponding to the Agent according to a second preset capability assessment value updating formula, and updating the capability confidence corresponding to the Agent according to a second preset capability confidence updating formula.
4. An AI model invoking method, comprising:
acquiring first initial evaluation parameters corresponding to an Agent to be evaluated;
selecting, according to the first initial evaluation parameters, AI models of a plurality of first-type reference Agents matched with the first initial evaluation parameters;
and calling the AI models of the plurality of first-type reference Agents, and controlling the Agent to be evaluated and the plurality of first-type reference Agents to execute corresponding match operations, so as to perform capability evaluation on the Agent to be evaluated.
5. The method of claim 4, wherein the controlling the Agent to be evaluated and the plurality of first-type reference Agents to execute corresponding match operations so as to perform capability evaluation on the Agent to be evaluated comprises:
controlling the Agent to be evaluated and the plurality of first-type reference Agents to execute corresponding match operations, and obtaining a match result corresponding to the Agent to be evaluated;
and updating the first initial evaluation parameters according to the match result corresponding to the Agent to be evaluated, until the capability confidence in the updated first initial evaluation parameters is less than or equal to a second preset threshold.
6. The method according to claim 4, further comprising:
acquiring second initial evaluation parameters corresponding to a newly registered user;
selecting, according to the second initial evaluation parameters, AI models of a plurality of second-type reference Agents matched with the second initial evaluation parameters;
and calling the AI models of the plurality of second-type reference Agents, and controlling the plurality of second-type reference Agents and the newly registered user to execute corresponding match operations, so as to perform capability evaluation on the newly registered user.
7. The method of claim 6, wherein the controlling the plurality of second-type reference Agents and the newly registered user to execute corresponding match operations so as to perform capability evaluation on the newly registered user comprises:
controlling the plurality of second-type reference Agents and the newly registered user to execute corresponding match operations, and obtaining a match result corresponding to the newly registered user;
and updating the second initial evaluation parameters according to the match result corresponding to the newly registered user, until the capability confidence in the updated second initial evaluation parameters is less than or equal to a third preset threshold.
8. A server comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, the memory storing AI models, wherein the computer program, when executed by the processor, implements the AI model training method of any one of claims 1 to 3, or implements the AI model invoking method of any one of claims 4 to 7.
9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, causes the processor to implement the AI model training method of any one of claims 1 to 3, or to implement the AI model invoking method of any one of claims 4 to 7.
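For illustration only, the sketch below mirrors the capability-evaluation loop recited in claims 1, 3 and 5 above: reference Agents are matched to the Agent under evaluation, each match result updates a capability assessment value and a capability confidence, and matches repeat until the confidence falls to or below a preset threshold. The Elo/Glicko-style update formulas, the Agent dataclass, and functions such as select_reference_agents and play_match are stand-in assumptions; the actual "preset updating formulas" are not disclosed in these claims.

```python
# Illustrative sketch of the claimed capability-evaluation loop; all names and
# formulas are assumptions standing in for the "preset updating formulas".
import math
import random
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    rating: float = 1500.0      # capability assessment value
    confidence: float = 350.0   # capability confidence (uncertainty of the rating)

def select_reference_agents(pool, target, k=4):
    """Pick the k reference Agents whose ratings are closest to the target's rating."""
    return sorted(pool, key=lambda a: abs(a.rating - target.rating))[:k]

def update(agent, opponent, won, k_factor=32.0, shrink=0.9):
    """One 'match result -> evaluation parameter' update (stand-in for claim 3)."""
    expected = 1.0 / (1.0 + math.pow(10.0, (opponent.rating - agent.rating) / 400.0))
    score = 1.0 if won else 0.0
    agent.rating += k_factor * (score - expected)   # capability assessment value update
    agent.confidence *= shrink                      # confidence shrinks as evidence accumulates

def evaluate(agent, pool, play_match, threshold=50.0):
    """Repeat matches until the capability confidence is at or below the threshold."""
    while agent.confidence > threshold:
        for ref in select_reference_agents(pool, agent):
            won = play_match(agent, ref)            # True if `agent` wins the match
            update(agent, ref, won)

# Usage with a random match outcome as a placeholder for an actual game:
pool = [Agent(f"ref-{i}", rating=1300 + 100 * i) for i in range(6)]
candidate = Agent("to-be-evaluated")
evaluate(candidate, pool, play_match=lambda a, b: random.random() < 0.5)
print(candidate.rating, candidate.confidence)
```

In practice the play_match callback would run an actual match between the two Agents' AI models; the random outcome above is a placeholder only.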
CN202011176373.6A 2020-10-28 2020-10-28 AI model training method, calling method, server and storage medium Active CN112274925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011176373.6A CN112274925B (en) 2020-10-28 2020-10-28 AI model training method, calling method, server and storage medium

Publications (2)

Publication Number Publication Date
CN112274925A CN112274925A (en) 2021-01-29
CN112274925B (en) 2024-02-27

Family

ID=74374058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011176373.6A Active CN112274925B (en) 2020-10-28 2020-10-28 AI model training method, calling method, server and storage medium

Country Status (1)

Country Link
CN (1) CN112274925B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112870721B (en) * 2021-03-16 2023-07-14 腾讯科技(深圳)有限公司 Game interaction method, device, equipment and storage medium
CN113052038A (en) * 2021-03-16 2021-06-29 蔡勇 Method for counting chip tray inventory by AI technology
CN113379071B (en) * 2021-06-16 2022-11-29 中国科学院计算技术研究所 Noise label correction method based on federal learning
CN114404977B (en) * 2022-01-25 2024-04-16 腾讯科技(深圳)有限公司 Training method of behavior model and training method of structure capacity expansion model
CN114629797B (en) * 2022-03-11 2024-03-08 阿里巴巴(中国)有限公司 Bandwidth prediction method, model generation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036389A (en) * 2018-08-28 2018-12-18 出门问问信息科技有限公司 The generation method and device of a kind of pair of resisting sample
CN110782004A (en) * 2019-09-26 2020-02-11 超参数科技(深圳)有限公司 Model training method, model calling equipment and readable storage medium
CN110975294A (en) * 2019-11-22 2020-04-10 珠海豹趣科技有限公司 Game fighting implementation method and terminal
CN111111204A (en) * 2020-04-01 2020-05-08 腾讯科技(深圳)有限公司 Interactive model training method and device, computer equipment and storage medium
CN111598234A (en) * 2020-05-13 2020-08-28 超参数科技(深圳)有限公司 AI model training method, use method, computer device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200265301A1 (en) * 2019-02-15 2020-08-20 Microsoft Technology Licensing, Llc Incremental training of machine learning tools

Also Published As

Publication number Publication date
CN112274925A (en) 2021-01-29

Similar Documents

Publication Publication Date Title
CN112274925B (en) AI model training method, calling method, server and storage medium
CN107970608B (en) Setting method and device of level game, storage medium and electronic device
CN111291890B (en) Game strategy optimization method, system and storage medium
US20200242897A1 (en) System and method for conducting a game including a computer-controlled player
CN109513215B (en) Object matching method, model training method and server
CN110443284B (en) Artificial intelligence AI model training method, calling method, server and readable storage medium
CN109091868B (en) Method, apparatus, computer equipment and the storage medium that battle behavior determines
CN112016704B (en) AI model training method, model using method, computer device and storage medium
CN108392828A (en) A kind of player's On-line matching method and system for the game of MOBA classes
CN109718558B (en) Game information determination method and device, storage medium and electronic device
CN111738294B (en) AI model training method, AI model using method, computer device, and storage medium
CN111569429B (en) Model training method, model using method, computer device, and storage medium
CN111282267A (en) Information processing method, information processing apparatus, information processing medium, and electronic device
CN106075912A (en) A kind of method that online game is helped each other and network game system
CN112926744A (en) Incomplete information game method and system based on reinforcement learning and electronic equipment
CN111729300A (en) Monte Carlo tree search and convolutional neural network based bucket owner strategy research method
CN111111193B (en) Game control method and device and electronic equipment
CN110598853A (en) Model training method, information processing method and related device
CN111507475A (en) Game behavior decision method, device and related equipment
Salge et al. Relevant information as a formalised approach to evaluate game mechanics
Dockhorn et al. A decision heuristic for Monte Carlo tree search doppelkopf agents
CN108874377B (en) Data processing method, device and storage medium
CN114580642B (en) Method, device, equipment and medium for establishing game AI model and processing data
CN112439193A (en) Game difficulty matching method and device
Kocsis et al. Universal parameter optimisation in games based on SPSA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant