CN110443284B - Artificial intelligence AI model training method, calling method, server and readable storage medium

Info

Publication number
CN110443284B
CN110443284B
Authority
CN
China
Prior art keywords
vector
model
layer
splicing
role election
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910636868.3A
Other languages
Chinese (zh)
Other versions
CN110443284A (en)
Inventor
Li Hongliang (李宏亮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Super Parameter Technology Shenzhen Co ltd
Original Assignee
Super Parameter Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Super Parameter Technology Shenzhen Co ltd filed Critical Super Parameter Technology Shenzhen Co ltd
Priority to CN201910636868.3A
Publication of CN110443284A
Application granted
Publication of CN110443284B

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/70 Game security or game management aspects
    • A63F13/79 Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a training method, a calling method, a server and a readable storage medium of an AI model, wherein the method comprises the following steps: obtaining a first sample data set, wherein the first sample data set comprises a first type of image features, a first vector feature and an annotated role election tag; acquiring a second sample data set, wherein the second sample data set comprises a second type of image features, a second vector feature and labeled strategy labels; loading an AI model to be trained, wherein the AI model comprises a role election model and a strategy prediction model; and performing iterative training on the role election model according to the first sample data set until the role election model converges, and performing iterative training on the strategy prediction model according to the second sample data set until the strategy prediction model converges. The method and the device can improve the prediction efficiency and accuracy of the AI model.

Description

Artificial intelligence AI model training method, calling method, server and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a training method, a calling method, a server, and a readable storage medium for an AI model.
Background
With the rapid development of Artificial Intelligence (AI) technology, AI is now widely applied in many fields. In the field of game entertainment, AI technology already enables virtual AI players to compete against humans in board games, even defeating top professional players. Card games, however, are typically played by multiple players whose hands are hidden from one another, so developing an AI model for card games poses a greater challenge.
Currently, AI models are mainly implemented based on Deep Neural Networks (DNNs) and supervised learning. For strategy prediction, however, an AI model built this way must run model prediction many times to determine a strategy prediction result, which is inefficient and leads to a poor user experience. How to improve the prediction efficiency and accuracy of the AI model is therefore a problem to be solved urgently.
Disclosure of Invention
The present application mainly aims to provide a training method, a calling method, a server and a readable storage medium for an AI model, and aims to improve the accuracy of the AI model.
In a first aspect, the present application provides a method for training an AI model, including the following steps:
obtaining a first sample data set, wherein the first sample data set comprises a first type of image features, a first vector feature and an annotated role election tag;
acquiring a second sample data set, wherein the second sample data set comprises a second type of image features, a second vector feature and labeled strategy labels;
loading an AI model to be trained, wherein the AI model comprises a role election model and a strategy prediction model;
and performing iterative training on the role election model according to the first sample data set until the role election model converges, and performing iterative training on the strategy prediction model according to the second sample data set until the strategy prediction model converges.
In a second aspect, the present application further provides a method for calling an AI model, where the method for calling the AI model includes the following steps:
determining whether a call instruction of an AI model is triggered, wherein the AI model comprises a role election model and a strategy prediction model;
if the triggered calling instruction is monitored, current game data are obtained, and according to the current game data, whether a model to be called is a role election model or a strategy prediction model is determined;
if the model to be called is a role election model, calling the role election model to determine a role election label based on the current game data and generating a role election instruction corresponding to the role election label;
and if the model to be called is a strategy prediction model, calling the strategy prediction model, determining a strategy prediction result based on the current game match data, and generating a strategy output instruction corresponding to the strategy prediction result.
In a third aspect, the present application further provides a server, which includes a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the memory stores an AI model including a role election model and a policy prediction model, and the computer program, when executed by the processor, implements the steps of the method for invoking the AI model as described above.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the calling method of the AI model as described above.
The application provides a training method, a calling method, a server and a readable storage medium for an AI model. An accurate role election model can be obtained by acquiring the first sample data set, which stores the class image features and vector features required by the role election model together with the annotated role election labels, and then iteratively training the role election model based on the first sample data set. Similarly, an accurate strategy prediction model can be obtained by acquiring the second sample data set, which stores the class image features and vector features required by the strategy prediction model together with the annotated strategy labels, and then iteratively training the strategy prediction model based on the second sample data set. An AI model comprising the role election model and the strategy prediction model is thereby obtained, and the accuracy of the AI model is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for training an AI model according to an embodiment of the present disclosure;
FIG. 2 is a schematic representation of class image features required by a role election model according to an embodiment of the present application;
FIG. 3 is a representation of class image features required by the policy prediction model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a hierarchical structure of a role election model in an embodiment of the present application;
FIG. 5 is a diagram illustrating a hierarchical structure of a policy prediction model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another hierarchical structure of a policy prediction model according to an embodiment of the present application;
Fig. 7 is a flowchart illustrating a method for calling an AI model according to an embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating the training and invocation of AI models in an embodiment of the present application;
fig. 9 is a block diagram schematically illustrating a structure of a server according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The embodiment of the application provides a training method, a calling method, a server and a readable storage medium of an Artificial Intelligence (AI) model. The AI model training method can be applied to a server, wherein the server can be a single server or a server cluster consisting of a plurality of servers.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for training an AI model according to an embodiment of the present disclosure.
As shown in fig. 1, the AI model training method includes steps S101 to S104.
Step S101, a first sample data set is obtained, wherein the first sample data set comprises a first type of image features, a first vector feature and an annotated role election label.
The AI model comprises a role election model and a strategy prediction model. The class image features and vector features required by the role election model are extracted from game match data and annotated with role election labels; the extracted class image features and vector features, together with the annotated role election label, form a group of sample data, and such groups form a sample data set recorded as the first sample data set. The first sample data set thus comprises the class image features and vector features required by the role election model and the annotated role election labels, where the class image features and vector features required by the role election model are recorded as the first type of image features and the first vector features, respectively. The role election labels comprise a label corresponding to participating in role election and a label corresponding to not participating in role election. When the AI model needs to be trained, the server obtains the first sample data set.
The game match data may be match data of a card game, including but not limited to match data of a fighting game and match data of a mahjong game, and may also be match data of other games, which is not specifically limited in this application.
The game match data comprises match information of a plurality of game participants, including the multiplier information of each game round, the initial character holding information of each game participant in each round, the character output information and character holding information of each turn, the concealed character information, the information of characters that have not appeared, and the role information. The multiplier information represents the multiplier of the game round; the character holding information comprises the held characters and the number of each held character; the concealed character information comprises the concealed characters and their number; the information of characters that have not appeared comprises those characters and the number of each; and the role information represents the role played by a game participant in the round.
The vector features required by the role election model are used for representing the role election situation of the game round. It should be noted that the class image features and vector features required by the role election models of different card games differ; they are explained below using the landlord game as an example.
In the landlord game (Dou Dizhu), the game participants comprise a previous game participant, the current game participant and a next game participant, and there are at most 4 identical cards. The character holding information is the card holding information of a game participant, comprising the held cards and the number of cards; the character output information is the card playing information of a game participant, comprising the played cards and their number; the concealed character information is the bottom card information, comprising the bottom cards and their number; and the information of characters that have not appeared is the information of cards that have not appeared, comprising those cards and their number.
The class image features are used for representing the card holding information and the cards-not-appeared information of a game participant. The horizontal axis lists all the cards from largest to smallest, and the vertical axis encodes the number of each card: 1 card is encoded as [1000], 2 cards as [1100], 3 cards as [1110], and 4 cards as [1111]. The vector feature is a five-dimensional vector: the first dimension represents whether the previous game participant joined the role election (1 for joined, 0 for not joined); the second dimension represents whether the next game participant joined the role election (1 for joined, 0 for not joined); and the last three dimensions represent the role election multiplier of the other game participant, where multiplier 1 is encoded as [100], multiplier 2 as [010], and multiplier 3 as [001].
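To make these encodings concrete, the following minimal Python sketch builds a 4 x 15 class-image matrix for a set of cards and the five-dimensional role election vector feature. The sketch is not from the patent: the helper names, rank order and card notation are illustrative assumptions.

```python
import numpy as np

# Card ranks from largest to smallest (RJ/BJ: red/black joker); assumed order.
RANKS = ["RJ", "BJ", "2", "A", "K", "Q", "J", "10",
         "9", "8", "7", "6", "5", "4", "3"]

def cards_to_image(cards):
    """Encode a list of cards as a 4 x 15 class-image feature: column i
    is rank RANKS[i]; k copies of a rank become k leading ones down the
    column, e.g. 3 copies -> [1, 1, 1, 0]."""
    img = np.zeros((4, 15), dtype=np.float32)
    for rank in cards:
        col = RANKS.index(rank)
        row = int(img[:, col].sum())   # next free slot for this rank
        img[row, col] = 1.0
    return img

def role_election_vector(prev_joined, next_joined, other_multiplier):
    """Five-dimensional vector feature: join flags of the previous and
    next participants plus a one-hot role election multiplier."""
    one_hot = {1: [1, 0, 0], 2: [0, 1, 0], 3: [0, 0, 1]}[other_multiplier]
    return np.array([int(prev_joined), int(next_joined)] + one_hot,
                    dtype=np.float32)

# Example: cards 9-9-9-K; previous participant joined the election,
# the next did not, and the other participant bid multiplier 2.
img = cards_to_image(["9", "9", "9", "K"])
vec = role_election_vector(True, False, 2)   # -> [1, 0, 0, 1, 0]
```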
Fig. 2 is a schematic representation of the class image features required by the role election model in an embodiment of the present application. As shown in fig. 2, the class image features include a feature expression of the card holding information of a game participant and a feature expression of the cards-not-appeared information: panel A of fig. 2 is the feature expression of the card holding information, and panel B of fig. 2 is the feature expression of the cards-not-appeared information.
Step S102, a second sample data set is obtained, wherein the second sample data set comprises a second type image feature, a second vector feature and an annotated strategy label.
The class image features and vector features required by the strategy prediction model are extracted from game match data and annotated with strategy labels; the extracted class image features and vector features, together with the annotated strategy label, form a group of sample data, and such groups form a sample data set recorded as the second sample data set. The second sample data set thus comprises the class image features and vector features required by the strategy prediction model and the annotated strategy labels, where the class image features and vector features required by the strategy prediction model are recorded as the second type of image features and the second vector features, respectively. The strategy labels comprise master strategy labels and slave strategy labels. When the AI model needs to be trained, the server acquires the second sample data set.
It should be noted that the class image features and vector features required by the strategy prediction models of different card games differ, and so do the strategy labels; the class image features, vector features and strategy labels required by the strategy prediction model are explained below using the landlord game as an example. The master strategy labels are main card-type labels, including but not limited to labels corresponding to a solo card, a solo straight, a pair, a double straight (consecutive pairs), a triple straight, a bomb, a rocket, and a pass; the slave strategy labels are attached-card (kicker) labels, including but not limited to labels corresponding to one solo card, two solo cards, one pair, two pairs, and no attached card.
The class image features required by the strategy prediction model are used for representing the spatial distribution of card types. The horizontal axis lists the characters of all the cards from largest to smallest, and the vertical axis encodes the number of each character: 1 card is encoded as [1000], 2 cards as [1100], 3 cards as [1110], and 4 cards as [1111]. The class image features comprise 13 channels: the card holding information of the current game participant (1 channel), the card playing information of the three game participants in the most recent first round (3 channels), in the most recent second round (3 channels), and in the most recent third round (3 channels), all card playing information before the most recent third round (1 channel), the information of cards that have not appeared (1 channel), and the bottom card information (1 channel).
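As a sketch of how these 13 channels could be assembled into one input tensor, the snippet below reuses the cards_to_image helper from the earlier sketch; the function name and channel ordering convention are assumptions.

```python
import numpy as np

def policy_image_feature(channels):
    """Stack 13 per-channel card lists into a (13, 4, 15) class-image
    tensor, in the channel order given in the text: current hand,
    3 x 3 recent-round plays, all earlier plays, unseen cards, and
    the bottom cards. Uses cards_to_image from the earlier sketch."""
    assert len(channels) == 13
    return np.stack([cards_to_image(cards) for cards in channels])
```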
FIG. 3 is a diagram illustrating the expression of the class image features required by the strategy prediction model according to an embodiment of the present application. As shown in FIG. 3, the class image features include 13 channels: panel A is the feature expression of the current game participant's card holding information BR22AAKK10109873; panel B is the feature expression of the previous game participant's card playing information QQQQ8 in the most recent first round; panel C is the feature expression of the previous game participant's card playing information 4445 in the most recent second round; panel D is the feature expression of the previous game participant's card playing information 6663 in the most recent third round; panel E is the feature expression of all card playing information 210109988765433 before the most recent third round; panel F is the feature expression of the cards-not-appeared information BR222AAKKQKKJJJ1019777; panel G is the feature expression of the bottom card information R23; and the remaining panels are the feature expressions of the card playing information of the current and next game participants in the most recent three rounds.
The vector features required by the strategy prediction model represent space-independent features, including the roles, numbers of held cards and role election multipliers of the three game participants, the number of cards played by the previous game participant, and whether the current game participant holds a card larger than the previous play. A landlord role is encoded as 1 and a farmer role as 0; the number of held cards is encoded between 00000 (holding 0 cards) and 10100 (holding 20 cards); the role election multiplier is encoded between 01 (multiplier 1) and 11 (multiplier 3); the number of cards played by the previous game participant is encoded between 00000 (playing 0 cards) and 10100 (playing 20 cards); and the last dimension is 1 if the current game participant holds a card larger than the previous participant's play, and 0 otherwise.
For example, suppose the roles of the three game participants are landlord, farmer and farmer, their numbers of held cards are 15, 12 and 8, their role election multipliers are 3, 2 and 2, the previous game participant played 5 cards, and the current game participant holds a card larger than that play. The corresponding vector feature required by the strategy prediction model is: [1, 0, 0, 01111, 01100, 01000, 11, 10, 10, 00101, 1].
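The following sketch reproduces this encoding; the helper names are hypothetical and the field order follows the text.

```python
import numpy as np

def bits(value, width):
    """Fixed-width binary encoding, most significant bit first."""
    return [int(b) for b in format(value, "0%db" % width)]

def policy_vector(roles, hand_counts, multipliers,
                  prev_played_count, has_bigger_card):
    """Space-independent vector feature: roles (1 landlord / 0 farmer),
    5-bit held-card counts, 2-bit multipliers, 5-bit previous-play
    count, and a final can-beat flag."""
    v = list(roles)
    for n in hand_counts:
        v += bits(n, 5)
    for m in multipliers:
        v += bits(m, 2)
    v += bits(prev_played_count, 5)
    v.append(int(has_bigger_card))
    return np.array(v, dtype=np.float32)

# The worked example above: landlord/farmer/farmer, 15/12/8 cards held,
# multipliers 3/2/2, previous play of 5 cards, holds a bigger card.
vec = policy_vector((1, 0, 0), (15, 12, 8), (3, 2, 2), 5, True)
# -> 1 0 0 | 01111 01100 01000 | 11 10 10 | 00101 | 1
```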
Step S103, loading an AI model to be trained, wherein the AI model comprises a role election model and a strategy prediction model.
After the first sample data set and the second sample data set are obtained, the server loads an AI model to be trained, where the AI model comprises a role election model and a strategy prediction model. Referring to fig. 4, fig. 4 is a schematic diagram of a hierarchical structure of the role election model in an embodiment of the present application. As shown in fig. 4, the role election model includes three fully-connected layers, two convolution layers and a vector splicing layer: the first fully-connected layer is connected to the second fully-connected layer, the second fully-connected layer is connected to the vector splicing layer, the first convolution layer is connected to the second convolution layer, the second convolution layer is connected to the vector splicing layer, and the vector splicing layer is connected to the third fully-connected layer.
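Expressed in PyTorch, the fig. 4 wiring might look like the sketch below. The patent fixes only the layer topology; all widths, kernel sizes and the two-channel image input (hand plus unseen cards, per fig. 2) are assumptions.

```python
import torch
import torch.nn as nn

class RoleElectionModel(nn.Module):
    """Two FC layers on the vector feature, two conv layers on the
    class-image feature, a vector splicing (concatenation) layer,
    and a third FC layer producing the role election output."""

    def __init__(self, vec_dim=5, num_labels=2):
        super().__init__()
        self.fc1 = nn.Linear(vec_dim, 64)
        self.fc2 = nn.Linear(64, 64)
        self.conv1 = nn.Conv2d(2, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.fc3 = nn.Linear(64 + 32 * 4 * 15, num_labels)

    def forward(self, vec, img):
        a = torch.relu(self.fc2(torch.relu(self.fc1(vec))))
        b = torch.relu(self.conv2(torch.relu(self.conv1(img))))
        spliced = torch.cat([a, b.flatten(1)], dim=1)  # splicing layer
        return self.fc3(spliced)                       # election logits

# Usage: logits = RoleElectionModel()(torch.zeros(1, 5),
#                                     torch.zeros(1, 2, 4, 15))
```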
Referring to fig. 5, fig. 5 is a schematic diagram of a hierarchical structure of the strategy prediction model in an embodiment of the present application. As shown in fig. 5, the strategy prediction model includes seven fully-connected layers, two convolution layers and two vector splicing layers: the first fully-connected layer is connected to the second fully-connected layer, the second fully-connected layer is connected to the first vector splicing layer, the first convolution layer is connected to the second convolution layer, the second convolution layer is connected to the first vector splicing layer, the first vector splicing layer is connected to the third, fourth and fifth fully-connected layers, the third, fourth and fifth fully-connected layers are all connected to the second vector splicing layer, and the second vector splicing layer is connected to the sixth and seventh fully-connected layers. It should be noted that the third fully-connected layer handles the classification task of the first guess result and outputs the probability of the first guess label and Loss1; the fourth fully-connected layer handles the classification task of the second guess result and outputs the probability of the second guess label and Loss2; the sixth fully-connected layer handles the classification task of the master strategy and outputs the probability of the master strategy label and Loss3; and the seventh fully-connected layer handles the classification task of the slave strategy and outputs the probability of the slave strategy label and Loss4. The loss value of the model is Loss = Loss1 + Loss2 + Loss3 + Loss4.
Referring to fig. 6, fig. 6 is a schematic diagram of another hierarchical structure of the strategy prediction model in an embodiment of the present application. As shown in fig. 6, the strategy prediction model includes seven fully-connected layers, two convolution layers and three vector splicing layers: the first fully-connected layer is connected to the second fully-connected layer, the second fully-connected layer is connected to the first vector splicing layer, the first convolution layer is connected to the second convolution layer, the second convolution layer is connected to the first vector splicing layer, the first vector splicing layer is connected to the third, fourth and fifth fully-connected layers, the third, fourth and fifth fully-connected layers are connected to the second vector splicing layer, the second vector splicing layer is connected to the sixth fully-connected layer and the third vector splicing layer, and the third vector splicing layer is connected to the seventh fully-connected layer.
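A PyTorch sketch of this fig. 6 variant follows. Again, only the wiring comes from the text; all dimensions are assumptions, and the sketch splices the predicted guess logits at the splicing layers, whereas during training the annotated guess labels are spliced instead, as described below.

```python
import torch
import torch.nn as nn

class PolicyPredictionModel(nn.Module):
    """Fig. 6 wiring: shared FC/conv trunks, a first splicing layer,
    two guess heads (fc3, fc4), a trunk head (fc5), a second splicing
    layer feeding the master policy head (fc6), and a third splicing
    layer feeding the slave policy head (fc7)."""

    def __init__(self, vec_dim=30, n_guess=15, n_main=5, n_slave=5):
        super().__init__()
        self.fc1 = nn.Linear(vec_dim, 128)
        self.fc2 = nn.Linear(128, 128)
        self.conv1 = nn.Conv2d(13, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        joint = 128 + 32 * 4 * 15
        self.fc3 = nn.Linear(joint, n_guess)      # first guess head
        self.fc4 = nn.Linear(joint, n_guess)      # second guess head
        self.fc5 = nn.Linear(joint, 128)
        self.fc6 = nn.Linear(128 + 2 * n_guess, n_main)            # master
        self.fc7 = nn.Linear(128 + 2 * n_guess + n_main, n_slave)  # slave

    def forward(self, vec, img):
        a = torch.relu(self.fc2(torch.relu(self.fc1(vec))))
        b = torch.relu(self.conv2(torch.relu(self.conv1(img)))).flatten(1)
        s1 = torch.cat([a, b], dim=1)                  # 1st splicing layer
        guess1, guess2 = self.fc3(s1), self.fc4(s1)
        s2 = torch.cat([guess1, guess2,
                        torch.relu(self.fc5(s1))], dim=1)  # 2nd splicing
        main = self.fc6(s2)
        s3 = torch.cat([s2, main], dim=1)              # 3rd splicing layer
        slave = self.fc7(s3)
        return guess1, guess2, main, slave
```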
Step S104, performing iterative training on the role election model according to the first sample data set until the role election model converges, and performing iterative training on the strategy prediction model according to the second sample data set until the strategy prediction model converges.
After a first sample data set and a second sample data set are obtained and an AI model to be trained is loaded, the server conducts iterative training on the role election model according to the first sample data set until the role election model converges, conducts iterative training on the strategy prediction model according to the second sample data set until the strategy prediction model converges, and stores the converged role election model and the converged strategy prediction model after the role election model and the strategy prediction model both converge.
In one embodiment, the specific training process of the role election model is as follows: acquiring a group of sample data from a first sample data set each time, wherein the sample data comprises a first type of image features, a first vector feature and a role election label; processing the first vector features through two fully-connected layers to obtain a first target vector, and performing convolution processing on the first type of image features through two convolution layers to obtain a second target vector; splicing the first target vector and the second target vector through a vector splicing layer to obtain a spliced vector, and processing the spliced vector through a role election layer to obtain the output probability of a role election label; calculating a current loss value according to the role election label and the output probability, and determining whether a role election model is converged or not according to the current loss value; and if the role election model is converged, stopping the model training, if the role election model is not converged, updating the parameters of the role election model, and continuing to train the updated role election model. It should be noted that the parameter updating algorithm may be set based on an actual situation, which is not specifically limited in this application, and optionally, the parameters of the role election model are updated based on a back propagation algorithm.
In an embodiment, the manner of determining whether the role election model converges specifically is as follows: obtaining a loss value during the last model training, recording the loss value as a historical loss value, and calculating a difference value between the historical loss value and the current loss value; and determining whether the difference value between the historical loss value and the current loss value is smaller than a preset threshold value corresponding to the role election model, if the difference value between the historical loss value and the current loss value is smaller than the preset threshold value corresponding to the role election model, determining that the role election model converges, otherwise, if the difference value between the historical loss value and the current loss value is larger than or equal to the preset threshold value corresponding to the role election model, determining that the role election model does not converge.
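A minimal training-loop sketch combining the iteration and convergence test described above; the optimizer, cross-entropy loss and threshold value are assumptions, as the patent only requires a loss-difference test against a preset threshold.

```python
import torch
import torch.nn.functional as F

def train_role_election(model, dataset, optimizer, threshold=1e-4):
    """One group of sample data per iteration; stop once the loss
    change between consecutive iterations drops below the threshold,
    otherwise update the parameters by back propagation."""
    prev_loss = None
    for vec, img, label in dataset:        # from the first sample data set
        loss = F.cross_entropy(model(vec, img), label)
        if prev_loss is not None and abs(prev_loss - loss.item()) < threshold:
            break                          # role election model converged
        optimizer.zero_grad()
        loss.backward()                    # back propagation
        optimizer.step()
        prev_loss = loss.item()
    return model
```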
In one embodiment, when the hierarchy of the strategy prediction model is as shown in fig. 5, the specific training process of the strategy prediction model is as follows: acquiring a group of sample data from a second sample data set each time, wherein the sample data comprises a second type of image features, a second vector feature and strategy labels, and the strategy labels comprise a first guess label, a second guess label, a master strategy label and a slave strategy label; processing the second vector characteristics through the first full-connection layer and the second full-connection layer to obtain a first target vector; convolving the second type of image features through the first convolution layer and the second convolution layer to obtain a second target vector; splicing the first target vector and the second target vector through the first vector splicing layer to obtain a first spliced vector; determining the output probability of the first guess label based on the first splicing vector through the third full-connection layer, and calculating a first loss value according to the first guess label and the corresponding output probability;
determining the output probability of a second guess label based on the first splicing vector through a fourth full-connection layer, and calculating a second loss value according to the second guess label and the corresponding output probability; processing the first splicing vector through a fifth full-connection layer, and splicing the first guess label, the second guess label and the processed first splicing vector through a second vector splicing layer to obtain a second splicing vector; determining the output probability of the main strategy label based on the second splicing vector through a sixth full-connection layer, and calculating a third loss value according to the main strategy label and the corresponding output probability; determining the output probability of the slave strategy label based on the second splicing vector through a seventh full-connection layer, and calculating a fourth loss value according to the slave strategy label and the corresponding output probability; determining whether the strategy prediction model converges according to the first loss value, the second loss value, the third loss value and the fourth loss value; and if the strategy prediction model is converged, stopping the model training, if the strategy prediction model is not converged, updating the parameters of the strategy prediction model, and continuing to train the updated strategy prediction model.
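Under the same assumptions, the four-part loss of this training process can be sketched as follows; per-head cross-entropy is an assumption, as the patent specifies only that the model loss is Loss1 + Loss2 + Loss3 + Loss4.

```python
import torch.nn.functional as F

def policy_total_loss(model, vec, img, guess1_label, guess2_label,
                      main_label, slave_label):
    """Sum of the losses of the two guess heads and the master and
    slave policy heads of the strategy prediction model."""
    guess1, guess2, main, slave = model(vec, img)
    loss1 = F.cross_entropy(guess1, guess1_label)   # first guess result
    loss2 = F.cross_entropy(guess2, guess2_label)   # second guess result
    loss3 = F.cross_entropy(main, main_label)       # master strategy
    loss4 = F.cross_entropy(slave, slave_label)     # slave strategy
    return loss1 + loss2 + loss3 + loss4
```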
In an embodiment, when the hierarchy of the policy prediction model is as shown in fig. 6, unlike fig. 5, the policy prediction model further includes a third vector splicing layer, the second vector splicing layer is connected to a sixth full-connected layer and a third vector splicing layer, and the third vector splicing layer is connected to a seventh full-connected layer, so that the determination manner from the policy tag is specifically: splicing the second splicing vector and the main strategy label through a third vector splicing layer to obtain a third splicing vector; determining, by the seventh fully connected layer, an output probability of the slave policy tag based on the third splice vector, and calculating a fourth loss value according to the slave policy tag and the corresponding output probability.
It should be noted that the parameter updating algorithm may be set based on the actual situation, which is not specifically limited in this application; optionally, the parameters of the strategy prediction model are updated based on a back propagation algorithm. The first guess label, the second guess label, the master strategy label and the slave strategy label are all represented as vectors. The first guess label represents the guessed cards of the previous game participant, and the second guess label represents the guessed cards of the next game participant. For example, if the guessed cards of the previous game participant are B2AAKJJJ0933, the first guess label is [200000113012110], which records the number of each of the 15 card characters. Likewise, if there are 5 master card types, there are 5 master strategy labels, and the vector corresponding to each master strategy label is [10000], [01000], [00100], [00010] and [00001], respectively.
In an embodiment, whether the strategy prediction model converges is determined as follows: calculate the sum of the first, second, third and fourth loss values and take it as the total loss value; obtain the historical total loss value, i.e. the total loss value of the previous model training, and calculate the difference between the historical total loss value and the current total loss value; determine whether this difference is less than or equal to a preset threshold; if it is, determine that the strategy prediction model has converged; otherwise, determine that the strategy prediction model has not converged. It should be noted that the preset threshold may be set based on the actual situation, which is not specifically limited in this application.
In the training method for the AI model provided in the above embodiment, an accurate role election model can be obtained by acquiring the first sample data set, which stores the class image features and vector features required by the role election model together with the annotated role election labels, and iteratively training the role election model based on it; likewise, an accurate strategy prediction model can be obtained by acquiring the second sample data set, which stores the class image features and vector features required by the strategy prediction model together with the annotated strategy labels, and iteratively training the strategy prediction model based on it. An AI model comprising the role election model and the strategy prediction model is thereby obtained, and the prediction efficiency and accuracy of the AI model are improved.
The embodiment of the application also provides a calling method of the AI model. The calling method of the AI model can be applied to a server, wherein the server can be a single server or a server cluster consisting of a plurality of servers.
Referring to fig. 7, fig. 7 is a flowchart illustrating a method for calling an AI model according to an embodiment of the present disclosure.
As shown in fig. 7, the calling method of the AI model includes steps S201 to S204.
Step S201, determining whether to trigger a call instruction of an AI model, wherein the AI model comprises a role election model and a strategy prediction model.
The server stores an AI model, the AI model comprises a role election model and a strategy prediction model, the role election model comprises three full-connection layers, two convolution layers and a vector splicing layer, the first full-connection layer is connected with the second full-connection layer, the second full-connection layer is connected with the vector splicing layer, the first convolution layer is connected with the second convolution layer, the second convolution layer is connected with the vector splicing layer, and the vector splicing layer is connected with the third full-connection layer.
The strategy prediction model comprises seven fully-connected layers, two convolution layers and two vector splicing layers, wherein the first fully-connected layer is connected with the second fully-connected layer, the second fully-connected layer is connected with the first vector splicing layer, the first convolution layer is connected with the second convolution layer, the second convolution layer is connected with the first vector splicing layer, the first vector splicing layer is respectively connected with the third fully-connected layer, the fourth fully-connected layer and the fifth fully-connected layer, the third fully-connected layer, the fourth fully-connected layer and the fifth fully-connected layer are all connected with the second vector splicing layer, and the second vector splicing layer is respectively connected with the sixth fully-connected layer and the seventh fully-connected layer.
In an embodiment, whether to trigger the call instruction of the AI model is determined as follows: during the game, monitor whether the game state of a game participant is offline; when the game state of the game participant is monitored to be offline, trigger the call instruction of the AI model; when the game state of the game participant is monitored to be online, do not trigger the call instruction. By monitoring the game state of the participating player, the AI model can be called to host the game when the player goes offline; because the AI model has high accuracy, the loss caused by going offline can be reduced and the user experience improved.
In an embodiment, the determining whether to trigger the call instruction of the AI model specifically includes: receiving a game control instruction sent by a game participation terminal, wherein the game control instruction comprises a game control label; judging whether the game control tags are located in a preset tag group, wherein the preset tag group comprises tags corresponding to online hosting, man-machine battle and game fast matching; if the game control tag is located in the preset tag group, triggering the calling instruction of the AI model, and if the game control tag is not located in the preset tag group, not triggering the calling instruction of the AI model.
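A minimal sketch of this trigger rule; the tag names are hypothetical.

```python
# Tags corresponding to online hosting, man-machine battle and quick match.
PRESET_TAG_GROUP = {"online_hosting", "man_machine_battle", "quick_match"}

def should_trigger_ai_call(game_control_tag):
    """The AI model call instruction is triggered only when the game
    control tag is in the preset tag group."""
    return game_control_tag in PRESET_TAG_GROUP
```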
The game control instruction is triggered as follows: a game participant plays through a terminal and can click the online hosting control during the game to trigger the game control instruction corresponding to online hosting, which comprises the label corresponding to online hosting; the instruction is sent to the server so that the server executes the online hosting operation based on it, and because the online hosting operation requires calling the AI model, the call instruction of the AI model is triggered.
Similarly, when a game participant plays through a terminal, multiple game modes exist, including a man-machine battle mode, a quick match mode and a real-person battle mode, so the participant can select different game modes. After the participant selects a game mode, a game control instruction corresponding to the selected mode is generated and sent to the server so that the server can call the corresponding game data; because modes such as man-machine battle and quick match require calling the AI model, the call instruction of the AI model is triggered.
Step S202, if the triggered calling instruction is monitored, current game data are obtained, and according to the current game data, whether the model to be called is a role election model or a strategy prediction model is determined.
If a triggered calling instruction of the AI model is monitored, the server acquires current game data and determines whether the model to be called is a role election model or a strategy prediction model according to the current game data.
The current game match data includes, but is not limited to, the character holding information of each game participant, the character output information of each round, the information of characters that have not appeared, the concealed character information, the multiplier information, and the role information. The multiplier information represents the multiplier of the game round; the character holding information comprises the held characters and the number of each; the concealed character information comprises the concealed characters and their number; the information of characters that have not appeared comprises those characters and the number of each; and the role information represents the role played by a game participant in the round.
It should be noted that the current game match data differ for different types of games; the landlord game is taken as an example below. The current game match data of the landlord game include the card holding information of the three game participants, the card playing information of each round, the multiplier information, the role information, and the information of cards that have not appeared. The multiplier information represents the multiplier of the game round; the card holding information comprises the number of held cards and the character of each held card; the card playing information comprises the number of played cards and the character of each played card; the information of cards that have not appeared comprises the number of such cards and their characters; and the role information represents the role each game participant plays in the round, namely landlord or farmer.
In an embodiment, the mode of determining whether the model to be invoked is a role election model or a policy prediction model specifically includes: and obtaining the role label of each game participant from the current game-match data, determining whether the role labels of each game participant are the same, if the role labels of each game participant are the same, determining that the model to be called is a strategy prediction model, and if at least one game participant has different role labels, determining that the model to be called is a role election model.
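As a sketch, this decision rule reads as follows; the field names are hypothetical.

```python
def model_to_call(current_match_data):
    """If every participant carries the same role label, the roles are
    settled and the strategy prediction model is called; otherwise the
    role election is still in progress."""
    labels = {p["role_label"] for p in current_match_data["participants"]}
    return "strategy_prediction" if len(labels) == 1 else "role_election"
```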
Step S203, if the model to be called is the role election model, calling the role election model to determine a role election label based on the current game data, and generating a role election instruction corresponding to the role election label.
And if the model to be called is the role election model, the server calls the role election model to determine a role election label based on the current game data and generates a role election instruction corresponding to the role election label. The server executes the role election instruction, or sends the role election instruction to the game server, and the game server executes the role election instruction.
The role election labels comprise labels corresponding to role election participation and labels corresponding to role election non-participation, if the role election labels are the labels corresponding to the role election participation, operation of role election participation is executed, and if the role election labels are the labels corresponding to the role election non-participation, operation of role election non-participation is executed.
Specifically, the first vector features and the first type of image features of the corresponding game participant are extracted from the current game match data, where the first vector features and the first type of image features are the features required by the role election model; the first vector features are processed through two fully-connected layers, namely the first and second fully-connected layers, to obtain a first target vector; the first type of image features are convolved through two convolution layers, namely the first and second convolution layers, to obtain a second target vector; the first target vector and the second target vector are spliced through the vector splicing layer to obtain a spliced vector; and the role election label of the game participant is determined through the role election layer based on the spliced vector.
Step S204, if the model to be called is a strategy prediction model, calling the strategy prediction model to determine a strategy prediction result based on the current game match data, and generating a strategy output instruction corresponding to the strategy prediction result.
If the model to be called is the strategy prediction model, the server calls the strategy prediction model, determines a strategy prediction result based on the current game match data, and generates a strategy output instruction corresponding to the strategy prediction result. The server executes the strategy output instruction, or sends it to the game server, which executes it. For example, if the strategy prediction result is 9993, a strategy output instruction corresponding to 9993 is generated, and the server or the game server plays 9993 based on that instruction.
Specifically, second vector characteristics and second class image characteristics of corresponding game participants are extracted from current game-play data, wherein the second vector characteristics and the second class image characteristics are required by a strategy prediction model; processing the second vector characteristics through the first full-connection layer and the second full-connection layer to obtain a first target vector; convolving the second type of image features through the first convolution layer and the second convolution layer to obtain a second target vector; splicing the first target vector and the second target vector through the first vector splicing layer to obtain a first spliced vector; determining a first guess label based on the first stitching vector through a third fully-connected layer, and determining a second guess label based on the first stitching vector through a fourth fully-connected layer; processing the first splicing vector through a fifth full-connection layer, and splicing the first guess label, the second guess label and the processed first splicing vector through a second vector splicing layer to obtain a second splicing vector; determining a master strategy label and a master strategy based on the second splicing vector through the sixth full connection layer, and determining a slave strategy label and a slave strategy based on the second splicing vector through the seventh full connection layer; and taking the master strategy and the slave strategy as strategy prediction results of the game participants.
In one embodiment, the strategy prediction result of the game participant is determined as follows: determine the strategy prediction result according to the master strategy label and slave strategy label output by the strategy prediction model and the current game match data. Specifically, obtain the current card holding information and the historical card playing information of the game participant from the current game match data, and judge whether the previous game participant's play in the historical card playing information is empty. If it is empty, obtain from the current card holding information the card playing information corresponding to the master strategy label and the slave strategy label, and take that card playing information as the strategy prediction result. If it is not empty, obtain from the current card holding information the card playing information corresponding to the master strategy label and the slave strategy label such that the master play is larger than the previous game participant's play, and take that card playing information as the strategy prediction result.
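A sketch of this post-processing step follows. The construction of candidate plays and the card-comparison rule are game-specific, so they are passed in as parameters; all names are assumptions.

```python
def strategy_prediction_result(candidate_plays, last_play, beats):
    """candidate_plays: plays from the current hand that match the
    predicted master and slave strategy labels; last_play: the previous
    participant's play, or None if it was empty; beats: predicate
    deciding whether one play is larger than another."""
    if last_play is None:
        # Previous play empty: any matching play is a valid result.
        return candidate_plays[0] if candidate_plays else None
    bigger = [p for p in candidate_plays if beats(p, last_play)]
    return bigger[0] if bigger else None   # None: no valid play (pass)
```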
Referring to fig. 8, fig. 8 is a schematic diagram of the training and calling of the AI model in an embodiment of the present application. As shown in fig. 8, the scene includes a model training flow and a model calling flow, and the model training flow includes a role election model training flow and a strategy prediction model training flow. In the role election model training flow, the role election features, including the required vector features and class image features, are extracted from game match data, the role election labels are extracted, a sample data set is formed, and the role election model is trained based on the sample data set. In the strategy prediction model training flow, the strategy prediction features, including the required vector features and class image features, are extracted from game match data, the strategy labels, including the guess labels and the strategy prediction labels, are extracted, a sample data set is formed, and the strategy prediction model is trained based on the sample data set. In the model calling flow, the game server sends game match data to the AI server, and the AI server determines whether the model to be called is the role election model or the strategy prediction model; if it is the role election model, the role election model is called and a role election instruction is output to the game server; if it is the strategy prediction model, the strategy prediction model is called and a strategy output instruction is output to the game server.
According to the calling method of the AI model provided by the above embodiment, the AI model can be quickly called through the triggered call instruction to realize role election and card-play prediction, improving the calling speed of the AI model; and because the AI model performs well, role election and card-play prediction can be realized accurately, effectively improving the user experience.
The apparatus provided by the above embodiment may be implemented in a form of a computer program, and the computer program may be run on a server as shown in fig. 9.
Referring to fig. 9, fig. 9 is a schematic block diagram of a server according to an embodiment of the present disclosure. The server may be a single server or a server cluster including a plurality of servers.
As shown in fig. 9, the server includes a processor, a memory and a network interface connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory, and the memory stores the AI models, including a role election model and a strategy prediction model.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any one of the AI model calling methods.
The processor provides computing and control capability and supports the operation of the entire server.
The internal memory provides an environment for running the computer program in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to execute any one of the AI model calling methods. It should be noted that, as will be clearly understood by those skilled in the art, for convenience and brevity of description, the specific working process of the server described above may refer to the corresponding process in the foregoing embodiments of the AI model calling method, and details are not repeated herein.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the servers to which the disclosed aspects apply; a particular server may include more or fewer components than those shown, combine certain components, or arrange components differently.
It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or any conventional processor.
In one embodiment, the processor is configured to execute a computer program stored in the memory to implement the following steps (a minimal dispatch sketch in code follows the list):
determining whether a call instruction of an AI model is triggered, wherein the AI model comprises a role election model and a strategy prediction model;
if a triggered call instruction is detected, obtaining current game data, and determining, according to the current game data, whether the model to be called is the role election model or the strategy prediction model;
if the model to be called is a role election model, calling the role election model to determine a role election label based on the current game data and generating a role election instruction corresponding to the role election label;
and if the model to be called is the strategy prediction model, calling the strategy prediction model, determining a strategy prediction result based on the current game data, and generating a strategy output instruction corresponding to the strategy prediction result.
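For illustration, a minimal dispatch sketch of these steps follows; the `needs_role_election` flag, the model call signatures, and the instruction dictionaries are hypothetical, not interfaces defined by this application:

```python
from typing import Any, Callable, Dict

def handle_call_instruction(game_data: Dict[str, Any],
                            role_model: Callable[[Dict[str, Any]], int],
                            strategy_model: Callable[[Dict[str, Any]], Any]) -> Dict[str, Any]:
    """Dispatch between the two AI models based on the current game data."""
    if game_data.get("needs_role_election"):   # hypothetical flag in the match data
        label = role_model(game_data)          # role election label
        return {"instruction": "role_election", "label": label}
    result = strategy_model(game_data)         # master/slave strategy prediction
    return {"instruction": "strategy_output", "result": result}
```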
In one embodiment, the role election model includes two fully-connected layers, two convolutional layers, a vector splicing layer, and a role election layer, where the role election layer is itself a fully-connected layer. When calling the role election model to determine a role election label based on the current game data, the processor is configured to implement the following steps (an illustrative architecture sketch follows the list):
extracting the first vector feature and the first-type image feature of the corresponding game participant from the current game data;
processing the first vector feature through the two fully-connected layers to obtain a first target vector;
convolving the first-type image feature through the two convolutional layers to obtain a second target vector;
splicing the first target vector and the second target vector through the vector splicing layer to obtain a splicing vector;
determining, by the role election layer, the role election label of the game participant based on the splicing vector.
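For illustration only, a PyTorch sketch of this role election architecture follows; the 64-dimensional vector feature, the single-channel 4x34 image-like plane (copy counts x character types), the hidden sizes, and the four candidate roles are all assumptions:

```python
import torch
import torch.nn as nn

class RoleElectionModel(nn.Module):
    """Two fully-connected layers for the first vector feature, two
    convolutional layers for the first-type image feature, a vector
    splicing (concatenation) layer, and a fully-connected role election
    layer, as described above. All sizes are illustrative assumptions."""

    def __init__(self, vec_dim: int = 64, img_channels: int = 1,
                 n_roles: int = 4):
        super().__init__()
        self.fc = nn.Sequential(                      # two fully-connected layers
            nn.Linear(vec_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        self.conv = nn.Sequential(                    # two convolutional layers
            nn.Conv2d(img_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Role election layer over the splicing vector; 32 * 4 * 34 assumes
        # a 4x34 image plane (copy count rows x character types).
        self.role_election = nn.Linear(128 + 32 * 4 * 34, n_roles)

    def forward(self, vec_feat: torch.Tensor, img_feat: torch.Tensor):
        first_target = self.fc(vec_feat)              # first target vector
        second_target = self.conv(img_feat)           # second target vector
        spliced = torch.cat([first_target, second_target], dim=1)
        return self.role_election(spliced)            # role election logits
```

A forward pass such as `RoleElectionModel()(torch.randn(2, 64), torch.randn(2, 1, 4, 34))` yields one logit per candidate role election label.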
In one embodiment, the strategy prediction model includes a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, a fourth fully-connected layer, a fifth fully-connected layer, a sixth fully-connected layer, a seventh fully-connected layer, a first convolutional layer, a second convolutional layer, a first vector splicing layer, and a second vector splicing layer. When calling the strategy prediction model to determine a strategy prediction result based on the current game data, the processor is configured to implement the following steps (an illustrative architecture sketch follows the list):
extracting the second vector feature and the second-type image feature of the corresponding game participant from the current game data;
processing the second vector feature through the first fully-connected layer and the second fully-connected layer to obtain a first target vector;
convolving the second-type image feature through the first convolutional layer and the second convolutional layer to obtain a second target vector;
splicing the first target vector and the second target vector through the first vector splicing layer to obtain a first splicing vector;
determining, by the third fully-connected layer, a first guess label based on the first splicing vector, and determining, by the fourth fully-connected layer, a second guess label based on the first splicing vector;
processing the first splicing vector through the fifth fully-connected layer, and splicing the first guess label, the second guess label and the processed first splicing vector through the second vector splicing layer to obtain a second splicing vector;
determining, by the sixth fully-connected layer, a master strategy label and a master strategy based on the second splicing vector, and determining, by the seventh fully-connected layer, a slave strategy label and a slave strategy based on the second splicing vector;
and taking the master strategy and the slave strategy as the strategy prediction result of the game participant.
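For illustration only, a PyTorch sketch of this two-stage strategy prediction architecture follows; every dimension (the 128-dimensional vector feature, the six-channel 4x34 image planes, the hidden sizes, and the label counts) is an assumption:

```python
import torch
import torch.nn as nn

class StrategyPredictionModel(nn.Module):
    """Sketch of the two-stage head described above: FC1+FC2 process the
    second vector feature, Conv1+Conv2 the second-type image feature;
    splicing #1 feeds the two guess heads (FC3, FC4); FC5 re-processes the
    first splicing vector, splicing #2 joins it with both guess outputs,
    and FC6/FC7 emit the master and slave strategies."""

    def __init__(self, vec_dim: int = 128, img_channels: int = 6,
                 n_guess: int = 34, n_master: int = 100, n_slave: int = 50):
        super().__init__()
        self.fc1 = nn.Linear(vec_dim, 256)
        self.fc2 = nn.Linear(256, 256)
        self.conv1 = nn.Conv2d(img_channels, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        flat = 32 * 4 * 34                            # assumes 4x34 image planes
        self.fc3 = nn.Linear(256 + flat, n_guess)     # first guess head
        self.fc4 = nn.Linear(256 + flat, n_guess)     # second guess head
        self.fc5 = nn.Linear(256 + flat, 256)
        self.fc6 = nn.Linear(256 + 2 * n_guess, n_master)  # master strategy head
        self.fc7 = nn.Linear(256 + 2 * n_guess, n_slave)   # slave strategy head
        self.relu = nn.ReLU()

    def forward(self, vec_feat: torch.Tensor, img_feat: torch.Tensor):
        v = self.relu(self.fc2(self.relu(self.fc1(vec_feat))))
        i = torch.flatten(self.relu(self.conv2(self.relu(self.conv1(img_feat)))), 1)
        splice1 = torch.cat([v, i], dim=1)            # first splicing vector
        guess1 = self.fc3(splice1)                    # first guess label logits
        guess2 = self.fc4(splice1)                    # second guess label logits
        h = self.relu(self.fc5(splice1))              # processed first splicing vector
        splice2 = torch.cat([guess1, guess2, h], dim=1)  # second splicing vector
        return guess1, guess2, self.fc6(splice2), self.fc7(splice2)
```

Note how the second splicing vector concatenates both guess outputs with the re-processed first splicing vector before the master and slave heads, mirroring the steps above.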
In one embodiment, when determining whether a call instruction of the AI model is triggered, the processor is configured to implement:
monitoring, during the game, whether the game state of a game participant is a game offline state;
when the game state of the game participant is monitored to be the game offline state, triggering the call instruction of the AI model;
and when the game state of the game participant is monitored to be the game online state, not triggering the call instruction of the AI model.
In one embodiment, when determining whether a call instruction of the AI model is triggered, the processor is configured to implement the following steps (a combined sketch of both trigger conditions follows the list):
receiving a game control instruction sent by a game participant terminal, wherein the game control instruction includes a game control tag;
judging whether the game control tag is located in a preset tag group, wherein the preset tag group includes tags corresponding to online hosting, human-machine battle, and fast game matching;
and if the game control tag is not located in the preset tag group, not triggering the call instruction of the AI model.
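For illustration, the two trigger embodiments above can be combined into one sketch, reading the tag check as triggering when the tag is in the group; the state and tag string values are hypothetical stand-ins for the patent's actual labels:

```python
# Hypothetical preset tag group: online hosting, human-machine battle,
# and fast game matching, per the embodiment above.
TRIGGER_TAGS = {"online_hosting", "man_machine_battle", "fast_matching"}

def should_trigger_ai(player_state: str, control_tag: str) -> bool:
    """An offline participant, or a game control instruction whose tag sits
    in the preset tag group, triggers the AI model call instruction."""
    if player_state == "offline":       # first embodiment: offline state
        return True
    return control_tag in TRIGGER_TAGS  # second embodiment: tag group check
```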
An embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed, implement the following steps:
determining whether a call instruction of an AI model is triggered, wherein the AI model comprises a role election model and a strategy prediction model;
if a triggered call instruction is detected, obtaining current game data, and determining, according to the current game data, whether the model to be called is the role election model or the strategy prediction model;
if the model to be called is a role election model, calling the role election model to determine a role election label based on the current game data and generating a role election instruction corresponding to the role election label;
and if the model to be called is the strategy prediction model, calling the strategy prediction model, determining a strategy prediction result based on the current game data, and generating a strategy output instruction corresponding to the strategy prediction result.
In one embodiment, the role election model includes two fully-connected layers, two convolutional layers, a vector splicing layer, and a role election layer, where the role election layer is itself a fully-connected layer. When the role election model is called to determine a role election label based on the current game data, the program instructions, when executed, cause the processor to implement:
extracting the first vector feature and the first-type image feature of the corresponding game participant from the current game data;
processing the first vector feature through the two fully-connected layers to obtain a first target vector;
convolving the first-type image feature through the two convolutional layers to obtain a second target vector;
splicing the first target vector and the second target vector through the vector splicing layer to obtain a splicing vector;
determining, by the role election layer, the role election label of the game participant based on the splicing vector.
In one embodiment, the strategy prediction model includes a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, a fourth fully-connected layer, a fifth fully-connected layer, a sixth fully-connected layer, a seventh fully-connected layer, a first convolutional layer, a second convolutional layer, a first vector splicing layer, and a second vector splicing layer. When the strategy prediction model is called to determine a strategy prediction result based on the current game data, the program instructions, when executed, cause the processor to implement:
extracting the second vector feature and the second-type image feature of the corresponding game participant from the current game data;
processing the second vector feature through the first fully-connected layer and the second fully-connected layer to obtain a first target vector;
convolving the second-type image feature through the first convolutional layer and the second convolutional layer to obtain a second target vector;
splicing the first target vector and the second target vector through the first vector splicing layer to obtain a first splicing vector;
determining, by the third fully-connected layer, a first guess label based on the first splicing vector, and determining, by the fourth fully-connected layer, a second guess label based on the first splicing vector;
processing the first splicing vector through the fifth fully-connected layer, and splicing the first guess label, the second guess label and the processed first splicing vector through the second vector splicing layer to obtain a second splicing vector;
determining, by the sixth fully-connected layer, a master strategy label and a master strategy based on the second splicing vector, and determining, by the seventh fully-connected layer, a slave strategy label and a slave strategy based on the second splicing vector;
and taking the master strategy and the slave strategy as the strategy prediction result of the game participant.
In one embodiment, when determining whether a call instruction of the AI model is triggered, the program instructions, when executed, cause the processor to implement:
monitoring, during the game, whether the game state of a game participant is a game offline state;
when the game state of the game participant is monitored to be the game offline state, triggering the call instruction of the AI model;
and when the game state of the game participant is monitored to be the game online state, not triggering the call instruction of the AI model.
In one embodiment, when determining whether a call instruction of the AI model is triggered, the program instructions, when executed, cause the processor to implement:
receiving a game control instruction sent by a game participant terminal, wherein the game control instruction includes a game control tag;
judging whether the game control tag is located in a preset tag group, wherein the preset tag group includes tags corresponding to online hosting, human-machine battle, and fast game matching;
and if the game control tag is not located in the preset tag group, not triggering the call instruction of the AI model.
The computer-readable storage medium may be an internal storage unit of the server according to the foregoing embodiment, for example, a hard disk or a memory of the server. The computer readable storage medium may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the server.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A training method of an Artificial Intelligence (AI) model is characterized by comprising the following steps:
acquiring a first sample data set, wherein the first sample data set comprises first-type image features, first vector features and labeled role election labels, the first-type image features are used for representing character holding information and information of characters that have not appeared, the horizontal axis being the type of character and the vertical axis being the number of each character, and the first vector features are used for representing role election information;
acquiring a second sample data set, wherein the second sample data set comprises second-type image features, second vector features and labeled strategy labels, the second-type image features comprise a plurality of channels and are used for representing the spatial distribution of character information, the horizontal axis being the type of character and the vertical axis being the number of each character, and the second vector features are used for representing role information, the number of characters held, the role election multiple, character output information, and the comparison result of the character holding information and the character output information;
loading an AI model to be trained, wherein the AI model comprises a role election model and a strategy prediction model, the role election model comprises a vector splicing layer used for splicing the processed first vector features and the processed first-type image features, and the strategy prediction model comprises a first vector splicing layer used for splicing the processed second vector features and the processed second-type image features;
and performing iterative training on the role election model according to the first sample data set until the role election model converges, and performing iterative training on the strategy prediction model according to the second sample data set until the strategy prediction model converges.
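For illustration only (not part of the claims), the image-like feature described in claim 1 can be sketched as a small matrix encoder; the 34 character types and the four-copy cap are mahjong-style assumptions, and `encode_hand` is a hypothetical helper name:

```python
import numpy as np

def encode_hand(hand_counts: list, n_types: int = 34, max_count: int = 4):
    """Encode character-holding information as the image-like plane of
    claim 1: the horizontal axis is the character type, and the vertical
    axis marks how many copies of that character are held."""
    plane = np.zeros((max_count, n_types), dtype=np.float32)
    for t, c in enumerate(hand_counts):
        plane[:c, t] = 1.0   # set the first c rows of column t
    return plane
```

For example, `encode_hand([2, 0, 1] + [0] * 31)` marks two copies of the first character type and one copy of the third.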
2. The AI model training method of claim 1, wherein the role election model comprises two fully-connected layers, two convolutional layers, the vector splicing layer, and a role election layer, the role election layer being one fully-connected layer; the iteratively training the role election model according to the first sample data set until the role election model converges comprises:
acquiring a group of sample data from the first sample data set each time, wherein the sample data comprises first-type image features, first vector features and a role election label;
processing the first vector features through the two fully-connected layers to obtain a first target vector, and convolving the first-type image features through the two convolutional layers to obtain a second target vector;
splicing the first target vector and the second target vector through the vector splicing layer to obtain a splicing vector, and processing the splicing vector through the role election layer to obtain the output probability of the role election label;
calculating a current loss value according to the role election label and the output probability, and determining whether the role election model converges according to the current loss value;
and if the role election model has converged, stopping model training; if the role election model has not converged, updating the parameters of the role election model and continuing to train the updated role election model.
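For illustration, the iterative training of claim 2 can be sketched as follows, assuming a PyTorch model with the signature sketched earlier, a cross-entropy loss over the role election label's output probability, and a loss-difference convergence test in the spirit of claim 4; the optimizer, threshold `eps`, and step cap are arbitrary assumptions:

```python
import torch.nn.functional as F

def train_role_election(model, dataset, optimizer, eps=1e-3, max_steps=100_000):
    """One-group-at-a-time training as in claim 2: forward pass, loss on
    the role election label's output probability, convergence check, and
    parameter update while not converged."""
    prev_loss = None
    for step, (img_feat, vec_feat, label) in enumerate(dataset):
        logits = model(vec_feat, img_feat)       # pre-softmax output
        loss = F.cross_entropy(logits, label)    # current loss value
        if prev_loss is not None and abs(prev_loss - loss.item()) <= eps:
            break                                # converged: stop model training
        optimizer.zero_grad()
        loss.backward()                          # update model parameters
        optimizer.step()
        prev_loss = loss.item()
        if step >= max_steps:
            break
    return model
```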
3. The AI model training method of claim 1, wherein the strategy prediction model comprises a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, a fourth fully-connected layer, a fifth fully-connected layer, a sixth fully-connected layer, a seventh fully-connected layer, a first convolutional layer, a second convolutional layer, the first vector splicing layer, and a second vector splicing layer; the iteratively training the strategy prediction model according to the second sample data set until the strategy prediction model converges comprises:
obtaining a group of sample data from the second sample data set each time, wherein the sample data comprises second-type image features, second vector features and strategy labels, and the strategy labels comprise a first guess label, a second guess label, a master strategy label and a slave strategy label;
processing the second vector features through the first fully-connected layer and the second fully-connected layer to obtain a first target vector;
convolving the second-type image features through the first convolutional layer and the second convolutional layer to obtain a second target vector;
splicing the first target vector and the second target vector through the first vector splicing layer to obtain a first splicing vector;
determining, by the third fully-connected layer, the output probability of the first guess label based on the first splicing vector, and calculating a first loss value according to the first guess label and the corresponding output probability;
determining, by the fourth fully-connected layer, the output probability of the second guess label based on the first splicing vector, and calculating a second loss value according to the second guess label and the corresponding output probability;
processing the first splicing vector through the fifth fully-connected layer, and splicing the first guess label, the second guess label and the processed first splicing vector through the second vector splicing layer to obtain a second splicing vector;
determining, by the sixth fully-connected layer, the output probability of the master strategy label based on the second splicing vector, and calculating a third loss value according to the master strategy label and the corresponding output probability;
determining, by the seventh fully-connected layer, the output probability of the slave strategy label based on the second splicing vector, and calculating a fourth loss value according to the slave strategy label and the corresponding output probability;
determining whether the strategy prediction model converges according to the first loss value, the second loss value, the third loss value and the fourth loss value;
and if the strategy prediction model has converged, stopping model training; if the strategy prediction model has not converged, updating the parameters of the strategy prediction model and continuing to train the updated strategy prediction model.
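A hedged sketch of the four loss values follows, assuming cross-entropy for each head (the claim fixes four loss values but not the loss function) and the `StrategyPredictionModel` sketched earlier:

```python
import torch.nn.functional as F

def strategy_losses(model, img_feat, vec_feat, labels):
    """Compute the four loss values of claim 3, one per output head."""
    guess1, guess2, master, slave = model(vec_feat, img_feat)
    loss1 = F.cross_entropy(guess1, labels["guess1"])  # first loss value
    loss2 = F.cross_entropy(guess2, labels["guess2"])  # second loss value
    loss3 = F.cross_entropy(master, labels["master"])  # third loss value
    loss4 = F.cross_entropy(slave, labels["slave"])    # fourth loss value
    return loss1, loss2, loss3, loss4
```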
4. The AI model training method of claim 3, wherein the determining whether the strategy prediction model converges according to the first loss value, the second loss value, the third loss value and the fourth loss value comprises:
calculating the sum of the first loss value, the second loss value, the third loss value and the fourth loss value, and taking that sum as a total loss value;
acquiring a historical total loss value and calculating the difference between the historical total loss value and the total loss value, wherein the historical total loss value is the total loss value from the previous model training iteration;
determining whether the difference between the historical total loss value and the total loss value is less than or equal to a preset threshold;
if the difference is less than or equal to the preset threshold, determining that the strategy prediction model has converged;
and if the difference is greater than the preset threshold, determining that the strategy prediction model has not converged.
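The convergence test of claim 4 maps directly to a few lines; the preset threshold value here is an illustrative assumption:

```python
def has_converged(loss_values, historical_total, threshold=1e-3):
    """Claim 4's criterion: the total loss value is the sum of the four
    loss values; the model has converged when the absolute difference
    between the historical total loss value (from the previous training
    iteration) and the current total loss value is at most the preset
    threshold."""
    total = sum(float(v) for v in loss_values)
    if historical_total is None:        # first iteration: nothing to compare
        return False, total
    return abs(historical_total - total) <= threshold, total
```

A training loop would call `converged, total = has_converged([loss1, loss2, loss3, loss4], historical_total)` after each iteration and stop once `converged` is true.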
5. A method for calling an AI model, comprising:
determining whether a call instruction of an AI model is triggered, wherein the AI model comprises a role election model and a strategy prediction model, the role election model comprises a vector splicing layer used for splicing the processed first vector features and the processed first-type image features, and the strategy prediction model comprises a first vector splicing layer used for splicing the processed second vector features and the processed second-type image features; the first-type image features are used for representing character holding information and information of characters that have not appeared, the horizontal axis being the type of character and the vertical axis being the number of each character, and the first vector features are used for representing role election information; the second-type image features comprise a plurality of channels and are used for representing the spatial distribution of character information, the horizontal axis being the type of character and the vertical axis being the number of each character, and the second vector features are used for representing role information, the number of characters held, the role election multiple, character output information, and the comparison result of the character holding information and the character output information;
if a triggered call instruction is detected, obtaining current game data, and determining, according to the current game data, whether the model to be called is the role election model or the strategy prediction model;
if the model to be called is the role election model, calling the role election model to determine a role election label based on the current game data and generating a role election instruction corresponding to the role election label;
and if the model to be called is the strategy prediction model, calling the strategy prediction model, determining a strategy prediction result based on the current game data, and generating a strategy output instruction corresponding to the strategy prediction result.
6. The AI model calling method of claim 5, wherein the role election model comprises two fully-connected layers, two convolutional layers, the vector splicing layer, and a role election layer, the role election layer being one fully-connected layer; the calling the role election model to determine a role election label based on the current game data comprises:
extracting the first vector feature and the first-type image feature of the corresponding game participant from the current game data;
processing the first vector feature through the two fully-connected layers to obtain a first target vector;
convolving the first-type image feature through the two convolutional layers to obtain a second target vector;
splicing the first target vector and the second target vector through the vector splicing layer to obtain a splicing vector;
determining, by the role election layer, the role election label of the game participant based on the splicing vector.
7. The AI model calling method of claim 5, wherein the strategy prediction model comprises a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, a fourth fully-connected layer, a fifth fully-connected layer, a sixth fully-connected layer, a seventh fully-connected layer, a first convolutional layer, a second convolutional layer, the first vector splicing layer, and a second vector splicing layer; the calling the strategy prediction model to determine a strategy prediction result based on the current game data comprises:
extracting the second vector feature and the second-type image feature of the corresponding game participant from the current game data;
processing the second vector feature through the first fully-connected layer and the second fully-connected layer to obtain a first target vector;
convolving the second-type image feature through the first convolutional layer and the second convolutional layer to obtain a second target vector;
splicing the first target vector and the second target vector through the first vector splicing layer to obtain a first splicing vector;
determining, by the third fully-connected layer, a first guess label based on the first splicing vector, and determining, by the fourth fully-connected layer, a second guess label based on the first splicing vector;
processing the first splicing vector through the fifth fully-connected layer, and splicing the first guess label, the second guess label and the processed first splicing vector through the second vector splicing layer to obtain a second splicing vector;
determining, by the sixth fully-connected layer, a master strategy label and a master strategy based on the second splicing vector, and determining, by the seventh fully-connected layer, a slave strategy label and a slave strategy based on the second splicing vector;
and taking the master strategy and the slave strategy as the strategy prediction result of the game participant.
8. The AI model calling method of claim 5, wherein the determining whether a call instruction of the AI model is triggered comprises:
receiving a game control instruction sent by a game participant terminal, wherein the game control instruction includes a game control tag;
judging whether the game control tag is located in a preset tag group, wherein the preset tag group includes tags corresponding to online hosting, human-machine battle, and fast game matching;
and if the game control tag is not located in the preset tag group, not triggering the call instruction of the AI model.
9. A server, characterized in that the server comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, the memory storing AI models including a role election model and a policy prediction model, wherein the computer program, when executed by the processor, implements the steps of the calling method of the AI model according to any one of claims 5 to 8.
10. A computer-readable storage medium, characterized in that a computer program is stored thereon, wherein the computer program, when being executed by a processor, carries out the steps of the calling method of the AI model according to any one of claims 5 to 8.
CN201910636868.3A 2019-07-15 2019-07-15 Artificial intelligence AI model training method, calling method, server and readable storage medium Active CN110443284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910636868.3A CN110443284B (en) 2019-07-15 2019-07-15 Artificial intelligence AI model training method, calling method, server and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910636868.3A CN110443284B (en) 2019-07-15 2019-07-15 Artificial intelligence AI model training method, calling method, server and readable storage medium

Publications (2)

Publication Number Publication Date
CN110443284A CN110443284A (en) 2019-11-12
CN110443284B (en) 2022-04-05

Family

ID=68430323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910636868.3A Active CN110443284B (en) 2019-07-15 2019-07-15 Artificial intelligence AI model training method, calling method, server and readable storage medium

Country Status (1)

Country Link
CN (1) CN110443284B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111437608B (en) * 2020-03-24 2023-09-08 腾讯科技(深圳)有限公司 Game play method, device, equipment and storage medium based on artificial intelligence
CN111598169B (en) * 2020-05-18 2023-04-07 腾讯科技(深圳)有限公司 Model training method, game testing method, simulation operation method and simulation operation device
CN111738294B (en) * 2020-05-21 2024-05-14 深圳海普参数科技有限公司 AI model training method, AI model using method, computer device, and storage medium
CN113813610B (en) * 2020-06-19 2024-05-14 北京龙创悦动网络科技有限公司 Game data prediction model training, prediction method, prediction device and prediction system
CN111744187B (en) * 2020-08-10 2022-04-15 腾讯科技(深圳)有限公司 Game data processing method and device, computer and readable storage medium
CN114344889B (en) * 2020-10-12 2024-01-26 腾讯科技(深圳)有限公司 Game strategy model generation method and control method of intelligent agent in game
CN112016704B (en) * 2020-10-30 2021-02-26 超参数科技(深圳)有限公司 AI model training method, model using method, computer device and storage medium
CN112619157B (en) * 2020-12-25 2024-04-30 北京百度网讯科技有限公司 Game fight interaction method and device, electronic equipment, readable medium and product
CN112791411B (en) * 2021-01-25 2024-06-04 网易(杭州)网络有限公司 NPC control model training method and device and electronic equipment
CN113256462B (en) * 2021-05-20 2022-03-18 蓝海领航电子竞技(山东)有限公司 Cloud computing-based electronic contest education system
WO2024092716A1 (en) * 2022-11-04 2024-05-10 富士通株式会社 Information transceiving method and apparatus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447296B (en) * 2014-09-25 2018-01-16 博雅网络游戏开发(深圳)有限公司 Data handling system, the apparatus and method of playing card board type sequence winning sequence sequence
US11185780B2 (en) * 2017-04-24 2021-11-30 International Business Machines Corporation Artificial intelligence profiling
CN108553903B (en) * 2018-04-19 2021-11-23 网易(杭州)网络有限公司 Method and device for controlling robot player
CN109033309B (en) * 2018-07-17 2023-04-07 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN109107161B (en) * 2018-08-17 2019-12-27 深圳市腾讯网络信息技术有限公司 Game object control method, device, medium and equipment
CN109893857B (en) * 2019-03-14 2021-11-26 腾讯科技(深圳)有限公司 Operation information prediction method, model training method and related device

Also Published As

Publication number Publication date
CN110443284A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110443284B (en) Artificial intelligence AI model training method, calling method, server and readable storage medium
US11135514B2 (en) Data processing method and apparatus, and storage medium for concurrently executing event characters on a game client
US20210365782A1 (en) Method and apparatus for generating neural network model, and computer-readable storage medium
CN109621422B (en) Electronic chess and card decision model training method and device and strategy generation method and device
KR102523888B1 (en) Method, Apparatus and Device for Scheduling Virtual Objects in a Virtual Environment
CN111111204B (en) Interactive model training method and device, computer equipment and storage medium
CN111569429B (en) Model training method, model using method, computer device, and storage medium
CN112016704B (en) AI model training method, model using method, computer device and storage medium
CN109460463A (en) Model training method, device, terminal and storage medium based on data processing
CN110782004B (en) Model training method, model calling equipment and readable storage medium
CN107158708A (en) Multi-player video game matching optimization
US10449458B2 (en) Skill matching for a multiplayer session
CN111026272B (en) Training method and device for virtual object behavior strategy, electronic equipment and storage medium
CN111589166A (en) Interactive task control, intelligent decision model training methods, apparatus, and media
CN107526682B (en) Method, device and equipment for generating AI (Artificial Intelligence) behavior tree of test robot
CN111738294B (en) AI model training method, AI model using method, computer device, and storage medium
CN110639208B (en) Control method and device for interactive task, storage medium and computer equipment
CN113343089B (en) User recall method, device and equipment
CN112274925A (en) AI model training method, calling method, server and storage medium
CN114048834A (en) Continuous reinforcement learning non-complete information game method and device based on after-the-fact review and progressive expansion
CN112905013B (en) Agent control method, device, computer equipment and storage medium
CN110772794B (en) Intelligent game processing method, device, equipment and storage medium
CN109731338B (en) Artificial intelligence training method and device in game, storage medium and electronic device
CN113509726B (en) Interaction model training method, device, computer equipment and storage medium
CN112274935A (en) AI model training method, use method, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant