CN112149798B - AI model training method, AI model calling method, apparatus and readable storage medium - Google Patents

Info

Publication number
CN112149798B
CN112149798B (application CN202010912339.4A)
Authority
CN
China
Prior art keywords
model
metadata
sub
training
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010912339.4A
Other languages
Chinese (zh)
Other versions
CN112149798A (en)
Inventor
郭仁杰
王宇舟
武建芳
杨木
张弛
杨正云
杨少杰
李宏亮
刘永升
Current Assignee
Super Parameter Technology Shenzhen Co ltd
Original Assignee
Super Parameter Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Super Parameter Technology Shenzhen Co Ltd
Priority to CN202010912339.4A
Publication of CN112149798A
Application granted
Publication of CN112149798B
Legal status: Active

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (Y02 Technologies for mitigation or adaptation against climate change; Y02D Climate change mitigation technologies in information and communication technologies)

Abstract

The application discloses an AI model training method, an AI model calling method, an apparatus, and a readable storage medium. The AI model training method comprises the following steps: acquiring metadata for training and loading an AI model to be trained, wherein the AI model is implemented based on a deep reinforcement neural network; performing block processing on the metadata belonging to the same frame to obtain a plurality of pieces of sub-metadata; extracting features from the sub-metadata to obtain feature vectors corresponding to the sub-metadata; training the AI model according to the feature vectors, and determining whether the trained AI model converges; if the trained AI model does not converge, returning to the step of acquiring metadata for training to continue the training; and if the trained AI model converges, storing the trained AI model. The number of training samples for model training is thereby increased, which in turn improves the training accuracy and training efficiency of the AI model.

Description

AI model training method, AI model calling method, apparatus and readable storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an AI model training method, an AI model calling method, a computer device, and a computer readable storage medium.
Background
With the rapid development of artificial intelligence (AI) technology, AI has been widely applied in various fields, such as 3D modeling and 3D games, where it can be used to better predict character behaviors in games and give more appropriate feedback, so that users have a better operating experience.
In the conventional game-AI training mode, one process is used per game instance, which often wastes machine resources. The machine resource overhead required for a single game instance is typically very large, especially for large 3D and/or large-map games such as battle-royale ("chicken-eating") games. Meanwhile, a larger map lengthens the whole decision process during model prediction, which poses a great challenge to reinforcement-learning training.
Therefore, there is a need for an AI model training method that improves AI model accuracy and training efficiency.
Disclosure of Invention
The application provides an AI model training method, an AI model calling method, computer equipment and a computer readable storage medium, so as to improve the accuracy and training efficiency of an AI model.
In a first aspect, the present application provides an AI model training method, where the method includes:
acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is realized based on a deep reinforcement neural network;
performing block processing on the metadata belonging to the same frame to obtain a plurality of pieces of sub-metadata;
extracting features of the sub-metadata to obtain feature vectors corresponding to the sub-metadata;
training the AI model according to the feature vector, and determining whether the trained AI model converges or not;
if the trained AI model does not converge, returning to the step of acquiring metadata for training to continue the training; and
if the trained AI model converges, storing the trained AI model.
In a second aspect, the present application further provides an AI model invoking method, where the method includes:
receiving a model calling instruction, wherein the model calling instruction is used for calling a prestored AI model, and the AI model is obtained by adopting the AI model training method described in the first aspect;
calling a corresponding AI model according to the model calling instruction, and receiving uploaded data to be analyzed;
and inputting the data to be analyzed into the AI model to output and obtain a target behavior instruction, and feeding back the target behavior instruction.
In a third aspect, the present application also provides a computer device, the computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the AI model training method and/or AI model invoking method as described above.
In a fourth aspect, the present application also provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the AI model training method and/or AI model invoking method described above.
According to the AI model training method provided by the embodiments of the application, continuous learning and training of the AI model are realized by acquiring the relevant metadata. Specifically, the metadata belonging to the same frame is divided into blocks to obtain a plurality of pieces of sub-metadata; each piece of sub-metadata is then used as a training sample, and the feature vector corresponding to each piece is obtained through feature extraction. The loaded AI model to be trained is trained once according to the obtained feature vectors, and repeated rounds of self-play provide successive training passes until the AI model converges. By blocking the metadata, the number of training samples is increased without increasing the number or duration of simulations, thereby improving the training accuracy of the AI model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating steps of an AI model training method according to one embodiment of the present application;
FIG. 2 is a schematic diagram of a hierarchical structure of an AI model according to one embodiment of the application;
FIG. 3 (a) is a block diagram of a map block method according to an embodiment of the present application;
FIG. 3 (b) is a block diagram illustrating another map block method according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps for obtaining feature vectors according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating the steps for training an AI model according to one embodiment of the application;
FIG. 6 is a flowchart illustrating an AI model calling method according to one embodiment of the present application;
FIG. 7 is a flowchart illustrating steps for obtaining target behavior instructions and feeding back the target behavior instructions according to an embodiment of the present application;
Fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The embodiments of the application provide an AI model training method, an AI model calling method, a computer device, and a computer readable storage medium. The AI model training method can be applied to a server, where the server may be a single server or a server cluster consisting of a plurality of servers.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
It should be noted that, in the following description, the application of the AI model to the 3D game will be described as an example, and it should be appreciated that the AI model may be applied to other 3D scenes as well.
Referring to fig. 1, fig. 1 is a flowchart illustrating the steps of an AI model training method according to an embodiment of the application.
As shown in fig. 1, the AI model training method includes steps S101 to S105.
And step S101, acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is realized based on a deep reinforcement neural network.
Metadata, also called intermediate or mediating data, is data that describes other data ("data about data"). It mainly describes the attribute (property) information of data and supports functions such as indicating storage locations, recording history, resource searching, and file recording. Metadata acts as an electronic catalog: to achieve this cataloging purpose, the contents or characteristics of the data must be described and collected, so that the metadata can assist in data retrieval.
In the present application, metadata is data describing the data generated in a game play; it can be understood as text data in which the data generated in the play is recorded. When the AI model needs to be trained, the recorded metadata is acquired and the AI model to be trained is loaded at the same time, so that the AI model can be trained using the obtained metadata.
In one embodiment, upon detecting a model training initiation instruction, the server will obtain the training metadata and load the AI model to be trained. The model training start instruction is used for triggering a corresponding model training function to enable the server to start training a corresponding AI model, where the model training start instruction may be triggered by an operation performed by a user or may be triggered by the server at a fixed time, and is not specifically limited herein.
The game play data is play data of a 3D game, including but not limited to play data of a 3D FPS (3D first-person shooter) game; it may also be play data of other types of 3D games, which is not limited here.
Game play data is described here by taking a popular 3D FPS mobile game as an example. The play data includes the attributes and characteristics corresponding to characters, supplies, sounds, the poison circle, equipment, global information, and the like. For example, the character data mainly comprises the current attributes of friendly and enemy players; the supplies mainly comprise the attributes of the items visible in a player's field of view; sound refers to the sound generated in the game; the poison circle is the recorded information of the poison circle in the match; the equipment refers to the information on the armor, helmet, and weapon in each weapon slot of a player; and the global information mainly includes the elapsed time of the current game, the number of surviving teammates, the total kills of the player's team, and the like.
After acquiring the metadata for training, the server loads the AI model to be trained, where the AI model is implemented based on a deep reinforcement neural network. As shown in fig. 2, which is a schematic diagram of the hierarchical structure of an AI model according to an embodiment of the present application, the AI model includes a long short-term memory (LSTM) layer and a fully connected layer, and the long short-term memory layer is connected with the fully connected layer.
In some embodiments, during training, the server uses the feature vector as the input of the long short-term memory layer and processes it to obtain a corresponding output vector; the output vector is then used as the input of the fully connected layer and processed to obtain the currently executable behavior instructions. Finally, a probability value corresponding to each behavior instruction is calculated, and the behavior corresponding to the highest probability value is executed, thereby completing one training pass of the AI model.
It should be noted that, for the hierarchical structure of the AI model, other hierarchical structures may be also used, for example, the AI model includes a first fully-connected layer, a long-short-term memory layer, and a second fully-connected layer, where the first fully-connected layer is connected to the long-short-term memory layer, and the long-short-term memory layer is connected to the second fully-connected layer.
The server processes the feature vector through the first fully connected layer to obtain a first output vector, then feeds the first output vector into the long short-term memory layer to obtain a second output vector, and finally uses the second output vector as the input of the second fully connected layer to obtain a third output vector, from which the currently executable behavior instructions are obtained. A probability value corresponding to each behavior instruction is then calculated, and the behavior corresponding to the highest probability value is executed, thereby completing one training pass of the AI model.
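A minimal numpy sketch of the first-fully-connected → LSTM → second-fully-connected variant described above. All layer sizes and weights are arbitrary, untrained placeholders; the final argmax mirrors executing the behavior with the highest probability value.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lstm_step(x, h, c, W):
    # One LSTM step: the four gates are computed from the concatenation [h; x].
    z = W @ np.concatenate([h, x])
    n = h.size
    i, f, o, g = (sigmoid(z[:n]), sigmoid(z[n:2 * n]),
                  sigmoid(z[2 * n:3 * n]), np.tanh(z[3 * n:]))
    c = f * c + i * g
    return o * np.tanh(c), c

# Hypothetical sizes: 8-dim feature vector, 16-dim hidden state, 4 behaviors.
D, H, A = 8, 16, 4
W1, b1 = rng.normal(size=(H, D)) * 0.1, np.zeros(H)   # first fully connected layer
Wl = rng.normal(size=(4 * H, 2 * H)) * 0.1            # LSTM gate weights
W2, b2 = rng.normal(size=(A, H)) * 0.1, np.zeros(A)   # second fully connected layer

x = rng.normal(size=D)                    # one feature vector
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(np.tanh(W1 @ x + b1), h, c, Wl)
probs = softmax(W2 @ h + b2)              # one probability per behavior instruction
action = int(np.argmax(probs))            # execute the highest-probability behavior
```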
And step S102, performing block processing on the metadata belonging to the same frame to obtain a plurality of pieces of sub-metadata.
For the metadata used in model training, more training samples generally ensure better model-training accuracy. However, if a single sample for one training pass contains too much data, that pass takes too long, which reduces training efficiency to a certain extent; the training samples therefore need to be processed reasonably.
After the metadata is obtained, it is first processed to obtain the sample data required for training. For any one match, the data information corresponding to different time points differs; that is, the data contained in one frame differs from that in other frames. In particular, during a game, the play data corresponding to different points in game time differs, i.e., the frames at different time points differ over the whole match. Therefore, when the metadata is processed, the metadata belonging to the same frame is divided into blocks, so that one frame of data is partitioned; this reduces the amount of data contained in each training sample and increases the number of training samples.
Specifically, one frame of data represents the data information corresponding to a certain moment in time.
In one embodiment, when the obtained metadata to be used for training is divided into blocks, the division is applied within each frame of data rather than across different frames. The blocking specifically includes: receiving an input blocking strategy, and performing block processing on the metadata belonging to the same frame according to the blocking strategy to obtain the plurality of pieces of sub-metadata corresponding to the metadata.
When the partitioning processing is performed, a corresponding partitioning strategy is needed, and then the metadata is partitioned according to the selected or set partitioning strategy.
The blocking strategy may be set manually or may be a fixed strategy set in advance. In effect, when the data is divided into blocks, the same frame of data is divided into a plurality of sub-blocks. In a 3D FPS game, the game map is indispensable, so dividing the data amounts to dividing the game map: the data corresponding to one small block at a certain frame time is used as one sample for model training, rather than using the data of the whole map at that frame time as a single training sample.
When the obtained metadata is divided according to the game map, the division can follow the map actually used. For example, when the game map has 5 mutually unconnected areas (islands), each area can be used as one block, as shown in fig. 3(a), to realize the division of the metadata. In practice, if the 5 islands are A, B, C, D, and E, one frame of data is divided into the 5 areas A, B, C, D, and E when the map is divided; the metadata corresponding to each area forms one piece of sub-metadata, and during training the data corresponding to each area is used as one training sample.
In addition, a corresponding blocking rule may be set; specifically, the blocking may be implemented according to coordinates. When the blocking is based on coordinates, a coordinate system of the whole game map is constructed, and the range corresponding to each block, such as the actual coordinates of its end points or boundaries, is then set, as shown in fig. 3(b); the metadata is thereby divided according to the different blocks, for example dividing the map into 6 block maps.
Dividing the metadata is, in effect, dividing the data. After the areas of the different blocks are determined, each frame of metadata is divided according to the boundary information of those areas, so that each frame of metadata is blocked and a plurality of pieces of sub-metadata corresponding to the whole metadata are obtained.
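The boundary-based division of one frame can be sketched as follows; the grid layout, map dimensions, and record format are assumptions made for illustration only.

```python
# Assign each record of one frame to a grid cell of a 2-D map;
# each resulting group is one piece of sub-metadata (one training sample).
def block_frame(records, map_w, map_h, cols, rows):
    cell_w, cell_h = map_w / cols, map_h / rows
    blocks = {}
    for rec in records:
        x, y = rec["pos"][0], rec["pos"][1]
        key = (min(int(x // cell_w), cols - 1),
               min(int(y // cell_h), rows - 1))
        blocks.setdefault(key, []).append(rec)
    return blocks

frame = [{"pos": (10, 12)}, {"pos": (910, 40)}, {"pos": (15, 20)}]
subs = block_frame(frame, map_w=1000, map_h=1000, cols=3, rows=2)
```

Records in the same cell end up in the same sub-metadata block, so one frame yields up to cols x rows training samples.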
And step S103, extracting the characteristics of the sub-metadata to obtain the characteristic vector corresponding to the sub-metadata.
When performing feature extraction, it is first determined which data constitute feature data; feature extraction is then performed to obtain the feature vectors corresponding to the metadata, and the loaded AI model is trained according to the extracted feature vectors.
Therefore, after the metadata is divided into blocks, feature extraction is performed on the obtained sub-metadata to obtain the feature vector corresponding to each piece of sub-metadata. The features to be extracted from the metadata may be preset, and the feature vector in the metadata may be extracted by a preset method.
In practice, since descriptions of the game data are recorded in the metadata, the data information of a game is described there in a concise manner. By reading the data information recorded in the metadata, the features contained in the acquired metadata are determined.
For example, the data information recorded in the metadata includes the attribute information corresponding to characters, supplies, sounds, the poison circle, equipment, and global information. The corresponding features then include: characters, supplies, sounds, the poison circle, equipment, and global information. Because these features relate to the game itself, keywords for the game's features can be set in advance, and the features contained in the metadata can be identified by reading the metadata and matching the keywords.
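The keyword-matching step can be sketched as follows; the keyword list and the textual metadata format are illustrative assumptions, not formats defined by the application.

```python
# Preset keywords for the game's feature categories (hypothetical names).
FEATURE_KEYWORDS = ("character", "material", "sound", "circle", "equipment", "global")

def identify_features(metadata_text):
    # Scan the metadata text and report which feature categories it contains.
    return [kw for kw in FEATURE_KEYWORDS if kw in metadata_text]

meta = "character:pos=(10,12,5);hp=100 equipment:weapon=01 circle:radius=300"
found = identify_features(meta)
```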
In one embodiment, the features contained in the metadata are identified, and vector extraction is then performed on each feature; that is, the attribute information of each feature is extracted to obtain the vector feature corresponding to that feature.
The attribute information corresponding to different features is extracted in different ways, and the extraction method for each feature can be set according to the specific game. Feature extraction is explained below by taking a certain 3D FPS mobile game as an example.
The attribute information corresponding to each feature differs. For example, the character features comprise the current attributes of friendly and enemy players, such as position, blood volume, orientation, speed, and weapon equipment. The supply features include the attributes of the items visible in the player's field of view, such as item location, distance, relative orientation, and supply type, where the supply types include: guns, knives, armor, helmets, medicine, throwables, monuments, and the like. The sound features mainly include the location, relative orientation, and type of the sound source. The poison circle features comprise the current circle center, the current circle radius, the time remaining for the current circle, the center of the next circle, the radius of the next circle, and the like. The equipment features are the information on the armor, helmet, and weapon in each weapon slot of the player. The global information features mainly include the elapsed time of the current game, the number of surviving teammates, the total kills of the player's team, and the like.
In an embodiment, as shown in fig. 4, fig. 4 is a flowchart illustrating a step of obtaining a feature vector according to an embodiment of the present application. Wherein step S103 includes sub-steps S401 to S404.
And in the substep S401, extracting the characteristics of the sub-metadata to obtain the vector characteristics corresponding to the sub-metadata.
After the blocking of the metadata is completed, feature extraction is performed on each obtained piece of sub-metadata to obtain its corresponding vector feature. In practice, the feature vector corresponding to the metadata is extracted, and the AI model to be trained is then trained according to the obtained feature vector. A feature vector comprises both vector features and image features, so when extracting the feature vector from a piece of sub-metadata, both the vector feature and the image feature corresponding to that sub-metadata are obtained.
The vector features are used for describing the characters, and the image features are used for describing the block map corresponding to the current sub-metadata.
In one embodiment, when extracting the vector features corresponding to the sub-metadata, the method specifically includes: identifying the characteristics contained in the sub-metadata, and acquiring attribute information corresponding to the characteristics; and determining the sub-vector features corresponding to the features respectively according to the attribute information to obtain the vector features corresponding to the sub-metadata.
When extracting vector features, it is first determined which features are included in the sub-metadata; the features are therefore identified first, the attribute information corresponding to each feature is then obtained from the sub-metadata, and the vector feature corresponding to the sub-metadata is finally assembled from the different features and their corresponding attribute information.
The metadata is a description of the game data, so the feature and attribute information obtained here is a brief description of that data. For example, for the character feature, attribute information such as a friendly character's blood volume of 100, weapon number 01, and position coordinates (10, 12, 5) may be obtained; the attribute information obtained depends on the actual metadata.
When extracting the vector features, information on each character feature, such as the character's blood volume and its coordinates in the map, can be included, as well as the weapon held by the character; all of this information is then vectorized to obtain the corresponding vector feature. In practice, the vector features include the character's blood volume, its coordinate values, the coordinates of enemies seen in the field of view, the coordinates of visible supplies, the status of weapons currently owned and used, the poison circle status, and the like.
For example, if the character's blood volume is 10, its coordinates in the block map are (34, 68, 10), and the attribute of the held weapon is 01, then the vector feature obtained may be [34, 68, 10, 10, 01]. Similarly, if the desired vector feature is composed of other data, the final vector feature is obtained using the same method; for example, if there are multiple characters, the coordinates, blood volume, weapon attributes, and the like of all characters are acquired.
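A sketch reproducing the worked example above. The field order (coordinates, then blood volume, then weapon attribute) is inferred from that example and is an assumption, not a format fixed by the application.

```python
# Assemble one character's vector feature from its attribute information.
def character_vector(hp, coords, weapon_id):
    return list(coords) + [hp, weapon_id]

# Blood volume 10, block-map coordinates (34, 68, 10), weapon attribute 01.
vec = character_vector(hp=10, coords=(34, 68, 10), weapon_id=1)
```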
And step S402, determining a character entity contained in the sub-metadata, and determining map coordinates corresponding to the character entity.
As described above, the extracted feature vector includes both the vector features describing the game attributes and the image features describing the environment information; the image features are therefore acquired alongside the vector features. To this end, the character entity (agent) contained in the sub-metadata is determined, and the map coordinates corresponding to that character entity are then determined.
The sub-metadata is training sample data for the AI model to be trained. If no character entity exists in the map area corresponding to a piece of sub-metadata, that area generates no sample data useful for model training during the game; sub-metadata is therefore assumed by default to be game data recorded where a character entity exists.
Because game maps are predictable, they are usually recorded and stored in advance; that is, the number of game maps is limited. In a 3D FPS mobile game, even in different game modes, the area of the map used is a reduced portion of the original large map. The map can therefore be recorded and stored in advance for both training and use of the AI model, so that the map data can be read directly whenever it is needed.
After determining the character entity contained in the sub-metadata, determining the map coordinates corresponding to the character entity, and further obtaining the image features corresponding to the sub-metadata according to the determined map coordinates of the character entity.
For a game map, coordinates may be assigned in advance so that each position in the map corresponds to a coordinate. When determining the map coordinates of a character entity, the coordinates of the entity in the undivided map are determined directly to represent its map coordinates.
When determining the coordinates of a character entity, the coordinates on the whole game map are determined, not the coordinates within the block map corresponding to the sub-metadata. Since behavior prediction in subsequent use of the AI model also depends on the character entity's position on the map, the resulting coordinates are those on the entire map.
And step S403, determining a corresponding map image based on the map coordinates, and extracting features of the map image to obtain image features corresponding to the sub-metadata.
After the map coordinates corresponding to the character entities are obtained, determining the corresponding map images according to the map coordinates, and further extracting features of the obtained map images to obtain image features corresponding to the sub-metadata.
In practical application, one piece of sub-metadata corresponds to one map block. Therefore, after the map coordinates corresponding to a character entity are obtained, when the map image corresponding to the character entity is determined, the map block corresponding to the character entity is determined first; after the map block is determined, feature extraction is performed on the map image corresponding to that block, so as to obtain the image features corresponding to the map block.
When the image features are extracted, they are descriptions of the map of the corresponding block, so the data describing the map in the sub-metadata is acquired to obtain the corresponding image features. The map block to which the character entity currently belongs is determined according to its map coordinates, and the corresponding image features are obtained from that block.
The image features are divided into a depth map and a tangent-plane map. The depth map describes the 3D environment seen by the character entity, where the value of each pixel corresponds to the distance from the character entity to that point. The tangent-plane map describes the terrain in a small range around the character entity, where the value of each pixel indicates whether an obstacle exists: a value of 0 means the point is not blocked, and a value of 1 means it is blocked.
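As a concrete illustration of these two feature maps, the sketch below builds them as small arrays. The 8×8 resolution, function names, and sample values are assumptions for illustration only, not details from the patent.

```python
import numpy as np

def build_depth_map(distances, height=8, width=8):
    """Depth map: each pixel stores the distance from the character
    entity to the surface hit along the corresponding view ray."""
    return np.asarray(distances, dtype=np.float32).reshape(height, width)

def build_occupancy_map(obstacle_mask, height=8, width=8):
    """Tangent-plane map: 1 where the local terrain is blocked,
    0 where it is passable."""
    return np.asarray(obstacle_mask, dtype=np.uint8).reshape(height, width)

# Illustrative inputs: 64 ray distances and an alternating obstacle mask.
depth = build_depth_map(np.linspace(1.0, 64.0, 64))
occupancy = build_occupancy_map([i % 2 for i in range(64)])
```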
And step S404, splicing the vector features with the image features to obtain feature vectors corresponding to the sub-metadata.
After the corresponding vector features and image features are obtained through extracting the features of the sub-metadata, the obtained vector features and image features are spliced to obtain feature vectors corresponding to the sub-metadata, and the obtained feature vectors are used for training the AI model.
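The splicing of step S404 can be sketched as a simple concatenation: the image feature maps are flattened and joined to the vector features. The feature names, dimensions, and sample values are illustrative assumptions.

```python
import numpy as np

def build_feature_vector(vector_features, depth_map, occupancy_map):
    """Flatten the image features and splice them after the vector
    features to form the training feature vector."""
    image_features = np.concatenate([depth_map.ravel(), occupancy_map.ravel()])
    return np.concatenate([vector_features, image_features]).astype(np.float32)

vec = np.array([100.0, 3.0, 0.0])   # e.g. health, ammo, crouched flag (assumed)
feat = build_feature_vector(vec, np.zeros((8, 8)), np.ones((8, 8)))
```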
And step S104, training the AI model according to the feature vector.
After extracting the features of each piece of sub-metadata to obtain the corresponding feature vector, the obtained feature vector is input into the AI model to be trained as training data, and whether the AI model obtained after training converges is determined according to the result of each training. Determining whether the trained AI model converges means determining whether the AI model is fully trained, that is, whether it can perform behavior prediction accurately.
As can be seen from the above description, the AI model is composed of two different layers, specifically a long short-term memory layer and a fully connected layer, and the AI model to be trained is trained automatically by inputting the obtained feature vectors.
In some embodiments, the server processes the obtained feature vector through the long short-term memory layer and the fully connected layer to obtain the behavior instruction from each round of training, and then continuously adjusts the related parameter information in the AI model according to the feedback on that behavior instruction, so as to continuously update the AI model.
In some embodiments, as shown in fig. 5, fig. 5 is a flowchart illustrating a step of training an AI model according to an embodiment of the present application. Step S104 includes sub-steps S501 to S503.
In the substep S501, the feature vector is input to the long-short-term memory layer, so as to output and obtain a corresponding output vector.
When the server trains the loaded AI model with the obtained feature vectors, the feature vectors serve as the training input of the AI model. In the specific training process, the AI model can be trained continuously through self-play, that is, training is completed through many repeated rounds, and training is considered complete only when the data obtained in training meets the actual requirements of the AI model.
In a specific training process, since the AI model has two layers and each layer has a corresponding processing mode, the feature vector is input sequentially into the long short-term memory layer and the fully connected layer, completing one round of training.
For a single round of training, the obtained feature vector is input into the long short-term memory layer of the AI model, which analyzes and processes the feature vector to obtain a corresponding output vector; the output vector is then processed to complete one round of training of the AI model.
In practical application, the output vector is obtained by processing the feature vector, so the output vector contains multiple actions that may be executed; after the output vector is obtained, the action corresponding to the sub-metadata is determined.
In some embodiments, when the long short-term memory layer processes its input, the related information it has memorized is read. In general, model training is realized through repeated learning, and this repetition can make use of the self-memory capability of the long short-term memory layer. For each round of training, the long short-term memory layer retains a certain memory of each piece of learned data, such as a character's health change process or an enemy player's movement process, and it predicts the currently available behaviors from the received input together with its memorized data, outputting the prediction in vector form.
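The memory behaviour described above can be sketched with a minimal LSTM cell in NumPy: hidden and cell state are carried across frames, so each prediction can draw on remembered information such as earlier health changes. The weights here are random placeholders, not trained parameters, and all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
IN, HID = 16, 32                      # assumed feature and hidden sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One weight matrix holding the four LSTM gates (input, forget, cell, output).
W = rng.standard_normal((4 * HID, IN + HID)) * 0.1
b = np.zeros(4 * HID)

def lstm_step(x, h, c):
    """Single LSTM step: combine new input x with remembered state (h, c)."""
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update cell memory
    h_new = sigmoid(o) * np.tanh(c_new)                # emit output vector
    return h_new, c_new

# Process three consecutive frames; state persists between them.
h, c = np.zeros(HID), np.zeros(HID)
for _ in range(3):
    h, c = lstm_step(rng.standard_normal(IN), h, c)
```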
In step S502, the output vector is input to the fully-connected layer to output a probability value corresponding to the executable behavior.
After obtaining the output vector corresponding to the executable behaviors, the output vector is input into the fully connected layer for processing, which outputs the probability value corresponding to each executable behavior. The probability value represents how likely the corresponding behavior is to be executed: behaviors with high probability values can be executed directly, while behaviors with low probability values can be ignored.
In some embodiments, the output vector contains all executable behaviors derived from the sub-metadata, where the executable behaviors are related to the actual application scenario. An explanation is given below taking a 3D FPS mobile game as an example.
In general, the actions a player can perform include: moving, aiming left and right, aiming up and down, attacking, taking medicine, rescuing, squatting, and jumping. Only 8 behaviors are assumed here; in practice there are many more, without limitation.
For each behavior, the output vector contains data information related to each behavior, and after the output vector containing the executable behavior is obtained, the output vector is input into a corresponding fully-connected layer for learning, so that different executable behaviors and corresponding executed probability values are obtained.
In this 3D FPS game, the number of sub fully connected layers may be set equal to the number of behaviors, for example 8, each sub fully connected layer representing one behavior. The output vector is input into the sub fully connected layer corresponding to each behavior for learning, that is, the output vector serves as the input of each sub fully connected layer, and the executable probability value corresponding to each behavior is obtained through learning and related processing.
Further, when obtaining the probability value of each executable behavior, Softmax can be used to compute the probability corresponding to each behavior, that is, the probability distribution over all executable behaviors; in actual application, the larger the probability, the more likely the corresponding behavior is to be executed.
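The per-behavior sub fully connected layers and the Softmax step can be sketched as follows: one linear head per behavior scores the LSTM output, and Softmax turns the scores into a probability distribution over the eight behaviors listed earlier. The head weights and dimensions are random placeholders, not trained parameters.

```python
import numpy as np

BEHAVIORS = ["move", "aim_lr", "aim_ud", "attack",
             "medicate", "rescue", "squat", "jump"]

rng = np.random.default_rng(1)
HID = 32
# One sub fully connected head per behavior (placeholder weights).
heads = [rng.standard_normal(HID) * 0.1 for _ in BEHAVIORS]

def softmax(scores):
    """Turn raw behavior scores into a probability distribution."""
    e = np.exp(scores - scores.max())   # subtract max for stability
    return e / e.sum()

output_vector = rng.standard_normal(HID)          # stand-in LSTM output
scores = np.array([w @ output_vector for w in heads])
probs = softmax(scores)
best = BEHAVIORS[int(np.argmax(probs))]           # most likely behavior
```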
And step S503, selecting a behavior corresponding to the maximum probability value in the probability values to generate a target behavior instruction, so as to execute the target behavior instruction to obtain a corresponding behavior excitation value.
Through the learning of the fully connected layer, the probability distribution over the executable behaviors can be obtained. The behavior corresponding to the maximum probability value is selected as the current behavior to be executed; the server then controls the AI model to execute the selected behavior and obtains the behavior excitation value corresponding to its execution.
Training the AI model is a self-play process. When one piece of sub-metadata is analyzed and processed, various behaviors that may need to be instructed at the current time can be determined, but the AI model is not directly controlled to respond to all behaviors that may need to be executed; instead, through each round of learning and play, the most suitable behavior is selected as the behavior for the current time.
In the process of self-play, for each round of learning, that is, each behavior execution, the server gives certain feedback. In general, the feedback given by the server includes positive and negative feedback, so the quality of the result of each round of learning can be determined through the server's feedback.
In some embodiments, the feedback of the server may be quantified, for example as a numerical score: points are added when the feedback result is positive and subtracted when it is negative, and each score serves as the excitation value for the corresponding behavior response. Whether the learning was positive can then be determined from the excitation value.
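A minimal sketch of this quantified feedback, assuming an invented table of events and point values (the patent does not specify concrete scores):

```python
# Hypothetical event-to-score table; all values are illustrative.
REWARDS = {"heal": +5, "hit_enemy": +10, "take_damage": -5, "death": -20}

def behavior_excitation(events):
    """Sum the positive and negative feedback for one behavior execution."""
    return sum(REWARDS.get(e, 0) for e in events)

# One behavior execution produced a heal, a hit, and some damage taken.
total = behavior_excitation(["heal", "hit_enemy", "take_damage"])
```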
Step S105, determining whether the trained AI model converges.
After updating the AI model, it is determined whether the updated AI model converges. Specifically, the behavior excitation values recorded during training of the updated AI model are obtained, and all excitation values generated during training are summed to obtain the total behavior excitation value of the whole training process. It is then determined whether the recorded total behavior excitation value is greater than or equal to a preset behavior excitation value: if so, the updated AI model is determined to have converged; if the total behavior excitation value is smaller than the preset value, the updated AI model is determined not to have converged. It should be noted that the preset behavior excitation value may be set based on the actual situation, which is not particularly limited in the present application.
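The convergence test of step S105 can be sketched as summing the recorded behavior excitation values and comparing the total against a preset threshold; the threshold value below is an arbitrary example.

```python
PRESET_EXCITATION = 100   # preset behavior excitation value (arbitrary example)

def has_converged(excitation_values, threshold=PRESET_EXCITATION):
    """Converged when the summed excitation reaches the preset value."""
    return sum(excitation_values) >= threshold

converged = has_converged([30, 45, 40])   # total 115 >= 100
not_yet = has_converged([10, 20, 15])     # total 45 < 100
```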
And step S106, if the trained AI model converges, storing the trained AI model.
If the updated AI model does not converge, step S104 is executed, in which the AI model is trained according to the feature vector until the updated AI model converges. If the updated AI model converges, stopping model training and storing the updated AI model for subsequent calling.
It should be noted that when the trained AI model has not converged, learning and training are performed again, and the data used for that round is adjusted according to each round of learning and training. For example, if the user's health (blood volume) in the metadata is 10 and becomes 50 after a medicine-taking action is performed, a certain positive stimulus is given; then, in the second round of learning and training, the user's health in the training data may be adjusted from 10 to 50, while the metadata itself is not changed, and the second round of learning and training proceeds.
According to the AI model training method provided by this embodiment, continuous learning and training of the AI model is achieved by obtaining related metadata. Specifically, metadata belonging to the same frame is partitioned into blocks to obtain a plurality of sub-metadata; each sub-metadata is then used as a training sample, and the feature vector corresponding to each sub-metadata is obtained through feature extraction. The loaded AI model to be trained is trained once according to the obtained feature vectors, and through continuous self-play the training is repeated until the AI model converges. By partitioning the metadata, the number of training samples is increased without increasing the number or duration of simulations, thereby improving the training accuracy of the AI model.
Referring to fig. 6, fig. 6 is a flowchart illustrating an AI model calling method according to an embodiment of the application.
As shown in fig. 6, the AI model invoking method includes steps S601 to S603.
Step S601, receiving an AI model calling instruction, where the model calling instruction is used to call a prestored AI model.
The server acquires a model calling instruction, wherein the model calling instruction is used to call a prestored AI model. The AI model is implemented based on a deep reinforcement neural network, and may be stored locally in a software development kit (Software Development Kit, SDK) or stored in a cloud server, which is not particularly limited in the present application.
In some embodiments, the AI model has the hierarchical structure shown in FIG. 2 and includes a long short-term memory layer and a fully connected layer. The server inputs the feature vector into the long short-term memory layer for processing to obtain an output vector, then takes the output vector as the input of the fully connected layer and processes it to obtain the currently executable behavior instructions, and finally calculates the probability value corresponding to each behavior instruction so as to execute the behavior with the highest probability value.
In an embodiment, the invocation instruction of the AI model can be triggered in various ways. For example, it may be determined from the actual operation of the user that the AI model currently needs to be called, or it may be determined from the state information of the device to which the AI model is applied. Specifically, when an application or process running the AI model is in an intelligent processing state, that is, when no actual user is operating it at that time, behavior prediction is performed through the AI model; for example, when an application or process to which the user needs to apply the AI model changes to a non-user processing state, the AI model can, after corresponding adjustment, be used to respond to that application or process.
An explanation is now given taking a 3D FPS mobile game as an example. During the running of the game, the game server keeps a certain record of the actual data of the game state or process. When the AI model is used to predict behavior, the game server needs to upload the recorded data to the AI server; that is, in actual use, a communication connection is established between the game server that records the data and the AI server that controls the invocation and use of the AI model, and once connected to the AI server, the AI model can be invoked to predict behavior in the game.
And step S602, calling a corresponding AI model according to the model calling instruction, and receiving the uploaded data to be analyzed.
After the model calling instruction is acquired, the corresponding AI model is obtained according to the model calling instruction so that behavior prediction can be performed. The AI model is implemented based on a deep reinforcement neural network, and the uploaded data to be analyzed is received when the AI model is called.
In an embodiment, the AI model is called remotely. Specifically, the game server obtains the data generated by the game and uploads it to the AI server associated with the AI model; the AI server uses the AI model to analyze and process the received data to be analyzed to obtain a behavior prediction, then sends the obtained behavior prediction result back to the game server as feedback, and finally the game server responds to the received behavior instruction.
It should be noted that, in addition to placing the AI model in a corresponding AI server, the AI model may also be built into the game server. In that case, when the AI model is called to perform behavior prediction, the data to be analyzed need not be uploaded to an AI server deployed in the cloud, and behavior prediction can be performed directly in the game server.
Since building the AI model into the game server may increase the operational load of the game server and reduce its data processing efficiency, in the present application the AI model is more often called and used remotely, with an AI server set up specifically for the AI model. However, in actual use, how the AI model and the AI server are arranged is not limited; besides the two arrangements mentioned in the present application, other suitable arrangements may be used according to the actual situation.
Step S603, inputting the data to be analyzed into the AI model, so as to output and obtain a target behavior instruction, and feeding back the target behavior instruction.
When the behavior prediction is carried out, the received data to be analyzed is used as the input of the called AI model to be input into the AI model, the AI model is utilized to carry out the behavior prediction to output a corresponding target behavior instruction, and the obtained target behavior instruction is further fed back.
In one embodiment, the AI model has the hierarchical structure shown in FIG. 2 and includes a long short-term memory layer and a fully connected layer. The data to be analyzed is processed successively by the long short-term memory layer and the fully connected layer so as to output the most appropriate behavior instruction corresponding to the data to be analyzed.
Specifically, in an embodiment, referring to fig. 7, fig. 7 is a flowchart illustrating steps of obtaining a target behavior instruction and feeding back the target behavior instruction according to an embodiment of the present application. Step S603 includes steps S701 to S704.
Step S701, determining a person entity included in the data to be analyzed and map coordinates corresponding to the person entity.
When the behavior prediction is carried out, firstly, determining the character entity contained in the received data to be analyzed, and then analyzing the data to be analyzed to obtain the map coordinates corresponding to the determined character entity. In practice, the received data to be analyzed is metadata of a certain frame in game progress, and the map coordinates corresponding to the character entities are determined by identifying the character entities contained in the metadata.
In practical application, the obtained map coordinates of the character entities may be represented by actual coordinates, and the number of character entities included in the received data to be analyzed may be one or more. Map coordinates are determined for all character entities included in the data to be analyzed.
And step S702, obtaining corresponding target data according to the map coordinates.
After the map coordinates contained in the data to be analyzed are determined, corresponding target data are obtained according to the obtained map coordinates, wherein the target data comprise part of the data in the data to be analyzed. In addition, the target data also comprises map information of the environment where the character entity is located, and the map information corresponding to the current position is determined through the obtained map coordinates.
In practical application, when the AI model performs behavior prediction, a large input data size inevitably increases the time used for prediction; reasonably processing the input data can therefore improve prediction efficiency to a certain extent, that is, reduce the decision time of the model.
In the training process of the AI model, the data is preprocessed into blocks, and the model is then trained with the data of the different blocks. The specific partitioning method is not limited: the map may be divided into a plurality of areas (islands) according to its own characteristics, or a fixed division may be used, for example dividing the map into a plurality of identical square areas.
In the use process of the model, after the character entities and the corresponding map coordinates contained in the data to be analyzed are determined, the target data corresponding to each character entity is determined. Specifically, when determining the target data, the currently obtained data to be analyzed can be partitioned according to the data partitioning mode used during model training, and the determined map coordinates of the character entity then indicate which block of data is the current target data. For example, if the map is divided into 5 different islands and the data and map are partitioned by island, and the received data to be analyzed shows that the included character entity is on island A, then the data of island A within the data to be analyzed is taken as the target data.
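The island-based selection of target data can be sketched as follows, assuming invented region bounds and frame-data layout: the block containing the character entity's map coordinates determines which portion of the data to be analyzed becomes the target data.

```python
# Hypothetical island partition: name -> (x_min, x_max, y_min, y_max).
ISLANDS = {
    "A": (0, 100, 0, 100),
    "B": (100, 200, 0, 100),
}

def island_of(x, y):
    """Return the name of the island containing (x, y), or None."""
    for name, (x0, x1, y0, y1) in ISLANDS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    return None

def target_data(frame_data, x, y):
    """Pick the block of the frame's data matching the entity's island."""
    return frame_data.get(island_of(x, y))

# One frame of data to be analyzed, partitioned by island (layout assumed).
frame = {"A": {"entities": ["player1"]}, "B": {"entities": []}}
selected = target_data(frame, 42, 57)    # character entity is on island A
```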
In addition, when determining the target data, instead of following the data or map partitioning method used during model training, a circular area may be determined after the map coordinates are obtained, for example with the character entity's map coordinates as the center and R as the radius, as the map information corresponding to the target data; the data corresponding to the obtained map information is then taken as the target data.
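The circular-area alternative can be sketched as a radius filter around the character entity's map coordinates; the radius and sample points are illustrative.

```python
import math

def points_in_radius(center, points, radius):
    """Keep only the map points within `radius` of the character entity."""
    cx, cy = center
    return [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]

# Entity at the origin; keep map points within R = 5.
nearby = points_in_radius((0, 0), [(1, 1), (3, 4), (10, 10)], radius=5)
```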
It should be noted that, in the process of AI model use and behavior prediction, the prediction of each character entity's behavior is generally realized based on the data of the current frame, so the data to be analyzed is one frame of data corresponding to a moment during the running of the game. For example, one frame of data is obtained at that moment, and when predicting the character entity's behavior, the obtained frame of data is used as the input of the AI model to output a corresponding result, namely the final behavior instruction.
Step S703, extracting features of the target data to obtain feature vectors corresponding to the target data.
After the target data is obtained, feature extraction is carried out on the target data so as to obtain feature vectors corresponding to the target data, wherein the feature vectors are obtained by combining image features and vector features of the target data. Specifically, when extracting features of the target data, the process of extracting features of the sub-metadata in the AI model training process may be referred to. And acquiring attribute information corresponding to each feature by identifying the feature tag contained in the target data, thereby realizing the extraction of the feature vector.
Step S704, inputting the feature vector into the AI model to output and obtain a target behavior instruction.
After the feature vector corresponding to the target data is obtained, the feature vector is input into the AI model for behavior prediction, so that the target behavior instruction corresponding to the data to be analyzed is output. In general, the resulting target behavior instructions include but are not limited to: taking medicine, shooting, moving, squatting, jumping, and so on. For example, when it is determined from the vector features that a medicine-taking action needs to be executed, a medicine-taking behavior instruction is output.
In one embodiment, obtaining the target behavior instruction includes: inputting the feature vector into the long short-term memory layer to output an output vector; inputting the output vector to the fully connected layer to output probability values corresponding to the behaviors; and selecting the behavior corresponding to the maximum probability value among the probability values to generate the target behavior instruction.
When predicting the target behavior, the AI model inputs the feature vector into the long short-term memory layer for processing to obtain an output vector; the output vector is then used as the input of the fully connected layer and processed to obtain the currently executable behavior instructions and the probability value corresponding to each behavior. When determining the target behavior, the behavior with the highest probability value is selected as the current target behavior, and a corresponding target behavior instruction is generated according to the determined target behavior, so that the game server responds to the target behavior instruction upon receiving the feedback of the AI server.
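Selecting the highest-probability behavior and wrapping it in a target behavior instruction (step S704) can be sketched as below; the instruction format and behavior names are assumptions, since the patent does not specify a concrete format.

```python
def target_instruction(behaviors, probs):
    """Pick the behavior with the maximum probability value and wrap it
    in an instruction record for the game server to respond to."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"behavior": behaviors[best], "probability": probs[best]}

cmd = target_instruction(["medicate", "shoot", "move"], [0.7, 0.2, 0.1])
```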
According to the model calling method provided by this embodiment, when the AI model is used for behavior prediction, the obtained data to be analyzed is partitioned into blocks to obtain the effective data for behavior prediction; feature extraction is then performed on the effective data, and the obtained feature vector is used as the input of the AI model for behavior prediction. Partitioning the data during behavior prediction reduces the decision time of the model and improves the efficiency of behavior prediction.
The apparatus provided by the above embodiments may be implemented in the form of a computer program which may be run on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server.
As shown in fig. 8, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any one of AI model training and/or AI model invoking methods.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for the execution of a computer program in a non-volatile storage medium that, when executed by the processor, causes the processor to perform any one of an AI model training method and/or an AI model invocation method.
The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is realized based on a deep reinforcement neural network;
performing block processing on metadata of which the metadata belong to the same frame to obtain a plurality of sub metadata;
extracting features of the sub-metadata to obtain feature vectors corresponding to the sub-metadata;
training the AI model according to the feature vector, and determining whether the trained AI model converges or not;
if the trained AI model does not converge, executing the step of training the AI model according to the feature vector until the trained AI model converges;
and if the trained AI model converges, storing the trained AI model.
In one embodiment, when implementing the blocking processing on the metadata of the same frame, the processor is configured to implement:
and receiving an input blocking strategy, and performing blocking processing on metadata belonging to the same frame in the metadata according to the blocking strategy to obtain a plurality of sub-metadata corresponding to the metadata.
In one embodiment, when implementing the feature extraction on the sub-metadata, the processor is configured to implement:
Extracting features of the sub-metadata to obtain vector features corresponding to the sub-metadata;
determining a character entity contained in the sub-metadata, and determining map coordinates corresponding to the character entity;
determining a corresponding map image based on the map coordinates, and extracting features of the map image to obtain image features corresponding to the sub-metadata;
and splicing the vector features with the image features to obtain feature vectors corresponding to the sub-metadata.
In one embodiment, when implementing the feature extraction on the sub-metadata to obtain the vector feature corresponding to the sub-metadata, the processor is configured to implement:
identifying the characteristics contained in the sub-metadata, and acquiring attribute information corresponding to the characteristics;
and determining the sub-vector features corresponding to the features respectively according to the attribute information to obtain the vector features corresponding to the sub-metadata.
In one embodiment, the processor, when implementing the training of the AI model according to the feature vector, is configured to implement:
inputting the feature vector into the long short-term memory layer to output a corresponding output vector;
Inputting the output vector to the full connection layer to output a probability value corresponding to the executable behavior;
and selecting a behavior corresponding to the maximum probability value in the probability values to generate a target behavior instruction, and executing the target behavior instruction to obtain a corresponding behavior excitation value.
In one embodiment, the processor, when implementing the determining whether the trained AI model converges, is configured to implement:
summing the behavior excitation values obtained by executing the target behavior instructions to obtain a total behavior excitation value;
and determining whether the trained AI model converges according to the total behavior excitation value.
In one embodiment, when implementing the determining whether the trained AI model converges according to the total behavior excitation value, the processor is configured to implement:
comparing the total behavior excitation value with a preset behavior excitation value;
if the total behavior excitation value is greater than or equal to the preset behavior excitation value, determining that the updated AI model converges;
and if the total behavior excitation value is smaller than the preset behavior excitation value, determining that the updated AI model has not converged.
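The convergence test described above reduces to a threshold comparison on the summed rewards; a minimal sketch, in which the preset threshold value is an arbitrary assumption:

```python
def model_converged(behavior_rewards, preset_total=100.0):
    """Sum the per-step behavior excitation values obtained while executing
    target behavior instructions, then compare the total against a preset
    behavior excitation value to decide convergence."""
    total_excitation = sum(behavior_rewards)
    return total_excitation >= preset_total

converged = model_converged([30.0, 40.0, 35.0])   # total 105.0 >= 100.0
not_yet = model_converged([10.0, 20.0])           # total 30.0 < 100.0
```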
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
Receiving a model calling instruction, wherein the model calling instruction is used for calling a pre-stored AI model, and the AI model is obtained by adopting the AI model training method as claimed in any one of claims 1 to 7;
calling a corresponding AI model according to the model calling instruction, and receiving uploaded data to be analyzed;
and inputting the data to be analyzed into the AI model to output and obtain a target behavior instruction, and feeding back the target behavior instruction.
In one embodiment, when implementing the inputting the data to be analyzed into the AI model to output a target behavior instruction, the processor is configured to implement:
determining a character entity contained in the data to be analyzed and map coordinates corresponding to the character entity;
obtaining corresponding target data according to the map coordinates;
extracting features of the target data to obtain feature vectors corresponding to the target data, wherein the feature vectors are obtained by combining image features and vector features of the target data;
and inputting the feature vector into the AI model to output and obtain a target behavior instruction.
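The calling flow can be sketched as a lookup-then-invoke step. The model store, instruction fields, and the stand-in model below are all hypothetical; they only trace the receive-call, load-model, analyze-data, feed-back sequence described above:

```python
def invoke_model(model_store, call_instruction, data_to_analyze):
    """Sketch of the invocation flow: look up the pre-stored AI model named
    by the call instruction, run it on the uploaded data, and return the
    resulting target behavior instruction so it can be fed back."""
    model = model_store[call_instruction["model_name"]]
    feature_vector = data_to_analyze["features"]   # assumed already extracted
    return model(feature_vector)

# Stand-in "model": derives a behavior from the strongest feature.
store = {"game_ai_v1": lambda fv: {"behavior": "move", "score": max(fv)}}
call = {"model_name": "game_ai_v1"}
instruction = invoke_model(store, call, {"features": [0.1, 0.9, 0.3]})
```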
It should be noted that, for convenience and brevity of description, for the specific working processes of the above-described computer device, reference may be made to the corresponding processes in the foregoing AI model training method and/or AI model invoking method embodiments, which are not repeated herein.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored thereon, where the computer program includes program instructions that, when executed, implement the methods described in the various embodiments of the AI model training method and/or AI model invocation method of the present application.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A method of AI model training, the method comprising:
acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is realized based on a deep reinforcement neural network;
performing block processing on metadata belonging to the same frame in the metadata to obtain a plurality of sub-metadata;
extracting features of the sub-metadata to obtain feature vectors corresponding to the sub-metadata;
training the AI model according to the feature vector, and determining whether the trained AI model converges or not;
if the trained AI model does not converge, executing the steps of: training the AI model according to the feature vector, and determining whether the trained AI model converges or not;
if the trained AI model converges, storing the trained AI model;
wherein the performing block processing on the metadata belonging to the same frame in the metadata to obtain a plurality of sub-metadata comprises:
receiving an input blocking strategy, and blocking the metadata belonging to the same frame in the metadata according to the blocking strategy to obtain a plurality of sub-metadata corresponding to the metadata, wherein the blocking strategy comprises blocking according to map coordinates.
2. The method of claim 1, wherein the extracting features of the sub-metadata to obtain feature vectors corresponding to the sub-metadata includes:
extracting features of the sub-metadata to obtain vector features corresponding to the sub-metadata;
determining a character entity contained in the sub-metadata, and determining map coordinates corresponding to the character entity;
determining a corresponding map image based on the map coordinates, and extracting features of the map image to obtain image features corresponding to the sub-metadata;
and splicing the vector features with the image features to obtain feature vectors corresponding to the sub-metadata.
3. The method according to claim 2, wherein the feature extraction of the sub-metadata to obtain the vector features corresponding to the sub-metadata includes:
identifying the features contained in the sub-metadata, and acquiring attribute information corresponding to the features;
and determining the sub-vector features respectively corresponding to the features according to the attribute information to obtain the vector features corresponding to the sub-metadata.
4. The method of claim 3, wherein the AI model comprises a long short-term memory layer and a fully connected layer; and the training the AI model according to the feature vector comprises:
inputting the feature vector into the long short-term memory layer to obtain a corresponding output vector;
inputting the output vector into the fully connected layer to output a probability value corresponding to each executable behavior;
and selecting the behavior corresponding to the maximum probability value to generate a target behavior instruction, and executing the target behavior instruction to obtain a corresponding behavior excitation value.
5. The method of claim 4, wherein the determining whether the trained AI model converges comprises:
summing the behavior excitation values obtained by executing the target behavior instructions to obtain a total behavior excitation value;
and determining whether the trained AI model converges according to the total behavior excitation value.
6. The method of claim 5, wherein the determining whether the trained AI model converges according to the total behavior excitation value comprises:
comparing the total behavior excitation value with a preset behavior excitation value;
if the total behavior excitation value is greater than or equal to the preset behavior excitation value, determining that the updated AI model converges;
and if the total behavior excitation value is smaller than the preset behavior excitation value, determining that the updated AI model has not converged.
7. An AI model invocation method, the method comprising:
receiving a model calling instruction, wherein the model calling instruction is used for calling a pre-stored AI model, and the AI model is obtained by adopting the AI model training method as claimed in any one of claims 1 to 6;
calling a corresponding AI model according to the model calling instruction, and receiving uploaded data to be analyzed;
and inputting the data to be analyzed into the AI model to output and obtain a target behavior instruction, and feeding back the target behavior instruction.
8. The method of claim 7, wherein the inputting the data to be analyzed into the AI model to output a target behavior instruction comprises:
determining a character entity contained in the data to be analyzed and map coordinates corresponding to the character entity;
obtaining corresponding target data according to the map coordinates;
extracting the characteristics of the target data to obtain characteristic vectors corresponding to the target data;
and inputting the feature vector into the AI model to output and obtain a target behavior instruction.
9. A computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the AI model training method of any of claims 1-6, and/or implements the AI model invocation method of claim 7 or 8.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the AI model training method of any of claims 1-6, and/or implements the AI model invocation method of claim 7 or 8.
CN202010912339.4A 2020-09-02 2020-09-02 AI model training method, AI model calling method, apparatus and readable storage medium Active CN112149798B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010912339.4A CN112149798B (en) 2020-09-02 2020-09-02 AI model training method, AI model calling method, apparatus and readable storage medium

Publications (2)

Publication Number Publication Date
CN112149798A CN112149798A (en) 2020-12-29
CN112149798B true CN112149798B (en) 2023-11-24

Family

ID=73889929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010912339.4A Active CN112149798B (en) 2020-09-02 2020-09-02 AI model training method, AI model calling method, apparatus and readable storage medium

Country Status (1)

Country Link
CN (1) CN112149798B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020103723A1 (en) * 2018-11-21 2020-05-28 腾讯科技(深圳)有限公司 Method, apparatus and device for scheduling virtual object in virtual environment
CN111437608A (en) * 2020-03-24 2020-07-24 腾讯科技(深圳)有限公司 Game game-play method, device, equipment and storage medium based on artificial intelligence



Similar Documents

Publication Publication Date Title
Ladosz et al. Exploration in deep reinforcement learning: A survey
KR102523888B1 (en) Method, Apparatus and Device for Scheduling Virtual Objects in a Virtual Environment
US11314988B2 (en) Image aesthetic processing method and electronic device
CN111632379B (en) Game role behavior control method and device, storage medium and electronic equipment
EP3992857A1 (en) Method and device for generating neural network model, and computer-readable storage medium
CN111111204B (en) Interactive model training method and device, computer equipment and storage medium
EP3518105A1 (en) Method and device for data processing and storage medium
CN111589157B (en) AI model using method, apparatus and storage medium
CN112329948A (en) Multi-agent strategy prediction method and device
KR102203252B1 (en) Method and system for collaborative filtering based on generative adversarial networks
Andersen Deep reinforcement learning using capsules in advanced game environments
CN111143178A (en) User behavior analysis method, device and equipment
CN112245934B (en) Data analysis method, device and equipment for virtual resources in virtual scene application
CN112149798B (en) AI model training method, AI model calling method, apparatus and readable storage medium
CN113742580A (en) Target type data recall method and device, electronic equipment and storage medium
Künzel et al. Coping with opponents: multi-objective evolutionary neural networks for fighting games
CN114556331A (en) New frame for less-lens time action positioning
CN114404977B (en) Training method of behavior model and training method of structure capacity expansion model
CN111589158B (en) AI model training method, AI model calling method, apparatus and readable storage medium
CN116510302A (en) Analysis method and device for abnormal behavior of virtual object and electronic equipment
CN116974584A (en) Model deployment method, device, equipment and storage medium
CN112933604B (en) Reinforcement learning model processing method, apparatus, computer device and storage medium
CN114334029A (en) Compound activity prediction method, network training method, device, medium, and apparatus
Angilica et al. Integrating asp-based incremental reasoning in the videogame development workflow (application paper)
KR20190055023A (en) Personalized topic network service method with tree structure of hashtag and system therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant