CN111589158B - AI model training method, AI model calling method, apparatus and readable storage medium - Google Patents
AI model training method, AI model calling method, apparatus and readable storage medium
- Publication number: CN111589158B
- Application number: CN202010415411.2A
- Authority: CN (China)
- Prior art keywords: model, vector, sub, behavior, output
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
- A63F13/67—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6027—Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
Abstract
The application discloses an AI model training method, an AI model calling method, an apparatus, and a readable storage medium, wherein the AI model training method comprises the following steps: acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is implemented based on a deep reinforcement neural network; extracting the features of the entity units contained in the metadata to obtain the vector features corresponding to the metadata; training the AI model according to the vector features, and determining whether the trained AI model converges; if the trained AI model does not converge, executing the steps of: updating the AI model according to the vector features, and determining whether the trained AI model converges; and if the trained AI model converges, storing the trained AI model. The influence of useless information on model accuracy is thereby reduced, and the accuracy of the AI model is improved.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an AI model training method, an AI model calling method, a computer device, and a computer readable storage medium.
Background
With the rapid development of artificial intelligence (Artificial Intelligence, AI) technology, AI is widely applied in various fields. For example, in the field of 3D modeling or 3D games, character behavior can be better predicted through AI technology, giving the user a better operating experience. However, because of their greater spatial complexity and larger state space, 3D games pose greater challenges for AI in 3D virtual environments.
Currently, there are two main implementations. One is a rule-based behavior tree, designed in advance by game planners so that the AI executes specified actions in fixed scenes. The other is based on neural network learning: RGB images of the game interface are used as raw input, features are learned from them, and the model finally predicts the actions the AI should perform.
Both existing implementations have drawbacks. The first is highly limited and yields a weak AI model. The second uses RGB images as the raw data for AI model training; because image data may be displayed inaccurately and contains much useless data, model training efficiency is low and the accuracy of the AI model suffers.
Therefore, there is a need for an AI model training method that improves the accuracy of AI models.
Disclosure of Invention
The application provides an AI model training method, an AI model calling method, a computer device, and a computer readable storage medium, so as to improve the accuracy of an AI model.
In a first aspect, the present application provides an AI model training method, where the method includes:
acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is implemented based on a deep reinforcement neural network;
extracting the characteristics of entity units contained in the metadata to obtain vector characteristics corresponding to the metadata;
training the AI model according to the vector characteristics, and determining whether the trained AI model converges or not;
if the trained AI model does not converge, executing the steps of: updating the AI model according to the vector characteristics, and determining whether the trained AI model converges or not;
and if the trained AI model converges, storing the trained AI model.
In a second aspect, the present application further provides an AI model invoking method, including:
receiving a model calling instruction, wherein the model calling instruction is used for calling a prestored AI model, and the AI model is obtained by adopting the AI model training method described in the first aspect;
calling a corresponding AI model according to the model calling instruction, and receiving uploaded data to be analyzed;
and inputting the data to be analyzed into the AI model to output and obtain a target behavior instruction, and feeding back the target behavior instruction.
In a third aspect, the present application further provides a computer device, the computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the steps of the AI model training method and/or AI model invoking method as described above.
In a fourth aspect, the present application further provides a computer readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the AI model training method and/or AI model invoking method as described above.
In the AI model training method, the AI model calling method, the computer device, and the computer readable storage medium provided by the application, continuous learning and training of the AI model are realized by acquiring the relevant metadata. Specifically, the entity units contained in the metadata are identified and the features corresponding to each entity unit are extracted, giving the sub-vector features of each entity unit and thus the vector features of the metadata as a whole. The loaded AI model to be trained is then trained once according to the obtained vector features, and round after round of learning and training is carried out through continuous self-play until the AI model converges. By training the AI model through entity-unit-based self-play, the influence of useless information on model accuracy is reduced, and the accuracy of the AI model is improved while the learning cost of the model is effectively lowered.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart illustrating steps of a 3D game-based AI model training method in one embodiment of the application;
FIG. 2 is a schematic diagram of a hierarchical structure of an AI model in one embodiment of the application;
FIG. 3 is a flowchart illustrating steps for obtaining vector features according to one embodiment of the present application;
FIG. 4 is a schematic diagram of key points of a human body according to an embodiment of the present application;
FIG. 5 is a flow chart illustrating the steps for training an AI model in accordance with one embodiment of the present application;
FIG. 6 is a flowchart illustrating a step of obtaining a first output vector according to one embodiment of the present application;
FIG. 7 is a flowchart of an AI model invoking method according to one embodiment of the present application;
FIG. 8 is a schematic diagram of a game interface of a 3DFPS game in one embodiment of the present application;
FIG. 9 is a flowchart illustrating steps for obtaining target behavior instructions for feedback according to one embodiment of the present application;
Fig. 10 is a schematic block diagram of a structure of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The embodiment of the application provides an AI model training method, an AI model calling method, a computer device, and a computer readable storage medium. The AI model training method can be applied to a server, where the server can be a single server or a server cluster consisting of a plurality of servers.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
It should be noted that, in the following description, the application of the AI model to the 3D game will be described as an example, and it should be appreciated that the AI model may be applied to other 3D scenes as well.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of an AI model training method according to an embodiment of the present application.
As shown in fig. 1, the AI model training method includes steps S101 to S105.
And step S101, acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is realized based on a deep reinforcement neural network.
Metadata, also called intermediary data or relay data, is data that describes other data (data about data). It mainly describes attribute (property) information of data and is used to support functions such as indicating storage locations, recording history, searching for resources, and keeping file records. Metadata acts as an electronic catalog: to achieve the purpose of cataloging, the contents and characteristics of the data must be described and collected, which in turn assists data retrieval.
In this application, metadata is data that describes the data generated in a game and can be understood as text data: the data generated in the game is described and recorded in text form. When the AI model needs to be trained, the recorded metadata is acquired and, at the same time, the AI model to be trained is loaded, so that the AI model can be trained with the obtained metadata.
In one embodiment, upon detecting a model training initiation instruction, the server will obtain the training metadata and load the AI model to be trained. The model training start instruction is used for triggering a corresponding model training function to enable the server to start training a corresponding AI model, where the model training start instruction may be triggered by an operation performed by a user or may be triggered by the server at a fixed time, and is not specifically limited herein.
The game play data is the play data of a 3D game, including but not limited to the play data of a 3DFPS (3D First Person Shooter) game; it may also be the play data of other types of 3D games, which is not limited here.
Taking a popular 3DFPS mobile game as an example, the game play data comprises the attributes and features corresponding to characters, supplies, sounds, poison circles, equipment, and global information. The characters mainly comprise the current attributes of the player's own side and the enemy players; the supplies mainly comprise the attributes of the items visible within the player's field of view; sound refers to the sounds generated in the game; the poison circle is the recorded information of the poison circle in the current match; the equipment refers to the armor, helmet, and the weapon in each weapon slot owned by the player; and the global information mainly includes the elapsed time of the current match, the number of surviving teammates, the total kills of the player's team, and the like.
After acquiring the metadata for training, the server loads the AI model to be trained, wherein the AI model is implemented based on a deep reinforcement neural network. As shown in fig. 2, which is a schematic hierarchical structure of the AI model in an embodiment of the present application, the AI model includes a first full-connection layer, a long-short-term memory (LSTM) layer, and a second full-connection layer; the first full-connection layer is connected with the long-short-term memory layer, and the long-short-term memory layer is connected with the second full-connection layer.
In some embodiments, the server processes the vector features through the first full-connection layer to obtain a first output vector, then processes the first output vector in the long-short-term memory layer to obtain a second output vector, and finally processes the second output vector as the input of the second full-connection layer to obtain a third output vector, from which the currently executable behavior instructions are derived. Finally, the probability value corresponding to each behavior instruction is calculated, and the behavior corresponding to the highest probability value is executed, thereby completing one round of training of the AI model.
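As an illustration only, a minimal PyTorch sketch of this hierarchy is given below; the layer sizes, the single combined first full-connection layer, and all names are assumptions rather than the patent's concrete implementation.

```python
import torch
import torch.nn as nn

class BehaviorModel(nn.Module):
    """Sketch of the full-connection -> LSTM -> full-connection hierarchy (sizes assumed)."""
    def __init__(self, feature_dim=128, hidden_dim=256, num_behaviors=8):
        super().__init__()
        self.fc1 = nn.Linear(feature_dim, hidden_dim)                   # first full-connection layer
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)   # long-short-term memory layer
        self.fc2 = nn.Linear(hidden_dim, num_behaviors)                 # second full-connection layer

    def forward(self, vector_features, state=None):
        # vector_features: (batch, seq_len, feature_dim)
        first_out = torch.relu(self.fc1(vector_features))   # first output vector
        second_out, state = self.lstm(first_out, state)     # second output vector
        third_out = self.fc2(second_out)                    # third output vector (behavior scores)
        probs = torch.softmax(third_out, dim=-1)            # probability per executable behavior
        return probs, state

# usage sketch
model = BehaviorModel()
x = torch.randn(1, 1, 128)        # one frame of vector features (sizes assumed)
probs, _ = model(x)
action = probs.argmax(dim=-1)     # behavior corresponding to the maximum probability
```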
And step S102, extracting the characteristics of the entity units contained in the metadata to obtain the vector characteristics corresponding to the metadata.
After the training metadata is obtained, features are extracted from the entity units recorded in the metadata, thereby obtaining the vector features that correspond to the metadata and represent its overall attribute information. Specifically, after the metadata is obtained, feature extraction is performed on each entity unit recorded in the metadata, and the extracted features are then processed accordingly to obtain the vector features corresponding to the metadata.
In one embodiment, as shown in fig. 3, step S102 includes sub-steps S301 to S302.
In the substep S301, a plurality of entity units contained in the metadata are identified.
The number of entity units included in the metadata is determined by the game itself; the numbers of entity units corresponding to different games may differ or be the same. Thus, after the metadata is obtained, the entity units contained in it are first identified. The following explanation takes a popular 3DFPS mobile game as an example.
Because descriptions of the game data are recorded in the metadata, the data information of the game is described there in a simple form, and the entity units contained in the acquired metadata are determined by reading the data information recorded in the metadata.
For example, the data information recorded in the metadata includes the attributes and features corresponding to characters, supplies, sounds, poison circles, equipment, and global information. There are then 6 corresponding entity units, specifically: characters, supplies, sounds, poison circles, equipment, and global information. Because the entity units are related to the game itself, keywords for the game's entity units can be set in advance, and the entity units contained in the metadata can be determined by reading information from the metadata and matching these keywords.
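As an illustration, a minimal sketch of such keyword matching follows; the keyword list and the line-based metadata format are assumptions for this example only.

```python
# Hypothetical keyword-based identification of entity units in textual metadata.
ENTITY_KEYWORDS = ["character", "supply", "sound", "poison_circle", "equipment", "global"]

def identify_entity_units(metadata_lines):
    """Return the entity units found in metadata lines assumed to look like 'keyword: ...'."""
    found = []
    for line in metadata_lines:
        key = line.split(":", 1)[0].strip().lower()
        if key in ENTITY_KEYWORDS and key not in found:
            found.append(key)
    return found

print(identify_entity_units(["character: pos=(34,68,5) hp=10", "sound: gunshot NE"]))
# ['character', 'sound']
```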
And step S302, extracting features of each entity unit to obtain sub-vector features corresponding to each entity unit, and associating the sub-vector features with labels corresponding to the corresponding entity units, wherein the sub-vector feature set is the vector features corresponding to the metadata.
The number of entity units contained in the metadata is determined by identifying the entity units contained in the metadata, and then feature extraction is performed on each entity unit, namely the attribute information of each entity unit is extracted, so as to obtain the sub-vector feature corresponding to each entity unit.
The feature extraction methods corresponding to different entity units differ and can be set according to the game. Feature extraction is explained below taking a 3DFPS mobile game as an example.
The features of each entity unit are different. For example, the character features comprise the current attributes of the player's own side and the enemy players, such as position, blood volume, orientation, speed, weapon equipment, and the like; the supply features include the attributes of the items visible in the player's view, such as item position, distance, relative orientation, and supply type, where the supply types include: guns, knives, armor, helmets, medicine, throwables, monuments, and the like; the sound features mainly include the position, relative orientation, and type of the sound source; the poison circle features comprise the center, radius, and remaining time of the current poison circle as well as the center and radius of the next poison circle; the equipment features are the information of the armor, helmet, and the weapon in each weapon slot owned by the player; and the global information features mainly include the elapsed time of the current match, the number of surviving teammates, the total kills of the player's team, and the like.
It should be noted that, because describing a character involves certain particularities, the character is accurately located in this application using human body key points. This is explained below, again taking a 3DFPS mobile game as an example.
In 3D space the character is three-dimensional, so to describe the character's position accurately, the character key points are first identified precisely; these include, but are not limited to, the head, upper arms, forearms, thighs, and calves, as shown in fig. 4. A three-dimensional coordinate system with the attributes x, y, and z is then constructed around the character's shape, where x and y are the coordinates in the horizontal plane and z is the coordinate in the vertical direction, and the character key points are expressed in these three-dimensional coordinates.
This representation allows a character to be located precisely. For example, when shooting at an enemy character, precise aiming is the fastest way to eliminate the opponent, so accurately locating the character's head is essential.
When human body key points are used to locate a character precisely, the position of each key point is expressed here in the coordinate space of the whole game scene, which differs from the character's single in-game position attribute.
After feature extraction of an entity unit, the sub-vector features corresponding to that entity unit are obtained. Taking the character entity unit as an example, the character features include position, blood volume, orientation, speed, weapon attribute, and the like. The character's position is described with the three-dimensional coordinates (x, y, z); the blood volume, orientation, and speed can be recorded directly as a blood volume value, an orientation value (such as 30°, N, and the like), and a speed value; and the weapon attribute can be expressed through a preset feature-correspondence rule, such as a numerical code that classifies and marks the weapon attribute. If the character information is position (34, 68, 5), blood volume 10, orientation N (0°), speed 50 km/h, and weapon attribute 01, then the sub-vector features corresponding to the character features are: [34, 68, 5, 10, 0, 50, 01], where the first three items can be identified as the character's position through a specific convention or description.
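For illustration, a minimal sketch of encoding the character features above into a sub-vector; the field order and the numeric weapon code are assumptions.

```python
# Hypothetical encoding of character features into a sub-vector feature;
# the field order mirrors the example above and is an assumption.
def encode_character(pos, blood, orientation_deg, speed, weapon_code):
    x, y, z = pos
    return [x, y, z, blood, orientation_deg, speed, weapon_code]

sub_vector = encode_character((34, 68, 5), blood=10, orientation_deg=0,
                              speed=50, weapon_code=1)
print(sub_vector)  # [34, 68, 5, 10, 0, 50, 1]
```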
It should be noted that the features contained in each entity unit differ in both type and number, so the items included in the sub-vector features corresponding to each entity unit also differ.
And step S103, training the AI model according to the vector characteristics.
The entity units contained in the metadata are identified, and the features of each entity unit are extracted to obtain its corresponding sub-vector features; the resulting vector features are used as the initial training data of the AI model and are input into the AI model to train it. During training, each training result is used to determine whether the trained AI model converges; determining whether the trained AI model converges is, in effect, determining whether the AI model has been trained well enough to predict behavior accurately.
As can be seen from the above description, the AI model is composed of three different levels, specifically, a first full-connection layer, a long-short-term memory layer, and a second full-connection layer, where the first full-connection layer is connected with the long-short-term memory layer, and the long-short-term memory layer is connected with the second full-connection layer, and the AI model is enabled to perform automatic training by inputting the obtained vector features.
In some embodiments, the server processes the obtained vector features through the first full-connection layer, the long-short-term memory layer, and the second full-connection layer to obtain the behavior instruction produced by each round of training; the server then continuously adjusts the relevant parameters of the AI model according to the feedback generated by executing that behavior instruction, thereby continuously updating the AI model.
In some embodiments, as shown in fig. 5, step S103 includes sub-steps S501-S504.
In the substep S501, the vector feature is input to the first full connection layer, so as to output and obtain a first output vector.
When the server trains the loaded AI model according to the obtained vector features, the vector features serve as the input for training the AI model. In concrete training, the AI model is progressively refined through self-play: training is completed through many repeated rounds, and training is considered finished only when verification of the data produced by training shows that the AI model meets the actual requirements.
In a specific training process, since the AI model has three layers and each layer has a corresponding processing mode, the vector features are input sequentially into the first full-connection layer, the long-short-term memory layer, and the second full-connection layer to complete one round of training.
For a single round of training, the obtained vector features are input into the first full-connection layer of the AI model, which analyzes and processes them to obtain a first output vector. When the vector features are input to the first full-connection layer for training, the features of each entity unit are learned through the first full-connection layer, and the first output vector is output once this learning is completed.
However, since there are several types of entity units rather than just one, and each entity unit is learned in a different manner, the number of sub-first full-connection layers contained in the first full-connection layer is determined by the actual number of entity units, and each entity unit is then learned by the sub-first full-connection layer corresponding to it.
Specifically, in some embodiments, as shown in fig. 6, sub-step S501 includes sub-steps S601 through S602.
In the substep S601, a plurality of sub-vector features included in the vector features are identified, so as to determine a sub-first full connection layer corresponding to each sub-vector feature.
The vector features obtained by feature extraction of the metadata comprise the sub-vector features corresponding to each entity unit. Before the obtained vector features are used as the input data of the AI model, all the sub-vector features can be spliced into one vector feature representing the metadata, which is then used as the input of the AI model. Alternatively, instead of splicing the sub-vector features, the set of sub-vector features can be used directly as the input of the AI model.
When the vector features are input as the training data of the AI model, the sub-vector features of each entity unit must be fed into their respective sub-first full-connection layers, because the feature attributes of each entity unit differ. Specifically, when the vector features are obtained, the sub-vector features contained in them are identified in order to determine the sub-first full-connection layer corresponding to each sub-vector feature.
For the first of the two ways of producing the vector features, where the sub-vector features of the entity units are spliced into the input vector feature, the vector feature is fed directly into every sub-first full-connection layer; each sub-first full-connection layer then selects autonomously, decomposing the vector feature to recover the sub-vector features of the entity unit corresponding to that sub-first full-connection layer, and learns from them.
When the sub-vector features are not spliced, the corresponding sub-first full-connection layer is determined from the information of the entity unit associated with each sub-vector feature. The following explanation takes a 3DFPS mobile game as an example.
Because there are 6 types of entity units, the first full-connection layer includes 6 sub-first full-connection layers, each corresponding to one entity unit, and each sub-first full-connection layer can be marked with a corresponding label. When the vector features are input, the label determines directly which sub-vector feature should be fed into which sub-first full-connection layer for learning. For example, the sub-first full-connection layers can be labeled directly with the names of the entity units: the first as character, the second as supply, the third as sound, the fourth as poison circle, the fifth as equipment, and the sixth as global information. At the same time, each sub-vector feature is associated with its entity unit in advance, so that when the vector features are input, the sub-first full-connection layer corresponding to each sub-vector feature is determined by identifying the sub-vector features contained.
And sub-step S602, inputting each sub-vector feature into the corresponding sub-first full-connection layer to output a sub-first output vector, and splicing the sub-first output vectors to obtain the first output vector.
After a plurality of sub-vector features are obtained through processing the vector features, and sub-first full connection layers corresponding to each sub-vector feature are determined, each sub-vector feature is input into the corresponding sub-first full connection layer for learning so as to output a sub-first output vector corresponding to each sub-vector feature, and then the obtained sub-first output vectors are spliced to obtain a first output vector output by the first full connection layer.
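For illustration, a minimal sketch of per-entity sub-first full-connection layers whose outputs are spliced into the first output vector; the entity names, feature dimensions, and layer size are assumptions.

```python
import torch
import torch.nn as nn

# Assumed sub-vector feature dimensions for the six entity units.
SUB_DIMS = {"character": 7, "supply": 5, "sound": 4,
            "poison_circle": 6, "equipment": 5, "global": 3}

class FirstFullConnectionLayer(nn.Module):
    def __init__(self, out_dim=16):
        super().__init__()
        # one sub-first full-connection layer per entity unit, keyed by its label
        self.subs = nn.ModuleDict({name: nn.Linear(dim, out_dim)
                                   for name, dim in SUB_DIMS.items()})

    def forward(self, sub_vectors):
        # sub_vectors: dict mapping entity label -> tensor of sub-vector features
        outs = [torch.relu(self.subs[name](vec)) for name, vec in sub_vectors.items()]
        return torch.cat(outs, dim=-1)  # spliced first output vector

layer = FirstFullConnectionLayer()
first_out = layer({name: torch.randn(dim) for name, dim in SUB_DIMS.items()})
print(first_out.shape)  # torch.Size([96])
```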
Step S502, the first output vector is input to the long-short-term memory layer to output and obtain a second output vector.
Each sub-vector feature is learned by its sub-first full-connection layer to obtain a full-connection output, and these outputs are spliced into the first output vector. The obtained first output vector is then used as the input of the long-short-term memory layer, which processes it to output a second output vector.
In some embodiments, when the long-short-term memory layer processes the first output vector, it reads the related information it has memorized. In general, a model is trained through repeated learning, and this repetition can make use of the long-short-term memory layer's own memory capability: within one round of training, the long-short-term memory layer retains a certain memory of each piece of data it has learned, such as the progression of a character's blood volume or the movement of an enemy player. The currently available behaviors are predicted from the received first output vector together with the memorized data, and output in vector form.
In step S503, the second output vector is input to the second fully-connected layer to output a probability value corresponding to the executable behavior.
After the second output vector covering the executable behaviors is obtained, it is input into the second full-connection layer for processing, which outputs a probability value for each executable behavior. The probability value represents how likely a behavior is to be executed: a behavior with a high probability value can be executed directly, while behaviors with low probability values can be ignored.
In some embodiments, the second output vector includes all the executable behaviors derived from the metadata, where the executable behaviors depend on the actual application scenario. An explanation is given below taking a 3DFPS mobile game as an example.
In general, the behaviors a character can perform include: moving, aiming left and right, aiming up and down, attacking, taking medicine, rescuing, squatting, and jumping. Only 8 behaviors are assumed here, but in practice there are many more; no limitation is intended. The second output vector includes the data information related to each behavior. After the second output vector covering the executable behaviors is obtained, it is input into the full-connection layer corresponding to each behavior for learning; the second full-connection layer is likewise composed of several sub-second full-connection layers, and their exact number depends on the application scenario. In this 3DFPS game, the number of sub-second full-connection layers can be set equal to the number of behaviors, for example 8, with each sub-second full-connection layer representing one behavior. Inputting the second output vector into the full-connection layer corresponding to each behavior for learning means taking the second output vector as the input of every sub-second full-connection layer, and the executable probability value of each behavior is then obtained through learning and correlation.
Further, when the probability value of each executable behavior is computed, Softmax is used to calculate the probability corresponding to each behavior, i.e., the probability distribution over all the executable behaviors; in actual application, the larger the probability, the more likely the corresponding behavior is to be executed.
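For illustration, a minimal numeric sketch of the Softmax step over assumed raw behavior scores (the values are made up):

```python
import math

# Hypothetical raw scores for the 8 behaviors listed above.
scores = {"move": 2.0, "aim_lr": 0.5, "aim_ud": 0.3, "attack": 1.2,
          "medicine": -0.5, "rescue": -1.0, "squat": 0.1, "jump": 0.0}

exp = {k: math.exp(v) for k, v in scores.items()}
total = sum(exp.values())
probs = {k: v / total for k, v in exp.items()}   # Softmax probability distribution

best = max(probs, key=probs.get)                 # behavior most likely to be executed
print(best, round(probs[best], 2))               # move 0.44
```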
And step S504, selecting a behavior corresponding to the maximum probability value in the probability values to generate a target behavior instruction, and executing the target behavior instruction to obtain a corresponding behavior excitation value.
Through the learning of the second full-connection layer, the probability distribution over the executable behaviors is obtained. The behavior corresponding to the maximum probability value is then selected as the behavior to execute, the server controls the AI model to execute the selected behavior, and the behavior excitation value corresponding to executing that behavior is obtained at the same time.
The training of the AI model is a self-play process. When the metadata is analyzed, the various behaviors that may need to be executed in the current situation can be determined, but the AI model is not directly made to respond to all of them; instead, through one round of learning and play at a time, the most suitable behavior is selected as the response for the current moment.
During self-play, for each round of learning, i.e., each execution of a behavior, the server gives certain feedback. In general, the feedback can be positive or negative, so the quality of each round of learning can be judged from the server's feedback.
In some embodiments, the server's feedback may be quantified, for example as a numerical score: points are added when the feedback is positive and subtracted when it is negative. Each score serves as the excitation value of the corresponding behavior, and whether a round of learning was positive can be judged from the excitation value.
Step S104, determining whether the trained AI model converges.
After the AI model is updated, it is determined whether the updated AI model converges. Specifically, the behavior excitation values recorded while training the updated AI model are obtained, and all the excitation values generated during training are summed to give the total behavior excitation value of the whole training process. It is then determined whether the recorded total behavior excitation value is greater than or equal to a preset behavior excitation value: if so, the updated AI model is determined to have converged; if it is smaller, the updated AI model is determined not to have converged. It should be noted that the preset behavior excitation value may be set according to the actual situation, which is not specifically limited in this application.
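For illustration, a minimal sketch of this convergence check based on summed behavior excitation values; the threshold value is an assumption.

```python
# Hypothetical convergence check over the excitation values of one training round.
PRESET_EXCITATION = 100.0   # assumed preset behavior excitation value

def has_converged(excitation_values):
    """Converged when the total behavior excitation value reaches the preset value."""
    return sum(excitation_values) >= PRESET_EXCITATION

rewards = [5.0, -2.0, 8.0, 3.0]   # excitation values recorded during training
print(has_converged(rewards))     # False -> run another round of training
```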
Step S105, if the trained AI model converges, the trained AI model is stored.
If the updated AI model does not converge, step S103 is executed again, i.e., the AI model is trained according to the vector features, until the updated AI model converges. If the updated AI model converges, model training stops and the updated AI model is stored for subsequent calling.
It should be noted that when the trained AI model has not converged, learning and training are performed again, and the data used for the next round is adjusted according to each round of learning and training. For example, if the character's blood volume in the metadata is 10 and becomes 50 after a medicine-taking behavior is executed, a certain positive excitation is given; the blood volume used for training may then be adjusted from 10 to 50 for the second round of learning and training, while the metadata itself remains unchanged.
In the model training method provided by the above embodiment, continuous learning and training of the AI model are realized by acquiring the relevant metadata: the entity units contained in the metadata are identified, the features corresponding to each entity unit are extracted to obtain the sub-vector features of each entity unit and thus the vector features of the whole metadata, the loaded AI model to be trained is then trained once according to the obtained vector features, and round after round of learning and training is carried out through continuous self-play until the AI model converges. By training the AI model through entity-unit-based self-play, the influence of useless information on model accuracy is reduced, and the accuracy of the AI model is improved while the learning cost of the model is effectively lowered.
Referring to fig. 7, fig. 7 is a flowchart of an AI model calling method according to an embodiment of the disclosure.
As shown in fig. 7, the AI model invoking method includes steps S701 to S704.
Step S701, receiving an AI model calling instruction, where the model calling instruction is used to call a prestored AI model.
The server acquires a model calling instruction, which is used to call a prestored AI model. The AI model is implemented based on a deep reinforcement neural network and may be stored locally in a software development kit (Software Development Kit, SDK) or in a cloud server, which is not specifically limited in this application.
In some embodiments, the hierarchical structure of the AI model is shown in fig. 2: the AI model includes a first full-connection layer, a long-short-term memory layer, and a second full-connection layer, where the first full-connection layer is connected with the long-short-term memory layer and the long-short-term memory layer is connected with the second full-connection layer. The server processes the vector features through the first full-connection layer to obtain a first output vector, processes the first output vector in the long-short-term memory layer to obtain a second output vector, and processes the second output vector as the input of the second full-connection layer to obtain a third output vector, from which the currently executable behavior instructions are derived; finally, the probability value corresponding to each behavior instruction is calculated, and the behavior with the highest probability value is executed.
In an embodiment, there are various ways of triggering the calling instruction of the AI model. For example, it may be determined from the user's actual operation that the AI model currently needs to be called, or it may be determined from the state information of the device to which the AI model is applied. Specifically, when an application or process running the AI model is in an intelligent processing state, i.e., when it is not being operated by an actual user, behavior prediction is performed through the AI model; for example, when an application or process to which the AI model is applied changes to a non-user-operated state, the AI model can, after corresponding adjustment, respond on behalf of that application or process.
This is illustrated with a 3DFPS mobile game, one of whose game interfaces is shown in fig. 8. While the game runs, the game server keeps a certain record of the actual data of the game state and progress. When the AI model is to be used for behavior prediction, the game server needs to upload the recorded data to the AI server; that is, in actual use, a corresponding communication connection is established between the game server recording the data and the AI server controlling the calling and use of the AI model, and once connected to the AI server, the AI model can be called to predict behavior in the game.
And step S702, calling a corresponding AI model according to the model calling instruction, and receiving the uploaded data to be analyzed, wherein the AI model is implemented based on a deep reinforcement neural network.
After the model calling instruction is acquired, the corresponding AI model is obtained according to the model calling instruction so that the corresponding behavior prediction can be performed; the AI model is implemented based on a deep reinforcement neural network, and the uploaded data to be analyzed is received when the AI model is called.
In an embodiment, the AI model is called remotely. Specifically, the game server obtains the data generated by the game and uploads it to the AI server associated with the AI model; the AI server analyzes the received data to be analyzed with the AI model to obtain the corresponding behavior prediction, then sends the obtained behavior prediction result back to the game server as feedback, and the game server finally responds to the received behavior instruction.
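For illustration, a minimal sketch of such a remote call from the game server's side; the endpoint URL, payload format, and response format are all hypothetical.

```python
import json
import urllib.request

def request_behavior(game_state: dict) -> dict:
    """Upload recorded game data to an assumed AI server endpoint and return its prediction."""
    req = urllib.request.Request(
        "http://ai-server.example/predict",            # hypothetical AI server endpoint
        data=json.dumps(game_state).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)                          # e.g. {"behavior": "move"}

# usage sketch (not runnable without a real server):
# instruction = request_behavior({"character": {...}, "sound": [...]})
```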
It should be noted that, in addition to deploying the AI model in a corresponding AI server, the AI model may also be built into the game server. In that case, when the AI model is called for behavior prediction, the data to be analyzed does not need to be uploaded to an AI server deployed in the cloud, and the behavior prediction can be performed directly in the game server.
Since building the AI model into the game server may increase the game server's operating load and reduce its data-processing efficiency, in this application the AI model is preferably deployed together with its corresponding AI server and called remotely. In actual use, however, the arrangement of the AI model and the AI server is not limited; besides the two ways mentioned in this application, other suitable arrangements may be chosen according to the actual situation.
Further, in an embodiment, after the uploaded data to be analyzed is received, it must be determined whether the AI model can currently be used directly for behavior prediction. Specifically: determining whether the current network state of the AI model is abnormal; if the current network state is normal, extracting the features of the data to be analyzed; and if the current network state is abnormal, executing the behavior tree AI and feeding back the obtained output result.
Before the AI model is actually used for behavior prediction, it is necessary to determine whether the AI model can be used directly. The AI model cannot always be used: it is implemented based on a deep reinforcement neural network and obtained through continuous self-learning, so a necessary condition for its normal operation is that the current network state in which it runs is normal, and the AI model cannot be used for behavior prediction when the current network state is abnormal.
Therefore, the current network state of the AI model is firstly acquired to determine whether the AI model is in an abnormal state, and when the current network state is in a normal state, it is determined that the AI model can be used for performing behavior prediction based on the data to be analyzed, otherwise, the AI model cannot be used.
When the current network state is abnormal, the AI model cannot be used to predict behavior from the data to be analyzed, but the current behavior still needs to be predicted. The behavior tree AI, i.e., a set of rules derived from general expert knowledge that can perform simple behavior prediction, is therefore executed on the data to be analyzed. The behavior tree AI can predict fixed responses for one or more specific kinds of data, but it cannot handle unconventional data.
In this application, the AI model is used directly for behavior prediction when its current network state is normal, and the behavior tree AI is used to realize behavior prediction when the network state is abnormal, so that behavior can be predicted accurately under normal conditions and does not become unpredictable because the model is inoperable under abnormal conditions.
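For illustration, a minimal sketch of this dispatch between the AI model and the behavior tree AI; all names and the trivial stand-in predictors are assumptions.

```python
def extract_features(data):
    """Stub standing in for the entity-unit feature extraction described above."""
    return data

def predict_behavior(data, ai_model, behavior_tree, network_ok: bool):
    # Use the AI model when the network state is normal; otherwise fall back
    # to the rule-based behavior tree AI.
    if network_ok:
        return ai_model(extract_features(data))
    return behavior_tree(data)

print(predict_behavior({"enemy_visible": True},
                       ai_model=lambda f: "attack",
                       behavior_tree=lambda d: "attack" if d.get("enemy_visible") else "move",
                       network_ok=False))   # -> attack (from the behavior tree AI)
```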
Step S703, inputting the data to be analyzed into the AI model, so as to output and obtain a target behavior instruction, and feeding back the target behavior instruction.
When behavior prediction is performed, the received data to be analyzed is fed into the called AI model as its input, the AI model performs behavior prediction and outputs the corresponding target behavior instruction, and the obtained target behavior instruction is fed back.
In one embodiment, the hierarchical structure of the AI model is shown in fig. 2: the AI model includes a first full-connection layer, a long-short-term memory layer, and a second full-connection layer, where the first full-connection layer is connected with the long-short-term memory layer and the long-short-term memory layer is connected with the second full-connection layer. The data to be analyzed is processed in turn by the first full-connection layer, the long-short-term memory layer, and the second full-connection layer to output the behavior instruction most appropriate to the data to be analyzed.
Specifically, in an embodiment, referring to fig. 9, step S703 includes steps S901 to S902.
And step S901, extracting features of the data to be analyzed to obtain vector features corresponding to the data to be analyzed.
When the data to be analyzed is obtained, its features are extracted to obtain the corresponding vector features. Specifically, feature extraction is performed on the entity units contained in the data to be analyzed to obtain the vector features corresponding to each entity unit, and thus the vector features corresponding to the data to be analyzed.
In an embodiment, obtaining the vector features corresponding to the data to be analyzed includes: identifying the plurality of entity units contained in the data to be analyzed; and extracting the features of each entity unit to obtain the sub-vector features corresponding to each entity unit, and associating the sub-vector features with the labels of the corresponding entity units, where the set of sub-vector features constitutes the vector features corresponding to the data to be analyzed.
Step S902, inputting the vector feature into the AI model to output and obtain a target behavior instruction.
After the vector features corresponding to the data to be analyzed are obtained, the vector features are input into the AI model for behavior prediction, and the target behavior instruction corresponding to the data to be analyzed is output. Typically, the resulting target behavior instructions include, but are not limited to: taking medicine, shooting, moving, squatting, jumping, and the like. For example, when it is determined from the vector features that a medicine-taking behavior needs to be executed, a medicine-taking behavior instruction is output.
In one embodiment, obtaining the target behavior instruction includes: inputting the vector features to the first full-connection layer to output and obtain a first output vector; inputting the first output vector to the long-term and short-term memory layer to output and obtain a second output vector; inputting the second output vector to a second full-connection layer to output probability values corresponding to the behaviors; and selecting a behavior corresponding to the maximum probability value in the probability values to generate a target behavior instruction.
When the target behavior is determined, the behavior with the highest probability value is selected as the current target behavior, and the corresponding target behavior instruction is generated from the determined target behavior, so that the game server responds to the target behavior instruction when it receives the feedback from the AI server.
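For illustration, a minimal end-to-end inference sketch reusing the BehaviorModel sketch given earlier; the behavior names and shapes are assumptions.

```python
import torch

BEHAVIORS = ["move", "aim_lr", "aim_ud", "attack",
             "medicine", "rescue", "squat", "jump"]   # assumed behavior list

def infer_instruction(model, vector_features):
    """Run one forward pass and return the behavior with the maximum probability."""
    with torch.no_grad():
        probs, _ = model(vector_features)   # probability value per behavior
    idx = int(probs.argmax(dim=-1))         # index of the maximum probability value
    return {"behavior": BEHAVIORS[idx], "prob": float(probs.flatten()[idx])}

# usage sketch, with model = BehaviorModel() from the earlier sketch:
# instruction = infer_instruction(model, torch.randn(1, 1, 128))
```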
In the model calling method provided by the above embodiment, when the AI model is used for behavior prediction, the entity units contained in the data are identified and the vector features contained in each entity unit are extracted, so that the vector features serve as the input of the AI model for behavior prediction. The entity units pinpoint the reference data used for behavior prediction, which reduces the influence of useless data on behavior prediction and improves the accuracy of behavior prediction.
The apparatus provided by the above embodiments may be implemented in the form of a computer program which may be run on a computer device as shown in fig. 10.
Referring to fig. 10, fig. 10 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server.
As shown in fig. 10, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any one of the AI model training method and/or the AI model invoking method.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when the computer program is executed by the processor, it causes the processor to perform any one of the AI model training methods and/or AI model invoking methods.
The network interface is used for network communication, such as transmitting assigned tasks. It will be appreciated by those skilled in the art that the structure shown in fig. 10 is merely a block diagram of some of the structures associated with the present application and does not limit the computer device to which the present application may be applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In one embodiment, the processor is configured to run a computer program stored in the memory to implement the following steps:
acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is realized based on a deep reinforcement neural network;
performing feature extraction on the entity units contained in the metadata to obtain the vector features corresponding to the metadata;
training the AI model according to the vector features, and determining whether the trained AI model converges;
if the trained AI model does not converge, executing the steps of: training the AI model according to the vector features, and determining whether the trained AI model converges;
and if the trained AI model converges, storing the trained AI model.
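For illustration only, the outer loop those steps describe might be sketched as follows; every helper passed in is a hypothetical placeholder rather than an API defined by this application:

```python
# Sketch of the outer training loop; all helpers are hypothetical placeholders.
def train_until_converged(load_model, get_metadata, extract_features,
                          train_step, converged, save_model):
    model = load_model()                         # AI model to be trained
    features = extract_features(get_metadata())  # vector features from metadata
    while True:
        model = train_step(model, features)      # train per the vector features
        if converged(model):                     # e.g. total reward >= preset value
            save_model(model)                    # store the trained AI model
            return model
```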
In one embodiment, when implementing the feature extraction on the entity units included in the metadata to obtain the corresponding vector features, the processor is configured to implement:
identifying a plurality of entity units contained in the metadata;
and performing feature extraction on each entity unit to obtain the sub-vector feature corresponding to each entity unit, and associating each sub-vector feature with the label of its corresponding entity unit, wherein the set of sub-vector features constitutes the vector features corresponding to the metadata.
In one embodiment, when implementing the training of the AI model according to the vector features, the processor is configured to implement:
inputting the vector features into the first fully-connected layer to output a first output vector;
inputting the first output vector into the long short-term memory layer to output a second output vector;
inputting the second output vector into the second fully-connected layer to output the probability value corresponding to each executable behavior;
and selecting the behavior corresponding to the maximum probability value among the probability values to generate a target behavior instruction, and executing the target behavior instruction to obtain the corresponding behavior excitation value.
In one embodiment, when implementing the determining whether the trained AI model converges, the processor is configured to implement:
summarizing the behavior excitation values obtained by executing the target behavior instructions to obtain the total behavior excitation value;
and determining whether the trained AI model converges according to the total behavior excitation value.
In one embodiment, when implementing the determining whether the trained AI model converges according to the total behavior excitation value, the processor is configured to implement:
comparing the total behavior excitation value with a preset behavior excitation value;
if the total behavior excitation value is greater than or equal to the preset behavior excitation value, determining that the updated AI model converges;
and if the total behavior excitation value is smaller than the preset behavior excitation value, determining that the updated AI model is not converged.
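For illustration, a minimal sketch of this convergence test, assuming one scalar behavior excitation value per executed instruction and a hypothetical preset value of 100.0:

```python
# Sketch only: the preset behavior excitation value 100.0 is an assumption.
PRESET_TOTAL_REWARD = 100.0

def model_converged(step_rewards):
    """Sum the behavior excitation values earned by executing the target
    behavior instructions and compare against the preset value."""
    return sum(step_rewards) >= PRESET_TOTAL_REWARD

print(model_converged([3.5, 10.0, 90.0]))  # True: 103.5 >= 100.0, stop training
print(model_converged([3.5, 10.0]))        # False: 13.5 < 100.0, keep training
```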
In one embodiment, when implementing the inputting of the vector features into the first fully-connected layer to output a first output vector, the processor is configured to implement:
identifying the plurality of sub-vector features contained in the vector features, and determining the sub-first fully-connected layer corresponding to each sub-vector feature;
and inputting each sub-vector feature into its corresponding sub-first fully-connected layer to output a sub-first output vector, and splicing the sub-first output vectors to obtain the first output vector.
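For illustration, a PyTorch sketch of those parallel sub-first fully-connected layers; the entity labels and layer widths are assumptions made for the example:

```python
# Sketch only: entity labels and layer widths are illustrative assumptions.
import torch
import torch.nn as nn

class SubFirstLayers(nn.Module):
    def __init__(self, dims, out_dim=16):
        super().__init__()
        # One parallel sub-first fully-connected layer per entity unit type.
        self.subs = nn.ModuleDict({k: nn.Linear(d, out_dim) for k, d in dims.items()})

    def forward(self, sub_features):
        # sub_features: entity label -> sub-vector feature tensor
        outs = [torch.relu(self.subs[k](v)) for k, v in sub_features.items()]
        return torch.cat(outs, dim=-1)  # splice sub-first output vectors together

layer = SubFirstLayers({"player": 4, "enemy": 4, "item": 3})
first_vec = layer({
    "player": torch.randn(4),
    "enemy": torch.randn(4),
    "item": torch.randn(3),
})
print(first_vec.shape)  # torch.Size([48]): the spliced first output vector
```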
In one embodiment, when implementing the inputting of the second output vector into the second fully-connected layer to output the probability value corresponding to each executable behavior, the processor is configured to implement:
inputting the second output vector into each of the plurality of sub-second fully-connected layers to obtain the output result of each sub-second fully-connected layer, so as to determine the probability value corresponding to each executable behavior.
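Likewise, a sketch of the parallel sub-second fully-connected layers, one head per executable behavior; the behavior names and input width are assumptions:

```python
# Sketch only: behavior names and the input width are assumptions.
import torch
import torch.nn as nn

class SubSecondLayers(nn.Module):
    def __init__(self, in_dim=32, behaviors=("take_medicine", "shoot", "move")):
        super().__init__()
        self.behaviors = behaviors
        # One parallel sub-second fully-connected layer per executable behavior.
        self.heads = nn.ModuleList(nn.Linear(in_dim, 1) for _ in behaviors)

    def forward(self, second_vec):
        scores = torch.cat([head(second_vec) for head in self.heads], dim=-1)
        return torch.softmax(scores, dim=-1)  # probability value per behavior

heads = SubSecondLayers()
probs = heads(torch.randn(32))
print(dict(zip(heads.behaviors, probs.tolist())))
```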
Wherein in one embodiment the processor is configured to run a computer program stored in the memory to implement the steps of:
receiving a model calling instruction, wherein the model calling instruction is used for calling a prestored AI model, and the AI model is obtained by adopting the AI model training method;
calling a corresponding AI model according to the model calling instruction, and receiving uploaded data to be analyzed;
and inputting the data to be analyzed into the AI model to output a target behavior instruction, and feeding back the target behavior instruction.
In one embodiment, when implementing the inputting of the data to be analyzed into the AI model to output a target behavior instruction, the processor is configured to implement:
extracting features of the data to be analyzed to obtain the vector features corresponding to the data to be analyzed;
and inputting the vector features into the AI model to output a target behavior instruction.
In one embodiment, before implementing the inputting of the data to be analyzed into the AI model to output a target behavior instruction, the processor is configured to implement:
determining whether a current network state of the AI model is abnormal;
if the current network state is normal, performing feature extraction on the data to be analyzed;
and if the current network state is abnormal, executing a behavior tree AI, and feeding back the obtained output result.
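For illustration, this degradation path can be sketched as follows; the status flag and both callables are hypothetical stand-ins for the deployment's actual components:

```python
# Sketch only: the status flag and both callables are hypothetical stand-ins.
def predict_behavior(data, network_ok, run_model, run_behavior_tree):
    """Serve the request from the AI model when its network state is normal,
    otherwise fall back to the behavior tree AI."""
    if network_ok:
        return run_model(data)      # feature extraction + AI model prediction
    return run_behavior_tree(data)  # degraded but always-available path

instruction = predict_behavior(
    data={"frame": 42},
    network_ok=False,
    run_model=lambda d: "shoot",
    run_behavior_tree=lambda d: "move",
)
print(instruction)  # "move": produced by the behavior tree fallback
```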
In one embodiment, when implementing the inputting of the vector features into the AI model to output a target behavior instruction, the processor is configured to implement:
inputting the vector features into the first fully-connected layer to output a first output vector;
inputting the first output vector into the long short-term memory layer to output a second output vector;
inputting the second output vector into the second fully-connected layer to output the probability value corresponding to each behavior;
and selecting the behavior corresponding to the maximum probability value among the probability values to generate a target behavior instruction.
In one embodiment, when implementing the inputting of the vector features into the first fully-connected layer to output a first output vector, the processor is configured to implement:
identifying the plurality of sub-vector features contained in the vector features, and determining the sub-first fully-connected layer corresponding to each sub-vector feature;
and inputting each sub-vector feature into its corresponding sub-first fully-connected layer to output a sub-first output vector, and splicing the sub-first output vectors to obtain the first output vector.
It should be noted that, for convenience and brevity of description, for the specific working process of the above-described computer device, reference may be made to the corresponding processes in the foregoing AI model training method and/or AI model invoking method embodiments, which are not repeated herein.
Embodiments of the present application also provide a computer-readable storage medium having a computer program stored thereon, where the computer program includes program instructions; for the method implemented when the program instructions are executed, reference may be made to the embodiments of the AI model training method and/or the AI model invoking method of the present application.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are for description only and do not represent the advantages or disadvantages of the embodiments. While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method of AI model training, the method comprising:
acquiring metadata for training, and loading an AI model to be trained, wherein the AI model is realized based on a deep reinforcement neural network, and the metadata is data for describing data generated in a game;
identifying a plurality of entity units contained in the metadata;
extracting features of each entity unit to obtain the sub-vector feature corresponding to each entity unit, wherein the set of sub-vector features is the vector feature corresponding to the metadata;
training the AI model according to the vector features, and determining whether the trained AI model converges, wherein the AI model comprises a first fully-connected layer, the first fully-connected layer comprises a plurality of sub-first fully-connected layers, the plurality of sub-first fully-connected layers are parallel fully-connected layers, and the sub-vector feature corresponding to each entity unit is input to a respective sub-first fully-connected layer;
if the trained AI model does not converge, executing the steps of: training the AI model according to the vector features, and determining whether the trained AI model converges;
and if the trained AI model converges, storing the trained AI model.
2. The method of claim 1, wherein after the sub-vector feature corresponding to each entity unit is obtained, the method further comprises:
and associating the sub-vector features with the labels corresponding to the corresponding entity units.
3. The method of claim 1, wherein the AI model further comprises a long short-term memory layer and a second fully-connected layer, the long short-term memory layer being located between the first fully-connected layer and the second fully-connected layer; the training of the AI model according to the vector features comprises:
inputting the vector features into the first fully-connected layer to output a first output vector;
inputting the first output vector into the long short-term memory layer to output a second output vector;
inputting the second output vector into the second fully-connected layer to output the probability value corresponding to each executable behavior;
and selecting the behavior corresponding to the maximum probability value among the probability values to generate a target behavior instruction, and executing the target behavior instruction to obtain the corresponding behavior excitation value.
4. The method of claim 3, wherein the determining whether the trained AI model converges comprises:
summarizing the behavior excitation values obtained by executing the target behavior instructions to obtain the total behavior excitation value;
and determining whether the trained AI model converges according to the total behavior excitation value.
5. The method of claim 4, wherein the determining whether the trained AI model converges according to the total behavior excitation value comprises:
comparing the total behavior excitation value with a preset behavior excitation value;
if the total behavior excitation value is greater than or equal to the preset behavior excitation value, determining that the updated AI model converges;
and if the total behavior excitation value is smaller than the preset behavior excitation value, determining that the updated AI model is not converged.
6. The method of claim 3, wherein the inputting the vector features into the first fully-connected layer to output a first output vector comprises:
identifying the plurality of sub-vector features contained in the vector features, and determining the sub-first fully-connected layer corresponding to each sub-vector feature;
and inputting each sub-vector feature into its corresponding sub-first fully-connected layer to output a sub-first output vector, and splicing the sub-first output vectors to obtain the first output vector.
7. The method of claim 3, wherein the second fully-connected layer comprises a plurality of sub-second fully-connected layers, the plurality of sub-second fully-connected layers are parallel fully-connected layers, and each sub-second fully-connected layer is associated with one executable behavior; the inputting the second output vector into the second fully-connected layer to output the probability value corresponding to the executable behavior comprises:
inputting the second output vector into each of the plurality of sub-second fully-connected layers to obtain the output result of each sub-second fully-connected layer, so as to determine the probability value corresponding to each executable behavior.
8. An AI model invocation method, the method comprising:
receiving a model calling instruction, wherein the model calling instruction is used for calling a prestored AI model, and the AI model is obtained by adopting the AI model training method as set forth in any one of claims 1 to 7;
calling a corresponding AI model according to the model calling instruction, and receiving uploaded data to be analyzed;
and inputting the data to be analyzed into the AI model to output a target behavior instruction, and feeding back the target behavior instruction.
9. A computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program when executed by the processor implements the AI model training method of any of claims 1-7, and/or implements the AI model invocation method of claim 8.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the AI model training method of any of claims 1-7, and/or implements the AI model invocation method of claim 8.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010415411.2A | 2020-05-15 | 2020-05-15 | AI model training method, AI model calling method, apparatus and readable storage medium |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111589158A | 2020-08-28 |
| CN111589158B | 2024-03-12 |
Family

ID=72183502

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010415411.2A (granted as CN111589158B, Active) | AI model training method, AI model calling method, apparatus and readable storage medium | 2020-05-15 | 2020-05-15 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN111589158B |
Citations (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109682392A * | 2018-12-28 | 2019-04-26 | 山东大学 | Vision navigation method and system based on deep reinforcement learning |

Family Cites Families (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11774944B2 * | 2016-05-09 | 2023-10-03 | Strong Force Iot Portfolio 2016, Llc | Methods and systems for the industrial internet of things |
Also Published As

| Publication Number | Publication Date |
|---|---|
| CN111589158A | 2020-08-28 |
Similar Documents

| Publication | Title |
|---|---|
| KR102360420B1 | Customized models for imitating player gameplay in a video game |
| US11779837B2 | Method, apparatus, and device for scheduling virtual objects in virtual environment |
| US11023732B2 | Unsupervised classification of gameplay video using machine learning models |
| CN111744193A | Generating a virtual representation of a game player using a game style pattern |
| CN108920213B | Dynamic configuration method and device of game |
| CN109091869A | Method of controlling operation, device, computer equipment and the storage medium of virtual objects |
| CN111589157B | AI model using method, apparatus and storage medium |
| US20050197739A1 | Behavior controlling system and behavior controlling method for robot |
| CN108888958A | Virtual object control method, device, equipment and storage medium in virtual scene |
| KR20190056720A | Method and device for learning neural network |
| CN112791414B | Plug-in recognition model training method and device, electronic equipment and storage medium |
| US20200122039A1 | Method and system for a behavior generator using deep learning and an auto planner |
| CN114404977B | Training method of behavior model and training method of structure capacity expansion model |
| CN112905013B | Agent control method, device, computer equipment and storage medium |
| Weitkamp et al. | Visual rationalizations in deep reinforcement learning for atari games |
| US11857868B1 | Automated controller configuration recommendation system |
| CN112245934B | Data analysis method, device and equipment for virtual resources in virtual scene application |
| CN111589158B | AI model training method, AI model calling method, apparatus and readable storage medium |
| CN110314379B | Learning method of action output deep training model and related equipment |
| US20120221504A1 | Computer implemented intelligent agent system, method and game system |
| CN112149798B | AI model training method, AI model calling method, apparatus and readable storage medium |
| CN112933600B | Virtual object control method, device, computer equipment and storage medium |
| CN113476833B | Game action recognition method, game action recognition device, electronic equipment and storage medium |
| US20210174910A1 | Method and apparatus for generating new chemical structure using neural network |
| CN110738318B | Network structure operation time evaluation and evaluation model generation method, system and device |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |