WO2020119737A1 - Method for information prediction, method for model training, and server - Google Patents

Method for information prediction, method for model training, and server

Info

Publication number
WO2020119737A1
Authority
WO
WIPO (PCT)
Prior art keywords
trained
label
feature
predicted
target
Prior art date
Application number
PCT/CN2019/124681
Other languages
English (en)
French (fr)
Inventor
李宏亮
王亮
施腾飞
袁博
杨少杰
誉洪生
殷尹玉婷
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to KR1020217017878A priority Critical patent/KR102542774B1/ko
Priority to EP19896168.2A priority patent/EP3896611A4/en
Priority to JP2021512924A priority patent/JP7199517B2/ja
Publication of WO2020119737A1 publication Critical patent/WO2020119737A1/zh
Priority to US17/201,152 priority patent/US20210201148A1/en

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/30 Interconnection arrangements between game servers and game devices; Interconnection arrangements between game devices; Interconnection arrangements between game servers
    • A63F13/35 Details of game servers
    • A63F13/50 Controlling the output signals based on the game progress
    • A63F13/53 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
    • A63F13/537 Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game using indicators, e.g. showing the condition of a game character on screen
    • A63F13/5378 Controlling the output signals based on the game progress involving additional visual information provided to the game scene using indicators, e.g. showing the condition of a game character on screen for displaying an additional top view, e.g. radar screens or maps
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 Generating or modifying game content before or while executing the game program adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • A63F13/80 Special adaptations for executing a specific game genre or game mode
    • A63F13/822 Strategy games; Role-playing games
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G06F18/25 Fusion techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular, to an information prediction method, a model training method, and a server.
  • FIG. 1 is a schematic diagram of a layered model in the related art.
  • the decision-making is divided according to overall-view scenarios such as "jungling", "clearing soldiers", "team battles", and "pushing towers".
  • FIG. 2 is a schematic structural diagram of a layered model in the related art.
  • the overall view model is established using the overall view feature
  • the micro operation model is established using the micro operation feature.
  • the overall view model outputs the overall view label, and the micro operation model outputs the micro operation label.
  • the embodiments of the present application provide a method for information prediction, a method for model training, and a server. Only one joint model is needed to predict both the micro operation and the overall view, which effectively solves the hard handover problem in the layered model and improves the convenience of prediction.
  • the first aspect of the present application provides a method for information prediction, including: acquiring an image to be predicted; extracting a set of features to be predicted in the image to be predicted, wherein the set of features to be predicted includes a first feature to be predicted, a second feature to be predicted, and a third feature to be predicted, the first feature to be predicted represents an image feature of a first area, the second feature to be predicted represents an image feature of a second area, the third feature to be predicted represents an attribute feature related to the interactive operation, and the range of the first area is smaller than the range of the second area;
  • acquiring a first label and/or a second label corresponding to the feature set to be predicted through a target joint model, wherein the first label represents a label related to operation content, and the second label represents a label related to operation intention.
  • a second aspect of the present application provides a method for model training, including: acquiring a set of images to be trained, wherein the set of images to be trained includes N images to be trained, where N is an integer greater than or equal to 1; extracting a set of features to be trained in each image to be trained, wherein the set of features to be trained includes a first feature to be trained, a second feature to be trained, and a third feature to be trained, the first feature to be trained represents an image feature of a first region, the second feature to be trained represents an image feature of a second region, the third feature to be trained represents an attribute feature related to the interactive operation, and the range of the first region is smaller than the range of the second region; obtaining a first to-be-trained label and a second to-be-trained label corresponding to each of the images to be trained, wherein the first to-be-trained label represents a label related to operation content, and the second to-be-trained label represents a label related to operation intention; and training to obtain a target joint model according to the feature set to be trained in each of the images to be trained and the first to-be-trained label and the second to-be-trained label corresponding to each of the images to be trained.
  • the third aspect of the present application provides a server, including:
  • the acquisition module is used to acquire the image to be predicted
  • An extraction module for extracting a set of features to be predicted in the to-be-predicted image acquired by the acquiring module, wherein the set of features to be predicted includes a first feature to be predicted, a second feature to be predicted, and a third feature to be predicted ,
  • the first feature to be predicted represents the image feature of the first area
  • the second feature to be predicted represents the image feature of the second area
  • the third feature to be predicted represents the attribute feature related to the interactive operation, and the range of the first region is smaller than the range of the second region;
  • the acquiring module is further configured to acquire the first tag and the second tag corresponding to the feature set to be predicted extracted by the extracting module through a target joint model, where the first tag represents a tag related to the operation content, and the second tag represents a tag related to the operation intention.
  • the obtaining module is configured to obtain the first label, the second label, and the third label corresponding to the feature set to be predicted through the target joint model, wherein the third label represents a label related to the win-or-loss situation.
  • the fourth aspect of the present application provides a server, including:
  • An obtaining module configured to obtain a set of images to be trained, wherein the set of images to be trained includes N images to be trained, where N is an integer greater than or equal to 1;
  • An extraction module for extracting a set of features to be trained in each image to be trained acquired by the acquiring module, wherein the set of features to be trained includes a first feature to be trained, a second feature to be trained and a third feature to be trained ,
  • the first feature to be trained represents the image feature of the first area
  • the second feature to be trained represents the image feature of the second area
  • the third feature to be trained represents the attribute feature related to the interactive operation, and the range of the first region is smaller than the range of the second region;
  • the acquiring module is configured to acquire a first to-be-trained label and a second to-be-trained label corresponding to each of the images to be trained, wherein the first to-be-trained label represents a label related to operation content, and the second to-be-trained label represents a label related to the operation intention;
  • a training module, configured to train to obtain a target joint model according to the feature set to be trained in each to-be-trained image extracted by the extraction module, and the first to-be-trained label and the second to-be-trained label corresponding to each to-be-trained image acquired by the acquiring module.
  • the first feature to be trained is a two-dimensional vector feature, wherein the first feature to be trained includes at least one of character position information, moving object position information, fixed object position information, and defense object position information in the first area;
  • the second feature to be trained is a two-dimensional vector feature, wherein the second feature to be trained includes at least one of character position information, moving object position information, fixed object position information, defensive object position information, obstacle object position information, and output object position information in the second area;
  • the third feature to be trained is a one-dimensional vector feature, wherein the third feature to be trained includes at least one of a character health value, a character output value, time information, and score information; wherein there is a correspondence between the first feature to be trained, the second feature to be trained, and the third feature to be trained.
  • the first label to be trained includes key type information and/or key parameter information
  • the key parameter information includes at least one of a directional parameter, a position parameter, and a target parameter.
  • the directional parameter is used to indicate the direction of the character's movement
  • the position parameter is used to indicate where the character is located.
  • the target type parameter is used to indicate the object that the character is to attack.
  • the second label to be trained includes operation intention information and character position information; wherein the operation intention information represents the purpose of interaction between the character and an object,
  • the character position information indicates the position of the character in the first area.
  • the training module is configured to process the feature set to be trained in each of the images to be trained to obtain a target feature set, where the target feature set includes a first target feature, a second target feature, and a third target feature;
  • the first predicted label represents a predicted label related to the operation content, and the second predicted label represents a predicted label related to the operation intention;
  • the core parameters of the model are obtained by training, wherein both the first predicted label and the second predicted label belong to predicted values, and both the first to-be-trained label and the second to-be-trained label are true values;
  • the target joint model is generated according to the core parameters of the model.
  • the training module is configured to use a fully connected layer to process the third feature to be trained to obtain the third target feature, wherein the third target feature is a one-dimensional vector feature;
  • the training module is configured to obtain, through a long short-term memory (LSTM) layer, the first predicted label, the second predicted label, and a third predicted label corresponding to the target feature set, where the third predicted label represents a predicted label related to the win-or-loss outcome;
  • the training obtains the core parameters of the model, wherein the third predicted label belongs to a predicted value, and the third to-be-trained label belongs to a true value.
  • the server further includes an update module
  • the acquisition module is further configured to, after the training module trains to obtain the target joint model according to the feature set to be trained in each image to be trained and the first to-be-trained label and the second to-be-trained label corresponding to each image to be trained, obtain a video to be trained, where the video to be trained includes multiple frames of interactive images;
  • the acquiring module is further configured to acquire target scene data corresponding to the video to be trained through the target joint model, wherein the target scene data includes relevant data under the target scene;
  • the training module is further used for training to obtain target model parameters based on the target scene data, the first to-be-trained label, and the first predicted label obtained by the obtaining module, wherein the first predicted label represents the predicted label related to the operation content, the first predicted label belongs to a predicted value, and the first to-be-trained label belongs to a true value;
  • the update module is configured to update the target joint model using the target model parameters trained by the training module to obtain an enhanced joint model.
  • the server further includes an update module
  • the acquisition module is further configured to, after the training module trains to obtain the target joint model according to the feature set to be trained in each image to be trained and the first to-be-trained label and the second to-be-trained label corresponding to each image to be trained, obtain a video to be trained, where the video to be trained includes multiple frames of interactive images;
  • the acquiring module is further configured to acquire target scene data corresponding to the video to be trained through the target joint model, wherein the target scene data includes relevant data under the target scene;
  • the training module is further configured to obtain target model parameters by training based on the target scene data, the second to-be-trained label, and the second predicted label obtained by the obtaining module, wherein the second predicted label represents the predicted label related to the operation intention, the second predicted label belongs to a predicted value, and the second to-be-trained label belongs to a true value;
  • the update module is configured to update the target joint model using the target model parameters trained by the training module to obtain an enhanced joint model.
  • a fifth aspect of the present application provides a server, where the server is configured to execute the information prediction method in the first aspect or any possible implementation manner of the first aspect.
  • the server may include a module for performing the information prediction method in the first aspect or any possible implementation manner of the first aspect.
  • a sixth aspect of the present application provides a server, where the server is used to execute the model training method in the second aspect or any possible implementation manner of the second aspect.
  • the server may include a module for performing the model training method in the second aspect or any possible implementation manner of the second aspect.
  • a seventh aspect of the present application provides a computer-readable storage medium having instructions stored therein, which when executed on a computer, causes the computer to execute the method described in the above aspects.
  • An eighth aspect of the present application provides a computer program (product), the computer program (product) comprising computer program code which, when executed by a computer, causes the computer to perform the method in any one of the above aspects.
  • An embodiment of the present application provides a method for information prediction.
  • the server acquires an image to be predicted, and then extracts a set of features to be predicted in the image to be predicted, where the set of features to be predicted includes a first feature to be predicted, a second The feature to be predicted and the third feature to be predicted, the first feature to be predicted represents the image feature of the first region, the second feature to be predicted represents the image feature of the second region, and the third feature to be predicted represents the attribute feature related to the interactive operation,
  • the range of the first area is smaller than the range of the second area.
  • the server can obtain the first label and the second label corresponding to the image to be predicted through the target joint model, where the first label represents the label related to the operation content, and the second label represents the label related to the operation intention.
  • only one joint model can be used to predict micro-operations and the overall view.
  • the prediction results of the micro-operations are expressed as the first label, and the prediction results of the overall view are expressed as the second label.
  • the models are merged into a joint model, which effectively solves the hard handover problem in the layered model and improves the convenience of prediction.
  • Figure 1 is a schematic diagram of a layered model in the related art
  • Figure 2 is a schematic structural diagram of a layered model in the related art
  • FIG. 3 is a schematic structural diagram of an information prediction system in an embodiment of this application.
  • FIG. 4 is a schematic diagram of a system structure of a target joint model in an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an embodiment of a method for information prediction in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a workflow of an enhanced joint model in an embodiment of this application.
  • FIG. 7 is a schematic diagram of an embodiment of a method for model training in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an embodiment for extracting a feature set to be trained in an embodiment of the present application
  • FIG. 9 is a schematic diagram of a feature expression of a feature set to be trained in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a feature expression of an image-like feature in an embodiment of this application.
  • FIG. 11 is a schematic diagram of a micro-operation label in an embodiment of the present application.
  • FIG. 12 is another schematic diagram of the micro-operation label in the embodiment of the present application.
  • FIG. 13 is another schematic diagram of the micro-operation label in the embodiment of the present application.
  • FIG. 14 is another schematic diagram of the micro-operation label in the embodiment of the present application.
  • FIG. 15 is a schematic diagram of the overall view label in the embodiment of the present application.
  • FIG. 16 is a schematic diagram of a network structure of a target joint model in an embodiment of the present application.
  • FIG. 17 is a schematic diagram of a system structure of an enhanced joint model in an embodiment of the present application.
  • FIG. 18 is a schematic diagram of another system structure of an enhanced joint model in an embodiment of this application.
  • FIG. 19 is a schematic diagram of an embodiment of a server in an embodiment of this application.
  • FIG. 20 is a schematic diagram of another embodiment of the server in the embodiment of the present application.
  • FIG. 21 is a schematic diagram of another embodiment of the server in the embodiment of the present application.
  • FIG. 22 is a schematic structural diagram of a server in an embodiment of this application.
  • the embodiments of the present application provide a method for information prediction, a method for model training, and a server. Only one joint model is needed to predict both the micro operation and the overall view, which effectively solves the hard handover problem in the layered model and improves the convenience of prediction.
  • the models involved in this application can be applied in the field of AI, and their application range includes but is not limited to machine translation, intelligent control, expert systems, robotics, language and image understanding, automatic programming, aerospace applications, and large-scale information processing, storage, and management.
  • this application uses an online game scenario as an example for introduction, which may be a MOBA-type game scenario.
  • an embodiment of the present application designs an AI model that can better simulate the behavior of human players, and achieve better results in situations such as human-machine battles, simulating dropped players, and players practicing game roles.
  • the typical gameplay of a MOBA game is a multiplayer versus multiplayer mode, that is, two (or more) teams composed of the same number of players fight each other. Each player controls a hero character, and the side that first pushes down the other side's "Crystal" base is the winner.
  • FIG. 3 is a schematic structural diagram of an information prediction system in an embodiment of this application. As shown in FIG. 3, multiple rounds of the game are played on the client to generate a large amount of game screen data (that is, images to be trained), and then the game screen data is sent to the server.
  • the game screen data may be data generated by a human player during actual play, or data obtained by a machine simulating the operations of a human player.
  • This application is mainly based on data provided by a human player. Based on an average of 30 minutes in a game and 15 frames per second, there are 27000 frames of images in each game.
  • This application mainly selects the data related to the overall view task and the micro-manipulation task for training to reduce the complexity of the data.
  • the overall view tasks are divided according to operational intent.
  • the overall view tasks include but are not limited to "jungling", "clearing soldiers", "team battles", and "pushing towers", and each game has only an average of about 100 overall view tasks, while the number of micro-operation decision steps in each overall view task is about 200. Therefore, the number of overall view decision steps and the number of micro-operations are both within an acceptable range.
  • the server uses the game screen data reported by the client to train the model, and further generates an enhanced joint model based on the target joint model.
  • FIG. 4 is a schematic diagram of a system structure of an enhanced joint model in an embodiment of the present application.
  • the entire model training process can be divided into two stages. First, supervised learning is used to learn an initial joint model of the overall view and micro operations from the game data of human players, and an overall view fully connected (FC) layer and a micro operation FC layer are added on top of the joint model, thereby obtaining the target joint model. Then, reinforcement learning is used to optimize the micro operation FC layer (or the overall view FC layer) while the parameters of the other layers remain fixed, so as to improve core indicators such as the skill hit rate and the success rate of evading skills in "team battles".
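  • For illustration only, the following is a minimal PyTorch-style sketch (not the patent's actual implementation) of the second stage described above: every parameter of the joint model is frozen except one FC head, so that reinforcement learning updates only that head. The function name, head name, optimizer, and learning rate are assumptions.

```python
import torch


def freeze_except_head(joint_model: torch.nn.Module, head_prefix: str = "micro_fc"):
    """Keep only the chosen FC head trainable; all shared layers stay fixed."""
    for name, param in joint_model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)
    trainable = [p for p in joint_model.parameters() if p.requires_grad]
    # The optimizer and learning rate are assumptions for this sketch.
    return torch.optim.Adam(trainable, lr=1e-4)
```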
  • the client is deployed on a terminal device, where the terminal device includes but is not limited to a tablet computer, a laptop computer, a palmtop computer, a mobile phone, and a personal computer (PC), which is not limited herein.
  • An embodiment of the information prediction method in the embodiment of the present application includes:
  • the server first obtains the image to be predicted.
  • the image to be predicted may refer to an image in a MOBA game.
  • Extract a set of features to be predicted in an image to be predicted where the set of features to be predicted includes a first feature to be predicted, a second feature to be predicted, and a third feature to be predicted, the first feature to be predicted represents an image feature of the first region , The second feature to be predicted represents the image feature of the second region, the third feature to be predicted represents the attribute feature related to the interactive operation, and the range of the first region is smaller than the range of the second region;
  • the server needs to extract the set of features to be predicted in the image to be predicted, and the set of features to be predicted here mainly includes three types of features, namely, a first feature to be predicted, a second feature to be predicted, and a third feature to be predicted .
  • the first feature to be predicted represents an image feature of the first area.
  • the first feature to be predicted is a mini-map image feature in the MOBA game.
  • the second feature to be predicted represents the image feature of the second region.
  • the second feature to be predicted is the current visual field image feature in the MOBA game.
  • the third feature to be predicted represents an attribute feature related to the interactive operation.
  • the third feature to be predicted is a hero attribute vector feature in the MOBA game.
  • the server inputs the extracted feature set to be predicted into the target joint model. Alternatively, the server may input it into the enhanced joint model, where the enhanced joint model is a model obtained by enhancing the target joint model.
  • FIG. 6 is a schematic diagram of a workflow of the target joint model in the embodiment of the present application. As shown in FIG. 6, this application combines the overall view model and the micro operation model into the same model, that is, a joint model. On the basis of the joint model, an overall view FC layer and a micro operation FC layer are added to obtain the target joint model, which is more in line with the human decision-making process.
  • the joint model adopts a unified feature input, that is, input the feature set to be predicted.
  • the output of the overall view task is input into the encoding layer of the operation task in a cascade manner.
  • the joint model can finally output only the first label related to the operation content, and use the output of the micro-operation FC layer as the execution instruction according to the first label.
  • It is also possible to output only the second label related to the operation intention, and use the output of the overall view FC layer as the execution instruction according to the second label. It is also possible to output the first label and the second label at the same time, that is, to use the outputs of the micro-operation FC layer and the overall view FC layer simultaneously as execution instructions according to the first label and the second label.
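  • As a hedged sketch of this prediction step (the model interface and all names are hypothetical stand-ins for the target joint model or the enhanced joint model; the model is assumed to return logits from both heads), the server can run one forward pass and then use either head's output, or both, as the execution instruction:

```python
import torch


def predict_instruction(joint_model, feature_set, mode: str = "both"):
    """Run one forward pass and choose which output to use as the execution instruction."""
    with torch.no_grad():
        micro_logits, overall_logits = joint_model(feature_set)  # two output heads assumed
    first_label = micro_logits.argmax(dim=-1)     # label related to the operation content
    second_label = overall_logits.argmax(dim=-1)  # label related to the operation intention
    if mode == "micro":       # use only the micro-operation FC output
        return first_label
    if mode == "overall":     # use only the overall view FC output
        return second_label
    return first_label, second_label              # output both labels at the same time
```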
  • a method for information prediction is provided.
  • the server acquires the image to be predicted.
  • the feature set to be predicted in the image to be predicted is extracted.
  • the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted represents the image feature of the first region, and the second feature to be predicted represents the image feature of the second region ,
  • the third feature to be predicted represents an attribute feature related to the interactive operation, and the range of the first area is smaller than that of the second area.
  • the server can obtain the first label and the second label corresponding to the image to be predicted through the target joint model.
  • the first label represents a label related to the operation content
  • the second label represents a label related to the operation intention.
  • the prediction results of the micro-operations are expressed as the first label, and the prediction results of the overall view are expressed as the second label.
  • the models are merged into a joint model, which effectively solves the hard handover problem in the layered model and improves the convenience of prediction.
  • obtaining the first label and the second label corresponding to the feature set to be predicted through the target joint model may include: acquiring the first label, the second label, and the third label corresponding to the feature set to be predicted through the target joint model, where the third label represents a label related to the win-or-loss situation.
  • a more comprehensive prediction method is provided, that is, the first label, the second label, and the third label are simultaneously output through the target joint model, which can not only predict the operation under the overall view task and the operation under the micro-operation task, but also predict the win-or-loss outcome.
  • continuous multi-frame images to be predicted are usually input to improve the accuracy of prediction.
  • For example, 100 frames of images to be predicted are input, feature extraction is performed on each frame, and 100 feature sets to be predicted are obtained.
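  • A minimal sketch of how such consecutive frames could be turned into one prediction input (the per-frame extractor is passed in and is hypothetical; tensor shapes are assumptions):

```python
import torch


def build_sequence(frames, extract_feature_set):
    """Stack the per-frame feature sets of consecutive frames into one sequence."""
    minimaps, views, attrs = [], [], []
    for frame in frames:                      # e.g. 100 images to be predicted
        m, v, a = extract_feature_set(frame)  # first, second, and third feature to be predicted
        minimaps.append(m)
        views.append(v)
        attrs.append(a)
    # Assumed shapes: [T, C, H, W] for the two image-like features, [T, D] for the vector feature.
    return torch.stack(minimaps), torch.stack(views), torch.stack(attrs)
```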
  • the outcome of the game, for example, whether the game will be won or lost.
  • the target joint model can output not only the first label and the second label, but also a third label, that is, the target joint model can also predict the outcome.
  • combining the actual situation makes it possible to better predict the outcome, which helps to improve the reliability of the prediction and increases the flexibility and practicability of the prediction.
  • an embodiment of the method of model training in the embodiment of this application includes:
  • a set of images to be trained where the set of images to be trained includes N images to be trained, and N is an integer greater than or equal to 1;
  • the server obtains the corresponding image set to be trained through the human player game data reported by the client.
  • the set of images to be trained usually contains multiple frames of images, that is, the set of images to be trained includes N images to be trained to improve the accuracy of the model, and N is an integer greater than or equal to 1.
  • Extract a set of features to be trained in each image to be trained, where the set of features to be trained includes a first feature to be trained, a second feature to be trained, and a third feature to be trained, where the first feature to be trained represents the image feature of the first area, the second feature to be trained represents the image feature of the second area, the third feature to be trained represents the attribute feature related to the interactive operation, and the range of the first area is smaller than the range of the second area;
  • the server needs to extract the to-be-trained feature set of each to-be-trained image in the to-be-trained image set, where the to-be-trained feature set mainly includes three types of features, which are the first to-be-trained feature and the second to-be-trained feature, respectively And the third feature to be trained.
  • the first feature to be trained represents the image feature of the first region.
  • the first feature to be trained is a mini-map image feature in the MOBA game.
  • the second feature to be trained represents the image feature of the second area.
  • the second feature to be trained is an image feature of the current field of view in the MOBA game.
  • the third feature to be trained represents an attribute feature related to the interactive operation.
  • the third feature to be trained is a hero attribute vector feature in the MOBA game.
  • the server also needs to obtain the first to-be-trained label and the second to-be-trained label corresponding to each to-be-trained image, where the first to-be-trained label represents a label related to the operation content.
  • For example, the first to-be-trained label is a label related to the micro-manipulation task.
  • the second label to be trained represents a label related to the operation intention.
  • the second label to be trained is a label related to the overall task.
  • step 203 may be executed before step 202, may be executed after step 202, or may be executed simultaneously with step 202, which is not limited herein.
  • the server performs training based on the set of features to be trained extracted from each image to be trained, and the first label to be trained and the second label to be trained corresponding to each image to be trained, thereby obtaining the target joint model.
  • the target joint model can be used to predict the situation of the overall view task and the instructions of the micromanipulation task.
  • a method for model training will be introduced.
  • the server acquires a set of images to be trained, and then extracts a set of features to be trained in each image to be trained, where the set of features to be trained includes the first feature to be trained, The second feature to be trained and the third feature to be trained.
  • the server needs to obtain the first to-be-trained label and the second to-be-trained label corresponding to each to-be-trained image, and finally according to the to-be-trained feature set in each to-be-trained image, and the One to-be-trained label and the second to-be-trained label are trained to obtain the target joint model.
  • the first feature to be trained is a two-dimensional vector feature, where the first feature to be trained includes at least one of character position information, moving object position information, fixed object position information, and defense object position information in the first area;
  • the second feature to be trained is a two-dimensional vector feature, where the second feature to be trained includes character position information, moving object position information, fixed object position information, defense object position information, obstacle object position information, and output in the second area At least one of object location information;
  • the third feature to be trained is a one-dimensional vector feature, where the third feature to be trained includes at least one of a character health value, a character output value, time information, and score information;
  • FIG. 8 is a schematic diagram of an embodiment for extracting a feature set to be trained in an embodiment of the present application.
  • the portion indicated by S1 is hero attribute information, including the hero characters in the game and the health value, physical attack value, spell attack value, physical defense value, and magic defense value of each hero character.
  • the portion indicated by S2 is a small map, that is, the first area.
  • the hero characters include the hero characters controlled by teammates and the hero characters controlled by the enemy, and the soldier line refers to the position where the soldiers of the two sides engage.
  • Wild monsters refer to non-player character (NPC) monsters, that is, "neutral and hostile" objects in the environment other than the players, which are not controlled by any player.
  • the defensive tower is a defensive building. The two camps each have a crystal defense tower, and the side that destroys the opponent's crystal defense tower wins.
  • the portion indicated by S3 is the current field of view, that is, the second area. In the current field of view, you can clearly see heroes, soldiers, monsters, defensive towers, map obstacles and bullets.
  • FIG. 9 is a schematic diagram of a feature expression of the feature set to be trained in the embodiment of the present application.
  • the feature set to be trained is expressed by the vector feature of the hero attributes (that is, the third feature to be trained), the minimap image-like feature (that is, the first feature to be trained), and the current field-of-view image-like feature (that is, the second feature to be trained).
  • the vector features of hero attributes are features composed of numerical values. Therefore, they belong to one-dimensional vector features.
  • the vector features include but are not limited to the attribute features of hero characters, such as health (that is, the health values of the five enemy hero characters and the health values of the five hero characters on our side), attack power (that is, the output values of the five enemy heroes and the output values of our five heroes), time (the length of a game), and score (each team's final score).
  • the minimap image features and the current field-of-view image features are both image-like features.
  • FIG. 10 is a schematic diagram of a feature expression of the image-like feature in the embodiment of the present application. As shown in FIG. 10, the image-like feature is a two-dimensional feature artificially constructed from the original pixel image, which reduces the difficulty of learning directly from the original complex image.
  • the features of the minimap image include the position information of heroes, soldier lines, monsters, and defense towers, and are used to express information on a macro scale.
  • the current visual image features include the location information of heroes, soldiers, monsters, defensive towers, map obstacles, and bullets, and are used to express local micro-scale information.
  • This multi-modal and multi-scale feature that simulates the human perspective can not only better model the relative positional relationship in space, but also very suitable for the expression of high-dimensional state features in MOBA games.
  • the first feature to be trained is a two-dimensional vector feature, the second feature to be trained is a two-dimensional vector feature, and the third feature to be trained is a one-dimensional vector feature.
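  • The following sketch illustrates one possible way to build such features (grid size, channel order, and attribute list are assumptions, not values fixed by this application): each unit category is rasterised into its own channel of a 2-D grid, and the hero attributes are flattened into a 1-D vector.

```python
import numpy as np


def image_like_feature(positions_by_type, grid_size=64):
    """Rasterise unit positions (normalised [0,1) coordinates) into per-category channels."""
    unit_types = ["hero", "soldier_line", "monster", "defense_tower", "obstacle", "bullet"]
    channels = np.zeros((len(unit_types), grid_size, grid_size), dtype=np.float32)
    for c, unit_type in enumerate(unit_types):
        for x, y in positions_by_type.get(unit_type, []):
            channels[c, int(y * grid_size), int(x * grid_size)] = 1.0
    return channels  # shape [C, H, W]: one two-dimensional map per unit category


def attribute_vector(heroes):
    """Flatten per-hero attributes (health, attack, defense, ...) into a one-dimensional vector."""
    return np.array(
        [value for hero in heroes
         for value in (hero["hp"], hero["physical_attack"], hero["spell_attack"],
                       hero["physical_defense"], hero["magic_defense"])],
        dtype=np.float32)
```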
  • the first label to be trained includes key type information and/or key parameter information
  • the key parameter information includes at least one of a directional parameter, a position parameter, and a target parameter.
  • the directional parameter is used to indicate the direction of the character's movement
  • the position parameter is used to indicate the position of the character
  • the target parameter is used to indicate the object that the character is to attack.
  • the first label to be trained includes key type information and/or key parameter information.
  • the key type information and key parameter information are considered as the first label to be trained at the same time, so as to improve the accuracy of the label. Since human players usually decide which button to use before deciding on the button's operating parameters, this application uses a layered label design, that is, it first predicts which button should be executed at the current time, and then predicts the release parameters of that button.
  • the key parameter information is mainly divided into three types of information, which are directional information, location information, and target information.
  • the direction of a circle is 360 degrees. Assuming that every 6 degrees is set as a label, the direction information can be discretized into 60 directions.
  • a hero character usually occupies 1000 pixels in the image, so the position type information can be discretized into 30 ⁇ 30 positions.
  • the target type information is expressed as a candidate attack target, which may refer to the target attacked by the hero character when launching the skill.
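  • A small sketch of the discretisation described above (rounding conventions are assumptions for illustration): the release direction is binned into 60 classes of 6 degrees each, and the release position into a 30x30 grid.

```python
import math


def direction_label(dx: float, dy: float, num_bins: int = 60) -> int:
    """Map a release direction to one of 60 classes (6 degrees per class)."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // (360.0 / num_bins))


def position_label(x: float, y: float, grid: int = 30) -> int:
    """Map a normalised [0,1) release position to one of 30 x 30 = 900 classes."""
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    return row * grid + col
```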
  • FIG. 11 is a schematic diagram of the micro-operation label in the embodiment of the present application.
  • the hero character launches skill 3 within the range shown in A1, and the skill direction is the 45-degree direction at the lower right.
  • A2 indicates the position of skill 3 in the operation interface. This indicates that the human player operates as "skill 3+ direction”.
  • FIG. 12 which is another schematic diagram of the micro-operation label in the embodiment of the present application. As shown in FIG. 12, the hero character moves in the direction shown in A3, and the moving direction is to the right. This indicates that the human player operates as "movement + direction”.
  • FIG. 13 is another schematic diagram of the micro-operation label in the embodiment of the present application. As shown in FIG.
  • the hero character launches skill 1, and A4 indicates the position of skill 1 in the operation interface. This indicates that the human player operates as "Skill 1".
  • FIG. 14 is another schematic diagram of the micro-operation label in the embodiment of the present application. As shown in FIG. 14, the hero character launches skill 2 within the range shown in A5, and the skill direction is the 45-degree direction on the upper right .
  • A6 indicates the position of skill 2 in the operation interface. This indicates that the human player operates as "skill 2+ direction".
  • AI can predict skills of different release types separately, that is, the direction for a directional key, the position for a position key, and the specific target for a target key.
  • the design method of layered labels is closer to the real operation intention of human players in the game process, and is more conducive to AI learning.
  • the first label to be trained includes key type information and/or key parameter information, where the key parameter information includes at least one of a directional parameter, a position parameter, and a target parameter.
  • the directional parameter is used to indicate the direction of the character's movement,
  • the position parameter is used to indicate the position of the character,
  • and the target parameter is used to indicate the object that the character is to attack.
  • the second to-be-trained label includes operation intention information and character position information
  • the operation intention information indicates the purpose of interaction between the character and the object
  • the character position information indicates the position of the character in the first area.
  • the second label to be trained includes operation intention information and character position information.
  • human players make overall-view decisions based on the current state of the game, for example, to clear the soldier line in the top lane, to fight wild monsters in our jungle, to participate in a team battle in the middle lane, or to push down a defense tower.
  • These big picture decisions are not like micro-manipulations, which have clear operation buttons corresponding to them, but are reflected in the player data as an implicit intention.
  • FIG. 15 is a schematic diagram of the overall view label in the embodiment of the present application.
  • the human overall view and the corresponding overall view label are obtained according to the change of the timeline.
  • a game of a human player can be divided into scenes such as "team battle", "clearing soldiers", "jungling", and "pushing towers", and these scenes can be modeled to express the player's overall-view intentions.
  • the minimap is discretized into 24*24 grids, and the character position information is expressed as the grid where the character will attack next time.
  • the second label to be trained is operation intention information + character position information, for example represented as "jungling + coordinate A", "team battle + coordinate B", and "clearing soldiers + coordinate C".
  • the second label to be trained includes operation intention information and character position information, where the operation intention information indicates the purpose of interaction between the character and the object, and the character position information indicates the position of the character in the first area .
  • the use of operation intention information and character position information together reflects the overall view of human players. The overall-view decision in a MOBA game is very important, and this design improves the feasibility and operability of the solution.
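  • The following is a minimal sketch of how such an overall-view label could be encoded (the intention list and encoding are illustrative assumptions): an operation-intention class plus the 24x24 minimap grid cell where the character will attack next.

```python
INTENTIONS = ["jungling", "clearing_soldiers", "team_battle", "pushing_tower"]


def overall_view_label(intention: str, x: float, y: float, grid: int = 24):
    """Return (intention class index, minimap grid cell index) for normalised coordinates."""
    cell = min(int(y * grid), grid - 1) * grid + min(int(x * grid), grid - 1)
    return INTENTIONS.index(intention), cell  # e.g. "team battle" plus coordinate B
```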
  • the target joint model obtained by training may include:
  • obtaining the first predicted label and the second predicted label corresponding to the target feature set through the long short-term memory (LSTM) layer, where the first predicted label represents the predicted label related to the operation content, and the second predicted label represents the predicted label related to the operation intention;
  • the core parameters of the model are obtained by training, where both the first predicted label and the second predicted label belong to predicted values, and both the first label to be trained and the second label to be trained are true values;
  • the target joint model is generated according to the core parameters of the model.
  • FIG. 16 is a schematic diagram of a network structure of the target joint model in the embodiment of the present application.
  • the input of the model is the set of features to be trained in the image to be trained in the current frame.
  • the set of features to be trained includes the minimap image-like feature (i.e., the first feature to be trained), the current field-of-view image-like feature (i.e., the second feature to be trained), and the vector feature of the hero character (i.e., the third feature to be trained).
  • the image-like features are respectively encoded by a convolutional network, and the vector features are encoded by a fully connected network to obtain a target feature set.
  • the target feature set includes a first target feature, a second target feature, and a third target feature.
  • the first target feature is obtained after the first feature to be trained is processed
  • the second target feature is obtained after the second feature to be trained is processed
  • the third target feature is obtained after the third feature to be trained is processed.
  • Then the target feature set is spliced into a common coding layer.
  • the coding layer is input to the Long Short-Term Memory (LSTM) network layer.
  • the LSTM network layer is mainly used to deal with the partial observability of the hero's field of vision.
  • the LSTM network is a time-recursive neural network, suitable for processing and predicting important events with relatively long intervals and delays in the time series.
  • the difference between an LSTM and a plain recurrent neural network (RNN) is mainly that the LSTM adds a processor that determines whether information is useful.
  • the structure of this processor is called a cell. Three gates are placed in a cell, called the input gate, the forget gate, and the output gate. When a piece of information enters the LSTM network layer, it is judged whether it is useful according to the rules; only the information that passes this check is kept, and the information that does not match is discarded through the forget gate.
  • LSTM is an effective technology to solve the problem of long-sequence dependence, and the universality of this technology is very high.
  • in a MOBA game there may be a hidden-vision problem, that is, our hero character can only observe enemy heroes, monsters, and soldier lines near our units (such as a teammate's hero character); enemy units in other positions are not observed, and enemy heroes can also block vision by hiding in the grass or using stealth skills.
  • since the integrity of information is considered in the process of training the model, it is necessary to use the LSTM network layer to recover this hidden information.
  • the first predicted label and the second predicted label of the image to be trained in this frame can be obtained. Since the first to-be-trained label and the second to-be-trained label of this frame are determined according to the manual labeling result, a loss function can be used to minimize the gap between the first predicted label and the first to-be-trained label, another loss function is used to minimize the gap between the second predicted label and the second to-be-trained label, and the core parameters of the model are determined at the minimum value.
  • the core parameters of the model include model parameters under micro-manipulation tasks (such as key press, movement, general attack, skill 1, skill 2 and skill 3, etc.) and model parameters under the overall task.
  • the target joint model is generated according to the core parameters of the model.
  • each output task can be calculated independently, that is, the output layer fully connected network parameters of each task are only affected by the task.
  • the goal joint model contains auxiliary tasks for predicting the position and intention of the overall view.
  • the output of the overall view task is input into the coding layer of the micromanipulation task in a cascade form.
  • the loss function is used to measure the degree of inconsistency between the predicted value and the true value of the model. It is a non-negative real-valued function; the smaller the loss, the better the robustness of the model.
  • the loss function is the core part of the empirical risk function and an important part of the structural risk function. Common loss functions include but are not limited to hinge loss, cross-entropy loss, square loss and exponential loss, as illustrated in the sketch below.
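For illustration only, a minimal sketch of such a multi-task objective, assuming cross-entropy losses for the micro-operation and overall-view labels and an arbitrary task weight:

```python
# Cross-entropy between each predicted label and its human-annotated label,
# summed over the micro-operation and overall-view tasks.
import torch.nn as nn

ce = nn.CrossEntropyLoss()

def joint_loss(pred_micro, label_micro, pred_macro, label_macro, w_macro=1.0):
    # pred_*: (B, num_classes) logits; label_*: (B,) class indices.
    return ce(pred_micro, label_micro) + w_macro * ce(pred_macro, label_macro)
```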
  • the embodiment of the present application provides a process of training to obtain a target joint model, which mainly includes first processing the feature set to be trained in each image to be trained to obtain the target feature set. Then obtain the first prediction label and the second prediction label corresponding to the target feature set through LSTM, and then according to the first prediction label, the first training label, the second prediction label and the second training label of each image to be trained, The core parameters of the model are obtained by training, and the core parameters of the model are used to generate the target joint model.
  • the LSTM layer can solve the problem of unobservable part of the field of view, that is, the LSTM layer can obtain data in the past period of time, which can make the data more complete, which is conducive to inference and decision-making during the model training process.
  • processing the feature set to be trained to obtain a target feature set may include: processing the third to-be-trained feature in each to-be-trained image through a fully connected layer to obtain the third target feature, where the third target feature is a one-dimensional vector feature;
  • processing the second to-be-trained feature in each to-be-trained image through the convolutional layer to obtain the second target feature, where the second target feature is a one-dimensional vector feature;
  • and processing the first to-be-trained feature in each to-be-trained image through the convolutional layer to obtain the first target feature, where the first target feature is a one-dimensional vector feature.
  • the set of features to be trained includes image features of the minimap class (ie, the first feature to be trained), and the current field of view Image features (ie, the second feature to be trained) and vector features of the hero character (ie, the third feature to be trained).
  • the processing method for the third feature to be trained is that the third feature to be trained is input to the FC layer, and the third target feature is obtained through the output of the FC layer.
  • the role of the FC layer is to map the distributed feature representation to the sample label space.
  • Each node of the FC layer is connected to all the nodes of the previous layer to synthesize the features extracted from the front. Due to its fully connected nature, the parameters of the fully connected layer are usually the most.
  • the processing method for the first to-be-trained feature and the second to-be-trained feature is to input the two to the convolutional layer, output the first target feature corresponding to the first to-be-trained feature through the convolutional layer, and output the second to-be-trained feature The second target feature corresponding to the training feature.
  • the original image can be flattened through the convolutional layer.
  • a pixel is highly correlated with the data above, below and to its left and right. If the data is simply flattened for a fully connected layer, this spatial correlation of the picture is easily ignored, or two unrelated pixels are forced to be related. Therefore, it is necessary to perform convolution processing on the image data.
  • the image pixels corresponding to the first feature to be trained are 10 × 10, and the first target feature obtained after the convolutional layer is a 100-dimensional vector feature.
  • the image pixels corresponding to the second feature to be trained are 10 × 10, and the second target feature obtained after the convolutional layer is a 100-dimensional vector feature.
  • the third target feature corresponding to the third feature to be trained is a 10-dimensional vector feature. Then, after the concat layer, a 210-dimensional (100 + 100 + 10) vector feature is obtained, as in the shape walk-through below.
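The quoted dimensions can be checked with a small shape walk-through; the single-channel 10×10 class images and the layer choices below are assumptions made only to reproduce the 100 + 100 + 10 = 210 arithmetic:

```python
import torch
import torch.nn as nn

minimap = torch.zeros(1, 1, 10, 10)      # first feature to be trained
view    = torch.zeros(1, 1, 10, 10)      # second feature to be trained
hero    = torch.zeros(1, 10)             # third feature to be trained

conv = nn.Sequential(nn.Conv2d(1, 1, kernel_size=3, padding=1), nn.Flatten())
fc   = nn.Linear(10, 10)                 # fully connected layer for the hero vector

t1, t2, t3 = conv(minimap), conv(view), fc(hero)   # 100-, 100-, 10-dimensional
concat = torch.cat([t1, t2, t3], dim=1)
print(concat.shape)                       # torch.Size([1, 210])
```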
  • the feature set to be trained can also be processed differently, that is, the first feature to be trained in each image to be trained is processed through the fully connected layer to obtain the first target feature.
  • the second target feature is obtained by processing the second feature to be trained in each image to be trained through the convolutional layer.
  • the third target feature is obtained by processing the third feature to be trained in each image to be trained through the convolutional layer.
  • obtaining, through the long short-term memory (LSTM) layer, the first predicted label and the second predicted label corresponding to the target feature set may include:
  • obtaining, through the LSTM layer, the first predicted label, the second predicted label and a third predicted label corresponding to the target feature set, where the third predicted label represents the predicted label related to the win/loss outcome;
  • training to obtain the core parameters of the model may include: obtaining the third to-be-trained label corresponding to each image to be trained, and
  • training to obtain the core parameters of the model according to the first predicted label, the first to-be-trained label, the second predicted label, the second to-be-trained label, the third predicted label and the third to-be-trained label, where the third to-be-trained label belongs to the predicted value,
  • and the third predicted label belongs to the true value.
  • the target joint model can further predict the outcome.
  • the third predicted label of the frame of image to be trained can be obtained based on the output of the LSTM layer. Since the third to-be-trained label of that frame is determined according to the manual labeling result, a loss function can be used to minimize the gap between the third predicted label and the third to-be-trained label so as to determine the core parameters of the model.
  • the core parameters of the model include not only the model parameters under the micro-manipulation tasks (such as key press, movement, normal attack, skill 1, skill 2 and skill 3, etc.) and the model parameters under the overall-view task, but also the model parameters under the win/loss task.
  • the target joint model is finally generated according to the core parameters of the model.
  • the target joint model can further train the labels related to the outcome: the server obtains, through the LSTM layer, the first predicted label, the second predicted label and the third predicted label corresponding to the target feature set, where the third predicted label represents the predicted label related to the outcome; it then obtains the third to-be-trained label corresponding to each image to be trained, and finally trains to obtain the core parameters of the model according to the first predicted label, the first to-be-trained label, the second predicted label, the second to-be-trained label, the third predicted label and the third to-be-trained label.
  • the target joint model can also predict the game win rate, which strengthens the recognition and learning of the situation, thereby enhancing the reliability and diversity of model applications; a sketch of the three output heads follows.
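A hedged sketch of the three output heads implied above (micro-operation, overall view, win/loss), with placeholder class counts that are not taken from the application:

```python
import torch.nn as nn

class TaskHeads(nn.Module):
    def __init__(self, hidden=256, n_micro=20, n_macro=8, n_outcome=2):
        super().__init__()
        self.micro_fc = nn.Linear(hidden, n_micro)      # micro-operation task
        self.macro_fc = nn.Linear(hidden, n_macro)      # overall-view task
        self.outcome_fc = nn.Linear(hidden, n_outcome)  # win/loss auxiliary task

    def forward(self, h):
        # h: LSTM output features for one or more frames.
        return self.micro_fc(h), self.macro_fc(h), self.outcome_fc(h)
```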
  • in the seventh optional embodiment of the model training method provided by the embodiments of the present application, after the target joint model is trained according to the feature set to be trained in each image to be trained and the first to-be-trained label and second to-be-trained label corresponding to each image to be trained,
  • the method may further include:
  • obtaining a video to be trained, where the video to be trained includes multiple frames of interactive images;
  • obtaining target scene data corresponding to the video to be trained through the target joint model, where the target scene data includes relevant data under the target scene;
  • training to obtain target model parameters according to the target scene data, the first to-be-trained label and the first predicted label, where the first predicted label represents the predicted label related to the operation content, the first predicted label belongs to the predicted value, and the first to-be-trained label belongs to the true value;
  • the target joint model is updated using the target model parameters to obtain an enhanced joint model.
  • this application can optimize some of the task layers in the target joint model through reinforcement learning. For example, instead of performing reinforcement learning on the FC layer of the overall view, only reinforcement learning is performed on the FC layer of micromanipulation.
  • FIG. 17 is a schematic diagram of a system structure of an enhanced joint model in an embodiment of the present application.
  • the target joint model includes a joint model, an overall view FC layer, and a micro-operation FC layer.
  • the coding layer and the overall-view FC layer in the joint model have obtained corresponding core model parameters through supervised learning. It should be noted that during reinforcement learning, the core model parameters of the coding layer and the overall-view FC layer in the joint model remain unchanged, so that feature expression does not need to be learned during reinforcement learning, which accelerates the convergence of reinforcement learning.
  • restricting reinforcement learning to the team-battle scenario, where a micro-manipulation task averages about 100 decision steps (about 20 seconds), effectively reduces the number of decision-making steps.
  • the FC layer of the micromanipulation is trained with a reinforcement learning algorithm.
  • the algorithm may specifically be the Proximal Policy Optimization (PPO) deep reinforcement learning algorithm.
  • Step 1 After training to obtain the target joint model, the server can load the supervised learning target joint model, and fix the coding layer and overall view FC layer of the joint model, and the game environment needs to be loaded.
  • Step 2 Obtain the video to be trained.
  • the video to be trained includes multiple frames of interactive images.
  • the target joint model is used to start the battle from the starting frame in the video to be trained, and the target scene data of the hero team battle scene is saved.
  • the target scene data can include features, actions, reward signals and the probability distribution of the joint model's network output.
  • the features are the vector features of the hero attribute, the image features of the mini-map and the image features of the current field of view. Actions are the buttons used by the player to control the hero character.
  • the reward signal is the number of times the hero character killed the enemy hero character during the team battle.
  • the probability distribution of the joint model network output can be expressed as the distribution probability of each label in the micromanipulation task, for example, the distribution probability of label 1 is 0.1, the distribution probability of label 2 is 0.3, and the distribution probability of label 3 is 0.6.
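One way to hold a frame of such target scene data is sketched below; the container and field names are hypothetical, chosen only to mirror the four items enumerated above (features, action, reward signal, output probability distribution):

```python
from dataclasses import dataclass
import torch

@dataclass
class TeamFightSample:
    minimap: torch.Tensor        # minimap-class image feature
    view: torch.Tensor           # current-field-of-view image feature
    hero_vec: torch.Tensor       # hero attribute vector feature
    action: int                  # button/label chosen for the micro-operation task
    reward: float                # e.g. enemy heroes killed during the team battle
    action_probs: torch.Tensor   # distribution over micro-operation labels, e.g. [0.1, 0.3, 0.6]
```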
  • Step 3 Based on the target scene data, the first to-be-trained label and the first predicted label, obtain the target model parameters through training, and use the PPO algorithm to update the core model parameters in the target joint model.
  • the model parameters of the micro-operation FC layer are updated here, that is, the updated model parameters are generated according to the first to-be-trained label and the first predicted label.
  • the first to-be-trained label and the first predicted label are both related to the micro-operation task.
  • Step 4 Each frame in the video to be trained is processed through steps 2 to 4. If the maximum number of iteration frames has not been reached, the updated target joint model is sent to the battle environment and the process returns to step 2; if the maximum number of iteration frames has been reached, step 5 is entered.
  • the maximum number of iterative frames can be set based on experience or based on scene settings, which is not limited in the embodiments of the present application.
  • Step 5 Save the enhanced joint model obtained after the final enhancement.
  • part of the task layer in the target joint model can also be optimized through reinforcement learning. If the part of the micro-operation task needs to be strengthened, the server obtains the video to be trained. Then, the target scene data corresponding to the video to be trained is obtained through the target joint model, and based on the target scene data, the first to-be-trained label, and the first predicted label, the target model parameters are obtained by training. Finally, the server uses the target model parameters to update the target joint model to obtain an enhanced joint model.
  • strengthening the FC layer of micro-operations can improve AI capabilities.
  • reinforcement learning can overcome the mis-operation problems caused by factors such as human tension or inattention, thereby greatly reducing the number of undesirable samples in the training data and improving the reliability of the model and the accuracy of prediction with the model. The reinforcement learning method can strengthen only part of the scene, thereby reducing the number of decision steps and accelerating convergence, as in the sketch below.
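A minimal sketch of this partial reinforcement step, assuming the JointEncoder and TaskHeads classes from the earlier sketches: the coding layers and the overall-view FC layer are frozen, and only the micro-operation FC layer is updated with a PPO-style clipped objective (the value and entropy terms of full PPO are omitted for brevity):

```python
import torch

encoder, heads = JointEncoder(), TaskHeads()         # assumed from the earlier sketches

for p in encoder.parameters():
    p.requires_grad = False                           # coding layers stay fixed
for p in heads.macro_fc.parameters():
    p.requires_grad = False                           # overall-view FC stays fixed

opt = torch.optim.Adam(heads.micro_fc.parameters(), lr=1e-4)

def ppo_step(h, actions, old_logp, advantages, clip=0.2):
    # h: frozen encoder output for the sampled team-battle frames.
    micro_logits, _, _ = heads(h)
    dist = torch.distributions.Categorical(logits=micro_logits)
    ratio = torch.exp(dist.log_prob(actions) - old_logp)
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1 - clip, 1 + clip) * advantages)
    loss = -surrogate.mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```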
  • in the eighth optional embodiment of the model training method provided by the embodiments of the present application, after the target joint model is trained according to the feature set to be trained in each image to be trained and the first to-be-trained label and second to-be-trained label corresponding to each image to be trained,
  • the method may further include:
  • obtaining a video to be trained, where the video to be trained includes multiple frames of interactive images;
  • obtaining target scene data corresponding to the video to be trained through the target joint model, where the target scene data includes relevant data under the target scene;
  • training to obtain target model parameters according to the target scene data, the second to-be-trained label and the second predicted label, where the second predicted label represents the predicted label related to the operation intention, the second predicted label belongs to the predicted value, and the second to-be-trained label belongs to the true value;
  • the target joint model is updated using the target model parameters to obtain an enhanced joint model.
  • this application can optimize some of the task layers in the target joint model through reinforcement learning. For example, reinforcement learning is not performed on the FC layer of micro-manipulation, but only on the FC layer of the overall view.
  • FIG. 18 is a schematic diagram of another system structure of an enhanced joint model in an embodiment of the present application.
  • the target joint model includes a joint model, an overall view FC layer, and a micro-operation FC layer.
  • the coding layer and micro-operation FC layer in the joint model have obtained corresponding core model parameters through supervised learning. It should be noted that during reinforcement learning, the core model parameters of the coding layer and the micro-operation FC layer in the joint model remain unchanged, so that feature expression does not need to be learned during reinforcement learning, which accelerates the convergence of reinforcement learning.
  • the overall-view FC layer is trained with a reinforcement learning algorithm.
  • the algorithm can be the Proximal Policy Optimization (PPO) algorithm or the Actor-Critic deep reinforcement learning algorithm.
  • Step 1 After training to obtain the target joint model, the server can load the supervised learning target joint model, fix the coding layer and micro-operation FC layer of the joint model, and need to load the game environment.
  • Step 2 Obtain the video to be trained.
  • the video to be trained includes multiple frames of interactive images.
  • the target joint model is used to start the battle from the start frame in the video to be trained, and the target scene data of the hero team battle scene is saved.
  • the target scene data includes data under scenarios such as "jungling", "clearing minion waves", "team battles" and "pushing towers".
  • Step 3 Based on the target scene data, the second to-be-trained label and the second predicted label, obtain the target model parameters through training, and use the Actor-Critic algorithm to update the core model parameters in the target joint model.
  • the model parameters of the FC layer in the overall view are updated here, that is, the updated model parameters are generated according to the second to-be-trained label and the second predicted label.
  • the second to-be-trained label and the second predicted label are both related to the overall task.
  • Step 4 Each frame in the video to be trained is processed through steps 2 to 4. If the maximum number of iteration frames has not been reached, the updated target joint model is sent to the battle environment and the process returns to step 2; if the maximum number of iteration frames has been reached, step 5 is entered.
  • Step 5 Save the enhanced joint model obtained after the final enhancement.
  • reinforcement learning can also be used to optimize part of the task layer in the target joint model. If the overall-view task part needs to be strengthened, the server obtains the video to be trained. Then, the target scene data corresponding to the video to be trained is obtained through the target joint model, and based on the target scene data, the second to-be-trained label and the second predicted label, the target model parameters are obtained by training. Finally, the server uses the target model parameters to update the target joint model to obtain an enhanced joint model. In this way, strengthening the FC layer of the overall view can improve AI capabilities.
  • reinforcement learning can also overcome the mis-operation problems caused by factors such as human tension or inattention, thereby greatly reducing the number of undesirable samples in the training data and improving the reliability of the model and the accuracy of prediction with the model. The reinforcement learning method can strengthen only part of the scene, thereby reducing the number of decision steps and accelerating convergence, as in the sketch below.
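For this embodiment the roles are reversed: the coding layer and the micro-operation FC layer would be frozen and only the overall-view FC layer trained. A hedged actor-critic sketch follows; the separate value head is an extra assumption introduced only to make the advantage estimate concrete:

```python
import torch
import torch.nn as nn

for p in heads.macro_fc.parameters():
    p.requires_grad = True                    # overall-view FC is the trainable part here
for p in heads.micro_fc.parameters():
    p.requires_grad = False                   # micro-operation FC stays fixed here

value_head = nn.Linear(256, 1)                # hypothetical critic on the frozen encoding
opt = torch.optim.Adam(list(heads.macro_fc.parameters()) +
                       list(value_head.parameters()), lr=1e-4)

def actor_critic_step(h, macro_actions, returns):
    _, macro_logits, _ = heads(h)
    dist = torch.distributions.Categorical(logits=macro_logits)
    value = value_head(h).squeeze(-1)
    advantage = (returns - value).detach()
    loss = -(dist.log_prob(macro_actions) * advantage).mean() \
           + nn.functional.mse_loss(value, returns)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```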
  • FIG. 19 is a schematic diagram of an embodiment of the server in the embodiment of the present application.
  • the server 30 includes:
  • the obtaining module 301 is used to obtain an image to be predicted
  • the extracting module 302 is used to extract the feature set to be predicted in the to-be-predicted image acquired by the acquiring module 301.
  • the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted, and a third feature to be predicted
  • the first feature to be predicted represents an image feature of a first region
  • the second feature to be predicted represents Image features of the second area
  • the third feature to be predicted represents attribute features related to interactive operations
  • the range of the first area is smaller than the range of the second area
  • the acquiring module 301 is further configured to acquire the first label and the second label corresponding to the feature set to be predicted extracted by the extraction module 302 through a target joint model.
  • the first label represents a label related to operation content
  • the second label represents a label related to operation intention.
  • the acquisition module 301 acquires the image to be predicted, and the extraction module 302 extracts the feature set to be predicted from the image to be predicted acquired by the acquisition module 301.
  • the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted, and a third feature to be predicted
  • the first feature to be predicted represents an image feature of a first region
  • the second feature to be predicted represents Image features of the second region
  • the third feature to be predicted represents attribute features related to interactive operations
  • the range of the first region is smaller than the range of the second region.
  • the obtaining module 301 obtains the first label and the second label corresponding to the feature set to be predicted extracted by the extracting module 302 through the target joint model.
  • the first label represents a label related to operation content
  • the second label represents a label related to operation intention.
  • a server obtains an image to be predicted, and then extracts a feature set to be predicted from the image to be predicted.
  • the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted represents the image feature of the first region, and the second feature to be predicted represents the image feature of the second region ,
  • the third feature to be predicted represents an attribute feature related to the interactive operation, and the range of the first area is smaller than that of the second area.
  • the server can obtain the first label and the second label corresponding to the image to be predicted through the target joint model.
  • the first label represents a label related to the operation content
  • the second label represents a label related to the operation intention.
  • only one joint model can be used to predict the micro-operation and the overall view.
  • the prediction result of the micro-operation is expressed as the first label
  • the prediction result of the overall view is expressed as the second label. Therefore, combining the overall-view model and the micro-operation model into one joint model effectively solves the hard-switching problem in the layered model and improves the convenience of prediction, as in the usage sketch below.
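A usage sketch of the single-model prediction, assuming the encoder and heads objects from the earlier sketches and placeholder input shapes:

```python
import torch

# Placeholder tensors for one frame to be predicted: (batch, time, channels, H, W)
# for the two image-class features, (batch, time, dim) for the hero vector.
minimap = torch.zeros(1, 1, 6, 32, 32)
view    = torch.zeros(1, 1, 8, 32, 32)
hero    = torch.zeros(1, 1, 64)

with torch.no_grad():
    h = encoder(minimap, view, hero)
    micro_logits, macro_logits, _ = heads(h[:, -1])
    first_label = micro_logits.argmax(dim=-1)    # label related to operation content
    second_label = macro_logits.argmax(dim=-1)   # label related to operation intention
```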
  • the obtaining module 301 is configured to obtain, through the target joint model, the first label, the second label, and the third label corresponding to the feature set to be predicted.
  • the third label represents a label related to the situation of victory or defeat.
  • the target joint model can output not only the first label and the second label, but also the third label, that is, the target joint model can also predict the outcome.
  • in practical applications the situation outcome can be better predicted, which helps to improve the reliability of the prediction and increases the flexibility and practicality of the prediction.
  • FIG. 20 is a schematic diagram of an embodiment of the server in the embodiment of the present application.
  • the server 40 includes:
  • the obtaining module 401 is used to obtain a set of images to be trained, wherein the set of images to be trained includes N images to be trained, where N is an integer greater than or equal to 1;
  • An extraction module 402 is used to extract the set of features to be trained in each to-be-trained image acquired by the acquiring module 401.
  • the feature set to be trained includes a first feature to be trained, a second feature to be trained, and a third feature to be trained, the first feature to be trained represents an image feature of a first region, and the second feature to be trained represents Image features of a second area, the third feature to be trained represents attribute features related to interactive operations, and the range of the first area is smaller than the range of the second area;
  • the obtaining module 401 is configured to obtain a first label to be trained and a second label to be trained corresponding to each image to be trained.
  • the first label to be trained represents a label related to operation content
  • the second label to be trained represents a label related to operation intention
  • the training module 403 is configured to train to obtain the target joint model according to the feature set to be trained in each to-be-trained image extracted by the extraction module 402 and the first to-be-trained label and the second to-be-trained label corresponding to each to-be-trained image acquired by the obtaining module 401.
  • the acquisition module 401 acquires the set of images to be trained.
  • the set of images to be trained includes N images to be trained, where N is an integer greater than or equal to 1, and the extraction module 402 extracts the set of features to be trained in each image to be trained acquired by the acquisition module 401.
  • the feature set to be trained includes a first feature to be trained, a second feature to be trained, and a third feature to be trained, the first feature to be trained represents an image feature of a first region, and the second feature to be trained represents Image features of the second region, the third feature to be trained represents attribute features related to interactive operations, and the range of the first region is smaller than the range of the second region.
  • the obtaining module 401 obtains a first label to be trained and a second label to be trained corresponding to each image to be trained.
  • the first to-be-trained label represents a label related to operation content
  • the second to-be-trained label represents a label related to operation intention
  • the training module 403 trains to obtain the target joint model according to the feature set to be trained in each to-be-trained image extracted by the extraction module 402
  • and the first to-be-trained label and the second to-be-trained label corresponding to each to-be-trained image acquired by the obtaining module 401.
  • a server will be introduced.
  • the server obtains a set of images to be trained, and then extracts a set of features to be trained in each image to be trained.
  • the feature set to be trained includes a first feature to be trained, a second feature to be trained, and a third feature to be trained.
  • the server needs to obtain the first to-be-trained label and the second to-be-trained label corresponding to each to-be-trained image, and finally according to the to-be-trained feature set in each to-be-trained image, and the One to-be-trained label and the second to-be-trained label are trained to obtain the target joint model.
  • the first feature to be trained is a two-dimensional vector feature, where the first feature to be trained includes at least one of character position information, moving object position information, fixed object position information and defensive object position information in the first area;
  • the second feature to be trained is a two-dimensional vector feature, wherein the second feature to be trained includes character position information, moving object position information, fixed object position information, defensive object position information in the second area, At least one of obstacle object position information and output object position information;
  • the third feature to be trained is a one-dimensional vector feature, wherein the third feature to be trained includes at least one of character life value, character output value, time information, and score information;
  • the first feature to be trained is a two-dimensional vector feature
  • the second feature to be trained is a two-dimensional vector feature
  • the third feature to be trained is a one-dimensional Vector features.
  • the first label to be trained includes key type information and/or key parameter information
  • the key parameter information includes at least one of a directional parameter, a position parameter, and a target parameter.
  • the directional parameter is used to indicate the direction of the character's movement
  • the position parameter is used to indicate where the character is located.
  • the target type parameter is used to represent the object to be output of the character.
  • the first label to be trained includes key type information and/or key parameter information, where the key parameter information includes at least one of a directional parameter, a position parameter, and a target parameter.
  • the directional parameter is used to indicate the direction of the character's movement
  • the positional parameter is used to indicate the position of the character
  • the target parameter is used to indicate the object the character is to attack (the object to be output); a hypothetical encoding of this label follows.
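A hypothetical encoding of such a first to-be-trained label (key type plus one kind of key parameter) might look as follows; none of the field names come from the application:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class MicroOpLabel:
    key_type: str                                    # e.g. "move", "attack", "skill_1"
    direction: Optional[float] = None                # directional parameter: movement angle in degrees
    position: Optional[Tuple[float, float]] = None   # positional parameter: target coordinates
    target_id: Optional[int] = None                  # target parameter: object to be attacked

label = MicroOpLabel(key_type="skill_1", direction=90.0)
```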
  • the second label to be trained includes operation intention information and character position information
  • the operation intention information indicates the purpose of interaction between the character and the object
  • the character position information indicates the position of the character in the first area
  • the second label to be trained includes operation intention information and character position information, where the operation intention information indicates the purpose of interaction between the character and the object, and the character position information indicates the position of the character in the first area .
  • using the operation intention information and the character position information together reflects the overall view of human players; since overall-view decisions are very important in MOBA games, this improves the feasibility and operability of the solution.
  • the training module 403 is configured to process the feature set to be trained in each image to be trained to obtain a target feature set, where the target feature set includes a first target feature, a second target feature, and a third target feature;
  • the first predicted label represents a predicted label related to the operation content
  • the second predicted label represents the predicted label related to the operation intention
  • the core parameters of the model are obtained by training according to the first predicted label, the first to-be-trained label, the second predicted label and the second to-be-trained label of each image to be trained, wherein both the first predicted label and the second predicted label belong to the predicted value, and both the first to-be-trained label and the second to-be-trained label are true values;
  • the target joint model is generated according to the core parameters of the model.
  • the embodiment of the present application provides a process of training to obtain a target joint model, which mainly includes first processing the feature set to be trained in each image to be trained to obtain the target feature set. Then obtain the first prediction label and the second prediction label corresponding to the target feature set through LSTM, and then according to the first prediction label, the first training label, the second prediction label and the second training label of each image to be trained, The core parameters of the model are obtained by training, and the core parameters of the model are used to generate the target joint model.
  • the LSTM layer can solve the problem of unobservable part of the field of view, that is, the LSTM layer can obtain data in the past period of time, which can make the data more complete, which is conducive to inference and decision-making during the model training process.
  • the training module 403 is configured to process the third feature to be trained in each image to be trained through the fully connected layer to obtain the third target feature, wherein the third target feature is a one-dimensional vector feature;
  • the feature set to be trained can also be processed differently, that is, the first feature to be trained in each image to be trained is processed through the fully connected layer to obtain the first target feature,
  • the second feature to be trained in each image to be trained is processed through the convolutional layer to obtain the second target feature,
  • and the third feature to be trained in each image to be trained is processed through the convolutional layer to obtain the third target feature.
  • the training module 403 is configured to obtain, through the long short-term memory LSTM layer, the first predicted label, the second predicted label, and the third predicted label corresponding to the target feature set, where the third predicted label represents the predicted label related to the win/loss outcome;
  • and to train to obtain the core parameters of the model, wherein the third to-be-trained label belongs to a predicted value, and the third predicted label belongs to a true value.
  • the target joint model can further train the labels related to the outcome: the server obtains, through the LSTM layer, the first predicted label, the second predicted label and the third predicted label corresponding to the target feature set, where the third predicted label represents the predicted label related to the outcome; it then obtains the third to-be-trained label corresponding to each image to be trained, and finally trains to obtain the core parameters of the model according to the first predicted label, the first to-be-trained label, the second predicted label, the second to-be-trained label, the third predicted label and the third to-be-trained label.
  • the target joint model can also predict the game win rate, which strengthens the recognition and learning of the situation, thereby enhancing the reliability and diversity of model applications.
  • the server 40 further includes an update module 404;
  • the obtaining module 401 is also used in the training module 403 according to the feature set to be trained in each to-be-trained image and the first to-be-trained label corresponding to each to-be-trained image And the second label to be trained, after training to obtain the target joint model, the video to be trained is obtained, wherein the video to be trained includes multiple frames of interactive images;
  • the obtaining module 401 is further configured to obtain target scene data corresponding to the video to be trained through the target joint model, wherein the target scene data includes relevant data under the target scene;
  • the training module 403 is further used for training to obtain target model parameters based on the target scene data, the first to-be-trained label and the first predicted label obtained by the obtaining module 401, wherein the first predicted label Indicates a predicted label related to the operation content, the first predicted label belongs to a predicted value, and the first label to be trained belongs to a true value;
  • the update module 404 is configured to update the target joint model using the target model parameters trained by the training module 403 to obtain an enhanced joint model.
  • part of the task layer in the target joint model can also be optimized through reinforcement learning. If the part of the micro-operation task needs to be strengthened, the server obtains the video to be trained. Then, the target scene data corresponding to the video to be trained is obtained through the target joint model, and based on the target scene data, the first to-be-trained label, and the first predicted label, the target model parameters are obtained by training. Finally, the server uses the target model parameters to update the target joint model to obtain an enhanced joint model.
  • strengthening the FC layer of micro-operations can improve AI capabilities.
  • reinforcement learning can overcome the mis-operation problems caused by factors such as human tension or inattention, thereby greatly reducing the number of undesirable samples in the training data and improving the reliability of the model and the accuracy of prediction with the model. The reinforcement learning method can strengthen only part of the scene, thereby reducing the number of decision steps and accelerating convergence.
  • the server 40 further includes an update module 404;
  • the obtaining module 401 is also used in the training module 403 according to the feature set to be trained in each to-be-trained image and the first to-be-trained label corresponding to each to-be-trained image And the second label to be trained, after training to obtain the target joint model, the video to be trained is obtained.
  • the video to be trained includes multiple frames of interactive images
  • the obtaining module 401 is further configured to obtain target scene data corresponding to the video to be trained through the target joint model, wherein the target scene data includes relevant data under the target scene;
  • the training module 403 is further configured to train and obtain target model parameters according to the target scene data, the second to-be-trained label, and the second predicted label obtained by the obtaining module 401.
  • the second predicted label represents a predicted label related to an operation intention, the second predicted label belongs to a predicted value, and the second to-be-trained label belongs to a true value;
  • the update module 404 is configured to update the target joint model using the target model parameters trained by the training module 403 to obtain an enhanced joint model.
  • reinforcement learning can also be used to optimize part of the task layer in the target joint model. If it is necessary to strengthen the part of the current task, the server obtains the video to be trained. Then, the target scene data corresponding to the video to be trained is obtained through the target joint model, and based on the target scene data, the second to-be-trained label, and the second predicted label, the target model parameters are obtained by training. Finally, the server uses the target model parameters to update the target joint model to obtain an enhanced joint model.
  • strengthening the FC layer of the overall view can improve AI capabilities.
  • reinforcement learning can also overcome the mis-operation problems caused by factors such as human tension or inattention, thereby greatly reducing the number of undesirable samples in the training data and improving the reliability of the model and the accuracy of prediction with the model. The reinforcement learning method can strengthen only part of the scene, thereby reducing the number of decision steps and accelerating convergence.
  • the server 500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 522 (for example, one or more processors), memory 532, and one or more storage media 530 (such as one or more mass storage devices) that store application programs 542 or data 544.
  • the memory 532 and the storage medium 530 may be short-term storage or persistent storage.
  • the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 522 may be configured to communicate with the storage medium 530 and execute a series of instruction operations in the storage medium 530 on the server 500.
  • the server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
  • the steps performed by the server in the above embodiments may be based on the server structure shown in FIG. 22.
  • the CPU 522 is used to perform the following steps:
  • extracting a set of features to be predicted from the image to be predicted, wherein the set of features to be predicted includes a first feature to be predicted, a second feature to be predicted, and a third feature to be predicted, the first feature to be predicted represents the image feature of the first area, the second feature to be predicted represents the image feature of the second area, the third feature to be predicted represents the attribute feature related to the interactive operation, and the range of the first area is smaller than the range of the second area;
  • obtaining the first label and/or the second label corresponding to the feature set to be predicted through the target joint model, wherein the first label represents a label related to operation content, and the second label represents a label related to operation intention.
  • the CPU 522 is used to perform the following steps:
  • the first label, the second label, and a third label corresponding to the feature set to be predicted are obtained through the target joint model, where the third label represents a label related to a win or loss situation.
  • the CPU 522 is used to perform the following steps:
  • obtaining a set of images to be trained, wherein the set of images to be trained includes N images to be trained, where N is an integer greater than or equal to 1;
  • Extracting a set of features to be trained in each image to be trained wherein the set of features to be trained includes a first feature to be trained, a second feature to be trained and a third feature to be trained, the first feature to be trained represents the first Image features of the area, the second feature to be trained represents the image feature of the second area, the third feature to be trained represents the attribute feature related to the interactive operation, and the range of the first area is smaller than that of the second area range;
  • obtaining the first to-be-trained label and the second to-be-trained label corresponding to each image to be trained, wherein the first to-be-trained label represents a label related to operation content
  • and the second to-be-trained label represents a label related to operation intention;
  • a target joint model is obtained by training.
  • the CPU 522 is used to perform the following steps:
  • the target feature set includes a first target feature, a second target feature, and a third target feature
  • the first predicted label represents a predicted label related to the operation content
  • the second predicted label represents the predicted label related to the operation intention
  • the core parameters of the model are obtained by training according to the first predicted label, the first to-be-trained label, the second predicted label and the second to-be-trained label of each image to be trained, wherein both the first predicted label and the second predicted label belong to the predicted value, and both the first to-be-trained label and the second to-be-trained label are true values;
  • the target joint model is generated according to the core parameters of the model.
  • the CPU 522 is used to perform the following steps:
  • the CPU 522 is used to perform the following steps:
  • training to obtain core parameters of the model includes:
  • training to obtain the core parameters of the model according to the first predicted label, the first to-be-trained label, the second predicted label, the second to-be-trained label, the third predicted label and the third to-be-trained label, wherein the third to-be-trained label belongs to a predicted value, and the third predicted label belongs to a true value.
  • the CPU 522 is also used to perform the following steps:
  • obtaining target scene data corresponding to the video to be trained through the target joint model, wherein the target scene data includes related data under the target scene;
  • training to obtain the target model parameters according to the target scene data, the first to-be-trained label and the first predicted label, wherein the first predicted label represents a predicted label related to the operation content, the first predicted label belongs to the predicted value, and the first to-be-trained label belongs to the true value;
  • the CPU 522 is also used to perform the following steps:
  • obtaining target scene data corresponding to the video to be trained through the target joint model, wherein the target scene data includes related data under the target scene;
  • training to obtain the target model parameters according to the target scene data, the second to-be-trained label and the second predicted label, wherein the second predicted label represents a predicted label related to the operation intention, the second predicted label belongs to the predicted value, and the second to-be-trained label belongs to the true value;
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a division of logical functions.
  • in actual implementation there may be other divisions; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
  • the "plurality” referred to herein refers to two or more.
  • "and/or" describes the relationship between related objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist at the same time, or B exists alone.
  • the character “/” generally indicates that the related object is a “or” relationship.
  • “At least one” means one or more.


Abstract

An information prediction method, a model training method and a server. The information prediction method includes: obtaining an image to be predicted (101); extracting a feature set to be predicted from the image to be predicted, the feature set to be predicted including a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted representing an image feature of a first region, the second feature to be predicted representing an image feature of a second region, the third feature to be predicted representing an attribute feature related to interactive operations, and the range of the first region being smaller than the range of the second region (102); and obtaining, through a target joint model, a first label and/or a second label corresponding to the feature set to be predicted, the first label representing a label related to operation content and the second label representing a label related to operation intention (103). Only one joint model is needed to predict both micro-operations and the overall view, which effectively solves the hard-switching problem in layered models and improves the convenience of prediction.

Description

Information prediction method, model training method and server
This application claims priority to Chinese Patent Application No. 201811526060.1, entitled "Information prediction method, model training method and server", filed on December 13, 2018, the entire content of which is incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence technologies, and in particular to an information prediction method, a model training method and a server.
Background
Artificial intelligence (AI) programs have already beaten top professional players in board games with clear rules. By comparison, the operations of multiplayer online battle arena (MOBA) games are more complex and closer to real-world scenarios. Solving the AI problems in MOBA games helps us explore and solve complex real-world problems.
Because MOBA game operations are complex, the operations of an entire MOBA game can usually be divided into the overall view and micro-operations to reduce the complexity of the whole game. Referring to FIG. 1, which is a schematic diagram of layered modeling in the related art, when divided by overall-view decisions such as "jungling", "clearing minion waves", "team battles" and "pushing towers", each game has on average only about 100 overall-view tasks, and each overall-view task contains on average 200 micro-operation decision steps. Based on the above, referring to FIG. 2, which is a schematic structural diagram of a layered model in the related art, an overall-view model is built with overall-view features and a micro-operation model is built with micro-operation features; the overall-view model outputs overall-view labels and the micro-operation model outputs micro-operation labels.
However, layered modeling requires the overall-view model and the micro-operation model to be designed and trained separately, that is, the two models are independent of each other, and in practical applications it is also necessary to decide which model to use for prediction. Therefore, there is a hard-switching problem between the two models, which is inconvenient for prediction.
Summary
The embodiments of this application provide an information prediction method, a model training method and a server. Only one joint model is needed to predict both micro-operations and the overall view, which effectively solves the hard-switching problem in layered models and improves the convenience of prediction.
In view of this, a first aspect of this application provides an information prediction method, including: obtaining an image to be predicted; and extracting a feature set to be predicted from the image to be predicted, where the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted represents an image feature of a first region, the second feature to be predicted represents an image feature of a second region, the third feature to be predicted represents an attribute feature related to interactive operations, and the range of the first region is smaller than the range of the second region;
obtaining, through a target joint model, a first label and/or a second label corresponding to the feature set to be predicted, where the first label represents a label related to operation content and the second label represents a label related to operation intention.
A second aspect of this application provides a model training method, including: obtaining a set of images to be trained, where the set includes N images to be trained and N is an integer greater than or equal to 1; extracting a feature set to be trained from each image to be trained, where the feature set to be trained includes a first feature to be trained, a second feature to be trained and a third feature to be trained, the first feature to be trained represents an image feature of a first region, the second feature to be trained represents an image feature of a second region, the third feature to be trained represents an attribute feature related to interactive operations, and the range of the first region is smaller than the range of the second region; obtaining a first to-be-trained label and a second to-be-trained label corresponding to each image to be trained, where the first to-be-trained label represents a label related to operation content and the second to-be-trained label represents a label related to operation intention; and training, according to the feature set to be trained in each image to be trained and the first to-be-trained label and the second to-be-trained label corresponding to each image to be trained, to obtain a target joint model.
A third aspect of this application provides a server, including:
an obtaining module, configured to obtain an image to be predicted;
an extraction module, configured to extract a feature set to be predicted from the image to be predicted obtained by the obtaining module, where the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted represents an image feature of a first region, the second feature to be predicted represents an image feature of a second region, the third feature to be predicted represents an attribute feature related to interactive operations, and the range of the first region is smaller than the range of the second region;
the obtaining module is further configured to obtain, through a target joint model, a first label and a second label corresponding to the feature set to be predicted extracted by the extraction module, where the first label represents a label related to operation content and the second label represents a label related to operation intention.
In a possible design, in a first implementation of the third aspect of the embodiments of this application,
the obtaining module is configured to obtain, through the target joint model, the first label, the second label and a third label corresponding to the feature set to be predicted, where the third label represents a label related to the win/loss outcome.
A fourth aspect of this application provides a server, including:
an obtaining module, configured to obtain a set of images to be trained, where the set includes N images to be trained and N is an integer greater than or equal to 1;
an extraction module, configured to extract a feature set to be trained from each image to be trained obtained by the obtaining module, where the feature set to be trained includes a first feature to be trained, a second feature to be trained and a third feature to be trained, the first feature to be trained represents an image feature of a first region, the second feature to be trained represents an image feature of a second region, the third feature to be trained represents an attribute feature related to interactive operations, and the range of the first region is smaller than the range of the second region;
the obtaining module is configured to obtain a first to-be-trained label and a second to-be-trained label corresponding to each image to be trained, where the first to-be-trained label represents a label related to operation content and the second to-be-trained label represents a label related to operation intention;
a training module, configured to train, according to the feature set to be trained in each image to be trained extracted by the extraction module and the first to-be-trained label and the second to-be-trained label corresponding to each image to be trained obtained by the obtaining module, to obtain a target joint model.
In a possible design, in a first implementation of the fourth aspect of the embodiments of this application,
the first feature to be trained is a two-dimensional vector feature, where the first feature to be trained includes at least one of character position information, moving object position information, fixed object position information and defensive object position information within the first region;
the second feature to be trained is a two-dimensional vector feature, where the second feature to be trained includes at least one of character position information, moving object position information, fixed object position information, defensive object position information, obstacle object position information and output object position information within the second region;
the third feature to be trained is a one-dimensional vector feature, where the third feature to be trained includes at least one of character life value, character output value, time information and score information; and there is a correspondence among the first feature to be trained, the second feature to be trained and the third feature to be trained.
In a possible design, in a second implementation of the fourth aspect of the embodiments of this application,
the first to-be-trained label includes key type information and/or key parameter information;
where the key parameter information includes at least one of a directional parameter, a positional parameter and a target parameter, the directional parameter is used to indicate the direction of the character's movement, the positional parameter is used to indicate the position of the character, and the target parameter is used to indicate the object to be output (attacked) by the character.
In a possible design, in a third implementation of the fourth aspect of the embodiments of this application, the second to-be-trained label includes operation intention information and character position information, where the operation intention information indicates the purpose of interaction between the character and an object, and the character position information indicates the position of the character within the first region.
In a possible design, in a fourth implementation of the fourth aspect of the embodiments of this application, the training module is configured to process the feature set to be trained in each image to be trained to obtain a target feature set, where the target feature set includes a first target feature, a second target feature and a third target feature;
obtain, through a long short-term memory (LSTM) layer, a first predicted label and a second predicted label corresponding to the target feature set, where the first predicted label represents a predicted label related to operation content and the second predicted label represents a predicted label related to operation intention;
train to obtain core parameters of the model according to the first predicted label, the first to-be-trained label, the second predicted label and the second to-be-trained label of each image to be trained, where both the first predicted label and the second predicted label belong to predicted values and both the first to-be-trained label and the second to-be-trained label belong to true values;
and generate the target joint model according to the core parameters of the model.
In a possible design, in a fifth implementation of the fourth aspect of the embodiments of this application, the training module is configured to process the third feature to be trained in each image to be trained through a fully connected layer to obtain the third target feature, where the third target feature is a one-dimensional vector feature;
process the second feature to be trained in each image to be trained through a convolutional layer to obtain the second target feature, where the second target feature is a one-dimensional vector feature;
and process the first feature to be trained in each image to be trained through the convolutional layer to obtain the first target feature, where the first target feature is a one-dimensional vector feature.
In a possible design, in a sixth implementation of the fourth aspect of the embodiments of this application, the training module is configured to obtain, through the long short-term memory (LSTM) layer, a first predicted label, a second predicted label and a third predicted label corresponding to the target feature set, where the third predicted label represents a predicted label related to the win/loss outcome;
obtain a third to-be-trained label corresponding to each image to be trained, where the third to-be-trained label is used to indicate the actual win/loss outcome;
and train to obtain the core parameters of the model according to the first predicted label, the first to-be-trained label, the second predicted label, the second to-be-trained label, the third predicted label and the third to-be-trained label, where the third to-be-trained label belongs to a predicted value and the third predicted label belongs to a true value.
In a possible design, in a seventh implementation of the fourth aspect of the embodiments of this application, the server further includes an update module;
the obtaining module is further configured to obtain a video to be trained after the training module trains to obtain the target joint model according to the feature set to be trained in each image to be trained and the first to-be-trained label and the second to-be-trained label corresponding to each image to be trained, where the video to be trained includes multiple frames of interactive images;
the obtaining module is further configured to obtain target scene data corresponding to the video to be trained through the target joint model, where the target scene data includes relevant data under a target scene;
the training module is further configured to train to obtain target model parameters according to the target scene data, the first to-be-trained label and a first predicted label obtained by the obtaining module, where the first predicted label represents a predicted label related to operation content, the first predicted label belongs to a predicted value, and the first to-be-trained label belongs to a true value;
and the update module is configured to update the target joint model using the target model parameters trained by the training module to obtain an enhanced joint model.
In a possible design, in an eighth implementation of the fourth aspect of the embodiments of this application, the server further includes an update module;
the obtaining module is further configured to obtain a video to be trained after the training module trains to obtain the target joint model according to the feature set to be trained in each image to be trained and the first to-be-trained label and the second to-be-trained label corresponding to each image to be trained, where the video to be trained includes multiple frames of interactive images;
the obtaining module is further configured to obtain target scene data corresponding to the video to be trained through the target joint model, where the target scene data includes relevant data under a target scene;
the training module is further configured to train to obtain target model parameters according to the target scene data, the second to-be-trained label and a second predicted label obtained by the obtaining module, where the second predicted label represents a predicted label related to operation intention, the second predicted label belongs to a predicted value, and the second to-be-trained label belongs to a true value;
and the update module is configured to update the target joint model using the target model parameters trained by the training module to obtain an enhanced joint model.
A fifth aspect of this application provides a server, where the communication device is configured to perform the information prediction method in the first aspect or any possible implementation of the first aspect. Specifically, the server may include modules for performing the information prediction method in the first aspect or any possible implementation of the first aspect.
A sixth aspect of this application provides a server, where the communication device is configured to perform the model training method in the second aspect or any possible implementation of the second aspect. Exemplarily, the server may include modules for performing the model training method in the second aspect or any possible implementation of the second aspect.
A seventh aspect of this application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the methods described in the above aspects.
An eighth aspect of this application provides a computer program (product) including computer program code which, when run by a computer, causes the computer to perform the method in any one of the above aspects.
From the above technical solutions, the embodiments of this application have at least the following advantages:
In the embodiments of this application, an information prediction method is provided. First, the server obtains an image to be predicted and then extracts a feature set to be predicted from the image to be predicted, where the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted represents an image feature of a first region, the second feature to be predicted represents an image feature of a second region, the third feature to be predicted represents an attribute feature related to interactive operations, and the range of the first region is smaller than the range of the second region. Finally, the server can obtain, through a target joint model, a first label and a second label corresponding to the image to be predicted, where the first label represents a label related to operation content and the second label represents a label related to operation intention. In this way, only one joint model is needed to predict both micro-operations and the overall view, where the prediction result of the micro-operation is expressed as the first label and the prediction result of the overall view is expressed as the second label. Therefore, merging the overall-view model and the micro-operation model into one joint model effectively solves the hard-switching problem in layered models and improves the convenience of prediction.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of layered modeling in the related art;
FIG. 2 is a schematic structural diagram of a layered model in the related art;
FIG. 3 is a schematic architecture diagram of the information prediction system in an embodiment of this application;
FIG. 4 is a schematic system structure diagram of the target joint model in an embodiment of this application;
FIG. 5 is a schematic diagram of an embodiment of the information prediction method in an embodiment of this application;
FIG. 6 is a schematic workflow diagram of the enhanced joint model in an embodiment of this application;
FIG. 7 is a schematic diagram of an embodiment of the model training method in an embodiment of this application;
FIG. 8 is a schematic diagram of an embodiment of extracting the feature set to be trained in an embodiment of this application;
FIG. 9 is a schematic feature-expression diagram of the feature set to be trained in an embodiment of this application;
FIG. 10 is a schematic feature-expression diagram of a class image in an embodiment of this application;
FIG. 11 is a schematic diagram of a micro-operation label in an embodiment of this application;
FIG. 12 is another schematic diagram of a micro-operation label in an embodiment of this application;
FIG. 13 is another schematic diagram of a micro-operation label in an embodiment of this application;
FIG. 14 is another schematic diagram of a micro-operation label in an embodiment of this application;
FIG. 15 is a schematic diagram of an overall-view label in an embodiment of this application;
FIG. 16 is a schematic network structure diagram of the target joint model in an embodiment of this application;
FIG. 17 is a schematic system structure diagram of the enhanced joint model in an embodiment of this application;
FIG. 18 is another schematic system structure diagram of the enhanced joint model in an embodiment of this application;
FIG. 19 is a schematic diagram of an embodiment of the server in an embodiment of this application;
FIG. 20 is another schematic diagram of an embodiment of the server in an embodiment of this application;
FIG. 21 is another schematic diagram of an embodiment of the server in an embodiment of this application;
FIG. 22 is a schematic structural diagram of the server in an embodiment of this application.
Detailed Description
The embodiments of this application provide an information prediction method, a model training method and a server. Only one joint model is needed to predict both micro-operations and the overall view, which effectively solves the hard-switching problem in layered models and improves the convenience of prediction.
The terms "first", "second", "third", "fourth", etc. (if any) in the specification, claims and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of this application described here can be implemented, for example, in an order other than that illustrated or described here. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device including a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units not clearly listed or inherent to such process, method, product or device.
It should be understood that the models involved in this application can be applied to the AI field, whose scope of application includes but is not limited to machine translation, intelligent control, expert systems, robotics, language and image understanding, automatic programming, aerospace applications, and large-scale information processing, storage and management. For ease of description, this application takes an online game scenario, which may be a MOBA game scenario, as an example. For MOBA games, the embodiments of this application design an AI model that can better imitate the behavior of human players and achieve better results in human-machine battles, simulating disconnected players, players practicing game characters and other situations. The typical gameplay of a MOBA game is a multiplayer-versus-multiplayer mode, that is, two (or more) teams each consisting of the same number of players fight against each other, each player controls one hero character, and the side that first destroys the opponent's "crystal" base wins.
For ease of understanding, this application proposes an information prediction method applied to the information prediction system shown in FIG. 3. Referring to FIG. 3, which is a schematic architecture diagram of the information prediction system in an embodiment of this application, multiple game rounds are played on the client, generating a large amount of game picture data (that is, the images to be trained), which is then sent to the server. The game picture data may be generated by human players during actual games or obtained after a machine simulates the operations of human players; this application mainly uses data provided by human players. Assuming an average game lasts 30 minutes at 15 frames per second, each game round has on average 27,000 frames of images. This application mainly selects data related to the overall-view tasks and the micro-operation tasks for training to reduce the complexity of the data. The overall-view tasks are divided by operation intention and include but are not limited to "jungling", "clearing minion waves", "team battles" and "pushing towers". Each game has on average only about 100 overall-view tasks, and each overall-view task contains about 200 micro-operation decision steps; therefore, both the number of overall-view decision steps and the number of micro-operation decision steps are within an acceptable range.
The server trains the model with the game picture data reported by the client and further generates an enhanced joint model on the basis of the obtained target joint model. For ease of description, refer to FIG. 4, which is a schematic system structure diagram of the enhanced joint model in an embodiment of this application. As shown in FIG. 4, the whole model training process can be divided into two stages. First, supervised learning is used to learn an initial joint model of the overall view and micro-operations from human players' game data, and an overall-view fully connected (FC) layer and a micro-operation FC layer are added on the basis of the joint model to obtain the target joint model. Then reinforcement learning is used to optimize the micro-operation FC layer (or the overall-view FC layer) while the parameters of the other layers remain fixed, thereby improving core metrics such as the skill hit rate and the skill dodge success rate in "team battles".
It should be noted that the client is deployed on a terminal device, where the terminal device includes but is not limited to a tablet computer, a laptop computer, a palmtop computer, a mobile phone and a personal computer (PC), which is not limited here.
With reference to the above introduction, the information prediction method in this application is described below. Referring to FIG. 5, an embodiment of the information prediction method in the embodiments of this application includes:
101. Obtain an image to be predicted;
In this embodiment, the server first obtains the image to be predicted, which may be an image in a MOBA game.
102. Extract a feature set to be predicted from the image to be predicted, where the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted represents an image feature of a first region, the second feature to be predicted represents an image feature of a second region, the third feature to be predicted represents an attribute feature related to interactive operations, and the range of the first region is smaller than the range of the second region;
In this embodiment, the server needs to extract the feature set to be predicted from the image to be predicted. The feature set to be predicted here mainly includes three types of features: the first feature to be predicted, the second feature to be predicted and the third feature to be predicted. The first feature to be predicted represents an image feature of the first region; exemplarily, the first feature to be predicted is a minimap-class image feature in a MOBA game. The second feature to be predicted represents an image feature of the second region; exemplarily, the second feature to be predicted is a current-field-of-view-class image feature in a MOBA game. The third feature to be predicted represents an attribute feature related to interactive operations; exemplarily, the third feature to be predicted is a hero attribute vector feature in a MOBA game.
103. Obtain, through a target joint model, a first label and/or a second label corresponding to the feature set to be predicted, where the first label represents a label related to operation content and the second label represents a label related to operation intention.
In this embodiment, the server inputs the extracted feature set to be predicted into the target joint model, and may further input it into the enhanced joint model, where the enhanced joint model is a model obtained by reinforcing the target joint model. For ease of understanding, refer to FIG. 6, which is a schematic workflow diagram of the target joint model in an embodiment of this application. As shown in FIG. 6, this application merges the overall-view model and the micro-operation model into one model, namely the joint model. An overall-view FC layer and a micro-operation FC layer are added on the basis of the joint model to obtain the target joint model, which better matches the human decision-making process. The joint model uses unified feature input, that is, the feature set to be predicted is input. A unified coding layer is learned, and the overall-view task and the micro-operation task are learned at the same time; the output of the overall-view task is input in a cascaded manner into the coding layer of the operation task. The joint model can ultimately output only the first label related to the operation content and use the output of the micro-operation FC layer as the execution instruction according to the first label; or output only the second label related to the operation intention and use the output of the overall-view FC layer as the execution instruction according to the second label; or output both the first label and the second label, that is, use the outputs of both the micro-operation FC layer and the overall-view FC layer as execution instructions according to the first label and the second label.
In the embodiments of this application, an information prediction method is provided. First, the server obtains the image to be predicted. Then the feature set to be predicted is extracted from the image to be predicted, where the feature set to be predicted includes a first feature to be predicted, a second feature to be predicted and a third feature to be predicted, the first feature to be predicted represents an image feature of the first region, the second feature to be predicted represents an image feature of the second region, the third feature to be predicted represents an attribute feature related to interactive operations, and the range of the first region is smaller than the range of the second region. Finally, the server can obtain, through the target joint model, the first label and the second label corresponding to the image to be predicted, where the first label represents a label related to operation content and the second label represents a label related to operation intention. In this way, only one joint model is needed to predict both micro-operations and the overall view, where the prediction result of the micro-operation is expressed as the first label and the prediction result of the overall view is expressed as the second label. Therefore, merging the overall-view model and the micro-operation model into one joint model effectively solves the hard-switching problem in the layered model and improves the convenience of prediction.
Optionally, on the basis of the embodiment corresponding to FIG. 5 above, in the first optional embodiment of the information prediction method provided by the embodiments of this application, obtaining the first label and/or the second label corresponding to the feature set to be predicted through the target joint model may include: obtaining the first label, the second label and a third label corresponding to the feature set to be predicted through the target joint model, where the third label represents a label related to the win/loss outcome.
In this embodiment, a more comprehensive prediction manner is provided, that is, the first label, the second label and the third label are output simultaneously through the target joint model, so that not only the operations under the overall-view task and the operations under the micro-operation task but also the win/loss outcome can be predicted.
Optionally, in practical applications, multiple consecutive frames of images to be predicted are usually input to improve prediction accuracy. For example, 100 frames of images to be predicted are input and feature extraction is performed on each frame, yielding 100 feature sets to be predicted. The 100 feature sets to be predicted are input into the target joint model, which predicts the implicit intention related to the overall-view task, learns general navigation ability, predicts the execution instructions of the micro-operation task, and predicts the possible outcome of the game, for example that the game may be won or may be lost.
Second, in the embodiments of this application, the target joint model can output not only the first label and the second label but also the third label, that is, the target joint model can also predict the win/loss outcome. In this way, the situation outcome can be better predicted in practical applications, which helps improve the reliability of prediction and increases the flexibility and practicality of prediction.
下面将对本申请中模型训练的方法进行介绍,该方法既利用人类数据进行快速监督学习,又能利用强化学习提高模型的预测准确度,请参阅图7,本申请实施例中模型训练的方法一个实施例包括:
201、获取待训练图像集合,其中,待训练图像集合包括N个待训练图像,N为大于或等于1的整数;
本实施例中,将介绍模型训练的流程,首先,服务器通过客户端上报的人类玩家游戏数据获取相应的待训练图像集合。待训练图像集合通常包含多帧图像,即待训练图像集合包括N个待训练图像,以提高模型的精度,N为大于或等于1的整数。
202、提取每个待训练图像中的待训练特征集合,其中,待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征,第一待训练特征表示第一区域的图像特征,第二待训练特征表示第二区域的图像特征,第三待训练特征表示与交互操作相关的属性特征,第一区域的范围小于第二区域的范围;
本实施例中,服务器需要提取待训练图像集合中每个待训练图像的待训练特征集合,这里的待训练特征集合主要包括了三类特征,分别为第一待训练特征、第二待训练特征以及第三待训练特征。第一待训练特征表示第一区域的图像特征,示例性地,第一待训练特征为MOBA游戏中的小地图类图像特征。第二待训练特征表示第二区域的图像特征,示例性地,第二待训练特征为MOBA游戏中的当前视野类图像特征。第三待训练特征表示与交互操作相关的属性特征,示例性地,第三待训练特征为MOBA游戏中的英雄属性向量特征。
203、获取每个待训练图像所对应的第一待训练标签以及第二待训练标签,其中,第一待训练标签表示与操作内容相关的标签,第二待训练标签表示与操作意图相关的标签;
本实施例中,服务器还需要获取每个待训练图像所对应的第一待训练标签以及第二待训练标签,其中,第一待训练标签表示与操作内容相关的标签,示例性地,第一待训练标签是与微操任务相关的标签。第二待训练标签表示与操作意图相关的标签,示例性地,第二待训练标签是与大局观任务相关的标签。
需要说明的是,在实际应用中,步骤203可以在步骤202之前执行,也可以在步骤202之后执行,还可以与步骤202同时执行,此处不做限定。
204、根据每个待训练图像中的待训练特征集合,以及每个待训练图像所对应的第一待训练标签以及第二待训练标签,训练得到目标联合模型。
本实施例中,最后,服务器基于从每个待训练图像中提取的待训练特征集合,以及每个待训练图像所对应的第一待训练标签以及第二待训练标签进行训练,从而得到目标联合模型。该目标联合模型可以用于预测大局观任务的情形以及微操任务的指令。
本申请实施例中,将介绍一种模型训练的方法,首先服务器获取待训练图像集合,然后提取每个待训练图像中的待训练特征集合,其中,待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征。接下来,服务器需要获取每个待训练图像所对应的第一待训练标签以及第二待训练标签,最后根据每个待训练图像中的待训练特征集合,以及每个待训练图像所对应的第一待训练标签以及第二待训练标签,训练得到目标联合模型。通过上述方式,设计了一个可以同时预测微操和大局观的模型,由此,将大局观模型和微操模型合并成一个联合模型,有效地解决了分层模型中的硬切换问题,提升了预测的便利性。同时,考虑大局观任务可以有效地提升宏观决策的准确度,尤其在MOBA游戏中,大局观的决策是非常重要的。
可选地,在上述图7对应的实施例的基础上,本申请实施例提供模型训练的方法第一个可选实施例中,第一待训练特征为二维向量特征,其中,第一待训练特征包括在第一区域内的角色位置信息、移动对象位置信息、固定对象位置信息以及防御对象位置信息中的至少一种;
第二待训练特征为二维向量特征,其中,第二待训练特征包括在第二区域内的角色位置信息、移动对象位置信息、固定对象位置信息、防御对象位置信息、障碍对象位置信息以及输出对象位置信息中的至少一种;
第三待训练特征为一维向量特征,其中,第三待训练特征包括角色生命值、角色输出值、时间信息以及比分信息中的至少一种;
其中,第一待训练特征、第二待训练特征以及第三待训练特征之间具有对应关系。
本实施例中,将对第一待训练特征、第二待训练特征以及第三待训练特征之间的关系以及内容进行说明。为了便于介绍,下面将以MOBA游戏的场景为例进行说明。人类玩家在进行操作时,会综合考虑小地图、当前视野以及英雄属性等信息。因此本申请采用了一种多模态和多尺度的特征表达。请参阅图8,图8为本申请实施例中提取待训练特征集合的一个实施例示意图,如图8所示,S1所指示的部分为英雄属性信息,包括对局中的英雄角色,以及每个英雄角色的生命值、物理攻击值、法术攻击值、物理防御值和法术防御值。S2所指示的部分为小地图,即第一区域。在小地图中可以看到英雄角色、兵线、野怪和防御塔等位置。其中,英雄角色包括队友控制的英雄角色以及敌方控制的英雄角色,兵线是指双方小兵交战的位置。野怪是指除玩家控制的角色以外、在环境中处于“中立敌对”状态的对象,属于非玩家角色(Non-Player Character,NPC)类怪物,不受玩家控制。防御塔是一种防御建筑。其中,双方阵营分别具有一座水晶防御塔,摧毁对方水晶防御塔即获得胜利。S3所指示的部分为当前视野,即第二区域。在当前视野中可以清楚看到英雄、兵线、野怪、防御塔、地图障碍物以及子弹等。
请参阅图9,图9为本申请实施例中待训练特征集合的一个特征表达示意图,如图9所示,英雄属性的向量特征(即第三待训练特征)通过与小地图类图像特征(即第一待训练特征)和当前视野类图像特征(即第二待训练特征)建立一一映射关系,既可以用于宏观决策,也可以用于微观决策。英雄属性的向量特征是由数值构成的特征,因此,属于一维向量特征,该向量特征中包含但不仅限于英雄角色的属性特征,如血量(即敌方五位英雄角色的生命值和我方五位英雄角色的生命值)、攻击力(即敌方五位英雄角色的角色输出值和我方五位英雄角色的角色输出值)、时间(一局游戏时长)以及比分(每个队伍的最后得分)。小地图类图像特征和当前视野类图像特征均属于类图像特征,为了便于理解,请参阅图10,图10为本申请实施例中类图像的一个特征表达示意图。如图10所示,类图像特征是一种从原始像素图像中人工构造的二维特征,降低了直接从原始的复杂图像中学习的难度。小地图类图像特征包含英雄、兵线、野怪以及防御塔等位置信息,用于表达宏观尺度的信息。当前视野类图像特征包含英雄、兵线、野怪、防御塔、地图障碍物和子弹的位置信息,用于表达局部的微观尺度信息。
这种模拟人类视角的多模态和多尺度特征不仅可以更好地对空间相对位置关系进行建模,而且非常适合MOBA游戏中高维状态特征的表达。
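为了便于理解上述多模态、多尺度的特征表达,下面给出一个构造待训练特征集合的示意性代码片段(使用Python/NumPy)。需要说明的是,其中的通道划分、数组尺寸以及函数命名均为示例性假设,并非对本申请实施例中特征表达方式的限定:

```python
import numpy as np

# 小地图类图像特征(第一待训练特征):二维向量特征,不同通道表示不同对象的位置
MINIMAP_CHANNELS = ["hero", "soldier", "monster", "tower"]   # 英雄/兵线/野怪/防御塔
minimap_feature = np.zeros((len(MINIMAP_CHANNELS), 24, 24), dtype=np.float32)

# 当前视野类图像特征(第二待训练特征):二维向量特征,额外包含地图障碍物与子弹等局部信息
VIEW_CHANNELS = ["hero", "soldier", "monster", "tower", "obstacle", "bullet"]
view_feature = np.zeros((len(VIEW_CHANNELS), 10, 10), dtype=np.float32)

def mark_position(feature, channel, row, col):
    """在指定通道对应的格子上标记一个对象的位置。"""
    feature[channel, row, col] = 1.0

# 示例:在小地图的"hero"通道上标记一个英雄角色所在的格子
mark_position(minimap_feature, MINIMAP_CHANNELS.index("hero"), 5, 7)

# 英雄属性的向量特征(第三待训练特征):由数值构成的一维向量特征
hero_vector = np.array([
    0.8,   # 角色生命值(归一化后的血量)
    0.6,   # 角色输出值(归一化后的攻击力)
    0.5,   # 时间信息(归一化后的对局时长)
    0.3,   # 比分信息(归一化后的得分)
], dtype=np.float32)

# 三类特征之间建立一一对应关系,共同构成一帧图像的待训练特征集合
feature_set = {"minimap": minimap_feature, "view": view_feature, "hero": hero_vector}
```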
其次,本申请实施例中,介绍了三种待训练特征的内容,其中,第一待训练特征为二维向量特征,第二待训练特征为二维向量特征,第三待训练特征为一维向量特征。通过上述方式,一方面能够确定三种待训练特征中所包含的具体信息,由此得到更多的信息量用于进行模型训练。另一方面,第一待训练特征和第二待训练特征都是二维向量特征,有利于提升特征的空间表达,从而增加特征的多样性。
可选地,在上述图7对应的实施例的基础上,本申请实施例提供模型训练的方法第二个可选实施例中,第一待训练标签包括按键类型信息和/或按键参数信息;
其中,按键参数信息包括方向型参数、位置型参数以及目标型参数中的至少一项,方向型参数用于表示角色移动的方向,位置型参数用于表示角色所在的位置,目标型参数用于表示角色的待输出对象。
本实施例中,将对第一待训练标签所包含的内容进行详细介绍。第一待训练标签包括按键类型信息和/或按键参数信息,通常情况下,会同时考虑将按键类型信息和按键参数信息作为第一待训练标签,以提升标签的精准度。由于人类玩家在操作时,通常先决定用哪个按键,再决定按键的操作参数,因此,本申请采用了分层的标签设计,即先预测当前时刻应该执行哪个按键,然后预测该按键的释放参数。
为了便于理解,下面将结合附图举例介绍第一待训练标签。按键参数信息主要分为三种类型的信息,分别为方向型信息、位置型信息以及目标型信息。一圈方向为360度,假设每6度设定为一个标签,则可以将方向型信息离散化成60个方向。一个英雄角色通常占据图像中的1000个像素,因此,可以将位置型信息离散化成30×30个位置。而目标型信息表示为候选的攻击目标,可以是指英雄角色发动技能时所攻击的对象。
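示例性地,上述按键参数信息的离散化过程可以参考如下示意性代码,其中的离散化粒度沿用上文的描述(方向离散为60个标签、位置离散为30×30个标签),函数命名与坐标约定均为示例性假设:

```python
import math

def discretize_direction(dx, dy, num_bins=60):
    """将连续的移动/技能方向离散化为 num_bins 个方向标签(每个标签约6度)。"""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // (360.0 / num_bins))          # 取值范围 0 ~ num_bins-1

def discretize_position(x, y, width, height, grid=30):
    """将屏幕上的连续位置离散化为 grid x grid 个位置标签。"""
    col = min(int(x / width * grid), grid - 1)
    row = min(int(y / height * grid), grid - 1)
    return row * grid + col                          # 取值范围 0 ~ grid*grid-1

# 示例:45度方向对应的方向标签
print(discretize_direction(1.0, 1.0))                # 7(具体取值取决于坐标系约定)
# 示例:1280x720分辨率下,屏幕位置(640, 360)对应的位置标签
print(discretize_position(640, 360, 1280, 720))      # 465
```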
请参阅图11,图11为本申请实施例中微操作标签的一个示意图,如图11所示,英雄角色在A1所示的范围内发动技能3,技能方向为右下方的45度方向。A2指示技能3在操作界面中的位置。由此表示人类玩家操作为“技能3+方向”。请参阅图12,图12为本申请实施例中微操作标签的另一个示意图,如图12所示,英雄角色按照A3所示的方向进行移动,移动方向为正右方。由此表示人类玩家操作为“移动+方向”。请参阅图13,图13为本申请实施例中微操作标签的另一个示意图,如图13所示,英雄角色正在发动技能1,A4指示技能1在操作界面中的位置。由此表示人类玩家操作为“技能1”。请参阅图14,图14为本申请实施例中微操作标签的另一个示意图,如图14所示,英雄角色在A5所示的范围内发动技能2,技能方向为右上方的45度方向。A6指示技能2在操作界面中的位置。由此表示人类玩家操作为“技能2+方向”。
AI可以对不同释放类型的技能分别预测,即对方向型按键预测方向,对位置型按键预测位置,对目标型按键预测具体目标。分层标签的设计方法更贴近人类玩家在游戏过程中的真实操作意图,更有利于AI的学习。
其次,本申请实施例中,说明了第一待训练标签包括按键类型信息和/或按键参数信息,其中,按键参数信息包括方向型参数、位置型参数以及目标型参数中的至少一项,方向型参数用于表示角色移动的方向,位置型参数用于表示角色所在的位置,目标型参数用于表示角色的待输出对象。通过上述方式,将第一待训练标签的内容更加精细化,以分层的方式建立标签,可以更加贴近人类玩家在游戏过程中的真实操作意图,从而有利于提升AI的学习能力。
可选地,在上述图7对应的实施例的基础上,本申请实施例提供模型训练的方法第三个可选实施例中,第二待训练标签包括操作意图信息以及角色位置信息;
其中,操作意图信息表示角色与对象进行交互的目的,角色位置信息表示角色在第一区域内的位置。
本实施例中,将对第二待训练标签所包含的内容进行详细介绍,第二待训练标签包括操作意图信息以及角色位置信息。在实际应用中,人类玩家会根据当前的游戏状态进行大局观的决策,例如,去清理上路的兵线,去打我方野区的野怪,去中路参加团战,去推下路的防御塔等。这些大局观决策并不像微操一样,有明确的操作按键与其对应,而是作为一种隐含的意图反映在玩家数据中。
为了便于理解,请参阅图15,图15为本申请实施例中大局观标签的一个示意图,示例性地,根据时间线的变化得到人类大局观以及对应的大局观标签(第二待训练标签)。可以将一局人类玩家的对战录像分成“团战”、“清兵”、“打野”和“推塔”等场景,将这些场景建模即可表达为玩家大局观意图的操作意图信息,并且将小地图离散化成了24*24个格子,角色位置信息表示为角色下一次攻击时所在的格子。如图15所示,第二待训练标签为操作意图信息+角色位置信息,即分别表示为“打野+坐标A”,“团战+坐标B”以及“清兵+坐标C”。
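示例性地,上述大局观标签(“操作意图信息+角色位置信息”)的构造可以参考如下示意性代码,其中的意图类别与24×24的格子划分沿用上文的描述,函数命名与取值均为示例性假设:

```python
# 大局观标签 = 操作意图信息 + 角色位置信息(小地图离散化为 24x24 个格子)
INTENTS = ["团战", "清兵", "打野", "推塔"]

def build_macro_label(intent, map_x, map_y, grid=24):
    """根据操作意图与角色下一次攻击时在小地图上的归一化坐标,构造第二待训练标签。"""
    intent_id = INTENTS.index(intent)
    col = min(int(map_x * grid), grid - 1)
    row = min(int(map_y * grid), grid - 1)
    cell_id = row * grid + col            # 取值范围 0 ~ 575
    return intent_id, cell_id

# 示例:"打野 + 坐标A"形式的大局观标签
print(build_macro_label("打野", map_x=0.3, map_y=0.7))    # (2, 391)
```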
其次,本申请实施例中,说明了第二待训练标签包括操作意图信息以及角色位置信息,其中,操作意图信息表示角色与对象进行交互的目的,角色位置信息表示角色在第一区域内的位置。通过上述方式,利用操作意图信息以及角色位置信息共同反映人类玩家的大局观,在MOBA游戏中大局观的决策是非常重要的,从而提升了方案的可行性和可操作性。
可选地,在上述图7对应的实施例的基础上,本申请实施例提供模型训练的方法第四个可选实施例中,根据每个待训练图像中的待训练特征集合,以及每个待训练图像所对应的第一待训练标签以及第二待训练标签,训练得到目标联合模型,可以包括:
对每个待训练图像中的待训练特征集合进行处理,得到目标特征集合,其中,目标特征集合包括第一目标特征、第二目标特征以及第三目标特征;
通过长短期记忆LSTM层获取目标特征集合所对应的第一预测标签以及第二预测标签,其中,第一预测标签表示预测得到的与操作内容相关的标签,第二预测标签表示预测得到的与操作意图相关的标签;
根据每个待训练图像的第一预测标签、第一待训练标签、第二预测标签以及第二待训练标签,训练得到模型核心参数,其中,第一预测标签与第二预测标签均属于预测值,第一待训练标签以及第二待训练标签均属于真实值;
根据模型核心参数生成目标联合模型。
本实施例中,将介绍训练得到目标联合模型的大致过程,为了便于理解,请参阅图16,图16为本申请实施例中目标联合模型的一个网络结构示意图。如图16所示,模型的输入是当前帧待训练图像的待训练特征集合,该待训练特征集合包括小地图类图像特征(即第一待训练特征),当前视野类图像特征(即第二待训练特征)和英雄角色的向量特征(即第三待训练特征)。类图像特征分别经过卷积网络进行编码,而向量特征经过全连接网络进行编码,得到目标特征集合,目标特征集合包括第一目标特征、第二目标特征以及第三目标特征。其中,第一目标特征是第一待训练特征经过处理后得到的,第二目标特征是第二待训练特征经过处理后得到的,第三目标特征是第三待训练特征经过处理后得到的。然后将目标特征集合拼接成公用的编码层。编码层输入到长短期记忆(Long Short-Term Memory,LSTM)网络层,LSTM网络层主要用于解决英雄视野的部分可观问题。
其中,LSTM网络是一种时间递归神经网络,适合于处理和预测时间序列中间隔和延迟相对较长的重要事件。LSTM区别于循环神经网络(Recurrent Neural Network,RNN)的地方,主要就在于它在算法中加入了一个判断信息有用与否的处理器,这个处理器作用的结构被称为单元。一个单元当中被放置了三扇门,分别叫做输入门、遗忘门和输出门。一个信息进入LSTM网络层当中,可以根据规则来判断是否有用,只有符合算法认证的信息才会留下,不符的信息则通过遗忘门被遗忘。LSTM是解决长序依赖问题的有效技术,并且这种技术的普适性非常高。对于MOBA游戏而言可能会存在不可见的视野问题,即我方英雄角色仅能观察到我方单元(如队友的英雄角色)附近的敌方英雄、野怪和兵线,其他位置的敌方单元则观察不到,并且敌方英雄还可以通过躲在草丛或者使用隐身技能来屏蔽自身视野。这样的话,在训练模型的过程中由于考虑信息的完整性,因此,需要利用LSTM网络层还原这些隐藏的信息。
基于LSTM层的输出结果可以得到该帧待训练图像的第一预测标签以及第二预测标签。由于根据人工标注结果确定该帧待训练图像的第一待训练标签和第二待训练标签,此时可以采用损失函数获取第一预测标签和第一待训练标签之间的最小值,并且采用损失函数获取第二预测标签和第二待训练标签之间的最小值,在最小值的情况下确定模型核心参数。其中,模型核心参数包括微操任务(如按键、移动、普攻、技能1、技能2和技能3等)下的模型参数以及大局观任务下的模型参数。根据模型核心参数生成目标联合模型。
可以理解的是,每个输出任务均可以独立计算,即每个任务的输出层全连接网络参数只受该任务的影响。目标联合模型中包含了预测大局观位置和意图的辅助任务,该大局观任务的输出以级联的形式输入到微操任务的编码层中。
需要说明的是,损失函数是用来估量模型的预测值与真实值之间不一致的程度,它是一个非负实值函数。损失函数越小,表示模型的鲁棒性就越好。损失函数是经验风险函数的核心部分,也是结构风险函数的重要组成部分。常用损失函数包含但不仅限于铰链损失(Hinge Loss)、互熵损失(Cross Entropy Loss)、平方损失(Square Loss)以及指数损失(Exponential Loss)。
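为了便于理解上述网络结构与训练过程,下面给出目标联合模型的一个示意性实现(基于PyTorch)。需要说明的是,其中的卷积通道数、编码维度、类别数(macro_classes、micro_classes)以及变量命名均为示例性假设,仅用于说明“卷积/全连接编码—拼接—LSTM—大局观FC层与微操FC层级联输出”的整体思路,并非对本申请实施例中目标联合模型具体结构的限定:

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """目标联合模型的一个示意性网络结构:编码层 -> LSTM层 -> 大局观FC层 / 微操FC层。"""

    def __init__(self, minimap_ch=4, view_ch=6, hero_dim=10,
                 macro_classes=4 * 576, micro_classes=60, hidden=256):
        super().__init__()
        # 类图像特征分别经过卷积网络进行编码
        self.minimap_enc = nn.Sequential(
            nn.Conv2d(minimap_ch, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())                 # 输出8维向量
        self.view_enc = nn.Sequential(
            nn.Conv2d(view_ch, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())                 # 输出8维向量
        # 向量特征经过全连接网络进行编码
        self.hero_enc = nn.Sequential(nn.Linear(hero_dim, 8), nn.ReLU())
        # 拼接后的公用编码层输入LSTM层,用于缓解部分视野不可观测的问题
        self.lstm = nn.LSTM(input_size=8 * 3, hidden_size=hidden, batch_first=True)
        # 大局观FC层与微操FC层;大局观任务的输出以级联方式输入微操分支
        self.macro_fc = nn.Linear(hidden, macro_classes)
        self.micro_fc = nn.Linear(hidden + macro_classes, micro_classes)

    def forward(self, minimap, view, hero):
        # minimap/view 的形状为 (B, T, C, H, W),hero 的形状为 (B, T, hero_dim)
        b, t = hero.shape[:2]
        m = self.minimap_enc(minimap.flatten(0, 1)).view(b, t, -1)
        v = self.view_enc(view.flatten(0, 1)).view(b, t, -1)
        h = self.hero_enc(hero)
        encoded, _ = self.lstm(torch.cat([m, v, h], dim=-1))
        macro_logits = self.macro_fc(encoded)                                      # 第二预测标签(大局观)
        micro_logits = self.micro_fc(torch.cat([encoded, macro_logits], dim=-1))   # 第一预测标签(微操)
        return macro_logits, micro_logits
```

监督训练时,可以分别对微操任务与大局观任务的输出使用交叉熵等损失函数,并将两者的损失相加后反向传播,以同时学习两类任务所对应的模型核心参数。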
其次,本申请实施例中,提供了训练得到目标联合模型的过程,主要包括先对每个待训练图像中的待训练特征集合进行处理,得到目标特征集合。然后通过LSTM获取目标特征集合所对应的第一预测标签以及第二预测标签,再根据每个待训练图像的第一预测标签、第一待训练标签、第二预测标签以及第二待训练标签,训练得到模型核心参数,该模型核心参数用于生成目标联合模型。通过上述方式,利用LSTM层可以解决部分视野不可观测的问题,即LSTM层能够获取过去一段时间内的数据,由此能够使得数据更加完整,有利于在模型训练的过程中进行推断和决策。
可选地,在上述图7对应的第四个实施例的基础上,本申请实施例提供模型训练的方法第五个可选实施例中,对每个待训练图像中的待训练特征集合进行处理,得到目标特征集合,可以包括:通过全连接层对每个待训练图像中的第三待训练特征进行处理,得到第三目标特征,其中,第三目标特征为一维向量特征;通过卷积层对每个待训练图像中的第二待训练特征进行处理,得到第二目标特征,其中,第二目标特征为一维向量特征;通过卷积层对每个待训练图像中的第一待训练特征进行处理,得到第一目标特征,其中,第一目标特征为一维向量特征。
本实施例中,将介绍如何对模型输入的每帧待训练图像的待训练特征集合进行处理,其中,该待训练特征集合包括小地图类图像特征(即第一待训练特征),当前视野类图像特征(即第二待训练特征)和英雄角色的向量特征(即第三待训练特征)。示例性地,对于第三待训练特征的处理方式是,将第三待训练特征输入至FC层,经过FC层输出得到第三目标特征。FC层的作用是将分布式特征表示映射到样本标记空间。FC层的每一个结点都与上一层的所有结点相连,用来把前边提取到的特征综合起来。由于其全相连的特性,通常情况下,全连接层的参数也是最多的。
对于第一待训练特征和第二待训练特征的处理方式是,分别将两者输入至卷积层,经过卷积层输出第一待训练特征所对应的第一目标特征,以及输出第二待训练特征所对应的第二目标特征。通过卷积层可以将原始图像拉平(flat),针对图像数据来说,一个像素会与其上下左右等方向的数据有很大的相关性,全连接时,将数据展开后,容易忽略图片的相关性,或者是将毫无相关的两个像素强制关联在一起。因此,需要对图像数据进行卷积处理。假设第一待训练特征所对应的图像像素为10×10,经过卷积层后得到的第一目标特征为100维的向量特征。假设第二待训练特征所对应的图像像素为10×10,经过卷积层后得到的第二目标特征为100维的向量特征。假设第三待训练特征所对应的第三目标特征为10维的向量特征。那么经过拼接(concat)层可以得到210(100+100+10)的向量特征。
再次,本申请实施例中,还可以对待训练特征集合进行处理,即通过全连接层对每个待训练图像中的第三待训练特征进行处理,得到第三目标特征。通过卷积层对每个待训练图像中的第一待训练特征进行处理,得到第一目标特征。通过卷积层对每个待训练图像中的第二待训练特征进行处理,得到第二目标特征。通过上述方式,能够得到均为一维向量的特征,由此可以将这些向量特征进行拼接处理,便于后续的模型训练,有利于提升方案的可行性和可操作性。
可选地,在上述图7对应的第四个实施例的基础上,本申请实施例提供模型训练的方法第六个可选实施例中,通过长短期记忆LSTM层获取目标特征集合所对应的第一预测标签以及第二预测标签,可以包括:
通过长短期记忆LSTM层获取目标特征集合所对应的第一预测标签、第二预测标签以及第三预测标签,第三预测标签表示预测得到的与胜负情况相关的标签;
根据每个待训练图像的第一预测标签、第一待训练标签、第二预测标签以及第二待训练标签,训练得到模型核心参数,包括:
获取每个待训练图像所对应的第三待训练标签,其中,第三待训练标签用于表示实际胜负情况;
根据第一预测标签、第一待训练标签、第二预测标签、第二待训练标签、第三预测标签以及第三待训练标签,训练得到模型核心参数,其中,第三预测标签属于预测值,第三待训练标签属于真实值。
本实施例中,还介绍了目标联合模型可以进一步预测胜负情况。示例性地,在上述图7对应的第四个实施例的基础上,基于LSTM层的输出结果可以得到该帧待训练图像的第三预测标签。由于根据人工标注结果确定了该帧待训练图像的第三待训练标签,此时可以采用损失函数获取第三预测标签和第三待训练标签之间的最小值,在最小值的情况下确定模型核心参数。这个时候,模型核心参数不仅包括微操任务(如按键、移动、普攻、技能1、技能2和技能3等)下的模型参数以及大局观任务下的模型参数,同时还可以包括胜负任务下的模型参数,最后根据模型核心参数生成目标联合模型。
再次,本申请实施例中,说明了目标联合模型还可以进一步训练与胜负相关的标签,即服务器通过LSTM层获取目标特征集合所对应的第一预测标签、第二预测标签以及第三预测标签,第三预测标签表示预测得到的与胜负情况相关的标签,然后获取每个待训练图像所对应的第三待训练标签,最后根据第一预测标签、第一待训练标签、第二预测标签、第二待训练标签、第三预测标签以及第三待训练标签,训练得到模型核心参数。通过上述方式,目标联合模型还能够预测比赛胜率,由此,可以加强对局面的认知和学习,从而提升模型应用的可靠性和多样性。
可选地,在上述图7以及图7对应的第一个至第六个实施例中任一项的基础上,本申请实施例提供模型训练的方法第七个可选实施例中,根据每个待训练图像中的待训练特征集合,以及每个待训练图像所对应的第一待训练标签以及第二待训练标签,训练得到目标联合模型之后,还可以包括:
获取待训练视频,其中,待训练视频包括多帧交互图像;
通过目标联合模型获取待训练视频对应的目标场景数据,其中,目标场景数据包括在目标场景下的相关数据;
根据目标场景数据、第一待训练标签以及第一预测标签,训练得到目标模型参数,其中,第一预测标签表示预测得到的与操作内容相关的标签,第一预测标签属于预测值,第一待训练标签属于真实值;
采用目标模型参数对目标联合模型进行更新,得到强化联合模型。
本实施例中,由于MOBA游戏的玩家众多,因此通常有大量人类玩家数据可以用于监督学习训练,从而通过模型来模拟人类操作。然而,由于人类的紧张或注意力不集中等各种因素可能会存在误操作,比如技能释放方向有偏差,躲避敌方技能不及时等,由此导致训练数据中存在着不良样本。有鉴于此,本申请能够通过强化学习来优化目标联合模型中的部分任务层。例如,不对大局观FC层进行强化学习,而只对微操FC层进行强化学习。
为了便于理解,请参阅图17,图17为本申请实施例中强化联合模型的一个系统结构示意图,如图17所示,目标联合模型包括联合模型、大局观FC层以及微操FC层。联合模型中的编码层和大局观FC层已经通过监督学习获取相应的核心模型参数。需要注意的是,在强化学习的过程中,联合模型中的编码层和大局观FC层的核心模型参数保持不变,如此,在强化学习的时候也就不需要学习特征表达,从而加快了强化学习的收敛速度。微操任务在团战场景的决策步数平均为100步(约20秒),可以有效降低决策步数。通过对微操FC层进行强化,可以提高AI的技能命中率和躲避敌方技能等关键能力。微操FC层采用强化学习算法进行训练,算法具体可以是近端策略优化(Proximal Policy Optimization,PPO)算法。
下面将介绍强化学习的流程:
步骤一,在训练得到目标联合模型之后,服务器可以加载监督学习得到的目标联合模型,并固定联合模型的编码层和大局观FC层,并且需要加载游戏环境。
步骤二,获取待训练视频。其中,待训练视频包括多帧交互图像,利用目标联合模型从待训练视频中的起始帧开始进行对战,保存英雄团战场景的目标场景数据,目标场景数据可以包括特征、动作、奖励信号以及联合模型网络输出的概率分布。其中,特征即为英雄属性的向量特征、小地图类图像特征以及当前视野类图像特征。动作即为玩家控制英雄角色时候所采用的按键。奖励信号即为英雄角色在团战过程中击杀敌方英雄角色的次数。联合模型网络输出的概率分布可以表示为微操任务中每个标签的分布概率,比如标签1的分布概率为0.1,标签2的分布概率为0.3,标签3的分布概率为0.6。
步骤三,根据目标场景数据、第一待训练标签以及第一预测标签,训练得到目标模型参数,采用PPO算法对目标联合模型中的核心模型参数进行更新。需要注意的是,这里只更新微操FC层的模型参数,也就是根据第一待训练标签以及第一预测标签生成更新后的模型参数。其中,第一待训练标签和第一预测标签均为与微操任务相关的标签。
步骤四,对待训练视频中每帧图像进行步骤二至步骤三的处理,如果未达到最大迭代帧数,则将更新后的目标联合模型发送至对战环境,并返回至步骤二;如果达到最大迭代帧数,则进入步骤五。其中,最大迭代帧数可以基于经验设置,或者基于场景设置,本申请实施例对此不进行限定。
步骤五,保存最终强化后得到的强化联合模型。
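为了便于理解上述“固定编码层与大局观FC层、仅对微操FC层进行强化”的过程,下面给出一个示意性代码片段(基于PyTorch),沿用前文示意性的JointModel结构,并只给出PPO裁剪目标的简化形式(省略了价值函数估计、熵正则等常见组成部分)。其中的函数命名与超参数均为示例性假设:

```python
import torch
import torch.nn.functional as F

def freeze_except_micro_fc(model):
    """固定编码层、LSTM层以及大局观FC层的参数,仅保留微操FC层可训练。"""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith("micro_fc")

def ppo_micro_loss(new_logits, old_log_probs, actions, advantages, clip_eps=0.2):
    """PPO裁剪目标的简化形式:只针对微操动作计算策略损失。"""
    new_log_probs = F.log_softmax(new_logits, dim=-1)
    new_log_probs = new_log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    ratio = torch.exp(new_log_probs - old_log_probs)          # 新旧策略的概率比
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# 示例:仅更新微操FC层的参数
# model = JointModel()
# freeze_except_micro_fc(model)
# optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-4)
```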
进一步地,本申请实施例中,还可以通过强化学习来优化目标联合模型中的部分任务层,若需要强化微操任务这部分,则服务器获取待训练视频。然后通过目标联合模型获取待训练视频对应的目标场景数据,基于目标场景数据、第一待训练标签以及第一预测标签,训练得到目标模型参数。最后,服务器采用目标模型参数对目标联合模型进行更新,得到强化联合模型。通过上述方式,对微操FC层进行强化可以提高AI能力,此外,强化学习还可以克服由于人类的紧张或者注意力不集中等各种因素所存在的误操作问题,从而大幅地减少了训练数据中存在的不良样本数量,进而提升模型的可靠性,以及应用该模型进行预测的准确度。强化学习方法可以只对部分场景进行强化,从而降低决策步数,并且加快收敛速度。
可选地,在上述图7以及图7对应的第一个至第六个实施例中任一项的基础上,本申请实施例提供模型训练的方法第八个可选实施例中,根据每个待训练图像中的待训练特征集合,以及每个待训练图像所对应的第一待训练标签以及第二待训练标签,训练得到目标联合模型之后,还可以包括:
获取待训练视频,其中,待训练视频包括多帧交互图像;
通过目标联合模型获取待训练视频对应的目标场景数据,其中,目标场景数据包括在目标场景下的相关数据;
根据目标场景数据、第二待训练标签以及第二预测标签,训练得到目标模型参数,其中,第二预测标签表示预测得到的与操作意图相关的标签,第二预测标签属于预测值,第二待训练标签属于真实值;
采用目标模型参数对目标联合模型进行更新,得到强化联合模型。
本实施例中,由于MOBA游戏的玩家众多,因此通常有大量人类玩家数据可以用于监督学习训练,从而通过模型来模拟人类操作。然而,由于人类的紧张或注意力不集中等各种因素可能会存在误操作,比如技能释放方向有偏差,躲避敌方技能不及时等,由此导致训练数据中存在着不良样本。有鉴于此,本申请能够通过强化学习来优化目标联合模型中的部分任务层。例如,不对微操FC层进行强化学习,而只对大局观FC层进行强化学习。
为了便于理解,请参阅图18,图18为本申请实施例中强化联合模型的另一个系统结构示意图,如图18所示,目标联合模型包括联合模型、大局观FC层以及微操FC层。联合模型中的编码层和微操FC层已经通过监督学习获取相应的核心模型参数。需要注意的是,在强化学习的过程中,联合模型中的编码层和微操FC层的核心模型参数保持不变,如此,在强化学习的时候也就不需要学习特征表达,从而加快了强化学习的收敛速度。通过对大局观FC层进行强化,可以提高AI的宏观决策能力。大局观FC层采用强化学习算法进行训练,算法可以是PPO(Proximal Policy Optimization,近端策略优化)算法或者深度强化学习(Actor-Critic)算法。
下面将介绍强化学习的流程:
步骤一,在训练得到目标联合模型之后,服务器可以加载监督学习得到的目标联合模型,并固定联合模型的编码层和微操FC层,并且需要加载游戏环境。
步骤二,获取待训练视频。其中,待训练视频包括多帧交互图像,利用目标联合模型从待训练视频中的起始帧开始进行对战,保存大局观相关场景的目标场景数据,目标场景数据包括“打野”、“清兵”、“团战”以及“推塔”等场景下的数据。
步骤三,根据目标场景数据、第二待训练标签以及第二预测标签,训练得到目标模型参数,采用Actor-Critic算法对目标联合模型中的核心模型参数进行更新。需要注意的是,这里只更新大局观FC层的模型参数,也就是根据第二待训练标签以及第二预测标签生成更新后的模型参数。其中,第二待训练标签和第二预测标签均为与大局观任务相关的标签。
步骤四,对待训练视频中每帧图像进行步骤二至步骤三的处理,如果未达到最大迭代帧数,则将更新后的目标联合模型发送至对战环境,并返回至步骤二;如果达到最大迭代帧数,则进入步骤五。
步骤五,保存最终强化后得到的强化联合模型。
进一步地,本申请实施例中,还可以通过强化学习来优化目标联合模型中的部分任务层,若需要强化大局观任务这部分,则服务器获取待训练视频。然后通过目标联合模型获取待训练视频对应的目标场景数据,基于目标场景数据、第二待训练标签以及第二预测标签,训练得到目标模型参数。最后,服务器采用目标模型参数对目标联合模型进行更新,得到强化联合模型。通过上述方式,对大局观FC层进行强化可以提高AI能力。此外,强化学习还可以克服由于人类的紧张或者注意力不集中等各种因素所存在的误操作问题,从而大幅地减少了训练数据中存在的不良样本数量,进而提升模型的可靠性,以及应用该模型进行预测的准确度。强化学习方法可以只对部分场景进行强化,从而降低决策步数,并且加快收敛速度。
下面对本申请中的服务器进行详细描述,请参阅图19,图19为本申请实施例中服务器一个实施例示意图,服务器30包括:
获取模块301,用于获取待预测图像;
提取模块302,用于提取所述获取模块301获取的所述待预测图像中的待预测特征集合。其中,所述待预测特征集合包括第一待预测特征、第二待预测特征以及第三待预测特征,所述第一待预测特征表示第一区域的图像特征,所述第二待预测特征表示第二区域的图像特征,所述第三待预测特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
所述获取模块301,还用于通过目标联合模型获取所述提取模块302提取的所述待预测特征集合所对应的第一标签以及第二标签。其中,所述第一标签表示与操作内容相关的标签,所述第二标签表示与操作意图相关的标签。
本实施例中,获取模块301获取待预测图像,提取模块302提取所述获取模块301获取的所述待预测图像中的待预测特征集合。其中,所述待预测特征集合包括第一待预测特征、第二待预测特征以及第三待预测特征,所述第一待预测特征表示第一区域的图像特征,所述第二待预测特征表示第二区域的图像特征,所述第三待预测特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围。所述获取模块301通过目标联合模型获取所述提取模块302提取的所述待预测特征集合所对应的第一标签以及第二标签。其中,所述第一标签表示与操作内容相关的标签,所述第二标签表示与操作意图相关的标签。
本申请实施例中,提供了一种服务器,首先,该服务器获取待预测图像,然后提取待预测图像中的待预测特征集合。其中,待预测特征集合包括第一待预测特征、第二待预测特征以及第三待预测特征,第一待预测特征表示第一区域的图像特征,第二待预测特征表示第二区域的图像特征,第三待预测特征表示与交互操作相关的属性特征,第一区域的范围小于第二区域的范围。最后,服务器可以通过目标联合模型获取待预测图像所对应的第一标签以及第二标签。其中,第一标签表示与操作内容相关的标签,第二标签表示与操作意图相关的标签。通过上述方式,仅使用一个联合模型就可以预测微操和大局观,其中,微操的预测结果表示为第一标签,大局观的预测结果表示第二标签。因此,将大局观模型和微操模型合并成一个联合模型,有效地解决了分层模型中的硬切换问题,提升了预测的便利性。
可选地,在上述图19所对应的实施例的基础上,本申请实施例提供的服务器30的另一实施例中,所述获取模块301,用于通过所述目标联合模型获取所述待预测特征集合所对应的所述第一标签、所述第二标签以及第三标签。其中,所述第三标签表示与胜负情况相关的标签。
其次,本申请实施例中,目标联合模型不但可以输出第一标签和第二标签,还可以进一步输出第三标签,也就是说,目标联合模型还可以预测胜负情况。通过上述方式,在实际应用中能够更好地预测局势结果,有助于提升预测的可靠性,并且增加预测的灵活度和实用性。
下面对本申请中的服务器进行详细描述,请参阅图20,图20为本申请实施例中服务器一个实施例示意图,服务器40包括:
获取模块401,用于获取待训练图像集合,其中,所述待训练图像集合包括N个待训练图像,所述N为大于或等于1的整数;
提取模块402,用于提取所述获取模块401获取的每个待训练图像中的待训练特征集合。其中,所述待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征,所述第一待训练特征表示第一区域的图像特征,所述第二待训练特征表示第二区域的图像特征,所述第三待训练特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
所述获取模块401,用于获取所述每个待训练图像所对应的第一待训练标签以及第二待训练标签。其中,所述第一待训练标签表示与操作内容相关的标签,所述第二待训练标签表示与操作意图相关的标签;
训练模块403,用于根据所述提取模块402提取的所述每个待训练图像中的所述待训练特征集合,以及所述获取模块获取的所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型。
本实施例中,获取模块401获取待训练图像集合。其中,所述待训练图像集合包括N个待训练图像,所述N为大于或等于1的整数,提取模块402提取所述获取模块401获取的每个待训练图像中的待训练特征集合。其中,所述待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征,所述第一待训练特征表示第一区域的图像特征,所述第二待训练特征表示第二区域的图像特征,所述第三待训练特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围。所述获取模块401获取所述每个待训练图像所对应的第一待训练标签以及第二待训练标签。其中,所述第一待训练标签表示与操作内容相关的标签,所述第二待训练标签表示与操作意图相关的标签,训练模块403根据所述提取模块402提取的所述每个待训练图像中的所述待训练特征集合,以及所述获取模块获取的所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型。
本申请实施例中,将介绍一种服务器,首先服务器获取待训练图像集合,然后提取每个待训练图像中的待训练特征集合。其中,待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征。接下来,服务器需要获取每个待训练图像所对应的第一待训练标签以及第二待训练标签,最后根据每个待训练图像中的待训练特征集合,以及每个待训练图像所对应的第一待训练标签以及第二待训练标签,训练得到目标联合模型。通过上述方式,设计了一个可以同时预测微操和大局观的模型,由此,将大局观模型和微操模型合并成一个联合模型,有效地解决了分层模型中的硬切换问题,提升了预测的便利性。同时,考虑大局观任务可以有效地提升宏观决策的准确度,尤其在MOBA游戏中,大局观的决策是非常重要的。
可选地,在上述图20所对应的实施例的基础上,本申请实施例提供的服务器40的另一实施例中,所述第一待训练特征为二维向量特征,其中,所述第一待训练特征包括在所述第一区域内的角色位置信息、移动对象位置信息、固定对象位置信息以及防御对象位置信息中的至少一种;
所述第二待训练特征为二维向量特征,其中,所述第二待训练特征包括在所述第二区域内的角色位置信息、移动对象位置信息、固定对象位置信息、防御对象位置信息、障碍对象位置信息以及输出对象位置信息中的至少一种;
所述第三待训练特征为一维向量特征,其中,所述第三待训练特征包括角色生命值、角色输出值、时间信息以及比分信息中的至少一种;
其中,所述第一待训练特征、所述第二待训练特征以及所述第三待训练特征之间具有对应关系。
其次,本申请实施例中,介绍了三种待训练特征的内容,其中,第一待训练特征为二维向量特征,第二待训练特征为二维向量特征,第三待训练特征为一维向量特征。通过上述方式,一方面能够确定三种待训练特征中所包含的具体信息,由此得到更多的信息量用于进行模型训练。另一方面,第一待训练特征和第二待训练特征都是二维向量特征,有利于提升特征的空间表达,从而增加特征的多样性。
可选地,在上述图20所对应的实施例的基础上,本申请实施例提供的服务器40的另一实施例中,所述第一待训练标签包括按键类型信息和/或按键参数信息;
其中,所述按键参数信息包括方向型参数、位置型参数以及目标型参数中的至少一项,所述方向型参数用于表示角色移动的方向,所述位置型参数用于表示所述角色所在的位置,所述目标型参数用于表示所述角色的待输出对象。
其次,本申请实施例中,说明了第一待训练标签包括按键类型信息和/或按键参数信息,其中,按键参数信息包括方向型参数、位置型参数以及目标型参数中的至少一项,方向型参数用于表示角色移动的方向,位置型参数用于表示角色所在的位置,目标型参数用于表示角色的待输出对象。通过上述方式,将第一待训练标签的内容更加精细化,以分层的方式建立标签,可以更加贴近人类玩家在游戏过程中的真实操作意图,从而有利于提升AI的学习能力。
可选地,在上述图20所对应的实施例的基础上,本申请实施例提供的服务器40的另一实施例中,所述第二待训练标签包括操作意图信息以及角色位置信息;
其中,所述操作意图信息表示角色与对象进行交互的目的,所述角色位置信息表示所述角色在所述第一区域内的位置。
其次,本申请实施例中,说明了第二待训练标签包括操作意图信息以及角色位置信息,其中,操作意图信息表示角色与对象进行交互的目的,角色位置信息表示角色在第一区域内的位置。通过上述方式,利用操作意图信息以及角色位置信息共同反映人类玩家的大局观,在MOBA游戏中大局观的决策是非常重要的,从而提升了方案的可行性和可操作性。
可选地,在上述图20所对应的实施例的基础上,本申请实施例提供的服务器40的另一实施例中,所述训练模块403,用于对所述每个待训练图像中的所述待训练特征集合进行处理,得到目标特征集合,其中,所述目标特征集合包括第一目标特征、第二目标特征以及第三目标特征;
通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签以及第二预测标签,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第二预测标签表示预测得到的与操作意图相关的标签;
根据所述每个待训练图像的所述第一预测标签、所述第一待训练标签、所述第二预测标签以及所述第二待训练标签,训练得到模型核心参数,其中,所述第一预测标签与所述第二预测标签均属于预测值,所述第一待训练标签以及所述第二待训练标签均属于真实值;
根据所述模型核心参数生成所述目标联合模型。
其次,本申请实施例中,提供了训练得到目标联合模型的过程,主要包括先对每个待训练图像中的待训练特征集合进行处理,得到目标特征集合。然后通过LSTM获取目标特征集合所对应的第一预测标签以及第二预测标签,再根据每个待训练图像的第一预测标签、第一待训练标签、第二预测标签以及第二待训练标签,训练得到模型核心参数,该模型核心参数用于生成目标联合模型。通过上述方式,利用LSTM层可以解决部分视野不可观测的问题,即LSTM层能够获取过去一段时间内的数据,由此能够使得数据更加完整,有利于在模型训练的过程中进行推断和决策。
可选地,在上述图20所对应的实施例的基础上,本申请实施例提供的服务器40的另一实施例中,所述训练模块403,用于通过全连接层对所述每个待训练图像中的所述第三待训练特征进行处理,得到所述第三目标特征,其中,所述第三目标特征为一维向量特征;
通过卷积层对所述每个待训练图像中的所述第二待训练特征进行处理,得到所述第二目标特征,其中,所述第二目标特征为一维向量特征;
通过所述卷积层对所述每个待训练图像中的所述第一待训练特征进行处理,得到所述第一目标特征,其中,所述第一目标特征为一维向量特征。
再次,本申请实施例中,还可以对待训练特征集合进行处理,即通过全连接层对每个待训练图像中的第三待训练特征进行处理,得到第三目标特征,通过卷积层对每个待训练图像中的第一待训练特征进行处理,得到第一目标特征,通过卷积层对每个待训练图像中的第二待训练特征进行处理,得到第二目标特征。通过上述方式,能够得到均为一维向量的特征,由此可以将这些向量特征进行拼接处理,便于后续的模型训练,有利于提升方案的可行性和可操作性。
可选地,在上述图20所对应的实施例的基础上,本申请实施例提供的服务器40的另一实施例中,所述训练模块403,用于通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签、第二预测标签以及第三预测标签,所述第三预测标签表示预测得到的与胜负情况相关的标签;
获取所述每个待训练图像所对应的第三待训练标签,其中,所述第三待训练标签用于表示实际胜负情况;
根据所述第一预测标签、所述第一待训练标签、所述第二预测标签、所述第二待训练标签、所述第三预测标签以及所述第三待训练标签,训练得到所述模型核心参数,其中,所述第三预测标签属于预测值,所述第三待训练标签属于真实值。
再次,本申请实施例中,说明了目标联合模型还可以进一步训练与胜负相关的标签,即服务器通过LSTM层获取目标特征集合所对应的第一预测标签、第二预测标签以及第三预测标签,第三预测标签表示预测得到的与胜负情况相关的标签,然后获取每个待训练图像所对应的第三待训练标签,最后根据第一预测标签、第一待训练标签、第二预测标签、第二待训练标签、第三预测标签以及第三待训练标签,训练得到模型核心参数。通过上述方式,目标联合模型还能够预测比赛胜率,由此,可以加强对局面的认知和学习,从而提升模型应用的可靠性和多样性。
可选地,在上述图20所对应的实施例的基础上,请参阅图21,本申请实施例提供的服务器40的另一实施例中,所述服务器40还包括更新模块404;
所述获取模块401,还用于在所述训练模块403根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型之后,获取待训练视频,其中,所述待训练视频包括多帧交互图像;
所述获取模块401,还用于通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
所述训练模块403,还用于根据所述获取模块401获取的所述目标场景数据、所述第一待训练标签以及第一预测标签,训练得到目标模型参数,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第一预测标签属于预测值,所述第一待训练标签属于真实值;
所述更新模块404,用于采用所述训练模块403训练得到的所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
进一步地,本申请实施例中,还可以通过强化学习来优化目标联合模型中的部分任务层,若需要强化微操任务这部分,则服务器获取待训练视频。然后通过目标联合模型获取待训练视频对应的目标场景数据,基于目标场景数据、第一待训练标签以及第一预测标签,训练得到目标模型参数。最后,服务器采用目标模型参数对目标联合模型进行更新,得到强化联合模型。通过上述方式,对微操FC层进行强化可以提高AI能力,此外,强化学习还可以克服由于人类的紧张或者注意力不集中等各种因素所存在的误操作问题,从而大幅地减少了训练数据中存在的不良样本数量,进而提升模型的可靠性,以及应用该模型进行预测的准确度。强化学习方法可以只对部分场景进行强化,从而降低决策步数,并且加快收敛速度。
可选地,在上述图20所对应的实施例的基础上,请再次参阅图21,本申请实施例提供的服务器40的另一实施例中,所述服务器40还包括更新模块404;
所述获取模块401,还用于在所述训练模块403根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型之后,获取待训练视频。其中,所述待训练视频包括多帧交互图像;
所述获取模块401,还用于通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
所述训练模块403,还用于根据所述获取模块401获取的所述目标场景数据、所述第二待训练标签以及第二预测标签,训练得到目标模型参数。其中,所述第二预测标签表示预测得到的与操作意图相关的标签,所述第二预测标签属于预测值,所述第二待训练标签属于真实值;
所述更新模块404,用于采用所述训练模块403训练得到的所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
进一步地,本申请实施例中,还可以通过强化学习来优化目标联合模型中的部分任务层,若需要强化大局观任务这部分,则服务器获取待训练视频。然后通过目标联合模型获取待训练视频对应的目标场景数据,基于目标场景数据、第二待训练标签以及第二预测标签,训练得到目标模型参数。最后,服务器采用目标模型参数对目标联合模型进行更新,得到强化联合模型。通过上述方式,对大局观FC层进行强化可以提高AI能力,此外,强化学习还可以克服由于人类的紧张或者注意力不集中等各种因素所存在的误操作问题,从而大幅地减少了训练数据中存在的不良样本数量,进而提升模型的可靠性,以及应用该模型进行预测的准确度。强化学习方法可以只对部分场景进行强化,从而降低决策步数,并且加快收敛速度。
图22是本申请实施例提供的一种服务器结构示意图,该服务器500可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)522(例如,一个或一个以上处理器)和存储器532,一个或一个以上存储应用程序542或数据544的存储介质530(例如一个或一个以上海量存储设备)。其中,存储器532和存储介质530可以是短暂存储或持久存储。存储在存储介质530的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对服务器中的一系列指令操作。更进一步地,中央处理器522可以设置为与存储介质530通信,在服务器500上执行存储介质530中的一系列指令操作。
服务器500还可以包括一个或一个以上电源526,一个或一个以上有线或无线网络接口550,一个或一个以上输入输出接口558,和/或,一个或一个以上操作系统541,例如Windows Server™,Mac OS X™,Unix™,Linux™,FreeBSD™等等。
上述实施例中由服务器所执行的步骤可以基于该图22所示的服务器结构。
本申请实施例中,CPU 522用于执行如下步骤:
获取待预测图像;
提取所述待预测图像中的待预测特征集合,其中,所述待预测特征集合包括第一待预测特征、第二待预测特征以及第三待预测特征,所述第一待预测特征表示第一区域的图像特征,所述第二待预测特征表示第二区域的图像特征,所述第三待预测特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
通过目标联合模型获取所述待预测特征集合所对应的第一标签和/或第二标签,其中,所述第一标签表示与操作内容相关的标签,所述第二标签表示与操作意图相关的标签。
可选地,CPU 522用于执行如下步骤:
通过所述目标联合模型获取所述待预测特征集合所对应的所述第一标签、所述第二标签以及第三标签,其中,所述第三标签表示与胜负情况相关的标签。
本申请实施例中,CPU 522用于执行如下步骤:
获取待训练图像集合,其中,所述待训练图像集合包括N个待训练图像,所述N为大于或等于1的整数;
提取每个待训练图像中的待训练特征集合,其中,所述待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征,所述第一待训练特征表示第一区域的图像特征,所述第二待训练特征表示第二区域的图像特征,所述第三待训练特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
获取所述每个待训练图像所对应的第一待训练标签以及第二待训练标签,其中,所述第一待训练标签表示与操作内容相关的标签,所述第二待训练标签表示与操作意图相关的标签;
根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型。
可选地,CPU 522用于执行如下步骤:
对所述每个待训练图像中的所述待训练特征集合进行处理,得到目标特征集合,其中,所述目标特征集合包括第一目标特征、第二目标特征以及第三目标特征;
通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签以及第二预测标签,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第二预测标签表示预测得到的与操作意图相关的标签;
根据所述每个待训练图像的所述第一预测标签、所述第一待训练标签、所述第二预测标签以及所述第二待训练标签,训练得到模型核心参数,其中,所述第一预测标签与所述第二预测标签均属于预测值,所述第一待训练标签以及所述第二待训练标签均属于真实值;
根据所述模型核心参数生成所述目标联合模型。
可选地,CPU 522用于执行如下步骤:
通过全连接层对所述每个待训练图像中的所述第三待训练特征进行处理,得到所述第三目标特征,其中,所述第三目标特征为一维向量特征;
通过卷积层对所述每个待训练图像中的所述第二待训练特征进行处理,得到所述第二目标特征,其中,所述第二目标特征为一维向量特征;
通过所述卷积层对所述每个待训练图像中的所述第一待训练特征进行处理,得到所述第一目标特征,其中,所述第一目标特征为一维向量特征。
可选地,CPU 522用于执行如下步骤:
通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签、第二预测标签以及第三预测标签,所述第三预测标签表示预测得到的与胜负情况相关的标签;
所述根据所述每个待训练图像的所述第一预测标签、所述第一待训练标签、所述第二预测标签以及所述第二待训练标签,训练得到模型核心参数,包括:
获取所述每个待训练图像所对应的第三待训练标签,其中,所述第三待训练标签用于表示实际胜负情况;
根据所述第一预测标签、所述第一待训练标签、所述第二预测标签、所述第二待训练标签、所述第三预测标签以及所述第三待训练标签,训练得到所述模型核心参数,其中,所述第三预测标签属于预测值,所述第三待训练标签属于真实值。
可选地,CPU 522还用于执行如下步骤:
获取待训练视频,其中,所述待训练视频包括多帧交互图像;
通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
根据所述目标场景数据、所述第一待训练标签以及第一预测标签,训练得到目标模型参数,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第一预测标签属于预测值,所述第一待训练标签属于真实值;
采用所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
可选地,CPU 522还用于执行如下步骤:
获取待训练视频,其中,所述待训练视频包括多帧交互图像;
通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
根据所述目标场景数据、所述第二待训练标签以及第二预测标签,训练得到目标模型参数,其中,所述第二预测标签表示预测得到的与操作意图相关的标签,所述第二预测标签属于预测值,所述第二待训练标签属于真实值;
采用所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
应当理解的是,在本文中提及的“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。“至少一个”表示一个或多个。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (25)

  1. 一种信息预测的方法,其特征在于,所述方法应用于服务器,所述方法包括:
    获取待预测图像;
    提取所述待预测图像中的待预测特征集合,其中,所述待预测特征集合包括第一待预测特征、第二待预测特征以及第三待预测特征,所述第一待预测特征表示第一区域的图像特征,所述第二待预测特征表示第二区域的图像特征,所述第三待预测特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
    通过目标联合模型获取所述待预测特征集合所对应的第一标签和/或第二标签,其中,所述第一标签表示与操作内容相关的标签,所述第二标签表示与操作意图相关的标签。
  2. 根据权利要求1所述的方法,其特征在于,所述通过目标联合模型获取所述待预测特征集合所对应的第一标签和/或第二标签,包括:
    通过所述目标联合模型获取所述待预测特征集合所对应的所述第一标签和/或所述第二标签,以及第三标签,其中,所述第三标签表示与胜负情况相关的标签。
  3. 一种模型训练的方法,其特征在于,所述方法应用于服务器,所述方法包括:
    获取待训练图像集合,其中,所述待训练图像集合包括N个待训练图像,所述N为大于或等于1的整数;
    提取每个待训练图像中的待训练特征集合,其中,所述待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征,所述第一待训练特征表示第一区域的图像特征,所述第二待训练特征表示第二区域的图像特征,所述第三待训练特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
    获取所述每个待训练图像所对应的第一待训练标签以及第二待训练标签,其中,所述第一待训练标签表示与操作内容相关的标签,所述第二待训练标签表示与操作意图相关的标签;
    根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型。
  4. 根据权利要求3所述的方法,其特征在于,
    所述第一待训练特征为二维向量特征,其中,所述第一待训练特征包括在所述第一区域内的角色位置信息、移动对象位置信息、固定对象位置信息以及防御对象位置信息中的至少一种;
    所述第二待训练特征为二维向量特征,其中,所述第二待训练特征包括在所述第二区域内的角色位置信息、移动对象位置信息、固定对象位置信息、防御对象位置信息、障碍对象位置信息以及输出对象位置信息中的至少一种;
    所述第三待训练特征为一维向量特征,其中,所述第三待训练特征包括角色生命值、角色输出值、时间信息以及比分信息中的至少一种;
    其中,所述第一待训练特征、所述第二待训练特征以及所述第三待训练特征之间具有对应关系。
  5. 根据权利要求3所述的方法,其特征在于,
    所述第一待训练标签包括按键类型信息和/或按键参数信息;
    其中,所述按键参数信息包括方向型参数、位置型参数以及目标型参数中的至少一项,所述方向型参数用于表示角色移动的方向,所述位置型参数用于表示所述角色所在的位置,所述目标型参数用于表示所述角色的待输出对象。
  6. 根据权利要求3所述的方法,其特征在于,
    所述第二待训练标签包括操作意图信息以及角色位置信息;
    其中,所述操作意图信息表示角色与对象进行交互的目的,所述角色位置信息表示所述角色在所述第一区域内的位置。
  7. 根据权利要求3所述的方法,其特征在于,所述根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型,包括:
    对所述每个待训练图像中的所述待训练特征集合进行处理,得到目标特征集合,其中,所述目标特征集合包括第一目标特征、第二目标特征以及第三目标特征;
    通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签以及第二预测标签,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第二预测标签表示预测得到的与操作意图相关的标签;
    根据所述每个待训练图像的所述第一预测标签、所述第一待训练标签、所述第二预测标签以及所述第二待训练标签,训练得到模型核心参数,其中,所述第一预测标签与所述第二预测标签均属于预测值,所述第一待训练标签以及所述第二待训练标签均属于真实值;
    根据所述模型核心参数生成所述目标联合模型。
  8. 根据权利要求7所述的方法,其特征在于,所述对所述每个待训练图像中的所述待训练特征集合进行处理,得到目标特征集合,包括:
    通过全连接层对所述每个待训练图像中的所述第三待训练特征进行处理,得到所述第三目标特征,其中,所述第三目标特征为一维向量特征;
    通过卷积层对所述每个待训练图像中的所述第二待训练特征进行处理,得到所述第二目标特征,其中,所述第二目标特征为一维向量特征;
    通过所述卷积层对所述每个待训练图像中的所述第一待训练特征进行处理,得到所述第一目标特征,其中,所述第一目标特征为一维向量特征。
  9. 根据权利要求7所述的方法,其特征在于,所述通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签以及第二预测标签,包括:
    通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签、第二预测标签以及第三预测标签,所述第三预测标签表示预测得到的与胜负情况相关的标签;
    所述根据所述每个待训练图像的所述第一预测标签、所述第一待训练标签、所述第二预测标签以及所述第二待训练标签,训练得到模型核心参数,包括:
    获取所述每个待训练图像所对应的第三待训练标签,其中,所述第三待训练标签用于表示实际胜负情况;
    根据所述第一预测标签、所述第一待训练标签、所述第二预测标签、所述第二待训练标签、所述第三预测标签以及所述第三待训练标签,训练得到所述模型核心参数,其中,所述第三预测标签属于预测值,所述第三待训练标签属于真实值。
  10. 根据权利要求3至9中任一项所述的方法,其特征在于,所述根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型之后,所述方法还包括:
    获取待训练视频,其中,所述待训练视频包括多帧交互图像;
    通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
    根据所述目标场景数据、所述第一待训练标签以及第一预测标签,训练得到目标模型参数,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第一预测标签属于预测值,所述第一待训练标签属于真实值;
    采用所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
  11. 根据权利要求3至9中任一项所述的方法,其特征在于,所述根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型之后,所述方法还包括:
    获取待训练视频,其中,所述待训练视频包括多帧交互图像;
    通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
    根据所述目标场景数据、所述第二待训练标签以及第二预测标签,训练得到目标模型参数,其中,所述第二预测标签表示预测得到的与操作意图相关的标签,所述第二预测标签属于预测值,所述第二待训练标签属于真实值;
    采用所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
  12. 一种服务器,其特征在于,包括:
    获取模块,用于获取待预测图像;
    提取模块,用于提取所述获取模块获取的所述待预测图像中的待预测特征集合,其中,所述待预测特征集合包括第一待预测特征、第二待预测特征以及第三待预测特征,所述第一待预测特征表示第一区域的图像特征,所述第二待预测特征表示第二区域的图像特征,所述第三待预测特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
    所述获取模块,还用于通过目标联合模型获取所述提取模块提取的所述待预测特征集合所对应的第一标签以及第二标签,其中,所述第一标签表示与操作内容相关的标签,所述第二标签表示与操作意图相关的标签。
  13. 一种服务器,其特征在于,包括:
    获取模块,用于获取待训练图像集合,其中,所述待训练图像集合包括N个待训练图像,所述N为大于或等于1的整数;
    提取模块,用于提取所述获取模块获取的每个待训练图像中的待训练特征集合,其中,所述待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征,所述第一待训练特征表示第一区域的图像特征,所述第二待训练特征表示第二区域的图像特征,所述第三待训练特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
    所述获取模块,用于获取所述每个待训练图像所对应的第一待训练标签以及第二待训练标签,其中,所述第一待训练标签表示与操作内容相关的标签,所述第二待训练标签表示与操作意图相关的标签;
    训练模块,用于根据所述提取模块提取的所述每个待训练图像中的所述待训练特征集合,以及所述获取模块获取的所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型。
  14. 一种服务器,其特征在于,包括:存储器、收发器、处理器以及总线系统;
    其中,所述存储器用于存储程序;
    所述处理器用于执行所述存储器中的程序,包括如下步骤:
    获取待预测图像;
    提取所述待预测图像中的待预测特征集合,其中,所述待预测特征集合包括第一待预测特征、第二待预测特征以及第三待预测特征,所述第一待预测特征表示第一区域的图像特征,所述第二待预测特征表示第二区域的图像特征,所述第三待预测特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
    通过目标联合模型获取所述待预测特征集合所对应的第一标签和/或第二标签,其中,所述第一标签表示与操作内容相关的标签,所述第二标签表示与操作意图相关的标签;
    所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
  15. 根据权利要求14所述的服务器,其特征在于,所述处理器用于执行如下步骤:
    通过所述目标联合模型获取所述待预测特征集合所对应的所述第一标签和/或所述第二标签,以及第三标签,其中,所述第三标签表示与胜负情况相关的标签。
  16. 一种服务器,其特征在于,包括:存储器、收发器、处理器以及总线系统;
    其中,所述存储器用于存储程序;
    所述处理器用于执行所述存储器中的程序,包括如下步骤:
    获取待训练图像集合,其中,所述待训练图像集合包括N个待训练图像,所述N为大于或等于1的整数;
    提取每个待训练图像中的待训练特征集合,其中,所述待训练特征集合包括第一待训练特征、第二待训练特征以及第三待训练特征,所述第一待训练特征表示第一区域的图像特征,所述第二待训练特征表示第二区域的图像特征,所述第三待训练特征表示与交互操作相关的属性特征,所述第一区域的范围小于所述第二区域的范围;
    获取所述每个待训练图像所对应的第一待训练标签以及第二待训练标签,其中,所述第一待训练标签表示与操作内容相关的标签,所述第二待训练标签表示与操作意图相关的标签;
    根据所述每个待训练图像中的所述待训练特征集合,以及所述每个待训练图像所对应的所述第一待训练标签以及所述第二待训练标签,训练得到目标联合模型;
    所述总线系统用于连接所述存储器以及所述处理器,以使所述存储器以及所述处理器进行通信。
  17. 根据权利要求16所述的服务器,其特征在于,所述处理器用于执行如下步骤:
    对所述每个待训练图像中的所述待训练特征集合进行处理,得到目标特征集合,其中,所述目标特征集合包括第一目标特征、第二目标特征以及第三目标特征;
    通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签以及第二预测标签,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第二预测标签表示预测得到的与操作意图相关的标签;
    根据所述每个待训练图像的所述第一预测标签、所述第一待训练标签、所述第二预测标签以及所述第二待训练标签,训练得到模型核心参数,其中,所述第一预测标签与所述第二预测标签均属于预测值,所述第一待训练标签以及所述第二待训练标签均属于真实值;
    根据所述模型核心参数生成所述目标联合模型。
  18. 根据权利要求17所述的服务器,其特征在于,所述处理器用于执行如下步骤:
    通过全连接层对所述每个待训练图像中的所述第三待训练特征进行处理,得到所述第三目标特征,其中,所述第三目标特征为一维向量特征;
    通过卷积层对所述每个待训练图像中的所述第二待训练特征进行处理,得到所述第二目标特征,其中,所述第二目标特征为一维向量特征;
    通过所述卷积层对所述每个待训练图像中的所述第一待训练特征进行处理,得到所述第一目标特征,其中,所述第一目标特征为一维向量特征。
  19. 根据权利要求17所述的服务器,其特征在于,所述处理器用于执行如下步骤:
    通过长短期记忆LSTM层获取所述目标特征集合所对应的第一预测标签、第二预测标签以及第三预测标签,所述第三预测标签表示预测得到的与胜负情况相关的标签;
    所述根据所述每个待训练图像的所述第一预测标签、所述第一待训练标签、所述第二预测标签以及所述第二待训练标签,训练得到模型核心参数,包括:
    获取所述每个待训练图像所对应的第三待训练标签,其中,所述第三待训练标签用于表示实际胜负情况;
    根据所述第一预测标签、所述第一待训练标签、所述第二预测标签、所述第二待训练标签、所述第三预测标签以及所述第三待训练标签,训练得到所述模型核心参数,其中,所述第三预测标签属于预测值,所述第三待训练标签属于真实值。
  20. 根据权利要求16-19中任一项所述的服务器,其特征在于,所述处理器用于执行如下步骤:
    获取待训练视频,其中,所述待训练视频包括多帧交互图像;
    通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
    根据所述目标场景数据、所述第一待训练标签以及第一预测标签,训练得到目标模型参数,其中,所述第一预测标签表示预测得到的与操作内容相关的标签,所述第一预测标签属于预测值,所述第一待训练标签属于真实值;
    采用所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
  21. 根据权利要求16-19中任一项所述的服务器,其特征在于,所述处理器用于执行如下步骤:
    获取待训练视频,其中,所述待训练视频包括多帧交互图像;
    通过所述目标联合模型获取所述待训练视频对应的目标场景数据,其中,所述目标场景数据包括在目标场景下的相关数据;
    根据所述目标场景数据、所述第二待训练标签以及第二预测标签,训练得到目标模型参数,其中,所述第二预测标签表示预测得到的与操作意图相关的标签,所述第二预测标签属于预测值,所述第二待训练标签属于真实值;
    采用所述目标模型参数对所述目标联合模型进行更新,得到强化联合模型。
  22. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有至少一条指令,所述至少一条指令在被执行时实现如权利要求1或2所述的信息预测的方法。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有至少一条指令,所述至少一条指令在被执行时实现如权利要求3-11中任一项所述的模型训练的方法。
  24. 一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码被计算机运行时,使得所述计算机执行所述权利要求1或2所述的信息预测的方法。
  25. 一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码被计算机运行时,使得所述计算机执行所述权利要求3-11中任一项所述的模型训练的方法。
PCT/CN2019/124681 2018-12-13 2019-12-11 信息预测的方法、模型训练的方法以及服务器 WO2020119737A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020217017878A KR102542774B1 (ko) 2018-12-13 2019-12-11 정보 예측 방법, 모델 훈련 방법 및 서버
EP19896168.2A EP3896611A4 (en) 2018-12-13 2019-12-11 INFORMATION PREDICTION METHODS, MODEL TRAINING METHODS AND SERVERS
JP2021512924A JP7199517B2 (ja) 2018-12-13 2019-12-11 情報予測方法、モデルトレーニング方法、サーバー及びコンピュータプログラム
US17/201,152 US20210201148A1 (en) 2018-12-13 2021-03-15 Method, apparatus, and storage medium for predicting information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811526060.1 2018-12-13
CN201811526060.1A CN110163238B (zh) 2018-12-13 2018-12-13 一种信息预测的方法、模型训练的方法以及服务器

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/201,152 Continuation US20210201148A1 (en) 2018-12-13 2021-03-15 Method, apparatus, and storage medium for predicting information

Publications (1)

Publication Number Publication Date
WO2020119737A1 true WO2020119737A1 (zh) 2020-06-18

Family

ID=67645216

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/124681 WO2020119737A1 (zh) 2018-12-13 2019-12-11 信息预测的方法、模型训练的方法以及服务器

Country Status (6)

Country Link
US (1) US20210201148A1 (zh)
EP (1) EP3896611A4 (zh)
JP (1) JP7199517B2 (zh)
KR (1) KR102542774B1 (zh)
CN (1) CN110163238B (zh)
WO (1) WO2020119737A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115121913A (zh) * 2022-08-30 2022-09-30 北京博清科技有限公司 激光中心线的提取方法

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163238B (zh) * 2018-12-13 2023-04-07 腾讯科技(深圳)有限公司 一种信息预测的方法、模型训练的方法以及服务器
CN111450534B (zh) * 2020-03-31 2021-08-13 腾讯科技(深圳)有限公司 一种标签预测模型的训练方法、标签预测的方法及装置
CN113469188A (zh) * 2021-07-15 2021-10-01 有米科技股份有限公司 字符识别模型训练的数据增强、字符识别的方法及装置
CN113780101A (zh) * 2021-08-20 2021-12-10 京东鲲鹏(江苏)科技有限公司 避障模型的训练方法、装置、电子设备及存储介质
KR102593036B1 (ko) 2021-11-24 2023-10-23 고려대학교 산학협력단 알츠하이머병 진단 모델의 결정을 추론하고 강화하는 방법 및 장치
CN116109525B (zh) * 2023-04-11 2024-01-05 北京龙智数科科技服务有限公司 基于多维度数据增强的强化学习方法及装置
CN116842856B (zh) * 2023-09-04 2023-11-14 长春工业大学 一种基于深度强化学习的工业过程优化方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544496A (zh) * 2012-07-12 2014-01-29 同济大学 基于空间与时间信息融合的机器人场景识别方法
US20180121748A1 (en) * 2016-11-02 2018-05-03 Samsung Electronics Co., Ltd. Method and apparatus to recognize object based on attribute of object and train
CN108460389A (zh) * 2017-02-20 2018-08-28 阿里巴巴集团控股有限公司 一种识别图像中对象的类型预测方法、装置及电子设备
CN108724182A (zh) * 2018-05-23 2018-11-02 苏州大学 基于多类别模仿学习的端到端游戏机器人生成方法及系统
CN109893857A (zh) * 2019-03-14 2019-06-18 腾讯科技(深圳)有限公司 一种操作信息预测的方法、模型训练的方法及相关装置
CN110163238A (zh) * 2018-12-13 2019-08-23 腾讯科技(深圳)有限公司 一种信息预测的方法、模型训练的方法以及服务器

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3827691B2 (ja) * 2004-09-03 2006-09-27 株式会社コナミデジタルエンタテインメント ゲーム装置、その制御方法、ならびに、プログラム
US8774515B2 (en) * 2011-04-20 2014-07-08 Xerox Corporation Learning structured prediction models for interactive image labeling
CN103544960B (zh) * 2013-11-11 2016-03-30 苏州威士达信息科技有限公司 基于人耳感知的drm+系统的动态数据发送方法
JP2015198935A (ja) * 2014-04-04 2015-11-12 コナミゲーミング インコーポレーテッド ゲーミング環境の操作のためのシステムおよび方法
CN107480687A (zh) * 2016-06-08 2017-12-15 富士通株式会社 信息处理装置和信息处理方法
CN107766870A (zh) * 2016-08-22 2018-03-06 富士通株式会社 信息处理装置和信息处理方法
CN107019901B (zh) * 2017-03-31 2020-10-20 北京大学深圳研究生院 基于图像识别及自动化控制的棋牌类游戏自动博弈机器人的建立方法
CN108090561B (zh) * 2017-11-09 2021-12-07 腾讯科技(成都)有限公司 存储介质、电子装置、游戏操作的执行方法和装置
CN107890674A (zh) * 2017-11-13 2018-04-10 杭州电魂网络科技股份有限公司 Ai行为调用方法和装置
CN108434740B (zh) * 2018-03-23 2021-01-29 腾讯科技(深圳)有限公司 一种策略信息确定的方法及装置、存储介质
CN109529338B (zh) * 2018-11-15 2021-12-17 腾讯科技(深圳)有限公司 对象控制方法、装置、电子设计及计算机可读介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544496A (zh) * 2012-07-12 2014-01-29 同济大学 基于空间与时间信息融合的机器人场景识别方法
US20180121748A1 (en) * 2016-11-02 2018-05-03 Samsung Electronics Co., Ltd. Method and apparatus to recognize object based on attribute of object and train
CN108460389A (zh) * 2017-02-20 2018-08-28 阿里巴巴集团控股有限公司 一种识别图像中对象的类型预测方法、装置及电子设备
CN108724182A (zh) * 2018-05-23 2018-11-02 苏州大学 基于多类别模仿学习的端到端游戏机器人生成方法及系统
CN110163238A (zh) * 2018-12-13 2019-08-23 腾讯科技(深圳)有限公司 一种信息预测的方法、模型训练的方法以及服务器
CN109893857A (zh) * 2019-03-14 2019-06-18 腾讯科技(深圳)有限公司 一种操作信息预测的方法、模型训练的方法及相关装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3896611A4

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115121913A (zh) * 2022-08-30 2022-09-30 北京博清科技有限公司 激光中心线的提取方法
CN115121913B (zh) * 2022-08-30 2023-01-10 北京博清科技有限公司 激光中心线的提取方法

Also Published As

Publication number Publication date
EP3896611A4 (en) 2022-01-19
JP2021536066A (ja) 2021-12-23
US20210201148A1 (en) 2021-07-01
JP7199517B2 (ja) 2023-01-05
KR102542774B1 (ko) 2023-06-14
EP3896611A1 (en) 2021-10-20
KR20210090239A (ko) 2021-07-19
CN110163238B (zh) 2023-04-07
CN110163238A (zh) 2019-08-23

Similar Documents

Publication Publication Date Title
WO2020119737A1 (zh) 信息预测的方法、模型训练的方法以及服务器
CN109893857B (zh) 一种操作信息预测的方法、模型训练的方法及相关装置
Hausknecht et al. A neuroevolution approach to general atari game playing
US11491400B2 (en) Method, apparatus, and device for scheduling virtual objects in virtual environment
Andersen et al. Deep RTS: a game environment for deep reinforcement learning in real-time strategy games
CN112691377B (zh) 虚拟角色的控制方法、装置、电子设备及存储介质
CN111111220B (zh) 多人对战游戏的自对弈模型训练方法、装置和计算机设备
CN110064205B (zh) 用于游戏的数据处理方法、设备和介质
CN109499068A (zh) 对象的控制方法和装置、存储介质、电子装置
CN111450534B (zh) 一种标签预测模型的训练方法、标签预测的方法及装置
CN111437608B (zh) 基于人工智能的游戏对局方法、装置、设备及存储介质
CN112870721B (zh) 一种游戏互动方法、装置、设备及存储介质
CN111450531B (zh) 虚拟角色控制方法、装置、电子设备以及存储介质
WO2023024762A1 (zh) 人工智能对象控制方法、装置、设备及存储介质
CN112402986A (zh) 一种对战游戏中强化学习模型的训练方法及装置
CN116821693B (zh) 虚拟场景的模型训练方法、装置、电子设备及存储介质
Liaw et al. Evolving a team in a first-person shooter game by using a genetic algorithm
CN115888119A (zh) 一种游戏ai训练方法、装置、电子设备及存储介质
CN113018862A (zh) 虚拟对象的控制方法、装置、电子设备及存储介质
CN116726500B (zh) 一种虚拟角色的控制方法、装置、电子设备和存储介质
Zhang Using artificial intelligence assistant technology to develop animation games on iot
CN116966573A (zh) 交互模型处理方法、装置、计算机设备和存储介质
Lin et al. AI Reinforcement Study of Gank Behavior in MOBA Games
Wu et al. Performance Comparison Between Genetic Fuzzy Tree and Reinforcement Learning in Gaming Environment
Moreno et al. Ithaca. A Tool for Integrating Fuzzy Logic in Unity

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19896168

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021512924

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217017878

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019896168

Country of ref document: EP

Effective date: 20210713