
Data processing method, device, equipment and storage medium

Info

Publication number
CN117899483A
Authority
CN
China
Legal status
Granted
Application number
CN202410311420.5A
Other languages
Chinese (zh)
Other versions
CN117899483B
Inventor
文荟俨
刘一锋
林上奥
刘戈
邱福浩
付强
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202410311420.5A
Publication of CN117899483A
Application granted
Publication of CN117899483B


Abstract

The embodiment of the application discloses a data processing method, apparatus, device, and storage medium, which can be applied to various scenes such as artificial intelligence, intelligent traffic, and assisted driving. The data processing method comprises the following steps: performing position reconstruction on the absolute position information associated with each game map to obtain relative position information; determining, according to the map resource files respectively corresponding to the M game maps, the map environment perception information respectively corresponding to the M game maps; controlling an initial agent model to execute game tasks in the M game maps according to the relative position information and map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps; and performing parameter adjustment on the initial agent model according to the task execution results respectively corresponding to the M game maps to obtain a general agent model. With this method and apparatus, the training efficiency of the agent model can be improved and its training cost reduced.

Description

Data processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and storage medium.
Background
With the development of internet technology, games have become a common form of daily entertainment, and many games now add agent models, also called AI characters, that accompany player characters through a game round. In general, for an agent model to complete a game round successfully, it must be given that capability by means of machine learning.
At present, an agent model is designed and trained for a single game map. Because a game usually contains a large number of game maps, an agent model has to be designed and trained separately for each of them, so training agent models for multiple game maps is costly and inefficient.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment and a storage medium, which can improve the training efficiency of an agent model and reduce the training cost of the agent model.
An aspect of an embodiment of the present application provides a data processing method, including:
According to the reference position of each game map in M game maps of the virtual game environment, carrying out position reconstruction on the absolute position information associated with each game map to obtain the relative position information associated with each game map;
According to map key information respectively associated with the M game maps, map environment perception information respectively corresponding to the M game maps is determined;
controlling an initial agent model, and executing game tasks in the M game maps according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, to obtain task execution results respectively corresponding to the initial agent model in the M game maps;
and performing parameter adjustment on model parameters in the initial agent model according to the task execution results respectively corresponding to the M game maps, to obtain a general agent model.
An aspect of an embodiment of the present application provides a data processing apparatus, including:
The first reconstruction module is used for carrying out position reconstruction on absolute position information associated with each game map according to the reference position of each game map in M game maps of the virtual game environment to obtain relative position information associated with each game map;
The first determining module is used for determining map environment perception information corresponding to the M game maps according to the map key information respectively associated with the M game maps;
The first execution module is used for controlling an initial agent model, and executing game tasks in the M game maps according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, to obtain task execution results respectively corresponding to the initial agent model in the M game maps;
and the first adjusting module is used for performing parameter adjustment on model parameters in the initial agent model according to the task execution results respectively corresponding to the M game maps, to obtain a general agent model.
In one aspect, the present application provides a computer readable storage medium storing a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor performs the method provided by the embodiment of the present application.
In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the method provided by the embodiment of the present application.
The embodiment of the application provides a training method for a general agent model: the general agent model is obtained through fusion training over M game maps, where M is an integer greater than 1. The resulting model has high universality and adaptability and can adapt to a plurality of game maps, so there is no need to train one agent model per game map, which reduces the training cost of the agent model and improves its training efficiency. Specifically, the absolute position information associated with each of the M game maps is converted into relative position information, so that the initial agent model can better capture the commonalities among different game maps, avoiding the learning ambiguity and learning difficulty that would otherwise arise across the M game maps. Meanwhile, the path-finding capability of the initial agent model on different game maps is enhanced through the map environment perception information corresponding to each game map, compensating for the loss of game environment perception caused by game map migration. The initial agent model is then controlled to execute game tasks in the M game maps according to the relative position information and map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, and is trained into the general agent model. In this way, the initial agent model learns general game knowledge and general game strategies across the M game maps, and the trained general agent model can be applied to any game map without training a separate agent model for each one, which greatly reduces the training cost of the agent model and improves its training efficiency.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training method for a general agent model according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 4a is a schematic diagram of a map resource file according to an embodiment of the present application;
FIG. 4b is a schematic view of a walkable region provided by an embodiment of the present application;
FIG. 4c is a schematic diagram of an initial path structure diagram provided by an embodiment of the present application;
FIG. 4d is a schematic diagram of a target path structure diagram according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a multi-map fusion training general agent model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a generic agent model training provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the application relates to the field of artificial intelligence, and in particular to training a general agent model applicable to a plurality of game maps in a virtual game environment. This improves the training efficiency and reduces the training cost of the agent model, and also improves the game accuracy and game performance of the general agent model across the plurality of game maps.
Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that react in ways similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of intelligent machines, so that machines can perceive, reason, and make decisions. Artificial intelligence is a comprehensive discipline spanning a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, and intelligent traffic.
In particular, the present application relates to machine learning, a subfield of artificial intelligence. Machine Learning (ML) is a multi-domain interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
In order to facilitate understanding of the technical solution provided by the embodiments of the present application, some key terms used in the embodiments of the present application are explained here:
Reinforcement learning: also known as evaluative learning, is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent model learning a strategy, in the course of interacting with an environment, so as to maximize game return benefits or achieve a specific goal. Unlike supervised and unsupervised learning, reinforcement learning does not require any data to be given in advance; instead, the agent obtains learning information and updates model parameters by receiving rewards (feedback) for its actions from the environment. In other words, the agent model learns by "trial and error", guided by the rewards obtained from interacting with the environment, with the aim of obtaining the largest reward. What distinguishes reinforcement learning from supervised learning is mainly the reinforcement signal: the reinforcement signal provided by the environment evaluates the quality of an action, and the agent gains knowledge in this action-evaluation loop and improves its action scheme to adapt to the environment.
The basic principle is as follows: if a certain behavior strategy of the agent model results in a positive reward from the environment (signal reinforcement), the tendency of the agent model to produce this behavior strategy later is strengthened. The goal of the agent model is to find the optimal strategy that maximizes the expected sum of discounted rewards; reinforcement learning dynamically adjusts the model parameters to maximize the reinforcement signal.
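As an illustration of this principle only (not part of the patent disclosure), the following minimal Python sketch computes the expected sum of discounted rewards mentioned above; the function name and the example values are assumptions.

```python
# Minimal sketch (illustrative, not the patent's procedure): computing the
# discounted return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
# that a reinforcement-learning agent tries to maximize.
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards discounted by gamma, accumulated from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example: three environment rewards collected during one episode.
print(discounted_return([1.0, 0.0, 5.0]))  # 1.0 + 0.99**2 * 5.0 = 5.9005
```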
Agent model: also called an AI character, is a technology that applies methods of simulating and extending human intelligence to realize, in a game, the character simulation of a real player. It refers to a virtual character in a virtual game environment that can be controlled to complete processes related to that environment. A virtual character created under a player account is controlled by the player, whereas an agent model is not controlled by a player; it performs autonomous actions in the virtual game environment according to autonomously learned knowledge and strategies. The agent model is a movable object in the virtual game environment, and the movable object may be at least one of a virtual character, a virtual animal, and a cartoon character. Optionally, when the virtual game environment is a three-dimensional virtual game scene, the agent model may be a three-dimensional model created based on an animated-skeleton technique. Each agent model has its own shape and volume in the three-dimensional virtual game scene and occupies a portion of the space in that scene. The initial agent model in the present application may be an agent model based on reinforcement learning.
Virtual game environment: the virtual environment in which a virtual character is located, that is, the virtual environment required for providing a game round: the environment displayed (or provided) when the game client runs on the terminal device. The virtual environment may be a simulation of the real world, a semi-simulated and semi-virtual three-dimensional environment, or a purely virtual three-dimensional environment, for example any one of a two-dimensional virtual game scene, a 2.5-dimensional virtual game scene, and a three-dimensional virtual game scene. The following embodiments take a three-dimensional virtual game scene as an example, but are not limited thereto. Optionally, the virtual environment is also used for combat between at least two virtual characters, and optionally for combat between at least two virtual characters using game props.
Game task: a task that needs to be completed in the virtual game environment and that determines the win or loss of a game round; that is, the game task is the condition under which a game round is won. In different games and different game scenes, the game tasks may differ: for example, in some scenes the game task may be destroying the crystal of the enemy camp, while in other scenes it may be simulating the installation of a blasting prop and blasting it successfully, and so on; the examples are not exhaustive.
Neural network (Artificial Neural Network, ANN): an operation model that abstracts the human brain's neuronal network from an information-processing perspective, builds a simple model, and forms different networks according to different connection modes. A neural network is formed by interconnecting a large number of nodes (or neurons). Each node represents a specific output function, called an activation (excitation) function; each connection between two nodes carries a weighting value for the signal passing through it, called a weight, which corresponds to the memory of the artificial neural network. The output of the network differs according to its connection mode, weight values, and activation functions. The network itself is usually an approximation of some algorithm or function, and may also be an expression of a logic strategy.
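As a non-authoritative illustration of the node computation described above, the following sketch computes a single node's output as an activation function applied to the weighted sum of its inputs; all names and values are illustrative assumptions.

```python
import math

# Illustrative sketch: each connection carries a weight, and the node applies
# an activation (excitation) function to the weighted sum of its inputs.
def neuron(inputs, weights, bias=0.0):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

print(neuron([0.5, -1.2], [0.8, 0.3]))  # output of a single two-input node
```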
Referring to fig. 1, fig. 1 is a schematic diagram of a data processing system according to an embodiment of the present application. As shown in fig. 1, the data processing system may comprise a server 10 and a terminal device cluster. The cluster may comprise one or more terminal devices, and their number is not limited here; specifically, it may include the terminal device 100a, the terminal device 100b, the terminal device 100c, …, and the terminal device 100n. As shown in fig. 1, each of the terminal devices 100a through 100n may establish a network connection with the server 10, so that each terminal device can exchange data with the server 10 through that connection. Of course, the terminal devices may also communicate through direct network connections, i.e., peer-to-peer communication between terminal devices; that is, when data interaction is required between two terminal devices, one terminal device (the transmitting terminal device) may transmit data directly to the other (the receiving terminal device).
Wherein each terminal device in the terminal device cluster may include: smart phones, tablet computers, notebook computers, desktop computers, intelligent voice interaction devices, intelligent home appliances (e.g., smart televisions), wearable devices, vehicle terminals, and other intelligent terminals with data processing functions. It should be understood that each terminal device in the terminal device cluster shown in fig. 1 may be provided with an application having a data processing function, and when the application runs in each terminal device, the application may interact with the server 10 shown in fig. 1, where the application may specifically include a game application, an entertainment application, and so on. For easy understanding, the embodiment of the present application may select one terminal device from the plurality of terminal devices shown in fig. 1 as the target terminal device. For example, in the embodiment of the present application, the terminal device 100a shown in fig. 1 may be used as a target terminal device, and an application having a data processing function may be installed in the target terminal device, where the target terminal device may implement data interaction between the application in the target terminal device and the server 10.
As shown in fig. 1, the server 10 may provide background services for applications in the terminal devices. The server 10 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), big data, and artificial intelligence platforms.
It should be appreciated that the data processing system of FIG. 1 may be adapted for training an agent model in a virtual game environment. To increase the entertainment value of a game, some agent models, also known as AI characters, are often added to games; such an agent model may accompany a user-controlled player character through a game round to increase the entertainment and interest of the game. Because game applications contain a large number of game maps, designing and training an agent model for each game map would require a large amount of training resources and model parameter adjustment work, and would inevitably consume a great deal of time, computing resources, and labor cost.
In order to solve the above problems, the embodiment of the application provides a general agent model learning method based on reinforcement learning, which fuses the models of M game maps into a unified model that can be used on any game map without reducing performance, M being an integer greater than 1. It can be understood that the general agent model trained in the embodiment of the present application may be applicable to the M game maps (e.g., executing game tasks in the M game maps), and may even be applicable to game maps other than the M game maps. When a newly added game map is encountered, the general agent model can perform well without further training, which greatly improves research and development efficiency, reduces development and operation costs, and provides players with a more stable and efficient game experience.
Specifically, the embodiment of the application performs fusion training on M game maps in a virtual game environment, that is, trains the initial agent model using the sample game data of the M game maps, giving the initial agent model the capability to cover the M game maps and letting it learn general strategies across different game maps, so that the trained general agent model is suitable for the M game maps. Game tasks can therefore be carried out in the M game maps by one general agent model, and even in a newly added game map (different from the M game maps), which improves the adaptability of the general agent model.
When the M game maps are fusion-trained, each game map has its own associated absolute position information. Absolute position information refers to map-specific features, which may include the actual positions of game elements in the game map, and so on. The game elements in a game map may include virtual plants, virtual buildings, virtual vehicles, virtual animals, virtual objects, virtual rivers, virtual floors, etc. It will be appreciated that, because the game elements in each game map are constructed from different positions and reference points, the initial agent model has difficulty learning general game knowledge and general game strategies from the M game maps.
In order to avoid the learning ambiguity and learning difficulty of the initial agent model caused by the map-specific features in the M game maps, the embodiment of the application reconstructs the absolute position information associated with each game map according to the reference position of that game map. In this way, the initial agent model can find paths and play between any two points (namely its current position point and the reference position), so the commonalities among different game maps can be better captured, and overfitting of the initial agent model to a single game map is prevented.
Specifically, position reconstruction is performed on the absolute position information associated with each game map according to the reference position of each of the M game maps of the virtual game environment; this eliminates the map-specific features in each game map by converting absolute position information into relative position information, yielding the relative position information associated with each game map. This enhances the generalization capability of the initial agent model on different game maps, so that the trained general agent model can better adapt to various game map environments. Each game map may have a corresponding reference position, which may be, for example, the virtual character birth place in the game map, or a position shared by the M game maps. In this way, the initial agent model can find paths and play according to its current position point and the reference position, better capturing the commonalities among different game maps.
Meanwhile, the map environment perception information respectively corresponding to the M game maps can be determined according to the map key information respectively associated with the M game maps; the map environment perception information can represent depth maps and height maps of the game maps, enhancing the path-finding capability of the initial agent model on different game maps. Taking the ith game map among the M game maps as an example, where i is a positive integer less than or equal to M, the map key information of the ith game map may include the map resource file of the ith game map, the reference position of the ith game map, the actual position of the initial agent model in the ith game map, and the like. The map resource file of the ith game map can be used to restore the ith game map, such as restoring the game elements and spatial layout in it.
Further, the initial agent model is controlled to execute game tasks in the M game maps according to the relative position information and map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, obtaining the task execution results respectively corresponding to the initial agent model in the M game maps. Taking the ith game map among the M game maps as an example, the game parameters corresponding to the ith game map may include the map identifier of the ith game map, the map resource file, and the game state parameters of the initial agent model in the ith game map. The game state parameters of the initial agent model in the ith game map may include the initial agent model's own character information, teammate character information, enemy character information, and the like. The own character information of the initial agent model may include its current action in the ith game map, currently available skills, current combat power, etc. The teammate character information may include teammate position information, teammate state information, and the like, and the enemy character information may include enemy position information, enemy state information, and the like.
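The grouping below is a hypothetical sketch of how the game parameters and game state parameters just described might be organized; every type and field name is an assumption for illustration, not an identifier from the patent.

```python
from dataclasses import dataclass, field

# Hypothetical grouping of the game parameters described above.
@dataclass
class AgentGameState:
    current_action: str                                   # current action in the map
    available_skills: list = field(default_factory=list)  # currently available skills
    combat_power: float = 0.0                             # current combat power

@dataclass
class GameParameters:
    map_id: str                # map identifier of the ith game map
    map_resource_file: str     # path to the map resource file
    self_state: AgentGameState # own character information
    teammate_states: list = field(default_factory=list)  # teammate positions/states
    enemy_states: list = field(default_factory=list)     # enemy positions/states

state = AgentGameState(current_action="move", available_skills=["dash"], combat_power=1.5)
params = GameParameters(map_id="map_07", map_resource_file="maps/map_07.bin", self_state=state)
```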
Further, according to the task execution results of the initial agent model in the M game maps, parameter adjustment can be performed on the model parameters in the initial agent model, and the model capability of the adjusted initial agent model is evaluated; if the capability upper limit is reached, or the iteration time step of the initial agent model reaches the maximum iteration time step, training is stopped and the final initial agent model is stored, yielding the general agent model. It can be understood that the embodiment of the application provides a unified scheme for a multi-game-map model: a general agent model is obtained through fusion training on a plurality of game maps, and this single general agent model can adapt to a plurality of game maps with high universality and adaptability, reducing the training cost of the agent model and improving its training efficiency. There is no need to train one agent model per game map, which also reduces development and operation costs. The trained general agent model can match, on each game map, the performance of an agent model specially trained on that single game map, and has a strong model effect.
In addition, because the general agent model has learned general game knowledge and general game strategies from the M game maps, when a new game map appears, the general agent model can directly execute game tasks in that map with no or little training. That is, the embodiment of the application can apply the general game knowledge and general game strategies learned in the M game maps to a newly added game map without training the general agent model on that map, thereby executing game tasks in the newly added game map directly with the general agent model. Of course, in order to further improve the performance of the general agent model on the newly added game map, only a small amount of training on the sample game data of the newly added game map is needed to obtain a general agent model with higher performance; compared with training an agent model from scratch, fine-tuning the general agent model in this way greatly improves training efficiency. The general agent model therefore has high applicability: path-finding and combat capability on a newly added game map can be obtained through the general agent model, which improves map-expansion efficiency.
Fig. 2 is a schematic diagram of a general agent model training manner according to an embodiment of the present application. As shown in fig. 2, the terminal device 201a, terminal device 202a, terminal device 203a, etc. in the terminal device cluster 20a may be terminal devices of the terminal device cluster in the embodiment corresponding to fig. 1, and the server 20b shown in fig. 2 may be the server 10 in the embodiment corresponding to fig. 1. As shown in fig. 2, the terminal device 201a, the terminal device 202a, and the terminal device 203a in the terminal device cluster 20a may obtain the sample game data of the initial agent model in the M game maps. The sample game data may include map key information, absolute position information, game parameters, and the like for each of the M game maps. Taking the ith game map among the M game maps as an example, the map key information of the ith game map may include the map resource file of the ith game map, the reference position of the ith game map, and the actual position of the initial agent model in the ith game map. The map resource file of the ith game map may be used to restore the game scene of the ith game map, such as restoring the game elements and spatial layout in that scene. The reference position of the ith game map may be the birth place of the virtual character in the ith game map, or a position shared by the M game maps, and may be set according to the specific situation.
The absolute position information of the ith game map may include the actual positions of the game elements in the ith game map and the actual position of the initial agent model in the ith game map. The game parameters of the ith game map may include the map identification of the ith game map, the map resource file, and the game state parameters of the initial agent model in the ith game map. The game state parameters of the initial agent model in the ith game map may include the initial agent model's own character information, teammate character information, enemy character information, and the like. The own character information of the initial agent model may include its current action in the ith game map, currently available skills, current combat power, etc. The teammate character information may include teammate position information, teammate state information, and the like, and the enemy character information may include enemy position information, enemy state information, and the like.
The server 20b may reconstruct the absolute position information of each of the M game maps based on the reference position of that game map, obtaining the relative position information of each game map. The absolute position information of a game map is a map-specific feature of that map; when map-specific features exist in the game maps, the initial agent model encounters learning ambiguity and learning difficulty when learning across the M game maps, i.e., it is difficult for it to learn the general knowledge and general strategies shared by different game maps. Since the actual positions of the game elements in different game maps are determined by different position-construction methods, i.e., those positions are absolute, learning ambiguity may occur when the initial agent model derives game strategies and game actions from the actual positions of game elements in different game maps.
For example, the initial agent model may encounter ambiguity when the target location in a first game map of the M game maps requires learning a different game strategy and game action than the same target location in a second game map, where "same" means the position coordinate information in the two maps is identical.
Therefore, the server 20b performs position reconstruction on the absolute position information of each game map based on the reference position of that game map, obtaining the relative position information of each game map. For example, taking the reference position as the virtual character birth place, the server 20b may convert the absolute position information of each game map into relative position information with respect to the virtual character birth place in that map. The initial agent model then learns game strategies and game actions in each game map based on the positional relationship between its current position and the reference position. In this way, the map-specific features of each game map are converted into map-general features, so that the initial agent model can better capture the commonalities among different game maps; the learning ambiguity and learning difficulty caused by the map-specific features of the M game maps can be avoided, overfitting to a single game map can be prevented, and the training accuracy of the resulting general agent model can be improved.
The server 20b may generate the map environment perception information respectively corresponding to the M game maps according to the map key information of each game map contained in the sample game data; the map environment perception information may include path structure graphs and the mapping position information of the initial agent model in the M game maps. A path structure graph is a graph structure: a many-to-many nonlinear structure composed of vertices and edges, where vertices represent elements in the graph and edges represent connection relationships between vertices. Graph structures can be classified into directed and undirected graph structures according to the directionality of the edges: edges in a directed graph have directions, representing unidirectional connections from one vertex to another, while edges in an undirected graph have no direction, representing bi-directional connections between two vertices.
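A minimal sketch of such a graph structure follows, assuming a simple adjacency-list representation; the class, parameter, and vertex names are illustrative, not from the patent.

```python
from collections import defaultdict

# Minimal adjacency-list sketch of the graph structure described above.
# An undirected edge is stored in both directions; a directed edge in one.
class PathGraph:
    def __init__(self, directed=False):
        self.directed = directed
        self.adj = defaultdict(set)  # vertex -> set of neighbouring vertices

    def add_edge(self, u, v):
        self.adj[u].add(v)
        if not self.directed:
            self.adj[v].add(u)  # bi-directional connection

g = PathGraph(directed=False)
g.add_edge("spawn", "bridge")
g.add_edge("bridge", "crystal")
print(sorted(g.adj["bridge"]))  # ['crystal', 'spawn']
```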
The server 20b may convert each game map into a graph structure, so that the initial agent model can better understand and operate in the game environment corresponding to each game map. This helps compensate for the loss of game environment perception caused by game map migration, and improves the generalization capability of the trained general agent model on the M game maps and on newly added game maps (different from the M game maps). With this training method, the trained general agent model can be applied to the M game maps and to newly added game maps without training a corresponding agent model for each game map, which improves the training efficiency of the agent model and reduces its training cost.
Further, the server 20b may control the initial agent model and execute the game tasks corresponding to the M game maps according to the relative position information and map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, obtaining the task execution results respectively corresponding to the initial agent model in the M game maps. Specifically, through the initial agent model, the game strategies and game actions respectively corresponding to the initial agent model in the M game maps are determined according to the relative position information, the map environment perception information, and the game parameters respectively corresponding to the M game maps; the game actions and game strategies are then executed to carry out the game tasks in the M game maps, obtaining the task execution results respectively corresponding to the initial agent model in the M game maps.
The server 20b may detect, from the task execution results of the initial agent model in the M game maps, whether the initial agent model completed the game tasks, the quality of that completion, and so on, and then adjust the model parameters in the initial agent model accordingly, until the performance of the initial agent model reaches a performance threshold or its iteration time step reaches the target iteration time step (for example, the set maximum iteration time step), yielding the general agent model. The embodiment of the application thus provides a unified scheme for a multi-game-map model: a general agent model is obtained through fusion training on the sample game data of a plurality of game maps and can adapt to those game maps with high universality and adaptability, so the training cost of the agent model is reduced and its training efficiency is improved. There is no need to train one agent model per game map, which also reduces development and operation costs. The trained general agent model can match, on each game map, the performance of an agent model specially trained on a single game map, and has a strong model effect.
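The following schematic loop illustrates the stopping conditions just described, under assumed method names (run_episode, update_parameters, evaluate, and save are hypothetical, not the patent's interfaces): training ends when a performance threshold or the maximum iteration time step is reached.

```python
# Schematic training loop (an assumption about control flow, not the patent's
# exact procedure): adjust parameters from task execution results until the
# model reaches a performance threshold or the maximum iteration time step.
def train_general_agent(model, maps, max_steps, performance_threshold):
    for step in range(max_steps):
        results = [model.run_episode(m) for m in maps]  # task execution results per map
        model.update_parameters(results)                # parameter adjustment
        if model.evaluate(maps) >= performance_threshold:
            break                                       # capability upper limit reached
    model.save()                                        # store the general agent model
    return model
```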
In summary, the embodiment of the application trains the initial agent model with the sample game data of the M game maps, removing the map-specific features in each game map and converting them into map-general features, so that the initial agent model can better capture the commonalities among different game maps; the learning ambiguity and learning difficulty caused by map-specific features are avoided, overfitting to a single game map is prevented, and the training accuracy of the resulting general agent model is improved. Meanwhile, generating map environment perception information for each game map lets the initial agent model better understand and operate in the game environment corresponding to each map, compensating for the loss of game environment perception caused by game map migration and improving the generalization capability of the trained general agent model on the M game maps and on newly added game maps (different from the M game maps). The initial agent model is trained into the general agent model through the relative position information and map environment perception information respectively corresponding to the M game maps, the game parameters respectively corresponding to the M game maps, and the game tasks respectively corresponding to the M game maps. In this way, the training efficiency of the agent model is improved and its training cost is reduced.
Further, referring to fig. 3, fig. 3 is a flow chart of a data processing method according to an embodiment of the application. As shown in fig. 3, the method may be performed by any terminal device in fig. 1, by the server 10 in fig. 1, or by the terminal device and the server in fig. 1 together; the apparatus performing the data processing method in the present application may be collectively referred to as a computer device. The data processing method may include, but is not limited to, the following steps:
S101, carrying out position reconstruction on absolute position information associated with each game map according to a reference position of each game map in M game maps of the virtual game environment, and obtaining relative position information associated with each game map.
Specifically, the embodiment of the application provides a unified scheme for a multi-game-map model: a general agent model is obtained through fusion training on the sample game data of a plurality of game maps, can adapt to those game maps, and has high universality and adaptability, which reduces the training cost of the agent model, improves its training efficiency, and lowers development and operation costs; there is no need to train one agent model per game map, and the trained general agent model can match, on each game map, the performance of an agent model specially trained on a single game map. More specifically, because of the position-construction manner adopted when the M game maps in the virtual game environment were built, the actual position of each game element in each of the M game maps is absolute position information, which is a map-specific feature of that game map. When map-specific features exist in the game maps, the initial agent model encounters learning ambiguity and learning difficulty when learning across the M game maps, i.e., it is difficult for it to learn the general knowledge and general strategies shared by different game maps.
Thus, the computer device may determine a reference position for each of the M game maps; the reference position may be the virtual character birth point in each game map, the game end point in each game map, or some other position point in each game map, and may be determined according to the specific situation.
Further, the computer device may perform position reconstruction on the absolute position information associated with each game map according to the reference position of that game map, obtaining the relative position information associated with each game map. Taking the ith game map as an example, the computer device may obtain the position difference between the absolute position information associated with the ith game map and the reference position of the ith game map as the relative position information. In this way, the initial agent model can find paths and play between any two points (namely its current position point and the reference position), better capturing the commonalities among different game maps and preventing overfitting to a single game map. The relative position information of each game map helps the initial agent model learn the general information shared by different game maps, better understand the similarities and differences between them, and improves the generalization capability of the trained general agent model in a multi-game-map environment.
Optionally, taking an ith game map in the M game maps as an example, i is a positive integer less than or equal to M, the absolute position information associated with the ith game map includes first absolute position information and second absolute position information, the first absolute position information reflects an actual position of a game element in the ith game map in the virtual game environment, and the second absolute position information reflects an actual position of the initial agent model in the ith game map. The specific manner of the computer device for reconstructing the position of the absolute position information associated with each game map according to the reference position of each game map in the M game maps of the virtual game environment to obtain the relative position information associated with each game map may include: and carrying out position reconstruction on the first absolute position information according to the reference position of the ith game map to obtain the relative position information of the game elements in the ith game map. And carrying out position reconstruction on the second absolute position information according to the reference position of the ith game map to obtain the relative position information of the initial intelligent agent model in the ith game map. And determining the relative position information of the game elements in the ith game map and the relative position information of the initial agent model in the ith game map as the relative position information associated with the ith game map.
Specifically, the computer device may obtain the position difference between the reference position of the ith game map and the first absolute position information as the relative position information of the game elements in the ith game map, that is, convert the first absolute position information into relative position information. Similarly, the computer device may obtain the position difference between the reference position of the ith game map and the second absolute position information, obtaining the relative position information of the initial agent model in the ith game map, i.e., convert the second absolute position information into relative position information. The relative position information of the game elements in the ith game map and the relative position information of the initial agent model in the ith game map are determined as the relative position information associated with the ith game map. In this way, the initial agent model can learn, in the M game maps, the relative position information between any current point in a game map and the reference position, and then learn general knowledge and general strategies based on that relative position information. The initial agent model can find paths and play between any two points (namely its current position point and the reference position), better capturing the commonalities among different game maps and preventing overfitting to a single game map. The generalization capability of the initial agent model on different game maps is thereby enhanced, so that the trained general agent model can better adapt to various game map environments.
Optionally, the specific manner of obtaining the relative position information of the game element in the ith game map by the computer device performing position reconstruction on the first absolute position information according to the reference position of the ith game map may include: and acquiring a position difference value between the reference position of the ith game map and the actual position of the game element in the ith game map in the virtual game environment. The position difference value is determined as the relative position information of the game element in the ith game map.
Specifically, the computer device may acquire, using the reference position of the ith game map as a reference point, a position difference between the reference position of the ith game map and an actual position of a game element in the ith game map in the virtual game environment, and determine the position difference as relative position information of the game element in the ith game map. In particular, the computer device may determine a coordinate difference between coordinates of the reference position of the ith game map and coordinates of an actual position of the game element in the ith game map in the virtual game environment as the relative position information of the game element in the ith game map.
For example, taking the reference position of the ith game map as the virtual character birth place, the coordinates of the virtual character birth place in the ith game map may be (2, 3, 1). If the coordinates of the actual position of a certain virtual plant in the ith game map are (6, 9, 4), the computer device may obtain the coordinate difference between the coordinates (6, 9, 4) and the coordinates (2, 3, 1), that is, (4, 6, 3), and determine the coordinate difference (4, 6, 3) as the relative position information of that virtual plant in the ith game map. Accordingly, the relative position information of the virtual character birth place in the ith game map is (0, 0, 0).
Similarly, the specific manner of the computer device performing position reconstruction on the second absolute position information according to the reference position of the ith game map to obtain the relative position information of the initial agent model in the ith game map may include: the computer device may obtain a model position difference between the reference position of the ith game map and the actual position of the initial agent model in the ith game map, and determine the model position difference as the relative position information of the initial agent model in the ith game map. Specifically, the computer device may determine, as the relative position information of the initial agent model in the i-th game map, a coordinate difference between the coordinates of the reference position of the i-th game map and the coordinates of the actual position of the initial agent model in the i-th game map.
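A minimal sketch of the coordinate-difference reconstruction described above follows, reusing the numbers from the example; the function name is illustrative.

```python
# Position reconstruction as a coordinate difference, mirroring the example
# above (reference position = virtual character birth place).
def relative_position(absolute_pos, reference_pos):
    return tuple(a - r for a, r in zip(absolute_pos, reference_pos))

birth_place = (2, 3, 1)  # reference position of the ith game map
plant_pos = (6, 9, 4)    # actual position of a game element
print(relative_position(plant_pos, birth_place))    # (4, 6, 3)
print(relative_position(birth_place, birth_place))  # (0, 0, 0) at the reference
```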
S102, according to map key information respectively associated with the M game maps, map environment perception information respectively corresponding to the M game maps is determined.
Specifically, in order to enhance the route searching capability of the initial agent model on different game maps and compensate for the loss of game environment perception caused by migration between game maps, the computer device may determine the map environment perception information corresponding to the M game maps according to the map key information associated with each of the M game maps. Through the map environment perception information, the initial agent model can better understand its own position in the game map and the surrounding environment, which helps it learn game-play commonalities from each game map, obtain more information than a single game map provides, and improve its game-play capability.
Optionally, the map key information associated with the ith game map in the M game maps includes a map resource file of the ith game map, a reference position of the ith game map, and an actual position of the initial agent model in the ith game map. The specific manner of determining the map environment perception information corresponding to the M game maps respectively according to the map key information associated with the M game maps respectively by the computer device may include: and extracting the walkable region corresponding to the ith game map from the map resource file corresponding to the ith game map. And generating a target path structure diagram corresponding to the ith game map according to the walkable region corresponding to the ith game map. And determining the mapping position information of the initial intelligent agent model in the target path structural diagram according to the reference position of the ith game map and the actual position of the initial intelligent agent model in the ith game map. And determining the mapping position information of the target path structure diagram and the initial intelligent agent model in the target path structure diagram as map environment perception information corresponding to the ith game map.
Specifically, the computer device may extract the walkable region corresponding to the ith game map from the map resource file of the ith game map. The walkable region is the region in which a virtual character in the game application can walk, namely, the agent-reachable region; for example, it can comprise walkable surfaces such as the ground, stairs, and platforms, and which regions are walkable can be set when the game map is created. Further, the computer device may generate a target path structure diagram corresponding to the ith game map according to the walkable region corresponding to the ith game map. The target path structure diagram is the path structure diagram of the ith game map, and in this way the path structure diagram of each of the M game maps can also be generated. The path structure diagram may be a graph structure diagram, used to reflect the paths the agent model can walk in the game map.
The computer device may map the actual position of the initial agent model in the ith game map to the target path structure diagram, and determine the mapping position information of the initial agent model in the target path structure diagram according to the reference position of the ith game map. The computer device may then determine the target path structure diagram and the mapping position information of the initial agent model in the target path structure diagram as the map environment perception information corresponding to the ith game map. Thus, the initial agent model can conveniently know its own position in the game map, and its route searching capability in the game map is improved. Meanwhile, through the map environment perception information, the route searching and combat tasks that an agent model traditionally learns within a single game map can be converted into route searching and combat tasks between any two positions in the path structure diagram, so that the differences between game maps can be greatly simplified, and the generalization capability of the trained general agent model on different game maps and newly added game maps is improved.
Optionally, the specific manner of generating the target path structure diagram corresponding to the ith game map by the computer device according to the walkable region corresponding to the ith game map may include: and determining the walkable path of the initial intelligent agent model in the ith game map according to the walkable region in the ith game map. The walking direction of the walkable path is obtained, and a directed path structure diagram corresponding to the ith game map is generated according to the walkable path and the walking direction of the walkable path. And cutting the directed path structure diagram corresponding to the ith game map to obtain a target path structure diagram corresponding to the ith game map.
Specifically, the computer device may analyze the walkable region in the ith game map, determine the walkable path of the initial agent model in the ith game map, and further generate an initial path structure diagram based on the walkable path in the ith game map, where the initial path structure diagram may be a directed path structure diagram or an undirected structure diagram. Since some paths can only be walked in one direction, the initial path structure diagram may be a directed path structure diagram; in this case, the computer device may acquire the walking direction of the walkable path in the ith game map and determine the key points in the ith game map, such as corners and crossing points. The computer device may then generate the directed path structure diagram corresponding to the ith game map based on the walkable path in the ith game map, the walking direction of the walkable path, and the key points in the ith game map.
Further, the computer device may extract a critical path in the directional path structure diagram corresponding to the ith game map, and based on the extracted critical path, perform a clipping operation on the directional path structure diagram corresponding to the ith game map, to obtain a target path structure diagram corresponding to the ith game map. And cutting and pruning the directed path structure diagram corresponding to the ith game map, thereby simplifying map representation and reducing calculation complexity.
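As a hedged illustration of building and clipping such a graph, the following sketch uses the networkx library; the edge-list encoding of walkable paths and the pruning rule (keeping only nodes lying on shortest paths between key points) are assumptions made for the example, not the embodiment's actual procedure.

```python
import networkx as nx

def build_directed_path_graph(walkable_edges):
    # walkable_edges: (u, v) pairs that can be walked from u to v;
    # a two-way path contributes both (u, v) and (v, u)
    g = nx.DiGraph()
    g.add_edges_from(walkable_edges)
    return g

def prune_to_critical_paths(g, key_points):
    # keep only nodes that lie on a shortest path between key points
    # (e.g., corners and crossing points), as a stand-in for clipping
    keep = set(key_points)
    for s in key_points:
        for t in key_points:
            if s != t and nx.has_path(g, s, t):
                keep.update(nx.shortest_path(g, s, t))
    return g.subgraph(keep).copy()
```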
The computer device may generate the target path structure diagram corresponding to the ith game map from the map resource file of the ith game map using a navigation mesh (NavMesh) generation algorithm. Specifically, the computer device may divide the game scene corresponding to the map resource file of the ith game map into a series of triangular meshes (or "polygons"). The triangular meshes are then simplified by removing details that are unimportant for navigation, and the simplified triangular meshes are converted into a continuous, collision-free navigation mesh. The computer device may detect whether the initial agent model can pass through each triangle in the navigation mesh, mark impassable triangles as such (for example, triangles lying on a wall or in an obstacle area), and remove them from the navigation mesh to obtain a processed navigation mesh. The computer device may then optimize the processed navigation mesh to improve the movement efficiency of the agent model on it. For example, the shape and size of triangles in the processed navigation mesh can be adjusted, or additional connection points can be added, so that the agent model can move smoothly from one place to another; the optimized navigation mesh yields the target path structure diagram of the ith game map.
Optionally, the specific manner of determining the mapping position information of the initial agent model in the target path structure diagram according to the reference position of the ith game map and the actual position of the initial agent model in the ith game map by the computer device may include: and mapping the actual position of the initial intelligent agent model in the ith game map to the target path structure diagram to obtain a first mapping position of the initial intelligent agent model in the target path structure diagram. Mapping the reference position of the ith game map to the target path structure diagram to obtain a second mapping position of the reference position of the ith game map in the target path structure diagram. And obtaining the mapping position distance between the first mapping position and the second mapping position in the target path structure diagram. And determining the mapping position distance as the mapping position information of the initial intelligent agent model in the target path structure diagram.
Specifically, the computer device may map the actual position of the initial agent model in the ith game map to the target path structure diagram, obtaining the first mapping position of the initial agent model in the target path structure diagram. It may be appreciated that, since the target path structure diagram is generated based on the map resource file of the ith game map (for example, its size may be a scaled-down version of the ith game map), the corresponding position can be determined in the target path structure diagram based on the actual position of the initial agent model in the ith game map, thereby obtaining the first mapping position of the initial agent model in the target path structure diagram.
Similarly, the computer device may map the reference position of the ith game map to the target path structure diagram to obtain a second mapped position of the reference position of the ith game map in the target path structure diagram. In this way, the first mapping position and the second mapping position are both two positions in the target path structure diagram, and the computer device can acquire the mapping position distance between the first mapping position and the second mapping position in the target path structure diagram, and determine the mapping position distance as the mapping position information of the initial intelligent agent model in the target path structure diagram. Therefore, the way finding and the game matching of the initial intelligent agent model between any two points in any path structure diagram can be realized, the difference between different game maps is greatly simplified, and the capabilities of the initial intelligent agent model in different game maps and newly added game maps are improved.
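A minimal sketch of this two-step mapping and of the mapping position distance follows, assuming a networkx graph whose nodes carry a "pos" coordinate attribute and whose edges carry a "length" weight; these attribute names and the nearest-node snapping are illustrative assumptions.

```python
import math
import networkx as nx

def nearest_node(g, world_pos):
    # snap a world position to the closest graph node; assumes each node
    # stores its map coordinates in a "pos" attribute
    return min(g.nodes, key=lambda n: math.dist(g.nodes[n]["pos"], world_pos))

def mapping_position_distance(g, agent_pos, reference_pos):
    first = nearest_node(g, agent_pos)        # first mapping position
    second = nearest_node(g, reference_pos)   # second mapping position
    # graph distance between the two mapped positions along walkable paths
    return nx.shortest_path_length(g, first, second, weight="length")
```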
Of course, the computer device may also determine depth information, altitude information and light projection information of the ith game map according to the map resource file of the ith game map, and determine the depth information, altitude information, light projection information, the target path structure diagram, and the mapping position information of the initial agent model in the target path structure diagram as the map environment perception information corresponding to the ith game map. In this way, the route searching ability of the initial agent model is further improved.
Fig. 4a is a schematic diagram of a map resource file according to an embodiment of the present application, showing the map resource file of the ith game map. It can be seen that the map resource file shown in fig. 4a can restore the game scene of the ith game map, for example, showing information such as the spatial structure and the actual positions of game elements (such as virtual buildings, virtual objects, virtual plants) in the ith game map. Fig. 4b is a schematic view of a walkable region according to an embodiment of the present application, showing the walkable region in the ith game map. Specifically, when the ith game map is constructed, it is set which areas are walkable regions and which are not, and the computer device can generate the walkable region in the ith game map according to the walkability information recorded in the map resource file of the ith game map. Fig. 4c is a schematic diagram of an initial path structure diagram provided by an embodiment of the present application, showing the initial path structure diagram of the ith game map, which may be an undirected structure diagram or a directed path structure diagram. The computer device may determine the initial path structure diagram of the ith game map from the walkable region in the ith game map by means of a navigation mesh generation algorithm. Fig. 4d is a schematic diagram of a target path structure diagram provided by an embodiment of the present application, showing the critical paths in the ith game map and the node numbers on them; compared with the initial path structure diagram, it reduces the computational complexity for the initial agent model.
And S103, controlling the initial agent model, and executing the game task in the M game maps according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps to obtain the task execution results respectively corresponding to the initial agent model in the M game maps.
Specifically, the computer device may input the relative position information and the map environment awareness information corresponding to the M game maps, the game parameters corresponding to the M game maps, and the game tasks corresponding to the M game maps, respectively, to the initial agent model. Controlling an initial intelligent agent model, determining predicted game actions and predicted game strategies in M game maps according to relative position information and map environment perception information respectively corresponding to the M game maps, game parameters respectively corresponding to the M game maps and game tasks respectively corresponding to the M game maps, and executing game tasks in the M game maps according to the predicted game actions and the predicted game strategies to obtain task execution results respectively corresponding to the initial intelligent agent model in the M game maps. It will be appreciated that the initial agent model may automatically determine actions and policies currently to be performed based on the entered information to perform the corresponding game tasks.
Optionally, taking an ith game map in the M game maps as an example, game parameters of the ith game map in the M game maps include a map identifier of the ith game map, a map resource file, and a game state parameter of the initial agent model in the ith game map. The specific manner in which the computer device controls the initial agent model to perform the game play tasks in the M game maps may include: and removing the actual positions of game elements included in the map resource file of the ith game map in the virtual game environment to obtain the universal map resource file of the ith game map. And carrying out feature preprocessing on the universal map resource file, the relative position information, the map identification and the map environment perception information corresponding to the ith game map and the game play state parameters of the initial intelligent agent model in the ith game map to obtain game play features corresponding to the ith game map. And executing a game play task in the ith game map according to the game play characteristics to obtain a task execution result corresponding to the ith game map by the initial agent model.
Specifically, in order to avoid the problems of learning ambiguity and difficulty in learning caused by the initial agent model due to the absolute position information feature of each of the M game maps, the computer device may remove the actual positions of the game elements included in the map resource file of the ith game map in the virtual game environment, to obtain the universal map resource file of the ith game map. In this way, the initial agent model can better capture the commonalities among different game maps, and prevent overfitting to a single game map. Further, the computer equipment can perform feature preprocessing on the universal map resource file, the relative position information, the map identification and the map environment perception information corresponding to the ith game map through the initial intelligent agent model, and the game state parameters of the initial intelligent agent model in the ith game map to obtain game features corresponding to the ith game map.
The game state parameters of the initial agent model in the ith game map can comprise the initial agent model's own character information, teammate character information, enemy character information and the like. The own character information of the initial agent model may include its current action in the ith game map, currently available skills, current combat power, and so on. The teammate character information may include teammate position information, teammate state information, and the like, and the enemy character information may include enemy position information, enemy state information, and the like. Further, the computer device can analyze the game features through the initial agent model to execute the game task in the ith game map, obtaining the task execution result corresponding to the initial agent model in the ith game map.
Optionally, specific ways of performing feature preprocessing by the computer device may include: and embedding the map identification of the ith game map into the relative position information associated with the ith game map through a perception layer in the initial intelligent agent model to obtain the embedded relative position characteristic corresponding to the ith game map. And performing feature conversion on the universal map resource file and map environment perception information corresponding to the ith game map and the game state parameters of the initial intelligent agent model in the ith game map to obtain initial game features corresponding to the ith game map. And splicing the embedded relative position features and the initial game feature to obtain game feature corresponding to the ith game map.
Specifically, the initial agent model includes a perception layer, which may be an MLP (Multilayer Perceptron), a common feedforward neural network. An MLP consists of multiple fully connected layers (also known as densely connected layers or linear layers), with nonlinear activation functions typically applied between them to increase the complexity of the model so that more complex functions can be approximated. The basic structure of the MLP is as follows: Input Layer: receives raw data as input. Hidden Layers: the layers between the input layer and the output layer, used for learning representations of the data; an MLP may have one or more hidden layers. Output Layer: generates the final output of the model. During training, the MLP uses the backpropagation algorithm and gradient descent to update weights and biases so as to minimize a loss function (e.g., mean square error, cross entropy, etc.).
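A minimal PyTorch sketch of such an MLP follows; the layer sizes and the ReLU activation are illustrative choices, not specified by this embodiment.

```python
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),    # input layer to first hidden layer
            nn.ReLU(),                        # nonlinear activation
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),   # output layer
        )

    def forward(self, x):
        return self.net(x)
```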
Specifically, the perception layer included in the initial agent model comprises a first perception layer, a second perception layer and a third perception layer. The computer device can perform feature conversion on the map identifier of the ith game map through the first perception layer in the initial agent model to obtain the map identifier feature of the ith game map. Meanwhile, the computer device can perform feature conversion on the relative position information associated with the ith game map through the second perception layer to obtain the relative position feature associated with the ith game map. Further, the computer device can embed the map identifier feature of the ith game map into the relative position feature associated with the ith game map through the third perception layer, obtaining the embedded relative position feature corresponding to the ith game map. In this way, the initial agent model is helped to better discern which game map it is in.
Meanwhile, the computer equipment can carry out convolution processing on the universal map resource file corresponding to the ith game map and the map environment perception information through the convolution layer in the initial agent model to obtain map environment characteristics in the ith game map. The computer equipment can extract attention characteristics of the game state parameters of the initial intelligent agent model in the ith game map through the attention layer in the initial intelligent agent model to obtain character perception characteristics in the ith game map. Further, the computer device may splice the map environment feature in the ith game map and the character perception feature in the ith game map to obtain the initial game feature corresponding to the ith game map. The computer device may splice the embedded relative position feature and the initial game feature to obtain a game feature corresponding to the ith game map.
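The following sketch assembles these pieces (map-identifier embedding, relative-position projection, convolution over the map inputs, attention over the character state parameters, and final splicing) into one PyTorch module; every dimension, module choice and name here is an assumption made for illustration.

```python
import torch
import torch.nn as nn

class GameFeatureEncoder(nn.Module):
    def __init__(self, num_maps, id_dim=16, pos_dim=32, env_dim=64, role_dim=64):
        super().__init__()
        self.map_id_embed = nn.Embedding(num_maps, id_dim)    # first perception layer
        self.pos_proj = nn.Linear(6, pos_dim)                 # second perception layer
        self.fuse = nn.Linear(id_dim + pos_dim, pos_dim)      # third perception layer
        self.env_conv = nn.Conv2d(3, env_dim, 3, padding=1)   # convolution layer
        self.role_attn = nn.MultiheadAttention(role_dim, 4, batch_first=True)

    def forward(self, map_id, rel_pos, env_map, role_params):
        id_feat = self.map_id_embed(map_id)                   # map identifier feature
        pos_feat = self.pos_proj(rel_pos)                     # relative position feature
        embedded_pos = self.fuse(torch.cat([id_feat, pos_feat], dim=-1))
        env_feat = self.env_conv(env_map).mean(dim=(-2, -1))  # map environment feature
        role_feat, _ = self.role_attn(role_params, role_params, role_params)
        initial = torch.cat([env_feat, role_feat.mean(dim=1)], dim=-1)
        return torch.cat([embedded_pos, initial], dim=-1)     # spliced game feature
```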
Optionally, the specific manner in which the computer device performs the game play task in the ith game map according to the game play feature may include: and removing invalid features in the game play features through a neural network layer in the initial intelligent agent model to obtain valid game play features. And screening important game features from the effective game features, and generating state update parameters according to the important game features. And updating the state of the memory unit in the neural network according to the state updating parameters to obtain the updated state of the memory unit. And determining the predicted game action and the predicted game strategy of the initial intelligent body model in the ith game map according to the updated memory unit state and the effective game characteristics. And executing the game play task on the ith game map according to the predicted game play action and the predicted game play strategy to obtain a task execution result corresponding to the ith game map.
The initial agent model includes a neural network layer, which may be an LSTM network (Long Short-Term Memory network). An LSTM is a special recurrent neural network (RNN) architecture designed to solve the long-term dependency problem encountered by conventional RNNs when processing sequence data. An LSTM can learn long-term dependencies by introducing the concept of "gates" to control the flow of information. These gates include: Input Gate: decides which new information is to be stored in the cell state; Forget Gate: decides which information is to be forgotten or discarded from the cell state; Output Gate: controls how the information in the cell state is output to the current output of the LSTM.
Specifically, the computer device may remove the invalid features from the game play features through the forget gate included in the neural network layer of the initial agent model to obtain the effective game play features. Important game features are screened from the effective game features through the input gate included in the neural network layer, and state update parameters are generated from the important game features. The state of the memory unit in the neural network is updated according to the state update parameters to obtain the updated memory unit state, and the output gate then determines, from the updated memory unit state and the effective game play features, the predicted game action and the predicted game strategy of the initial agent model in the ith game map. Specifically, the computer device may predict, through the initial agent model, the game action for each game frame and the game strategy for the next game frame until the game ends. It can be understood that, through the initial agent model, the game action of each game frame is automatically predicted in the M game maps and executed until the game ends, yielding the task execution results respectively corresponding to the M game maps.
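As a sketch of this decision step, the game features can be fed through an LSTM, whose gates perform the feature filtering and memory-unit updates described above, followed by heads for the predicted action and strategy; the dimensions and head shapes below are assumptions.

```python
import torch.nn as nn

class AgentDecisionCore(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=256, num_actions=32):
        super().__init__()
        # the LSTM's forget/input/output gates implement the filtering and
        # memory-unit updates described in the text
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.action_head = nn.Linear(hidden_dim, num_actions)  # predicted game action
        self.policy_head = nn.Linear(hidden_dim, 1)            # predicted game strategy signal

    def forward(self, game_feats, state=None):
        out, state = self.lstm(game_feats, state)
        return self.action_head(out), self.policy_head(out), state
```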
It will be appreciated that, taking the ith game map as an example, when the initial agent model is controlled to execute the game task in the ith game map, the initial agent model may determine the game action and game strategy at a second moment according to the relative position information, map environment perception information, game parameters and game task at a first moment, where the second moment is the moment following the first moment. Executing the game action at the second moment in the ith game map yields game environment feedback information, which indicates the relative position information, map environment perception information and game parameters of the initial agent model at the second moment, that is, how they have changed relative to the first moment. The initial agent model can then predict the game action and game strategy at a third moment according to the relative position information, map environment perception information, game parameters and game strategy of the initial agent model at the second moment, execute them in the ith game map, and continue to obtain game environment feedback information in this way until the game ends, yielding the trajectory information of the initial agent model and the task execution result.
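A schematic rollout loop for this interaction might look as follows, where env.reset() and env.step() stand in for the game interface that supplies the feedback information; this interface is hypothetical, not the embodiment's actual API.

```python
def play_one_game(model, env, state=None):
    # obs bundles the relative positions, map perception info and game parameters
    obs = env.reset()
    trajectory = []
    done, info = False, {}
    while not done:
        action, policy, state = model(obs, state)   # action for the current frame
        obs, reward, done, info = env.step(action)  # game environment feedback
        trajectory.append((obs, action, reward))
    return trajectory, info.get("task_result")
```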
And S104, carrying out parameter adjustment on model parameters in the initial intelligent agent model according to task execution results respectively corresponding to the M game maps to obtain the general intelligent agent model.
Specifically, the computer device may detect, according to the task execution results corresponding to the M game maps, the game ability of the initial agent model in the M game maps, and perform parameter adjustment on the model parameters in the initial agent model using a reinforcement learning algorithm to obtain the general agent model. The reinforcement learning algorithm may be the PPO algorithm, the A3C algorithm, the DDPG algorithm, or the like. PPO (Proximal Policy Optimization) is a policy gradient method that limits the difference between the new and old policies at each update, thereby avoiding the instability caused by excessively large policy updates. The A3C algorithm (Asynchronous Advantage Actor-Critic) is an asynchronous reinforcement learning algorithm based on the actor-critic architecture, which uses multiple parallel environments to collect experience simultaneously and updates network parameters asynchronously. A3C combines a value function estimate (Critic) and a policy function estimate (Actor), where the value function is used to evaluate the value of a state and the policy function is used to select actions. The DDPG algorithm (Deep Deterministic Policy Gradient) is a deterministic policy gradient method based on deep learning, which uses deep neural networks to approximate the value function and the policy function, and selects actions by introducing a deterministic policy into the policy network, thereby avoiding the sampling problem in continuous action spaces.
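For reference, the clipped surrogate objective that gives PPO this property can be sketched as follows; new_logp and old_logp are the log-probabilities of the taken actions under the new and old policies, and advantage is the estimated advantage.

```python
import torch

def ppo_clipped_loss(new_logp, old_logp, advantage, eps=0.2):
    ratio = torch.exp(new_logp - old_logp)       # new policy / old policy
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    # taking the minimum limits how far a single update can move the policy
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```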
Therefore, a general agent model is obtained through fusion training on the sample game data of a plurality of game maps. The general agent model can adapt to multiple game maps, has higher universality and adaptability, and can reduce the training cost and improve the training efficiency of agent models. It can be understood that the general agent model trained in the embodiment of the present application can be applied to any other game map with the same game mode as the M game maps; when a newly added game map with the same game mode is encountered, the general agent model can perform well without additional training, greatly improving research and development efficiency, reducing development and operation costs, and providing players with a more stable and efficient game experience.
Optionally, the specific manner in which the computer device performs parameter adjustment on the model parameters in the initial agent model may include: generating game return benefits for reflecting the task execution quality of the initial agent model in the ith game map according to the task execution result corresponding to the ith game map in the M game maps; i is a positive integer less than or equal to M. And determining a parameter updating gradient of the initial intelligent agent model according to game return benefits respectively corresponding to the M game maps and the reinforcement learning function corresponding to the initial intelligent agent model. And according to the parameter updating gradient, carrying out parameter adjustment on the model parameters in the initial intelligent body model to obtain the initial intelligent body model after parameter adjustment. And if the initial intelligent body model after parameter adjustment meets the training stopping condition, determining the initial intelligent body model after parameter adjustment as an intelligent body model.
Specifically, taking the ith game map in the M game maps as an example, the task execution result corresponding to the ith game map may include the task execution outcome, the task execution duration, the final state of the initial agent model, and the like; the task execution outcome corresponding to the ith game map can be used to reflect the success or failure of the game task of the ith game map. The computer device may generate, according to the task execution outcome, the task execution duration, the final state of the initial agent model, and the like corresponding to the ith game map, a game return benefit for reflecting the task execution quality of the initial agent model in the ith game map. It will be appreciated that a higher game return benefit indicates that the initial agent model successfully executed the task in the ith game map, that the task execution duration was shorter than the duration threshold, and that the final state of the initial agent model was better than the target state.
Further, the computer device may comprehensively determine a parameter update gradient of the initial agent model based on game return benefits corresponding to the M game maps, respectively, and a reinforcement learning function corresponding to the initial agent model. And according to the parameter updating gradient, carrying out parameter adjustment on model parameters in the initial intelligent body model to obtain an initial intelligent body model after parameter adjustment, and if the initial intelligent body model after parameter adjustment meets the training stop condition, determining the initial intelligent body model after parameter adjustment as the intelligent body model. The training stop condition may be that the performance of the initial intelligent agent model reaches a performance threshold, or that the iteration time step of the initial intelligent agent model reaches a target iteration time step (e.g., the target iteration time step is a set maximum iteration time step).
Of course, the computer device may determine a parameter update gradient corresponding to each game map according to the game return benefit corresponding to each game map of the M game maps and the reinforcement learning function corresponding to the initial intelligent agent model, and further perform parameter adjustment on the model parameters in the initial intelligent agent model according to the parameter update gradient corresponding to each game map, that is, perform parameter update on the initial intelligent agent model according to the M game maps.
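A compact sketch of such a fused update over the M maps is given below; averaging the per-map losses is one possible aggregation, chosen here only for illustration (the per-map alternative described above would instead step the optimizer once per map).

```python
import torch

def fused_update(model, optimizer, per_map_losses):
    # per_map_losses: one scalar loss tensor per game map, each derived from
    # that map's game-return benefit and the reinforcement learning function
    loss = torch.stack(per_map_losses).mean()  # one gradient over all M maps
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```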
Fig. 5 is a schematic diagram of multi-map fusion training of a general agent model according to an embodiment of the present application. As shown in fig. 5, the computer device may copy the running script of the initial agent model to obtain a plurality of running scripts, and run a script of the initial agent model in each of the M game maps through the CPUs of the computer device (i.e., CPU1, CPU2, …, CPUn). Taking the ith game map as an example, a script of the initial agent model runs in the ith game map; the computer device can set game parameters for the initial agent model (such as setting the target position of the ith game map and the game state parameters of the initial agent model), and in the ith game map the initial agent model is controlled to play by self-play, obtaining the game data of the initial agent model on the ith game map as game map data i. The game map data i comprises the absolute position information associated with the ith game map, the map resource file of the ith game map, the map identifier of the ith game map, and the state parameters respectively corresponding to the N agents in the ith game map.
Further, the computer device may package the game map data of the M game maps, i.e., game map data 1 (the game data of the initial agent model in the first game map), game map data 2 (the game data of the initial agent model in the second game map), …, game map data M (the game data of the initial agent model in the Mth game map), to obtain the sample game data, i.e., game map sample data 1, game map sample data 2, …, game map sample data M. The sample game data is sent to the GPUs of the computer device, and through the GPUs, position reconstruction is performed on the absolute position information associated with each game map according to the reference position of each of the M game maps in the virtual game environment, obtaining the relative position information associated with each game map. The data transmission tool may be mempool; mempool is an abbreviation of "Memory Pool", a memory management technique. In the kernel, the memory pool is typically used as a backup cache to ensure that critical applications can still successfully apply for memory under memory pressure. The sample game data may be stored in mempool and retrieved from mempool by the GPUs of the computer device. It will be appreciated that, as opposed to training an agent model for a single game map, the embodiment of the present application generates sample game data in M game maps. The data of different game maps need to be separated according to their corresponding map labels, so that the model can better learn the different game maps and the initial agent model can better distinguish between maps.
And determining map environment perception information corresponding to the M game maps respectively according to the map key information respectively associated with the M game maps. And controlling the initial agent model, and performing feature preprocessing according to the relative position information and the map environment perception information corresponding to the M game maps and the game parameters corresponding to the M game maps to obtain game features corresponding to each game map. And executing game play tasks in the M game maps according to game play characteristics corresponding to each game map, and obtaining task execution results respectively corresponding to the initial agent model in the M game maps. And carrying out parameter adjustment on model parameters in the initial intelligent agent model according to task execution results respectively corresponding to the M game maps to obtain the general intelligent agent model.
It can be understood that the computer device can perform feature preprocessing on the data of each game map through the initial agent model. Taking the ith game map as an example, the computer device can perform feature conversion on the map identifier of the ith game map through the first perception layer in the initial agent model to obtain the map identifier feature of the ith game map. Meanwhile, the computer device can perform feature conversion on the relative position information associated with the ith game map through the second perception layer to obtain the relative position feature associated with the ith game map. Further, the computer device can embed the map identifier feature of the ith game map into the relative position feature associated with the ith game map through the third perception layer, obtaining the embedded relative position feature corresponding to the ith game map. In this way, the initial agent model is helped to better discern which game map it is in.
Further, the computer device may perform feature conversion on the universal map resource file and the map environment perception information corresponding to the ith game map, and the game state parameters of the initial agent model in the ith game map, so as to obtain the initial game feature corresponding to the ith game map. The embedded relative position feature and the initial game feature are spliced through a splicing layer in the initial agent model to obtain the game feature corresponding to the ith game map. Then, according to the game feature corresponding to the ith game map, the game actions and game strategies of the initial agent model can be determined, that is, the game actions and game strategies of the initial agent model are determined from its game data in the ith game map. From the processed data and features, a general agent model that can adapt to multiple maps is trained. During training, the general agent model learns general game strategies and general game knowledge across different game maps, thereby improving performance in multi-game-map environments.
Fig. 6 is a schematic diagram of training a general-purpose agent model according to an embodiment of the present application, where, as shown in fig. 6, a computer device may obtain sample game data of an initial agent model in M game maps, that is, game map data 1 (that is, game data of an initial agent model in a first game map), game map data 2 (that is, game data of an initial agent model in a second game map), …, and game map data M (that is, game data of an initial agent model in an mth game map). The computer device may remove the actual locations of the game elements in the map resource file included in each game map data in the virtual game environment, and generate map environment awareness information for each game map, resulting in a map generic feature for each game map.
Meanwhile, the computer device can perform position reconstruction on the absolute position information (namely, the map-specific features) associated with each game map's data according to the reference position of each game map, converting the absolute position information into relative position information to obtain the relative position information associated with each game map's data. Features are extracted from the map general features of each game map through the perception layer (namely, the multilayer perceptron) in the initial agent model to obtain the initial game feature of each game map. Feature learning is then performed on the initial game feature of each game map and the relative position information associated with each game map's data through the long short-term memory network in the initial agent model, outputting the game actions and game strategies of the initial agent model in the M game maps.
The computer device may control the initial agent model to execute the corresponding game actions and game strategies in the game environments corresponding to the M game maps respectively, that is, game map environment 1 (the game environment corresponding to the first game map), game map environment 2 (the game environment corresponding to the second game map), …, game map environment M (the game environment corresponding to the Mth game map), so as to execute the game tasks corresponding to the M game maps respectively, and train the initial agent model according to the game return benefits in the M game maps to obtain the general agent model.
Specifically, the training process of the general agent model according to the embodiment of the present application may include, but is not limited to, the following steps. Step one: the computer device can generate and extract a target path diagram structure according to the map resource files corresponding to the M game maps respectively, for path planning and strategy learning by the subsequent initial agent model. Step two: the computer device may convert the absolute position information associated with each game map into relative position information, i.e., convert the absolute position information of the game elements in each game map into relative position information, and convert the absolute position information of the initial agent model in each game map into relative position information (i.e., the coordinate difference between the actual position of the initial agent model in each game map and the virtual character birth point of the corresponding game map). Meanwhile, the computer device can acquire the first mapping position of the initial agent model in the target path structure diagram, the second mapping position of the reference position of the ith game map in the target path structure diagram, and the mapping position distance between them, and add the map id (namely, the map identifier) of the corresponding game map as an embedding into features such as the universal map resource file and the game parameters, so as to assist the initial agent model in distinguishing different game maps.
Step three: the computer device may load an initial agent model (which may be a neural network model) and randomly initialize it, preparing the initial agent model for the training process. Step four: the computer device loads the game environments corresponding to the M game maps, starts running scripts of the initial agent model in parallel on multiple machines, and performs self-play uniformly and in proportion across the plurality of different game maps to obtain sample game data of the form <state, target, action>, calculates the corresponding game return benefits, and distributes the sample data to the GPUs in the computer device. Here, the state in <state, target, action> refers to the game state parameters of the initial agent model in the game map, the target refers to the game task of the initial agent model in the game map, and the action refers to the action to be executed by the initial agent model in the game map. Step five: the GPUs in the computer device can update the parameters of the initial agent model according to the sample game data and the PPO algorithm (other reinforcement learning algorithms may also be used) so as to optimize the strategy of the initial agent model. Step six: the computer device can evaluate the model capability of the initial agent model; if the capability upper limit or the maximum iteration time step is reached, training stops and the final model is saved, yielding the general agent model; otherwise, the process returns to step four to continue training until the stopping condition is met. Step seven: if the general agent model needs to be deployed in a newly added game map, the relative position information and map environment perception information of the newly added game map can be obtained through steps one and two. The general agent model can then be used directly, executing the game task in the newly added game map according to the relative position information, the map environment perception information, and the other game parameters of the newly added game map; the general agent model can also undergo recovery training or distillation according to the game results in the newly added game map, so as to obtain better model performance and improve the capability of the general agent model.
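The following compact sketch strings steps three to six together; collect_selfplay and ability are stand-in stubs for the self-play and capability-evaluation components described above, not real APIs.

```python
import torch

def collect_selfplay(model, env):
    """Stub: self-play one round in a map and return that map's loss tensor."""
    ...

def ability(model, envs):
    """Stub: evaluate the model's game ability across the maps."""
    ...

def train_general_agent(model, envs, max_iters, threshold):
    opt = torch.optim.Adam(model.parameters(), lr=3e-4)
    for _ in range(max_iters):                                    # step six: iteration cap
        losses = [collect_selfplay(model, env) for env in envs]   # step four: self-play
        loss = torch.stack(losses).mean()                         # step five: fused update
        opt.zero_grad()
        loss.backward()
        opt.step()
        if ability(model, envs) >= threshold:                     # step six: stop condition
            return model
    return model
```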
The embodiment of the application provides a general agent model training method, in which a general agent model is obtained through fusion training over M game maps, where M is an integer greater than 1. The general agent model has higher universality and adaptability and can adapt to multiple game maps without training one agent model for each game map, reducing the training cost of agent models and improving their training efficiency. Specifically, the absolute position information associated with each of the M game maps is converted into relative position information, so that the initial agent model can better capture the commonalities among different game maps, and the problems of learning ambiguity and learning difficulty for the initial agent model across the M game maps are avoided. Meanwhile, the route searching capability of the initial agent model on different game maps is enhanced through the map environment perception information corresponding to each game map, compensating for the loss of game environment perception caused by game map migration. The initial agent model is controlled to execute the game task in the M game maps according to the relative position information and map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, and is trained to obtain the general agent model. Thus, the initial agent model can learn general game knowledge and general game strategies in the M game maps, and the trained general agent model can be applied to any game map without training a separate agent model for each map, greatly reducing the training cost and improving the training efficiency of agent models.
Further, referring to fig. 7, fig. 7 is a flow chart of a data processing method according to an embodiment of the application. As shown in fig. 7, the method may be performed by any terminal device in fig. 1, may be performed by the server 10 in fig. 1, or may be performed by both the terminal device and the server in fig. 1, and the apparatus for performing the data processing method in the present application may be collectively referred to as a computer apparatus. Wherein the data processing method may include, but is not limited to, the following steps:
S201, according to the reference position of each game map in M game maps of the virtual game environment, carrying out position reconstruction on the absolute position information associated with each game map to obtain the relative position information associated with each game map.
S202, according to map key information respectively associated with M game maps, map environment perception information respectively corresponding to the M game maps is determined.
S203, controlling the initial agent model, and executing the game task in the M game maps according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps to obtain the task execution results respectively corresponding to the initial agent model in the M game maps.
S204, according to the task execution results respectively corresponding to the M game maps, carrying out parameter adjustment on model parameters in the initial intelligent agent model to obtain the general intelligent agent model.
Specifically, the content of step S201 to step S204 in the embodiment of the present application may refer to the content of step S101 to step S104, which is not described herein.
S205, according to the reference position of the newly added game map in the virtual game environment, carrying out position reconstruction on the absolute position information associated with the newly added game map to obtain the relative position information associated with the newly added game map.
Specifically, after the general agent model is obtained through training, it has high applicability and generalization, and when a game map is newly added to the virtual game environment, the general agent model can be applied to it directly. It can be understood that the computer device can apply the general agent model to the newly added game map, so as to carry out the game task in the newly added game map through the general agent model, without additionally training an agent model for the newly added game map, thereby improving the training efficiency of agent models and reducing their training cost. Specifically, the computer device may convert the absolute position information of the newly added game map into relative position information to remove the map-specific features of the newly added game map. According to the reference position of the newly added game map, the computer device can reconstruct the absolute position information associated with the newly added game map to obtain the relative position information associated with it, converting the newly added game map into a general game map. In this way, the general agent model can apply the game knowledge and game strategies learned in the M game maps to the newly added game map, so that the general agent model can be used in the newly added game map directly.
The content of performing the position reconstruction on the absolute position information associated with the newly added game map may refer to the content of performing the position reconstruction on the absolute position information associated with the ith game map, which is not described herein in detail.
S206, determining map environment perception information corresponding to the newly added game map according to the map key information corresponding to the newly added game map.
Specifically, the computer device may determine map environment awareness information corresponding to the newly added game map according to a map resource file corresponding to the newly added game map. Therefore, the method is beneficial to compensating for environmental perception loss caused by migration of different game maps, improving the generalization capability of the general intelligent body model on the newly added game map and improving the path finding capability of the general intelligent body model on the newly added game map. The specific content of the map environment sensing information corresponding to each newly added game map may be referred to the content of the map environment sensing information corresponding to the i-th game map, which is not described herein.
S207, controlling the general agent model, and performing game play with the game player character in the newly-added game map according to the relative position information and the map environment perception information corresponding to the newly-added game map and the game parameters corresponding to the newly-added game map to obtain a game play result corresponding to the general agent model in the newly-added game map.
Specifically, the computer device may control the general agent model, and according to the relative position information and the map environment sensing information corresponding to the newly-added game map, and the game parameters corresponding to the general agent model in the newly-added game map, play is performed with the game player character in the newly-added game map, so as to obtain a play result corresponding to the general agent model in the newly-added game map. It will be appreciated that the generic agent model may utilize generic strategies and generic knowledge learned on the M game maps to perform game play tasks in the newly added game map. Therefore, the general agent model in the embodiment of the application can be applied to any game map with the same game mode as M game maps, and one agent model is not needed to be obtained by training each game map, so that the training efficiency of the agent model can be improved, and the training cost of the agent model can be reduced.
Optionally, the computer device may also perform recovery training and distillation on the general agent model based on the game result on the newly added game map, so that the general agent model may learn new game knowledge and game policy in the newly added game map, and further improve the performance of the general agent model. Specifically, the computer device may generate game return benefits for reflecting the game quality of the general agent model on the newly added game map according to the game result corresponding to the newly added game map. And adjusting model parameters in the general intelligent agent model according to game return benefits corresponding to the newly added game map to obtain an adjusted general intelligent agent model.
Therefore, compared with training the intelligent body model corresponding to the newly-added game map from zero, the embodiment of the application trains the general intelligent body model according to the game result corresponding to the newly-added game map, and the general intelligent body model with excellent performance in the newly-added game map can be obtained by only a small amount of training, so that the training cost of the intelligent body model can be reduced and the training efficiency of the intelligent body model can be improved. It can be understood that, by means of the trained general intelligent agent model, the embodiment of the application can perform recovery training and distillation on the basis of the existing model parameters in the general intelligent agent model, and quickly obtain an adjusted general intelligent agent model, and the adjusted general intelligent agent model can perform excellent game playing on a newly added game map. Thus, the adjusted general intelligent agent model learns the route searching capability and the game playing capability on the newly-added game map, the game playing capability of the adjusted general intelligent agent model in other newly-added game maps is further improved, and the performance of the adjusted general intelligent agent model is improved.
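As a hedged sketch, this recovery training can be viewed as a short fine-tuning loop starting from the trained general model; the function name, the data source, and the small learning rate below are all assumptions for illustration.

```python
import torch

def recovery_train(general_model, new_map_losses, lr=1e-5):
    # fine-tune the trained general model on game results from the newly
    # added map; a small learning rate preserves the learned general strategy
    optimizer = torch.optim.Adam(general_model.parameters(), lr=lr)
    for loss in new_map_losses:   # per-game losses from play on the new map
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return general_model
```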
The embodiment of the application provides a general agent model training method, in which a general agent model is obtained through fusion training over M game maps, where M is an integer greater than 1. The general agent model has higher universality and adaptability and can adapt to multiple game maps without training one agent model for each game map, reducing the training cost of agent models and improving their training efficiency. Specifically, the absolute position information associated with each of the M game maps is converted into relative position information, so that the initial agent model can better capture the commonalities among different game maps, and the problems of learning ambiguity and learning difficulty for the initial agent model across the M game maps are avoided. Meanwhile, the route searching capability of the initial agent model on different game maps is enhanced through the map environment perception information corresponding to each game map, compensating for the loss of game environment perception caused by game map migration. The initial agent model is controlled to execute the game task in the M game maps according to the relative position information and map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, and is trained to obtain the general agent model. Thus, the initial agent model can learn general game knowledge and general game strategies in the M game maps, and the trained general agent model can be applied to any game map without training a separate agent model for each map, greatly reducing training cost and improving training efficiency. By means of the trained general agent model, the embodiment of the application can also perform recovery training and distillation on the basis of the existing model parameters in the general agent model, quickly obtaining an adjusted general agent model that performs excellently on a newly added game map. In this way, the adjusted general agent model learns route searching and game-play capability on the newly added game map, its game-play capability in other newly added game maps is further improved, and the performance of the adjusted general agent model is improved.
Further, referring to fig. 8, fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running in a computer device; for example, the data processing apparatus may be application software. The data processing apparatus may be used to perform the corresponding steps in the method provided by the embodiments of the present application. As shown in fig. 8, the data processing apparatus may be deployed on any blockchain node in a blockchain network. The data processing apparatus may include: a first reconstruction module 11, a first determination module 12, a first execution module 13, a first adjustment module 14, a second reconstruction module 15, a second determination module 16, a second execution module 17, a generation module 18, and a second adjustment module 19.
A first reconstruction module 11, configured to perform position reconstruction on absolute position information associated with each game map according to a reference position of each game map in M game maps of the virtual game environment, so as to obtain relative position information associated with each game map;
The first determining module 12 is configured to determine map environment sensing information corresponding to each of the M game maps according to map key information associated with each of the M game maps;
The first execution module 13 is configured to control the initial agent model, and execute the game task in the M game maps according to the relative position information and the map environment sensing information corresponding to the M game maps, and the game parameters corresponding to the M game maps, so as to obtain task execution results corresponding to the initial agent model in the M game maps;
the first adjustment module 14 is configured to perform parameter adjustment on model parameters in the initial agent model according to task execution results corresponding to the M game maps, so as to obtain a general agent model.
The absolute position information associated with an ith game map in the M game maps comprises first absolute position information and second absolute position information, wherein the first absolute position information reflects the actual position of a game element in the ith game map in the virtual game environment, and the second absolute position information reflects the actual position of an initial intelligent agent model in the ith game map; i is a positive integer less than or equal to M;
the first reconstruction module 11 is specifically configured to:
Performing position reconstruction on the first absolute position information according to the reference position of the ith game map to obtain the relative position information of the game elements in the ith game map;
Performing position reconstruction on the second absolute position information according to the reference position of the ith game map to obtain the relative position information of the initial intelligent agent model in the ith game map;
And determining the relative position information of the game elements in the ith game map and the relative position information of the initial agent model in the ith game map as the relative position information associated with the ith game map.
The first reconstruction module 11 is specifically configured to:
Acquiring a position difference value between the reference position of the ith game map and the actual position of a game element in the ith game map in the virtual game environment;
Determining the position difference value as the relative position information of the game element in the ith game map.
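As a purely illustrative sketch (the patent does not prescribe a coordinate format or an implementation), the position reconstruction step can be pictured as a vector subtraction; the 3-D layout and the sample values below are assumptions:

```python
import numpy as np

# Minimal sketch, assuming 3-D world coordinates: relative position
# information is the difference between an absolute position and the
# map's reference position. All sample values are hypothetical.
def to_relative(absolute_xyz: np.ndarray, reference_xyz: np.ndarray) -> np.ndarray:
    return absolute_xyz - reference_xyz

reference = np.array([120.0, 0.0, -340.0])           # reference position of the ith map
element_positions = np.array([[130.0, 2.0, -338.0],  # game elements (absolute)
                              [95.0, 0.0, -360.0]])
agent_position = np.array([118.0, 1.0, -342.0])      # initial agent model (absolute)

relative_elements = to_relative(element_positions, reference)  # element offsets
relative_agent = to_relative(agent_position, reference)        # agent offset
print(relative_elements, relative_agent)
```

Because each map is expressed as offsets from its own reference position, coordinates become comparable across the M game maps.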
The map key information related to the ith game map in the M game maps comprises a map resource file of the ith game map, a reference position of the ith game map and an actual position of the initial agent model in the ith game map; i is a positive integer less than or equal to M;
The first determining module 12 is specifically configured to:
extracting a walkable region corresponding to the ith game map from a map resource file corresponding to the ith game map;
generating a target path structure diagram corresponding to the ith game map according to the walkable region corresponding to the ith game map;
determining mapping position information of the initial agent model in the target path structure diagram according to the reference position of the ith game map and the actual position of the initial agent model in the ith game map;
and determining the target path structure diagram and the mapping position information of the initial agent model in the target path structure diagram as the map environment perception information corresponding to the ith game map.
The first determining module 12 is further specifically configured to:
Determining a walkable path of the initial intelligent agent model in the ith game map according to the walkable region in the ith game map;
acquiring the walking direction of a walkable path, and generating a directed path structure diagram corresponding to an ith game map according to the walkable path and the walking direction of the walkable path;
and cutting the directed path structure diagram corresponding to the ith game map to obtain a target path structure diagram corresponding to the ith game map.
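As a hedged sketch of this step (the patent specifies neither a grid representation nor a graph library), the walkable region can be modelled as a boolean grid whose walkable cells become nodes of a directed graph, with the cutting step approximated by pruning unconnected cells; the networkx library and the four-direction move set are assumptions:

```python
import networkx as nx

# Hypothetical sketch: build a directed path structure graph from a
# walkable-region grid. The grid encoding, the move set and the pruning
# rule standing in for the "cutting" step are all assumptions.
def build_target_path_graph(walkable):
    g = nx.DiGraph()
    rows, cols = len(walkable), len(walkable[0])
    for r in range(rows):
        for c in range(cols):
            if walkable[r][c]:
                g.add_node((r, c))                         # walkable cell -> node
    for r, c in list(g.nodes):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # walking directions
            if (r + dr, c + dc) in g:
                g.add_edge((r, c), (r + dr, c + dc))       # permitted move -> edge
    g.remove_nodes_from([n for n in list(g.nodes) if g.degree(n) == 0])  # cut isolated cells
    return g

walkable = [[1, 1, 0],
            [0, 1, 1],
            [0, 0, 1]]
graph = build_target_path_graph(walkable)
print(graph.number_of_nodes(), graph.number_of_edges())
```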
The first determining module 12 is further specifically configured to:
mapping the actual position of the initial intelligent agent model in the ith game map to a target path structure diagram to obtain a first mapping position of the initial intelligent agent model in the target path structure diagram;
mapping the reference position of the ith game map to the target path structure diagram to obtain a second mapping position of the reference position of the ith game map in the target path structure diagram;
obtaining a mapping position distance between a first mapping position and a second mapping position in a target path structure diagram;
and determining the mapping position distance as the mapping position information of the initial agent model in the target path structure diagram.
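Continuing the same assumed grid-graph representation, the two mapping positions can be obtained by snapping the agent's actual position and the map's reference position to their nearest graph nodes, and the mapping position distance by a shortest-path query; this is an illustrative sketch, not the patent's prescribed computation:

```python
import networkx as nx

# Hypothetical continuation of the grid-graph sketch above; the Manhattan
# snapping rule and the shortest-path metric are assumptions.
def nearest_node(g: nx.DiGraph, cell) -> tuple:
    return min(g.nodes, key=lambda n: abs(n[0] - cell[0]) + abs(n[1] - cell[1]))

def mapping_position_distance(g: nx.DiGraph, agent_cell, reference_cell) -> int:
    first = nearest_node(g, agent_cell)        # first mapping position
    second = nearest_node(g, reference_cell)   # second mapping position
    return nx.shortest_path_length(g, first, second)
```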
The game parameters of the ith game map in the M game maps comprise a map identification and a map resource file of the ith game map, and game play status parameters of the initial agent model in the ith game map; i is a positive integer less than or equal to M;
The first execution module 13 is specifically configured to:
removing, from the map resource file of the ith game map, the actual positions in the virtual game environment of the game elements included in the file, to obtain a universal map resource file of the ith game map;
performing, through the initial agent model, feature preprocessing on the universal map resource file, the relative position information, the map identification and the map environment perception information corresponding to the ith game map, and the game play status parameters of the initial agent model in the ith game map, to obtain game play features corresponding to the ith game map;
and executing a game play task in the ith game map according to the game play features, to obtain a task execution result corresponding to the initial agent model in the ith game map.
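The removal of absolute element positions might look like the following sketch, where the map resource file is assumed, purely for illustration, to be a dictionary whose elements carry a "world_position" field; neither the layout nor the key name comes from the patent:

```python
# Hypothetical sketch: strip absolute world positions from a map resource
# file so that only map-agnostic element attributes remain.
def to_universal_resource(map_resource: dict) -> dict:
    universal = dict(map_resource)
    universal["elements"] = [
        {key: value for key, value in element.items() if key != "world_position"}
        for element in map_resource.get("elements", [])
    ]
    return universal
```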
The first execution module 13 is further specifically configured to:
Embedding the map identification of the ith game map into the relative position information associated with the ith game map through a perception layer in the initial agent model, to obtain embedded relative position features corresponding to the ith game map;
performing feature conversion on the universal map resource file and the map environment perception information corresponding to the ith game map and the game play status parameters of the initial agent model in the ith game map, to obtain initial game play features corresponding to the ith game map;
and splicing the embedded relative position features and the initial game play features to obtain the game play features corresponding to the ith game map.
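A minimal sketch of this preprocessing, assuming a PyTorch implementation, fixed toy dimensions and ReLU activations (none of which the patent specifies), could look as follows; only the information flow — embed the map identification into the relative positions, convert the remaining inputs, then splice — follows the text above:

```python
import torch
import torch.nn as nn

# Hypothetical perception-layer sketch; all shapes are illustrative.
class PerceptionLayer(nn.Module):
    def __init__(self, num_maps=8, id_dim=16, pos_dim=32, misc_in=64, misc_dim=64):
        super().__init__()
        self.map_embedding = nn.Embedding(num_maps, id_dim)  # map identification
        self.pos_proj = nn.Linear(3 + id_dim, pos_dim)       # relative xyz + map id
        self.misc_proj = nn.Linear(misc_in, misc_dim)        # resource/env/status inputs

    def forward(self, map_id, relative_pos, misc_features):
        id_vec = self.map_embedding(map_id)                             # embed the map id
        embedded_pos = torch.relu(
            self.pos_proj(torch.cat([relative_pos, id_vec], dim=-1)))  # embedded relative position features
        initial_feat = torch.relu(self.misc_proj(misc_features))       # feature conversion
        return torch.cat([embedded_pos, initial_feat], dim=-1)         # splice into play features
```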
The first execution module 13 is further specifically configured to:
Removing invalid features in game play features through a neural network layer in the initial intelligent agent model to obtain effective game play features;
screening important game play features from the effective game play features, and generating state update parameters according to the important game play features;
updating the state of the memory unit in the neural network according to the state update parameters, to obtain an updated memory unit state;
determining a predicted game play action and a predicted game play strategy of the initial agent model in the ith game map according to the updated memory unit state and the effective game play features;
and executing the game play task on the ith game map according to the predicted game play action and the predicted game play strategy, to obtain a task execution result corresponding to the ith game map.
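As a hedged sketch of the decision step (the patent names neither the cell type nor the layer sizes), a gated recurrent cell can stand in for the memory unit, a sigmoid gate for the screening of important features, and two output heads for the predicted game play action and strategy:

```python
import torch
import torch.nn as nn

# Hypothetical decision-layer sketch; the GRU cell, the gating scheme and
# every dimension below are illustrative assumptions.
class DecisionLayer(nn.Module):
    def __init__(self, feat_dim=96, hidden=128, num_actions=12):
        super().__init__()
        self.gate = nn.Linear(feat_dim, feat_dim)  # scores feature importance
        self.cell = nn.GRUCell(feat_dim, hidden)   # memory unit
        self.action_head = nn.Linear(hidden, num_actions)
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, play_features, memory_state):
        weights = torch.sigmoid(self.gate(play_features))  # state update parameters
        important = play_features * weights                # keep the informative features
        new_state = self.cell(important, memory_state)     # updated memory unit state
        action_logits = self.action_head(new_state)        # predicted game play action
        value = self.value_head(new_state)                 # supports the play strategy
        return action_logits, value, new_state
```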
The first adjustment module 14 is specifically configured to:
Generating game return benefits for reflecting the task execution quality of the initial agent model in the ith game map according to the task execution result corresponding to the ith game map in the M game maps; i is a positive integer less than or equal to M;
Determining a parameter update gradient of the initial agent model according to the game return benefits respectively corresponding to the M game maps and the reinforcement learning function corresponding to the initial agent model;
performing parameter adjustment on the model parameters in the initial agent model according to the parameter update gradient, to obtain a parameter-adjusted initial agent model;
and if the parameter-adjusted initial agent model meets the training stop condition, determining the parameter-adjusted initial agent model as the general agent model.
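The patent leaves the reinforcement learning function unspecified; as an illustrative assumption, a plain REINFORCE-style policy-gradient update over the per-map game return benefits could realize this step:

```python
import torch

# Hypothetical parameter-adjustment sketch: average a policy-gradient loss
# over the M game maps and take one optimizer step.
def update_agent(model, optimizer, log_probs_per_map, returns_per_map):
    optimizer.zero_grad()
    loss = torch.tensor(0.0)
    for log_probs, game_return in zip(log_probs_per_map, returns_per_map):
        loss = loss - log_probs.sum() * game_return  # higher return -> reinforce actions
    loss = loss / len(returns_per_map)               # average over the M game maps
    loss.backward()                                  # parameter update gradient
    optimizer.step()                                 # adjust the model parameters
    return float(loss)
```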
Wherein the data processing apparatus further comprises:
The second reconstruction module 15 is configured to perform position reconstruction on absolute position information associated with the newly-added game map according to a reference position of the newly-added game map in the virtual game environment, so as to obtain relative position information associated with the newly-added game map;
the second determining module 16 is configured to determine map environment sensing information corresponding to the newly added game map according to a map resource file corresponding to the newly added game map;
the second execution module 17 is configured to control the general agent model, and perform a game with the game player character in the newly-added game map according to the relative position information and the map environment sensing information corresponding to the newly-added game map and the game parameters corresponding to the general agent model in the newly-added game map, so as to obtain a game result corresponding to the general agent model in the newly-added game map.
Wherein the data processing apparatus further comprises:
The generation module 18 is configured to generate game return benefits for reflecting the game quality of the general agent model on the newly added game map according to the game result corresponding to the newly added game map;
The second adjustment module 19 is configured to adjust the model parameters in the general agent model according to the game return benefits corresponding to the newly added game map, to obtain an adjusted general agent model.
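As an illustrative, assumption-laden sketch of this adjustment (recovery training) step: training resumes from the general agent model's existing parameters with a small learning rate rather than from scratch. The stand-in network, the commented checkpoint name and the placeholder rollout are hypothetical, not elements of the patent:

```python
import torch

# Hypothetical recovery-training sketch for a newly added game map. The tiny
# stand-in network, the placeholder features and the constant game return
# benefit are illustrative assumptions only.
model = torch.nn.Sequential(                     # stand-in for the general agent model
    torch.nn.Linear(96, 128), torch.nn.ReLU(), torch.nn.Linear(128, 12))
# model.load_state_dict(torch.load("general_agent.pt"))  # hypothetical checkpoint
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # small LR: adjust, don't relearn

for step in range(1000):                         # a small amount of training suffices
    features = torch.randn(1, 96)                # placeholder new-map play features
    logits = model(features)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    game_return = 1.0                            # placeholder game return benefit
    loss = -dist.log_prob(action).sum() * game_return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```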
In the embodiments of the present application, the term "module" or "unit" refers to a computer program, or a part of a computer program, that has a predetermined function and works together with other relevant parts to achieve a predetermined goal; it may be implemented in whole or in part by software, by hardware (such as a processing circuit or a memory), or by a combination of the two. Likewise, one processor (or multiple processors, or a processor and a memory) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates its functionality. According to an embodiment of the present application, each module in the data processing apparatus shown in fig. 8 may be formed by combining one or several units, or some unit(s) may be further split into at least two functionally smaller sub-units that implement the same operation without affecting the technical effects of the embodiments of the present application. The above modules are divided based on logical functions; in practical applications, the function of one module may be implemented by at least two units, or the functions of at least two modules may be implemented by one unit. In other embodiments of the present application, the data processing apparatus may also include other units, and in practical applications these functions may be implemented with the assistance of other units or through the cooperation of at least two units.
According to an embodiment of the present application, the data processing apparatus shown in fig. 8 may be constructed, and the data processing method of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing the steps of the method shown in fig. 3 on a general-purpose computer device that includes processing elements such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the computer device via that medium, and executed there.
Further, referring to fig. 9, fig. 9 is a schematic diagram of a computer device according to an embodiment of the present application. As shown in fig. 9, the computer device 3000 may be the terminal device or the server in the embodiment corresponding to fig. 2, and the computer device 3000 may include: at least one processor 3001 (e.g., a CPU), at least one network interface 3004, a user interface 3003, a memory 3005, and at least one communication bus 3002. The communication bus 3002 is used to enable communication between these components. The user interface 3003 may include a display screen (Display) and a keyboard (Keyboard), and the network interface 3004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 3005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 3005 may also optionally be at least one storage device located remotely from the aforementioned processor 3001. As shown in fig. 9, the memory 3005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a computer program control application.
In the computer device 3000 shown in fig. 9, the network interface 3004 is mainly used to provide network communication functions; the user interface 3003 is mainly used to provide an input interface for the user; and the processor 3001 may be used to invoke the computer program control application stored in the memory 3005 to implement:
According to the reference position of each game map in M game maps of the virtual game environment, carrying out position reconstruction on the absolute position information associated with each game map to obtain the relative position information associated with each game map;
According to map key information respectively associated with the M game maps, map environment perception information respectively corresponding to the M game maps is determined;
controlling the initial intelligent agent model, and executing game tasks in the M game maps according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps to obtain task execution results respectively corresponding to the initial intelligent agent model in the M game maps;
and carrying out parameter adjustment on model parameters in the initial intelligent agent model according to task execution results respectively corresponding to the M game maps to obtain the general intelligent agent model.
It should be understood that the computer device 3000 described in the embodiment of the present application may perform the data processing method described in the embodiment corresponding to fig. 7, and may also perform the functions of the data processing apparatus described in the embodiment corresponding to fig. 8, which will not be repeated here. In addition, the description of the beneficial effects of the same method is not repeated.
Furthermore, it should be noted here that the embodiment of the present application further provides a computer-readable storage medium in which the computer program executed by the aforementioned data processing apparatus is stored. The computer program includes program instructions which, when executed by a processor, can perform the data processing method described in the embodiment corresponding to fig. 3 or fig. 7, which will therefore not be repeated here. In addition, the description of the beneficial effects of the same method is not repeated. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, please refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network; multiple computing devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.
In one aspect, the present application provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the data processing method described in the embodiment corresponding to fig. 3 or fig. 7, which will not be repeated here. In addition, the description of the beneficial effects of the same method is not repeated.
It should be noted that, when the embodiments of the present application are applied, any collection of relevant data must strictly comply with the requirements of the relevant national laws and regulations, obtain the informed consent or separate consent of the personal information subject (or rest on another legal basis), and carry out subsequent data use and processing only within the scope authorized by the laws, regulations and the personal information subject.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (16)

1. A method of data processing, comprising:
According to the reference position of each game map in M game maps of the virtual game environment, carrying out position reconstruction on absolute position information associated with each game map to obtain relative position information associated with each game map;
According to the map key information respectively associated with the M game maps, map environment perception information respectively corresponding to the M game maps is determined;
Controlling an initial intelligent agent model, and executing game tasks in the M game maps according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps to obtain task execution results respectively corresponding to the initial intelligent agent model in the M game maps;
And carrying out parameter adjustment on model parameters in the initial intelligent agent model according to task execution results respectively corresponding to the M game maps to obtain a general intelligent agent model.
2. The method of claim 1, wherein the absolute position information associated with an i-th game map of the M game maps includes first absolute position information reflecting an actual position of a game element in the i-th game map in the virtual game environment and second absolute position information reflecting an actual position of the initial agent model in the i-th game map; i is a positive integer less than or equal to M;
the method for reconstructing the position of the absolute position information associated with each game map according to the reference position of each game map in M game maps of the virtual game environment to obtain the relative position information associated with each game map comprises the following steps:
performing position reconstruction on the first absolute position information according to the reference position of the ith game map to obtain the relative position information of game elements in the ith game map;
Performing position reconstruction on the second absolute position information according to the reference position of the ith game map to obtain the relative position information of the initial agent model in the ith game map;
And determining the relative position information of the game elements in the ith game map and the relative position information of the initial agent model in the ith game map as the relative position information associated with the ith game map.
3. The method according to claim 2, wherein the performing position reconstruction on the first absolute position information according to the reference position of the ith game map to obtain the relative position information of the game element in the ith game map includes:
acquiring a position difference value between a reference position of the ith game map and an actual position of a game element in the ith game map in the virtual game environment;
and determining the position difference value as the relative position information of the game element in the ith game map.
4. The method of claim 1, wherein the map key information associated with an i-th game map of the M game maps includes a map resource file of the i-th game map, a reference location of the i-th game map, and an actual location of the initial agent model in the i-th game map; i is a positive integer less than or equal to M;
the determining map environment perception information corresponding to the M game maps respectively according to the map key information associated with the M game maps respectively includes:
extracting a walkable region corresponding to the ith game map from a map resource file corresponding to the ith game map;
Generating a target path structure diagram corresponding to the ith game map according to the walkable region corresponding to the ith game map;
Determining mapping position information of the initial agent model in the target path structure diagram according to the reference position of the ith game map and the actual position of the initial agent model in the ith game map;
and determining the target path structure diagram and the mapping position information of the initial agent model in the target path structure diagram as map environment perception information corresponding to the ith game map.
5. The method of claim 4, wherein the generating the target path structure map corresponding to the ith game map according to the walkable region corresponding to the ith game map comprises:
determining a walkable path of the initial agent model in the ith game map according to the walkable region in the ith game map;
Acquiring the walking direction of the walkable path, and generating a directed path structure diagram corresponding to the ith game map according to the walkable path and the walking direction of the walkable path;
and cutting the directed path structure diagram corresponding to the ith game map to obtain a target path structure diagram corresponding to the ith game map.
6. The method of claim 4, wherein said determining the mapping location information of the initial agent model in the target path structure map based on the reference location of the i-th game map and the actual location of the initial agent model in the i-th game map comprises:
Mapping the actual position of the initial agent model in the ith game map to the target path structure diagram to obtain a first mapping position of the initial agent model in the target path structure diagram;
Mapping the reference position of the ith game map to the target path structure diagram to obtain a second mapping position of the reference position of the ith game map in the target path structure diagram;
Acquiring a mapping position distance between the first mapping position and the second mapping position in the target path structure diagram;
and determining the mapping position distance as the mapping position information of the initial agent model in the target path structure diagram.
7. The method of claim 1, wherein game parameters of an i-th game map of the M game maps include map identifications and map resource files of the i-th game map, and game play status parameters of the initial agent model in the i-th game map; i is a positive integer less than or equal to M;
The control of the initial agent model, according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, executing the game task in the M game maps to obtain the task execution result respectively corresponding to the initial agent model in the M game maps, including:
removing the actual positions of game elements included in the map resource file of the ith game map in the virtual game environment to obtain a universal map resource file of the ith game map;
Performing feature preprocessing on the universal map resource file, the relative position information, the map identification and the map environment perception information corresponding to the ith game map, and the game play status parameters of the initial agent model in the ith game map, to obtain game play features corresponding to the ith game map;
and executing a game play task in the ith game map according to the game play features, to obtain a task execution result corresponding to the initial agent model in the ith game map.
8. The method according to claim 7, wherein the performing feature preprocessing on the universal map resource file, the relative position information, the map identification and the map environment perception information corresponding to the ith game map, and the game play status parameters of the initial agent model in the ith game map, to obtain the game play features corresponding to the ith game map, comprises:
Embedding the map identification of the ith game map into the relative position information associated with the ith game map through a perception layer in the initial agent model to obtain embedded relative position features corresponding to the ith game map;
Performing feature conversion on the universal map resource file and the map environment perception information corresponding to the ith game map and the game play status parameters of the initial agent model in the ith game map to obtain initial game play features corresponding to the ith game map;
and splicing the embedded relative position features and the initial game play features to obtain the game play features corresponding to the ith game map.
9. The method of claim 7, wherein the executing a game play task in the ith game map according to the game play features, to obtain a task execution result corresponding to the initial agent model in the ith game map, comprises:
removing invalid features in the game play features through a neural network layer in the initial agent model to obtain effective game play features;
Screening important game play features from the effective game play features, and generating state update parameters according to the important game play features;
updating the state of the memory unit in the neural network according to the state update parameters to obtain an updated memory unit state;
Determining a predicted game play action and a predicted game play strategy of the initial agent model in the ith game map according to the updated memory unit state and the effective game play features;
And executing the game play task on the ith game map according to the predicted game play action and the predicted game play strategy, to obtain a task execution result corresponding to the ith game map.
10. The method of claim 1, wherein the performing parameter adjustment on the model parameters in the initial agent model according to the task execution results respectively corresponding to the M game maps to obtain a general agent model includes:
generating game return benefits for reflecting the task execution quality of the initial agent model in an ith game map according to the task execution result corresponding to the ith game map in the M game maps; i is a positive integer less than or equal to M;
Determining a parameter update gradient of the initial agent model according to the game return benefits respectively corresponding to the M game maps and the reinforcement learning function corresponding to the initial agent model;
performing parameter adjustment on the model parameters in the initial agent model according to the parameter update gradient, to obtain a parameter-adjusted initial agent model;
and if the parameter-adjusted initial agent model meets the training stop condition, determining the parameter-adjusted initial agent model as the general agent model.
11. The method according to claim 1, wherein the method further comprises:
Performing position reconstruction on absolute position information associated with the newly added game map according to a reference position of the newly added game map in the virtual game environment to obtain relative position information associated with the newly added game map;
determining map environment perception information corresponding to the newly-added game map according to the map key information corresponding to the newly-added game map;
And controlling the general agent model, and performing game play with the game player character in the newly added game map according to the relative position information and the map environment perception information corresponding to the newly added game map and the game parameters corresponding to the newly added game map, to obtain a game play result corresponding to the general agent model in the newly added game map.
12. The method of claim 11, wherein the method further comprises:
generating game return benefits for reflecting the game quality of the general agent model on the newly added game map according to the game result corresponding to the newly added game map;
And adjusting model parameters in the general intelligent agent model according to the game return benefits corresponding to the newly added game map to obtain an adjusted general intelligent agent model.
13. A data processing apparatus, comprising:
The first reconstruction module is used for carrying out position reconstruction on absolute position information associated with each game map according to the reference position of each game map in M game maps of the virtual game environment to obtain relative position information associated with each game map;
the first determining module is used for determining map environment perception information corresponding to the M game maps respectively according to the map key information respectively associated with the M game maps;
The first execution module is used for controlling the initial intelligent agent model, executing game tasks in the M game maps according to the relative position information and the map environment perception information respectively corresponding to the M game maps and the game parameters respectively corresponding to the M game maps, and obtaining task execution results respectively corresponding to the initial intelligent agent model in the M game maps;
And the first adjusting module is used for carrying out parameter adjustment on the model parameters in the initial intelligent agent model according to the task execution results respectively corresponding to the M game maps to obtain a general intelligent agent model.
14. A computer device, comprising: a processor and a memory;
The processor is connected to the memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any of claims 1-12.
15. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-12.
16. A computer program product or computer program, characterized in that it comprises computer instructions stored in a computer-readable storage medium, which are adapted to be read and executed by a processor to cause a computer device with the processor to perform the method of any of claims 1-12.
CN202410311420.5A 2024-03-19 2024-03-19 Data processing method, device, equipment and storage medium Active CN117899483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410311420.5A CN117899483B (en) 2024-03-19 2024-03-19 Data processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117899483A (en) 2024-04-19
CN117899483B (en) 2024-05-28

Family

ID=90688099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410311420.5A Active CN117899483B (en) 2024-03-19 2024-03-19 Data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117899483B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100083157A1 (en) * 2008-09-30 2010-04-01 Nokia Corporation Methods, apparatuses, and computer program products for providing activity coordination information
CN110489340A (en) * 2019-07-29 2019-11-22 腾讯科技(深圳)有限公司 A kind of map balance test method, device, equipment and storage medium
CN111598169A (en) * 2020-05-18 2020-08-28 腾讯科技(深圳)有限公司 Model training method, game testing method, simulation operation method and simulation operation device
WO2021183160A1 (en) * 2020-03-13 2021-09-16 Google Llc Controlling agents in a video game using semantic machine learning and a natural language action grammar
WO2023071221A1 (en) * 2021-10-29 2023-05-04 上海商汤智能科技有限公司 Interaction method and apparatus in game, computer device, storage medium, computer program, and computer program product
WO2023246066A1 (en) * 2022-06-23 2023-12-28 北京百度网讯科技有限公司 Signal management and control method based on vehicle infrastructure cooperation, and related apparatus and program product


Also Published As

Publication number Publication date
CN117899483B (en) 2024-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant