WO2022213702A1

WO2022213702A1 - Method and apparatus for configuring game inference service on cloud platform, and related device

Info

Publication number: WO2022213702A1
Application number: PCT/CN2022/072425
Authority: WO
Inventors: 伍丝琪; 邵坤; 朱疆成; 白小龙; 戴宗宏
Original assignee: 华为云计算技术有限公司
Priority date: 2021-04-09
Filing date: 2022-01-17
Publication date: 2022-10-13

Abstract

The present application provides a method for configuring a game inference service on a cloud platform, comprising: when configuring, on the cloud platform, the game inference service for a game developer, acquiring a first configuration file that comprises configuration information for a first game, to configure, on the cloud platform, an inference service for the first game on the basis of a game algorithm framework of the cloud platform and the acquired first configuration file. Furthermore, an inference service for a second game can further be configured on the cloud platform on the basis of the game algorithm framework of the cloud platform and a second configuration file corresponding to the second game. In this way, for different games, inference services required by game developers for one or more games can all be configured on the cloud platform, such that cloud service providers do not need to perform specialized design of the inference services. Moreover, the difficulty for small and medium-sized game manufacturers to apply AI technology to obtain game inference results is also effectively reduced. In addition, a corresponding apparatus and a related device are further provided.

Description

Method, device and related equipment for configuring game reasoning service on cloud platform

Technical field

The present application relates to the technical field of artificial intelligence, and in particular, to a method, apparatus and related equipment for configuring a reasoning service of a game on a cloud platform.

Background

At present, artificial intelligence (AI) technology is widely used in various fields. For example, in the game field, game developers (such as game manufacturers) can use AI technology to implement functions such as non-player character (NPC) training, player behavior prediction, and character battle strategies in game scenarios; that is, to infer, through AI technology, the battle actions or battle states of objects such as non-player characters during a game battle.

In the process of game development, if game developers want to obtain AI-based game inference capabilities, they either need to independently build the entire inference service for a specific game according to their specific needs, or submit the inference service requirements for the specific game to a service provider (such as a cloud service provider), which then customizes an inference service specific to that game for the game developer based on those requirements. However, both ways of obtaining a game inference service make the cost and difficulty of obtaining an AI-based game inference service relatively high.

SUMMARY OF THE INVENTION

In view of this, the embodiments of the present application provide a method for configuring a game reasoning service on a cloud platform, so as to reduce the cost and difficulty of providing a game reasoning service by a cloud service provider. The present application also provides corresponding apparatuses, computing device clusters, computer-readable storage media, and computer program products.

In a first aspect, an embodiment of the present application provides a method for configuring an inference service for a game on a cloud platform. Specifically, when an inference service for a first game is configured for a game developer on the cloud platform, a first configuration file including configuration information for the first game may be acquired, so that the inference service of the first game is configured on the cloud platform based on the game algorithm framework of the cloud platform and the acquired first configuration file.

In this way, for a game developed by the game developer, the general game algorithm framework and the corresponding configuration file can be used to automatically configure, on the cloud platform, the inference service for the specific game required by the game developer, so that the configured inference service can be used to perform corresponding inference on the game, such as inferring the action and/or state of any character in the game. The cloud service provider therefore does not need to carry out specialized design of the inference service for the game, so that the difficulty and cost of providing and using the inference service of the game can be effectively reduced, and the efficiency of providing and using the inference service can also be effectively improved.

Moreover, in practical applications, the application of AI technology is inseparable from training algorithms, represented by reinforcement learning (RL), and from the computing resources required to run those training algorithms. However, game developers such as small and medium-sized game manufacturers may find it difficult to obtain high-quality AI models and enough computing resources to implement game inference. By configuring inference services on the cloud platform to infer the actions and/or states of characters in the game, the difficulty for game developers of applying AI technology to obtain game inference results can be effectively reduced.

In a possible implementation manner, not only the reasoning service corresponding to the first game but also the reasoning service corresponding to the second game may be configured on the cloud platform. Specifically, a second configuration file including configuration information for the second game may be obtained, so that the inference service of the second game is configured on the cloud platform based on the game algorithm framework of the cloud platform and the obtained second configuration file. In this way, it is possible to configure inference services corresponding to a plurality of different games on the cloud platform by using a general game algorithm framework, thereby improving the universality of solution implementation.

In a possible implementation, after the inference service of the first game is configured, the inference service of the first game can be used to respond to an inference request sent by the game terminal. The game terminal may be a device running a game application instance of the first game, such as a terminal and/or a server. The inference request includes data to be processed for a target object in the game application instance of the first game, for example an information picture of the target object. The response made by the inference service may include indication information on the action and/or state of the target object; for example, the indication information may indicate an action to be performed by the target object in the future (such as an attack or a long jump), and/or a future state of the target object (such as emotion or attack speed). In this way, the game developer can infer the action and/or state of the target object through the inference service configured on the cloud platform, which can effectively reduce the difficulty for the game developer of applying AI technology to obtain game inference results.
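As an illustration of the request and response described above, the following minimal Python sketch models an inference request that carries data for a target object and a response that carries an indicated action and state. The class names, field names, and the threshold rule standing in for a trained AI model are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InferenceRequest:
    game_id: str          # which game's inference service should answer
    target_object: str    # e.g. a non-player character
    data: List[float]     # data to be processed (hypothetical numeric features)


@dataclass
class InferenceResponse:
    target_object: str
    action: str                                            # indicated future action
    state: Dict[str, float] = field(default_factory=dict)  # indicated future state


def respond(request: InferenceRequest) -> InferenceResponse:
    # Placeholder rule standing in for a trained AI model (assumption).
    action = "attack" if sum(request.data) > 1.0 else "long_jump"
    return InferenceResponse(request.target_object, action, {"attack_speed": 1.2})
```

A game terminal would build an `InferenceRequest` for a given character and apply the returned action and state in the running game instance.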

In a possible implementation, the first configuration file includes one or more of the following configuration information: the action space of the target object in the game application instance of the first game, the state space of the target object in the game application instance of the first game, a target training algorithm of a first type, an artificial intelligence (AI) model of a second type, a reward function, the training method of the AI model, the inference method of the AI model, the storage address of the AI model, and the specification of the computing resources for training and inference of the AI model, so that the configuration of the inference service can be implemented on the cloud platform using this configuration information. The implementation of the second configuration file is similar to that of the first configuration file, can be understood by reference to it, and is not repeated here.
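The application lists the kinds of information a configuration file may carry but does not define a concrete schema. The sketch below shows one possible shape as a JSON-serializable Python dictionary; every key name and value (for example `training_algorithm` or the storage path) is an assumption made for illustration. Because the file may contain "one or more" of the items, the check only rejects unrecognized keys rather than requiring all of them.

```python
import json

# Hypothetical key names; the application names the kinds of information, not a schema.
KNOWN_KEYS = {
    "game", "action_space", "state_space", "training_algorithm", "model_type",
    "reward_function", "training_method", "inference_method",
    "model_storage_address", "compute_resources",
}

first_config = {
    "game": "first-game",
    "action_space": ["attack", "jump", "defend"],
    "state_space": {"health": [0, 100], "attack_speed": [0.5, 2.0]},
    "training_algorithm": "ppo",
    "model_type": "dnn",
    "reward_function": "win_bonus",
    "training_method": "parallel",
    "inference_method": "online",
    "model_storage_address": "/models/first-game",
    "compute_resources": {"training": "8 vCPU", "inference": "2 vCPU"},
}


def unknown_keys(config: dict) -> list:
    """Return the configuration keys that this sketch does not recognize."""
    return sorted(set(config) - KNOWN_KEYS)


def load_config(text: str) -> dict:
    """Parse a JSON configuration file body and reject unrecognized keys."""
    config = json.loads(text)
    extra = unknown_keys(config)
    if extra:
        raise ValueError(f"unrecognized configuration keys: {extra}")
    return config
```

A second game would supply its own dictionary with the same kinds of keys but different action and state spaces.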

In a possible implementation, when the inference service of the first game is configured, at least one AI model can be trained according to the first configuration file and the game algorithm framework of the cloud platform, so that the inference service of the first game is configured according to the at least one trained AI model. Correspondingly, when the inference service is used to infer the action and/or state of the target object in the game application instance of the first game, the inference may specifically be performed using the trained AI model.

In a possible implementation, when at least one AI model is trained, multiple training requests may be received from the game terminal. The multiple training requests come from multiple game application instances of the first game, and different training requests include different training data for the same target object in the multiple game application instances, so that at least one AI model can be trained using the training data in the multiple training requests. In this way, by running multiple game application instances on the game terminal at the same time, multiple pieces of training data can be generated in parallel, which can effectively improve the efficiency of generating training data, and therefore the efficiency of training AI models.

Optionally, the game terminal may also run only one game application instance in the same time period, so that one or more AI models are obtained by training using the training data generated by the single game application instance.
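The parallel generation of training data by several game application instances can be sketched as follows. This is a simulation under stated assumptions: threads stand in for game application instances, a queue stands in for the training requests arriving at the cloud platform, and the sample fields (`instance`, `target`, `step`, `reward`) are hypothetical.

```python
import queue
import threading
from typing import List


def run_instance(instance_id: int, steps: int, out: "queue.Queue") -> None:
    """Simulate one game application instance emitting training samples."""
    for step in range(steps):
        out.put({"instance": instance_id, "target": "npc-1",
                 "step": step, "reward": float(step)})


def collect_training_data(num_instances: int, steps: int) -> List[dict]:
    """Run the simulated instances concurrently and gather all their samples."""
    out: "queue.Queue" = queue.Queue()
    threads = [threading.Thread(target=run_instance, args=(i, steps, out))
               for i in range(num_instances)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    samples = []
    while not out.empty():  # safe here: all producers have finished
        samples.append(out.get())
    return samples
```

Calling `collect_training_data(1, steps)` corresponds to the optional case in which only a single game application instance runs.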

In a possible implementation, the trained at least one AI model includes a first AI model and a second AI model. In this case, the hyperparameters of the first AI model and the second AI model are different, and/or the reward functions corresponding to the first AI model and the second AI model are different. In this way, by training AI models with different hyperparameters, the hyperparameters that give the AI model a better inference effect can be determined, so that the AI model trained with those hyperparameters is of higher quality, and the inference service configured based on that AI model is of higher quality as well. In addition, by using different reward functions to train different AI models, AI models of different inference types can be trained; for example, AI models with various inference styles can be obtained by training with multiple reward functions, so as to diversify inference.
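The two ideas in this paragraph, different reward functions for different styles and comparing models trained with different hyperparameters, can be sketched as below. The reward weights, the `ModelSpec` fields, and the evaluation scores passed to `select_best` are hypothetical placeholders; no actual training is performed.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def aggressive_reward(damage_dealt: float, damage_taken: float) -> float:
    """Reward shaping that favours dealing damage (hypothetical weights)."""
    return 2.0 * damage_dealt - 0.5 * damage_taken


def defensive_reward(damage_dealt: float, damage_taken: float) -> float:
    """Reward shaping that favours avoiding damage (hypothetical weights)."""
    return 0.5 * damage_dealt - 2.0 * damage_taken


@dataclass(frozen=True)
class ModelSpec:
    name: str
    learning_rate: float                       # one example hyperparameter
    reward_fn: Callable[[float, float], float]


def select_best(scores: Dict[str, float]) -> str:
    """Pick the model whose evaluation score (e.g. win rate) is highest."""
    return max(scores, key=scores.get)


first_model = ModelSpec("first", learning_rate=1e-3, reward_fn=aggressive_reward)
second_model = ModelSpec("second", learning_rate=1e-4, reward_fn=defensive_reward)
```

Given the same battle outcome, the two reward functions score it differently, which is what would steer the two models toward different styles during training.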

In a possible implementation, when the AI models to be trained include the first AI model and the second AI model, multiple processes may run on the cloud platform. Here, a first process and a second process are used as examples. When the first AI model and the second AI model are trained, the training data in the multiple training requests are sent to the first process and the second process according to the port number and/or IP address of the first process and the port number and/or IP address of the second process; the training data received by the first process and by the second process may be different. Then, the first AI model is trained using the first process and the training data received by the first process, and the second AI model is trained using the second process and the training data received by the second process. In this way, multiple AI models can be trained in parallel on the cloud platform, so that the training efficiency of the AI models, and therefore the efficiency of configuring the inference service of the first game, can be improved.
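The routing step can be sketched without real networking by treating each training process as an (IP address, port number) pair with its own inbox. The application does not say how samples are divided between processes; the round-robin assignment below is an assumption chosen only to show that each process can end up with different training data.

```python
from typing import Dict, List, Sequence, Tuple

Address = Tuple[str, int]  # (IP address, port number) of a training process


def dispatch(samples: Sequence[dict],
             workers: Sequence[Address]) -> Dict[Address, List[dict]]:
    """Assign each training sample to one worker address in round-robin order."""
    inboxes: Dict[Address, List[dict]] = {address: [] for address in workers}
    for index, sample in enumerate(samples):
        inboxes[workers[index % len(workers)]].append(sample)
    return inboxes
```

In a deployed system the inbox would be replaced by an actual send to the process listening at that address, and each process would train its own model on what it receives.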

In a possible implementation, one or more different types of training algorithms and AI models may be predefined in the game algorithm framework of the cloud platform, so that when the inference service of the first game is configured, the target training algorithm of the first type and at least one AI model of the second type can be called from the game algorithm framework according to the first type of target training algorithm and the second type of AI model specified in the first configuration file.

Exemplarily, the target training algorithm can be any one of a deep reinforcement learning algorithm, a proximal policy optimization (PPO) algorithm, a soft actor-critic (SAC) algorithm, a deep deterministic policy gradient (DDPG) algorithm, a twin delayed deep deterministic policy gradient (TD3) algorithm, and a Rainbow algorithm, or another applicable algorithm. The AI model can be, for example, any one of a deep neural network model, a recurrent neural network model, and a convolutional neural network model, or another applicable model.
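One way to picture a framework that predefines training algorithms and AI models, and calls them by the types named in a configuration file, is a pair of name-to-factory registries. The sketch below is an assumption about structure only: the registry keys (`ppo`, `sac`, `cnn`, `rnn`), the configuration keys, and the placeholder factories that return strings are all hypothetical.

```python
from typing import Callable, Dict, Tuple

TRAINING_ALGORITHMS: Dict[str, Callable[[], str]] = {}
AI_MODELS: Dict[str, Callable[[], str]] = {}


def register(registry: Dict[str, Callable[[], str]], name: str):
    """Decorator that predefines a factory in the given registry under a name."""
    def decorator(factory):
        registry[name] = factory
        return factory
    return decorator


@register(TRAINING_ALGORITHMS, "ppo")
def make_ppo() -> str:
    return "PPO trainer (placeholder)"


@register(TRAINING_ALGORITHMS, "sac")
def make_sac() -> str:
    return "SAC trainer (placeholder)"


@register(AI_MODELS, "cnn")
def make_cnn() -> str:
    return "CNN model (placeholder)"


@register(AI_MODELS, "rnn")
def make_rnn() -> str:
    return "RNN model (placeholder)"


def call_from_framework(config: dict) -> Tuple[str, str]:
    """Look up the algorithm and model types named in the configuration."""
    algorithm = config["training_algorithm"]
    model = config["model_type"]
    if algorithm not in TRAINING_ALGORITHMS:
        raise KeyError(f"training algorithm not predefined: {algorithm}")
    if model not in AI_MODELS:
        raise KeyError(f"AI model type not predefined: {model}")
    return TRAINING_ALGORITHMS[algorithm](), AI_MODELS[model]()
```

With this layout, supporting a new game only requires a new configuration, while supporting a new algorithm only requires registering one more factory.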

In a possible implementation, when the data formats of the game terminal and the cloud platform are different, before the inference service of the first game is used to respond to the inference request sent by the game terminal, the format of the data in the inference request may first be processed to obtain data in a data format that the cloud platform can recognize. This prevents the cloud platform from being unable to recognize the inference request sent by the game terminal because the deployment environments of the game terminal and the cloud platform differ.
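A minimal sketch of such format processing is shown below. The application does not specify either data format; here it is assumed, purely for illustration, that the game terminal sends semicolon-separated `key=value` text and that the cloud platform works with a dictionary of floating-point values.

```python
from typing import Dict


def to_platform_format(raw: str) -> Dict[str, float]:
    """Convert 'key=value;key=value' text into a dict of floats."""
    fields: Dict[str, float] = {}
    for pair in raw.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing or doubled separator
        key, value = pair.split("=", 1)
        fields[key.strip()] = float(value)
    return fields
```

The converted dictionary would then be handed to the inference service in place of the raw request body.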

In a possible implementation, a persistent connection may be maintained between the cloud platform and the game terminal, and the cloud platform may receive and respond to inference requests sent by the game terminal through the persistent connection. In this way, after the persistent connection is established between the cloud platform and the game terminal, the two can communicate with each other multiple times without re-establishing a connection for each data exchange, which can effectively reduce the communication delay between the cloud platform and the game terminal.
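The benefit of a persistent connection, several request and response exchanges over one established connection, can be sketched with a connected socket pair from the Python standard library. This is only an in-process stand-in: the newline-delimited message framing and the `action-for:` reply prefix are assumptions, and a real deployment would use a network connection between the game terminal and the cloud platform.

```python
import socket
import threading
from typing import List


def serve(conn: socket.socket) -> None:
    """Answer newline-delimited requests until the peer closes the connection."""
    with conn, conn.makefile("rwb") as stream:
        for line in stream:
            stream.write(b"action-for:" + line)
            stream.flush()


def exchange_over_one_connection(requests: List[str]) -> List[str]:
    """Send every request over the same connection and collect the replies."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve, args=(server,))
    worker.start()
    replies = []
    with client, client.makefile("rwb") as stream:
        for request in requests:
            stream.write(request.encode("utf-8") + b"\n")
            stream.flush()
            replies.append(stream.readline().decode("utf-8").strip())
    worker.join()  # the server loop ends once the client side is closed
    return replies
```

The connection is created once in `exchange_over_one_connection`, and each additional request costs only one write and one read rather than a new connection setup.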

In a possible implementation, the first configuration file may be acquired based on the configuration information selected by the game developer. For example, the cloud platform can provide the game developer with a corresponding configuration interface on which multiple configuration information items are presented for selection, so that the game developer can select from the multiple configuration information items, and the cloud platform automatically generates the corresponding first configuration file based on the game developer's selection. In this way, the configuration efficiency of the game developer can be effectively improved, and the configuration experience can be improved.

In a second aspect, the present application provides an apparatus for configuring an inference service of a game on a cloud platform. The apparatus includes: a communication module, configured to acquire a first configuration file, where the first configuration file includes configuration information for a first game; and a configuration module, configured to configure the inference service of the first game on the cloud platform based on the game algorithm framework of the cloud platform and the first configuration file.

In a possible implementation, the communication module is further configured to acquire a second configuration file, where the second configuration file includes configuration information for a second game; and the configuration module is further configured to configure the inference service of the second game on the cloud platform based on the game algorithm framework of the cloud platform and the second configuration file.

In a possible implementation manner, the apparatus further includes: an inference module, configured to use an inference service of the first game to respond to an inference request sent by the game terminal, wherein the game terminal includes a device running a game application instance of the first game , the inference request includes data to be processed for the target object in the game application instance of the first game, and the response includes indication information for the action and/or state of the target object.

In a possible implementation, the first configuration file includes one or more of the following configuration information: the action space of the target object in the game application instance of the first game, the state space of the target object in the game application instance of the first game, a target training algorithm of a first type, an artificial intelligence (AI) model of a second type, a reward function, the training method of the AI model, the inference method of the AI model, the storage address of the AI model, and the specification of the computing resources for training and inference of the AI model.

In a possible implementation, the configuration module is specifically configured to: train at least one AI model based on the first configuration file and the game algorithm framework; and configure the inference service of the first game according to the at least one trained AI model.

In a possible implementation, the configuration module is specifically configured to: receive multiple training requests from the game terminal, where the multiple training requests come from multiple game application instances of the first game, and different training requests include different training data for the same target object in the multiple game application instances; and train at least one AI model according to the training data in the multiple training requests.

In a possible implementation, when the at least one AI model includes a first AI model and a second AI model, the hyperparameters of the first AI model and the second AI model are different, and/or the reward functions corresponding to the first AI model and the second AI model are different.

In a possible implementation, when the at least one AI model includes a first AI model and a second AI model, the cloud platform runs a first process and a second process, and the configuration module is specifically configured to: send the training data in the multiple training requests to the first process and the second process according to the port number and/or IP address of the first process and the port number and/or IP address of the second process; train the first AI model using the first process and the training data received by the first process; and train the second AI model using the second process and the training data received by the second process.

In a possible implementation, the configuration module is specifically configured to: call the target training algorithm of the first type and at least one AI model of the second type from the game algorithm framework according to the first type of target training algorithm and the second type of AI model specified in the first configuration file; and train the at least one AI model of the second type based on the called target training algorithm of the first type.

In a possible implementation, when the data formats of the game terminal and the cloud platform are different, before the inference service of the first game is used to respond to the inference request sent by the game terminal, the format of the data in the inference request sent by the game terminal is further processed to obtain data in a data format that the cloud platform can recognize.

In a possible implementation, a persistent connection is maintained between the cloud platform and the game terminal, and the cloud platform receives and responds to inference requests sent by the game terminal through the persistent connection.

In a possible implementation manner, the communication module is specifically configured to acquire the first configuration file based on the configuration information item selected by the game developer.

In a third aspect, the present application provides a computing device cluster, where the computing device cluster includes at least one computing device, wherein each computing device includes a processor and a memory. The processor is configured to execute instructions stored in the memory, so that the at least one computing device executes the method for configuring an inference service of a game on a cloud platform as in the first aspect or any implementation manner of the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium storing instructions which, when run on a computing device, cause the computing device to perform the method for configuring an inference service of a game on a cloud platform according to the first aspect or any implementation of the first aspect.

In a fifth aspect, the present application provides a computer program product containing instructions which, when run on a computing device, cause the computing device to perform the method for configuring an inference service of a game on a cloud platform according to the first aspect or any implementation of the first aspect.

On the basis of the implementation manners provided by the above aspects, the present application may further combine to provide more implementation manners.

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings that are used in the description of the embodiments. Obviously, the drawings in the following description show only some embodiments described in the present application, and those of ordinary skill in the art can also obtain other drawings from these drawings.

FIG. 1 is a schematic diagram of an exemplary application scenario provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a game reasoning device provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for configuring an inference service of a game on a cloud platform according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a configuration interface provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of the transmission of training data from the game terminal 200 to the game inference device 300 provided by an embodiment of the present application;

FIG. 6 is a schematic flowchart of a method for configuring an inference service of a game on a cloud platform in combination with a specific scenario provided by an embodiment of the present application;

FIG. 7 is a schematic structural diagram of another game reasoning apparatus provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of blood volume at the end of the battle between character A and character B according to an embodiment of the present application;

FIG. 9 is a schematic diagram of the victory rate of character A and character B provided in an embodiment of the present application;

FIG. 10 is a schematic diagram of the victory rate of character A of three different fighting styles provided in the embodiment of the present application;

FIG. 11 is a schematic diagram of a hardware structure of a computing device cluster according to an embodiment of the present application.

Detailed description

The solutions in the embodiments provided in this application will be described below with reference to the accompanying drawings in this application.

The terms "first", "second" and the like in the description and claims of the present application and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the terms used in this way can be interchanged under appropriate circumstances, and this is only a distinguishing manner adopted when describing objects with the same attributes in the embodiments of the present application.

FIG. 1 is a schematic diagram of an exemplary application scenario. In the application scenario shown in FIG. 1, the game developer 101 can develop the game through the game terminal 200. For example, the game developer 101 can develop the game environment and design the game map and the actions, states, and so on of each character in the game on the game terminal 200. The game developed by the game developer 101 on the game terminal 200 can provide the game player 102 with a game experience. For example, the developed game can be run on the game terminal 200, so that the game player 102 can trigger game operations to experience the game. It should be understood that the game developer 101 in this application refers to the subject who develops and designs the game or develops the game inference capability. It should be understood that the game terminal 200 shown in FIG. 1 may include a device for developing a game as well as a device for deploying a game. For example, the game terminal 200 may include a server for game development, and may also include a device running the game (e.g., a background server of the game developer 101, and/or a terminal device of the game player 102 on which the game is installed). It should be understood that FIG. 1 shows one game terminal 200 as an example; in actual applications, there may also be multiple game terminals 200.

During the running of the game, the game terminal 200 may send an inference request to the cloud platform to request an inference service for the game. For example, the game on the game terminal 200 may support a "human-machine battle" mode; that is, when the player character controlled by the game player 102 competes with a non-player character controlled by the machine, the game terminal 200 may request from the cloud platform information such as the non-player character's battle action (such as attack or jump) and/or battle state (such as an increase in attack speed or a decrease in movement speed), so that the game terminal 200 can control the non-player character to fight against the player character of the game player 102 according to the battle information provided by the cloud platform. As shown in FIG. 1, the cloud platform may include a game inference apparatus 300, and the game inference apparatus 300 includes an inference service for the game. In this case, the game terminal 200 may specifically request the inference service from the game inference device 300 on the cloud platform, and the game inference device 300 performs the corresponding game inference for the game terminal 200.

Exemplarily, the game terminal 200 may specifically include the client 201 and the server 202 shown in FIG. 2 , where the client 201 is used to interact with the game player 102 , and the core content of the developed game is deployed in the server 202 . At this time, the client 201 sends an inference request to the server 202 while receiving the game operation of the game player 102 . In the process of responding to the reasoning request, the server 202 may request the game reasoning service from the game reasoning device 300 when some information such as the battle action and/or battle state of the non-player characters is required. Then, the server 202 may forward the battle action and/or state of the non-player characters fed back by the game inference device 300 to the client 201 , so that the client 201 controls the non-player characters to interact with the game player 102 in the game. In an actual application scenario, the server 202 may request game inference services for multiple clients 201 at the same time.

In another embodiment, the game terminal 200 may include only the client 201. As shown in FIG. 2, in this case a complete game is deployed in the client 201, and the client 201 can send an inference request to the game inference apparatus 300. The game inference device 300 then feeds back the corresponding inference information to the client 201, so that the client 201 can interact with the game player 102 based on the inference information.

The game inference device 300 may include a communication module 301 and an inference module 302 as shown in FIG. 2. In the process of the game inference apparatus 300 providing inference services, the communication module 301 can communicate with the game terminal 200 through a protocol of the seventh layer (i.e., the application layer) of the Open Systems Interconnection (OSI) model; that is, the communication module 301 can receive the inference request sent by the game terminal 200 based on the protocol, and send the inference request, or the data to be processed carried in the inference request, to the inference module 302. The inference module 302 can use the corresponding AI model to execute the corresponding inference process for the game terminal 200, and send the result obtained by the inference to the game terminal 200 through the communication module 301.
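The division of work between the two modules can be sketched as two small classes. This is a structural illustration only: JSON-encoded bytes stand in for whatever application-layer protocol is actually used, and the class names, the `data` field, and the lambda used as a model in the usage example are assumptions.

```python
import json
from typing import Callable, List


class InferenceModule:
    """Runs the AI model on the data to be processed."""

    def __init__(self, model: Callable[[List[float]], str]) -> None:
        self.model = model

    def infer(self, data: List[float]) -> dict:
        return {"action": self.model(data)}


class CommunicationModule:
    """Receives a request, forwards its data, and returns the encoded result."""

    def __init__(self, inference_module: InferenceModule) -> None:
        self.inference_module = inference_module

    def handle(self, payload: bytes) -> bytes:
        request = json.loads(payload.decode("utf-8"))
        result = self.inference_module.infer(request["data"])
        return json.dumps(result).encode("utf-8")
```

For example, `CommunicationModule(InferenceModule(lambda d: "attack" if max(d) > 0.5 else "jump"))` would answer a request body of `{"data": [0.2, 0.9]}` with an encoded `attack` action.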

The game inference apparatus 300 may be deployed on a server in a cloud center, or on a server in an edge center. In yet another possible implementation, the communication module 301 and the inference module 302 of the game inference device 300 may be deployed separately; for example, the communication module 301 may be deployed on a server in an edge center, and the inference module 302 may be deployed on a server in a cloud center. In this embodiment, the deployment manner of the game inference apparatus 300 is not limited. It should be understood that the cloud center in this application refers to a set of devices established by a cloud service provider and used to provide services for cloud tenants in a region (e.g., the East China region). The cloud center usually includes a large number of resources and can provide basic resource services and/or software application services for cloud tenants in various areas of the region. The cloud center includes multiple computing devices (such as servers); the hardware resources in each computing device, and the virtual resources abstracted from those hardware resources, may be used to deploy the aforementioned game inference device 300 (hardware resources and virtual resources can also be called computing nodes; for example, a computing node can be a container or a virtual machine).

However, for some application scenarios with high latency requirements, the use of cloud centers to provide services to some cloud tenants may not meet the latency requirements. Therefore, cloud service providers also set up edge centers.

The edge center in this application refers to a set of devices established by a cloud service provider in at least one specific area within a region. The edge center also includes multiple computing devices, which can be used to provide services for tenants in that specific area. Because edge centers are geographically closer than cloud centers to the cloud tenants in a specific area, edge centers can provide faster service responses.

Cloud service providers can provide cloud services through cloud platforms. A cloud platform includes the software resources and hardware resources owned by the cloud service provider; specifically, it includes the software systems that interact with cloud tenants to sell, configure, and run cloud services, and the cloud center or edge center of the cloud service provider. The cloud platform can display at least one cloud service to the user. After a cloud tenant purchases and configures a cloud service on the cloud platform, when the cloud tenant uses the cloud service, the cloud platform can call the device corresponding to the cloud service in the cloud center or the edge center to respond. For example, in this application, after the user configures the game inference service on the graphical user interface of the cloud platform, the game inference device 300 deployed in the cloud center or the edge center can process inference requests from the game terminal 200 and return responses.

Before requesting the inference service of the game, the game terminal 200 may complete the configuration of the inference service on the cloud platform in advance. In this case, the game inference apparatus 300 may also serve as the apparatus for configuring the inference service of the game on the cloud platform. During specific implementation, as shown in FIG. 2, the game inference apparatus 300 may further include a configuration module 303. In the process of configuring the inference service, the game developer 101 may provide a configuration file for a specific game to the cloud platform. The configuration module 303 can then configure the game inference service for the game developer 101 on the cloud platform based on the game algorithm framework of the cloud platform and the configuration file provided by the game developer 101, so that the inference module 302 can use the successfully configured inference service to perform game inference for the game terminal 200. The game algorithm framework may pre-define multiple training algorithms, multiple AI models, and so on, so that the configuration module 303 can select training algorithms, models, and other content from the game algorithm framework according to the configuration file in order to configure the inference service.

In this way, for each game developed by different game developers 101, the game inference apparatus 300 can use the general game algorithm framework and the configuration file provided by the game developer 101 to automatically configure, on the cloud platform, the inference service required by that game developer 101. For example, when the game inference apparatus 300 obtains a first configuration file including the configuration information of a first game, it can configure the inference service of the first game on the cloud platform based on the game algorithm framework on the cloud platform and the first configuration file; and when the game inference apparatus 300 obtains a second configuration file including the configuration information of a second game, it can configure the inference service of the second game on the cloud platform based on the game algorithm framework and the second configuration file. The first game and the second game are different games. The game inference apparatus 300 can subsequently use the configured inference service to perform game inference for the game terminal 200. As a result, the cloud service provider does not need to design a dedicated inference service for each game, which effectively reduces the difficulty and cost of providing inference services and also improves the efficiency with which the cloud service provider provides them.

Moreover, in practical applications, AI technology depends on training algorithms, typified by reinforcement learning (RL), and on the computing resources needed to run those training algorithms. However, game developers such as small and medium-sized game manufacturers may find it difficult to obtain a high-quality AI model and enough computing resources to implement game inference. Therefore, having the game terminal 200 use the inference service on the cloud platform to implement game inference can effectively reduce the difficulty for the game developer 101 of applying AI technology to obtain game inference results.

It should be understood that the structure of the game inference apparatus 300 shown in FIG. 2 is only an exemplary illustration, and other possible implementations may also be adopted for the structure of the game inference apparatus 300 in practical application. For example, in a possible implementation, the game inference apparatus 300 can be divided into a configuration apparatus and an inference apparatus, wherein the configuration apparatus is used to configure the inference service of the game on the cloud platform, and the inference apparatus is used to perform the corresponding game inference for the game terminal 200 by using the configured inference service. Alternatively, the game inference apparatus 300 may further include an online service module, which may be configured to publish the inference service as an online cloud service on the cloud platform after the configuration module 303 completes the configuration of the inference service of the game. Alternatively, in other game inference apparatuses 300, the configuration module 303 and the inference module 302 may be integrated into one module, and so on. This embodiment does not limit the specific implementation of the game inference apparatus 300.

Next, various non-limiting specific implementations of the method for configuring the inference service of the game on the cloud platform will be described in detail.

Referring to FIG. 3, it is a schematic flowchart of a method for configuring a game inference service on a cloud platform according to an embodiment of the present application. This method can be applied to the game inference apparatus 300 shown in FIG. 2 above. In this embodiment, the game inference apparatus 300 can configure the inference service of each of various games on the cloud platform according to the configuration file of that game. For ease of description, the following takes as an example the case in which the inference service of the first game is configured on the cloud platform according to the first configuration file corresponding to the first game. The process by which the game inference apparatus 300 configures the inference service of the second game (and other games) according to the second configuration file corresponding to the second game (and other games) can be understood by referring to the specific implementation of the following embodiments, which this embodiment does not limit. The method for configuring the inference service of the game on the cloud platform shown in FIG. 3 may specifically include:

S301: The game inference apparatus 300 acquires a first configuration file, where the first configuration file includes configuration information for the first game.

Under normal circumstances, the data processing capability of the game terminal 200 and the quality of the AI model available to it are limited, so it is difficult for the game terminal 200 to infer the action and/or state of the target object in real time. Therefore, in this embodiment, in the process of developing the first game, the game developer 101 may request the game inference apparatus 300 to configure an inference service for the first game, so as to use the stronger data processing capability and the higher-quality AI model of the cloud platform to perform inference for the target object in the game application instance of the first game (such as a game character in the game application instance). The inference service can be used to infer the actions and/or states of one or more game characters in the game.

As an implementation example, the game developer 101 may provide the game inference apparatus 300 with a first configuration file, so that the game inference apparatus 300 configures the inference service of the first game on the cloud platform according to the first configuration file. It should be understood that the game developer 101 may send the first configuration file to the game inference apparatus 300 by using the game terminal 200, as shown in FIG. 3. In other embodiments, the game developer 101 may also use any other device to provide the first configuration file to the game inference apparatus 300. For example, the game developer 101 may use any terminal device to log in to the web page of the cloud platform and perform a configuration operation on the web interface to provide the first configuration file to the game inference apparatus 300.

As an example, the first configuration file may include one or more of the following: the action space of the target object in the game application instance of the first game, the state space of the target object in the game application instance of the first game, the reward function, the type of AI model, the type of target training algorithm used to train the AI model, the training method of the AI model, the inference method of the AI model, the storage address of the AI model, and the specifications of the computing resources used for AI model training and inference. Optionally, when the first configuration file includes only part of the above information, the remaining information can be determined by the game inference apparatus 300 itself; for example, the remaining information can be preset on the cloud platform.

The action space of the target object includes at least one action of the target object, such as moving forward, backward, left, or right, jumping, attacking, dodging, and other actions of the target object in the first game. Correspondingly, the action inferred by the game inference apparatus 300 for the target object is an action in the action space. In this embodiment, the target object may be a non-player character in the game application instance of the first game running on the game terminal 200, such as a "human machine" or "wild monster" that is controlled by the game terminal 200 and fights against the player in a multiplayer online battle arena (MOBA) game. Alternatively, the target object may be an object whose actions or states in a future time period (or other time periods) need to be predicted, such as a game player B; in that case, game player A's experience of predicting the behavior and/or state of game player B can be regarded as part of game player A's experience of the game.

The state space of the target object includes at least one state of the target object, such as the HP, mana, attack power, defense power, skill cooldown time, and emotion of the non-player character in the game scene. Correspondingly, the state inferred by the game inference apparatus 300 for the target object is a state in the state space.

The type of target training algorithm refers to the type of algorithm used to train the AI model that infers the action and/or state of the target object. For example, the type of target training algorithm can be specified by the game developer 101 before the inference service is configured; of course, it may also be specified by the game inference apparatus 300. Exemplarily, the target training algorithm may be any reinforcement learning algorithm such as a deep Q-network (DQN) algorithm, a proximal policy optimization (PPO) algorithm, a soft actor-critic (SAC) algorithm, a deep deterministic policy gradient (DDPG) algorithm, a twin delayed deep deterministic policy gradient (TD3) algorithm, or a Rainbow algorithm, or may be another applicable algorithm.

The type of AI model refers to the type of AI model used to infer the action and/or state of the target object. Exemplarily, the AI model may be any neural network model such as a deep neural network (DNN) model, a recurrent neural network (RNN) model, or a convolutional neural network (CNN) model, or may be another applicable model.

While the target training algorithm runs, training data can be input into the AI model, and the parameters and/or network structure of the AI model can be adjusted according to the difference between the results output by the AI model and the actual results in the training data, thereby training the AI model.

The reward function is the function used to control the direction in which the parameters of the AI model are adjusted while the AI model is being trained. Specifically, during model training, the game inference apparatus 300 can use the reward function to calculate a reward value for the result output by the AI model, and can determine the adjustment direction of the parameters in the AI model according to the relationship between the reward value and a preset threshold, so that model training is finally completed through multiple iterative adjustments.
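As a rough illustration of the reward value and threshold comparison described above, the following minimal Python sketch computes a reward for one battle step and derives an adjustment direction from it. The function names, the reward formula, and the threshold value are all assumptions made for illustration; the application does not specify them.

```python
def battle_reward(damage_dealt: float, damage_taken: float) -> float:
    """Hypothetical reward: credit damage dealt, penalize damage taken."""
    return damage_dealt - damage_taken


def adjustment_direction(reward: float, threshold: float) -> int:
    """Compare the reward value with a preset threshold.

    Returns +1 to reinforce the behaviour that produced the output,
    -1 to move the parameters away from it, and 0 to leave them unchanged.
    """
    if reward > threshold:
        return 1
    if reward < threshold:
        return -1
    return 0


reward = battle_reward(damage_dealt=30.0, damage_taken=12.0)
direction = adjustment_direction(reward, threshold=10.0)
print(reward, direction)
```

In a real training algorithm the direction would be folded into a gradient-based update; the sign here only stands in for that step.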

The training method of the AI model can be, for example, self-play. Self-play refers to a game played between the model itself and a virtual opponent; the virtual opponent can be the model itself with past experience, or an agent trained from another model. Specifically, the game inference apparatus 300 may acquire training data for object A generated by one or more game application instances and input the training data into the AI model. The AI model outputs an inferred action for object A according to that training data, and the inferred action is used to play against object B, so that the parameters in the AI model can be adjusted according to the result of the game between object A and object B.
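The self-play loop can be sketched as follows. This is a toy under stated assumptions, not the training procedure of the application: the "model" is a single parameter (the probability of attacking), the game rule (attack beats defend) is invented, and `ToyPolicy`, `play_round`, and `self_play` are hypothetical names. Object B is drawn from saved past versions of the same model.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class ToyPolicy:
    """Stand-in for an AI model: one parameter, the probability of attacking."""
    attack_prob: float = 0.5

    def act(self, rng: random.Random) -> str:
        return "attack" if rng.random() < self.attack_prob else "defend"


def play_round(action_a: str, action_b: str) -> int:
    """Toy rule (an assumption): attack beats defend. Returns +1, 0 or -1 for object A."""
    if action_a == action_b:
        return 0
    return 1 if action_a == "attack" else -1


def self_play(policy: ToyPolicy, rounds: int, snapshot_every: int,
              lr: float, seed: int = 0) -> ToyPolicy:
    rng = random.Random(seed)
    opponents = [copy.deepcopy(policy)]  # past versions of the model play object B
    for step in range(1, rounds + 1):
        opponent = rng.choice(opponents)
        action_a = policy.act(rng)
        action_b = opponent.act(rng)
        result = play_round(action_a, action_b)
        # Reinforce A's action if it won, move away from it if it lost.
        sign = 1 if action_a == "attack" else -1
        policy.attack_prob = min(1.0, max(0.0, policy.attack_prob + lr * result * sign))
        if step % snapshot_every == 0:
            opponents.append(copy.deepcopy(policy))  # keep the current version as a future opponent
    return policy


trained = self_play(ToyPolicy(), rounds=200, snapshot_every=20, lr=0.05, seed=1)
print(trained.attack_prob)
```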

The reasoning method, for example, can be a method of inferring actions and/or states of one style for a target object, or a method of inferring actions and/or states of multiple styles in parallel.

The save address of the AI model is used to indicate the save location of the AI model on the cloud platform after the training of the AI model is completed.

The computing resource specification used for AI model training and inference refers to the specification of the computing resources that the game inference apparatus 300 relies on when training the AI model or using the AI model to perform inference, and it can be defined by the configuration file.

In practical applications, the first configuration file may also include other information, such as a heuristic algorithm. The heuristic algorithm may be used to impose rationality constraints on the actions and/or states inferred by using the AI model. In this embodiment, the specific implementation of the first configuration file is not limited.
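To make the items listed above concrete, the following Python sketch shows one way a first configuration file might be represented and how fields that the developer omits could be filled from presets on the cloud platform. Every key name, every default value, and the storage path are hypothetical; the application does not define a file format.

```python
from typing import Any, Dict

# Hypothetical platform presets, used for any item the developer leaves out.
PLATFORM_DEFAULTS: Dict[str, Any] = {
    "model_type": "DNN",
    "training_algorithm": "PPO",
    "training_method": "self_play",
    "inference_method": "single_style",
    "compute": {"cpu_cores": 8, "gpus": 1},
}


def complete_config(developer_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a full configuration: developer values win, presets fill the gaps."""
    return {**PLATFORM_DEFAULTS, **developer_config}


# Hypothetical partial configuration supplied by the game developer.
first_config = {
    "game": "first_game",
    "action_space": ["forward", "backward", "left", "right", "jump", "attack", "dodge"],
    "state_space": ["hp", "mana", "attack_power", "defense_power", "skill_cooldown"],
    "reward_function": "battle_reward",
    "training_algorithm": "SAC",
    "model_save_address": "models/first_game/",  # illustrative path only
}

full_config = complete_config(first_config)
print(full_config["training_algorithm"], full_config["model_type"])
```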

In a possible implementation, the game terminal 200 may present a configuration interface as shown in FIG. 4 to the user. The configuration interface may present information prompting the game developer 101 to input the relevant configuration, so that the game developer 101 can configure, on the configuration interface, the action space, state space, reward function, target training algorithm, AI model, and so on of the target object (the configuration of the remaining content is not shown). The game terminal 200 may present, on the configuration interface, configuration information items that can be selected by the game developer 101, for example, candidate items of different types for the target training algorithm and the AI model. The candidate items can be provided to the game terminal 200 in advance by the game inference apparatus 300, so that the game developer 101 can directly select the training algorithm and AI model on the configuration interface without inputting a specific training algorithm file or AI model file, which further simplifies configuration for the game developer 101. Then, the game terminal 200 can automatically generate the corresponding first configuration file according to the configuration information items selected by the game developer 101 and send it to the game inference apparatus 300. In practical application, the game developer 101 may also provide the first configuration file to the game inference apparatus 300 in other manners; the above is merely an exemplary description, and the specific implementation manner is not limited.

S302: The game inference apparatus 300 configures the inference service of the first game on the cloud platform based on the game algorithm framework of the cloud platform and the acquired first configuration file.

The game algorithm framework may be an algorithm library in which one or more different types of training algorithms and AI models are pre-defined. The training algorithms and AI models in the game algorithm framework can be built in advance. When an AI model in the game algorithm framework is built in advance, the AI model can be built based on a fully connected network structure or a convolutional neural network structure commonly used in reinforcement learning, and an L2 regularization term and/or a Dropout layer can also be added to the network architecture of the AI model. The L2 regularization term and/or the Dropout layer can be used to improve the generalization performance of the AI model, so as to avoid, as far as possible, poor generality of the inference actions output by the trained AI model. In practical application scenarios, the number of network layers and/or the number of neurons in the AI model can also be adjusted adaptively according to actual application requirements. When configuring the inference service of the first game, the game inference apparatus 300 may call the training algorithm and AI model in the game algorithm framework through an application programming interface (API). In practical application, the game developer 101 may subscribe to the game algorithm framework on the cloud platform, so that the game inference apparatus 300 configures the inference service of the first game based on the subscribed game algorithm framework.
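For readers unfamiliar with the two generalization techniques named above, the following pure-Python sketch shows what an L2 regularization term and a Dropout layer compute. It is a generic illustration, not the framework's implementation; the function names and the coefficient are assumptions.

```python
import random
from typing import List


def l2_penalty(weights: List[float], coeff: float) -> float:
    """L2 regularization term: coeff times the sum of squared weights, added to the training loss."""
    return coeff * sum(w * w for w in weights)


def dropout(activations: List[float], rate: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: zero each activation with probability `rate` and rescale the survivors."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [0.2, -1.0, 0.7, 1.5]
print(l2_penalty([1.0, -2.0], coeff=0.1))
print(dropout(hidden, rate=0.5, rng=rng))
```

Dropout is applied only during training; at inference time the activations pass through unchanged, which is what the `training=False` branch models.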

As an implementation example of configuring the inference service, the game inference apparatus 300 may include a communication module 301 and a configuration module 303 as shown in FIG. 2. After obtaining the first configuration file provided by the game developer 101, the communication module 301 may send the first configuration file to the configuration module 303. The configuration module 303 can call the target training algorithm of the corresponding type in the game algorithm framework according to the type of target training algorithm defined in the first configuration file, and can call one or more AI models of the corresponding type in the game algorithm framework according to the type of AI model defined in the first configuration file. Because an AI model in the game algorithm framework usually has a poor inference effect before model training, the configuration module 303 can use the called target training algorithm to train the AI model, and can then configure the inference service required by the game developer 101 according to the trained AI model. The specification of the computing resources required by the configuration module 303 to train the AI model, and by the inference module 302 to subsequently perform game inference with the AI model, can be determined according to the first configuration file provided by the game developer 101.
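The lookup-then-train flow above can be sketched as a small registry. This is a minimal Python sketch under stated assumptions: `GameAlgorithmFramework`, `configure_inference_service`, the registry keys, and the placeholder "training" (which just stores a mean) are all hypothetical and stand in for the real algorithms and models.

```python
from typing import Callable, Dict, List

Model = Dict[str, float]


class GameAlgorithmFramework:
    """Hypothetical registry of pre-built training algorithms and AI model builders."""

    def __init__(self) -> None:
        self.algorithms: Dict[str, Callable[[Model, List[float]], Model]] = {}
        self.models: Dict[str, Callable[[], Model]] = {}


def fit_mean(model: Model, data: List[float]) -> Model:
    # Placeholder "training": store the mean of the data as the only parameter.
    return {**model, "scale": sum(data) / len(data)}


def configure_inference_service(framework: GameAlgorithmFramework,
                                config: Dict[str, str],
                                training_data: List[float]) -> Callable[[float], float]:
    """Look up the configured types, train the model, and return an inference callable."""
    try:
        train = framework.algorithms[config["training_algorithm"]]
        build = framework.models[config["model_type"]]
    except KeyError as missing:
        raise ValueError(f"type not available in the framework: {missing}") from None
    trained = train(build(), training_data)
    return lambda state: trained["scale"] * state


framework = GameAlgorithmFramework()
framework.algorithms["toy_algorithm"] = fit_mean
framework.models["toy_model"] = lambda: {"scale": 0.0}

config = {"training_algorithm": "toy_algorithm", "model_type": "toy_model"}
# The same framework serves two games; only the configuration and data differ.
first_service = configure_inference_service(framework, config, [1.0, 3.0])
second_service = configure_inference_service(framework, config, [10.0])
print(first_service(2.0), second_service(2.0))
```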

Exemplarily, the configuration module 303 can train the AI model by means of hyperparameter search. The hyperparameters of the AI model are the parameters that are preset before model training, whereas the remaining parameters of the AI model are determined through the subsequent model training process. In practical application, preset hyperparameters do not necessarily allow the inference effect of the AI model for the target object to reach a high level or the highest level. Therefore, in this embodiment, the game inference apparatus 300 can train the AI model by means of hyperparameter search, so as to determine hyperparameters that allow the trained AI model to achieve a better inference effect.

During specific implementation, the configuration module 303 can determine multiple sets of possible hyperparameter values in advance, and construct multiple AI models based on the different values. The following takes the construction of a first AI model and a second AI model as an example (in practical application, a larger number of AI models can be constructed based on the multiple possible hyperparameter values). The first AI model and the second AI model have different hyperparameters, but both belong to the type of AI model defined in the first configuration file. The configuration module 303 can then use pre-acquired training data to train the first AI model and the second AI model separately. Usually, at least one set of possible hyperparameter values allows the AI model trained with it to have a better inference effect; for example, the actions and/or states inferred by that AI model are more applicable. In this embodiment, the inference effect of the AI model can be measured by the reward value. Specifically, when training the first AI model and the second AI model, the configuration module 303 can use the reward function defined in the first configuration file (that is, the reward function preconfigured by the game developer 101) to calculate the reward value corresponding to the first AI model and the reward value corresponding to the second AI model. If the reward value corresponding to the first AI model is greater than the reward value corresponding to the second AI model, the configuration module 303 takes the hyperparameters of the first AI model as the hyperparameters found by the search, and takes the first AI model as the AI model with the better training effect. In practical application, when more than two AI models are constructed, the configuration module 303 can take the hyperparameters of the AI model with the largest reward value as the hyperparameters found by the search; correspondingly, the AI model with those hyperparameters is the AI model with the best training effect. It should be understood that performing the hyperparameter search according to the reward value is only an example for illustration. In practical application, the hyperparameter search may also be completed in other possible ways, which is not limited in this embodiment.
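The selection rule described above (train one model per hyperparameter set, keep the set whose model earns the largest reward value) can be sketched in Python as follows. The grid values, the placeholder training step, and the toy reward function are assumptions made so the example is runnable; they are not taken from the application.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Hyperparams = Dict[str, Any]


def hyperparameter_search(grid: Dict[str, List[Any]],
                          train: Callable[[Hyperparams], Any],
                          reward_fn: Callable[[Any], float]) -> Tuple[float, Hyperparams, Any]:
    """Train one model per combination and keep the one with the largest reward value."""
    best = None
    for values in product(*grid.values()):
        hyperparams = dict(zip(grid.keys(), values))
        model = train(hyperparams)
        reward = reward_fn(model)  # the same reward function is used for every candidate
        if best is None or reward > best[0]:
            best = (reward, hyperparams, model)
    if best is None:
        raise ValueError("the hyperparameter grid is empty")
    return best


def toy_train(hyperparams: Hyperparams) -> Hyperparams:
    # Placeholder: a real implementation would run the target training algorithm here.
    return dict(hyperparams)


def toy_reward(model: Hyperparams) -> float:
    # Invented scoring: best at learning_rate 0.01, and larger layers score higher.
    return -abs(model["learning_rate"] - 0.01) * 100 + model["hidden_units"] / 64


grid = {"learning_rate": [0.1, 0.01, 0.001], "hidden_units": [32, 64]}
best_reward, best_hyperparams, best_model = hyperparameter_search(grid, toy_train, toy_reward)
print(best_hyperparams)
```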

During the hyperparameter search, the configuration module 303 can create one process on the cloud platform and use this process to train the AI models with different hyperparameters serially, wherein the same reward function is used when training each of these AI models.

Alternatively, the configuration module 303 may create multiple processes on the cloud platform. Each process may be responsible for training at least one AI model, and the AI models trained by different processes have different hyperparameters. For example, when the configuration module 303 builds the first AI model and the second AI model, the configuration module 303 can create a first process and a second process on the cloud platform, where the first process is used to train the first AI model and the second process is used to train the second AI model. Exemplarily, each process created by the configuration module 303 can be represented as an executor (worker), and each executor can be implemented by a process or the like. In this way, the configuration module 303 can use multiple processes to train multiple AI models in parallel, thereby effectively improving the efficiency of the hyperparameter search. In an actual application scenario, when the number of processes is the same as the number of AI models, the processes and the AI models can be in one-to-one correspondence, so that the configuration module 303 can train, in parallel, the AI model corresponding to each process. When the number of processes is less than the number of AI models (where different AI models have different hyperparameters), one process can correspond to multiple AI models. In that case, the configuration module 303 can first use the multiple processes to train some of the AI models in parallel and, after the training of those AI models is complete, continue to use the multiple processes to train the remaining AI models. Optionally, after the training of some AI models is complete, hyperparameter data can be exchanged between the multiple processes. In this way, the hyperparameters of well-performing AI models can be reused between processes, while the remaining hyperparameters can be randomly selected to explore new hyperparameters, which can reduce the computational overhead required to train multiple AI models.
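A minimal Python sketch of this parallel scheme follows, with two loudly stated substitutions: a thread pool stands in for the processes created on the cloud platform, and "training" is a toy scoring function. With fewer workers than candidates, the pool trains the candidates in waves. The `exploit_and_explore` step then reuses the better half of the hyperparameters and draws the rest at random. All names and ranges are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def toy_train(learning_rate: float) -> float:
    """Placeholder training run that returns a reward value (best near 0.01)."""
    return -abs(learning_rate - 0.01)


def search_round(learning_rates: List[float], num_workers: int) -> List[float]:
    """Train every candidate; with fewer workers than candidates the pool works in waves."""
    # Threads stand in for the separate training processes described in the text.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(toy_train, learning_rates))  # results keep the input order


def exploit_and_explore(learning_rates: List[float], rewards: List[float],
                        rng: random.Random, low: float, high: float) -> List[float]:
    """Reuse the better half of the hyperparameters and sample the rest at random."""
    ranked = sorted(zip(rewards, learning_rates), reverse=True)
    kept = [lr for _, lr in ranked[: len(learning_rates) // 2]]
    fresh = [rng.uniform(low, high) for _ in range(len(learning_rates) - len(kept))]
    return kept + fresh


candidates = [0.1, 0.02, 0.5, 0.011]
rewards = search_round(candidates, num_workers=2)
next_candidates = exploit_and_explore(candidates, rewards, random.Random(0), 0.001, 0.1)
print(next_candidates)
```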

In the above embodiment, the optimal AI model trained by the configuration module 303 by means of hyperparameter search can be used to infer actions/states of one style. In a further embodiment, the configuration module 303 can also obtain AI models capable of inferring actions/states of multiple different styles by means of population based training (PBT), also referred to here as population evolution. Different reward functions are used to train AI models of different styles. For example, the configuration module 303 can build a first AI model and a second AI model on the cloud platform, wherein the first AI model corresponds to reward function 1, and the configuration module 303 can use reward function 1 to train the first AI model for inferring actions/states of a first style; likewise, the second AI model corresponds to reward function 2, and the configuration module 303 can use reward function 2 to train the second AI model for inferring actions/states of a second style. Population evolution is an asynchronous automatic hyperparameter tuning method that combines parallel search and sequential optimization. In the process of population evolution, the configuration module 303 can obtain AI models for inferring different styles by training with different pre-defined reward functions; that is, each reward function can correspond to an AI model of one inference style. For example, in a battle game scenario, for a non-player character A in the battle game, the configuration module 303 can train, through the population evolution process, AI models of three combat styles for the non-player character A: "aggressive", "conservative", and "balanced". The AI models of the three combat styles can correspond to three different reward functions; that is, the AI model of the "aggressive" combat style can correspond to reward function 1, the AI model of the "conservative" combat style can correspond to reward function 2, and the AI model of the "balanced" combat style can correspond to reward function 3. For each combat style, the configuration module 303 can use the above-mentioned hyperparameter search to train an AI model that belongs to that combat style and has a better inference effect (such as the AI model with the largest reward value). The different reward functions used in the population evolution process may be determined by the configuration module 303 according to the first configuration file, or may be set by the configuration module 303 itself.
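The one-reward-function-per-style idea can be sketched as follows. The three reward formulas, the candidate models, and their simulated battle outcomes are invented for illustration; only the structure (a separate reward function per style, and the best candidate chosen per style) reflects the text.

```python
from typing import Callable, Dict, Tuple

# Hypothetical reward functions, one per combat style (arguments: damage dealt, damage taken).
STYLE_REWARDS: Dict[str, Callable[[float, float], float]] = {
    "aggressive": lambda dealt, taken: 2.0 * dealt - 0.5 * taken,
    "conservative": lambda dealt, taken: 0.5 * dealt - 2.0 * taken,
    "balanced": lambda dealt, taken: dealt - taken,
}

# Invented evaluation results for three candidate models: (damage dealt, damage taken).
CANDIDATE_OUTCOMES: Dict[str, Tuple[float, float]] = {
    "model_a": (90.0, 70.0),
    "model_b": (40.0, 10.0),
    "model_c": (70.0, 35.0),
}


def best_model_per_style(rewards: Dict[str, Callable[[float, float], float]],
                         outcomes: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """For each style, pick the candidate whose outcome earns the largest reward value."""
    return {
        style: max(outcomes, key=lambda name: reward_fn(*outcomes[name]))
        for style, reward_fn in rewards.items()
    }


selection = best_model_per_style(STYLE_REWARDS, CANDIDATE_OUTCOMES)
print(selection)
```

The same candidates rank differently under each reward function, which is why separate models emerge for separate styles.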

In the process of population evolution, iterative training may cause a new AI model to evolve from an existing AI model. If every AI model (including the new AI model and the old AI model) were assigned a port, the large number of generated AI models would cause the process to occupy too many ports. For this reason, when the configuration module 303 uses a process to iteratively train the AI model, it can replace the old AI model with the new AI model, so that the process can reuse the port corresponding to the old AI model to receive training data and use that training data to train the new AI model. This effectively prevents the number of AI models produced by population evolution from causing the processes in the game inference apparatus 300 to occupy too many ports.
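The port reuse described above amounts to replacing the model bound to a port rather than allocating a new port for each generation. A minimal Python sketch, with a hypothetical `TrainingProcess` class, model names, and port number:

```python
from typing import Dict, Iterable, Optional


class TrainingProcess:
    """Hypothetical training process that owns a fixed set of ports."""

    def __init__(self, ports: Iterable[int]) -> None:
        self.models_by_port: Dict[int, Optional[str]] = {port: None for port in ports}

    def load(self, port: int, model_name: str) -> None:
        if port not in self.models_by_port:
            raise KeyError(f"port {port} does not belong to this process")
        self.models_by_port[port] = model_name

    def evolve(self, port: int, new_model_name: str) -> Optional[str]:
        """Replace the model on `port` with its evolved successor and return the old model."""
        old_model = self.models_by_port[port]
        self.models_by_port[port] = new_model_name  # same port, new model
        return old_model


process = TrainingProcess(ports=[6001])
process.load(6001, "aggressive_gen_1")
retired = process.evolve(6001, "aggressive_gen_2")
print(retired, process.models_by_port)
```

However many generations evolve, the process keeps listening on the same ports, so training data sent to port 6001 always reaches the newest model.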

Wherein, when the configuration module 303 iteratively trains the AI model by means of hyperparameter search and/or population evolution, the AI model can be trained based on the training method of self-play defined in the first configuration file.

In this embodiment, the training data required for training the AI model may be provided by the game terminal 200 to the game inference apparatus 300. After receiving the training data, the communication module 301 in the game inference apparatus 300 can send the training data to the configuration module 303. Exemplarily, the training data can be generated by the game application instance of the first game running on the game terminal 200; for example, it can be game data such as the attack power, defense power, and HP of the non-player character and the player character at different battle moments in the game application instance of the first game.

As an implementation example, the game terminal 200 may send multiple training requests to the communication module 301, and different training requests include different training data of the target object. As shown in FIG. 5, the multiple training requests come from multiple game application instances of the first game running on the game terminal 200. For the same target object, multiple game application instances can generate multiple pieces of data about the target object at runtime. In this way, multiple game application instances on the game terminal 200 can generate more training data per unit time, thereby speeding up the training of the AI model by the game inference apparatus 300. Of course, generating training data in parallel with multiple game application instances on the game terminal 200 is only an example in this embodiment. In other embodiments, the game terminal 200 may also run only one game application instance of the first game to generate training data.

Further, the training data for the target object generated by the game application instance running on the game terminal 200 may contain some invalid data, such as data used to mark the data generation time, the data size, and so on. Such data has no guiding significance for the configuration module 303 in training the AI model; therefore, the game terminal 200 can filter out this part of the data as invalid data. In this way, the amount of training data sent by the game terminal 200 to the game inference apparatus 300 can be reduced, thereby reducing the time delay caused by data communication between the game terminal 200 and the game inference apparatus 300 and improving model training efficiency.
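The filtering step on the game terminal can be sketched as dropping known bookkeeping fields from each record before it is sent. The field names below are hypothetical examples of "generation time" and "data size" markers; the application does not name them.

```python
from typing import Any, Dict, List

# Hypothetical bookkeeping fields that carry no training signal.
INVALID_FIELDS = frozenset({"generated_at", "payload_bytes"})


def filter_training_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without fields that are useless for training."""
    return {key: value for key, value in record.items() if key not in INVALID_FIELDS}


def filter_training_batch(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    return [filter_training_record(record) for record in records]


raw_record = {
    "generated_at": "2021-04-09T10:00:00Z",  # invented sample values
    "payload_bytes": 128,
    "npc_hp": 85,
    "npc_attack_power": 40,
    "player_hp": 60,
}
print(filter_training_record(raw_record))
```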

In the process of using multiple processes to train multiple AI models in parallel, take as an example the case in which the first process is responsible for training the first AI model and the second process is responsible for training the second AI model. After the communication module 301 receives multiple training requests, the training data in the multiple training requests may be sent to the first process and the second process according to the port number and/or IP address of the first process and the port number and/or IP address of the second process. In this way, when using multiple processes to train multiple AI models in parallel, the configuration module 303 can use the first process and the training data received by the first process to train the first AI model, and use the second process and the training data received by the second process to train the second AI model. Similarly, when one process is responsible for training multiple AI models (for example, when the number of processes is less than the number of AI models), the way in which that process trains any one of the multiple AI models is similar to the above process and is not repeated here.

Optionally, the training data used by different processes when training their corresponding AI models may be derived from one or more game application instances of the first game. In this way, after obtaining the training data generated by different game application instances of the first game, the communication module 301 can send the training data to one or more processes according to a pre-configuration. For example, as shown in FIG. 5, process 1 and process 2 may be created on the game inference device 300, and game application instance 1, game application instance 2 and game application instance n of the first game may run on the game terminal 200. After the communication module 301 receives the training data generated by the three game application instances, it can send training data 1 of game application instance 1 and training data 2 of game application instance 2 to process 1, and send training data n of game application instance n to process 2. The communication module 301 may be preconfigured with a correspondence between processes and game application instances, so that it can send the training data generated by each game application instance to the corresponding process according to that correspondence. In other possible examples, the communication module 301 may also send training data 1 of game application instance 1 to process 1 and process 2 at the same time, which is not limited in this embodiment. In this way, the game inference apparatus 300 can use the training data generated by different game application instances on the game terminal 200 to train multiple AI models simultaneously.
In addition, the game terminal 200 can provide training data to the communication module 301 through a single output port, instead of providing training data for AI models with different inference styles through multiple output ports, thereby reducing the port requirements on the game terminal 200.
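
The preconfigured correspondence between game application instances and processes can be pictured as a lookup table. The sketch below is an assumption-laden illustration: the identifiers `instance_1`, `process_1` and so on are hypothetical, and in-memory lists stand in for the delivery by port number and/or IP address described above.

```python
from collections import defaultdict

# Hypothetical preconfigured correspondence: game application instance -> processes.
# An instance may map to several processes if its data should feed several models.
ROUTING = {
    "instance_1": ["process_1"],
    "instance_2": ["process_1"],
    "instance_n": ["process_2"],
}


def dispatch(training_items, routing=ROUTING):
    """Group (instance_id, data) pairs by the process that should receive them."""
    queues = defaultdict(list)
    for instance_id, data in training_items:
        for process_id in routing.get(instance_id, []):
            queues[process_id].append(data)
    return dict(queues)


batches = dispatch([
    ("instance_1", "training data 1"),
    ("instance_2", "training data 2"),
    ("instance_n", "training data n"),
])
```

Because the table lives in the communication module, the terminal can keep sending everything through one output port and remains unaware of how many processes exist on the cloud side.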

After the game inference device 300 successfully completes the configuration of the inference service of the first game, the game inference device 300 may also generate a notification message and send it to the game terminal 200 through the communication module 301 to notify the game terminal 200 about the training process. In practical applications, the game reasoning apparatus 300 may also feed back the training results for the target object in each training process (e.g., including game results) to the game terminal 200 through the communication module 301. In this way, the game developer 101 can view the relevant data of the model training process on the game terminal 200 through a corresponding interface or window.

In this embodiment, the case in which the game reasoning device 300 creates a new AI model and trains it is used as an example for illustration. In other practical applications, an existing AI model can also be reused and trained with a "surgery" method; that is, a new network layer can be added to the network structure of the existing AI model, so that when the AI model is trained, mainly the hyperparameters of the newly added network layer are searched and the parameters of that network layer are trained. This can improve the efficiency of model training and reduce the amount of computation required for model training.

In this embodiment, after the configuration of the inference service for the first game is completed on the cloud platform, the game inference device 300 can use the inference service to respond to inference requests for the first game sent by the game terminal 200, so that the game terminal 200 can perform the corresponding game reasoning. Based on this, this embodiment may further include:

S303: When running the game application instance, the game terminal 200 sends an inference request to the game inference apparatus 300, where the inference request includes the data to be processed of the target object in the game application instance of the first game.

Exemplarily, in a game battle scenario, the data to be processed carried in the inference request may be data indicating the state of the non-player character (i.e., the target object) and the opposing player character in the first game, such as a game screen showing the non-player character and the opposing player character. The content of the game screen may include information such as the HP, magic power, attack power, defense power and skill status of the non-player character and the opposing player character, or text information describing the battle status of the non-player character and the opposing player character. In a user behavior prediction scenario, the data to be processed may be, for example, video or picture data including the user's past actions. In this embodiment, the specific form of the data to be processed is not limited.

In practical application, the game inference device 300 can provide an API to the game terminal 200, so that the game terminal 200 can send multiple inference requests through the API to the communication module 301 in the game inference device 300, to request the inferred actions and/or states of the target object at multiple future moments.

Since successfully establishing a communication connection between the game terminal 200 and the communication module 301 (such as a TCP connection established through a three-way handshake) usually takes a certain amount of time, the connection establishment process may increase the delay of each inference service request made by the game terminal 200. Based on this, in a possible implementation, the game terminal 200 can establish a long (persistent) connection with the communication module 301; for example, the game terminal 200 establishes a long connection with the communication module 301 using Hypertext Transfer Protocol Version 1.1 (HTTP 1.1). In this way, after the game terminal 200 successfully establishes a connection with the communication module 301, each time it requests an inference service from the game inference device 300 it does not need to establish a connection again, but can send the inference request to the game inference device 300 directly over the established long connection, so that the delay for the game terminal 200 to obtain the inference result of the target object can be effectively reduced. Correspondingly, when acquiring the training data provided by the game terminal 200, the game reasoning apparatus 300 may also receive the training data sent by the game terminal 200 through a pre-established persistent connection.
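
The effect of a long connection can be illustrated with Python's standard library. The sketch below is not the communication module 301 itself: it starts a local stand-in server (the `/infer` path, the handler name and the JSON payloads are all assumptions) and sends several requests over one `http.client.HTTPConnection`. The server reports the client's TCP port with each reply, so an unchanged port indicates that the same connection was reused.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InferenceStubHandler(BaseHTTPRequestHandler):
    """Stand-in for the cloud-side endpoint; echoes the request body."""

    protocol_version = "HTTP/1.1"  # lets the server keep the connection open

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        reply = json.dumps(
            {"echo": request, "client_port": self.client_address[1]}
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def send_over_one_connection(payloads):
    """Send every payload over a single persistent connection and collect replies."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), InferenceStubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    replies = []
    try:
        for payload in payloads:
            conn.request(
                "POST",
                "/infer",  # hypothetical path
                body=json.dumps(payload),
                headers={"Content-Type": "application/json"},
            )
            response = conn.getresponse()
            replies.append(json.loads(response.read()))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return replies
```

Calling `send_over_one_connection([{"step": 1}, {"step": 2}])` performs the TCP handshake once; the second request pays only the request and response cost.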

In a further possible implementation, the game terminal 200 and the game inference device 300 may have different deployment environments; for example, the game terminal 200 is deployed in an environment corresponding to a Windows operating system, while the game inference device 300 is deployed in an environment corresponding to a Linux operating system. In this case, it may be difficult for the game inference device 300 deployed in the Linux environment to perform action inference directly on data to be processed that was generated in the Windows environment. Therefore, the communication module 301 can first determine whether the format of the data to be processed is the target format. If not, this indicates that the game terminal 200 and the game reasoning device 300 may be deployed in different environments, and the communication module 301 can decode the data to be processed in the first format to obtain data to be processed in a target format that can be recognized by the game inference device 300 (i.e., the cloud platform), and provide it to the inference module 302. Decoding data in one format to obtain data in another format is already applied in practice, and the process is not repeated in this embodiment. If the format of the data to be processed in the inference request is the target format, the communication module 301 can send the data to be processed directly to the inference module 302. Correspondingly, when the game terminal 200 provides training data to the game inference apparatus 300, the format of the training data may also be processed accordingly to obtain data in a format conforming to the deployment environment of the cloud platform.

When another client sends an inference request to the game inference device 300, the communication module 301 may also detect whether the second format of the data to be processed in that inference request is the same as the target format, and when the second format is inconsistent with the target format, the communication module 301 may convert the data to be processed in the second format into data to be processed in the target format.
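
The format check and conversion performed by the communication module 301 can be sketched as a small decoder registry. The application does not say what the first, second or target formats are, so the format names below (`utf-8-json` as the target, `utf-16-json` as a client format) are purely illustrative assumptions.

```python
import json

TARGET_FORMAT = "utf-8-json"  # hypothetical name for the cloud-side format

# Hypothetical decoders for client-side formats, keyed by format name.
DECODERS = {
    "utf-16-json": lambda raw: json.loads(raw.decode("utf-16")),
}


def to_target_format(data_format: str, raw: bytes) -> dict:
    """Return the data to be processed as a dict in the target representation."""
    if data_format == TARGET_FORMAT:
        # Already in the target format: pass it on without conversion.
        return json.loads(raw.decode("utf-8"))
    decoder = DECODERS.get(data_format)
    if decoder is None:
        raise ValueError(f"unsupported format: {data_format}")
    return decoder(raw)


state = {"character": "A", "hp": 50}
from_first_format = to_target_format("utf-16-json", json.dumps(state).encode("utf-16"))
from_target_format = to_target_format("utf-8-json", json.dumps(state).encode("utf-8"))
```

Supporting an additional client environment then amounts to registering one more decoder, while the inference module only ever sees the target representation.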

S304: The game inference device 300 uses the inference service of the configured first game to infer the data to be processed in the inference request, and obtains indication information for the action and/or state of the target object.

In a specific implementation, after receiving the data to be processed provided by the communication module 301, the inference module 302 can call the pre-configured inference service. Because the inference service relies on the AI model trained in advance through the target training algorithm, the inference module 302 can first input the data to be processed into the AI model and obtain the indication information, output by the AI model, of the inferred action (such as attack or escape) and/or state (such as increased defense or emotion) of the target object.

Optionally, before obtaining the indication information, the reasoning module 302 may also use a heuristic algorithm to constrain the indication information output by the AI model. A heuristic algorithm is an algorithm constructed from intuition or experience that can give a feasible solution to each instance of the combinatorial optimization problem to be solved within acceptable time and space complexity. The feasible solution may or may not be the optimal solution, and its degree of deviation from the optimal solution is usually difficult to predict. In practical applications, constraint rules can be predefined in the heuristic algorithm. For example, in a battle game scenario, the action inferred by the reasoning module 302 for the non-player character (i.e., the target object) is "move backward", that is, "run away" in the direction away from the player character in the battle. However, there is currently an insurmountable obstacle behind the non-player character's position in the battle map and no obstacles in the remaining directions, so the non-player character cannot "move backward" in the battle map. In this case, the inference module 302 can use the heuristic algorithm to constrain the inferred action output by the AI model, specifically changing the inferred action "move backward" to "move left" or "move right", so as to improve the plausibility of the actions that the inference module 302 infers for non-player characters.
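
A predefined constraint rule of this kind can be written as a small post-processing step on the model output. The sketch below assumes a hypothetical fallback table and a set of currently infeasible actions supplied by the game state; neither structure is specified in the application.

```python
# Hypothetical constraint rule: alternatives to try when an action is infeasible.
FALLBACKS = {"move backward": ["move left", "move right"]}


def constrain_action(action, blocked_actions):
    """Replace an infeasible model output with the first feasible alternative."""
    if action not in blocked_actions:
        return action
    for alternative in FALLBACKS.get(action, []):
        if alternative not in blocked_actions:
            return alternative
    # No feasible alternative is known: leave the model output unchanged.
    return action


# An obstacle behind the character makes "move backward" infeasible.
constrained = constrain_action("move backward", {"move backward"})
```

The rule does not guarantee that the substitute is the best available action, which matches the description of a heuristic as giving a feasible rather than necessarily optimal solution.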

S305: The game reasoning apparatus 300 returns the indication information for the action and/or state of the target object to the game terminal 200.

In practical application, if the game terminal 200 and the game inference device 300 are deployed in different environments, the communication module 301 encodes the indication information of the action and/or state of the target object output by the inference module 302 into indication information in the first format that the game terminal 200 can recognize, and sends the indication information in the first format to the game terminal 200, so that the game terminal 200 can recognize the indication information and set the target object to perform the corresponding action at the next moment or to show the corresponding state. In this way, since the game reasoning apparatus 300 can provide inference services for game terminals 200 deployed in various environments, the requirements on the deployment environment of the game terminal 200 are reduced, and the universality of the game inference service that the game reasoning apparatus 300 provides to game terminals 200 is improved.

In the above embodiment, the process by which the game reasoning apparatus 300 trains the AI model and uses the trained AI model to infer the action and/or state of the target object is described. To facilitate further understanding of the technical solutions of the embodiments of the present application, an application example in a game scenario is introduced below. In this application scenario, the game developer expects the game reasoning device 300 to provide a series of inferred actions for non-player character A (referred to as character A) in the game application instance, so that character A can defeat non-player character B (referred to as character B) in the game, and character A can achieve victory over character B using series of inferred actions in three different combat styles: "aggressive", "conservative" and "balanced". The initial blood volume (HP) of character A and character B is the same, their attack power and defense power are the same, and the actions they may perform are also the same. If character A attacks character B within the specified number of steps and reduces character B's HP to 0 while character A still has HP, character A wins.

Referring to FIG. 6 , a schematic flowchart of a method for configuring a game reasoning service on a cloud platform combined with a specific game scene provided by an embodiment of the present application is shown. The method can be applied to the game reasoning apparatus 300 shown in FIG. 7 . Wherein, based on the game inference apparatus 300 shown in FIG. 2 , the game inference apparatus 300 shown in FIG. 7 may further include an object storage service module 304 , a deployment module 305 and an online cloud service module 306 .

As shown in Figure 6, the method may specifically include:

S601: The object storage service module 304 prestores program codes for implementing the communication module 301, the game algorithm framework, the reasoning module 302 and the configuration module 303.

The program codes of the communication module 301, the game algorithm framework, the reasoning module 302 and the configuration module 303 can be developed in advance by a technician and stored in the object storage service module 304. In practical application, the communication module 301 may be implemented, for example, with the Flask framework.

S602: The deployment module 305 deploys the program code stored by the object storage service module 304 on the cloud platform, and publishes it as an online service.

In this embodiment, the communication module 301 and the reasoning module 302 that provide game inference services for the game terminal 200 may form a cloud service and be deployed on a cloud platform or the like, to support providing services to game developers online. In practical applications, the game inference device 300 (or the communication module 301, the game algorithm framework, the inference module 302 and the configuration module 303) can be deployed in a cloud center or an edge center as a game AI framework.

S603: When the game developer 101 subscribes to the online service, the online service module 306 pulls from the storage node the program code that implements the communication module 301, the game algorithm framework, the reasoning module 302 and the configuration module 303, and deploys it on the computing nodes in the cloud center or the edge center, so that the computing resources on the computing nodes can be used to support the training of AI models and the action reasoning process.

In practical application, after the deployment module 305 publishes the program code for realizing the inference service of the game as an online service, the game developer 101 can subscribe to the online service on the cloud platform through the game terminal 200, to trigger the game inference device 300 to configure the inference service of the game on the cloud platform for the game developer 101.

Further, after the online service module 306 successfully deploys on the computing node the program code that implements the communication module 301, the game algorithm framework, the reasoning module 302 and the configuration module 303, the communication module 301 can provide an API to the game terminal 200, so that the game terminal 200 can establish a communication connection with the game reasoning device 300 through the API. Exemplarily, a long connection can be established between the game terminal 200 and the communication module 301 based on a protocol such as HTTP 1.1.

S604: The game terminal 200 receives the configuration file provided by the game developer 101 and forwards it to the game inference device 300. The configuration file defines the action space and state space of character A and character B, the type of training algorithm, the type of AI model trained by the training algorithm, the reward function used by the training algorithm to train the AI model, an environment variable indicating the storage address of the AI model, and the specification of the computing resources.

S605: The configuration module 303 in the game inference device 300 invokes the corresponding type of training algorithm and AI model from the game algorithm framework according to the configuration file received by the communication module 301.

The action space of character A and character B can include walk forward, walk backward, walk left, walk right, attack 1, attack 2, attack 3, attack 4, attack 5, run forward, jump backward, run left, run right, and so on. The state space includes states of character A and character B such as health, position, orientation, mana, attack and defense.
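
The items that the configuration file defines can be pictured as a small structured document. The sketch below is only one possible layout: every key name, the algorithm and model type names, the environment variable name and the resource figures are hypothetical placeholders, since the application does not specify a file format. Only the action and state entries are taken from the description.

```python
import json

# Hypothetical configuration layout; key names and placeholder values are assumptions.
config = {
    "action_space": [
        "walk forward", "walk backward", "walk left", "walk right",
        "attack 1", "attack 2", "attack 3", "attack 4", "attack 5",
        "run forward", "jump backward", "run left", "run right",
    ],
    "state_space": ["health", "position", "orientation", "mana", "attack", "defense"],
    "training_algorithm": "example_rl_algorithm",      # placeholder type name
    "model_type": "example_policy_network",            # placeholder type name
    "reward_function": "balanced_reward",              # placeholder identifier
    "model_storage_env_var": "MODEL_STORAGE_PATH",     # placeholder variable name
    "compute": {"cpu_cores": 8, "gpus": 1, "memory_gb": 32},  # placeholder spec
}

REQUIRED_KEYS = {
    "action_space", "state_space", "training_algorithm", "model_type",
    "reward_function", "model_storage_env_var", "compute",
}


def validate_config(cfg: dict) -> dict:
    """Reject a configuration that lacks any of the items listed in S604."""
    missing = REQUIRED_KEYS - cfg.keys()
    if missing:
        raise ValueError(f"missing configuration items: {sorted(missing)}")
    return cfg


serialized = json.dumps(validate_config(config), indent=2)
```

A second game would supply its own file with the same keys, which is what lets one game algorithm framework serve several games without specialized design.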

Different reward functions are used to train AI models that infer different fighting styles. As an example, when training an AI model of the "balanced" fighting style, the reward function defined by the game developer can be as shown in formula (1):

[Formula (1) is not reproduced in this text.]

where the left-hand side of formula (1) represents the reward value calculated by the reward function, hp represents blood volume, t and t-1 represent different moments, and α is a preset coefficient value.
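
Formula (1) itself is not available in this text, so the following sketch should not be read as that formula. Purely to illustrate how a reward can be built from HP values at moments t-1 and t together with a coefficient α, it uses an assumed form: the damage dealt to the opponent minus α times the damage taken. The function name and parameter names are hypothetical.

```python
def hp_reward(own_hp_prev, own_hp_now, opp_hp_prev, opp_hp_now, alpha=1.0):
    """Assumed illustrative reward, NOT formula (1) of the application.

    own_hp_prev / own_hp_now: the character's HP at moments t-1 and t.
    opp_hp_prev / opp_hp_now: the opponent's HP at moments t-1 and t.
    alpha: preset coefficient weighting the damage taken.
    """
    damage_dealt = opp_hp_prev - opp_hp_now
    damage_taken = own_hp_prev - own_hp_now
    return damage_dealt - alpha * damage_taken


# Character A deals 30 damage and takes 10 in one step, with alpha = 0.5.
example_reward = hp_reward(100, 90, 100, 70, alpha=0.5)
```

Under this assumed form, a larger α penalizes lost HP more heavily and would push training toward cautious play, while a smaller α would favor attacking; this is one way different reward functions could yield different fighting styles.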

After receiving the configuration file, the configuration module 303 in the game inference device 300 can, according to the computing resource specification in the configuration file, allocate computing resources that meet the specification on the computing nodes to the game inference device 300, so that the game inference service configured by the game inference device 300 runs on these computing resources. In addition, the configuration module 303 can call the corresponding type of training algorithm and AI model from the game algorithm framework according to the type of training algorithm and the type of AI model specified in the configuration file, so as to use the selected training algorithm to train at least one AI model. Exemplarily, the network structure of the AI model in the game algorithm framework may include a Dropout layer and an L2 regularization term, so as to improve the generalization performance of the AI model.

Since the AI model called from the game algorithm framework has not been trained yet, its reasoning effect is poor. Therefore, the game reasoning apparatus 300 may further perform step S606 to obtain training data for training the AI model.

S606: Start multiple instances of the same game application on the game terminal 200, and send multiple copies of training data to the communication module 301 by running a preset script. Wherein, each game application instance includes character A and character B, and each game application instance generates a piece of training data for character A and character B.

The script on the game terminal 200 can be developed by technical personnel in advance, and the script can support the communication between the game terminal 200 and the game reasoning device 300 when running.

In this embodiment, multiple game application instances are simultaneously run on the game terminal 200 and multiple copies of training data are generated in parallel, which can speed up the acquisition of training data by the game inference device 300 , thereby speeding up the model training process of the game inference device 300 .

S607 : The communication module 301 decodes the training data sent by the game terminal 200 to obtain the training data in the target format that can be recognized by the inference module 302 , and provides the training data to the configuration module 303 .

In practical applications, the game terminal 200 and the game inference device 300 may be deployed in different environments. Therefore, after receiving the training data sent by the game terminal 200, the communication module 301 can decode the training data into training data in the target format that can be recognized by the game inference device 300.

In a possible implementation, before sending the training data to the configuration module 303, the communication module 301 may further preprocess the training data. For example, the communication module 301 can standardize information such as game maps and location coordinates in each piece of training data, and add corresponding features for describing information such as character distance and orientation.
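
The preprocessing step can be sketched as follows. The sample layout (`a_pos`, `b_pos`), the choice of dividing coordinates by the map size, and the use of Euclidean distance and a bearing angle as the added features are all assumptions made for illustration.

```python
import math


def preprocess(sample, map_width, map_height):
    """Scale positions to [0, 1] and add distance and bearing features."""
    ax, ay = sample["a_pos"]
    bx, by = sample["b_pos"]
    a = (ax / map_width, ay / map_height)
    b = (bx / map_width, by / map_height)
    dx, dy = b[0] - a[0], b[1] - a[1]
    features = dict(sample)  # keep all other fields unchanged
    features.update(
        a_pos=a,
        b_pos=b,
        distance=math.hypot(dx, dy),  # distance between the two characters
        bearing=math.atan2(dy, dx),   # direction from character A to character B
    )
    return features


features = preprocess({"a_pos": (0, 0), "b_pos": (30, 40), "a_hp": 100}, 100, 100)
```

Standardizing coordinates in this way keeps samples from maps of different sizes on a comparable scale before they reach the configuration module 303.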

S608: The configuration module 303 runs multiple processes, and uses the training data forwarded by the communication module 301 to train multiple AI models in parallel.

As an implementation example, the configuration module 303 may train the AI models in a distributed manner. Specifically, the configuration module 303 may include multiple processes, and each process may train an AI model based on one or more pieces of training data. For each AI model, the configuration module 303 may input the data of character A in the training data into the AI model and obtain the inferred action of character A output by the AI model; the inferred action is then used to play against character B, and the resulting game outcome is fed back to adjust the parameters of the AI model. In this way, the efficiency with which the configuration module 303 trains multiple AI models can be improved. The model training process for character B is similar to that for character A; reference may be made to the relevant description, which is not repeated here. As shown in Figure 8, in the initial training stage of the AI model, when the AI model for character A has been iteratively trained about 200 times, the remaining HP of character A at the end of the battle is lower than the remaining HP of character B at the end of the battle. However, as the number of training iterations increases, when the iterative training reaches 100, the remaining HP of character A at the end of the battle begins to be higher than that of character B, that is, character A can defeat character B. This is also reflected in the graph of the win rates of both sides shown in Figure 9: when the iterative training reaches 100 times, the winning rate of character A is close to 100%. When the iterative training of the AI model reaches the convergence condition, for example, when the AI model makes the winning rate of character A reach a preset value (e.g., 98%) during the most recent preset number of iterations (e.g., 20), the configuration module 303 can continue to train the AI model for character B in a similar manner.
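
The convergence condition can be sketched as a sliding-window check. The sketch assumes one reading of the condition, namely that the win rate must be at or above the preset value in every one of the most recent iterations; the function names are hypothetical.

```python
from collections import deque


def make_convergence_checker(threshold=0.98, window=20):
    """Return a function that records a win rate and reports convergence."""
    recent = deque(maxlen=window)  # holds only the most recent `window` values

    def update(win_rate):
        recent.append(win_rate)
        # Converged only once the window is full and every value meets the threshold.
        return len(recent) == window and min(recent) >= threshold

    return update


# Short window of 3 iterations to keep the example readable.
check = make_convergence_checker(threshold=0.98, window=3)
history = [check(rate) for rate in [0.99, 0.97, 0.98, 0.99, 1.0]]
```

In the example, convergence is reported only at the fifth iteration, once the 0.97 value has dropped out of the window. An alternative reading, in which the average over the window is compared with the preset value, would replace `min(recent)` with `sum(recent) / len(recent)`.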

Then, through hyperparameter search and population evolution, the AI models of the above-mentioned three combat styles of "aggressive", "conservative" and "balanced" can be trained. For the specific implementation process of hyperparameter search and population evolution, reference may be made to the descriptions in the above-mentioned embodiments, which will not be repeated here.

For the AI model obtained by training, the configuration module 303 may save the AI model at the storage address specified in the configuration file.

S609: The configuration module 303 feeds back a notification of the completion of the AI model training to the game terminal 200 through the communication module 301.

In this embodiment, when data communication is performed between the game terminal 200 and the game inference device 300, the communication module 301 can complete the format conversion of the communication data, so that the communication parties can mutually identify the communication data sent by the other party.

In practical application, the game developer 101 can view, through the game terminal 200, the data generated while the game reasoning apparatus 300 trains the AI model. For example, the game developer 101 can view the training effect on the interface of the cloud platform through the game terminal 200. As shown in FIG. 10, the game developer 101 can view on the interface of the cloud platform the AI models of the three combat styles obtained by the population evolution method, and the change curve of the winning rate of character A during model training; alternatively, the game developer 101 can view on the interface of the cloud platform the change curve of the blood volume of the two sides as shown in FIG. 8, or the change curve of the winning rate of the two sides as shown in FIG. 9.

S610: The game terminal 200 sends an action inference request to the game inference device 300 for requesting action inference for the character A, where the action inference request includes the game screen of the character A and the character B and the identification of the fighting style.

S611 : After the communication module 301 obtains the game screen and the fighting style identifier in the target format after conversion, it sends it to the reasoning module 302 .

S612: The reasoning module 302 uses the AI model corresponding to the identifier of the fighting style to infer the action instruction information of the character A according to the game screen of the target format.

S613: The reasoning module 302 sends the action indication information of the character A to the communication module 301.

S614: After completing the format conversion of the action indication information, the communication module 301 sends the action indication information in a format that can be recognized by the game terminal 200 to the game terminal 200.
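
Steps S610 to S614 amount to selecting a model by the fighting style identifier and returning its output. In the sketch below, three trivial stand-in functions take the place of the trained AI models, and the request and response field names are hypothetical.

```python
# Stand-ins for the three trained AI models; real models would process the screen.
def aggressive_model(screen):
    return "attack 1"


def conservative_model(screen):
    return "walk backward"


def balanced_model(screen):
    return "walk left"


MODELS = {
    "aggressive": aggressive_model,
    "conservative": conservative_model,
    "balanced": balanced_model,
}


def infer_action(request):
    """Pick the model matching the style identifier and infer character A's action."""
    style = request["style"]
    if style not in MODELS:
        raise ValueError(f"unknown fighting style: {style}")
    action = MODELS[style](request["screen"])
    return {"character": "A", "style": style, "action": action}


indication = infer_action({"style": "aggressive", "screen": b"<game screen bytes>"})
```

Keeping the style identifier in the request lets a single endpoint serve all three models, so the game terminal needs no separate connection per fighting style.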

The method for configuring a game reasoning service on a cloud platform provided by the embodiments of the present application is described above with reference to FIGS. 1 to 10. Next, computing devices that implement the functions of the game reasoning apparatus 300 in the above method embodiments are described with reference to the accompanying drawings.

Figure 11 provides a computing device cluster. As shown in FIG. 11 , the computing device cluster 1100 can be specifically used to implement the functions of the game reasoning apparatus 300 shown in FIG. 3 .

Computing device cluster 1100 includes at least one computing device, where each computing device may include a bus 1101 , a processor 1102 and a memory 1103 . The processor 1102 and the memory 1103 communicate through the bus 1101 .

The bus 1101 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like. The bus can be divided into address bus, data bus, control bus and so on. For ease of presentation, only one thick line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.

The processor 1102 can be any one or more of processors such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MP), a digital signal processor (DSP) or a neural network processing unit (NPU).

The memory 1103 may include volatile memory, such as random access memory (RAM). The memory 1103 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD) or a solid state drive (SSD).

Executable program codes are stored in the memory 1103, and the processor 1102 executes the executable program codes to execute the aforementioned method for configuring an inference service of a game on a cloud platform executed by the game inference device 300.

Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium may be any available medium that a computing device can store, or a data storage device such as a data center that contains one or more available media. The usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state drives), and the like. The computer-readable storage medium includes instructions, and the instructions instruct the computing device to execute the method for configuring an inference service for a game on a cloud platform, which is executed by the game inference apparatus 300 described above.

The embodiments of the present application also provide a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on the computing device, all or part of the processes or functions described in the embodiments of the present application are generated.

The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer or data center to another website, computer or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).

The computer program product can be a software installation package, which can be downloaded and executed on a computing device when the aforementioned method for configuring an inference service of a game on a cloud platform needs to be used.

The descriptions of the processes or structures corresponding to each of the above-mentioned drawings have their own emphasis, and for the parts that are not described in detail in a certain process or structure, reference may be made to the related descriptions of other processes or structures.

Claims

A method for configuring a game inference service on a cloud platform, characterized in that the method comprises:

obtaining a first configuration file, where the first configuration file includes configuration information for a first game; and

configuring an inference service of the first game on the cloud platform based on a game algorithm framework of the cloud platform and the first configuration file.
The method according to claim 1, wherein the method further comprises:

obtaining a second configuration file, where the second configuration file includes configuration information for a second game; and

configuring an inference service of the second game on the cloud platform based on the game algorithm framework of the cloud platform and the second configuration file.
The method according to claim 1 or 2, wherein the method further comprises:

using the inference service of the first game to respond to an inference request sent by a game terminal, wherein the game terminal includes a device running a game application instance of the first game, the inference request includes to-be-processed data of a target object in the game application instance of the first game, and the response includes indication information for an action and/or a state of the target object.
The method according to any one of claims 1-3, wherein the first configuration file includes one or more of the following configuration information: an action space of a target object in a game application instance of the first game, a state space of the target object in the game application instance of the first game, a first type of a target training algorithm, a second type of an artificial intelligence (AI) model, a reward function, a training method of the AI model, an inference method of the AI model, a storage address of the AI model, and a specification of computing resources used for training and inference of the AI model.
The method according to any one of claims 1-4, wherein the configuring of the inference service of the first game on the cloud platform based on the game algorithm framework of the cloud platform and the first configuration file comprises:

training at least one AI model based on the first configuration file and the game algorithm framework; and

configuring the inference service of the first game according to the at least one AI model that has been trained.
The method according to claim 5, wherein the training of at least one AI model comprises:

receiving multiple training requests from the game terminal, wherein the multiple training requests are from multiple game application instances of the first game, and different training requests include different training data for the same target object in the multiple game application instances; and

training the at least one AI model according to the training data in the multiple training requests.
The method according to claim 5 or 6, wherein when the at least one AI model includes a first AI model and a second AI model, the hyperparameters of the first AI model and the second AI model are different, and/or the reward functions corresponding to the first AI model and the second AI model are different.
The method according to any one of claims 5-7, wherein when the at least one AI model includes a first AI model and a second AI model, the cloud platform runs a first process and a second process, and the training of the at least one AI model according to the training data in the multiple training requests comprises:

sending the training data in the multiple training requests to the first process and the second process according to the port number and/or IP address of the first process and the port number and/or IP address of the second process; and

training the first AI model using the first process and the training data received by the first process, and training the second AI model using the second process and the training data received by the second process.
The method according to any one of claims 5-8, wherein the training of at least one AI model based on the first configuration file and the game algorithm framework comprises:

invoking, in the game algorithm framework, the target training algorithm of the first type and the at least one AI model of the second type, according to the first type of the target training algorithm and the second type of the AI model in the first configuration file; and

training the at least one AI model of the second type based on the invoked target training algorithm of the first type.
The method according to any one of claims 1-9, wherein when the data formats of the game terminal and the cloud platform are different, before the inference service of the first game is used to respond to the inference request sent by the game terminal, the method further comprises:

processing the format of the data in the inference request sent by the game terminal, to obtain data in a data format that can be recognized by the cloud platform.
The method according to any one of claims 1-10, wherein a long connection is maintained between the cloud platform and the game terminal, and the cloud platform receives and responds, through the long connection, to the inference request sent by the game terminal.
The method according to any one of claims 1-11, wherein the obtaining of the first configuration file comprises:

obtaining the first configuration file based on a configuration information item selected by a game developer.
An apparatus for configuring an inference service for a game, wherein the apparatus comprises:

a communication module, configured to obtain a first configuration file, where the first configuration file includes configuration information for a first game; and

a configuration module, configured to configure an inference service of the first game on a cloud platform based on a game algorithm framework of the cloud platform and the first configuration file.
The apparatus according to claim 13, wherein:

the communication module is further configured to obtain a second configuration file, where the second configuration file includes configuration information for a second game; and

the configuration module is further configured to configure an inference service of the second game on the cloud platform based on the game algorithm framework of the cloud platform and the second configuration file.
The apparatus according to claim 13 or 14, wherein the apparatus further comprises:

an inference module, configured to use the inference service of the first game to respond to an inference request sent by a game terminal, wherein the game terminal includes a device running a game application instance of the first game, the inference request includes to-be-processed data of a target object in the game application instance of the first game, and the response includes indication information for an action and/or a state of the target object.
The apparatus according to any one of claims 13-15, wherein the first configuration file includes one or more of the following configuration information: an action space of a target object in a game application instance of the first game, a state space of the target object in the game application instance of the first game, a first type of a target training algorithm, a second type of an artificial intelligence (AI) model, a reward function, a training method of the AI model, an inference method of the AI model, a storage address of the AI model, and a specification of computing resources used for training and inference of the AI model.
The apparatus according to any one of claims 13-16, wherein the configuration module is specifically configured to:

train at least one AI model based on the first configuration file and the game algorithm framework; and

configure the inference service of the first game according to the at least one AI model that has been trained.
The apparatus according to claim 17, wherein the configuration module is specifically configured to:

receive multiple training requests from the game terminal, wherein the multiple training requests are from multiple game application instances of the first game, and different training requests include different training data for the same target object in the multiple game application instances; and

train the at least one AI model according to the training data in the multiple training requests.
The apparatus according to claim 17 or 18, wherein when the at least one AI model includes a first AI model and a second AI model, the hyperparameters of the first AI model and the second AI model are different, and/or the reward functions corresponding to the first AI model and the second AI model are different.
The apparatus according to any one of claims 17-19, wherein when the at least one AI model includes a first AI model and a second AI model, the cloud platform runs a first process and a second process, and the configuration module is specifically configured to:

send the training data in the multiple training requests to the first process and the second process according to the port number and/or IP address of the first process and the port number and/or IP address of the second process; and

train the first AI model using the first process and the training data received by the first process, and train the second AI model using the second process and the training data received by the second process.
The apparatus according to any one of claims 17-20, wherein the configuration module is specifically configured to:

invoke, in the game algorithm framework, the target training algorithm of the first type and the at least one AI model of the second type, according to the first type of the target training algorithm and the second type of the AI model in the first configuration file; and

train the at least one AI model of the second type based on the invoked target training algorithm of the first type.
The apparatus according to any one of claims 13-21, wherein when the data formats of the game terminal and the cloud platform are different, before the inference service of the first game is used to respond to the inference request sent by the game terminal, the apparatus is further configured to:

process the format of the data in the inference request sent by the game terminal, to obtain data in a data format that can be recognized by the cloud platform.
The apparatus according to any one of claims 13-22, wherein a long connection is maintained between the cloud platform and the game terminal, and the cloud platform receives and responds, through the long connection, to the inference request sent by the game terminal.
The apparatus according to any one of claims 13-23, wherein the communication module is specifically configured to obtain the first configuration file based on a configuration information item selected by a game developer.
A computing device cluster, characterized in that the computing device cluster includes at least one computing device, and each computing device includes a processor and a memory;

the processor is configured to execute instructions stored in the memory, to cause the at least one computing device to perform the method of any one of claims 1 to 12.
A computer-readable storage medium, comprising instructions that, when executed on a computing device, cause the computing device to perform the method of any one of claims 1 to 12.
A computer program product comprising instructions which, when run on a computing device, cause the computing device to perform the method of any one of claims 1 to 12.
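
As a non-limiting illustration of the dispatch step recited above, in which training data is sent to a first process and a second process according to their IP addresses and/or port numbers, a minimal Python sketch follows. It rests on several assumptions that the present application does not specify: in-memory inboxes stand in for real network endpoints, every training request is delivered to every registered process, and the class name, method names, addresses, and model names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Address = Tuple[str, int]  # (IP address, port number) identifying one process


class TrainingDispatcher:
    """Routes training data to trainer processes identified by (IP, port)."""

    def __init__(self) -> None:
        # Which AI model each registered process is responsible for training.
        self._model_by_address: Dict[Address, str] = {}
        # In-memory inboxes stand in for real network endpoints (assumption).
        self.inboxes: Dict[Address, List[dict]] = defaultdict(list)

    def register_process(self, address: Address, model_name: str) -> None:
        """Record that the process at `address` trains `model_name`."""
        self._model_by_address[address] = model_name

    def model_for(self, address: Address) -> str:
        """Return the model name trained by the process at `address`."""
        return self._model_by_address[address]

    def dispatch(self, training_requests: List[dict]) -> None:
        """Deliver the training data of every request to every process."""
        for request in training_requests:
            for address in self._model_by_address:
                self.inboxes[address].append(request["training_data"])


# Two processes on one host, distinguished by port number.
dispatcher = TrainingDispatcher()
dispatcher.register_process(("10.0.0.1", 5001), "first_ai_model")
dispatcher.register_process(("10.0.0.1", 5002), "second_ai_model")

# Training requests from two game application instances of the same game.
dispatcher.dispatch([
    {"instance": "a", "training_data": {"state": 1, "action": "attack"}},
    {"instance": "b", "training_data": {"state": 2, "action": "retreat"}},
])
```

Each process would then train its own AI model on the data in its inbox, for example with different hyperparameters or a different reward function per model.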