CN113599802B

CN113599802B - Data processing method, device and system

Info

Publication number: CN113599802B
Application number: CN202110837835.2A
Authority: CN
Inventors: 刘舟; 杨帆; 黎广璘
Original assignee: Anhui Sanqi Jiyu Network Technology Co ltd
Current assignee: Anhui Sanqi Jiyu Network Technology Co ltd
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2024-01-16
Anticipated expiration: 2041-07-23
Also published as: CN113599802A

Abstract

The present disclosure relates to the field of machine learning, and in particular to a data processing method, apparatus, and system. The method comprises: when a first server is to control a target object to execute a next action, collecting environment data related to the target object, the target object being a proxy object in any scene; performing data type conversion on the environment data to obtain standard format data, and transmitting the standard format data to a second server, the standard format data comprising at least matrix dimension information and environment parameter information; and, when action data returned by the second server is received, controlling the target object to execute the next action according to the action data. The action data is obtained by the second server parsing the environment parameter information according to the matrix dimension information to obtain the series of environment parameters it contains, and performing inference on that series of environment parameters. Embodiments of the invention can train AI for different scenes with a single solution, reducing the research and development cost of AI.

Description

Data processing method, device and system

Technical Field

The present disclosure relates to the field of machine learning, and in particular, to a data processing method, apparatus, and system.

Background

In game scenes, players often disconnect because of factors beyond their control, such as network problems or device crashes, which degrades the game experience of the other players. This is especially pronounced in MOBA (Multiplayer Online Battle Arena) games, so an offline-hosting AI (Artificial Intelligence) is needed to solve the problem.

A traditional offline-hosting AI is written manually by developers using state machines, behavior trees, rule scripts, and the like. Such an AI behaves rigidly, follows a single routine, and cannot adjust its strategy to the situation at hand, and developing AI for complex scenes in this traditional way is very costly. An AI trained with reinforcement learning adapts to the environment more flexibly, behaves in varied ways, reacts to changing circumstances, and appears more realistic and human-like. Its training requires machine computing power such as GPUs and CPUs but no human intervention, so it can reduce research and development cost by comparison.

However, at present a solution for training AI with reinforcement learning can only be developed separately for each specific case. If AI must be trained for different game scenes, a corresponding solution has to be developed for the requirements of each scene, for example a communication standard code library, a serialization scheme for the communication entities defined by the training interaction, and an AI training-end standard code library, which increases the research and development cost of AI.

Disclosure of Invention

In view of the above defects or shortcomings, the invention provides a data processing method, device, and system. Embodiments of the invention can train AI for different scenes with a single solution, thereby reducing the research and development cost of AI.

The present invention provides, according to a first aspect, a data processing method, which in one embodiment is applied to a first server for controlling proxy objects in a plurality of scenarios to perform actions; the method comprises the following steps:

when the first server is to control a target object to execute a next action, collecting environment data related to the target object; the target object refers to a proxy object in any scene;

performing data type conversion on the environment data to obtain standard format data, and transmitting the standard format data to a second server; the standard format data at least comprises matrix dimension information and environment parameter information;

when action data returned by the second server is received, controlling the target object to execute the next action according to the action data; the action data is obtained by the second server analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information and deducing according to the series of environmental parameters.

In one embodiment, the action data is obtained by analyzing the environmental parameter information by the second server according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, converting the environmental parameters into model input data meeting the input data format requirement of the target model by using an analysis tool, and performing inference by using the target model based on the model input data; the target model is used to infer a next action of the target object.

In one embodiment, the step of transmitting the standard format data to the second server comprises:

serializing the standard format data into binary data;

transmitting the binary data to a second server based on a predetermined communication protocol; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server;

accordingly, after receiving the binary data, the second server deserializes the binary data to obtain the standard format data.

In one embodiment, the environmental data includes a plurality of parameters; the step of converting the data type of the environment data to obtain the standard format data comprises the following steps:

determining information types corresponding to all parameters in the environment data; the information type at least comprises a matrix dimension and an environment parameter;

passing each parameter in the environment data into the standard parameter that corresponds to the information type of the parameter and whose data type is a struct, to obtain the standard format data; the number of standard parameters is the same as the number of categories of information types.

The present invention provides, according to a second aspect, a data processing method, which in one embodiment is applied to a second server; the method comprises the following steps:

obtaining standard format data from a first server; the standard format data at least comprises matrix dimension information and environment parameter information; the standard format data is obtained by collecting environment data related to a target object when the first server controls the target object to execute the next action and converting the data type of the environment data, wherein the first server is used for controlling proxy objects in a plurality of scenes to execute the action, and the target object refers to the proxy object in any scene;

analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, and deducing to obtain action data according to the series of environmental parameters;

and transmitting the action data to the first server, so that the first server controls the target object to execute the next action according to the action data when receiving the action data returned by the second server.

In one embodiment, the step of inferring action data from the series of environmental parameters comprises:

converting the series of environmental parameters into model input data conforming to the input data format requirements of the target model using an analytical tool; the target model is used for deducing the next action of the target object;

and deducing action data based on the model input data by using the target model.

In one embodiment, prior to the step of obtaining the standard format data from the first server, the method comprises:

receiving binary data transmitted by a first server based on a predetermined communication protocol;

performing deserialization on the binary data to obtain standard format data; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server.

The present invention provides, according to a third aspect, a system comprising, in one embodiment, a first server for controlling proxy objects in a plurality of scenarios to perform actions, and a second server;

when a first server controls a target object to execute the next action, collecting environment data related to the target object, wherein the target object refers to a proxy object in any scene; performing data type conversion on the environment data to obtain standard format data, and transmitting the standard format data to a second server; the standard format data at least comprises matrix dimension information and environment parameter information;

The second server analyzes the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, deduces action data according to the series of environmental parameters, and transmits the action data to the first server;

when the first server receives the action data returned by the second server, the first server controls the target object to execute the next action according to the action data.

The present invention provides, according to a fourth aspect, a data processing apparatus for controlling, in one embodiment, proxy objects in a plurality of scenarios to perform actions; the device comprises:

the environment data acquisition module is used for acquiring environment data related to the target object when the target object is controlled to execute the next action; the target object refers to a proxy object in any scene;

the data processing module is used for carrying out data type conversion on the environment data to obtain standard format data and transmitting the standard format data to the second server; the standard format data at least comprises matrix dimension information and environment parameter information;

the control module is used for controlling the target object to execute the next action according to the action data when the action data returned by the second server are received; the action data is obtained by the second server analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information and deducing according to the series of environmental parameters.

The present invention provides, according to a fifth aspect, a data processing apparatus, which in one embodiment comprises:

the data acquisition module is used for acquiring standard format data from the first server; the standard format data at least comprises matrix dimension information and environment parameter information; the standard format data is obtained by collecting environment data related to a target object when the first server controls the target object to execute the next action and converting the data type of the environment data, wherein the first server is used for controlling proxy objects in a plurality of scenes to execute the action, and the target object refers to the proxy object in any scene;

the data processing module is used for analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, and deducing action data according to the series of environmental parameters;

and the transmission module is used for transmitting the action data to the first server, so that the first server controls the target object to execute the next action according to the action data when receiving the action data returned by the second server.

In the embodiment of the invention, when a first server controls a target object, namely an agent object of any one of a plurality of scenes, to execute a next action, environment data related to the target object is collected, the environment data is firstly subjected to data type conversion to obtain standard format data, and then the standard format data is transmitted to a second server; the standard format data at least comprises matrix dimension information and environment parameter information; after receiving the standard format data, the second server analyzes the environment parameter information according to the matrix dimension information to obtain a series of environment parameters included in the environment parameter information, deduces and obtains action data according to the series of environment parameters, and returns the action data to the first server.

Drawings

FIG. 1 is a diagram of an application environment for a data processing method in one embodiment;

FIG. 2 is a flow chart of a data processing method according to an embodiment;

FIG. 3 is a flow chart illustrating data type conversion performed by the first server according to one embodiment;

FIG. 4 is a flow chart of a data processing method according to another embodiment;

FIG. 5 is a flow diagram of interactions between a first server and a second server in a system in one embodiment;

FIG. 6 is a block diagram of a data processing apparatus in one embodiment;

FIG. 7 is a block diagram of a data processing apparatus according to another embodiment;

FIG. 8 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The invention provides a data processing method. In this embodiment, the data processing method may be applied to the application environment shown in FIG. 1. A first server controls proxy objects (agents) in different scenes to act. When the first server is to control the proxy object in any scene to perform its next action, it collects environment data related to the proxy object, converts the environment data into standard format data, and sends the standard format data to a second server. After receiving the standard format data, the second server parses the environment parameter information according to the matrix dimension information in the standard format data to obtain the series of environment parameters contained in the environment parameter information, infers action data from that series of environment parameters through a neural network, and returns the action data to the first server. From the action data, the first server knows how to control the proxy object's next action.

In this embodiment, the environment parameters of different scenes, and their formats, are abstracted. When the environment parameters of any scene are to be transmitted to the second server, the first server first converts them into data in a standard format, so no separate communication framework has to be developed for the characteristics (such as the number and types of parameters) of the environment parameters in each scene. Relevant personnel only need to set a matrix dimension for the environment parameters according to their characteristics in the scene; the second server can then parse the environment parameter information according to the matrix dimension information in the standard format data to obtain the series of environment parameters it contains, and carry out action inference on them. The AI in this embodiment may be a game AI, such as a monster AI, an NPC (non-player character) AI, a PVP (player versus player) AI, a PVE (player versus environment) AI, or a pet/satellite AI, which controls a virtual object in a game scene to perform corresponding actions; it may be a robot AI, which controls a robot to perform corresponding actions; or it may be an AI in another field.

Further, the process by which the first server uses the second server to infer the next action of the proxy object in any scene may occur in an AI training scenario or in an AI inference scenario (i.e., a scenario in which a trained AI is used for inference); that is, the data processing method provided in this embodiment may be applied in either scenario.

For example, in one embodiment, in an AI training scenario (the AI is hereinafter also referred to as a model), the second server is used to train the AI of the proxy objects of each scene. The second server may train the AIs sequentially or simultaneously, for example within the same reinforcement learning training period, and it may train them with the same neural network or with different neural networks. The first server converts the environment data related to the proxy objects in the different scenes into standard format data and sends it to the second server, and the second server uses the corresponding neural network to perform inference on the standard format data of each scene. Here, the neural network refers to a simulated neural network model constructed with a reinforcement learning framework.

The first server and the second server interact in a similar way to train each AI; the interaction for training one AI is described below.

The first server and the second server first establish a connection. The first server then sends initialization parameters to the second server to instruct it to set the corresponding training interaction parameters for the trainer, after which the interaction process below is repeated between the two servers to train the AI until a predetermined training end condition is met. When the end condition is met, for example when the reward or max step reaches its set value, the second server exports the AI's training result and saves it as a model file. AIs for different scenes may correspond to different model files. The training result may be saved in any of the formats commonly used in the industry, such as ".onnx", ".pth", or ".ckpt", or uniformly in one format, such as ".onnx"; this can be adjusted flexibly according to actual requirements and is not limited in this embodiment. When training an AI, the interaction process is repeated on the order of millions or even tens of millions of times or more. For example, the well-known Go AI AlphaGo was trained by completing hundreds of billions of rounds of game play.

The initialization parameters may include information such as a seed value, a communication version, a packet version, and training capability parameters. The training capability parameters may in turn include information such as a basic reinforcement learning capability, a connection PNG observation setting value, a compression channel transmission mapping, a hybrid action, a training analysis, a variable length observation setting value, and whether multiple proxy groups are used. The initialization parameters standardize the interaction between the first server and the second server: the seed value, communication version, packet version, and similar information form the basis of the training interaction between the two servers and are mainly used to verify the legitimacy of both parties, while the training capability parameters are basic configurations for the training end (in the AI training scenario, the second server may also be called the training end) that make it exchange data in a specified mode. The training interaction parameters may include information such as an environment parameter configuration and an action space configuration. The action space configuration may include configuration information such as an action dimension, an action value, a performance name, and a training agent group ID, where the performance name distinguishes different model configurations and the training agent group ID groups the different proxy objects of the first server.
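As an illustration of how the initialization parameters could be used to verify the two parties before training starts, the following is a minimal sketch. The field names (`seed`, `communication_version`, `packet_version`, `capabilities`), the version strings, and the `validate_handshake` helper are all hypothetical; the description only lists the kinds of information exchanged, not a concrete layout.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InitParams:
    """Hypothetical container for the initialization parameters."""
    seed: int
    communication_version: str
    packet_version: str
    capabilities: dict = field(default_factory=dict)  # training capability flags


def validate_handshake(sent: InitParams, supported_versions: set) -> bool:
    """Second-server side check: accept only a known communication version."""
    return sent.communication_version in supported_versions


params = InitParams(
    seed=42,
    communication_version="1.0.0",  # assumed value
    packet_version="0.1.0",         # assumed value
    capabilities={"hybrid_actions": True, "multi_agent_groups": False},
)
accepted = validate_handshake(params, {"1.0.0"})
rejected = validate_handshake(params, {"2.0.0"})
```

In this sketch a version mismatch simply yields `False`; a real training end would presumably refuse the connection at that point.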

The interaction process is as follows:

Step 1: when the first server is to control the proxy object (i.e., the agent) in a target scene to execute its next action, the first server collects environment data related to the proxy object, converts the environment data into standard format data, and sends the standard format data to the second server;

Step 2: the second server uses the neural network to perform inference on the standard format data to obtain action data, and sends the action data to the first server;

Step 3: the first server controls the proxy object to execute the next action according to the action data.
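The three steps above can be sketched as a simple loop. This is only an in-process illustration: both "servers" are plain functions, the environment values and the action names are made up, and random choice stands in for neural-network inference.

```python
import random


def collect_environment_data(step):
    """First server: stand-in for reading a scene's state (values are made up)."""
    return {"dims": (1, 2), "values": [float(step), 0.5]}


def infer_action(standard_data):
    """Second server: stand-in for neural-network inference."""
    return {"action": random.choice(["run", "jump"])}


def apply_action(agent_log, action_data):
    """First server: make the proxy object execute the action."""
    agent_log.append(action_data["action"])


def run_episode(num_steps):
    agent_log = []
    for step in range(num_steps):
        standard_data = collect_environment_data(step)  # Step 1
        action_data = infer_action(standard_data)       # Step 2
        apply_action(agent_log, action_data)            # Step 3
    return agent_log


log = run_episode(3)
```

In the described system Steps 1 and 3 run on the first server and Step 2 on the second server, with a network round trip between them on every iteration.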

In each interaction, after receiving the standard format data, the second server extracts from it at least a number of environment parameters (the types and number of environment parameters may differ between scenes) and calculates the current reward from them. If the current reward reaches the preset maximum reward, or the current number of training steps reaches max step (the maximum number of training steps), the second server determines that training is complete and exports the training result. Otherwise, it converts the environment parameters into model input data, uses the neural network to infer action data from the model input data, and returns the action data to the first server, so that the first server can control the proxy object to execute the next action according to the action data.
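The branching described in this paragraph might look as follows. The reward function (a plain sum of the parameters), the default thresholds, and the `"noop"` action are assumptions made for illustration only; the description does not specify how the reward is computed.

```python
def training_finished(current_reward, step, max_reward, max_step):
    """Return True when either end condition from the text is met."""
    return current_reward >= max_reward or step >= max_step


def second_server_step(env_params, step, max_reward=10.0, max_step=1000):
    # Hypothetical reward: the sum of the environment parameters.
    current_reward = sum(env_params)
    if training_finished(current_reward, step, max_reward, max_step):
        # A real training end would export the training result here.
        return {"done": True, "action": None}
    # Otherwise convert to model input and infer (inference is stubbed out).
    model_input = [float(p) for p in env_params]
    return {"done": False, "action": "noop", "input": model_input}
```

For instance, `second_server_step([6.0, 5.0], step=5)` ends training because the assumed reward of 11.0 exceeds the maximum of 10.0, while `second_server_step([1.0, 2.0], step=5)` continues and returns an action.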

In another embodiment, in an AI inference scenario, the trained AIs of a plurality of scenes may be deployed on the second server. When the first server needs to control the proxy object of any scene to perform its next action, it converts the environment data related to the proxy object into standard format data and sends it to the second server. The second server extracts the environment parameters, converts them into model input data, and calls the model file corresponding to the proxy object to infer action data from the model input data. The action data is returned to the first server, so that the first server can control the proxy object to perform the next action according to the action data.
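One way to picture the per-scene model lookup in the inference scenario is a registry keyed by scene. In the sketch below the scene identifiers, the action names, and the lambda "models" are hypothetical stand-ins for model files loaded on the second server.

```python
from typing import Callable, Dict, List

# Hypothetical registry: one inference callable per scene, standing in for
# the per-scene model files deployed on the second server.
MODEL_REGISTRY: Dict[str, Callable[[List[float]], str]] = {
    "game_a": lambda x: "attack" if x[0] > 0.5 else "run",
    "game_b": lambda x: "jump",
}


def infer_for_scene(scene_id: str, env_params: List[float]) -> str:
    """Pick the model that belongs to the scene and infer the next action."""
    try:
        model = MODEL_REGISTRY[scene_id]
    except KeyError:
        raise ValueError(f"no trained model deployed for scene {scene_id!r}")
    model_input = [float(p) for p in env_params]
    return model(model_input)
```

Because every scene sends data in the same standard format, the dispatch code does not need to know anything scene-specific beyond the key used to select the model.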

The first server and the second server may be implemented by independent servers or a server cluster formed by a plurality of servers.

Example 1

In this embodiment, the data processing method provided by the invention includes the steps shown in FIG. 2, described below taking as an example its application to the first server in FIG. 1.

S110: when the target object is to be controlled to execute the next action, collect environment data related to the target object.

The first server is used for controlling proxy objects in a plurality of scenes to execute actions, and the target object refers to the proxy object in any scene. The plurality of scenes refers to a plurality of scenes in the same field (for example, all in the game field or all in the robot field).

Illustratively, in the game field, the plurality of scenes may be a plurality of games, such as game A, game B, and game C. In that case, the proxy objects in the plurality of scenes may be monster A in game A, monster B in game B, and monster C in game C (each monster being a proxy object), each of which can perform each step of its actions (such as running, jumping, lying down, or attacking forward, backward, left, or right) under the control of the first server. When the first server is to control any monster to perform its next action, it can collect the environment data related to that monster. Note that the environment data to be collected for a monster may differ between games. Suppose, for instance, that the environment data to be collected for monster A in game A includes at least the game identifier of game A, the monster identifier of monster A, and the attack power and position of monster A, while the environment data to be collected for monster B in game B includes at least the game identifier of game B, the monster identifier of monster B, and the health, attack power, and position of monster B.

For example, in the field of robots, the scene may be an application scene of a robot, and the proxy objects in the multiple scenes may refer to robots in multiple application scenes, such as an a robot in a home scene, and a B robot in an office scene. The first server may control each robot to perform actions (e.g., make various sounds, shake, kick, etc.), and the first server may collect current environmental data of each robot (e.g., volume, light intensity, voice, etc. data collected by various sensors disposed in the robot and uploaded by the robot).

S120: perform data type conversion on the environment data to obtain standard format data, and transmit the standard format data to a second server, wherein the standard format data at least comprises matrix dimension information and environment parameter information.

The first server can convert the collected environment data into standard format data through a universal communication framework and transmit the standard format data to the second server. The communication framework can be understood as a code implementation of a defined communication standard, which can be embedded in software code in the form of a software dependency library (also called a function dependency library), i.e., an SDK (Software Development Kit).

In one embodiment, the step of performing data type conversion on the environmental data to obtain standard format data, as shown in fig. 3, includes:

S121: determine the information type corresponding to each parameter in the environment data.

S122: pass each parameter in the environment data into the standard parameter that corresponds to its information type and whose data type is a struct, to obtain the standard format data.

The environment data comprises a plurality of parameters, and the number of the parameters of the environment data is different in different scenes. The information type at least comprises a matrix dimension and an environment parameter, wherein the matrix dimension information refers to a standard parameter of which the information type is the matrix dimension, and the environment parameter information refers to a standard parameter of which the information type is the environment parameter. It should be noted that, the types and the number of the information types may be defined according to the requirements of the actual scene, the standard parameters are in one-to-one correspondence with the information types, that is, the number of the standard parameters is the same as the number of the types of the information types, and the first server stores the parameters in the environmental data belonging to the same information type in one standard parameter during conversion. The data type of the standard parameter is a structure (i.e. struct), and can be used for storing a group of data of different or same type.
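Steps S121 and S122 can be illustrated with a small sketch. It assumes, purely for illustration, that the collected environment data arrives as a dictionary and that the key `"shape"` marks the matrix dimension; the class and field names are likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StandardFormatData:
    """One struct-like standard parameter per information type."""
    matrix_dimension: Tuple[int, ...] = ()
    environment_parameters: List[float] = field(default_factory=list)


def to_standard_format(env_data: Dict[str, object]) -> StandardFormatData:
    out = StandardFormatData()
    for name, value in env_data.items():
        # S121: determine the information type of the parameter.
        if name == "shape":  # hypothetical key for the matrix dimension
            out.matrix_dimension = tuple(value)
        else:
            # S122: store parameters of the same type in one standard parameter.
            out.environment_parameters.append(float(value))
    return out


data = to_standard_format({"shape": (1, 3), "attack": 12, "x": 3.5, "y": -1.0})
```

Here two information types are defined, so the result holds exactly two standard parameters regardless of how many raw parameters the scene supplies.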

For example, suppose AI must be trained for multiple games. The basic data types used by the games in the training scenario and in the inference scenario can first be identified, and their common features then found according to the scenarios in which they are used, so as to define the corresponding information types. For example, the defined information types are environment parameter and matrix dimension, and the corresponding standard parameters are FloatData and Observation, respectively, where:

FloatData is a collection of float-type data. Because the parameters of a game environment can be expressed as floating-point numbers (float), a variable-length float storage structure can be defined; different games differ only in the number of game parameters, all of which can be stored in the float data in preparation for subsequent transmission;

Observation contains attributes such as the collected data type, of String type, and a variable-dimension tuple, of tuple type. Observation stores the matrix dimension of the environment parameters, and different games can set its values according to their own environment parameters.

In another example, in addition to the two information types above, the defined information types may also include vector sensor and proxy object information, whose corresponding standard parameters are VectorSensor and AgentInfo, respectively, where:

VectorSensor contains attributes such as a collection of Double-type data, a proxy object name of String type, and an Observation struct;

AgentInfo contains attributes such as a reward (reward value) attribute of float type, a done (end-of-episode flag) attribute of boolean type, an id attribute of String type, a groupId attribute of String type, and a maxstepread attribute of boolean type.

When the first server performs the conversion, it passes each parameter in the environment data into the standard parameter that corresponds to its information type and whose data type is a struct. The conversion is described below taking as an example the parameters in the environment data that belong to the two information types environment parameter and matrix dimension, which are converted into the standard parameters FloatData and Observation (other standard parameters are converted in the same way).

For example, suppose game scene A is a balance-ball game. The environment parameters to be transmitted are six values such as the coordinates and direction of the ball, and the data matrix has dimension 1×6. During conversion, the six environment parameters are passed into the standard parameter FloatData and the matrix dimension 1×6 is passed into the standard parameter Observation. As another example, suppose game scene B is ping-pong. The environment parameters to be transmitted are eight values such as the coordinates and speed of the ball and the positions of the two players, and the data matrix has dimension 1×8. During conversion, the eight environment parameters are passed into FloatData and the matrix dimension 1×8 is passed into Observation.
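The two game examples can be written out as a short sketch. `FloatData` and `Observation` are the names used in the description, but their fields here (`values`, `data_type`, `shape`), the `pack` helper, and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FloatData:
    values: List[float]     # variable-length float collection


@dataclass
class Observation:
    data_type: str          # assumed attribute name
    shape: Tuple[int, ...]  # matrix dimension


def pack(values: List[float]) -> Tuple[FloatData, Observation]:
    """Pack one row of environment parameters as a 1 x N matrix."""
    return FloatData(list(values)), Observation("float", (1, len(values)))


# Game A (balance ball): 6 parameters; the numbers are placeholders.
ball_data, ball_obs = pack([0.0, 1.0, 0.5, 0.1, -0.2, 0.0])
# Game B (ping-pong): 8 parameters; the numbers are placeholders.
pong_data, pong_obs = pack([0.3, 0.7, 1.5, -1.0, 0.2, 0.0, 0.8, 1.0])
```

The same two structures carry both games; only the number of stored floats and the recorded shape differ.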

Further, the aforementioned general communication framework is also embedded in the second server. After receiving the standard format data sent by the first server, the second server extracts the matrix dimension information and the environment parameter information from the standard format data through the general communication framework, and then parses the environment parameter information according to the matrix dimension information to obtain the series of environment parameters included in the environment parameter information (for example, the 6 environment parameters of the balance ball in the above example). The second server then performs inference on the series of environment parameters to obtain action data and returns the action data to the first server.

When performing inference on the series of environment parameters, the second server uses an analysis tool to convert the series of environment parameters into model input data that meets the input data format requirements of a target model, the target model being used to infer the next action of the target object; the second server then uses the target model to perform inference based on the model input data to obtain the action data.
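The second-server processing described above (parsing by matrix dimensions, converting to model input, inferring) can be sketched as follows. This is a sketch under stated assumptions: the function names are hypothetical, `to_model_input` is a placeholder for the analysis tool (here it merely clamps values to [-1, 1]), and `toy_target_model` is a placeholder for the trained target model, which the embodiment does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]


def parse_environment_parameters(flat: Sequence[float],
                                 shape: Tuple[int, int]) -> Matrix:
    """Rebuild the parameter matrix from flat data using the matrix dimensions."""
    rows, cols = shape
    if rows * cols != len(flat):
        raise ValueError("matrix dimensions do not match the number of parameters")
    return [[float(v) for v in flat[r * cols:(r + 1) * cols]] for r in range(rows)]


def to_model_input(matrix: Matrix) -> Matrix:
    # Placeholder for the analysis tool: clamp every value to [-1, 1].
    return [[max(-1.0, min(1.0, v)) for v in row] for row in matrix]


def toy_target_model(model_input: Matrix) -> List[float]:
    # Placeholder for the target model: one action value per row (the row mean).
    return [sum(row) / len(row) for row in model_input]


def infer_action(flat: Sequence[float], shape: Tuple[int, int],
                 model: Callable[[Matrix], List[float]] = toy_target_model) -> List[float]:
    matrix = parse_environment_parameters(flat, shape)
    return model(to_model_input(matrix))


action = infer_action([1.0, 1.0, -1.0, -1.0], (2, 2))
```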

Still further, in order to better transmit data in the network, the step of the first server transmitting the data in the standard format to the second server includes: serializing the standard format data into binary data; transmitting the binary data to a second server based on a predetermined communication protocol; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server.

Protobuf may be used for serialization and deserialization. Serialization refers to converting structured data or objects into a format that can be stored and transmitted (e.g., over a network), while ensuring that the serialized result can later be reconstructed (possibly in another computing environment) into the original structured data or objects.

Accordingly, what the second server receives is not the standard format data itself but the binary data obtained by serializing it, so after receiving the binary data the second server needs to deserialize it to obtain the standard format data.
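The serialization round trip can be illustrated with the following sketch. To keep the example self-contained it uses Python's standard `struct` module as a stand-in for protobuf; the byte layout (two little-endian unsigned 32-bit integers for the matrix dimensions, followed by 64-bit floats for the environment parameters) and the function names are assumptions made for illustration and are not the format used by the embodiment.

```python
import struct
from typing import List, Sequence, Tuple


def serialize(shape: Tuple[int, int], values: Sequence[float]) -> bytes:
    """First server: turn standard format data into binary data."""
    rows, cols = shape
    if rows * cols != len(values):
        raise ValueError("matrix dimensions do not match the number of parameters")
    return struct.pack(f"<II{len(values)}d", rows, cols, *values)


def deserialize(payload: bytes) -> Tuple[Tuple[int, int], List[float]]:
    """Second server: rebuild the standard format data from binary data."""
    rows, cols = struct.unpack_from("<II", payload, 0)
    values = struct.unpack_from(f"<{rows * cols}d", payload, 8)
    return (rows, cols), list(values)


# The same two functions handle a 6-parameter scene and an 8-parameter scene.
scene_a = serialize((1, 6), [0.1, 0.2, 0.3, 0.0, 1.0, 0.0])
scene_b = serialize((1, 8), [0.5, 0.4, 1.2, -0.8, 0.1, 0.9, 0.3, 0.7])
shape_a, values_a = deserialize(scene_a)
shape_b, values_b = deserialize(scene_b)
```

Because the matrix dimensions travel with the data, the receiver needs no scene-specific knowledge to decode the payload.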

S130: when action data returned by the second server is received, controlling the target object to execute the next action according to the action data.

When the first server receives the action data returned by the second server, the first server can control the target object to execute the next action according to the action data.

In this embodiment, the first server converts the environment data into standard format data before transmitting it to the second server over the network. Because the environment data of the proxy object of any scene can ultimately be converted into standard format data composed of several standard parameters, there is no need to develop (or define) a separate communication protocol for each scene; a single communication protocol suffices for transmitting the standard format data. For example, in the prior art, if the environment data related to the proxy object of scene A has 8 parameters, a communication protocol a must be developed to transmit those 8 parameters; if the environment data related to the proxy object of scene B has 6 parameters, the already developed communication protocol a cannot be reused, and a communication protocol b must be developed to transmit those 6 parameters. In the present application, by contrast, a communication protocol c is developed to transmit 4 standard parameters; no matter how many parameters the environment data contains, it is ultimately converted into the 4 standard parameters, so the environment data of the proxy objects of different scenes can all be transmitted through communication protocol c.
Put another way, in the past, transmitting the environment data of the proxy object of scene A required communication protocol a and transmitting that of scene B required communication protocol b, so both protocols had to be developed, and the transmission had to switch from protocol a to protocol b when moving from scene A data to scene B data, at considerable development cost. In the present application, when the environment data of the proxy object of any scene is transmitted, the first server first converts it into standard format data and then transmits it over the network, so multiple communication protocols are not needed; only one communication protocol (for example, QUIC (Quick UDP Internet Connections)) needs to be selected according to the data transmission requirements of the scenes.

Example two

Based on the same inventive concept as in the first embodiment, the present invention provides another data processing method, which is applied to the second server shown in fig. 1 in this embodiment. As shown in fig. 4, the method includes:

S210: obtaining standard format data from a first server.

The standard format data at least comprises matrix dimension information and environment parameter information. The standard format data is obtained by collecting environment data related to a target object when the first server controls the target object to execute the next action and performing data type conversion on the environment data. The first server is used for controlling the proxy object in a plurality of scenes to execute actions, and the target object refers to the proxy object in any scene.

S220: parsing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, and inferring action data from the series of environmental parameters;

S230: transmitting the action data to the first server, so that the first server controls the target object to execute the next action according to the action data when receiving the action data returned by the second server.

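In one embodiment, the step of the second server inferring action data from the series of environmental parameters comprises:

converting the series of environmental parameters into model input data meeting the input data format requirements of the target model by using an analysis tool; the target model is used for inferring the next action of the target object;

using the target model to infer action data based on the model input data.

In one embodiment, before the step of the second server obtaining the standard format data from the first server, the method comprises:

receiving binary data transmitted by the first server based on a predetermined communication protocol;

deserializing the binary data to obtain the standard format data; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server.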

The second server may implement the steps S210 and S220 and deserialize the binary data to obtain the standard format data through a general communication framework. For a description of the general communication framework, please refer to the related content in the first embodiment.

The specific limitation of the first server and the second server in the data processing method provided in the present embodiment may refer to the limitation of the data processing method in the first embodiment, and will not be described herein.

Example three

Based on the same inventive concept as the first embodiment, the present invention also provides a system, in this embodiment, the system includes a first server and a second server shown in fig. 1, where the first server is used to control proxy objects in multiple scenes to perform actions. The interaction between the first server and the second server is shown in fig. 5, and includes:

S310: when the first server controls a target object to execute the next action, the first server collects environment data related to the target object, wherein the target object refers to a proxy object in any scene; the first server performs data type conversion on the environment data to obtain standard format data and transmits the standard format data to the second server; the standard format data at least comprises matrix dimension information and environment parameter information;

S320: the second server parses the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, infers action data from the series of environmental parameters, and transmits the action data to the first server;

S330: when the first server receives the action data returned by the second server, the first server controls the target object to execute the next action according to the action data.
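The interaction between the two servers in the three steps above can be sketched in a single process as follows. This is an illustrative sketch only: the class and method names, the dictionary layout of the standard format data, and the toy inference rule (the negated mean of each row) are assumptions; a real deployment would place serialization and a network transport between the two objects.

```python
from typing import Dict, List, Sequence


class SecondServer:
    def infer(self, data: Dict[str, object]) -> List[float]:
        # Parse the environment parameters according to the matrix dimensions.
        rows, cols = data["Observation"]
        flat = data["FloatData"]
        if rows * cols != len(flat):
            raise ValueError("matrix dimensions do not match the number of parameters")
        matrix = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
        # Toy stand-in for the target model: negated mean of each row.
        return [-sum(row) / len(row) for row in matrix]


class FirstServer:
    def __init__(self, second_server: SecondServer) -> None:
        self.second_server = second_server
        self.last_action: List[float] = []

    def step(self, env_params: Sequence[float]) -> List[float]:
        # Collect environment data and convert it to standard format data.
        data = {
            "FloatData": [float(v) for v in env_params],
            "Observation": (1, len(env_params)),
        }
        # Send to the second server and receive the action data.
        action = self.second_server.infer(data)
        # Control the target object according to the action data
        # (here the action is only recorded).
        self.last_action = action
        return action


first_server = FirstServer(SecondServer())
result = first_server.step([1.0, 2.0, 3.0])
```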

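In one embodiment, the step of the first server transmitting the standard format data to the second server comprises:

the first server serializes the standard format data into binary data;

the first server transmits the binary data to the second server based on a predetermined communication protocol; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server.

In one embodiment, the environmental data includes a plurality of parameters; the step of the first server performing data type conversion on the environmental data to obtain standard format data comprises:

the first server determines the information type corresponding to each parameter in the environment data; the information types at least comprise a matrix dimension and an environment parameter;

the first server passes each parameter in the environment data into the standard parameter, of structure data type, corresponding to the parameter's information type, to obtain the standard format data; the number of standard parameters is the same as the number of kinds of information types.

In one embodiment, the step of the second server inferring action data from the series of environmental parameters comprises:

the second server uses an analysis tool to convert the series of environmental parameters into model input data meeting the input data format requirements of the target model; the target model is used for inferring the next action of the target object;

the second server uses the target model to perform inference based on the model input data to obtain the action data.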

The specific limitation of the first server and the second server in the system provided in this embodiment may refer to the limitation of the data processing method in the first embodiment, which is not repeated herein.

Figs. 2-5 are flow diagrams of the data processing method in one embodiment. It should be understood that, although the steps in the flowcharts of figs. 2-5 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of the steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in figs. 2-5 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Example four

Based on the same inventive concept as the first embodiment, the present invention also provides a data processing apparatus, in this embodiment, the apparatus is configured to control proxy objects in multiple scenes to perform actions; as shown in fig. 6, the apparatus includes:

an environmental data collection module 110, configured to collect environmental data related to the target object when the target object is controlled to perform a next action; the target object refers to a proxy object in any scene;

the data processing module 120 is configured to perform data type conversion on the environmental data to obtain standard format data, and transmit the standard format data to the second server; the standard format data at least comprises matrix dimension information and environment parameter information;

the control module 130 is configured to control the target object to execute a next action according to the action data when the action data returned by the second server is received; the action data is obtained by the second server analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information and deducing according to the series of environmental parameters.

In one embodiment, a data processing module includes:

a serialization sub-module for serializing the standard format data into binary data;

A transmission sub-module for transmitting binary data to the second server based on a predetermined communication protocol; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server.

In one embodiment, the environmental data includes a plurality of parameters; a data processing module, comprising:

the information type determining submodule is used for determining information types corresponding to all parameters in the environment data; the information type at least comprises a matrix dimension and an environment parameter;

the conversion sub-module is used for transmitting each parameter in the environment data into the standard parameter of which the data type corresponding to the information type is a structural body to obtain standard format data; the number of standard parameters is the same as the number of categories of information types.
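The cooperation of the two sub-modules above can be sketched as follows. This is a sketch under stated assumptions: the embodiment does not say how the information type of a parameter is determined, so the example assumes each parameter arrives already tagged with its information type; the mapping `STANDARD_PARAMETER_FOR` and the function `convert` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed mapping from information type to the name of its standard parameter.
STANDARD_PARAMETER_FOR = {
    "environment_parameter": "FloatData",
    "matrix_dimension": "Observation",
}


def convert(tagged_parameters: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Pass each parameter into the standard parameter matching its information type."""
    standard_format: Dict[str, List[float]] = {
        name: [] for name in STANDARD_PARAMETER_FOR.values()
    }
    for info_type, value in tagged_parameters:
        standard_format[STANDARD_PARAMETER_FOR[info_type]].append(value)
    return standard_format


tagged = [("matrix_dimension", 1), ("matrix_dimension", 6)]
tagged += [("environment_parameter", v) for v in (0.1, 0.2, 0.3, 0.0, 1.0, 0.0)]
standard = convert(tagged)
```

However many parameters are supplied, the result always contains exactly one standard parameter per information type.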

For specific limitations on the data processing apparatus provided in this embodiment, reference may be made to the limitations on the data processing method in the first embodiment, which are not repeated here. Each of the modules in the above data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can call them and execute the operations corresponding to each module.

Example five

Based on the same inventive concept as the foregoing embodiments, the present invention also provides a data processing apparatus. In one embodiment, as shown in fig. 7, the apparatus comprises:

a data acquisition module 210, configured to acquire standard format data from the first server; the standard format data at least comprises matrix dimension information and environment parameter information; the standard format data is obtained by collecting environment data related to a target object when the first server controls the target object to execute the next action and converting the data type of the environment data, wherein the first server is used for controlling proxy objects in a plurality of scenes to execute the action, and the target object refers to the proxy object in any scene;

the data processing module 220 is configured to parse the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, and infer action data according to the series of environmental parameters;

the transmission module 230 is configured to transmit the action data to the first server, so that when the first server receives the action data returned by the second server, the first server controls the target object to execute the next action according to the action data.

In one embodiment, a data processing module includes:

the conversion sub-module is used for converting the series of environment parameters into model input data meeting the input data format requirements of the target model by using an analysis tool; the target model is used for deducing the next action of the target object;

the inference sub-module is used for using the target model to infer action data based on the model input data.

In one embodiment, the data acquisition module is further configured to, before acquiring the standard format data from the first server, receive binary data transmitted by the first server based on a predetermined communication protocol and deserialize the binary data to obtain the standard format data; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server.

For specific limitations on the data processing apparatus provided in this embodiment, reference may be made to the limitations on the data processing method in the second embodiment, which are not repeated here. Each of the modules in the above data processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can call them and execute the operations corresponding to each module.

Example six

In the present embodiment, a computer device is provided, the internal structure of which can be shown in fig. 8.

The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data such as environmental data; for the specific data stored, reference may be made to the limitations in the above method embodiments. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a data processing method.

It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.

The present embodiment also provides a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps included in the first or second embodiments as described above when executing the computer program.

Example seven

In this embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps as comprised in the above-described first or second embodiments.

Those skilled in the art will appreciate that all or part of the above-described method embodiments may be implemented by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.

The above examples represent only a few embodiments of the present application. They are described specifically and in detail, but are not thereby to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A data processing method, wherein the method is applied to a first server, and the first server is used for controlling proxy objects in a plurality of scenes to execute actions; the method comprises the following steps:

when the first server controls a target object to execute a next action, collecting environmental data related to the target object; the target object refers to a proxy object in any scene; the environment data comprises a plurality of parameters, and the number of the parameters of the environment data in different scenes is different;

Performing data type conversion on the environment data to obtain standard format data, and transmitting the standard format data to a second server; the standard format data at least comprises matrix dimension information and environment parameter information; the second server trains the models of the proxy objects of all scenes by using the same neural network or trains the models of the proxy objects of different scenes by using different neural networks; the step of converting the data type of the environmental data to obtain standard format data comprises the following steps: determining information types corresponding to the parameters in the environment data; the information type at least comprises a matrix dimension and an environment parameter; transmitting each parameter in the environment data into a standard parameter with a data type corresponding to the information type of the parameter as a structural body to obtain standard format data; the number of the standard parameters is the same as the variety number of the information types;

when action data returned by the second server is received, controlling the target object to execute a next action according to the action data; the action data are obtained by the second server analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, converting the environmental parameters into model input data meeting the input data format requirement of a target model by using an analysis tool, and deducing by using the target model based on the model input data; the target model is used to infer a next action of the target object.

2. The method of claim 1, wherein the step of transmitting the standard format data to the second server comprises:

serializing the standard format data into binary data;

transmitting the binary data to the second server based on a predetermined communication protocol; the predetermined communication protocol is the only communication protocol adopted by the communication between the first server and the second server;

correspondingly, after receiving the binary data, the second server deserializes the binary data to obtain the standard format data.

3. A data processing method, characterized in that the method is applied to a second server; the method comprises the following steps:

obtaining standard format data from a first server; the standard format data at least comprises matrix dimension information and environment parameter information; the standard format data are obtained by collecting environment data related to a target object when the first server controls the target object to execute the next action and converting the data types of the environment data, wherein the first server is used for controlling proxy objects in a plurality of scenes to execute the action, and the target object refers to the proxy object in any one scene; the second server trains the models of the proxy objects of all scenes by using the same neural network or trains the models of the proxy objects of different scenes by using different neural networks; the environment data comprises a plurality of parameters, and the number of the parameters of the environment data in different scenes is different; the step of the first server performing data type conversion on the environmental data comprises the following steps: determining information types corresponding to the parameters in the environment data; the information type at least comprises a matrix dimension and an environment parameter; transmitting each parameter in the environment data into a standard parameter with a data type corresponding to the information type of the parameter as a structural body to obtain standard format data; the number of the standard parameters is the same as the variety number of the information types;

transmitting the action data to the first server, so that the first server controls the target object to execute the next action according to the action data when receiving the action data returned by the second server;

a step of deriving motion data from the series of environmental parameters, comprising:

converting the series of environmental parameters into model input data meeting the input data format requirements of the target model by using an analysis tool; the target model is used for deducing the next action of the target object;

4. A method as claimed in claim 3, comprising, prior to the step of obtaining standard format data from the first server:

receiving binary data transmitted by the first server based on a predetermined communication protocol;

performing deserialization on the binary data to obtain the standard format data; the predetermined communication protocol is the only communication protocol employed for communication between the first server and the second server.

5. A data processing system, the system comprising a first server and a second server, the first server configured to control proxy objects in a plurality of scenarios to perform actions; the second server trains the models of the proxy objects of all scenes by using the same neural network or trains the models of the proxy objects of different scenes by using different neural networks;

the first server acquires environment data related to a target object when controlling the target object to execute a next action, wherein the target object refers to a proxy object in any scene; the environment data comprises a plurality of parameters, and the number of the parameters of the environment data in different scenes is different; performing data type conversion on the environment data to obtain standard format data, and transmitting the standard format data to a second server; the standard format data at least comprises matrix dimension information and environment parameter information; the step of converting the data type of the environmental data to obtain standard format data comprises the following steps: determining information types corresponding to the parameters in the environment data; the information type at least comprises a matrix dimension and an environment parameter; transmitting each parameter in the environment data into a standard parameter with a data type corresponding to the information type of the parameter as a structural body to obtain standard format data; the number of the standard parameters is the same as the variety number of the information types;

The second server analyzes the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, deduces and obtains action data according to the series of environmental parameters, and transmits the action data to the first server; a step of deriving motion data from the series of environmental parameters, comprising: converting the series of environmental parameters into model input data meeting the input data format requirements of the target model by using an analysis tool; the target model is used for deducing the next action of the target object; using the target model to infer motion data based on the model input data;

and when the first server receives the action data returned by the second server, controlling the target object to execute the next action according to the action data.

6. A data processing apparatus, wherein the apparatus is configured to control proxy objects in a plurality of scenarios to perform actions; the device comprises:

the environment data acquisition module is used for acquiring environment data related to the target object when the target object is controlled to execute the next action; the target object refers to a proxy object in any scene; the environment data comprises a plurality of parameters, and the number of the parameters of the environment data in different scenes is different;

The data processing module is used for carrying out data type conversion on the environment data to obtain standard format data, and transmitting the standard format data to a second server; the standard format data at least comprises matrix dimension information and environment parameter information; the second server trains the models of the proxy objects of all scenes by using the same neural network or trains the models of the proxy objects of different scenes by using different neural networks; the data processing module is specifically configured to: determining information types corresponding to the parameters in the environment data; the information type at least comprises a matrix dimension and an environment parameter; transmitting each parameter in the environment data into a standard parameter with a data type corresponding to the information type of the parameter as a structural body to obtain standard format data; the number of the standard parameters is the same as the variety number of the information types;

the control module is used for controlling the target object to execute the next action according to the action data when the action data returned by the second server is received; the action data are obtained by the second server analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, converting the environmental parameters into model input data meeting the input data format requirement of a target model by using an analysis tool, and deducing by using the target model based on the model input data; the target model is used to infer a next action of the target object.

7. A data processing apparatus, wherein the apparatus is applied to a second server; the device comprises:

the data acquisition module is used for acquiring standard format data from the first server; the standard format data at least comprises matrix dimension information and environment parameter information; the standard format data are obtained by collecting environment data related to a target object when the first server controls the target object to execute the next action and converting the data types of the environment data, wherein the first server is used for controlling proxy objects in a plurality of scenes to execute the action, and the target object refers to the proxy object in any one scene; the device trains the models of the proxy objects of all scenes by using the same neural network or trains the models of the proxy objects of different scenes by using different neural networks; the environment data comprises a plurality of parameters, and the number of the parameters of the environment data in different scenes is different; the step of the first server performing data type conversion on the environmental data comprises the following steps: determining information types corresponding to the parameters in the environment data; the information type at least comprises a matrix dimension and an environment parameter; transmitting each parameter in the environment data into a standard parameter with a data type corresponding to the information type of the parameter as a structural body to obtain standard format data; the number of the standard parameters is the same as the variety number of the information types;

The data processing module is used for analyzing the environmental parameter information according to the matrix dimension information to obtain a series of environmental parameters included in the environmental parameter information, and deducing action data according to the series of environmental parameters; when the data processing module is used for obtaining action data according to the inference of the series of environment parameters, the analysis tool is used for converting the series of environment parameters into model input data which accords with the input data format requirement of a target model, and the target model is used for obtaining action data according to the inference based on the model input data; the target model is used for deducing the next action of the target object;