CN111401557A - Agent decision making method, AI model training method, server and medium
- Publication number
- CN111401557A (application CN202010492473.3A)
- Authority
- CN
- China
- Prior art keywords
- information
- agent
- current frame
- frame
- map
- Prior art date
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/55—Controlling game characters or game objects based on the game progress
- A63F13/58—Controlling game characters or game objects based on the game progress by computing conditions of game characters, e.g. stamina, strength, motivation or energy level
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/60—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
- A63F13/67—Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/6027—Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/60—Methods for processing data by generating or executing the game program
- A63F2300/65—Methods for processing data by generating or executing the game program for computing the condition of a game character
Abstract
The application discloses an AI-model-based agent decision making method, an AI model training method, a server and a medium. The method comprises: acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment; outputting, through a timing feature extraction module of an AI model, current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information; obtaining next frame state information of the agent according to the current frame action output information; acquiring historical position information of the agent and generating next frame 3D map information according to the historical position information; and outputting next frame action output information corresponding to the agent according to the next frame state information and the next frame 3D map information. Reliable and efficient AI simulation is thereby achieved.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an agent decision making method, an AI model training method, a server, and a medium.
Background
With the rapid development of Artificial Intelligence (AI) technology, AI is widely applied in fields such as 3D games, virtual traffic, autonomous driving simulation and robot trajectory planning, and AI simulation in 3D virtual spaces has great commercial value.
At present, the correct decisions that an agent needs to make at different positions in an AI simulation are generally learned through the memory capacity of a neural network, and a soft-attention mechanism is used to perform decision analysis on all state information, including both dynamically changing information and static, unchanging information, such as the continuously moving positions of teammates and enemies in a 3D game and the positions of material points. This approach can handle scenes in which the environmental information changes only slightly, but it is not suitable for scenes in which the environmental information changes rapidly, and it is difficult for the agent to make long-term decisions. How to realize reliable and efficient AI simulation has therefore become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an agent decision making method, an AI model training method, a server and a medium, which can realize reliable and efficient AI simulation.
In a first aspect, an embodiment of the present application provides an agent decision making method based on an AI model, including:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment;
outputting current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through a time sequence feature extraction module of an AI model;
obtaining the next frame state information of the intelligent agent according to the current frame action output information;
acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information;
and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame 3D map information.
In a second aspect, an embodiment of the present application further provides a method for training an AI model, including:
acquiring a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of an agent;
outputting multi-frame fusion state vector information corresponding to the agent through a timing sequence feature extraction module of an AI model to be trained based on the multi-frame state information and the multi-frame 3D map information;
constructing a loss function according to the multi-frame fusion state vector information;
and performing multi-step iteration on the loss function to train and update the AI model.
In a third aspect, an embodiment of the present application further provides a server, where the server includes a processor, a memory, and a computer program stored on the memory and executable by the processor, where the memory stores an AI model, and where the computer program, when executed by the processor, implements the AI model-based agent decision making method as described above; alternatively, a training method of the AI model as described above is implemented.
In a fourth aspect, the present application further provides a computer-readable storage medium for storing a computer program, which when executed by a processor, causes the processor to implement the above-mentioned AI model-based agent decision-making method; alternatively, the above-described training method of the AI model is implemented.
The embodiments of the application provide an AI-model-based agent decision making method, an AI model training method, a server and a computer readable storage medium. Based on the current frame state information of an agent and the current frame 3D map information in a 3D virtual environment, a timing feature extraction module of the AI model outputs current frame action output information corresponding to the agent; the next frame state information of the agent is obtained according to the current frame action output information; the next frame 3D map information is generated according to the historical position information of the agent; and the next frame action output information of the agent is then obtained according to the next frame state information and the next frame 3D map information. The action output information of each frame of the agent is obtained in this manner, so that long-term decision making is realized and reliable and efficient AI simulation is achieved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flow chart illustrating steps of an AI model based agent decision-making method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a first-level channel of a 3D map provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a second level channel of a 3D map provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a third level channel of a 3D map provided by an embodiment of the present application;
FIG. 5 is a diagram illustrating a fourth layer of channels of a 3D map according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an AI model based agent action output provided by an embodiment of the application;
FIG. 7 is a flowchart illustrating steps of a method for training an AI model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of AI model training provided by an embodiment of the present application;
fig. 9 is a schematic block diagram of a server provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
At present, in AI simulation of a 3D virtual space, the memory capacity of a neural network is generally used to learn the correct decisions that an agent needs to make at different positions, and a soft-attention mechanism is used to analyze all state information, including both dynamically changing information and static, unchanging information, such as the continuously moving positions of teammates and enemies in a 3D game and the positions of material points. Such AI simulation can handle scenes in which the environmental information changes only slightly, but it is not suitable for scenes in which the environmental information changes rapidly, and it is difficult for the agent to make long-term decisions.
In order to solve the above problems, embodiments of the present application provide an AI model-based agent decision making method, an AI model training method, a server, and a computer-readable storage medium, which are used to implement reliable and efficient AI simulation. The AI model-based agent decision making method and the AI model training method can be applied to a server, and the server can be a single server or a server cluster consisting of a plurality of servers.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an AI model-based agent decision-making method according to an embodiment of the present disclosure.
As shown in fig. 1, the AI model-based agent decision-making method specifically includes steps S101 to S105.
S101, obtaining current frame state information of an agent and current frame 3D map information in a 3D virtual environment.
For example, in various application scenarios such as Artificial Intelligence (AI), robot simulation in a 3D virtual environment, mechanical arms, unmanned driving and virtual traffic simulation, or in game AI for 3D games, in order to implement rapid and efficient simulation, correct decisions must be made for an agent in the 3D virtual environment; to this end, the current frame state information of the agent and the current frame 3D map information in the 3D virtual environment are acquired. An agent is an entity that resides in a complex dynamic environment, autonomously senses environmental information, autonomously takes actions, and accomplishes a series of preset goals or tasks. The state information of the agent includes, but is not limited to, position information, motion information, combat power information and the like of the agent.
Illustratively, the 3D map information is relative map information within a preset range centered on a current location of the agent, rather than global map information of the 3D virtual environment. For example, relative map information in the range of 90m × 90m around the current position of the agent as the center point.
In an embodiment, the 3D map of the 3D virtual environment comprises multiple layers of channels, each layer of channels consisting of a plurality of grids; for example, each layer of channels consists of n × n grids. Taking n = 9, each layer of channels consists of 9 × 9 grids, where each grid has a size of L m × L m; taking L = 10, each grid is 10 m × 10 m.
It should be noted that the number of grids of each layer of channels and the size of the grids can be flexibly set according to actual situations, and are not specifically limited herein. By carrying out grid segmentation on the local 3D map, the problem of overlarge map information dimension caused by overlarge map dimension is avoided.
Each layer of channel records different types of information, wherein the different types of information include but are not limited to whether the intelligent agent moves to the position of the grid, the frequency of the intelligent agent moving to the position of the grid, the sequence of the intelligent agent moving to the position of the grid, the number of material points in the position of the grid, the state information of the intelligent agent moving to the position of the grid, and the like.
Optionally, the multiple layers of channels of the 3D map include a first layer of channels, a second layer of channels, a third layer of channels, and a fourth layer of channels, where a grid of the first layer of channels records whether the agent moves to a position where the grid is located, a grid of the second layer of channels records a frequency with which the agent moves to the position where the grid is located, a grid of the third layer of channels records an order in which the agent moves to the position where the grid is located, and a grid of the fourth layer of channels records a number of material points at the position where the grid is located.
Illustratively, filling a corresponding grid of the first-layer channel with first identification information to represent that the intelligent agent moves to the position of the grid; and filling the second identification information into the corresponding grid of the first-layer channel to represent that the intelligent agent does not move to the position of the grid.
For example, as shown in fig. 2, the first identification information is set to be a value 1, the second identification information is set to be a value 0, and the grid of the first-layer channel stores the value 0 or the value 1 to represent whether the agent has moved to the location of the grid, where the grid storing the value 0 represents that the agent has not moved to the location of the grid, and the grid storing the value 1 represents that the agent has moved to the location of the grid.
Illustratively, the respective grid of the second-level channel is filled with respective integers, representing the frequency of the agent moving to the location of the grid. For example, as shown in FIG. 3, grid fill 0 of the second level of channels represents that the agent has not moved to the location of the grid, grid fill 1 of the second level of channels represents that the agent has moved to the location of the grid 1 time, grid fill 2 of the second level of channels represents that the agent has moved to the location of the grid 2 times, grid fill 3 of the second level of channels represents that the agent has moved to the location of the grid 3 times, and so on.
Illustratively, the corresponding grids of the third-layer channel are filled with numbers of different sizes, characterizing the order in which the agent moved to the positions of the grids. For example, as shown in fig. 4, the smaller the number stored in a grid of the third-layer channel, the later the time at which the agent moved to the position of that grid. It should be noted that the representation may also be reversed, so that the earlier the agent moved to the position of the corresponding grid, the smaller the number stored in the grid.
Illustratively, different values are adopted to fill corresponding grids of the fourth layer of channels, and the quantity of material points at the positions of the grids is represented. For example, as shown in fig. 5, a grid filling value 0 of the fourth layer of channels represents that there are no material points at the position of the grid, a grid filling value 1 of the fourth layer of channels represents that there are 1 material points at the position of the grid, a grid filling value 2 of the fourth layer of channels represents that there are 2 material points at the position of the grid, a grid filling value 3 of the fourth layer of channels represents that there are 3 material points at the position of the grid, and so on.
Based on the 3D map information, information such as whether the intelligent agent moves to the position of the corresponding grid of the 3D map, the frequency of the intelligent agent moving to the position of the grid, the sequence of the intelligent agent moving to the position of the grid, the number of material points at the position of the grid and the like is obtained.
In an embodiment, the 3D map records information such as whether the agent moves to the position of the corresponding grid of the 3D map within a preset time period, the frequency of the agent moving to the position of the grid, the order of the agent moving to the position of the grid, and the number of material points existing at the position of the grid. For example, setting the preset number of times as 20, where a grid storage value 0 of a first-layer channel indicates that the agent has not moved to the position of the grid in the 20 times of history data, and a storage value 1 indicates that the agent has moved to the position of the grid in the 20 times of history data; grid filling 1 of the second layer channel represents that the intelligent agent reaches the position of the grid for 1 time in the 20 times of historical data, and grid filling 2 of the second layer channel represents that the intelligent agent reaches the position of the grid for 2 times in the 20 times of historical data; the grid of the third tier of channels stores the order in which agents arrive at the grid in the 20 historical data, numbering the order in which agents arrive at the grid from 0 to 19, with the later the time to arrive at the grid, the smaller the number stored by the grid.
By embedding the corresponding position information of the intelligent agent into the channel of the 3D map and adding the information of the material points into the channel, the position information identification in the AI simulation is promoted, and the generalization of the AI model network is further improved.
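For illustration only (not part of the patent text), the following Python sketch shows one possible way to build the four-layer local map described above as a 4 × 9 × 9 grid tensor; the grid size, channel order, helper names and fill conventions are assumptions made for this sketch.

```python
import numpy as np

GRID_N = 9          # 9 x 9 grids per channel (assumed, as in the example above)
CELL_SIZE = 10.0    # each grid covers 10 m x 10 m (assumed)
HALF = GRID_N // 2  # local map is centered on the agent

def build_local_map(agent_pos, history, supply_points):
    """Build the 4-channel local 3D map information for one frame (illustrative sketch).

    agent_pos:      (x, z) current position of the agent
    history:        list of (x, z) recorded positions, oldest first (e.g. the last 20 records)
    supply_points:  list of (x, z) positions of material points
    """
    local_map = np.zeros((4, GRID_N, GRID_N), dtype=np.float32)

    def to_grid(pos):
        # map a world position to a grid index relative to the agent's current position
        gx = int((pos[0] - agent_pos[0]) / CELL_SIZE) + HALF
        gz = int((pos[1] - agent_pos[1]) / CELL_SIZE) + HALF
        if 0 <= gx < GRID_N and 0 <= gz < GRID_N:
            return gx, gz
        return None

    # channels 0-2: visited flag, visit frequency, visit order
    for order, pos in enumerate(history):
        cell = to_grid(pos)
        if cell is None:
            continue
        local_map[0, cell[0], cell[1]] = 1.0                        # whether the agent visited
        local_map[1, cell[0], cell[1]] += 1.0                       # visit frequency
        local_map[2, cell[0], cell[1]] = len(history) - 1 - order   # later visits get smaller numbers

    # channel 3: number of material points in each grid
    for pos in supply_points:
        cell = to_grid(pos)
        if cell is not None:
            local_map[3, cell[0], cell[1]] += 1.0

    return local_map
```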
And S102, outputting current frame action output information corresponding to the agent through a time sequence feature extraction module of an AI model based on the current frame state information and the current frame 3D map information.
In this embodiment, the AI model is provided with a corresponding timing feature extraction module, where the timing feature extraction module includes, but is not limited to, an LSTM (Long Short-Term Memory) module, a GRU (Gated Recurrent Unit) module, a Transformer module and the like.
The method comprises the steps that an AI model is called, a time sequence feature extraction module based on the AI model takes current frame state information and current frame 3D map information of an agent as input information, the input information is processed by the time sequence feature extraction module, time sequence feature extraction is carried out, and current frame action output information corresponding to the agent is output.
In an embodiment, the current frame state information of the agent and the current frame 3D map information are first subjected to CONCAT fusion and then input into the timing feature extraction module for processing. Specifically, the state embedding vector feature S_t is extracted from the current frame state information of the agent, and the map vector feature M_t is obtained from the current frame 3D map information; the state embedding vector feature S_t and the map vector feature M_t are merged and input into a fully connected neural network for processing to obtain the fusion information corresponding to S_t and M_t. The fusion information is then input into the timing feature extraction module for processing, and the current frame action output information corresponding to the agent is output.
In one embodiment, different types of information are recorded in the multi-layer channels of the 3D map, and the map vector feature M_t is obtained from the current frame 3D map information; specifically, multi-layer convolution calculation is performed on the different types of information to obtain the corresponding map vector feature M_t. For example, taking a 3D map including the four layers of channels described above as an example, the current frame 3D map information is subjected to 4 layers of convolution calculation and flattened in the last convolution layer to obtain the map vector feature M_t.
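As an illustrative sketch only, the CONCAT fusion described above could be implemented roughly as follows in PyTorch; the layer sizes, channel counts and module names are assumptions, not the architecture specified by the patent.

```python
import torch
import torch.nn as nn

class StateMapFusion(nn.Module):
    """Fuse the per-frame state embedding S_t with the map vector feature M_t (sketch; sizes assumed)."""

    def __init__(self, state_dim=128, fused_dim=256):
        super().__init__()
        # 4 convolution layers over the 4-channel 9 x 9 local map, flattened in the last layer
        self.map_conv = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),                                   # -> map vector feature M_t
        )
        self.fc = nn.Sequential(
            nn.Linear(32 * 9 * 9 + state_dim, fused_dim), nn.ReLU(),
        )

    def forward(self, state_embed, local_map):
        m_t = self.map_conv(local_map)                      # M_t from the current frame 3D map
        x_t = torch.cat([state_embed, m_t], dim=-1)         # CONCAT merge of S_t and M_t
        return self.fc(x_t)                                 # fusion information x_t
```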
And S103, acquiring state information of the next frame of the intelligent agent according to the current frame action output information.
And S104, acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information.
Based on the output current frame action output information, the agent is controlled to execute the corresponding action output and interact with the 3D virtual environment, the state information of the agent is updated, and the next frame state information of the agent is obtained. Meanwhile, the position information of the agent is recorded, and the position information recorded each time is stored; the position information may be stored locally in the server as the historical position information of the agent, or may be stored in a storage device other than the server, which is not specifically limited herein.
And inquiring and acquiring the stored historical position information of the intelligent agent, and constructing and obtaining the next frame of 3D map information according to the historical position information. For example, a preset number corresponding to historical position information for constructing 3D map information is preset, the preset number of historical position information is acquired, and based on the preset number of historical position information, the next frame of 3D map information is constructed. Optionally, the preset number is set to 20, that is, the next frame of 3D map information is constructed according to 20 sets of historical position information. The preset number can be flexibly set according to actual conditions, and is not particularly limited herein.
In one embodiment, to save storage space, only a preset amount of historical location information is saved. The history position information is stored once every time the history position information is recorded, and the history position information with the earliest record is deleted from a plurality of stored history position information, so that the number of the stored history position information is maintained at a preset number. Specifically, each time the current location information of the agent is recorded, it is determined whether the number of stored historical location information reaches a preset number. If the quantity of the stored historical position information does not reach the preset quantity, directly storing the current position information; and if the quantity of the stored historical position information reaches the preset quantity, storing the current position information, and deleting the historical position information recorded earliest in the stored historical position information, so that the quantity of the stored historical position information is maintained at the preset quantity.
In one embodiment, the position information of the agent is not recorded every frame; instead, a preset duration is set, and the position information of the agent is recorded and stored once every preset duration as the historical position information of the agent. The preset duration can be flexibly set according to actual conditions; for example, setting the preset duration to 10 s means that the position information of the agent is recorded and stored every 10 s.
In combination with the above preset number, it is assumed that the preset number is 20, the preset duration is 10s, that is, the location information of the agent is recorded every 10s, and a total of 20 sets of historical location information are saved, which is equivalent to saving the historical location information within a time span of 200 s. And constructing and obtaining the 3D map information of the next frame according to 20 groups of historical position information within the 200s time span.
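A minimal sketch of such a position history buffer, assuming 20 records taken every 10 s; the class and method names are illustrative only.

```python
import collections
import time

MAX_RECORDS = 20        # preset number of saved positions (assumed)
RECORD_INTERVAL = 10.0  # preset duration between records, in seconds (assumed)

class PositionHistory:
    """Keep only the latest MAX_RECORDS positions, recorded every RECORD_INTERVAL seconds."""

    def __init__(self):
        # a deque with maxlen automatically drops the earliest record once the buffer is full
        self.positions = collections.deque(maxlen=MAX_RECORDS)
        self.last_record_time = -RECORD_INTERVAL

    def maybe_record(self, agent_pos, now=None):
        now = time.time() if now is None else now
        if now - self.last_record_time >= RECORD_INTERVAL:
            self.positions.append(agent_pos)
            self.last_record_time = now

    def snapshot(self):
        # oldest-first list of stored positions, used to build the next frame 3D map information
        return list(self.positions)
```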
Illustratively, the timing feature extraction module is an LSTM module. The LSTM module is an independent feature extraction unit that receives the previous frame hidden state information and the current frame state information as input and outputs the corresponding current frame hidden state information, where the hidden state information includes hidden information (hidden state) and cell state information (cell state), and the current frame hidden state information is used as input for the next frame. Based on the current frame state information, the current frame 3D map information and the previous frame hidden state information of the agent, the current frame state information of the agent and the current frame 3D map information are first subjected to CONCAT fusion, then the fusion information and the previous frame hidden state information are input into the LSTM module, and the corresponding current frame hidden state information is output.
For example, as shown in fig. 6, the current frame 3D map information is subjected to 4-layer convolution calculation to obtain the current frame map vector feature M_t; the current frame state embedding vector feature S_t corresponding to the current frame state information of the agent and the current frame map vector feature M_t are merged by CONCAT and input into the fully connected neural network for processing to obtain the corresponding fusion information; the fusion information, the previous frame hidden information h_{t-1} and the previous frame cell state information C_{t-1} are then input into the LSTM module for processing, and the current frame action output information corresponding to the agent is output.
The LSTM module has three gates, namely a forgetting gate (forget gate), an input gate (input gate) and an output gate (output gate), which process the input information differently. The input information includes the previous frame hidden information h_{t-1}, the previous frame cell state information C_{t-1}, and the fusion information x_t of the current frame state information and the current frame 3D map information of the agent; the outputs are the current frame hidden information h_t and the current frame cell state information C_t. The forgetting gate merges (CONCAT) the previous frame hidden information h_{t-1} with the fusion information x_t, passes the result through a forward network and then outputs the forgetting probability f_t (a value between 0 and 1) through a Sigmoid function. The input gate merges (CONCAT) the previous frame hidden information h_{t-1} with the fusion information x_t, passes the result through a forward network and then outputs the corresponding input probability i_t (a value between 0 and 1) through a Sigmoid function, and outputs the processing result C~_t of the fusion information x_t through another forward network via a tanh function. f_t is multiplied by the previous frame cell state information C_{t-1}, i_t is multiplied by C~_t, the two products are added, and the sum is used as the updated current frame cell state information C_t, that is:

C_t = f_t · C_{t-1} + i_t · C~_t

The output gate controls the output information of the LSTM unit; the output current frame hidden information integrates the previous frame hidden information h_{t-1}, the previous frame cell state information C_{t-1} and the fusion information x_t. The output probability O_t of the fusion information x_t is calculated through a Sigmoid function; at the same time, the current frame cell state information C_t is processed by a tanh function and multiplied by O_t to obtain the current frame hidden information h_t:

h_t = O_t · tanh(C_t)
The current frame hidden information h_t contains the fusion state vector information corresponding to the agent. Based on the current frame hidden information h_t in the output current frame hidden state information, the fusion state vector information corresponding to the agent is obtained, where the fusion state vector information fuses multi-frame state information of the agent. The current frame action output information of the agent is then obtained according to the fusion state vector information.
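For reference, the gate computations above correspond to the standard LSTM cell update; the following sketch restates them in code form, with stacked gate parameters W, U, b assumed for compactness. In practice a library cell such as torch.nn.LSTMCell implements the same computation.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step over the fused input x_t (sketch of the standard cell described above).

    W, U, b hold the stacked parameters of the forget, input, candidate and output gates.
    """
    gates = x_t @ W + h_prev @ U + b
    f_t, i_t, g_t, o_t = gates.chunk(4, dim=-1)
    f_t = torch.sigmoid(f_t)           # forgetting probability f_t
    i_t = torch.sigmoid(i_t)           # input probability i_t
    g_t = torch.tanh(g_t)              # candidate cell state C~_t
    o_t = torch.sigmoid(o_t)           # output probability O_t
    c_t = f_t * c_prev + i_t * g_t     # C_t = f_t * C_{t-1} + i_t * C~_t
    h_t = o_t * torch.tanh(c_t)        # h_t = O_t * tanh(C_t)
    return h_t, c_t
```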
And S105, outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame 3D map information.
After the next frame state information and the next frame 3D map information of the agent are obtained, according to the operation in step S102, the next frame action output information corresponding to the agent is output based on the next frame state information and the next frame 3D map information of the agent through the timing feature extraction module of the AI model. The specific operation process can refer to the process described in step S102, and is not described herein again.
Therefore, based on each frame of state information and each frame of 3D map information of the intelligent agent, each frame of action output information corresponding to the intelligent agent can be output, and efficient and reliable long-term decision making of the intelligent agent is further realized according to each frame of action output information corresponding to the intelligent agent.
In the AI-model-based agent decision making method provided by the foregoing embodiments, based on the current frame state information of the agent and the current frame 3D map information in the 3D virtual environment, the timing feature extraction module of the AI model outputs the current frame action output information corresponding to the agent; the next frame state information of the agent is obtained according to the current frame action output information; the next frame 3D map information is generated according to the historical position information of the agent; and the next frame action output information of the agent is then obtained according to the next frame state information and the next frame 3D map information. The action output information of each frame of the agent is obtained in this manner, so that long-term decision making is realized and reliable and efficient AI simulation is achieved.
The embodiment of the application also provides a training method of the AI model. The training method of the AI model can be applied to a server, so that reliable and efficient AI simulation can be realized by calling the trained AI model. The server may be a single server or a server cluster composed of a plurality of servers.
Referring to fig. 7, fig. 7 is a flowchart illustrating a method for training an AI model according to an embodiment of the present disclosure.
As shown in fig. 7, the AI model training method includes steps S201 to S204.
S201, obtaining a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of the intelligent agent.
Illustratively, a sample data set corresponding to AI model training is stored in a redis (Remote Dictionary Server) database and used for training the AI model, where the sample data set includes, but is not limited to, multi-frame state information and multi-frame 3D map information of the agent. The sample data set corresponding to AI model training is acquired by querying and accessing redis.
S202, outputting multi-frame fusion state vector information corresponding to the intelligent agent through a time sequence feature extraction module of the AI model to be trained on the basis of the multi-frame state information and the multi-frame 3D map information.
As described in the embodiments of the AI-model-based agent decision making method, the AI model is provided with a corresponding timing feature extraction module, where the timing feature extraction module includes, but is not limited to, an LSTM module, a GRU module, a Transformer module and the like.
The timing feature extraction module of the AI model takes the multi-frame state information and the multi-frame 3D map information of the agent as input information, processes the input information, extracts the timing features and outputs the multi-frame fusion state vector information corresponding to the agent. Specifically, the state embedding vector features S_i are extracted from the multi-frame state information, and the map vector features M_i corresponding to the multi-frame 3D map information are obtained; the multi-frame state embedding vector features S_i and map vector features M_i are input into the timing feature extraction module for processing, and the multi-frame fusion state vector information corresponding to the agent is output.
Illustratively, the timing feature extraction module is still an LSTM module. For example, as shown in fig. 8, multi-layer convolution calculation is performed on each frame of 3D map information to obtain the map vector feature M_i corresponding to each frame of 3D map information, including M_t, M_{t+1} and so on, and the state embedding vector feature S_i corresponding to each frame of state information, including S_t, S_{t+1} and so on, is obtained. S_t and M_t, together with the previous frame hidden information h_{t-1} and the previous frame cell state information C_{t-1}, are input into the LSTM module for processing, which outputs the current frame hidden information h_t, the current frame cell state information C_t and the corresponding fusion state vector information; S_{t+1}, M_{t+1}, the current frame hidden information h_t and the current frame cell state information C_t are then input into the LSTM module for processing, which outputs the next frame hidden information h_{t+1}, the next frame cell state information C_{t+1} and the corresponding fusion state vector information; the multi-frame fusion state vector information is obtained from the fusion state vector information of each frame.
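A hedged sketch of how such a multi-frame unroll could look, reusing the fusion and LSTM-cell sketches above; the function signature and variable names are assumptions for illustration.

```python
import torch

def unroll_frames(lstm_cell, fusion, states, maps, h0, c0):
    """Unroll the LSTM over a sampled trajectory to get per-frame fusion state vectors (sketch).

    states: list of per-frame state embeddings S_i
    maps:   list of per-frame 4-channel local maps
    h0, c0: hidden and cell state carried in from the sample data set
    """
    h, c = h0, c0
    fused_vectors = []
    for s_i, m_i in zip(states, maps):
        x_i = fusion(s_i, m_i)            # CONCAT + fully connected fusion of S_i and M_i
        h, c = lstm_cell(x_i, (h, c))     # timing feature extraction, e.g. torch.nn.LSTMCell
        fused_vectors.append(h)           # h_i carries the fusion state vector information
    return torch.stack(fused_vectors), (h, c)
```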
S203, constructing a loss function according to the multi-frame fusion state vector information.
The loss function includes a value loss (value loss), a policy gradient loss (policy gradient loss), an entropy loss (entropy loss), and the like.
In an embodiment, for multi-frame fusion state vector information, based on each frame of fusion state vector information, motion output information corresponding to each frame of fusion state vector information and a cost function output value corresponding to the motion output information are obtained respectively. The value function output value is used for evaluating the action output information, and if the value function output value is high, the relevant action instruction of the corresponding action output information can be controlled to be executed; and if the value function output value is low, the relevant action command of the corresponding action output information is not executed. And constructing a corresponding loss function based on the obtained multi-frame action output information and the value function output value corresponding to the multi-frame action output information.
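For illustration, an actor-critic style combination of value loss, policy gradient loss and entropy loss could be sketched as follows; the advantage estimate, loss coefficients and discrete-action assumption are illustrative choices, not details specified by the patent.

```python
import torch
import torch.nn.functional as F

def actor_critic_loss(policy_logits, values, actions, returns,
                      value_coef=0.5, entropy_coef=0.01):
    """Combine value loss, policy gradient loss and entropy loss (sketch; coefficients assumed)."""
    dist = torch.distributions.Categorical(logits=policy_logits)
    log_probs = dist.log_prob(actions)

    advantages = returns - values.detach()
    policy_loss = -(log_probs * advantages).mean()        # policy gradient loss
    value_loss = F.mse_loss(values, returns)               # value loss
    entropy_loss = -dist.entropy().mean()                  # entropy loss (encourages exploration)

    return policy_loss + value_coef * value_loss + entropy_coef * entropy_loss
```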
And S204, carrying out multi-step iteration on the loss function to train and update the AI model.
Optionally, as shown in fig. 8, the loss function is sent to a GPU (Graphics Processing Unit) for multi-step iterative optimization, so as to obtain relevant AI model parameters after iteration, where the AI model parameters include, but are not limited to, parameters of a timing feature extraction module, parameters of a cost function, and so on. And updating the parameters of the iterative relevant AI model to the AI model so as to finish the training and updating of the AI model.
Meanwhile, various information generated by continuous interaction with the 3D virtual environment, such as the state information of the agent and the 3D map information, is stored in a data storage system such as redis and used as data in the sample data set for iterative training of the AI model.
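A minimal sketch of storing and retrieving samples through redis, assuming a simple JSON-serialized list; the key names and serialization format are illustrative only.

```python
import json
import redis

# Assumed connection parameters and key names; the patent only states that interaction data
# (state information, 3D map information, etc.) is stored in a data storage system such as redis.
r = redis.Redis(host="localhost", port=6379)

def push_sample(frame_sample, queue_key="ai_model:samples"):
    # append one frame's sample data to the training queue
    r.rpush(queue_key, json.dumps(frame_sample))

def pull_batch(batch_size, queue_key="ai_model:samples"):
    # pop up to batch_size samples for one training iteration
    raw = [r.lpop(queue_key) for _ in range(batch_size)]
    return [json.loads(s) for s in raw if s is not None]
```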
Referring to fig. 9, fig. 9 is a schematic block diagram of a server according to an embodiment of the present disclosure.
As shown in fig. 9, the server may include a processor, memory, and a network interface. The processor, memory, and network interface are connected by a system bus, such as an I2C (Inter-integrated Circuit) bus.
Specifically, the Processor may be a Micro-controller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
Specifically, the Memory may be a Flash chip, a Read-Only Memory (ROM) magnetic disk, an optical disk, a usb disk, or a removable hard disk.
The network interface is used for network communication, such as sending assigned tasks and the like. Those skilled in the art will appreciate that the architecture shown in fig. 9 is a block diagram of only a portion of the architecture associated with the subject application, and does not constitute a limitation on the servers to which the subject application applies, as a particular server may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor is configured to run a computer program stored in the memory and to implement the following steps when executing the computer program:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment;
outputting current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through a time sequence feature extraction module of an AI model;
obtaining the next frame state information of the intelligent agent according to the current frame action output information;
acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information;
and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame 3D map information.
In some embodiments, before implementing the timing feature extraction module through the AI model to output the current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information, the processor further implements:
extracting state embedding vector features in the current frame state information, and acquiring map vector features according to the current frame 3D map information;
merging the state embedding vector features and the map vector features and inputting the merged state embedding vector features and the map vector features into a fully-connected neural network to obtain corresponding fusion information;
when the processor outputs the current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through the timing characteristic extraction module of the AI model, the following steps are specifically implemented:
and inputting the fusion information into the time sequence feature extraction module, and outputting the current frame action output information corresponding to the agent.
In some embodiments, the 3D map of the 3D virtual environment comprises a plurality of layers of channels, each layer of channels consisting of a plurality of meshes, the plurality of layers of channels each recording different types of information.
In some embodiments, the multi-layer channels include at least two layers of channels among a first layer of channels, a second layer of channels, a third layer of channels, and a fourth layer of channels, where a grid of the first layer of channels records whether the agent moves to a position where the grid is located, a grid of the second layer of channels records a frequency of the agent moving to the position where the grid is located, a grid of the third layer of channels records a sequence of the agent moving to the position where the grid is located, and a grid of the fourth layer of channels records a number of material points existing at the position where the grid is located.
In some embodiments, the current frame 3D map information includes different types of information recorded in multiple channels of a 3D map, and the processor specifically implements, when implementing the obtaining of the map vector feature according to the current frame 3D map information:
and carrying out multilayer convolution calculation on the different types of information to obtain the map vector characteristics.
In some embodiments, the current frame 3D map information is relative map information within a preset range centered on a current location of the agent.
In some embodiments, the processor, when executing the computer program, further implements:
and recording and storing the position information of the intelligent agent every preset time, wherein the historical position information of the intelligent agent is a plurality of stored position information.
In some embodiments, when the processor implements the recording and storing of the location information of the agent, the processor implements:
determining whether the quantity of the stored historical position information reaches a preset quantity every time the current position information of the intelligent agent is recorded;
if the quantity of the stored historical position information does not reach the preset quantity, storing the current position information;
and if the quantity of the stored historical position information reaches the preset quantity, storing the current position information, and deleting the historical position information recorded earliest in the stored historical position information.
In some embodiments, the timing feature extraction module comprises an LSTM module, and the processor, when executing the computer program, further implements:
acquiring previous frame hidden state information corresponding to the LSTM module;
when the processor outputs the current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through the timing characteristic extraction module of the AI model, the following steps are specifically implemented:
outputting, by the LSTM module, current frame hidden state information corresponding to the LSTM module based on the current frame state information, the current frame 3D map information, and the previous frame hidden state information;
and acquiring the current frame action output information corresponding to the agent according to the current frame hidden state information.
In some embodiments, when the processor obtains the current frame action output information corresponding to the agent according to the current frame hidden state information, the following is specifically implemented:
acquiring fusion state vector information corresponding to the agent according to the current frame hidden state information;
and acquiring the current frame action output information according to the fusion state vector information.
In some embodiments, the processor, when executing the computer program, further implements:
acquiring a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of an agent;
outputting multi-frame fusion state vector information corresponding to the agent through a timing sequence feature extraction module of an AI model to be trained based on the multi-frame state information and the multi-frame 3D map information;
constructing a loss function according to the multi-frame fusion state vector information;
and performing multi-step iteration on the loss function to train and update the AI model.
In some embodiments, the timing feature extraction module includes an LSTM module, the sample data set further includes hidden state information corresponding to the LSTM module, and when the processor implements outputting, through the timing feature extraction module of the AI model to be trained, the multi-frame fusion state vector information corresponding to the agent based on the multi-frame state information and the multi-frame 3D map information, the following is specifically implemented:
outputting, by the LSTM module, the multi-frame fusion state vector information based on the hidden state information, the multi-frame state information, and the multi-frame 3D map information.
In some embodiments, when the processor implements the constructing of the loss function according to the multi-frame fusion state vector information, the following is specifically implemented:
acquiring multiframe action output information and a value function output value corresponding to the multiframe action output information according to the multiframe fusion state vector information;
and constructing the loss function according to the multi-frame action output information and the value function output value corresponding to the multi-frame action output information.
It should be noted that, as will be clearly understood by those skilled in the art, for convenience and brevity of description, the specific working process of the server described above may refer to the corresponding process in the foregoing embodiment of the AI model-based intelligent agent decision making method and/or the AI model training method, and details are not repeated herein.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, where the computer program includes program instructions, and the processor executes the program instructions to implement the steps of the AI model-based agent decision-making method and/or the AI model training method provided in the foregoing embodiments. For example, the computer program is loaded by a processor and may perform the following steps:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment;
calling an AI model, and outputting current frame action output information corresponding to the agent through a time sequence feature extraction module of the AI model based on the current frame state information and the current frame 3D map information;
obtaining the next frame state information of the intelligent agent according to the current frame action output information;
and acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The computer readable storage medium may be an internal storage unit of the server in the foregoing embodiment, for example, a hard disk or a memory of the server. The computer readable storage medium may also be an external storage device of the server, such as a plug-in hard disk provided on the server, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like.
Since the computer program stored in the computer-readable storage medium can execute any one of the AI model-based agent decision making methods and/or AI model training methods provided in the embodiments of the present application, beneficial effects that can be achieved by any one of the AI model-based agent decision making methods and/or AI model training methods provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (15)
1. An AI model-based agent decision-making method, comprising:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment;
outputting current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through a time sequence feature extraction module of an AI model;
obtaining the next frame state information of the intelligent agent according to the current frame action output information;
acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information;
and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame 3D map information.
2. The method according to claim 1, wherein before the outputting, by the timing feature extraction module of the AI model, the current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information, the method includes:
extracting state embedding vector features in the current frame state information, and acquiring map vector features according to the current frame 3D map information;
merging the state embedding vector features and the map vector features, and inputting the merged features into a fully-connected neural network to obtain corresponding fusion information;
the outputting, by the timing feature extraction module of the AI model, current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information includes:
and inputting the fusion information into the timing feature extraction module, and outputting the current frame action output information corresponding to the agent.
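A minimal sketch of the feature fusion described in claim 2, assuming PyTorch and arbitrary feature dimensions; `FusionNet` and its sizes are illustrative assumptions, not terms from the claims.

```python
import torch
import torch.nn as nn

# Illustrative fusion step: concatenate the state embedding vector features
# with the map vector features and pass them through a fully-connected layer.
class FusionNet(nn.Module):
    def __init__(self, state_dim=128, map_dim=64, fused_dim=256):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(state_dim + map_dim, fused_dim), nn.ReLU())

    def forward(self, state_feat, map_feat):
        merged = torch.cat([state_feat, map_feat], dim=-1)  # merge the two feature vectors
        return self.fc(merged)                              # fusion information
```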
3. The method of claim 1, wherein the 3D map of the 3D virtual environment comprises a plurality of layers of channels, each layer of channels being composed of a plurality of meshes, the plurality of layers of channels each recording different types of information.
4. The method of claim 3, wherein the plurality of layers of channels includes at least two layers of channels selected from a first layer of channels, a second layer of channels, a third layer of channels, and a fourth layer of channels, wherein a grid of the first layer of channels records whether the agent moves to a location of the grid, wherein a grid of the second layer of channels records a frequency of the agent moving to the location of the grid, wherein a grid of the third layer of channels records an order of the agent moving to the location of the grid, and wherein a grid of the fourth layer of channels records a number of asset points at the location of the grid.
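One possible data structure for the multi-layer channel map of claims 3 and 4, sketched with NumPy; the grid size, channel ordering and method names are assumptions.

```python
import numpy as np

# Hypothetical four-channel grid map.  Channel 0: visited flag,
# channel 1: visit frequency, channel 2: visit order,
# channel 3: number of asset points in the cell.
class ChannelMap:
    def __init__(self, height=64, width=64):
        self.grid = np.zeros((4, height, width), dtype=np.float32)
        self.step = 0

    def visit(self, row, col):
        self.step += 1
        self.grid[0, row, col] = 1.0        # whether the agent has moved to this cell
        self.grid[1, row, col] += 1.0       # how often the agent has moved to this cell
        self.grid[2, row, col] = self.step  # order in which the cell was reached

    def add_asset(self, row, col, count=1):
        self.grid[3, row, col] += count     # asset points located in this cell
```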
5. The method of claim 2, wherein the current frame 3D map information includes different types of information recorded in a multi-layer channel of a 3D map, and the obtaining of the map vector feature according to the current frame 3D map information includes:
and carrying out multilayer convolution calculation on the different types of information to obtain the map vector characteristics.
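An illustrative reading of claim 5's multilayer convolution, assuming PyTorch; the channel count, kernel sizes and output dimension are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Sketch: multi-layer convolution over the map channels, pooled and projected
# into a map feature vector.
class MapEncoder(nn.Module):
    def __init__(self, in_channels=4, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),    # pool each channel to a single value
        )
        self.proj = nn.Linear(32, feat_dim)

    def forward(self, map_channels):            # (batch, channels, H, W)
        x = self.conv(map_channels).flatten(1)  # (batch, 32)
        return self.proj(x)                     # map vector features
```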
6. The method of claim 1, wherein the current frame 3D map information is relative map information within a preset range centered on a current location of the agent.
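A minimal sketch of the relative map of claim 6: a fixed-size window is cut out of the global channel grid around the agent's current cell. The window half-width and zero-padding at the map border are assumptions.

```python
import numpy as np

# Cut a (2*half+1)-sized window out of the global grid, centred on (row, col).
def local_map(global_grid, row, col, half=8):
    padded = np.pad(global_grid, ((0, 0), (half, half), (half, half)))
    return padded[:, row:row + 2 * half + 1, col:col + 2 * half + 1]
```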
7. The method of claim 1, further comprising:
and recording and storing position information of the agent at every preset time interval, wherein the historical position information of the agent is the plurality of pieces of stored position information.
8. The method of claim 7, wherein recording and storing the location information of the agent comprises:
determining, each time the current position information of the agent is recorded, whether the quantity of stored historical position information reaches a preset quantity;
if the quantity of the stored historical position information does not reach the preset quantity, storing the current position information;
and if the quantity of the stored historical position information reaches the preset quantity, storing the current position information, and deleting the historical position information recorded earliest in the stored historical position information.
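Claims 7 and 8 describe a fixed-capacity history in which the oldest record is discarded once the preset quantity is reached; in Python this behaviour can be sketched with a bounded deque (the capacity value is an assumption).

```python
from collections import deque

# Keep only the most recent `max_len` positions; older records are dropped.
class PositionHistory:
    def __init__(self, max_len=100):
        self.positions = deque(maxlen=max_len)

    def record(self, position):
        self.positions.append(position)  # store the current position
```

With `maxlen` set, appending to a full deque silently evicts the oldest element, matching the delete-earliest behaviour described in claim 8.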
9. The method of any of claims 1 to 8, wherein the timing feature extraction module comprises an LSTM module, the method further comprising:
acquiring previous frame hidden state information corresponding to the LSTM module;
the outputting, by the timing feature extraction module of the AI model, current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information includes:
outputting, by the LSTM module, current frame hidden state information corresponding to the LSTM module based on the current frame state information, the current frame 3D map information, and the previous frame hidden state information;
and acquiring the current frame action output information corresponding to the agent according to the current frame hidden state information.
10. The method according to claim 9, wherein the obtaining the current frame action output information corresponding to the agent according to the current frame hidden state information includes:
acquiring fusion state vector information corresponding to the agent according to the current frame hidden state information;
and acquiring the current frame action output information according to the fusion state vector information.
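An illustrative sketch of claims 9 and 10, assuming PyTorch: an LSTM cell consumes the fused current-frame features together with the previous frame's hidden state and yields the current hidden state, from which an action head derives the action output. The dimensions and the single linear action head are assumptions.

```python
import torch
import torch.nn as nn

class TemporalPolicy(nn.Module):
    def __init__(self, fused_dim=256, hidden_dim=256, num_actions=12):
        super().__init__()
        self.lstm = nn.LSTMCell(fused_dim, hidden_dim)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, fused_info, prev_hidden=None):
        h, c = self.lstm(fused_info, prev_hidden)  # current frame hidden state
        action_logits = self.action_head(h)        # current frame action output
        return action_logits, (h, c)
```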
11. A method for training an AI model, comprising:
acquiring a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of an agent;
outputting, by a timing feature extraction module of an AI model to be trained, multi-frame fusion state vector information corresponding to the agent based on the multi-frame state information and the multi-frame 3D map information;
constructing a loss function according to the multi-frame fusion state vector information;
and performing multi-step iteration on the loss function to train and update the AI model.
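A minimal sketch of the multi-step iterative update in claim 11, assuming PyTorch; the optimiser choice, learning rate and the `compute_loss` callback are assumptions introduced for illustration.

```python
import torch

# Repeatedly minimise the loss built from the multi-frame fusion state vectors.
def train(model, batches, compute_loss, steps=1000, lr=1e-4):
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    for step in range(steps):
        loss = compute_loss(model, batches[step % len(batches)])
        optim.zero_grad()
        loss.backward()
        optim.step()
```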
12. The method of claim 11, wherein the timing feature extraction module comprises an LSTM module, the sample data set further comprises hidden state information corresponding to the LSTM module, and the outputting, by the timing feature extraction module of the AI model to be trained, multi-frame fusion state vector information corresponding to the agent based on the multi-frame state information and the multi-frame 3D map information comprises:
outputting, by the LSTM module, the multi-frame fusion state vector information based on the hidden state information, the multi-frame state information, and the multi-frame 3D map information.
13. The method according to claim 11 or 12, wherein said constructing a loss function according to said multi-frame fusion state vector information comprises:
acquiring multi-frame action output information and a value function output value corresponding to the multi-frame action output information according to the multi-frame fusion state vector information;
and constructing the loss function according to the multi-frame action output information and the value function output value corresponding to the multi-frame action output information.
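The claims do not fix the exact form of the loss built from the action output information and the value function output values; an actor-critic style objective is one common choice, sketched below under that assumption (PyTorch, discrete actions).

```python
import torch
import torch.nn.functional as F

# Illustrative actor-critic loss: policy-gradient term weighted by the
# advantage, plus a value-regression term.
def actor_critic_loss(action_logits, actions, values, returns, value_coef=0.5):
    advantages = returns - values.detach()                    # how much better than expected
    log_probs = F.log_softmax(action_logits, dim=-1)
    chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    policy_loss = -(chosen * advantages).mean()               # policy gradient term
    value_loss = F.mse_loss(values, returns)                  # value function regression term
    return policy_loss + value_coef * value_loss
```

Here `returns` would typically be discounted cumulative rewards computed from the multi-frame trajectory.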
14. A server, characterized in that the server comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, the memory storing an AI model, wherein the computer program, when executed by the processor, implements the AI model-based agent decision-making method according to any one of claims 1 to 10, or implements the AI model training method according to any one of claims 11 to 13.
15. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when executed by a processor, causes the processor to carry out the AI model-based agent decision-making method according to any one of claims 1 to 10, or the AI model training method according to any one of claims 11 to 13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010492473.3A CN111401557B (en) | 2020-06-03 | 2020-06-03 | Agent decision making method, AI model training method, server and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010492473.3A CN111401557B (en) | 2020-06-03 | 2020-06-03 | Agent decision making method, AI model training method, server and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111401557A true CN111401557A (en) | 2020-07-10 |
CN111401557B CN111401557B (en) | 2020-09-18 |
Family
ID=71435720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010492473.3A Active CN111401557B (en) | 2020-06-03 | 2020-06-03 | Agent decision making method, AI model training method, server and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111401557B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738372A (en) * | 2020-08-26 | 2020-10-02 | 中国科学院自动化研究所 | Distributed multi-agent space-time feature extraction method and behavior decision method |
CN112295232A (en) * | 2020-11-23 | 2021-02-02 | 超参数科技(深圳)有限公司 | Navigation decision making method, AI model training method, server and medium |
CN112494949A (en) * | 2020-11-20 | 2021-03-16 | 超参数科技(深圳)有限公司 | Intelligent agent action strategy making method, server and storage medium |
CN114627981A (en) * | 2020-12-14 | 2022-06-14 | 阿里巴巴集团控股有限公司 | Method and apparatus for generating molecular structure of compound, and nonvolatile storage medium |
WO2023206532A1 (en) * | 2022-04-29 | 2023-11-02 | Oppo广东移动通信有限公司 | Prediction method and apparatus, electronic device and computer-readable storage medium |
CN118378094A (en) * | 2024-06-25 | 2024-07-23 | 武汉人工智能研究院 | Chip layout model training and application method and device, electronic equipment and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737097A (en) * | 2012-03-30 | 2012-10-17 | 北京峰盛博远科技有限公司 | Three-dimensional vector real-time dynamic stacking technique based on LOD (Level of Detail) transparent textures |
CN107679618A (en) * | 2017-07-28 | 2018-02-09 | 北京深鉴科技有限公司 | A kind of static policies fixed point training method and device |
CN108427989A (en) * | 2018-06-12 | 2018-08-21 | 中国人民解放军国防科技大学 | Deep space-time prediction neural network training method for radar echo extrapolation |
CN109241291A (en) * | 2018-07-18 | 2019-01-18 | 华南师范大学 | Knowledge mapping optimal path inquiry system and method based on deeply study |
CN109464803A (en) * | 2018-11-05 | 2019-03-15 | 腾讯科技(深圳)有限公司 | Virtual objects controlled, model training method, device, storage medium and equipment |
CN109711529A (en) * | 2018-11-13 | 2019-05-03 | 中山大学 | A kind of cross-cutting federal learning model and method based on value iterative network |
US20190213786A1 (en) * | 2015-07-14 | 2019-07-11 | Samsung Electronics Co., Ltd. | Three dimensional content generating apparatus and three dimensional content generating method thereof |
CN110827320A (en) * | 2019-09-17 | 2020-02-21 | 北京邮电大学 | Target tracking method and device based on time sequence prediction |
US10579875B2 (en) * | 2017-10-11 | 2020-03-03 | Aquifi, Inc. | Systems and methods for object identification using a three-dimensional scanning system |
- 2020-06-03: CN application CN202010492473.3A filed; published as CN111401557B (en); status: Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737097A (en) * | 2012-03-30 | 2012-10-17 | 北京峰盛博远科技有限公司 | Three-dimensional vector real-time dynamic stacking technique based on LOD (Level of Detail) transparent textures |
US20190213786A1 (en) * | 2015-07-14 | 2019-07-11 | Samsung Electronics Co., Ltd. | Three dimensional content generating apparatus and three dimensional content generating method thereof |
CN107679618A (en) * | 2017-07-28 | 2018-02-09 | 北京深鉴科技有限公司 | A kind of static policies fixed point training method and device |
US10579875B2 (en) * | 2017-10-11 | 2020-03-03 | Aquifi, Inc. | Systems and methods for object identification using a three-dimensional scanning system |
CN108427989A (en) * | 2018-06-12 | 2018-08-21 | 中国人民解放军国防科技大学 | Deep space-time prediction neural network training method for radar echo extrapolation |
CN109241291A (en) * | 2018-07-18 | 2019-01-18 | 华南师范大学 | Knowledge mapping optimal path inquiry system and method based on deeply study |
CN109464803A (en) * | 2018-11-05 | 2019-03-15 | 腾讯科技(深圳)有限公司 | Virtual objects controlled, model training method, device, storage medium and equipment |
CN109711529A (en) * | 2018-11-13 | 2019-05-03 | 中山大学 | A kind of cross-cutting federal learning model and method based on value iterative network |
CN110827320A (en) * | 2019-09-17 | 2020-02-21 | 北京邮电大学 | Target tracking method and device based on time sequence prediction |
Non-Patent Citations (2)
Title |
---|
董一民, 张弛: "Discussion on the Application of Artificial Intelligence Technology in the Context of 5G Networks", 《信息通信技术与政策》 (Information and Communications Technology and Policy) *
赵婷婷 et al.: "A Survey of Model-Based Reinforcement Learning", 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology) *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111738372A (en) * | 2020-08-26 | 2020-10-02 | 中国科学院自动化研究所 | Distributed multi-agent space-time feature extraction method and behavior decision method |
CN112494949A (en) * | 2020-11-20 | 2021-03-16 | 超参数科技(深圳)有限公司 | Intelligent agent action strategy making method, server and storage medium |
CN112494949B (en) * | 2020-11-20 | 2023-10-31 | 超参数科技(深圳)有限公司 | Intelligent body action policy making method, server and storage medium |
CN112295232A (en) * | 2020-11-23 | 2021-02-02 | 超参数科技(深圳)有限公司 | Navigation decision making method, AI model training method, server and medium |
CN112295232B (en) * | 2020-11-23 | 2024-01-23 | 超参数科技(深圳)有限公司 | Navigation decision making method, AI model training method, server and medium |
CN114627981A (en) * | 2020-12-14 | 2022-06-14 | 阿里巴巴集团控股有限公司 | Method and apparatus for generating molecular structure of compound, and nonvolatile storage medium |
WO2023206532A1 (en) * | 2022-04-29 | 2023-11-02 | Oppo广东移动通信有限公司 | Prediction method and apparatus, electronic device and computer-readable storage medium |
CN118378094A (en) * | 2024-06-25 | 2024-07-23 | 武汉人工智能研究院 | Chip layout model training and application method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111401557B (en) | 2020-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111401557B (en) | Agent decision making method, AI model training method, server and medium | |
CN108073981B (en) | Method and apparatus for processing convolutional neural network | |
CN111295675B (en) | Apparatus and method for processing convolution operations using kernels | |
US20200327409A1 (en) | Method and device for hierarchical learning of neural network, based on weakly supervised learning | |
CN110728317A (en) | Training method and system of decision tree model, storage medium and prediction method | |
CN111325664B (en) | Style migration method and device, storage medium and electronic equipment | |
KR20180111959A (en) | Circular networks by motion-based attention for video understanding | |
CN114915630B (en) | Task allocation method, network training method and device based on Internet of Things equipment | |
CN110781893B (en) | Feature map processing method, image processing method, device and storage medium | |
CN108510058B (en) | Weight storage method in neural network and processor based on method | |
CN112163601B (en) | Image classification method, system, computer device and storage medium | |
CN112199190A (en) | Memory allocation method and device, storage medium and electronic equipment | |
CN110132282A (en) | Unmanned plane paths planning method and device | |
CN111589157B (en) | AI model using method, apparatus and storage medium | |
CN112597217B (en) | Intelligent decision platform driven by historical decision data and implementation method thereof | |
CN111709493A (en) | Object classification method, training method, device, equipment and storage medium | |
CN111125519A (en) | User behavior prediction method and device, electronic equipment and storage medium | |
CN111967271A (en) | Analysis result generation method, device, equipment and readable storage medium | |
CN114757362A (en) | Multi-agent system communication method based on edge enhancement and related device | |
CN117993443B (en) | Model processing method, apparatus, computer device, storage medium, and program product | |
CN114830186A (en) | Image classification method and device, storage medium and electronic equipment | |
CN113625753A (en) | Method for guiding neural network to learn maneuvering flight of unmanned aerial vehicle by expert rules | |
CN115545168A (en) | Dynamic QoS prediction method and system based on attention mechanism and recurrent neural network | |
EP4246375A1 (en) | Model processing method and related device | |
US20240104375A1 (en) | Method and system for lightweighting artificial neural network model, and non-transitory computer-readable recording medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |