CN112494949A - Intelligent agent action strategy making method, server and storage medium - Google Patents
- Publication number
- CN112494949A (application CN202011312201.7A)
- Authority
- CN
- China
- Prior art keywords
- current frame
- information
- agent
- parallel task
- task information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/67—Generating or modifying game content before or while executing the game program adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
- A63F13/56—Computing the motion of game characters with respect to other game characters, game objects or elements of the game scene, e.g. for simulating the behaviour of a group of virtual soldiers or for path finding
- A63F2300/6027—Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
- G06N3/04—Neural networks; architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The application discloses an agent action strategy making method, a server and a storage medium. The method comprises: obtaining current frame state information of an agent in a 3D virtual environment and current frame interaction information between the agent and the 3D virtual environment; outputting, through an AI model and based on the current frame state information and current frame interaction information, current frame parallel task information and current frame non-parallel task information corresponding to the agent; outputting current frame action output information corresponding to the agent according to the current frame parallel task information and current frame non-parallel task information; controlling the agent to interact with the 3D virtual environment according to the current frame action output information, so as to obtain the agent's next frame state information and next frame interaction information; and outputting the agent's next frame action output information according to the next frame state information and next frame interaction information. The application can realize highly anthropomorphic AI simulation.
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, a server, and a storage medium for formulating an agent action policy.
Background
With the rapid development of artificial intelligence (AI) technology, AI is widely applied in fields such as 3D games, virtual traffic, autonomous driving simulation and robot trajectory planning, and AI simulation in 3D virtual spaces has great commercial value; for example, AI technology makes it possible for an agent to compete against real players in various games.
Currently, in AI simulation of some 3D virtual spaces, an agent needs to collect various resources in the 3D virtual space and confront other players within a continuously shrinking safe area in order to survive to the end. During the AI simulation, the agent must make correct action decisions in different environments: moving and exploring with a relatively safe area as its target point, and fighting enemy agents, so that it survives to the end.
Therefore, in order to enhance the user's game experience, the agent's AI simulation is expected to be highly anthropomorphic, and how to realize highly anthropomorphic AI simulation has become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an agent action strategy making method, a server and a storage medium, aiming at realizing highly anthropomorphic AI simulation.
In a first aspect, an embodiment of the present application provides an agent action policy making method, where the method includes:
acquiring current frame state information of an agent in a 3D virtual environment and current frame interaction information of the agent and the 3D virtual environment;
outputting current frame parallel task information and current frame non-parallel task information corresponding to the agent through an AI model based on the current frame state information and the current frame interaction information;
outputting current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information;
controlling the agent to interact with the 3D virtual environment according to the current frame action output information, so as to obtain the next frame state information and next frame interaction information of the agent;
and outputting the next frame action output information corresponding to the agent according to the next frame state information and the next frame interaction information.
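The five steps of the first aspect form a per-frame decision loop. A minimal illustrative sketch in Python, with the AI model and the environment stubbed out (all function and field names here are assumptions for illustration, not part of the claims):

```python
from dataclasses import dataclass

@dataclass
class FrameObservation:
    state: dict        # agent's own state for this frame
    interaction: dict  # agent-environment interaction info for this frame

def ai_model(obs):
    """Stub for the AI model: maps an observation to parallel and
    non-parallel task information (here fixed dummy tasks)."""
    parallel = {"move": "forward", "aim_vertical": "up"}
    non_parallel = {"selected": "pick_up_material"}
    return parallel, non_parallel

def to_action(parallel, non_parallel):
    """Combine task information into the frame's action output."""
    return {**parallel, "exclusive_action": non_parallel["selected"]}

def step_environment(action):
    """Stub for environment interaction: returns the next-frame observation."""
    return FrameObservation(state={"hp": 100}, interaction={"zone": "safe"})

def decision_loop(obs, n_frames):
    actions = []
    for _ in range(n_frames):
        parallel, non_parallel = ai_model(obs)      # steps 1-2
        action = to_action(parallel, non_parallel)  # step 3
        obs = step_environment(action)              # steps 4-5
        actions.append(action)
    return actions
```

The loop simply feeds each frame's observation through the model, emits an action, and uses the resulting next-frame observation as the input for the following iteration.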
In a second aspect, an embodiment of the present application further provides a server, where the server includes a processor and a memory; the memory stores a computer program and an AI model that can be invoked and executed by the processor, wherein the computer program, when executed by the processor, implements the agent action policy making method described above.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, causes the processor to implement the method for making an action policy of an intelligent agent.
The embodiments of the application provide an agent action strategy making method, a server and a storage medium. The method comprises: obtaining current frame state information of an agent in a 3D virtual environment and current frame interaction information between the agent and the 3D virtual environment; outputting, through an AI model, current frame parallel task information and current frame non-parallel task information corresponding to the agent based on the current frame state information and current frame interaction information; outputting current frame action output information corresponding to the agent according to the current frame parallel task information and current frame non-parallel task information; controlling the agent's interaction with the 3D virtual environment according to the current frame action output information, so as to acquire the agent's next frame state information and next frame interaction information; and outputting the agent's next frame action output information according to the next frame state information and next frame interaction information. By analyzing the parallel task information and non-parallel task information executable by the agent in its current state, the actions the agent can currently execute simultaneously and the actions that are mutually exclusive are determined, and the agent is controlled to output the corresponding actions accordingly, so that the actions output by the agent are more reasonable and more humanlike.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating steps of a method for formulating an action policy of an agent according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario of a method for making an action policy of an agent according to an embodiment of the present application;
FIG. 3 is a diagram illustrating actions that an agent may selectively output according to parallel task information and non-parallel task information in the application scenario corresponding to FIG. 2;
FIG. 4 is a schematic diagram of an AI model based agent action output provided by an embodiment of the application;
FIG. 5 is another schematic diagram of an AI model based agent action output provided by an embodiment of the application;
fig. 6 is a schematic block diagram of a server provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings. It should be understood that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be divided, combined or partially combined, so that the actual execution sequence may be changed according to actual situations.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
In order to solve the above problems, embodiments of the present application provide an agent action policy making method, a server, and a computer-readable storage medium for implementing highly anthropomorphic AI simulation. The method can be applied to a server, which may be a single server or a server cluster composed of multiple servers.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a method for making an intelligent agent action policy according to an embodiment of the present application.
As shown in fig. 1, the action decision making method specifically includes steps S101 to S105.
Step S101: the method comprises the steps of obtaining current frame state information of an agent in a 3D virtual environment and current frame interaction information of the agent and the 3D virtual environment.
For example, in application scenarios such as artificial intelligence (AI), robot simulation in a 3D virtual environment, mechanical arms, unmanned driving and virtual traffic simulation, or for game AI in 3D games, a highly anthropomorphic action decision is made for an agent in the 3D virtual environment. Current frame state information of the agent in the 3D virtual environment and current frame interaction information between the agent and the 3D virtual environment are acquired, so that a corresponding action decision can be made from them. Here, an agent is an entity situated in a complex dynamic environment that autonomously senses environment information, autonomously takes actions, and accomplishes a series of preset goals or tasks.
The current frame state information of the agent is state data characterizing the agent itself in the current frame, and comprises the agent's own data and information about the equipment it wears. The agent's own data comprises position information, motion information, blood volume information, equipment information, affiliated camp information, and the like.
The current frame interaction information between the agent and the 3D virtual environment is relative data characterizing the agent's environment in the current frame, such as global information, poison circle (safe zone) information, material information, and sound information.
In the present embodiment, AI simulation of a 3D game match is taken as an example. The game includes, but is not limited to, 3D FPS (First-Person Shooter) games; the method may also be applied to AI simulation of other 3D games, and no limitation is made here.
As shown in fig. 2, in the 3D game match, the agent may compete against a preset number of other players, where the other players may be other agents or game characters operated by human players. In this embodiment the other players are taken to be other agents by way of example, but the other players are not limited to being only other agents.
The agents may participate in the game in a team with other agents or alone, so that agents of different camps exist in the game. An agent participating in the game can select any region of the 3D virtual environment as a target region and descend to it by parachuting. The agent needs to collect resources such as weapons, defensive gear and props in the 3D virtual environment to increase its fighting capacity. Meanwhile, as the game progresses, the safe area of the 3D virtual environment gradually shrinks and the poison circle region gradually expands, so battles between agents of different camps become more frequent as they try to reach the safe area. An agent can kill enemy agents belonging to other camps through various strategies, so as to finally win the match.
The position information, motion information, blood volume information, equipment information and affiliated camp information of the agent's current frame are obtained, so that information about the agent itself can be accurately evaluated.
Here, the position information comprises the agent's spatial position in the 3D virtual environment, which can be represented in a spatial coordinate system; the motion information comprises the agent's current orientation and moving speed; the blood volume information comprises the agent's total blood volume, remaining blood volume, and the like; the equipment information comprises information about the armor, helmet and the weapon held in each weapon slot of the agent, where the weapon information includes weapon type and weapon state, such as loaded ammunition and remaining ammunition.
Global information, poison circle information, material information and sound information generated by the agent's interaction with the 3D virtual environment are obtained, so that the agent's current environment can be accurately evaluated.
The global information mainly comprises the elapsed time of the current match, the number of surviving teammates, the total number of kills of the agent's own team, and the like. The poison circle information comprises records of the poison circle in the game, such as the center, radius, stage and remaining time of the current poison circle, and the center, radius and total duration of the next poison circle. The material information comprises the position, type, attribute and quantity of visible supplies within the agent's field of view, where material types include but are not limited to guns, knives, armor, helmets, drugs and throwables. The sound information mainly includes the position of a sound source, its relative bearing, the type of the sound source, and the like.
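The two observation groups above can be collected into plain data records. A sketch using Python dataclasses, where every field name is an illustrative assumption rather than a structure defined by the patent:

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    """Current-frame state of the agent itself."""
    position: tuple   # (x, y, z) in the 3D virtual environment
    facing: tuple     # current orientation vector
    speed: float      # current moving speed
    hp: int           # remaining blood volume
    hp_max: int       # total blood volume
    equipment: dict   # armor, helmet, weapon slots (type, state, ammo)
    camp: int         # affiliated camp (team) identifier

@dataclass
class InteractionInfo:
    """Current-frame agent-environment interaction information."""
    game_time: float      # elapsed time of the current match
    teammates_alive: int  # number of surviving teammates
    team_kills: int       # total kills of the agent's own team
    zone: dict            # poison-circle info: centers, radii, stage, timers
    materials: list       # visible supplies: position, type, attribute, count
    sounds: list          # sound sources: position, relative bearing, type
```

A frame's observation would then be built, for example, as `AgentState(position=(10.0, 2.0, 5.0), facing=(0.0, 1.0, 0.0), speed=3.5, hp=72, hp_max=100, equipment={"slot1": {"type": "rifle", "ammo": 30}}, camp=1)`.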
Step S102: and outputting current frame parallel task information and current frame non-parallel task information corresponding to the agent through an AI model based on the current frame state information and the current frame interaction information.
The parallel task information corresponds to related actions that the agent can execute simultaneously within the same time period, and includes but is not limited to movement task information, first direction aiming task information, second direction aiming task information and non-parallel task selection information. The movement task information includes the movement direction, movement speed, posture during movement, and the like. The aiming task information characterizes the agent's aiming direction.
That is, within the same time period, the agent may synchronously output at least one of: an action executing the movement task, an action executing the first direction aiming task, an action executing the second direction aiming task, and an action corresponding to the non-parallel task selection.
As shown in FIG. 3, the moving direction includes, but is not limited to, the eight directions of front, back, left, right, left front, left back, right front, and right back.
The aiming directions include, but are not limited to, four directions: aiming up, aiming down, aiming left and aiming right, where aiming up and down lie along a first direction, aiming left and right lie along a second direction, and the first and second directions are perpendicular to each other.
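The discrete movement and aiming spaces described above can be enumerated directly; the constant names below are illustrative:

```python
# Eight discrete movement directions (front, back, left, right, and diagonals).
MOVE_DIRECTIONS = [
    "front", "back", "left", "right",
    "front_left", "back_left", "front_right", "back_right",
]

# Aiming is split into two perpendicular axes, each a separate parallel task:
# the first direction controls vertical aim, the second horizontal aim.
AIM_FIRST_DIRECTION = ["up", "down"]      # vertical axis
AIM_SECOND_DIRECTION = ["left", "right"]  # horizontal axis

def combined_aim_options():
    """All vertical/horizontal aim combinations reachable in one frame."""
    return [(v, h) for v in AIM_FIRST_DIRECTION for h in AIM_SECOND_DIRECTION]
```

Because the two aiming axes are separate parallel tasks, a single frame can adjust both simultaneously, which is why they are listed under parallel task information rather than as one mutually exclusive choice.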
The non-parallel task information corresponds to tasks whose output actions are mutually exclusive within the same time period, and includes but is not limited to at least one of attack task information, material picking task information, posture control task information and blood volume supplement task information.
That is, at any one time the agent can output only one of: an action executing the attack task, an action executing the material picking task, an action executing the posture control task, and an action executing the blood volume supplement task. The attack task information is used to control the selection and switching of weapons when the agent fights agents of other camps, such as firing, switching to a close-range weapon, switching to a long-range weapon, stowing a weapon, and throwing objects.
The material picking task information is used to control the agent to pick up corresponding items within a preset range, such as weapons and blood bags in that range.
The posture control task information is used to control the agent to switch postures, such as jumping, squatting, lying prone, standing, running and walking, as shown in fig. 3.
The blood volume supplement task information is used to control the agent to select reasonable medicine to heal itself and recover its state. A preset AI model screens out the current frame parallel task information and current frame non-parallel task information corresponding to the agent according to the current frame state information and current frame interaction information. From the obtained parallel task information and non-parallel task information, it is known that in the current frame state the agent can output actions corresponding to one or more subtasks in the parallel task information, but only the action corresponding to a single subtask in the non-parallel task information. This prevents the agent from synchronously outputting mutually exclusive actions in its current state, so the anthropomorphic effect of the AI simulation is better.
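The parallel/non-parallel split can be enforced at action-selection time: every parallel head may contribute an action in the same frame, while exactly one non-parallel sub-task is kept. A sketch, where the score dictionaries stand in for the AI model's output heads (the names are illustrative):

```python
def select_actions(parallel_scores, non_parallel_scores):
    """parallel_scores: {task: {option: score}}; one option is chosen per
    parallel task, so several parallel actions can fire in the same frame.
    non_parallel_scores: {task: score}; only the single best-scoring task
    survives, so mutually exclusive actions are never emitted together."""
    parallel_actions = {
        task: max(options, key=options.get)
        for task, options in parallel_scores.items()
    }
    exclusive_action = max(non_parallel_scores, key=non_parallel_scores.get)
    return parallel_actions, exclusive_action
```

For example, `select_actions({"move": {"front": 0.9, "back": 0.1}, "aim_vertical": {"up": 0.6, "down": 0.4}}, {"attack": 0.2, "pick_up": 0.7, "posture": 0.05, "heal": 0.05})` yields a movement and an aiming action together, but only the `pick_up` action from the mutually exclusive group.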
Referring to fig. 4, in some embodiments, outputting, through the AI model, the current frame parallel task information and current frame non-parallel task information corresponding to the agent based on the current frame state information and current frame interaction information includes: performing feature extraction on the current frame state information and the current frame interaction information respectively to obtain corresponding current frame state feature information and current frame interaction feature information; and obtaining, through a temporal feature extraction module of the AI model, the corresponding current frame parallel task information and current frame non-parallel task information based on the current frame state feature information and current frame interaction feature information.
Obtaining the corresponding current frame parallel task information and current frame non-parallel task information through the temporal feature extraction module of the AI model includes: inputting the current frame state feature information and current frame interaction feature information into a first fully-connected network of the AI model to obtain corresponding current frame first output information; obtaining, through the temporal feature extraction module of the AI model, current frame fusion state vector information corresponding to the agent based on the current frame first output information; and inputting the current frame fusion state vector information into a second fully-connected network of the AI model to obtain the corresponding current frame parallel task information and current frame non-parallel task information.
Referring to fig. 5, in some embodiments, the current frame interaction feature information includes current frame global feature information, current frame poison circle feature information, current frame material feature information and current frame sound feature information, obtained by feature extraction from the current frame global information, poison circle information, material information and sound information, respectively.
Respectively carrying out feature extraction on the current frame state information and the current frame interactive information of the intelligent agent to acquire corresponding current frame state feature information and current frame interactive feature information, and the method comprises the following steps:
and respectively taking the current frame global characteristic information, the current frame poison circle characteristic information, the current frame material characteristic information and the current frame sound characteristic information as the input of the first fully-connected network corresponding to the AI model so as to output the corresponding current frame first output information.
And the timing sequence feature extraction module based on the AI model performs timing sequence feature fusion on the first output information of the current frame to acquire fusion state vector information of the current frame corresponding to the agent.
And inputting the current frame fusion state vector information into the corresponding second full-connection network so as to obtain corresponding current frame parallel task information and current frame non-parallel task information.
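The three steps above (per-modality first fully-connected networks, timing feature module, second fully-connected network) can be sketched end to end. This is a minimal pure-Python illustration under stated assumptions: the layer sizes, the `fc` helper, and the averaging stand-in for the timing feature module are invented for the sketch and are not the patented model.

```python
import math
import random

def fc(x, in_dim, out_dim, seed):
    """Tiny stand-in for a fully-connected layer: y = tanh(Wx + b)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

# Step 1: one first fully-connected network per feature group (state, global,
# poison circle, material, sound); their outputs are concatenated into the
# "current frame first output information".
state_feat    = [0.2, 0.4]   # current frame state feature information
global_feat   = [0.1, 0.3]   # current frame global feature information
circle_feat   = [0.5, 0.1]   # current frame poison circle feature information
material_feat = [0.0, 0.7]   # current frame material feature information
sound_feat    = [0.6, 0.2]   # current frame sound feature information

first_out = []
for i, feat in enumerate([state_feat, global_feat, circle_feat,
                          material_feat, sound_feat]):
    first_out += fc(feat, 2, 4, seed=i)          # 5 groups x 4 dims -> 20 dims

# Step 2: timing feature module (an LSTM or GRU in the patent; here just a
# placeholder that mixes the first output with the previous frame's vector).
prev_fused = [0.0] * 20
fused = [0.5 * a + 0.5 * b for a, b in zip(first_out, prev_fused)]

# Step 3: the second fully-connected network maps the fused state vector to
# the parallel and non-parallel task outputs.
parallel_out     = fc(fused, 20, 4, seed=10)     # e.g. move / aim / aim / select
non_parallel_out = fc(fused, 20, 4, seed=11)     # attack / pick / posture / heal

print(len(parallel_out), len(non_parallel_out))
```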
In some embodiments, the current frame parallel task information includes mobile task information, first direction aiming task information, second direction aiming task information and non-parallel task selection information of the agent in the current frame; the current frame non-parallel task information comprises attack task information, material picking task information, attitude control task information and blood volume supplement task information of the intelligent agent in the current frame; inputting the current frame fusion state vector information into a second full-connection network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information, including:
and respectively inputting the current frame fusion state vector information to a second full-connection network corresponding to the AI model so as to output corresponding current frame movement task information, current frame first direction aiming task information, current frame second direction aiming task information, current frame non-parallel task selection information, current frame attack task information, current frame material picking task information, current frame attitude control task information and current frame blood volume supplementing task information.
The same current frame fusion state vector information is used as the input of the second fully-connected network, and multiple multi-task learning results are output, so that the learning generalizes better and the anthropomorphic simulation effect is stronger.
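One common way to realize the split this embodiment describes, with parallel task heads that may all act at once and non-parallel tasks that are mutually exclusive, is independent per-head choices plus a single softmax over the exclusive tasks. The head names and logit values below are hypothetical; the patent does not disclose the heads' internals.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical raw scores from the second fully-connected network.
head_logits = {
    "move":     [0.2, 1.1, -0.3],   # parallel: each head has its own
    "aim_dir1": [0.9, -0.5, 0.1],   # distribution, chosen independently
    "aim_dir2": [0.0, 0.4, 0.6],
    "use_non_parallel": [1.5],       # gate deciding whether to fire one of
                                     # the mutually exclusive tasks
}
non_parallel_logits = [0.3, 2.0, -1.0, 0.5]  # attack / pick / posture / heal

# Parallel tasks: one independent choice per head (all may execute at once).
parallel_actions = {k: max(range(len(v)), key=lambda i: v[i])
                    for k, v in head_logits.items() if len(v) > 1}
fire_non_parallel = sigmoid(head_logits["use_non_parallel"][0]) > 0.5

# Non-parallel tasks: a single softmax, so exactly one task is selected.
probs = softmax(non_parallel_logits)
non_parallel_action = max(range(len(probs)), key=lambda i: probs[i])

print(parallel_actions, fire_non_parallel, non_parallel_action)
```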
In this embodiment, the AI model is provided with corresponding fully-connected neural networks and a timing feature extraction module, where the timing feature extraction module includes, but is not limited to, an LSTM (Long Short-Term Memory) module, a GRU (Gated Recurrent Unit) module, a Transformer module, and the like.
Taking the example that the timing feature extraction module is an LSTM module, the acquiring, by the timing feature extraction module of the AI model, current frame fusion state vector information corresponding to the agent based on the current frame first output information includes: acquiring previous frame hidden state information corresponding to the LSTM module; outputting, by the LSTM module, current frame hidden state information corresponding to the LSTM module based on the current frame first output information and the previous frame hidden state information; and acquiring current frame fusion state vector information corresponding to the agent according to the current frame hidden state information.
The LSTM module serves as an independent feature extraction unit, and can receive previous frame hidden state information and current frame first output information as inputs of the LSTM module, and output corresponding current frame hidden state information, where the hidden state information includes hidden information (hidden state) and cell state information (cell state), and the current frame hidden state information serves as an input of a next frame.
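The per-frame recurrence described above (previous frame hidden state and cell state in, current frame hidden state out, fed forward to the next frame) can be sketched with the standard LSTM cell equations. The weights below are random placeholders, not the trained model; the single-unit sizes are chosen only to keep the sketch short.

```python
import math
import random

def lstm_cell(x, h_prev, c_prev, params):
    """One standard LSTM step: gates computed from [x, h_prev], new (h, c) out."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    def gate(name, squash):
        w, b = params[name]
        z = sum(wi * vi for wi, vi in zip(w, x + h_prev)) + b
        return squash(z)
    i = gate("i", sig)            # input gate
    f = gate("f", sig)            # forget gate
    o = gate("o", sig)            # output gate
    g = gate("g", math.tanh)      # candidate cell value
    c = f * c_prev + i * g        # cell state (long-term memory)
    h = o * math.tanh(c)          # hidden state (per-frame output)
    return h, c

rng = random.Random(0)
params = {name: ([rng.uniform(-0.5, 0.5) for _ in range(3)], 0.0)
          for name in ("i", "f", "o", "g")}   # 2 inputs + 1 hidden unit

# The previous frame's (h, c) is fed back in, exactly as the text describes:
# the current frame hidden state information becomes the next frame's input.
h, c = 0.0, 0.0
frames = [[0.3, -0.1], [0.8, 0.2], [-0.4, 0.5]]   # first output info per frame
history = []
for x in frames:
    h, c = lstm_cell(x, [h], c, params)
    history.append(h)

print(history)
```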
S103: and controlling the interaction between the intelligent agent and the 3D virtual environment according to the current frame action output information so as to acquire the state information of the next frame and the interaction information of the next frame of the intelligent agent.
And controlling the intelligent agent to execute corresponding action output based on the output current frame action output information, so that the intelligent agent interacts with the 3D virtual environment, the state information and the interaction information of the intelligent agent are updated, and the state information and the interaction information of the next frame of the intelligent agent are obtained.
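The S103 interaction described above reads as a standard act-step-observe cycle. The `Env` class and `model` function below are hypothetical stand-ins; the patent discloses no environment API, so only the shape of the loop is illustrated.

```python
class Env:
    """Toy environment: state is a counter, interaction echoes the frame."""
    def __init__(self):
        self.t = 0
    def observe(self):
        return {"state": self.t, "interaction": self.t * 2}
    def step(self, action):
        self.t += 1          # the agent's action advances the environment
        return self.observe()

def model(obs):
    # Stand-in policy: emit parallel and non-parallel task info for the
    # current frame (the real AI model is the network described above).
    parallel = {"move": obs["state"] % 3}
    non_parallel = {"attack": obs["interaction"] % 2}
    return {"parallel": parallel, "non_parallel": non_parallel}

env = Env()
obs = env.observe()                # current frame state + interaction info
trace = []
for _ in range(3):
    action = model(obs)            # S102: current frame action output
    obs = env.step(action)         # S103: interact, get next frame info
    trace.append(obs["state"])     # S104: repeat with the next frame
print(trace)
```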
In some embodiments, the outputting current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information includes: and outputting current frame action output information corresponding to the intelligent agent according to the current frame parallel task information and the current frame non-parallel task information based on a preset strategy gradient optimization function.
wherein A_t denotes the advantage function at time t, N denotes the number of learning trajectories, the first gradient term denotes the gradient of the parallel tasks, F(a_t|s_t) denotes the gradient of the operation space of all the non-parallel tasks, T denotes all the moments in a learning sequence, and t denotes a certain moment in this sequence.
wherein W denotes the number of parallel tasks, m denotes the size of the operation space of each task, and the probability term denotes the probability that any one operation in each parallel task is selected; the W parallelable tasks do not obey a single categorical distribution (i.e., the W tasks are independent of each other).
a_t denotes the action selected at time t, and s_t denotes the state at time t, including the state of the agent at time t and the interaction state between the agent and the 3D virtual environment, such as the material information, sound information, global information and equipment information at time t; a_{j_q,t} indicates that the q-th action selected at time t is any one of the task information.
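The formula referenced here is rendered as an image in the original publication and is lost in this text extract. Assuming the standard advantage-weighted policy-gradient form consistent with the symbols defined above (A_t, N, T, W, and F(a_t|s_t)), it would read approximately:

```latex
% Hedged reconstruction: the original formula is not recoverable from this
% extract; this is the standard policy-gradient form matching the symbols
% A_t, N, T, W, a_{j_q,t} and F(a_t|s_t) defined in the surrounding text.
\nabla J(\theta) \approx \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T} A_t
\Big[ \underbrace{\sum_{q=1}^{W} \nabla_\theta
      \log \pi_\theta\!\left(a_{j_q,t} \mid s_t\right)}_{\text{parallel tasks}}
    + \underbrace{F\!\left(a_t \mid s_t\right)}_{\text{non-parallel tasks}} \Big]
```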
The gradient F(a_t|s_t) of the operation space of the non-parallel tasks can be expressed as:
wherein M denotes the number of non-parallel tasks, and m denotes the size of the operation space of each task, i.e., the number of predictable actions. For each non-parallel task, it cannot be predicted that all of its actions are executed simultaneously; only one operation among the non-parallel tasks can be selected for execution at any time.
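The distinction between the two gradient terms follows from how the joint action's log-probability decomposes: the W parallel tasks contribute independent categorical terms that add up, while the M mutually exclusive non-parallel tasks contribute a single categorical term. A small sketch under that assumption (all logit values hypothetical):

```python
import math

def log_softmax(logits, idx):
    """Log-probability of choosing index idx under a categorical distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return logits[idx] - log_z

# W = 2 parallel tasks, each an independent categorical over its own
# operation space; one M-way categorical for the mutually exclusive tasks.
parallel_logits = [[0.1, 0.9, -0.2],      # task 1, m = 3 operations
                   [1.2, -0.4]]           # task 2, m = 2 operations
non_parallel_logits = [0.3, 2.0, -1.0]    # M = 3 mutually exclusive tasks

chosen_parallel = [1, 0]   # one operation picked per parallel task
chosen_non_parallel = 1    # exactly one non-parallel task may run

# Joint log-probability: independent parallel choices add their log-probs
# (they do not share one categorical), plus one non-parallel categorical term.
logp = sum(log_softmax(l, a) for l, a in zip(parallel_logits, chosen_parallel))
logp += log_softmax(non_parallel_logits, chosen_non_parallel)

print(round(logp, 4))
```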
S104: and outputting the next frame action output information corresponding to the agent according to the next frame state information and the next frame interaction information.
After the next frame state information and the next frame interaction information of the agent are obtained, the next frame action output information corresponding to the agent is output through the AI model based on the next frame state information and the next frame interaction information according to the operation in the step S102. The specific operation process can refer to the steps S102-S105, and is not described herein again.
The agent action policy making method provided by this embodiment includes: acquiring current frame state information of the agent in a 3D virtual environment and current frame interaction information between the agent and the 3D virtual environment; outputting current frame parallel task information and current frame non-parallel task information corresponding to the agent through an AI model based on the current frame state information and the current frame interaction information; outputting current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information; controlling the agent to interact with the 3D virtual environment according to the current frame action output information so as to obtain the next frame state information and next frame interaction information of the agent; and outputting the next frame action output information corresponding to the agent according to the next frame state information and the next frame interaction information. By analyzing the parallel task information and non-parallel task information that the agent can execute in its current state, the actions the agent can execute synchronously and the mutually exclusive actions are obtained, and the agent is controlled to output the corresponding actions accordingly, so that the actions output by the agent are more reasonable and more human-like.
Referring to fig. 6, fig. 6 is a schematic block diagram of a server according to an embodiment of the present disclosure.
As shown in fig. 6, the server 30 may include a processor 301, a memory 302, and a network interface 303. The processor 301, memory 302, and network interface 303 are connected by a system bus, such as an I2C (Inter-integrated Circuit) bus.
Specifically, the Processor 301 may be a Micro-controller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
Specifically, the Memory 302 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB flash disk, or a removable hard disk.
The network interface 303 is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the architecture shown in fig. 6 is a block diagram of only a portion of the architecture associated with the subject application, and does not constitute a limitation on the servers to which the subject application applies, as a particular server may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 301 is configured to run a computer program stored in the memory 302, and when executing the computer program, implement the following steps:
acquiring current frame state information of an agent in a 3D virtual environment and current frame interaction information of the agent and the 3D virtual environment;
outputting current frame parallel task information and current frame non-parallel task information corresponding to the agent through an AI model based on the current frame state information and the current frame interaction information;
outputting current frame action output information corresponding to the intelligent body according to the current frame parallel task information and the current frame non-parallel task information;
controlling the intelligent agent to interact with the 3D virtual environment according to the current frame action output information so as to obtain the next frame state information and the next frame interaction information of the intelligent agent;
and outputting the next frame action output information corresponding to the agent according to the next frame state information and the next frame interaction information.
In some embodiments, the outputting, by the processor 301 through the AI model, the current frame parallel task information and the current frame non-parallel task information corresponding to the agent based on the current frame state information and the current frame interaction information includes:
respectively extracting the characteristics of the current frame state information and the current frame interactive information to obtain corresponding current frame state characteristic information and current frame interactive characteristic information;
and acquiring corresponding current frame parallel task information and current frame non-parallel task information based on the current frame state characteristic information and the current frame interactive characteristic information through the time sequence characteristic extraction module of the AI model.
In some embodiments, the processor 301 obtains, through the timing characteristic extraction module of the AI model, corresponding current frame parallel task information and current frame non-parallel task information based on current frame state characteristic information and current frame interactive characteristic information, and includes:
inputting the current frame state characteristic information and the current frame interactive characteristic information into a first full-connection network corresponding to the AI model to obtain corresponding first output information of the current frame;
acquiring current frame fusion state vector information corresponding to the agent through a time sequence feature extraction module of the AI model based on the current frame first output information;
and inputting the current frame fusion state vector information into a second full-connection network corresponding to the AI model to acquire corresponding current frame parallel task information and current frame non-parallel task information.
In some embodiments, the current frame interactive feature information includes current frame global feature information, current frame poison circle feature information, current frame material feature information, and current frame sound feature information, and the processor 301 inputs the current frame state feature information and the current frame interactive feature information to the first fully-connected network corresponding to the AI model to obtain corresponding first output information of the current frame, including:
and respectively inputting the current frame state characteristic information, the current frame global characteristic information, the current frame poison circle characteristic information, the current frame material characteristic information and the current frame sound characteristic information into corresponding first full-connection networks to obtain corresponding first output information of the current frame.
In some embodiments, the current frame parallel task information includes mobile task information, aiming task information and non-parallel task selection information of the agent in the current frame; the current frame non-parallel task information comprises attack task information, material picking task information, attitude control task information and blood volume supplement task information of the agent in the current frame; the processor 301 inputs the current frame fusion state vector information into the second fully-connected network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information, including:
and respectively inputting the current frame fusion state vector information to a second full-connection network corresponding to the AI model so as to obtain corresponding current frame movement task information, current frame aiming task information, current frame non-parallel task selection information, current frame attack task information, current frame material picking task information, current frame attitude control task information and current frame blood volume supplement task information.
In some embodiments, the timing feature extraction module includes an LSTM module, and the obtaining, by the processor 301 through the timing feature extraction module of the AI model, current frame fusion state vector information corresponding to the agent based on the current frame first output information includes:
acquiring previous frame hidden state information corresponding to the LSTM module;
outputting, by the LSTM module, current frame hidden state information corresponding to the LSTM module based on the current frame first output information and the previous frame hidden state information;
and acquiring current frame fusion state vector information corresponding to the agent according to the current frame hidden state information.
In some embodiments, the outputting, by the processor 301, current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information includes:
and outputting current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information based on a preset strategy gradient optimization function.
wherein A_t denotes the advantage function at time t, N denotes the number of learning trajectories, the first gradient term denotes the gradient of the parallel tasks, F(a_t|s_t) denotes the gradient of the operation space of the non-parallel tasks, T denotes all the moments in a learning sequence, and t denotes a certain moment in this sequence.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
The computer readable storage medium may be an internal storage unit of the server in the foregoing embodiment, for example, a hard disk or a memory of the server. The computer readable storage medium may also be an external storage device of the server, such as a plug-in hard disk provided on the server, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like.
Since the computer program stored in the computer-readable storage medium can execute any intelligent agent action policy making method provided in the embodiments of the present application, beneficial effects that can be achieved by any intelligent agent action policy making method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An agent action policy making method, the method comprising:
acquiring current frame state information of an agent in a 3D virtual environment and current frame interaction information of the agent and the 3D virtual environment;
outputting current frame parallel task information and current frame non-parallel task information corresponding to the agent through an AI model based on the current frame state information and the current frame interaction information;
outputting current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information;
controlling the intelligent agent to interact with the 3D virtual environment according to the current frame action output information so as to obtain the next frame state information and the next frame interaction information of the intelligent agent;
and outputting the next frame action output information corresponding to the agent according to the next frame state information and the next frame interaction information.
2. The method according to claim 1, wherein the outputting, by the AI model, current frame parallel task information and current frame non-parallel task information corresponding to the agent based on the current frame state information and the current frame interaction information includes:
respectively extracting the characteristics of the current frame state information and the current frame interactive information to obtain corresponding current frame state characteristic information and current frame interactive characteristic information;
and acquiring corresponding current frame parallel task information and current frame non-parallel task information based on the current frame state characteristic information and the current frame interactive characteristic information through the time sequence characteristic extraction module of the AI model.
3. The method according to claim 2, wherein the obtaining, by the timing feature extraction module of the AI model, corresponding current frame parallel task information and current frame non-parallel task information based on current frame state feature information and current frame interactive feature information includes:
inputting the current frame state characteristic information and the current frame interactive characteristic information into a first full-connection network corresponding to the AI model to obtain corresponding first output information of the current frame;
acquiring current frame fusion state vector information corresponding to the agent through a time sequence feature extraction module of the AI model based on the current frame first output information;
and inputting the current frame fusion state vector information into a second full-connection network corresponding to the AI model to acquire corresponding current frame parallel task information and current frame non-parallel task information.
4. The method according to claim 3, wherein the current frame interactive feature information includes current frame global feature information, current frame poison circle feature information, current frame material feature information, and current frame sound feature information, and the inputting the current frame state feature information and the current frame interactive feature information into the first fully-connected network corresponding to the AI model to obtain the corresponding current frame first output information includes:
and respectively inputting the current frame state characteristic information, the current frame global characteristic information, the current frame poison circle characteristic information, the current frame material characteristic information and the current frame sound characteristic information into corresponding first full-connection networks to obtain corresponding first output information of the current frame.
5. The method of claim 4, wherein the current frame parallel task information comprises mobile task information, targeting task information, and non-parallel task selection information of the agent at the current frame; the current frame non-parallel task information comprises attack task information, material picking task information, attitude control task information and blood volume supplement task information of the intelligent agent in the current frame; the inputting the current frame fusion state vector information into a second full-connection network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information includes:
and respectively inputting the current frame fusion state vector information to a second fully-connected network corresponding to the AI model so as to obtain corresponding current frame movement task information, current frame aiming task information, current frame non-parallel task selection information, current frame attack task information, current frame material picking task information, current frame attitude control task information and current frame blood volume supplement task information.
6. The method according to claim 3, wherein the timing feature extraction module includes an LSTM module, and the obtaining, by the timing feature extraction module of the AI model, current frame fusion state vector information corresponding to the agent based on the current frame first output information includes:
acquiring previous frame hidden state information corresponding to the LSTM module;
outputting, by the LSTM module, current frame hidden state information corresponding to the LSTM module based on the current frame first output information and the previous frame hidden state information;
and acquiring current frame fusion state vector information corresponding to the agent according to the current frame hidden state information.
7. The method according to claim 1, wherein the outputting current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information comprises:
and outputting current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information based on a preset strategy gradient optimization function.
8. The method of claim 7, wherein the strategy gradient optimization function is:
wherein A_t denotes the advantage function at time t, N denotes the number of learning trajectories, the first gradient term denotes the gradient of the parallel tasks, F(a_t|s_t) denotes the gradient of the operation space of the non-parallel tasks, T denotes all the moments in a learning sequence, and t denotes a certain moment in this sequence.
9. A server, comprising a processor, a memory;
the memory stores a computer program and an AI model that can be invoked and executed by the processor, wherein the computer program, when executed by the processor, implements the agent action policy making method according to any one of claims 1 to 8.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to carry out a method of agent action policy formulation according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011312201.7A CN112494949B (en) | 2020-11-20 | 2020-11-20 | Intelligent body action policy making method, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011312201.7A CN112494949B (en) | 2020-11-20 | 2020-11-20 | Intelligent body action policy making method, server and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112494949A true CN112494949A (en) | 2021-03-16 |
CN112494949B CN112494949B (en) | 2023-10-31 |
Family
ID=74959236
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011312201.7A Active CN112494949B (en) | 2020-11-20 | 2020-11-20 | Intelligent body action policy making method, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112494949B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114118400A (en) * | 2021-10-11 | 2022-03-01 | 中国科学院自动化研究所 | Concentration network-based cluster countermeasure method and device |
CN116747521A (en) * | 2023-08-17 | 2023-09-15 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for controlling intelligent agent to conduct office |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190286979A1 (en) * | 2018-03-14 | 2019-09-19 | Electronic Arts Inc. | Reinforcement Learning for Concurrent Actions |
CN110622174A (en) * | 2017-05-19 | 2019-12-27 | 渊慧科技有限公司 | Imagination-based agent neural network |
KR20200063309A (en) * | 2018-11-20 | 2020-06-05 | 고려대학교 산학협력단 | Method and system for performing environment adapting stategy based on ai |
CN111401557A (en) * | 2020-06-03 | 2020-07-10 | 超参数科技(深圳)有限公司 | Agent decision making method, AI model training method, server and medium |
CN111950726A (en) * | 2020-07-09 | 2020-11-17 | 华为技术有限公司 | Decision method based on multi-task learning, decision model training method and device |
- 2020-11-20: CN CN202011312201.7A patent/CN112494949B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110622174A (en) * | 2017-05-19 | 2019-12-27 | 渊慧科技有限公司 | Imagination-based agent neural network |
US20190286979A1 (en) * | 2018-03-14 | 2019-09-19 | Electronic Arts Inc. | Reinforcement Learning for Concurrent Actions |
KR20200063309A (en) * | 2018-11-20 | 2020-06-05 | 고려대학교 산학협력단 | Method and system for performing environment adapting stategy based on ai |
CN111401557A (en) * | 2020-06-03 | 2020-07-10 | 超参数科技(深圳)有限公司 | Agent decision making method, AI model training method, server and medium |
CN111950726A (en) * | 2020-07-09 | 2020-11-17 | 华为技术有限公司 | Decision method based on multi-task learning, decision model training method and device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114118400A (en) * | 2021-10-11 | 2022-03-01 | 中国科学院自动化研究所 | Concentration network-based cluster countermeasure method and device |
CN114118400B (en) * | 2021-10-11 | 2023-01-03 | 中国科学院自动化研究所 | Concentration network-based cluster countermeasure method and device |
CN116747521A (en) * | 2023-08-17 | 2023-09-15 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for controlling intelligent agent to conduct office |
CN116747521B (en) * | 2023-08-17 | 2023-11-03 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for controlling intelligent agent to conduct office |
Also Published As
Publication number | Publication date |
---|---|
CN112494949B (en) | 2023-10-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110201403B (en) | Method, device and medium for controlling virtual object to discard virtual article | |
KR20210028728A (en) | Method, apparatus, and device for scheduling virtual objects in a virtual environment | |
CN112494949B (en) | Intelligent body action policy making method, server and storage medium | |
CN105935493A (en) | Computer system, game device and method used for controlling roles | |
CN108211361B (en) | The determination method and apparatus of virtual resource acquisition probability, storage medium, electronic device in game | |
US11969654B2 (en) | Method and apparatus for determining target virtual object, terminal, and storage medium | |
CN109529352A (en) | The appraisal procedure of scheduling strategy, device and equipment in virtual environment | |
CN109529356A (en) | Battle result determines method, apparatus and storage medium | |
CN111589166A (en) | Interactive task control, intelligent decision model training methods, apparatus, and media | |
US12090397B2 (en) | Systems and methods for using natural language processing (NLP) to control automated execution of in-game activities | |
CN111603766A (en) | Control method and device of virtual carrier, storage medium and electronic device | |
CN113144597A (en) | Virtual vehicle display method, device, equipment and storage medium | |
Gajurel et al. | Neuroevolution for rts micro | |
JP2023548802A (en) | Stage screen display method, device, and equipment | |
CN111544889A (en) | Behavior control method and device of virtual object and storage medium | |
Gemine et al. | Imitative learning for real-time strategy games | |
CN114344905A (en) | Team interaction processing method, device, equipment, medium and program for virtual object | |
Miyake | Current status of applying artificial intelligence in digital games | |
Hagelbäck | Multi-agent potential field based architectures for real-time strategy game bots | |
Goecks et al. | Combining learning from human feedback and knowledge engineering to solve hierarchical tasks in minecraft | |
Hagelback et al. | Dealing with fog of war in a real time strategy game environment | |
Bernstein et al. | Evaluating the Effectiveness of Multi-Agent Organisational Paradigms in a Real-Time Strategy Environment: Engineering Multiagent Systems Track | |
CN115944924A (en) | Model training method, action strategy making method, server and storage medium | |
Li et al. | Fuzzy Self-Adaptive Soccer Robot Behavior Decision System Design through ROS. | |
CN112295232B (en) | Navigation decision making method, AI model training method, server and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |