CN112494949B - Intelligent body action policy making method, server and storage medium - Google Patents

Info

Publication number
CN112494949B
CN112494949B (application CN202011312201.7A)
Authority
CN
China
Prior art keywords
current frame
information
agent
task information
parallel task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011312201.7A
Other languages
Chinese (zh)
Other versions
CN112494949A (en)
Inventor
杨木
张弛
武建芳
王宇舟
郭仁杰
杨正云
杨少杰
李宏亮
刘永升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Super Parameter Technology Shenzhen Co ltd
Original Assignee
Super Parameter Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Super Parameter Technology Shenzhen Co ltd filed Critical Super Parameter Technology Shenzhen Co ltd
Priority to CN202011312201.7A priority Critical patent/CN112494949B/en
Publication of CN112494949A publication Critical patent/CN112494949A/en
Application granted granted Critical
Publication of CN112494949B publication Critical patent/CN112494949B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67 Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor, adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55 Controlling game characters or game objects based on the game progress
    • A63F13/56 Computing the motion of game characters with respect to other game characters, game objects or elements of the game scene, e.g. for simulating the behaviour of a group of virtual soldiers or for path finding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00 Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60 Methods for processing data by generating or executing the game program
    • A63F2300/6027 Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an agent action policy formulation method, a server and a storage medium. The method comprises: obtaining current-frame state information of an agent in a 3D virtual environment and current-frame interaction information between the agent and the 3D virtual environment; outputting, through an AI model, current-frame parallel task information and current-frame non-parallel task information corresponding to the agent based on the current-frame state information and the current-frame interaction information; outputting current-frame action output information corresponding to the agent according to the current-frame parallel task information and non-parallel task information; controlling the agent to interact with the 3D virtual environment according to the current-frame action output information, so as to obtain next-frame state information and next-frame interaction information of the agent; and outputting next-frame action output information corresponding to the agent according to the next-frame state information and the next-frame interaction information. The application can realize highly anthropomorphic AI simulation.

Description

Intelligent body action policy making method, server and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an agent action policy making method, a server, and a storage medium.
Background
With the rapid development of artificial intelligence (AI) technology, AI is widely applied in fields such as 3D games, virtual traffic, autonomous-driving simulation and robot trajectory planning, and AI simulation in 3D virtual space has great commercial value; for example, it enables an agent to interact with real players in various games.
At present, in some AI simulations of 3D virtual space, an agent needs to collect various resources in the 3D virtual space and fight other agent players inside a continuously shrinking safe area. During the AI simulation, the agent therefore needs to make correct action decisions in different environments, so that it can move and explore with the relatively safe area as its target point, fight enemy agents, and survive to the end.
Therefore, in order to enhance the user's gaming experience, the agent in an AI simulation should be highly personified, so how to realize highly personified AI simulation becomes a problem to be solved.
Disclosure of Invention
The embodiment of the application provides an agent action strategy making method, a server and a storage medium, aiming at realizing highly anthropomorphic AI simulation.
In a first aspect, an embodiment of the present application provides a method for formulating an action policy of an agent, where the method includes:
acquiring state information of a current frame of an agent in a 3D virtual environment and interaction information of the agent and the current frame of the 3D virtual environment;
outputting current frame parallel task information and current frame non-parallel task information corresponding to the intelligent agent based on the current frame state information and the current frame interaction information through an AI model;
outputting current frame action output information corresponding to the intelligent agent according to the current frame parallel task information and the non-parallel task information of the current frame;
according to the current frame action output information, the agent is controlled to interact with the 3D virtual environment so as to acquire next frame state information and next frame interaction information of the agent;
and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame interaction information.
In a second aspect, an embodiment of the present application further provides a server, where the server includes a processor and a memory; the memory stores a computer program and an AI model which can be called and executed by the processor, wherein the method for formulating the action policy of the agent is realized when the computer program is executed by the processor.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium is configured to store a computer program, and the computer program, when executed by a processor, causes the processor to implement the agent action policy formulation method described above.
The embodiment of the application provides an agent action policy formulation method, a server and a storage medium. The method obtains current-frame state information of an agent in a 3D virtual environment and current-frame interaction information between the agent and the 3D virtual environment; outputs, through an AI model, current-frame parallel task information and current-frame non-parallel task information corresponding to the agent based on the current-frame state information and the current-frame interaction information; outputs current-frame action output information corresponding to the agent according to the current-frame parallel task information and non-parallel task information; controls the agent to interact with the 3D virtual environment according to the current-frame action output information, so as to obtain next-frame state information and next-frame interaction information of the agent; and outputs next-frame action output information corresponding to the agent according to the next-frame state information and the next-frame interaction information. By analyzing the parallel task information and non-parallel task information executable by the agent in its current state, the actions the agent can execute synchronously and the actions that are mutually exclusive are determined, and the agent is controlled to output the corresponding actions accordingly, so that the actions output by the agent are more reasonable and human-like.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart illustrating steps of a method for establishing an agent action policy according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario of an agent action policy formulation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a corresponding action that an agent can selectively output according to parallel task information and non-parallel task information in the application scenario corresponding to FIG. 2;
FIG. 4 is a schematic diagram of an AI model-based agent action output provided in accordance with one embodiment of the application;
FIG. 5 is another schematic diagram of an AI model-based agent action output provided in accordance with one embodiment of the application;
fig. 6 is a schematic block diagram of a server according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
To solve the above problems, embodiments of the present application provide an agent action policy formulation method, a server, and a computer-readable storage medium for implementing highly personified AI simulation. The method can be applied to a server, which may be a single server or a server cluster composed of multiple servers.
Referring to fig. 1, fig. 1 is a flow chart of an agent action policy making method according to an embodiment of the application.
As shown in fig. 1, the action decision making method specifically includes steps S101 to S105.
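As a rough illustration of how steps S101 to S105 chain together frame by frame, the loop below sketches the cycle in Python. All names (MockModel, MockEnv, run_frames) and the dummy observations are invented for this sketch and do not come from the patent:

```python
class MockModel:
    """Stands in for the AI model: maps current-frame observations to task info (S102)."""
    def decide(self, state, interaction):
        parallel = {"move": "forward", "aim_v": "up"}   # tasks that may run together
        exclusive = "pick_up"                            # exactly one mutually exclusive task
        return parallel, exclusive

class MockEnv:
    """Stands in for the 3D virtual environment, returning dummy per-frame observations."""
    def __init__(self):
        self.frame = 0
    def observe(self):
        return {"frame": self.frame}, {"circle_radius": 100 - self.frame}
    def step(self, action):
        self.frame += 1          # S104: acting in the environment advances to the next frame
        return self.observe()

def run_frames(model, env, n_frames):
    state, interaction = env.observe()                        # S101: current-frame observations
    actions = []
    for _ in range(n_frames):
        parallel, exclusive = model.decide(state, interaction)    # S102: task info
        action = {"parallel": parallel, "exclusive": exclusive}   # S103: action output info
        state, interaction = env.step(action)                     # S104: interact, observe next frame
        actions.append(action)                                    # S105: repeat with next frame
    return actions
```

The point of the loop is that the output of step S104 becomes the input of the next iteration's S102, so the agent's policy is re-evaluated every frame.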
Step S101: and acquiring state information of a current frame of the intelligent agent in the 3D virtual environment and interaction information of the intelligent agent and the current frame of the 3D virtual environment.
For example, in application scenarios such as artificial intelligence (AI), robot simulation in a 3D virtual environment, robotic arms, autonomous driving, virtual traffic simulation, or game AI in 3D games, in order to realize highly anthropomorphic simulation and make highly anthropomorphic action decisions for an agent in the 3D virtual environment, the agent's current-frame state information and the current-frame interaction information between the agent and the 3D virtual environment are obtained, so that corresponding action decisions can be made from them. An agent here is an entity that is situated in a complex dynamic environment, autonomously perceives environmental information, autonomously takes action, and accomplishes a series of preset goals or tasks.
The agent's current-frame state information is state data characterizing the agent itself in the current frame, and comprises the agent's own data and information about the equipment it wears. The agent's own data includes position information, motion information, blood volume information, equipment information, the camp it belongs to, and so on.
The agent-environment interaction information is relative data characterizing the agent and the 3D virtual environment in the current frame, such as global information, poison circle information, material information and sound information.
In the present embodiment, game AI simulation in a 3D game is taken as an example, including but not limited to 3D FPS (3D First-Person Shooter) games; the method may also be applied to game AI simulation in other types of 3D games, which is not limited here.
As shown in fig. 2, in the 3D game the agent may play against a preset number of other players, where the other players may be other agents or game characters controlled by human players. In this embodiment the other players are taken to be other agents for illustration, but they are not limited to being only other agents.
The agent may participate in the game in a team with other agents or alone, so that agents of different camps are present in the game. An agent participating in the game can select any area in the 3D virtual environment as a target area and land on it by parachuting. The agent then needs to collect different resources in the 3D virtual environment, such as weapons, protective equipment and props, to increase its own combat power. Meanwhile, as the game progresses, the safe area of the 3D virtual environment gradually shrinks and the poison circle area gradually expands, so the agents participating in the game fight more often in order to reach the safe area; an agent kills enemy agents belonging to other camps through various strategies and finally wins.
By acquiring the agent's current-frame position information, motion information, blood volume information, equipment information and camp information, the agent's own state can be accurately evaluated.
The position information comprises the agent's spatial location in the 3D virtual environment, which may be represented in a spatial coordinate system; the motion information comprises the agent's current heading and movement speed; the blood volume information comprises the agent's total blood volume, remaining blood volume and so on; the equipment information comprises the armor, helmet and the weapon held in each weapon slot of the agent, where the weapon information includes the weapon type and weapon state, such as whether the weapon is loaded and the amount of remaining ammunition.
Global information, poison circle information, material information and sound information generated by the agent's interaction with the 3D virtual environment are acquired, so that the environment the agent is currently in can be accurately evaluated.
The global information mainly comprises the elapsed time of the current game, the number of surviving teammates, the total kills of the agent's team, and so on. The poison circle information comprises the record of the current circle, such as the current circle center, current circle radius, stage of the current circle and remaining time of the current circle, as well as the center, radius and total time of the next circle. The material information comprises the location, type, properties and quantity of the visible items in the agent's field of view, where item types include but are not limited to firearms, knives, armor, helmets, medicine and throwables. The sound information mainly comprises the position, relative orientation and type of the sound source.
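The per-frame observations enumerated above can be pictured as two records, one for the agent's own state and one for the agent-environment interaction. The field names below are assumptions chosen for this sketch, not the patent's actual schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class AgentState:
    """The agent's own current-frame state: position, motion, blood volume, equipment, camp."""
    position: Tuple[float, float, float]   # spatial coordinates in the 3D environment
    heading: Tuple[float, float]           # current direction
    speed: float                           # movement speed
    hp: int                                # remaining blood volume
    hp_max: int                            # total blood volume
    weapons: List[str]                     # weapon held in each weapon slot
    camp: int                              # which camp the agent belongs to

@dataclass
class InteractionInfo:
    """Relative agent-environment information for the current frame."""
    game_time: float                       # global info: elapsed game time
    teammates_alive: int                   # global info: surviving teammates
    team_kills: int                        # global info: team's total kills
    circle_center: Tuple[float, float]     # poison circle info: current circle center
    circle_radius: float                   # poison circle info: current circle radius
    next_circle_radius: float              # poison circle info: next circle radius
    visible_items: List[Tuple[str, Tuple[float, float, float]]]  # material: (type, location)
    sounds: List[Tuple[str, Tuple[float, float]]]                # sound: (kind, source position)
```

These two records together form the raw input from which the AI model's feature extraction starts.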
Step S102: and outputting current frame parallel task information and current frame non-parallel task information corresponding to the intelligent agent based on the current frame state information and the current frame interaction information through an AI model.
The parallel task information is information corresponding to related actions that the agent can execute synchronously at the same moment, and includes, but is not limited to, movement task information, first-direction aiming task information, second-direction aiming task information and non-parallel task selection information. The movement task information includes the movement direction, movement speed, posture while moving and so on. The aiming task information characterizes the agent's aiming direction.
That is, at the same moment the agent can synchronously output any combination of the action corresponding to the movement task, the action corresponding to the first-direction aiming task, the action corresponding to the second-direction aiming task, and the action corresponding to non-parallel task selection.
As shown in fig. 3, the moving directions include, but are not limited to, front, rear, left, right, front left, rear left, front right, and rear right directions.
The aiming direction includes, but is not limited to, aiming up, down, left and right, where aiming up and down is the first direction, aiming left and right is the second direction, and the first and second directions are perpendicular to each other.
The non-parallel task information is information corresponding to tasks whose actions are mutually exclusive for the agent at the same moment, and comprises at least one of attack task information, material pick-up task information, posture control task information and blood volume replenishment task information.
That is, at the same moment the agent can only output one of the action corresponding to the attack task, the action corresponding to the material pick-up task, the action corresponding to the posture control task, and the action corresponding to the blood volume replenishment task. The attack task information is used to control the selection and switching of weapons when the agent fights agents of other camps, such as firing, switching to a melee weapon, switching to a ranged weapon, holstering the weapon and throwing a throwable.
The material pick-up task information is used to control the agent to pick up corresponding items within a preset range, such as weapons and blood packs within that range.
The posture control task information is used to control the agent to switch posture, such as jumping, squatting, lying prone, standing, running and walking, as shown in fig. 3.
The blood volume replenishment task information is used to control the agent to select a suitable medicine to treat itself and restore its state.
A preset AI model screens out the current-frame parallel task information and current-frame non-parallel task information corresponding to the agent according to the current-frame state information and current-frame interaction information. From the obtained parallel task information and non-parallel task information, the agent knows that in the current-frame state it may output the actions corresponding to one or more subtasks in the parallel task information, but may only output the action corresponding to a single subtask in the non-parallel task information. This prevents the agent from synchronously outputting mutually exclusive actions in its current state, giving the AI simulation a better anthropomorphic effect.
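The split between parallel and non-parallel tasks amounts to a simple selection rule: every parallel head may act in the same frame, while exactly one non-parallel task is chosen. The sketch below illustrates this with made-up head names and scores; the thresholding scheme is an assumption for illustration, as the patent does not specify how head outputs are converted into actions:

```python
PARALLEL_HEADS = ("move", "aim_vertical", "aim_horizontal")
EXCLUSIVE_HEADS = ("attack", "pick_up", "change_pose", "heal")

def select_actions(parallel_scores, exclusive_scores, threshold=0.5):
    # Every parallel head whose score clears the threshold executes simultaneously.
    parallel = [h for h in PARALLEL_HEADS if parallel_scores[h] > threshold]
    # Exactly one non-parallel task is chosen: the argmax over the exclusive heads,
    # so mutually exclusive actions can never be output in the same frame.
    exclusive = max(EXCLUSIVE_HEADS, key=lambda h: exclusive_scores[h])
    return parallel, exclusive
```

For example, with high scores on "move" and "aim_vertical" the agent moves and aims in the same frame, but only the single best-scoring exclusive task (say, "pick_up") is executed alongside them.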
Referring to fig. 4, the AI model includes a first fully-connected network and a second fully-connected network, where a fully-connected network is also called a fully connected layer (FC). In some embodiments, outputting, through the AI model, the current-frame parallel task information and current-frame non-parallel task information corresponding to the agent based on the current-frame state information and current-frame interaction information includes: extracting features from the current-frame state information and the current-frame interaction information respectively to obtain corresponding current-frame state feature information and current-frame interaction feature information; and obtaining, through the time sequence feature extraction module of the AI model, the corresponding current-frame parallel task information and current-frame non-parallel task information based on the current-frame state feature information and current-frame interaction feature information.
The time sequence feature extraction module through the AI model obtains corresponding current frame parallel task information and current frame non-parallel task information based on current frame state feature information and current frame interaction feature information, and the time sequence feature extraction module comprises: inputting the current frame state characteristic information and the current frame interaction characteristic information into a first fully-connected network corresponding to the AI model to obtain corresponding current frame first output information; acquiring current frame fusion state vector information corresponding to the intelligent agent based on the first output information of the current frame through a time sequence feature extraction module of the AI model; and inputting the current frame fusion state vector information into a second fully-connected network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information.
Referring to fig. 5, in some embodiments, the current frame interaction feature information includes current frame global feature information, current frame poison circle feature information, current frame material feature information, and current frame sound feature information. The current frame global feature information is obtained through current frame global information extraction, the current frame poison circle feature information is obtained through current frame poison circle information extraction, the current frame material feature information is obtained through current frame material information extraction, and the current frame sound feature information is obtained through current frame sound information extraction.
Feature extraction is respectively carried out on the state information of the current frame and the interaction information of the current frame of the intelligent agent to obtain corresponding state feature information of the current frame and interaction feature information of the current frame, and the method comprises the following steps:
and respectively taking the global characteristic information of the current frame, the poison circle characteristic information of the current frame, the material characteristic information of the current frame and the sound characteristic information of the current frame as inputs of a first fully-connected network corresponding to the AI model so as to output corresponding first output information of the current frame.
And the time sequence feature extraction module based on the AI model performs time sequence feature fusion on the first output information of the current frame to acquire the fusion state vector information of the current frame corresponding to the intelligent agent.
And inputting the current frame fusion state vector information into a corresponding second fully-connected network so as to acquire corresponding current frame parallel task information and current frame non-parallel task information.
In some embodiments, the current frame parallel task information includes moving task information, first direction aiming task information, second direction aiming task information and non-parallel task selection information of the agent in the current frame; the non-parallel task information of the current frame comprises attack task information, material picking task information, attitude control task information and blood volume supplementing task information of the intelligent agent in the current frame; the step of inputting the current frame fusion state vector information into a second fully connected network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information, including:
and respectively inputting the current frame fusion state vector information into a second full-connection network corresponding to the AI model to output corresponding current frame movement task information, current frame first direction aiming task information, current frame second direction aiming task information, current frame non-parallel task selection information, current frame attack task information, current frame material picking task information, current frame attitude control task information and current frame blood volume supplementing task information.
The same current frame fusion state vector information is used as the input of the second fully-connected networks, and multiple multi-task learning results are output, so that the learning generalizes better and the simulated behavior is more human-like.
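The shared-input, multi-head arrangement described above can be sketched as follows, with one softmax head per task over that task's own operation space. The head names follow the embodiment; the action-space sizes and weights are made-up assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
fused = rng.normal(size=(64,))  # current-frame fusion state vector (assumed width)

# Hypothetical operation-space sizes per head; the patent fixes the head
# names (move, aim, non-parallel select, attack, pick, posture, heal)
# but not the sizes.
heads = {"move": 9, "aim_x": 11, "aim_y": 11, "np_select": 4,
         "attack": 2, "pick": 2, "posture": 3, "heal": 2}
params = {k: (rng.normal(size=(64, n)) * 0.1, np.zeros(n))
          for k, n in heads.items()}

# One shared input, many task heads: each head is a linear layer + softmax.
policies = {k: softmax(fused @ w + b) for k, (w, b) in params.items()}
for p in policies.values():
    assert abs(p.sum() - 1.0) < 1e-9  # each head is its own distribution
```

Parallel heads are sampled independently; the `np_select` head then picks which single non-parallel head's sample is actually executed.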
In this embodiment, the AI model is provided with corresponding fully-connected neural networks and a time sequence feature extraction module, where the time sequence feature extraction module includes, but is not limited to, an LSTM (Long Short-Term Memory) module, a GRU (Gated Recurrent Unit) module, a Transformer module, and the like.
Taking the time sequence feature extraction module as an LSTM module for illustration, acquiring, through the time sequence feature extraction module of the AI model, the current frame fusion state vector information corresponding to the agent based on the current frame first output information includes the following steps: acquiring the previous frame hidden state information corresponding to the LSTM module; outputting, through the LSTM module, the current frame hidden state information corresponding to the LSTM module based on the current frame first output information and the previous frame hidden state information; and acquiring the current frame fusion state vector information corresponding to the agent according to the current frame hidden state information.
The LSTM module serves as an independent feature extraction unit: it accepts the previous frame hidden state information and the current frame first output information as inputs and outputs the corresponding current frame hidden state information, where the hidden state information includes a hidden state (hidden state) and a cell state (cell state), and the current frame hidden state information in turn serves as an input for the next frame.
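A minimal NumPy sketch of the per-frame LSTM update described above: the previous frame's hidden state and cell state, together with the current frame first output information, produce the current frame hidden state, which is then carried to the next frame. Dimensions and weights are illustrative assumptions, not the patent's values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: previous-frame hidden/cell state plus the current
    frame's first output information -> current-frame hidden state."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b             # stacked gate pre-activations
    i, f, o = (sigmoid(z[k * n:(k + 1) * n]) for k in range(3))
    g = np.tanh(z[3 * n:])
    c = f * c_prev + i * g                 # cell state carries long-term memory
    h = o * np.tanh(c)                     # hidden state: the fusion vector
    return h, c

rng = np.random.default_rng(2)
x_dim, h_dim = 12, 8                       # assumed input/hidden widths
W = rng.normal(size=(4 * h_dim, x_dim)) * 0.1
U = rng.normal(size=(4 * h_dim, h_dim)) * 0.1
b = np.zeros(4 * h_dim)

h, c = np.zeros(h_dim), np.zeros(h_dim)    # initial state before frame 1
for frame in range(3):                     # per-frame loop: state flows forward
    x = rng.normal(size=(x_dim,))          # current-frame first output info
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (8,)
```

The pair `(h, c)` output for the current frame is exactly what the next frame's step consumes, matching the frame-to-frame chaining described in the text.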
S103: and according to the current frame action output information, controlling interaction between the intelligent agent and the 3D virtual environment to acquire next frame state information and next frame interaction information of the intelligent agent.
And controlling the intelligent agent to execute corresponding action output based on the output current frame action output information, so that the intelligent agent interacts with the 3D virtual environment, and the state information and interaction information of the intelligent agent are updated to obtain the next frame state information and the next frame interaction information of the intelligent agent.
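The interaction loop of steps S102-S103 can be sketched with a stand-in environment and policy; `ToyEnv` and `policy` below are hypothetical placeholders for the 3D virtual environment and the AI model's inference, not part of the patent.

```python
import numpy as np

class ToyEnv:
    """Stand-in for the 3D virtual environment: consumes an action and
    returns next-frame state and interaction information (illustrative)."""
    def __init__(self):
        self.t = 0
    def step(self, action):
        self.t += 1
        state = np.array([self.t, action], dtype=float)      # next-frame state info
        interaction = np.array([self.t * 0.1])               # next-frame interaction info
        return state, interaction

def policy(state, interaction):
    # Placeholder for the AI model's S102 inference over both inputs.
    return int((state.sum() + interaction.sum()) % 3)

env = ToyEnv()
state, interaction = np.zeros(2), np.zeros(1)
for _ in range(4):                  # S102 -> S103 -> S102 ... frame by frame
    action = policy(state, interaction)          # current frame action output
    state, interaction = env.step(action)        # agent-environment interaction
print(env.t)  # 4 frames advanced
```

Each pass through the loop updates the agent's state and interaction information, which become the inputs for the next frame's inference, exactly as S103 and S104 describe.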
In some embodiments, the outputting the current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information includes: and outputting current frame action output information corresponding to the intelligent agent based on a preset strategy gradient optimization function according to the current frame parallel task information and the current frame non-parallel task information.
Illustratively, the preset policy gradient optimization function ∇J(θ) is expressed as:

∇J(θ) = (1/N) Σ_{t=0}^{T} A_t [G(a_t|s_t) + F(a_t|s_t)]

wherein A_t represents the advantage function (Advantage function) at time t, and N represents the number of learning trajectories; G(a_t|s_t) represents the gradient term of the parallel tasks, F(a_t|s_t) represents the gradient term of the operation space of all the non-parallel tasks, T represents all moments in a learning sequence, and t represents a certain moment in the sequence.
Specifically, the gradient term G(a_t|s_t) of the parallel tasks can be expressed as:

G(a_t|s_t) = Σ_{q=1}^{W} ∇ log π(a_{jq,t}|s_t)

wherein W represents the number of parallel tasks, m represents the operation space size of each task, and π(a_{jq,t}|s_t) represents the probability of the operation selected by each parallel task. Each of the W parallelizable tasks follows its own categorical distribution (categorical distribution), i.e., the W tasks are independent of each other.
a_t represents the action selected at time t, and s_t represents the state at time t, including the state of the agent at time t and the interaction state between the agent and the 3D virtual environment, such as the material information, sound information, global information and equipment information at time t; a_{jq,t} represents the action selected at time t by the q-th task, which may be any one of the parallel tasks.
The gradient term F(a_t|s_t) of the operation space of the non-parallel tasks can be expressed as:

F(a_t|s_t) = ∇ log π(a_{j,t}|s_t)

wherein M represents the number of non-parallel tasks, m represents the operation space size of each task, i.e., the number of actions that can be predicted, and a_{j,t} denotes the operation selected within the non-parallel task chosen at time t. For each non-parallel task, all of its actions cannot be predicted for simultaneous execution; only one operation among the non-parallel tasks can be selected for execution at any time.
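Under the definitions above, the quantity optimized at each step is the advantage A_t weighted by the sum of log-probabilities of the W independent parallel selections plus the log-probability of the single selected non-parallel operation. The sketch below estimates that surrogate over one toy trajectory; the trajectory length, head counts and uniform probabilities are assumptions for illustration.

```python
import numpy as np

def log_prob_step(parallel_probs, parallel_actions, np_probs, np_action):
    """Per-step log-probability under the factored policy: W independent
    categorical heads for parallel tasks, plus one categorical over the
    selected non-parallel task's operation space."""
    lp = sum(np.log(p[a]) for p, a in zip(parallel_probs, parallel_actions))
    lp += np.log(np_probs[np_action])   # only one non-parallel op per frame
    return lp

rng = np.random.default_rng(3)
T, W, m = 5, 3, 4                        # trajectory length, parallel heads, ops per head
advantages = rng.normal(size=T)          # A_t, e.g. from a learned value baseline

surrogate = 0.0
for t in range(T):
    probs = [np.full(m, 1.0 / m) for _ in range(W)]   # toy uniform parallel heads
    acts = rng.integers(0, m, size=W)
    np_probs = np.full(m, 1.0 / m)                    # toy non-parallel head
    np_act = int(rng.integers(0, m))
    surrogate += advantages[t] * log_prob_step(probs, acts, np_probs, np_act)
surrogate /= T   # Monte-Carlo estimate of the policy-gradient objective
```

Taking the gradient of this surrogate with respect to the policy parameters recovers the factored form: the independent parallel heads contribute additively, while only one non-parallel operation contributes per frame.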
S104: and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame interaction information.
After the next frame state information and the next frame interaction information of the agent are obtained, the next frame action output information corresponding to the agent is output through the AI model based on the next frame state information and the next frame interaction information, according to the operation in step S102. The specific operation process may refer to the description of steps S102-S105 and is not repeated here.
According to the method for formulating the action strategy of the intelligent agent, the current frame state information of the intelligent agent in the 3D virtual environment and the current frame interaction information of the intelligent agent and the 3D virtual environment are obtained; outputting current frame parallel task information and current frame non-parallel task information corresponding to the intelligent agent based on the current frame state information and the current frame interaction information through an AI model; outputting current frame action output information corresponding to the intelligent agent according to the current frame parallel task information and the non-parallel task information of the current frame; according to the current frame action output information, the agent is controlled to interact with the 3D virtual environment so as to acquire next frame state information and next frame interaction information of the agent; and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame interaction information. By analyzing the executable parallel task information and the non-parallel task information of the intelligent agent in the current state, the action which can be synchronously executed and the action which can be mutually exclusive executed by the current intelligent agent are obtained according to the parallel task information and the non-parallel task information, and the intelligent agent is controlled to output the corresponding output action according to the parallel task information and the non-parallel task information, so that the action output by the intelligent agent is more reasonable and humanized.
Referring to fig. 6, fig. 6 is a schematic block diagram of a server according to an embodiment of the present application.
As shown in fig. 6, the server 30 may include a processor 301, a memory 302, and a network interface 303. The processor 301, memory 302, and network interface 303 are connected by a system bus, such as an I2C (Inter-integrated Circuit) bus.
Specifically, the processor 301 may be a Micro-controller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
Specifically, the memory 302 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB flash disk, a removable hard disk, or the like.
The network interface 303 is used for network communication such as transmission of assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in fig. 6 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the server to which the present inventive arrangements are applied, and that a particular server may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Wherein the processor 301 is configured to run a computer program stored in the memory 302 and to implement the following steps when executing the computer program:
acquiring current frame state information of an agent in a 3D virtual environment and current frame interaction information between the agent and the 3D virtual environment;
outputting current frame parallel task information and current frame non-parallel task information corresponding to the intelligent agent based on the current frame state information and the current frame interaction information through an AI model;
outputting current frame action output information corresponding to the intelligent agent according to the current frame parallel task information and the non-parallel task information of the current frame;
according to the current frame action output information, the agent is controlled to interact with the 3D virtual environment so as to acquire next frame state information and next frame interaction information of the agent;
and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame interaction information.
In some embodiments, the outputting, by the processor 301, current frame parallel task information and current frame non-parallel task information corresponding to the agent based on the current frame state information and the current frame interaction information through an AI model includes:
extracting the characteristics of the current frame state information and the current frame interaction information respectively to obtain corresponding current frame state characteristic information and current frame interaction characteristic information;
and acquiring corresponding current frame parallel task information and current frame non-parallel task information based on the current frame state characteristic information and the current frame interaction characteristic information by the time sequence characteristic extraction module of the AI model.
In some embodiments, the processor 301 obtains, through the timing feature extraction module of the AI model, corresponding current frame parallel task information and current frame non-parallel task information based on the current frame state feature information and the current frame interaction feature information, including:
inputting the current frame state characteristic information and the current frame interaction characteristic information into a first fully-connected network corresponding to the AI model to obtain corresponding current frame first output information;
acquiring current frame fusion state vector information corresponding to the intelligent agent based on the first output information of the current frame through a time sequence feature extraction module of the AI model;
and inputting the current frame fusion state vector information into a second fully-connected network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information.
In some embodiments, the current frame interaction feature information includes current frame global feature information, current frame poison circle feature information, current frame material feature information, and current frame sound feature information, and the processor 301 inputs the current frame state feature information and the current frame interaction feature information to a first fully-connected network corresponding to the AI model to obtain corresponding current frame first output information, including:
and respectively inputting the current frame state characteristic information, the current frame global characteristic information, the current frame poison circle characteristic information, the current frame material characteristic information and the current frame sound characteristic information into a corresponding first fully-connected network to acquire corresponding first output information of the current frame.
In some embodiments, the current frame parallel task information includes moving task information, aiming task information and non-parallel task selection information of the agent in the current frame; the non-parallel task information of the current frame comprises attack task information, material picking task information, attitude control task information and blood volume supplementing task information of the intelligent agent in the current frame; the processor 301 inputs the current frame fusion state vector information into the second fully-connected network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information, including:
and respectively inputting the current frame fusion state vector information into a second fully-connected network corresponding to the AI model to obtain corresponding current frame movement task information, current frame aiming task information, current frame non-parallel task selection information, current frame attack task information, current frame material picking task information, current frame attitude control task information and current frame blood volume supplementing task information.
In some embodiments, the timing feature extraction module includes an LSTM module, and the processor 301 obtains, by using the timing feature extraction module of the AI model, current frame fusion state vector information corresponding to the agent based on the current frame first output information, including:
acquiring the previous frame hidden state information corresponding to the LSTM module;
outputting, through the LSTM module, the current frame hidden state information corresponding to the LSTM module based on the current frame first output information and the previous frame hidden state information;
and acquiring the current frame fusion state vector information corresponding to the agent according to the current frame hidden state information.
In some embodiments, the outputting, by the processor 301, the current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information includes:
and outputting current frame action output information corresponding to the intelligent agent based on a preset strategy gradient optimization function according to the current frame parallel task information and the current frame non-parallel task information.
In some embodiments, the policy gradient optimization function ∇J(θ) is:

∇J(θ) = (1/N) Σ_{t=0}^{T} A_t [G(a_t|s_t) + F(a_t|s_t)]

wherein A_t represents the advantage function at time t, N represents the number of learning trajectories, G(a_t|s_t) represents the gradient term of the parallel tasks, F(a_t|s_t) represents the gradient term of the operation space of the non-parallel tasks, T represents all moments in a learning sequence, and t represents a certain moment in the sequence.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The computer readable storage medium may be an internal storage unit of the server of the foregoing embodiment, for example, a hard disk or a memory of the server. The computer readable storage medium may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the server.
Because the computer program stored in the computer readable storage medium can execute any of the agent action policy formulation methods provided by the embodiments of the present application, the beneficial effects that any of the agent action policy formulation methods provided by the embodiments of the present application can be achieved, and detailed descriptions thereof are omitted herein.
It is to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (8)

1. An agent action policy formulation method, the method comprising:
acquiring current frame state information of an agent in a 3D virtual environment and current frame interaction information between the agent and the 3D virtual environment;
outputting current frame parallel task information and current frame non-parallel task information corresponding to the intelligent agent based on the current frame state information and the current frame interaction information through an AI model;
outputting current frame action output information corresponding to the intelligent agent according to the current frame parallel task information and the non-parallel task information of the current frame;
controlling interaction between the intelligent agent and the 3D virtual environment according to the current frame action output information so as to acquire next frame state information and next frame interaction information of the intelligent agent;
outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame interaction information;
the outputting, by the AI model, current frame parallel task information and current frame non-parallel task information corresponding to the agent based on the current frame state information and the current frame interaction information includes:
extracting the characteristics of the current frame state information and the current frame interaction information respectively to obtain corresponding current frame state characteristic information and current frame interaction characteristic information;
acquiring corresponding current frame parallel task information and current frame non-parallel task information based on current frame state feature information and current frame interaction feature information through a time sequence feature extraction module of the AI model;
the time sequence feature extraction module through the AI model obtains corresponding current frame parallel task information and current frame non-parallel task information based on current frame state feature information and current frame interaction feature information, and the time sequence feature extraction module comprises:
inputting the current frame state characteristic information and the current frame interaction characteristic information into a first fully-connected network corresponding to the AI model to obtain corresponding current frame first output information;
acquiring current frame fusion state vector information corresponding to the intelligent agent based on the first output information of the current frame through a time sequence feature extraction module of the AI model;
and inputting the current frame fusion state vector information into a second fully-connected network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information.
2. The method of claim 1, wherein the current frame interaction characteristic information includes current frame global characteristic information, current frame poison circle characteristic information, current frame material characteristic information, and current frame sound characteristic information, and the inputting the current frame state characteristic information and the current frame interaction characteristic information into the first fully connected network corresponding to the AI model to obtain corresponding current frame first output information includes:
and respectively inputting the current frame state characteristic information, the current frame global characteristic information, the current frame poison circle characteristic information, the current frame material characteristic information and the current frame sound characteristic information into a corresponding first fully-connected network to acquire corresponding first output information of the current frame.
3. The method of claim 2, wherein the current frame parallel task information includes movement task information, aiming task information, and non-parallel task selection information of the agent at the current frame; the non-parallel task information of the current frame comprises attack task information, material picking task information, attitude control task information and blood volume supplementing task information of the intelligent agent in the current frame; the step of inputting the current frame fusion state vector information into a second fully connected network corresponding to the AI model to obtain corresponding current frame parallel task information and current frame non-parallel task information, including:
and respectively inputting the current frame fusion state vector information into a second fully-connected network corresponding to the AI model to obtain corresponding current frame movement task information, current frame aiming task information, current frame non-parallel task selection information, current frame attack task information, current frame material picking task information, current frame attitude control task information and current frame blood volume supplementing task information.
4. The method of claim 1, wherein the timing feature extraction module comprises an LSTM module, the timing feature extraction module that passes through the AI model obtains current frame fusion state vector information corresponding to the agent based on the current frame first output information, comprising:
acquiring the previous frame hidden state information corresponding to the LSTM module;
outputting, through the LSTM module, the current frame hidden state information corresponding to the LSTM module based on the current frame first output information and the previous frame hidden state information;
and acquiring the current frame fusion state vector information corresponding to the agent according to the current frame hidden state information.
5. The method according to claim 1, wherein outputting the current frame action output information corresponding to the agent according to the current frame parallel task information and the current frame non-parallel task information comprises:
and outputting current frame action output information corresponding to the intelligent agent based on a preset strategy gradient optimization function according to the current frame parallel task information and the current frame non-parallel task information.
6. The method of claim 5, wherein the policy gradient optimization function ∇J(θ) is:

∇J(θ) = (1/N) Σ_{t=0}^{T} A_t [G(a_t|s_t) + F(a_t|s_t)]

wherein A_t represents the advantage function at time t, N represents the number of learning trajectories, G(a_t|s_t) represents the gradient term of the parallel tasks, F(a_t|s_t) represents the gradient term of the operation space of the non-parallel tasks, T represents all moments in a learning sequence, and t represents a certain moment in the sequence.
7. A server, wherein the server comprises a processor and a memory;
the memory stores a computer program and an AI model that can be called and executed by the processor, wherein the computer program, when executed by the processor, implements the agent action policy formulation method according to any one of claims 1 to 6.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, causes the processor to implement the agent action policy formulation method according to any one of claims 1 to 6.
CN202011312201.7A 2020-11-20 2020-11-20 Intelligent body action policy making method, server and storage medium Active CN112494949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011312201.7A CN112494949B (en) 2020-11-20 2020-11-20 Intelligent body action policy making method, server and storage medium

Publications (2)

Publication Number Publication Date
CN112494949A CN112494949A (en) 2021-03-16
CN112494949B true CN112494949B (en) 2023-10-31

Family

ID=74959236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011312201.7A Active CN112494949B (en) 2020-11-20 2020-11-20 Intelligent body action policy making method, server and storage medium

Country Status (1)

Country Link
CN (1) CN112494949B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118400B (en) * 2021-10-11 2023-01-03 中国科学院自动化研究所 Concentration network-based cluster countermeasure method and device
CN116747521B (en) * 2023-08-17 2023-11-03 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for controlling intelligent agent to conduct office

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110622174A (en) * 2017-05-19 2019-12-27 渊慧科技有限公司 Imagination-based agent neural network
KR20200063309A (en) * 2018-11-20 2020-06-05 고려대학교 산학협력단 Method and system for performing environment adapting stategy based on ai
CN111401557A (en) * 2020-06-03 2020-07-10 超参数科技(深圳)有限公司 Agent decision making method, AI model training method, server and medium
CN111950726A (en) * 2020-07-09 2020-11-17 华为技术有限公司 Decision method based on multi-task learning, decision model training method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11580378B2 (en) * 2018-03-14 2023-02-14 Electronic Arts Inc. Reinforcement learning for concurrent actions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant