CN107145948A - A NPC Control Method Based on Multi-agent Technology - Google Patents
- Publication number: CN107145948A
- Application number: CN201710237082.5A
- Authority
- CN
- China
- Prior art keywords
- npc
- player
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/80—Special adaptations for executing a specific game genre or game mode
- A63F13/822—Strategy games; Role-playing games
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
Abstract
Description
Technical Field
This invention, an NPC control method based on multi-agent technology, applies artificial-intelligence methods to automatically control the behavior of non-player characters (NPCs) in multiplayer online role-playing games. It relates to the technical fields of artificial intelligence and computer games.
Background
A multiplayer online role-playing game is a large-scale online game in which each player controls a fictional character, directs that character's activities, and interacts with the characters controlled by many other players, cooperating to defeat monsters and complete quests. Unlike the human-versus-machine play of single-player games, online games let many players cooperate and compete in real time, which greatly enriches the game's content and appeal. An excellent multiplayer online role-playing game can attract a large player base; by providing a high-quality game service it enriches players' lives, and it has given rise to a new industry with enormous economic value.
Besides player characters, a game contains a large number of non-player characters (NPCs). Apart from NPCs that provide game functions, such as item vendors and quest guides, games contain many monster NPCs. These monsters are the core of most quests: players must defeat large numbers of them to progress. Making NPC behavior more intelligent, so that game difficulty can be tuned sensibly while balancing fairness, challenge, and fun, is an important part of providing a high-quality online game service, and major game studios invest considerable manpower and resources in every title to pursue this goal.
Most existing NPC behavior is controlled by pre-written scripts. Because development resources such as manpower and time are limited, developers cannot script a response for every NPC in every scene to every situation it may encounter. To raise the challenge gradually, developers instead inflate an NPC's physical attributes (attack power, defense, and so on), so NPCs often behave rigidly yet are abnormally strong, which harms the game's fairness. A key flaw of pre-written scripts is that the NPC can only react passively to players and has no active reasoning ability. Players can therefore probe an NPC repeatedly, infer its fixed behavior pattern, and find a strategy that defeats a seemingly powerful NPC; worse, such strategies spread quickly among players, sharply reducing the game's challenge and appeal. To keep the game interesting, developers must frequently revise NPC scripts based on game logs, but because frequent revision is limited by labor costs, only important monsters get revised, and how to adjust a script effectively and quantitatively against player strategies depends heavily on developer experience.
Summary of the Invention
To address the shortcomings above, the present invention provides an NPC control method based on multi-agent technology. It solves the problems of the prior art in which NPCs cannot reason or adjust their behavior automatically, so that NPC scripts must be revised frequently according to developers' experience, wasting labor.
To achieve the above object, the invention adopts the following technical scheme:
An NPC control method based on multi-agent technology, characterized by the following steps:
Step (1): define an automatic reasoning model for each NPC, an interactive dynamic influence diagram, and set the diagram's parameters according to the characteristics of the NPC and of its scene.
Step (2): construct the NPC's behavior script from the parameters of the interactive dynamic influence diagram.
Step (3): while interacting with players, the NPC executes its behavior script and records the result of every action, where an action is an entry of the behavior script.
Step (4): using the execution results recorded in step (3), update the parameters of the interactive dynamic influence diagram and reconstruct the NPC's behavior script for use in subsequent player interactions.
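Read together, steps (1)-(4) form a closed model-plan-act-learn loop. The loop can be sketched as follows; every name here is illustrative and the components are trivial stand-ins, not the patent's I-DID machinery:

```python
# Sketch of the patent's four-step loop with trivial stand-in components.
# All names are illustrative; a real system would plug in an I-DID solver.

def build_influence_diagram():
    # step (1): parameters -- here just one state, two NPC actions with fixed utilities
    return {"utility": {"attack": 1.0, "defend": 0.5}}

def construct_behavior_script(model):
    # step (2): pick the utility-maximizing action for every situation
    return max(model["utility"], key=model["utility"].get)

def run_interaction(action):
    # step (3): execute and record the result (observed payoff, here simulated)
    return {"action": action, "payoff": 0.8 if action == "attack" else 0.6}

def update_parameters(model, log, alpha=0.3):
    # step (4): blend the observed payoff into the model at update rate alpha
    a = log["action"]
    model["utility"][a] = (1 - alpha) * model["utility"][a] + alpha * log["payoff"]

model = build_influence_diagram()
for _ in range(3):
    script = construct_behavior_script(model)
    log = run_interaction(script)
    update_parameters(model, log)
print(script, round(model["utility"]["attack"], 3))
```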
Further, in step (1), the parameters of the interactive dynamic influence diagram are defined from the characteristics of the NPC and the scene as follows:
Step (11): in the interactive dynamic influence diagram, define an action set A_i from the NPC's executable actions and an action set A_j from the player character's executable actions.
Step (12): define the state set S* from the nature of the NPC's scene and the positions and attributes of the NPC and the player character.
Step (13): from steps (11) and (12), define the conditional probability that, from time t to time t+1, state s^t ∈ S^t transitions to state s^{t+1} ∈ S^{t+1} under NPC action a_i^t ∈ A_i and player character action a_j^t ∈ A_j, i.e. the state transition function T_i(s^{t+1} | s^t, a_i^t, a_j^t).
Step (14): from the NPC's behavioral style and preferences, define the utility function R_i(s^t, a_i^t, a_j^t, s^{t+1}) for the NPC moving from state s^t ∈ S^t to state s^{t+1} ∈ S^{t+1} at time t under NPC action a_i^t and player character action a_j^t.
Step (15): initialize the parameters of the state transition function and the utility function from existing empirical knowledge.
Step (16): from prior knowledge, the NPC's interactive dynamic influence diagram contains a number of policy descriptions π_j^1, …, π_j^N that characterize player behavior. Each policy description π_j^n gives the conditional probability π_j^n(a_j^t | s^t) that the player character performs action a_j^t in state s^t. The N policy descriptions are stored in the set of player behavior patterns Π_j = {π_j^1, …, π_j^N}.
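Steps (11)-(16) amount to populating a parameter structure. A minimal container for those parameters might look like this (field names and example values are invented for illustration; a real I-DID also stores the diagram's node structure):

```python
from dataclasses import dataclass, field

# Illustrative container for the I-DID parameters of steps (11)-(16).
@dataclass
class InfluenceDiagramParams:
    npc_actions: list            # step (11): A_i
    player_actions: list         # step (11): A_j
    states: list                 # step (12): S*
    transition: dict = field(default_factory=dict)       # step (13): (s, a_i, a_j) -> {s': prob}
    utility: dict = field(default_factory=dict)          # step (14): (s, a_i, a_j, s') -> value
    player_policies: list = field(default_factory=list)  # step (16): Pi_j, each {s: {a_j: prob}}

params = InfluenceDiagramParams(
    npc_actions=["attack", "defend", "flee", "pursue"],
    player_actions=["light_punch", "heavy_punch", "flee"],
    states=["close", "far"],
)
# step (15): initialize from empirical knowledge, e.g. attacking at close range rarely breaks contact
params.transition[("close", "attack", "light_punch")] = {"close": 0.9, "far": 0.1}
params.utility[("close", "attack", "light_punch", "close")] = 2.0
# step (16): one prior policy description pi_j^1
params.player_policies.append({"close": {"light_punch": 0.7, "heavy_punch": 0.2, "flee": 0.1}})
print(len(params.player_policies))
```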
Further, the specific steps of step (2) are as follows:
Step (21): from the interactive dynamic influence diagram constructed in step (1) and the required length of the NPC behavior script, expand the diagram into an interactive dynamic influence diagram containing T reasoning steps.
Step (22): solve the T-step interactive dynamic influence diagram from step (21) with a classical dynamic programming algorithm, maximizing the weighted sum of the NPC's immediate expected utility and long-term expected utility, to find the policy that maximizes the NPC's expected utility, i.e. the conditional probability π_i(a_i^t | s^t) of the action the NPC should take in each situation at each moment. The weighted sum of immediate and long-term expected utility is maximized as follows:
ER_i = EU_i^1 + λ·EU_i^2 + … + λ^{T-1}·EU_i^T, where EU_i^1 is the immediate expected utility, EU_i^2 through EU_i^T are the long-term expected utilities, λ is a discount factor used to weaken the influence of long-term utility on the current action, and ER_i is the total utility value, i.e. the weighted sum of the NPC's immediate and long-term expected utilities;
Step (23): convert the expected-utility-maximizing policy obtained in step (22) into a behavior script format compatible with the current game.
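Step (22)'s dynamic programming can be sketched as backward induction over the T time slices, where the value of a state is the best immediate expected utility plus λ times the value of its successors. The sketch below makes one simplifying assumption, that the player's policy is fixed and known, whereas the full model reasons over the N stored policy descriptions:

```python
# Backward-induction sketch for step (22): maximize ER_i = sum_t lambda^(t-1) * EU^t.
# Simplification: the player's policy pi_j is taken as fixed and known.

def solve(T_steps, states, A_i, A_j, pi_j, trans, R, lam):
    V = {s: 0.0 for s in states}          # value beyond the horizon
    policy = {}
    for t in reversed(range(T_steps)):    # from the last slice back to the first
        newV = {}
        for s in states:
            best_a, best_v = None, float("-inf")
            for a_i in A_i:
                # expected utility of a_i in s: average over player actions and successors
                v = sum(
                    pi_j[s][a_j] * trans[(s, a_i, a_j)][s2]
                    * (R[(s, a_i, a_j, s2)] + lam * V[s2])
                    for a_j in A_j
                    for s2 in trans[(s, a_i, a_j)]
                )
                if v > best_v:
                    best_a, best_v = a_i, v
            policy[(t, s)] = best_a
            newV[s] = best_v
        V = newV
    return policy, V

# Tiny two-state, two-step example with invented numbers.
states = ["weak", "strong"]
A_i, A_j = ["attack", "defend"], ["punch"]
pi_j = {s: {"punch": 1.0} for s in states}
trans = {(s, a_i, "punch"): {"weak": 0.5, "strong": 0.5} for s in states for a_i in A_i}
R = {(s, a_i, "punch", s2): (1.0 if a_i == "attack" else 0.4)
     for s in states for a_i in A_i for s2 in states}
policy, V = solve(2, states, A_i, A_j, pi_j, trans, R, lam=0.9)
print(policy[(0, "weak")], round(V["weak"], 3))
```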
Further, the specific steps of step (3) are as follows:
Step (31): while interacting with player characters, the NPC responds to the different actions of different players with NPC actions a_i^t, i.e. it executes the behavior script obtained in step (23).
Step (32): record the result of every NPC action executed: at each moment, the action a_j^t performed by the player character, the action a_i^t performed by the NPC, the resulting state change, i.e. the transition from state s^t to state s^{t+1}, and the utility value R_i(s^t, a_i^t, a_j^t, s^{t+1}) of that action under the transition.
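The record of step (32) reduces to a tuple (time, state, player action, NPC action, next state, utility) per step. A sketch of such a log (field names and values are invented; Figure 3 shows the patent's actual data format):

```python
from collections import namedtuple

# One row of the execution log from step (32); fields are illustrative.
StepRecord = namedtuple(
    "StepRecord", ["t", "state", "player_action", "npc_action", "next_state", "utility"]
)

log = [
    StepRecord(0, "far",   "flee",        "pursue", "close", 0.2),
    StepRecord(1, "close", "light_punch", "attack", "close", 2.0),
    StepRecord(2, "close", "heavy_punch", "defend", "far",  -0.5),
]
# The log is later consumed by step (4) to re-estimate frequencies and utilities.
total = sum(r.utility for r in log)
print(total)
```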
Further, the specific steps of step (4) are as follows:
Step (41): from the execution results recorded in step (3), count, for each state s^t ∈ S^t, the frequency of each action the player character performed, and normalize the frequencies to obtain the conditional probability of each action in each state, π_j^data(a_j^t | s^t). This can be viewed as the policy description of the current player that the NPC has constructed from the player's behavior. Compare π_j^data against the set of player behavior patterns Π_j to find the most similar stored player policy description, i.e. the π_j^{n*} ∈ Π_j that is most similar to π_j^data, n* = argmin_n d(π_j^n, π_j^data) for a chosen distance d between policy descriptions:
Using the statistically obtained player policy description π_j^data, update the conditional probabilities with which the most similar stored policy description π_j^{n*} takes each action in each situation, obtaining the updated policy description. The specific formula is as follows:
π_j^{n*} ← (1 − α)·π_j^{n*} + α·π_j^data, where π_j^{n*} is the policy description of player behavior in the existing interactive dynamic influence diagram, π_j^data is the policy description obtained from the interaction data, and α is the policy update rate, which controls how quickly the current policy description is updated;
Step (42): from the execution results recorded in step (3), count the frequency with which the current state transitions from s^t to s^{t+1} when the NPC performs action a_i^t and the player performs action a_j^t, giving the conditional probability T_data(s^{t+1} | s^t, a_i^t, a_j^t), and the expected utility value R_data(s^t, a_i^t, a_j^t, s^{t+1}) of each action. Update the state transition function T_i and the utility function R_i of the NPC's interactive dynamic influence diagram. The specific update formulas are as follows:
T_i ← (1 − α)·T_i + α·T_data and R_i ← (1 − α)·R_i + α·R_data, where T_i is the conditional probability (state transition function) and R_i is the utility function in the existing interactive dynamic influence diagram;
Step (43): update the interactive dynamic influence diagram with the results of steps (41) and (42), and use it in steps (2)-(4).
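Steps (41)-(43) can be sketched as: estimate a policy description from logged frequencies, match it to the nearest stored description, and blend at the update rate α. The L1 distance below is an illustrative choice, since the patent does not fix a particular similarity measure, and all names are invented:

```python
# Sketch of step (41): estimate pi_j^data from logged frequencies, find the most
# similar stored policy description, and blend at update rate alpha.

def estimate_player_policy(log, actions):
    counts = {}
    for state, player_action in log:                  # log rows: (s^t, a_j^t)
        counts.setdefault(state, {a: 0 for a in actions})[player_action] += 1
    return {s: {a: c[a] / sum(c.values()) for a in actions} for s, c in counts.items()}

def distance(p, q):
    # illustrative similarity measure: total L1 distance over shared states
    return sum(abs(p[s][a] - q[s][a]) for s in p if s in q for a in p[s])

def update_policies(stored, log, actions, alpha=0.2):
    data = estimate_player_policy(log, actions)
    best = min(stored, key=lambda pi: distance(data, pi))   # pi_j^{n*}
    for s in best:
        if s in data:
            for a in actions:                               # (1-alpha)*old + alpha*data
                best[s][a] = (1 - alpha) * best[s][a] + alpha * data[s][a]
    return best

stored = [
    {"close": {"punch": 0.9, "flee": 0.1}},   # aggressive prior
    {"close": {"punch": 0.1, "flee": 0.9}},   # timid prior
]
log = [("close", "punch"), ("close", "punch"), ("close", "flee"), ("close", "punch")]
updated = update_policies(stored, log, ["punch", "flee"])
print(round(updated["close"]["punch"], 2))
```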
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
1. The interactive dynamic influence diagram used in the invention is one of the latest research results in artificial intelligence. It has a strong capability for modeling individual agents and can accurately characterize player behavior, and the model-based script generation algorithm meets the requirements of practical applications in both running efficiency and script quality.
2. The invention automatically updates and generates behavior scripts online from the interactive dynamic influence diagrams defined in advance for monster NPCs, greatly reducing the workload of game developers.
3. The interactive dynamic influence diagram carried by each monster NPC is updated from real online data, so NPC script updates are better targeted at actual players. Furthermore, because each kind of monster NPC shares data and player policy descriptions across multiple dungeon instances, the generated NPC behavior scripts generalize well.
Description of the Drawings
Figure 1 is an interactive dynamic influence diagram of the invention containing 2 reasoning steps;
Figure 2 is a schematic diagram of a policy description of the invention;
Figure 3 is an example of the invention's execution record data.
Detailed Description
To make the object, technical scheme, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
Step (1): define an automatic reasoning model for each NPC, an interactive dynamic influence diagram (hereinafter, the influence diagram), and set its parameters according to the characteristics of the NPC and of its scene.
Step (2): construct the NPC's behavior script from the parameters of the influence diagram.
Step (3): the NPC executes the behavior script while interacting with players and records every action, where an action is an entry of the behavior script.
Step (4): using the execution results recorded in step (3), update the influence diagram's parameters and regenerate the NPC's behavior script for use in subsequent player interactions.
In step (1), the influence diagram's parameters are defined from the characteristics of the NPC and the scene as follows:
In the influence diagram, define the action set A_i from the NPC's executable actions (for example attack, defend, flee, and pursue) and the action set A_j from the player character's executable actions (for example light punch, heavy punch, and flee). Define the state set S* from the nature of the scene and the positions and attributes of the NPC and the player character; each state in S* carries the basic information about the NPC and the player character, such as their positions and attribute values. Define the state transition function T_i(s^{t+1} | s^t, a_i^t, a_j^t) from time t to time t+1, and define the NPC's utility function R_i(s^t, a_i^t, a_j^t, s^{t+1}) from the NPC's behavioral style and preferences (for example, an aggressive NPC's utility for attacking is set higher to encourage attacks, while a conservative character earns relatively high utility for defending).
The state transition function is a conditional probability function: T_i(s^{t+1} | s^t, a_i^t, a_j^t) quantifies the probability that, from time t to time t+1, state s^t ∈ S^t transitions to state s^{t+1} ∈ S^{t+1} under NPC action a_i^t and player character action a_j^t. The utility function quantifies the result of each NPC action: R_i(s^t, a_i^t, a_j^t, s^{t+1}) is the utility the NPC obtains at time t by moving from state s^t ∈ S^t to state s^{t+1} ∈ S^{t+1}. When the NPC performs an action, damage to the player character counts as positive utility for the NPC and damage to the NPC itself as negative utility; the total utility of the action is the sum of the two parts. The utility of an NPC action also depends on the initial state s^t, the target state s^{t+1}, and the player character's action a_j^t. Figure 1 shows a dynamic influence diagram containing 2 reasoning steps (t and t+1); when a third reasoning step is needed, the model for step t+2 is constructed by copying the t+1 time slice (i.e. all nodes superscripted t+1) together with the links between nodes, and repeating this process yields a dynamic influence diagram containing T reasoning steps. The parameters of the state transition function and the utility function are initialized from existing empirical knowledge.
According to prior knowledge, the NPC's influence diagram contains a number of policy descriptions π_j^1, …, π_j^N that characterize the player character's behavior. Each policy description π_j^n gives the conditional probability π_j^n(a_j^t | s^t) that the player character performs action a_j^t in state s^t. The N policy descriptions are stored in the set of player behavior patterns Π_j = {π_j^1, …, π_j^N}.
The specific steps of step (2) are as follows:
Step (21): expand the influence diagram constructed in step (1) into an influence diagram containing T reasoning steps, according to the required script length. At time t, the action the NPC performs produces a utility value that depends on the current state s^t, the target state s^{t+1}, and the player character's action a_j^t; this utility quantifies the result of the NPC performing an action in one specific situation (current state, new state, and player action). In addition, the actions the NPC performs randomly change the battle situation, i.e. the state transitions, which in turn affects the utility that may be obtained in the future. The goal of solving this T-step diagram is therefore to maximize the weighted sum of the NPC's immediate expected utility and long-term expected utility, where the weighting reduces the influence of future utility on present utility.
Step (22): solve the T-step interactive dynamic influence diagram obtained in step (21) with a classical dynamic programming algorithm, maximizing the weighted sum of the NPC's immediate and long-term expected utility, to find the policy that maximizes the NPC's expected utility. The policy specifies the action a_i^t the NPC should take at each moment for each situation (including its own state, the player's state, positions, and so on); each action also depends on the current state, so the policy is described by a conditional probability function π_i(a_i^t | s^t). The current state depends on the previous state and on the probabilities of the actions previously taken by the player and the NPC; those previous actions are also called observation information. The weighted sum of the NPC's immediate and long-term expected utility to be maximized is as follows:
ER_i = EU_i^1 + λ·EU_i^2 + … + λ^{T-1}·EU_i^T, where EU_i^1 is the immediate expected utility, EU_i^2 through EU_i^T are the long-term expected utilities, λ is a discount factor used to weaken the influence of long-term utility on the current action, and ER_i is the total utility value, i.e. the weighted sum of the NPC's immediate and long-term expected utilities. Each expected utility value is the weighted sum, over every possible combination, of the utility obtained in a specific situation and the probability of that situation occurring, a situation being: the current state is s^t, the NPC performs action a_i^t, the target state is s^{t+1}, and the player character performs action a_j^t.
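Each expected utility term EU_i^t in the formula above is a weighted sum over every combination of current state, NPC action, player action, and successor state. A sketch of that inner computation for a single time step, with all inputs assumed given and all names invented:

```python
# Sketch of one expected-utility term EU^t: weight each situation's utility by
# the probability of that situation occurring.

def expected_utility_step(belief, pi_i, pi_j, trans, R):
    eu = 0.0
    for s, p_s in belief.items():                              # P(s^t)
        for a_i, p_ai in pi_i[s].items():                      # pi_i(a_i | s)
            for a_j, p_aj in pi_j[s].items():                  # pi_j(a_j | s)
                for s2, p_s2 in trans[(s, a_i, a_j)].items():  # T_i(s' | s, a_i, a_j)
                    eu += p_s * p_ai * p_aj * p_s2 * R[(s, a_i, a_j, s2)]
    return eu

# Minimal example with invented numbers.
belief = {"close": 1.0}
pi_i = {"close": {"attack": 1.0}}
pi_j = {"close": {"punch": 0.5, "flee": 0.5}}
trans = {("close", "attack", "punch"): {"close": 1.0},
         ("close", "attack", "flee"):  {"far": 1.0}}
R = {("close", "attack", "punch", "close"): 2.0,
     ("close", "attack", "flee", "far"): 0.0}
print(expected_utility_step(belief, pi_i, pi_j, trans, R))
```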
Step (23): convert the expected-utility-maximizing policy obtained in step (22) into a behavior script format compatible with the current game, i.e. convert the policy expressed as a conditional probability function into a condition-based behavior script format compatible with the current game.
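The conversion in step (23) turns the solved conditional-probability policy into condition-action entries a script engine can execute. A sketch under an invented script syntax (a real game would use its own script format):

```python
# Illustrative conversion of a solved policy into condition-action script lines.

def policy_to_script(policy):
    lines = []
    for (t, state), probs in sorted(policy.items()):
        # deterministic script entry: take the most probable action in each situation
        action = max(probs, key=probs.get)
        lines.append(f"step {t}: if state == '{state}' then {action}")
    return lines

policy = {(0, "close"): {"attack": 0.9, "defend": 0.1},
          (0, "far"):   {"pursue": 0.8, "defend": 0.2}}
script = policy_to_script(policy)
print(script[0])
```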
The specific steps of step (3) are as follows:
(31) While interacting with a player character, the NPC executes the behavior script defined in step (22) for the different actions of different players. Each action taken also depends on the current state s^t; here s^t can also be regarded as the information the NPC observes at each moment.
(32) Record the result of every action execution: at each moment, the action a_j^t performed by the player character, the action a_i^t performed by the NPC, the resulting state change, i.e. the transition from state s^t to state s^{t+1}, and the utility value R_i(s^t, a_i^t, a_j^t, s^{t+1}) of that action under the transition. The data format is shown in Figure 3.
The specific steps of step (4) are as follows:
(41) From the execution results recorded in step (32), count, for each state s^t ∈ S^t, the frequency of each action the player character performed, and normalize the frequencies to obtain the conditional probability of each action in each state, π_j^data(a_j^t | s^t), i.e. the policy description of the current player that the NPC has constructed from the player's behavior. Compare π_j^data against the set of player behavior patterns Π_j to find the most similar stored player policy, i.e. the π_j^{n*} ∈ Π_j that is most similar to π_j^data, n* = argmin_n d(π_j^n, π_j^data) for a chosen distance d between policy descriptions:
Using the statistically obtained player policy description π_j^data, update the conditional probabilities with which the most similar stored policy description π_j^{n*} takes each action in each situation (i.e. for different current states s^t and different player character actions a_j^t), obtaining the updated policy description. The specific formula is as follows:
π_j^{n*} ← (1 − α)·π_j^{n*} + α·π_j^data, where π_j^{n*} is the policy description of player behavior in the existing interactive dynamic influence diagram, π_j^data is the policy description obtained from the interaction data, and α is the policy update rate, which controls how quickly the current policy description is updated;
By updating the player-side policy descriptions of the existing interactive dynamic influence diagram from interaction data, the player's behavior can be predicted more accurately during behavior script generation, making the NPC behave more intelligently in subsequent battles.
(42) From the execution results recorded in step (32), count the frequency with which the current state transitions from s^t to s^{t+1} when the NPC performs action a_i^t and the player performs action a_j^t, giving the probability T_data(s^{t+1} | s^t, a_i^t, a_j^t), and the expected utility value R_data of each action, where an expected utility value is the weighted sum of the utilities in the various specific situations and the frequencies with which those situations occur. The state transition function T_i and utility function R_i of the monster NPC's influence diagram are then updated in the same way as in step (41): T_i ← (1 − α)·T_i + α·T_data and R_i ← (1 − α)·R_i + α·R_data.
By updating the state transition function and utility function in the existing influence diagram from the interaction data, the game scene in which the NPC is situated can be characterized more accurately, making the generated NPC behavior more intelligent in subsequent battles.
(43) Update the interactive dynamic influence diagram with the results obtained in steps (41) and (42), for use in steps (2) through (4).
The above descriptions are only preferred embodiments of the present invention and are not intended to limit it; any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710237082.5A CN107145948B (en) | 2017-04-12 | 2017-04-12 | A NPC control method based on multi-agent technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710237082.5A CN107145948B (en) | 2017-04-12 | 2017-04-12 | A NPC control method based on multi-agent technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107145948A true CN107145948A (en) | 2017-09-08 |
CN107145948B CN107145948B (en) | 2021-05-18 |
Family
ID=59773612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710237082.5A Expired - Fee Related CN107145948B (en) | 2017-04-12 | 2017-04-12 | A NPC control method based on multi-agent technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107145948B (en) |
- 2017-04-12: Application CN201710237082.5A filed; granted as patent CN107145948B (en); status: not active, Expired - Fee Related
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101158897A (en) * | 2007-10-09 | 2008-04-09 | 南京大学 | Method and system for implementing intelligent non-player characters in interactive games |
US20100022301A1 (en) * | 2008-07-22 | 2010-01-28 | Sony Online Entertainment Llc | System and method for progressing character abilities in a simulation |
CN102930338A (en) * | 2012-11-13 | 2013-02-13 | 沈阳信达信息科技有限公司 | Game non-player character (NPC) action based on neural network |
CN105561578A (en) * | 2015-12-11 | 2016-05-11 | 北京像素软件科技股份有限公司 | NPC behavior decision method |
Non-Patent Citations (3)
Title |
---|
JOSEPH A. TATMAN等: "Dynamic Programming and Influence Diagrams", 《IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS》 * |
ROSS CONROY等: "A Value Equivalence Approach for Solving Interactive Dynamic Influence Diagrams", 《PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS》 * |
罗键等: "基于多Agent的交互式动态影响图研究、应用与展望", 《厦门大学学报(自然科学版)》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107890674A (en) * | 2017-11-13 | 2018-04-10 | 杭州电魂网络科技股份有限公司 | AI behaviors call method and device |
CN108920221A (en) * | 2018-06-29 | 2018-11-30 | 网易(杭州)网络有限公司 | The method and device of game difficulty adjustment, electronic equipment, storage medium |
CN110141867A (en) * | 2019-04-23 | 2019-08-20 | 广州多益网络股份有限公司 | A kind of game intelligence body training method and device |
CN113144609A (en) * | 2021-04-25 | 2021-07-23 | 上海雷鸣文化传播有限公司 | Game role behavior algorithm based on genetic algorithm |
CN114681919A (en) * | 2022-03-29 | 2022-07-01 | 米哈游科技(上海)有限公司 | NPC (network processor core) performance method, device, medium and electronic equipment based on NPC mutual behavior influence |
CN114681919B (en) * | 2022-03-29 | 2024-10-01 | 米哈游科技(上海)有限公司 | NPC interaction influence-based NPC expression method, device, medium and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN107145948B (en) | 2021-05-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107145948B (en) | A NPC control method based on multi-agent technology | |
Kalyanakrishnan et al. | Half field offense in RoboCup soccer: A multiagent reinforcement learning case study | |
CN108211362B (en) | A non-player character combat strategy learning method based on deep Q-learning network | |
CN110882544B (en) | Multi-agent training method and device and electronic equipment | |
WO2023071854A1 (en) | Control method and apparatus for virtual character in game, computer device, storage medium, and program | |
CN112870721B (en) | Game interaction method, device, equipment and storage medium | |
Lin et al. | Juewu-mc: Playing minecraft with sample-efficient hierarchical reinforcement learning | |
CN101834842B (en) | RoboCup platform player intelligent control method and system in embedded environment | |
CN104102522B (en) | The artificial emotion driving method of intelligent non-player roles in interactive entertainment | |
CN111841018B (en) | Model training method, model using method, computer device, and storage medium | |
CN108228251B (en) | Method and device for controlling target object in game application | |
US10880192B1 (en) | Interactive agents for user engagement in an interactive environment | |
CN113926196B (en) | Control method, device, storage medium and electronic device for virtual game characters | |
Wang et al. | Large scale deep reinforcement learning in war-games | |
Edwards et al. | The role of machine learning in game development domain-a review of current trends and future directions | |
Zhang et al. | A leader-following paradigm based deep reinforcement learning method for multi-agent cooperation games | |
CN116090549A (en) | Knowledge-driven multi-agent reinforcement learning decision-making method, system and storage medium | |
Liaw et al. | Evolving a team in a first-person shooter game by using a genetic algorithm | |
Kayakoku et al. | A Novel Behavioral Strategy for RoboCode Platform Based on Deep Q‐Learning | |
CN117654053A (en) | Non-player character behavior decision method based on artificial intelligence | |
Sithungu et al. | Adaptive Game AI-Based Dynamic Difficulty Scaling via the Symbiotic Game Agent | |
CN112870727A (en) | Training and control method for intelligent agent in game | |
Zhang | Using artificial intelligence assistant technology to develop animation games on iot | |
Maggiorini et al. | Follow the leader: a scalable approach for realistic group behavior of roaming NPCs in MMO games | |
Mohan et al. | Learning to play Mario |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210518 |