CN107145948A

CN107145948A - A NPC Control Method Based on Multi-agent Technology

Info

Publication number: CN107145948A
Application number: CN201710237082.5A
Authority: CN
Inventors: 陈盈科; 唐华锦; 燕锐; 李洪莹
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2017-04-12
Filing date: 2017-04-12
Publication date: 2017-09-08
Anticipated expiration: 2037-04-12
Also published as: CN107145948B

Abstract

The invention discloses a kind of NPC control methods based on multi-agent Technology, it is related to the technical fields such as artificial intelligence, computer game, NPC of the prior art is solved without automated reasoning, adjust automatically behavior, need frequently to change NPC scripts according to the experience of game developer, cause the problems such as human cost is paid wages.The present invention includes defining an automated reasoning model for each NPC --- Interactive Dynamic influence figure, the parameter of defining interactive Dynamic Influence Diagrams according to the characteristics of NPC itself and the characteristics of scene；According to the Interactive Dynamic influence figure for defining parameter, NPC behavior script is constructed；NPC records the implementing result of each action in the behavior script with execution NPC in player's interaction；According to the implementing result of each action of record, updating Interactive Dynamic influences the parameter of figure, reconfigures NPC behavior script applications in new player's interaction.Artificial intelligence approach is used in the online RPG of many people the behavior for automatically controlling non-player role by the present invention.

Description

A NPC Control Method Based on Multi-agent Technology

技术领域technical field

一种基于多智能体技术的NPC控制方法，将人工智能方法用于多人在线角色扮演游戏中自动控制非玩家角色的行为，涉及人工智能、计算机游戏等技术领域。An NPC control method based on multi-agent technology uses artificial intelligence methods to automatically control the behavior of non-player characters in multiplayer online role-playing games, and involves technical fields such as artificial intelligence and computer games.

背景技术Background technique

多人在线角色扮演游戏，是一种大型网络游戏，其中玩家扮演一个虚构角色，并控制该角色的各项活动，并与游戏中大量其他玩家扮演的角色交流互动，协作打败怪物，完成任务。不同于单机游戏模式中的人机单独对抗，在线游戏可以让多个玩家实时协作与对抗，游戏内容与趣味性得到大幅提高。一款优秀的多人在线角色扮演游戏可以吸引大量玩家，通过提供高质量的游戏服务，丰富玩家的精神生活，让人们生活得更快乐。这也催生了一个新兴的产业，已产生了巨大的经济效益。A multiplayer online role-playing game is a large-scale online game in which players play a fictional character and control the character's activities, communicate and interact with a large number of characters played by other players in the game, and cooperate to defeat monsters and complete tasks. Different from the individual confrontation between man and machine in the stand-alone game mode, online games allow multiple players to cooperate and compete in real time, and the game content and fun are greatly improved. An excellent multiplayer online role-playing game can attract a large number of players. By providing high-quality game services, it can enrich the spiritual life of players and make people live happier. This has also given birth to a new industry, which has produced huge economic benefits.

除了玩家角色，游戏中还存在大量非玩家角色，即NPC(Non-player characters)。除了一类提供游戏功能的NPC外，例如物品买卖，任务引导人物外，游戏中存在大量的怪物NPC，这类NPC是各项游戏任务的核心，玩家需要战胜大量各类怪物从而完成任务。通过使NPC的行为更加智能，来合理地调整游戏的难度，兼顾公平性、挑战性与趣味性，是提供优秀在线游戏服务的重要内容。各大游戏厂商在每款游戏中均投入大量人力物力来试图实现这一目标。In addition to player characters, there are also a large number of non-player characters in the game, namely NPC (Non-player characters). In addition to a class of NPCs that provide game functions, such as item trading and task guide characters, there are a large number of monster NPCs in the game. These NPCs are the core of various game tasks. Players need to defeat a large number of various monsters to complete tasks. By making the behavior of NPCs more intelligent, it is an important content to provide excellent online game services to reasonably adjust the difficulty of the game, taking into account fairness, challenge and fun. Major game manufacturers have invested a lot of manpower and material resources in each game to try to achieve this goal.

现有的NPC行为大多是通过预先编制好的脚本进行控制。由于人力与时间等开发资源的限制，游戏开发人员无法为每个场景下的每个NPC编制应对每个可能遇到情况的方法。为了逐步增加游戏挑战性，开发人员会通过大幅强化NPC的物理属性(比如攻击力、防御力等)来提高游戏玩家的难度。所以NPC常显得行为呆板但异常强大，且有损游戏的公平性。预编制脚本的一个重要缺陷是NPC只具有被动应对玩家的能力，而没有主动推理能力。也就是说，玩家可以通过多次试探，推断出其固定的行为模式，从而寻找到战胜看似强大的NPC的策略。更重要的是，这一策略还会在玩家间快速传播，使得游戏的挑战性与趣味性大幅降低。为此，游戏开发人员会根据游戏日志来频繁修改NPC脚本，从而保证游戏的趣味性，因频繁修改NPC脚本受限于人力成本，所以只会针对重要的怪物进行NPC脚本修改。如何针对游戏玩家的策略，有效地定量地调整脚本也非常依赖于游戏开发人员的经验。Existing NPC behaviors are mostly controlled through pre-programmed scripts. Due to the limitation of development resources such as manpower and time, game developers cannot prepare a method for each NPC in each scene to deal with every possible situation. In order to gradually increase the challenge of the game, developers will greatly enhance the physical attributes of NPCs (such as attack power, defense power, etc.) to increase the difficulty of game players. Therefore, NPCs often appear to act dull but extremely powerful, and it is detrimental to the fairness of the game. An important flaw of pre-programmed scripts is that NPCs only have the ability to deal with players passively, but have no active reasoning ability. That is to say, the player can infer its fixed behavior pattern through multiple trials, so as to find a strategy to defeat the seemingly powerful NPC. More importantly, this strategy will spread quickly among players, making the game less challenging and less interesting. For this reason, game developers will frequently modify NPC scripts based on game logs to ensure the fun of the game. Because frequent modification of NPC scripts is limited by labor costs, only important monsters will be modified for NPC scripts. How to effectively and quantitatively adjust the script according to the game player's strategy also depends on the experience of the game developer.

发明内容Contents of the invention

本发明针对上述不足之处提供了一种基于多智能体技术的NPC控制方法，解决现有技术中的NPC不具有自动推理、自动调整行为，需要根据游戏开发人员的经验来频繁的修改NPC脚本，造成人力成本的浪费等问题。The present invention provides a kind of NPC control method based on multi-agent technology for above-mentioned inadequacy, solves the problem that NPC in the prior art does not have automatic reasoning, automatic adjustment behavior, needs to revise NPC script frequently according to the experience of game developer , resulting in waste of labor costs and other issues.

为了实现上述目的，本发明采用的技术方案如下：In order to achieve the above object, the technical scheme adopted in the present invention is as follows:

一种基于多智能体技术的NPC控制方法，其特征在于，包括以下步骤：A kind of NPC control method based on multi-agent technology, is characterized in that, comprises the following steps:

步骤(1)、为每个NPC定义一个自动推理模型——交互式动态影响图，根据NPC自身的特点与场景的特点定义交互式动态影响图的参数；Step (1), define an automatic reasoning model-interactive dynamic influence diagram for each NPC, define the parameters of the interactive dynamic influence diagram according to the characteristics of the NPC itself and the characteristics of the scene;

步骤(2)、根据交互式动态影响图的参数，构造NPC的行为脚本；Step (2), according to the parameter of interactive dynamic influence graph, constructs the behavior script of NPC;

步骤(3)、NPC在与玩家交互过程中执行NPC的行为脚本，并记录每一动作的执行结果，其中动作即指行为脚本；Step (3), the NPC executes the behavior script of the NPC during the interaction with the player, and records the execution result of each action, wherein the action refers to the behavior script;

步骤(4)、根据步骤(3)中记录的每一动作的执行结果，更新交互式动态影响图的参数，重新构造NPC的行为脚本应用于新的玩家交互中。Step (4), according to the execution result of each action recorded in step (3), update the parameters of the interactive dynamic influence graph, reconstruct the behavior script of the NPC and apply it to the new player interaction.

进一步，所述步骤(1)中，根据NPC自身的特点与场景的特点定义交互式动态影响图的参数的具体步骤如下：Further, in the step (1), the specific steps of defining the parameters of the interactive dynamic influence graph according to the characteristics of the NPC itself and the characteristics of the scene are as follows:

步骤(11)、在交互式动态影响图中，根据NPC的可执行动作定义动作集合根据玩家角色的可执行动作定义动作集合 Step (11), in the interactive dynamic influence diagram, define an action set according to the executable actions of the NPC Defines a set of actions based on the executable actions of the player character

步骤(12)、根据NPC所处场景的性质、NPC与玩家角色的位置与属性定义状态集合S^*；Step (12), according to the nature of the scene where the NPC is located, the position and attributes of the NPC and the player role define the state collection S ^* ;

步骤(13)、根据步骤(11)和步骤(12)定义从时间t到时间t+1，状态s^t∈S^t通过NPC动作和玩家角色动作转移到状态s^t+1∈S^t+1的可能性的条件概率函数，即状态转移函数 Step (13), according to step (11) and step (12) define from time t to time t+1, state s ^t ∈ S ^t is acted by NPC and player character actions The conditional probability function of the possibility of transitioning to state s ^t+1 ∈ S ^t+1 , the state transition function

步骤(14)、根据NPC角色的行为风格与偏好，定义在时间t，NPC通过动作和玩家角色动作从状态s^t∈S^t转移到状态s^t+1∈S^t+1的效用函数 Step (14), according to the behavior style and preference of the NPC character, define that at time t, the NPC passes the action and player character actions The utility function for transitioning from state s ^t ∈ S ^t to state s ^t+1 ∈ S ^t+1

步骤(15)、根据现有的经验知识，初始化状态转移函数与效用函数的参数；Step (15), according to the existing empirical knowledge, initialize the state transition function and utility function parameters;

步骤(16)、根据先验在NPC的交互式动态影响图中包含有若干个刻画玩家角色行为的策略描述每个策略描述表示玩家角色在不同状态下将执行不同动作的条件概率即玩家角色在状态s^t时执行动作的概率将N个策略描述存储于玩家行为模式的集合具体如下：Step (16), according to the priori, the interactive dynamic influence diagram of the NPC contains several strategy descriptions describing the behavior of the player character Description of each policy Indicates the conditional probability that the player character will perform different actions in different states That is, the player character performs an action in state s ^t The probability Describe the N strategies A collection of stored player behavior patterns details as follows:

进一步，所述步骤(2)的具体步骤如下：Further, the concrete steps of described step (2) are as follows:

步骤(21)、根据步骤(1)中所构造的交互式动态影响图和所需NPC行为脚本长度，将交互式动态影响图扩展成包含T步推理的交互式动态影响图；Step (21), according to the interactive dynamic influence diagram constructed in step (1) and the required NPC behavior script length, the interactive dynamic influence diagram is expanded into an interactive dynamic influence diagram comprising T-step reasoning;

步骤(22)、采用经典的动态规划求解算法求解步骤(21)得到的包含了T步推理的交互式动态影响图，最大化NPC的即时期望效用与远期期望效用的加权和，寻找到能够最大化NPC角色期望效用的策略，即寻找到能够最大化NPC在每一时刻针对各种情形所应采取的动作的条件概率最大化NPC的即时期望效用与远期期望效用的加权和的公式如下：Step (22), use the classic dynamic programming algorithm to solve the interactive dynamic influence diagram that contains T-step reasoning obtained in step (21), maximize the weighted sum of the immediate expected utility and the long-term expected utility of the NPC, and find the one that can The strategy for maximizing the expected utility of the NPC role is to find the actions that can maximize the NPC's actions for various situations at each moment The conditional probability of The formula for maximizing the weighted sum of immediate expected utility and long-term expected utility of NPC is as follows:

其中，是即时期望效用，是远期期望效用，λ是折扣因子，用来减弱远期效用对当前动作的影响，ER_i是总效用值，即NPC的即时期望效用与远期期望效用的加权和；in, is the immediate expected utility, is the long-term expected utility, λ is the discount factor, which is used to weaken the impact of the long-term utility on the current action, ER _i is the total utility value, that is, the weighted sum of the immediate expected utility and the long-term expected utility of the NPC;

步骤(23)、将步骤(22)得到的最大化NPC期望效用的策略转换成兼容当前游戏的行为脚本格式。Step (23), converting the strategy of maximizing NPC expected utility obtained in step (22) into a behavior script format compatible with the current game.

进一步，所述步骤(3)的具体步骤如下：Further, the concrete steps of described step (3) are as follows:

步骤(31)、NPC在与玩家角色交互过程中，针对不同玩家角色的不同动作采用NPC动作即采用步骤(23)中得到的行为脚本；Step (31), NPC adopts NPC actions for different actions of different player characters in the process of interacting with player characters Promptly adopt the behavioral script that obtains in the step (23);

步骤(32)、记录每次NPC动作执行的结果：在每个时刻，包括玩家角色执行的动作NPC执行的动作对状态的改变，即从状态s^t迁移到状态s^t+1和状态s^t迁移到状态s^t+1下动作对应的效用值 Step (32), record every NPC action Result of execution: at each moment, including the actions performed by the player character Actions performed by NPCs The change to the state, that is, the utility value corresponding to the action under the transition from state s ^t to state s ^t+1 and state s ^t to state s ^t+1

进一步，所述步骤(4)的具体步骤如下：Further, the concrete steps of described step (4) are as follows:

步骤(41)、根据步骤(3)记录每次动作执行的结果，统计出NPC在每个状态s^t∈S^t时，玩家角色执行的各个动作的频率，将频率归一化处理，得到在各个状态下的行为频率的条件概率即可以看作是NPC基于当前玩家的行为对NPC构造的策略描述通过与玩家行为模式的集合比较，寻找到最相似的玩家策略描述即的与的是最相似的：Step (41), record the results of each action execution according to step (3), and count the actions performed by the player role when the NPC is in each state s ^t ∈ S ^t The frequency of , normalize the frequency to get the conditional probability of the behavior frequency in each state That is, it can be regarded as the NPC's strategy description based on the current player's behavior to the NPC By integrating with player behavior patterns Compare to find the most similar player strategy description which is of and of is most similar to:

根据统计得到的玩家行为策略描述更新交互式动态影响图中最相似的玩家策略描述在各种情况下采取各种行为的条件概率得到更新后的策略描述的具体公式如下：Strategy description of player behavior based on statistics Updated interactive dynamic influence graph for most similar player strategy descriptions Conditional probability of taking various actions in various situations get the updated policy description The specific formula is as follows:

其中，是现有交互式动态影响图中玩家行为的策略描述，是从交互数据中得到的策略描述，α是策略更新率，表示当前策略描述被更新的速度；in, is a policy description of player behavior in existing interactive dynamic influence graphs, is the policy description obtained from the interaction data, and α is the policy update rate, indicating the speed at which the current policy description is updated;

步骤(42)、根据步骤(3)记录的每次动作执行的结果，统计当NPC执行动作与玩家执行动作时、当前状态从s^t转移到状态s^t+1的频率的条件概率和动作对应的期望效用值更新NPC交互式动态影响图的状态转移函数和效用函数更新的具体公式如下：Step (42), according to the result of each action execution recorded in step (3), count when the NPC performs the action Perform actions with the player The conditional probability of the frequency at which the current state transitions from s ^t to state s ^t+1 when The expected utility value corresponding to the action Update state transition function of NPC interactive dynamic influence graph and utility function The updated specific formula is as follows:

其中，是现有交互式动态影响图中的条件概率，是现有交互式动态影响图中的效用函数；in, is the conditional probability in the existing interactive dynamic influence diagram, is the utility function in the existing interactive dynamic influence diagram;

步骤(43)、将步骤(41)和步骤(42)得到的结果更新交互式动态影响图，用于步骤(2)-步骤(4)。Step (43), updating the interactive dynamic influence diagram with the results obtained in step (41) and step (42), for step (2)-step (4).

综上所述，由于采用了上述技术方案，本发明的有益效果是：In summary, owing to adopting above-mentioned technical scheme, the beneficial effect of the present invention is:

1、本发明中的交互式动态影响图是人工智能最新研究成果之一，拥有强大的针对个体建模能力，能够准确的刻画玩家的行为，并且基于模型的脚本生成算法运行效率与脚本质量都满足实际应用的要求；1. The interactive dynamic influence graph in the present invention is one of the latest research results of artificial intelligence. It has powerful modeling capabilities for individuals and can accurately describe the player's behavior. Moreover, the operating efficiency and script quality of the script generation algorithm based on the model are both excellent. Meet the requirements of practical applications;

2、本发明可以通过预先为怪物NPC定义的交互式动态影响图，实现行为脚本的自动在线更新与生成，大幅降低游戏开发人员的工作负担；2. The present invention can realize the automatic online update and generation of behavior scripts through the interactive dynamic influence graph defined in advance for monster NPCs, greatly reducing the workload of game developers;

3、在本发明中，怪物NPC所包含的交互式动态影响图的更新基于真实在线数据，所以NPC脚本的更新对玩家更有针对性，更进一步地，通过每种怪物NPC在多个副本之间共享数据与玩家策略描述，生成的NPC行为脚本具有良好的通用性。3. In the present invention, the update of the interactive dynamic influence map contained in the monster NPC is based on real online data, so the update of the NPC script is more targeted to the player. Further, through each monster NPC in multiple copies Sharing data and player strategy descriptions among players, the generated NPC behavior scripts have good versatility.

附图说明Description of drawings

图1是本发明包含2步推理的交互式动态影响图；Fig. 1 is the interactive dynamic influence figure that the present invention comprises 2 steps reasoning;

图2是本发明策略描述示意图；Fig. 2 is a schematic diagram describing the strategy of the present invention;

图3是本发明执行记录数据示例。Fig. 3 is an example of execution record data of the present invention.

具体实施方式detailed description

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅用以解释本发明，并不用于限定本发明。In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

步骤(1)、为每个NPC定义一个自动推理模型——交互式动态影响图(以下简称影响图)，根据NPC自身的特点与场景的特点定义影响图的参数。Step (1), define an automatic reasoning model for each NPC—an interactive dynamic influence diagram (hereinafter referred to as the influence diagram), and define the parameters of the influence diagram according to the characteristics of the NPC itself and the characteristics of the scene.

步骤(2)、根据影响图的参数，构造NPC的行为脚本。Step (2), according to the parameters of the influence diagram, the behavior script of the NPC is constructed.

步骤(3)、NPC在与玩家交互过程中执行行为脚本，并记录每一动作，其中动作即指行为脚本。Step (3), the NPC executes the behavior script during the interaction with the player, and records each action, wherein the action refers to the behavior script.

步骤(4)、根据步骤(3)中记录的每一动作的执行结果，更新影响图参数，并重新产生NPC的行为脚本应用于新的玩家交互中。Step (4), according to the execution result of each action recorded in step (3), update the parameters of the influence graph, and regenerate the behavior script of the NPC to be applied to the new player interaction.

所述步骤(1)中，根据NPC自身的特点与场景的特点定义影响图的参数具体如下：In the step (1), according to the characteristics of the NPC itself and the characteristics of the scene, the parameters of the influence diagram are defined as follows:

在影响图中，根据NPC的可执行动作定义动作集合(例如，NPC的可执行动作如攻击、防御、逃跑、追击动作等)，根据玩家角色的可执行动作定义动作集合玩家角色的可执行动作如轻拳、重拳、逃跑等；根据所处场景性质、NPC与玩家角色的位置与属性定义状态集合S^*；从时间t到时间t+1的状态转移函数根据NPC的行为风格与偏好(例如凶悍型NPC攻击得到的效用会被设置的较高以鼓励攻击，而保守型角色采取防守的时候得到的效用相对较高)，定义NPC的效用函数其中，状态集合S^*中的状态包含了NPC和玩家角色的基本信息：如NPC与玩家位置和属性值等。状态转移函数是一种条件概率函数，具体的说，状态转移函数量化地表达了从时间t到时间t+1，状态s^t∈S^t，通过NPC动作和玩家角色动作转移到状态s^t+1∈S^t+1的可能性，即概率；效用函数量化地表示NPC每次动作执行的结果，具体的说，效用函数表示了在时间t，NPC通过动作和玩家角色动作从状态s^t∈S^t转移到状态s^t+1∈S^t+1所得到的效用值。当NPC执行动作后，玩家角色受到的伤害是NPC的正效用；NPC自身受到的伤害是NPC的负效用，该动作的总效用为这两部分之和。NPC动作的效用还同时依赖于初始状态s^t、目标状态s^t+1和玩家角色的动作图1中是包含2步推理(t和t+1)动态影响图，当需要第3步推理时，第t+2步的模型可以通过复制t+1时间片(即所有以t+1为上标的节点)和节点之间的链接来构造，重复此过程，可以构造出包含T步推理的动态影响图。根据现有的经验知识，初始化状态转移函数与效用函数的参数。In the influence diagram, define the set of actions according to the executable actions of the NPC (For example, NPC's executable actions such as attack, defense, escape, pursuit, etc.), define the action set according to the executable actions of the player character The executable actions of the player character such as light punch, heavy punch, escape, etc.; define the state set S ^* according to the nature of the scene, the position and attributes of the NPC and the player character; the state transition function from time t to time t+1 According to the NPC's behavior style and preferences (for example, the utility of aggressive NPC attacks will be set higher to encourage attacks, and the utility of conservative characters when they take defense is relatively high), define the utility function of NPC Among them, the state in the state set S ^* contains the basic information of the NPC and the player character: such as the position and attribute value of the NPC and the player. The state transition function is a conditional probability function, specifically, the state transition function Quantitatively express from time t to time t+1, state s ^t ∈ S ^t , through NPC actions and player character actions The possibility of transitioning to the state s ^t+1 ∈ S ^t+1 , that is, the probability; the utility function Quantitatively represent the results of each action performed by the NPC, specifically, the utility function Indicates that at time t, the NPC passes the action and player character actions The utility value obtained by transferring from state s ^t ∈ S ^t to state s ^t+1 ∈ S ^t+1 . When the NPC performs an action, the damage received by the player character is the positive utility of the NPC; the damage received by the NPC itself is the negative utility of the NPC, and the total utility of the action is the sum of these two parts. The utility of the NPC action also depends on the initial state s ^t , the target state s ^t+1 and the action of the player character Figure 1 is a dynamic influence diagram containing 2-step reasoning (t and t+1). When the 3rd step reasoning is required, the model of the t+2 step can be copied by the t+1 time slice (that is, all the t+1-based Superscripted nodes) and the links between the nodes are constructed. Repeating this process, a dynamic influence graph including T-step reasoning can be constructed. According to the existing empirical knowledge, initialize the state transition function and utility function parameters.

根据先验知识，NPC的影响图中包含有若干个刻画玩家角色行为的策略描述每个策略描述表示玩家角色在不同状态下将执行不同动作的条件概率具体的说，定义了玩家角色在状态s^t时执行动作的概率。将N个策略描述存储于玩家行为模式的集合公式如下：According to prior knowledge, the NPC's influence diagram contains several policy descriptions that describe the behavior of the player character Description of each policy Indicates the conditional probability that the player character will perform different actions in different states Specifically, Defines the action that the player character performs in state s ^t The probability. Store N policy descriptions in a collection of player behavior patterns The formula is as follows:

所述步骤(2)的具体步骤如下：The concrete steps of described step (2) are as follows:

步骤(21)，针对步骤(1)中所构造的影响图，根据所需脚本长度，扩展成包含T步推理的影响图。因为在t时刻，NPC执行的动作会产生效用值这里的效用值也同时依赖于当前状态S^t、目标状态S^t+1和玩家角色的动作这里的效用值量化地刻画了NPC在一个特定情形下(包括当前状态、新状态和玩家的动作)执行动作的结果。此外，NPC所执行的动作还会随机地导致对战态势的变化，即状态迁移的变化，继而影响未来可能得到的效用值。所以，求解这个包含T步推理的影像图的目标是最大化NPC的即时期望效用与远期期望效用的加权和，加权是为了降低未来效用对现在效用的影响。Step (21), expanding the influence diagram constructed in step (1) into an influence diagram including T-step reasoning according to the required script length. Because at time t, the action performed by the NPC will generate utility value The utility value here also depends on the current state S ^t , the target state S ^t+1 and the action of the player character The utility value here quantitatively describes the actions performed by the NPC in a specific situation (including the current state, new state and player's actions) the result of. In addition, the actions performed by the NPC will also randomly lead to changes in the battle situation, that is, state transitions changes, and then affect the utility value that may be obtained in the future. Therefore, the goal of solving this image graph containing T-step reasoning is to maximize the immediate expected utility of the NPC and forward expected utility The weighted sum of , weighting is to reduce the impact of future utility on current utility.

步骤(22)、采用经典的动态规划求解算法求解步骤(21)得到的包含了T步推理的交互式动态影响图，最大化NPC的即时期望效用与远期期望效用的加权和，寻找到能够最大化NPC角色期望效用的策略,这里的策略指定了NPC在每一时刻针对各种情形(包括自身的状态、玩家的状态、位置等)所应采取的动作每一次采取的动作同时取决于当前的状态，这里策略采用条件概率函数进行描述。当前的状态取决于之前的状态与玩家和NPC之前采取的动作的概率玩家和NPC之前采取的动作又被称为观测信息；其中，最大化NPC的即时期望效用与远期期望效用的加权和的公式如下：Step (22), use the classic dynamic programming algorithm to solve the interactive dynamic influence diagram that contains T-step reasoning obtained in step (21), maximize the weighted sum of the immediate expected utility and long-term expected utility of the NPC, and find the one that can A strategy for maximizing the expected utility of an NPC role, where the strategy specifies the actions that the NPC should take for various situations (including its own state, player's state, position, etc.) at each moment Each action taken also depends on the current state, where the strategy uses a conditional probability function to describe. The current state depends on the probabilities of the previous state and previous actions taken by the player and NPC The previous actions taken by the player and the NPC are also called observation information; among them, the formula for maximizing the weighted sum of the NPC’s immediate expected utility and long-term expected utility is as follows:

其中，是即时期望效用，是远期期望效用，λ是折扣因子，用来减弱远期效用对当前动作的影响，ER_i是总效用值，即NPC的即时期望效用与远期期望效用的加权和。期望效用值是各种特定情形下效用值与各种情形出现的可能性，即当前状态为s^t，NPC执行动作目标状态s^t+1，且玩家角色执行动作时得到的具体效用值与出现这一情形的概率的加权和，式中需要考虑每种可能的组合。in, is the immediate expected utility, is the long-term expected utility, λ is the discount factor, which is used to weaken the impact of the long-term utility on the current action, and ER _i is the total utility value, that is, the weighted sum of the immediate expected utility and the long-term expected utility of the NPC. The expected utility value is the utility value in various specific situations and the possibility of various situations, that is, the current state is ^st , and the NPC performs an action Goal state s ^t+1 , and the player character performs the action The specific utility value obtained when and the probability of this happening The weighted sum of , where every possible combination needs to be considered.

步骤(23)、将步骤(22)得到的最大化NPC期望效用的策略转换成兼容当前游戏的行为脚本格式。即将得到的由条件概率函数表达的策略，转换成兼容当前游戏的基于条件判断的行为脚本格式。Step (23), converting the strategy of maximizing NPC expected utility obtained in step (22) into a behavior script format compatible with the current game. The strategy expressed by the conditional probability function to be obtained is converted into a behavior script format based on conditional judgment compatible with the current game.

所述步骤(3)的具体步骤如下：The concrete steps of described step (3) are as follows:

(31)NPC在与玩家角色交互过程中，执行步骤(22)中针对不同玩家不同动作定义的行为脚本。每一次采取的动作同时取决于当前状态S^t，这里S^t也可以看作NPC在不同时刻观测到的信息。(31) During the interaction process between the NPC and the player character, execute the behavior script defined in step (22) for different actions of different players. Each action taken also depends on the current state S ^t , where S ^t can also be regarded as the information observed by the NPC at different times.

(32)记录每次动作执行的结果：在每个时刻，包括玩家角色执行的动作NPC执行的动作对状态的改变，即从状态s^t迁移到状态s^t+1和从状态s^t迁移到状态s^t+1下动作对应的效用值数据格式如图3所示。(32) Record the results of each action execution: at each moment, including the actions performed by the player character Actions performed by NPCs The change to the state, that is, the utility value corresponding to the action under the transition from state s ^t to state s ^t+1 and from state s ^t to state s ^t+1 The data format is shown in Figure 3.

所述步骤(4)的具体步骤如下：The concrete steps of described step (4) are as follows:

(41)从步骤(32)记录的每次动作执行的结果，统计出NPC在每个状态s^t∈S^t时，玩家角色执行的各个动作的频率，将频率通过归一化，继而得到在各个状态下的行为频率的条件概率即NPC基于当前玩家的行为对玩家构造的策略描述通过与玩家行为模式的集合比较，寻找到最相似的一个玩家策略即的与的是最相似的：(41) From the results of each action recorded in step (32), count the actions performed by the player character when the NPC is in each state ^st ∈ S ^t The frequency of , the frequency is normalized, and then the conditional probability of the behavior frequency in each state is obtained That is, the NPC describes the player's strategy based on the current player's behavior By integrating with player behavior patterns Compare to find the most similar player strategy which is of and of is most similar to:

根据统计得到的玩家行为策略描述更新交互式动态影响图中最相似的玩家策略描述在各种情况下(即不同的当前状态s^t,不同的玩家角色动作)采取各种行为的条件概率得到更新后的策略描述的具体公式如下：Strategy description of player behavior based on statistics Updated interactive dynamic influence graph for most similar player strategy descriptions In various situations (i.e. different current states s ^t , different player character actions ) the conditional probability of taking various actions get the updated policy description The specific formula is as follows:

通过基于交互数据更新现有交互式动态影响图中玩家角度的策略描述，能够更加准确地在行为脚本生成中预测玩家的行为，使得NPC的行为在之后的对战中更加智能。By updating the strategy description of the player's angle in the existing interactive dynamic influence graph based on the interaction data, the player's behavior can be more accurately predicted in the behavior script generation, making the NPC's behavior more intelligent in subsequent battles.

(42)根据步骤(32)记录的每次动作执行的结果，统计当NPC角色执行动作与玩家执行的动作当前状态从s^t转移到状态s^t+1的频率的概率和动作对应的期望效用值这里的期望效用值是各种特定情形下效用值与各种情形出现频率的加权和。更新怪物NPC影响图的状态转移函数和效用函数更新的方法与步骤(41)类似：(42) According to the result of each action execution recorded in step (32), count when the NPC character performs the action Action performed with the player Probability of how often the current state transitions from state s ^t to state s ^t+1 The expected utility value corresponding to the action The expected utility value here is the weighted sum of the utility value in various specific situations and the occurrence frequency of various situations. Update the state transition function of the monster NPC influence graph and utility function The updated method is similar to step (41):

通过基于交互数据更新现有影响图中的状态转移函数与效用函数，能够更加准确刻画NPC所处游戏场景，使得生成的NPC行为在之后的对战中更加智能。By updating the state transition function and utility function in the existing influence diagram based on the interaction data, the game scene where the NPC is located can be more accurately described, making the generated NPC behavior more intelligent in subsequent battles.

(43)将步骤(41)和步骤(42)得到的结果更新交互式动态影响图，用于步骤(2)-步骤(4)。(43) Update the interactive dynamic influence diagram with the results obtained in step (41) and step (42), and use it in step (2)-step (4).

以上所述仅为本发明的较佳实施例而已，并不用以限制本发明，凡在本发明的精神和原则之内所作的任何修改、等同替换和改进等，均应包含在本发明的保护范围之内。The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the present invention. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention should be included in the protection of the present invention. within range.

Claims

1. An NPC control method based on multi-agent technology is characterized by comprising the following steps:

defining an automatic reasoning model, namely an interactive dynamic influence graph, for each NPC, and defining parameters of the interactive dynamic influence graph according to the characteristics of the NPC and the characteristics of a scene;

step (2), constructing a behavior script of the NPC according to the parameters of the interactive dynamic influence diagram;

step (3), the NPC executes the behavior script of the NPC in the interactive process with the player, and records the execution result of each action, wherein the action refers to the behavior script;

and (4) updating parameters of the interactive dynamic influence diagram according to the execution result of each action recorded in the step (3), and reconstructing the behavior script of the NPC to be applied to new player interaction.

2. The multi-agent technology-based NPC control method as claimed in claim 1, wherein in step (1), the specific steps of defining parameters of the interactive dynamic influence map according to the characteristics of the NPC and the characteristics of the scene are as follows:

step (11), in the interactive dynamic influence diagram, defining action set according to the executable action of NPCDefining a set of actions from executable actions of a player character

Step (12), defining a state set S according to the property of the scene where the NPC is positioned, the position and the property of the NPC and the player character^*；

Step (13), defining a state s from time t to time t +1 according to step (11) and step (12)^t∈S^tBy NPC actionAnd player character actionsTransition to state s^t+1∈S^t+1Conditional probability function of likelihood, i.e. state transition function

Step (14), defining the NPC passing action at the time t according to the action style and preference of the NPC roleAnd player character actionsSlave state s^t∈S^tTransition to state s^t+1∈S^t+1Utility function of

Step (15), initializing the state transfer function according to the prior experience knowledgeAnd utility functionThe parameters of (1);

step (16), according to prior, the interactive dynamic influence diagram of the NPC contains a plurality of strategy descriptions which describe the player character behaviorsEach policy descriptionRepresenting conditional probabilities that a player character will perform different actions in different statesI.e. the player character is in state s^tExecute actions at the timeProbability of (2)Describing N strategiesCollections stored in player behavior patternsThe method comprises the following specific steps:

3. the NPC control method based on multi-agent technology as claimed in claim 1, wherein: the specific steps of the step (2) are as follows:

step (21), according to the interactive dynamic influence diagram constructed in the step (1) and the required NPC behavior script length, expanding the interactive dynamic influence diagram into an interactive dynamic influence diagram containing T-step reasoning;

step (22), solving the inclusion obtained in step (21) by adopting a classical dynamic programming solving algorithmAn interactive dynamic influence graph with T-step reasoning, a weighted sum of the instant expected utility and the long-term expected utility of the NPC is maximized, a strategy for maximizing the expected utility of the NPC role is found, namely actions which can be taken by the NPC for various situations at each moment are foundConditional probability of (2)The formula that maximizes the weighted sum of the immediate expected utility and the long-term expected utility of the NPC is as follows:

<mrow> <msub> <mi>ER</mi> <mi>i</mi> </msub> <mo>=</mo> <mi>E</mi> <mo>&lsqb;</mo> <msubsup> <mi>R</mi> <mi>i</mi> <mn>1</mn> </msubsup> <mo>&rsqb;</mo> <mo>+</mo> <mi>&lambda;</mi> <mi>E</mi> <mo>&lsqb;</mo> <msubsup> <mi>R</mi> <mi>i</mi> <mn>2</mn> </msubsup> <mo>&rsqb;</mo> <mo>+</mo> <msup> <mi>&lambda;</mi> <mn>2</mn> </msup> <mi>E</mi> <mo>&lsqb;</mo> <msubsup> <mi>R</mi> <mi>i</mi> <mn>3</mn> </msubsup> <mo>&rsqb;</mo> <mo>+</mo> <mo>...</mo> <mo>+</mo> <msup> <mi>&lambda;</mi> <mrow> <mi>T</mi> <mo>-</mo> <mn>1</mn> </mrow> </msup> <mi>E</mi> <mo>&lsqb;</mo> <msubsup> <mi>R</mi> <mi>i</mi> <mi>T</mi> </msubsup> <mo>&rsqb;</mo> <mo>,</mo> </mrow>

<mrow> <mi>E</mi> <mo>&lsqb;</mo> <msubsup> <mi>R</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>&rsqb;</mo> <mo>=</mo> <msub> <mo>&Sigma;</mo> <mrow> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>A</mi> <mrow> <mi>j</mi> <mo>,</mo> </mrow> <mi>t</mi> </msubsup> <msup> <mi>S</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> </mrow> </msub> <msub> <mi>R</mi> <mi>i</mi> </msub> <mrow> <mo>(</mo> <msup> <mi>s</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>,</mo> <msup> <mi>s</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>a</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>a</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> <mo>*</mo> <mi>P</mi> <mrow> <mo>(</mo> <msup> <mi>s</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>,</mo> <msup> <mi>s</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>a</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>a</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> <mo>,</mo> <mn>1</mn> <mo>&le;</mo> <mi>t</mi> <mo>&le;</mo> <mi>T</mi> <mo>,</mo> </mrow>

wherein,is the instant desired utility of the utility,is the expected effect of the future, and λ is a discount factor to reduce the impact of the future effect on the current action, ER_iIs the total utility value, i.e., the weighted sum of the instant expected utility and the future expected utility of the NPC;

and (23) converting the strategy which is obtained in the step (22) and maximizes the expected utility of the NPC into a behavior script format compatible with the current game.

4. The multi-agent technology-based NPC control method as claimed in claim 3, wherein the specific steps of the step (3) are as follows:

step (31), NPC adopts NPC action aiming at different actions of different player characters in the process of interacting with the player charactersAdopting the behavior script obtained in the step (23);

step (32) of recording each NPC actionThe result of the execution is: at each time, including the actions performed by the player characterActions performed by NPCFor change of state, i.e. from state s^tTransition to state s^t+1And state s^tTransition to state s^t+1Utility value corresponding to down action

5. The multi-agent technology-based NPC control method as claimed in claim 4, wherein the specific steps of the step (4) are as follows:

step (41) recording the result of each action execution according to the step (3), and counting the state s of the NPC in each state^t∈S^tEach action performed by a player characterNormalizing the frequency to obtain the conditional probability of the behavior frequency in each stateI.e. can be regarded as a strategy description of NPC construction based on current player's behavior By aggregation with player behavior patternsComparing, finding the most similar player strategy descriptionsNamely, it isIs/are as followsAndis/are as followsIs the most similar:

<mrow> <msubsup> <mi>m</mi> <mrow> <mi>j</mi> <mo>,</mo> <mi>k</mi> </mrow> <mi>t</mi> </msubsup> <mo>=</mo> <mi>arg</mi> <mi> </mi> <msub> <mi>min</mi> <mi>m</mi> </msub> <msub> <mo>&Sigma;</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> </msub> <msup> <mrow> <mo>&lsqb;</mo> <mi>P</mi> <mrow> <mo>(</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>)</mo> </mrow> <mo>-</mo> <mover> <mi>P</mi> <mo>~</mo> </mover> <mrow> <mo>(</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>)</mo> </mrow> <mo>&rsqb;</mo> </mrow> <mn>2</mn> </msup> <mo>,</mo> <mi>m</mi> <mo>&Element;</mo> <msubsup> <mi>M</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>,</mo> </mrow>

player behavior strategy description obtained according to statisticsUpdating the most similar player strategy descriptions in an interactive dynamic impact graphConditional probabilities of taking various actions in various situationsObtaining updated policy descriptionsThe specific formula is as follows:

<mrow> <msub> <mover> <mi>P</mi> <mo>&CenterDot;</mo> </mover> <mrow> <mi>n</mi> <mi>e</mi> <mi>w</mi> </mrow> </msub> <mrow> <mo>(</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>)</mo> </mrow> <mo>=</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>-</mo> <mi>&alpha;</mi> <mo>)</mo> </mrow> <mo>*</mo> <mi>P</mi> <mrow> <mo>(</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>)</mo> </mrow> <mo>+</mo> <mi>&alpha;</mi> <mo>*</mo> <mover> <mi>P</mi> <mo>~</mo> </mover> <mrow> <mo>(</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>)</mo> </mrow> <mo>,</mo> </mrow>

wherein,is a strategic description of player behavior in existing interactive dynamic impact diagrams,is the policy description obtained from the interaction data, α is the policy update rate, which represents the speed at which the current policy description is updated;

step (42), according to the result of each action execution recorded in step (3), counting the action executed when NPC executesPerforming an action with a playerTime, current state from s^tTransition to state s^t+1Conditional probability of frequency ofPeriod corresponding to actionValue of hopeful effectUpdating state transition functions of NPC interactive dynamic impact diagramsAnd utility functionThe updated specific formula is as follows:

<mrow> <msub> <mi>P</mi> <mrow> <mi>n</mi> <mi>e</mi> <mi>w</mi> </mrow> </msub> <mrow> <mo>(</mo> <msup> <mi>S</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> <mo>=</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>-</mo> <mi>&alpha;</mi> <mo>)</mo> </mrow> <mo>*</mo> <mi>P</mi> <mrow> <mo>(</mo> <msup> <mi>S</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> <mo>+</mo> <mi>&alpha;</mi> <mo>*</mo> <mover> <mi>P</mi> <mo>~</mo> </mover> <mrow> <mo>(</mo> <msup> <mi>S</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>|</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>)</mo> </mrow> <mo>,</mo> </mrow>2

<mrow> <msub> <mi>R</mi> <mrow> <mi>i</mi> <mo>,</mo> <mi>n</mi> <mi>e</mi> <mi>w</mi> </mrow> </msub> <mrow> <mo>(</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msup> <mi>S</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>)</mo> </mrow> <mo>=</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>-</mo> <mi>&alpha;</mi> <mo>)</mo> </mrow> <mo>*</mo> <msub> <mi>R</mi> <mi>i</mi> </msub> <mrow> <mo>(</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msup> <mi>S</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>)</mo> </mrow> <mo>+</mo> <mi>&alpha;</mi> <mo>*</mo> <msub> <mover> <mi>R</mi> <mo>~</mo> </mover> <mi>i</mi> </msub> <mrow> <mo>(</mo> <msup> <mi>S</mi> <mi>t</mi> </msup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>i</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msubsup> <mi>A</mi> <mi>j</mi> <mi>t</mi> </msubsup> <mo>,</mo> <msup> <mi>S</mi> <mrow> <mi>t</mi> <mo>+</mo> <mn>1</mn> </mrow> </msup> <mo>)</mo> </mrow> <mo>,</mo> </mrow>

wherein,is the conditional probability in the existing interactive dynamic impact graph,is a utility function in the existing interactive dynamic impact graph;

and (43) updating the interactive dynamic influence diagram according to the results obtained in the step (41) and the step (42) for the step (2) to the step (4).