CN107145948B - NPC control method based on multi-agent technology

Info

Publication number: CN107145948B
Application number: CN201710237082.5A
Authority: CN
Inventors: 陈盈科; 唐华锦; 燕锐; 李洪莹
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2017-04-12
Filing date: 2017-04-12
Publication date: 2021-05-18
Anticipated expiration: 2037-04-12
Also published as: CN107145948A

Abstract

The invention discloses an NPC control method based on multi-agent technology and relates to the technical fields of artificial intelligence, computer games and the like. It addresses the problems that NPCs in the prior art cannot reason or adjust their behavior automatically, so that NPC scripts must be modified frequently according to the experience of game developers, which consumes labor. The method comprises: defining an automatic reasoning model, namely an interactive dynamic influence graph, for each NPC, and defining the parameters of the interactive dynamic influence graph according to the characteristics of the NPC and of the scene; constructing a behavior script for the NPC from the parameterized interactive dynamic influence graph; having the NPC execute the behavior script while interacting with players and recording the execution result of each action; and updating the parameters of the interactive dynamic influence graph according to the recorded results and reconstructing the behavior script of the NPC for use in new player interactions. The invention applies an artificial intelligence method to automatically control the behavior of non-player characters in a multiplayer online role-playing game.

Description

NPC control method based on multi-agent technology

Technical Field

The invention provides an NPC control method based on multi-agent technology, which applies an artificial intelligence method to automatically control the behavior of non-player characters in a multiplayer online role-playing game, and relates to the technical fields of artificial intelligence, computer games and the like.

Background

A multiplayer online role-playing game is a large-scale network game in which each player plays a fictional character, controls the character's activities, interacts with the characters played by a large number of other players, and cooperates with them to defeat monsters and complete tasks. Unlike the one-on-one confrontation between a human and the machine in a stand-alone game, an online game lets many players cooperate and compete in real time, which greatly enriches the game content and makes it more interesting. An excellent multiplayer online role-playing game can attract a large number of players and, by providing a high-quality game service, enrich their cultural life and make their lives more enjoyable. This has also given rise to an emerging industry that produces enormous economic benefits.

In addition to player characters, a game contains many non-player characters (NPCs). Besides the class of NPCs that provide game functions, such as merchants who buy and sell items and quest givers, games contain a large number of monster NPCs, which are the core of each game mission and which the player must defeat to complete it. Making NPC behavior more intelligent, so that game difficulty is adjusted reasonably while fairness, challenge and interest are all taken into account, is an important part of providing an excellent online game service. Every major game vendor invests a large amount of manpower and material resources in each game in an attempt to achieve this goal.

Existing NPC behavior is mostly controlled by pre-programmed scripts. Because development resources such as manpower and time are limited, game developers cannot script a response to every situation each NPC might encounter in each scene. To raise the level of challenge, developers may instead greatly strengthen the attributes of NPCs (e.g., attack power and defense power). NPCs then tend to behave rigidly while being exceptionally powerful, which impairs the fairness of the game. An important drawback of pre-written scripts is that the NPC can only react passively to players and has no active reasoning capability. As a result, a player can infer an NPC's fixed behavior pattern through repeated trial and error and find a strategy that defeats a seemingly strong NPC. More importantly, such a strategy can spread rapidly among players, which greatly reduces the challenge and interest of the game. Game developers therefore modify NPC scripts frequently according to the game logs in order to keep the game interesting, but because such modification is limited by labor cost, it is carried out only for important monsters. How to adjust a script effectively and quantitatively in response to players' strategies also depends heavily on the experience of the game developer.

Disclosure of Invention

In view of the above shortcomings, the invention provides an NPC control method based on multi-agent technology. It solves the problems that NPCs in the prior art cannot reason or adjust their behavior automatically, and that NPC scripts must be modified frequently according to the experience of game developers, which wastes labor.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

An NPC control method based on multi-agent technology, characterized by comprising the following steps:

step (1), defining an automatic reasoning model, namely an interactive dynamic influence graph, for each NPC, and defining the parameters of the interactive dynamic influence graph according to the characteristics of the NPC and the characteristics of the scene;

step (2), constructing a behavior script of the NPC according to the parameters of the interactive dynamic influence diagram;

step (3), the NPC executing its behavior script while interacting with the player and recording the execution result of each action, wherein the actions are those specified by the behavior script;

step (4), updating the parameters of the interactive dynamic influence diagram according to the execution result of each action recorded in step (3), and reconstructing the behavior script of the NPC for use in new player interactions.
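The cycle formed by steps (2) to (4) can be illustrated with a minimal Python sketch. The function and parameter names (`control_loop`, `build_script`, `run_encounter`, `update_model`) are hypothetical placeholders chosen for illustration and do not come from the patent; the model, script and record types are left abstract.

```python
from typing import Callable, List, TypeVar

Model = TypeVar("Model")
Script = TypeVar("Script")
Record = TypeVar("Record")


def control_loop(
    model: Model,
    build_script: Callable[[Model], Script],
    run_encounter: Callable[[Script], List[Record]],
    update_model: Callable[[Model, List[Record]], Model],
    encounters: int,
) -> Model:
    """Repeat steps (2)-(4) for a fixed number of player encounters.

    `model` stands for the parameterized influence graph from step (1).
    """
    for _ in range(encounters):
        script = build_script(model)          # step (2): generate the behavior script
        records = run_encounter(script)       # step (3): execute it and record results
        model = update_model(model, records)  # step (4): update the parameters
    return model
```

Passing the three stages in as callables keeps the loop independent of how the model is represented or solved.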

Further, in step (1), the parameters of the interactive dynamic influence graph are defined according to the characteristics of the NPC and the characteristics of the scene as follows:

step (11), in the interactive dynamic influence diagram, defining the NPC action set according to the actions the NPC can execute, and defining the player character action set according to the actions the player character can execute;

step (12), defining a state set S^* according to the attributes of the scene in which the NPC is located and the positions and attributes of the NPC and the player character;

step (13), according to step (11) and step (12), defining a state transition function, i.e., the conditional probability that, from time t to time t+1, the state s^t ∈ S^t transitions to the state s^{t+1} ∈ S^{t+1} under the NPC action and the player character action;

step (14), according to the behavior style and preferences of the NPC character, defining a utility function that gives the utility of transitioning from state s^t ∈ S^t to state s^{t+1} ∈ S^{t+1} at time t under the NPC action and the player character action;

step (15), initializing the parameters of the state transition function and the utility function according to prior empirical knowledge;

step (16), according to prior knowledge, including in the interactive dynamic influence diagram of the NPC a number of strategy descriptions of player character behavior, each strategy description giving the conditional probabilities that the player character performs different actions in different states, i.e., the probability that the player character executes each action when in state s^t, and storing the N strategy descriptions in a set of player behavior patterns.
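As an illustration of the parameters defined in steps (11) to (16), the following Python sketch stores them in plain dictionaries. All names (`NpcModel`, `init_uniform`, the field names) are hypothetical, and the uniform initialization is only a stand-in assumption: the method itself initializes these parameters from prior empirical knowledge.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Tuple

State = str
Action = str
# (s_t, npc_action, player_action) -> {s_next: probability}
Transition = Dict[Tuple[State, Action, Action], Dict[State, float]]
# (s_t, npc_action, player_action, s_next) -> utility for the NPC
Utility = Dict[Tuple[State, Action, Action, State], float]
# s_t -> {player_action: probability}; one player strategy description
PlayerPolicy = Dict[State, Dict[Action, float]]


@dataclass
class NpcModel:
    states: List[State]
    npc_actions: List[Action]
    player_actions: List[Action]
    transition: Transition = field(default_factory=dict)
    utility: Utility = field(default_factory=dict)
    player_policies: List[PlayerPolicy] = field(default_factory=list)


def init_uniform(states, npc_actions, player_actions, n_policies=1) -> NpcModel:
    """Assumed placeholder prior: uniform transitions, zero utility, uniform player policies."""
    model = NpcModel(list(states), list(npc_actions), list(player_actions))
    p_state = 1.0 / len(model.states)
    p_act = 1.0 / len(model.player_actions)
    for s, a_npc, a_pl in product(model.states, model.npc_actions, model.player_actions):
        model.transition[(s, a_npc, a_pl)] = {s2: p_state for s2 in model.states}
        for s2 in model.states:
            model.utility[(s, a_npc, a_pl, s2)] = 0.0
    for _ in range(n_policies):
        model.player_policies.append(
            {s: {a: p_act for a in model.player_actions} for s in model.states}
        )
    return model
```

In practice a designer would overwrite the uniform tables with values reflecting the scene and the NPC's behavior style.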

Further, the specific steps of step (2) are as follows:

step (21), according to the interactive dynamic influence diagram constructed in step (1) and the required length of the NPC behavior script, expanding the diagram into an interactive dynamic influence diagram containing T steps of reasoning;

step (22), solving the T-step interactive dynamic influence diagram obtained in step (21) with a classical dynamic programming algorithm so as to maximize the weighted sum of the immediate expected utility and the long-term expected utility of the NPC, thereby finding a strategy that maximizes the expected utility of the NPC character, i.e., the conditional probability of each action the NPC may take at each moment in each situation. The quantity maximized is

ER_i = (immediate expected utility) + λ × (future expected utility),

where λ is a discount factor that reduces the influence of future utility on the current action, and ER_i is the total utility value, i.e., the weighted sum of the immediate expected utility and the future expected utility of the NPC;

step (23), converting the strategy obtained in step (22), which maximizes the expected utility of the NPC, into a behavior script format compatible with the current game.
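A minimal Python sketch of the dynamic programming idea in step (22) is given below, as finite-horizon backward induction. It is a simplification under stated assumptions rather than the patented solver: the player is represented by a single fixed strategy description, whereas the interactive dynamic influence graph holds several, and the resulting policy is deterministic. The function name and the dictionary layouts are hypothetical.

```python
def solve_finite_horizon(states, npc_actions, player_policy, transition, utility,
                         horizon, discount):
    """Backward induction over `horizon` steps.

    player_policy: s -> {player_action: probability}
    transition:    (s, npc_action, player_action) -> {s_next: probability}
    utility:       (s, npc_action, player_action, s_next) -> float
    Returns (policy, value): policy[t][s] is the NPC action at step t,
    value[s] is the expected total utility from step 0.
    """
    value = {s: 0.0 for s in states}  # no utility after the final step
    policy = []
    for _ in range(horizon):
        new_value, step_policy = {}, {}
        for s in states:
            best_action, best_q = None, float("-inf")
            for a_npc in npc_actions:
                q = 0.0
                for a_pl, p_pl in player_policy[s].items():
                    for s2, p_s2 in transition[(s, a_npc, a_pl)].items():
                        # immediate utility plus discounted future utility
                        q += p_pl * p_s2 * (
                            utility[(s, a_npc, a_pl, s2)] + discount * value[s2]
                        )
                if q > best_q:
                    best_action, best_q = a_npc, q
            new_value[s], step_policy[s] = best_q, best_action
        value = new_value
        policy.insert(0, step_policy)  # earlier time steps go to the front
    return policy, value
```

Here `discount` plays the role of λ: a value below 1 makes utility gained later count less than utility gained now.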

Further, the specific steps of step (3) are as follows:

step (31), while interacting with player characters, the NPC selects its actions in response to the different actions of different player characters by following the behavior script obtained in step (23);

step (32), recording the execution result of each NPC action: at each time, the record includes the action performed by the player character, the action performed by the NPC, the change of state, i.e., the transition from state s^t to state s^{t+1}, and the utility value of the action under that transition.

Further, the specific steps of step (4) are as follows:

step (41), from the execution results recorded in step (3), counting how often the player character performs each action in each state s^t ∈ S^t, and normalizing the frequencies to obtain, for each state, the conditional probability of each player action. This can be regarded as a strategy description that the NPC constructs from the current player's behavior. It is compared with the set of player behavior patterns to find the most similar stored player strategy description, i.e., the stored description whose conditional probabilities are closest to those of the description obtained from statistics. The conditional probabilities of taking each action in each situation in that most similar description in the interactive dynamic influence diagram are then updated with the statistically obtained description to yield an updated strategy description. The update formula combines the strategy description of player behavior in the existing interactive dynamic influence diagram with the strategy description obtained from the interaction data, where α is the policy update rate, which sets how quickly the current strategy description is updated;

step (42), from the execution results recorded in step (3), computing the frequency-based conditional probability that the current state s^t transitions to state s^{t+1} when the NPC performs a given action and the player performs a given action, together with the expected utility value of the action, and updating the state transition function and the utility function of the NPC's interactive dynamic influence diagram. The update formulas combine these statistics with the conditional probability and the utility function in the existing interactive dynamic influence diagram;

step (43), updating the interactive dynamic influence diagram with the results of step (41) and step (42), for use in the next pass through step (2) to step (4).
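The strategy update of step (41) can be sketched in Python as follows. Two choices are assumptions made for illustration, since the formulas are not reproduced in this text: similarity is measured by summed absolute difference of probabilities, and the update is the convex combination (1 - α) × old + α × observed. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_player_policy(records, player_actions):
    """records: iterable of (state, player_action). Returns state -> {action: prob}."""
    counts = defaultdict(Counter)
    for state, action in records:
        counts[state][action] += 1
    policy = {}
    for state, c in counts.items():
        total = sum(c.values())
        # normalize frequencies into conditional probabilities
        policy[state] = {a: c[a] / total for a in player_actions}
    return policy


def distance(p, q, player_actions):
    """Assumed similarity measure: summed absolute difference over shared states."""
    shared = set(p) & set(q)
    return sum(abs(p[s][a] - q[s][a]) for s in shared for a in player_actions)


def update_most_similar(stored, observed, player_actions, alpha):
    """Blend `observed` into the closest stored description (in place); return its index."""
    idx = min(range(len(stored)),
              key=lambda k: distance(stored[k], observed, player_actions))
    target = stored[idx]
    for s, probs in observed.items():
        old = target.get(s, probs)  # unseen state: adopt the observed distribution
        target[s] = {a: (1 - alpha) * old[a] + alpha * probs[a] for a in player_actions}
    return idx
```

With α close to 0 the stored description changes slowly; with α close to 1 it tracks the most recent player closely.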

In summary, by adopting the above technical scheme, the invention has the following beneficial effects:

1. The interactive dynamic influence graph is a recent research result in artificial intelligence. It has a strong capacity for modeling individuals and can accurately characterize player behavior, and the model-based script generation algorithm meets the requirements of practical applications for running efficiency and script quality.

2. With an interactive dynamic influence diagram defined in advance for a monster NPC, the invention can update and generate behavior scripts automatically online, which greatly reduces the workload of game developers.

3. The interactive dynamic influence graph of a monster NPC is updated from real online data, so that updates to the NPC script are better targeted at players. Furthermore, because each monster NPC shares data and player strategy descriptions across multiple game instances (copies), the generated NPC behavior scripts generalize well.

Drawings

FIG. 1 is an interactive dynamic influence diagram of the present invention containing 2 steps of reasoning;

FIG. 2 is a schematic diagram of a strategy description of the present invention;

FIG. 3 is an example of the execution data recorded by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Step (1): define an automatic reasoning model, namely an interactive dynamic influence graph (hereinafter referred to as an influence graph), for each NPC, and define the parameters of the influence graph according to the characteristics of the NPC and the characteristics of the scene.

Step (2): construct a behavior script of the NPC according to the parameters of the influence graph.

Step (3): the NPC executes the behavior script while interacting with the player and records the execution result of each action, where the actions are those specified by the behavior script.

Step (4): update the parameters of the influence graph according to the execution result of each action recorded in step (3), and regenerate the behavior script of the NPC for use in new player interactions.

In step (1), the parameters of the influence graph are defined according to the characteristics of the NPC and the characteristics of the scene as follows.

In the influence graph, the NPC action set is defined according to the actions the NPC can execute (for example attack, defense, escape and pursuit), and the player character action set is defined according to the actions the player character can execute (for example punch and escape). A state set S^* is defined according to the attributes of the scene and the positions and attributes of the NPC and the player character. A state transition function from time t to time t+1 is defined, and a utility function of the NPC is defined according to the behavior style and preferences of the NPC (for example, a fierce NPC is given higher utility for attacking, to encourage attacks, while a conservative character is given relatively higher utility for defending).

Each state in the state set S^* contains basic information about the NPC and the player character, such as their positions and attribute values. The state transition function is a conditional probability function: it quantitatively expresses the probability that, from time t to time t+1, the state s^t ∈ S^t transitions to the state s^{t+1} ∈ S^{t+1} under the NPC action and the player character action. The utility function quantitatively represents the result of each action executed by the NPC: it gives the utility value obtained when, at time t, the state s^t ∈ S^t transitions to the state s^{t+1} ∈ S^{t+1} under the NPC action and the player character action. The injury suffered by the player character after the NPC performs an action is the positive utility of the NPC, the injury suffered by the NPC is the negative utility of the NPC, and the total utility of the action is the sum of the two parts. The utility of an NPC action therefore also depends on the initial state s^t, the target state s^{t+1} and the action of the player character.
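The utility rule described above (damage dealt counts positively, damage taken counts negatively, and the total is their sum) can be written as a small Python function. The two weights are a hypothetical way to express the NPC's behavior style and are not taken from the patent.

```python
def action_utility(damage_to_player: float, damage_to_npc: float,
                   offense_weight: float = 1.0, defense_weight: float = 1.0) -> float:
    """Total utility of one NPC action in one concrete outcome."""
    positive = offense_weight * damage_to_player  # injury suffered by the player character
    negative = -defense_weight * damage_to_npc    # injury suffered by the NPC
    return positive + negative
```

For example, a fierce NPC could use `offense_weight=1.5` so that the same exchange of damage scores higher for it than for a neutral NPC.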

FIG. 1 shows a dynamic influence diagram containing 2 steps of reasoning (t and t+1). When a third step of reasoning is required, the model for step t+2 is obtained by copying the t+1 time slice (i.e., all nodes with t+1 as a superscript, together with the links between them). Repeating this process yields a dynamic influence graph containing T steps of reasoning. The parameters of the state transition function and the utility function are initialized according to prior empirical knowledge.

According to prior knowledge, the influence graph of the NPC contains a number of strategy descriptions of player character behavior. Each strategy description defines the probability that the player character executes each action when in state s^t. The N strategy descriptions are stored in a set of player behavior patterns.

The specific steps of step (2) are as follows:

Step (21): the influence graph constructed in step (1) is expanded into an influence graph containing T steps of reasoning according to the required script length. An action performed by the NPC at time t generates a utility value, which also depends on the current state S^t, the target state S^{t+1} and the action of the player character. The utility value quantifies the result of the NPC performing an action in a particular situation (comprising the current state, the new state and the player's action). In addition, an action performed by the NPC can stochastically change the battle situation, i.e., cause a state transition, which in turn affects the utility values available in the future. Therefore, the goal of solving this influence graph containing T steps of reasoning is to maximize the weighted sum of the immediate expected utility and the long-term expected utility of the NPC, where the long-term term is discounted in order to reduce the impact of future utility on the present.

Step (22): the influence graph containing T steps of reasoning obtained in step (21) is solved with a classical dynamic programming algorithm so as to maximize the weighted sum of the immediate expected utility and the long-term expected utility of the NPC, which yields a strategy that maximizes the expected utility of the NPC character. Here a strategy specifies the action the NPC should take at each moment in each situation (including its own state and the player's state, position, etc.). The action taken each time depends on the current state, and the strategy is described by a conditional probability function. The current state depends on the previous state and on the actions previously taken by the player and the NPC, through the state transition probability; the actions previously taken by the player and the NPC are also referred to as observation information. The quantity maximized is

ER_i = (immediate expected utility) + λ × (future expected utility),

where λ is a discount factor that reduces the influence of future utility on the current action, and ER_i is the total utility value, i.e., the weighted sum of the immediate expected utility and the future expected utility of the NPC. An expected utility value is the sum, over all particular cases, of the utility value in each case weighted by the probability of that case occurring. A particular case consists of the current state s^t, the action performed by the NPC, the target state s^{t+1} and the action performed by the player character; the specific utility value obtained in that case and the probability of it occurring must be taken into account for every possible combination.

Step (23): the strategy obtained in step (22), which maximizes the expected utility of the NPC and is expressed as a conditional probability function, is converted into a condition-based behavior script format compatible with the current game.
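The conversion into a condition-based script can be pictured with the short Python sketch below. The script syntax (`IF state == ... THEN ...`) is invented for illustration, since the target format depends on the game engine, and the sketch assumes a deterministic policy given as one state-to-action mapping per time step rather than a full conditional probability function.

```python
def policy_to_script(policy):
    """policy: list of {state: npc_action}, one mapping per time step.

    Returns one rule line per (step, state) pair, with states in sorted
    order so that the output is reproducible.
    """
    lines = []
    for t, step_policy in enumerate(policy):
        for state in sorted(step_policy):
            lines.append(f"step {t}: IF state == {state!r} THEN {step_policy[state]}")
    return lines
```

A stochastic policy could be handled in the same way by emitting, for each state, the list of actions with their probabilities instead of a single action.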

The specific steps of step (3) are as follows:

(31) While interacting with player characters, the NPC executes the behavior script generated in step (23) in response to the different actions of different players. The action taken each time depends on the current state S^t, which can also be seen as the information observed by the NPC at each time.

(32) The result of each action execution is recorded. At each time, the record includes the action performed by the player character, the action performed by the NPC, the change of state, i.e., the transition from state s^t to state s^{t+1}, and the utility value of the action under that transition. The data format is shown in FIG. 3.
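One possible in-memory layout for the records of step (32) is sketched below in Python. The field names and the `InteractionLog` class are hypothetical; the actual format of FIG. 3 is not reproduced in this text, so the sketch only mirrors the items listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InteractionRecord:
    t: int               # time index within the encounter
    state: str           # state s^t before the actions
    npc_action: str      # action performed by the NPC
    player_action: str   # action performed by the player character
    next_state: str      # state s^{t+1} after the actions
    utility: float       # utility value of the NPC action for this transition


class InteractionLog:
    """Append-only log of one encounter."""

    def __init__(self) -> None:
        self.records: List[InteractionRecord] = []

    def record(self, state, npc_action, player_action, next_state, utility):
        rec = InteractionRecord(len(self.records), state, npc_action,
                                player_action, next_state, utility)
        self.records.append(rec)
        return rec
```

Keeping the records immutable makes it safe to share one log between the statistics computed in step (41) and step (42).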

The specific steps of step (4) are as follows:

(41) From the result of each action execution recorded in step (32), the number of times the player character performs each action in each state s^t ∈ S^t is counted. Normalizing these frequencies gives, for each state, the conditional probability of each player action; that is, the NPC constructs a strategy description for the player from the current player's behavior. This description is compared with the set of player behavior patterns to find the most similar stored player strategy, i.e., the stored description whose conditional probabilities are closest to those of the description obtained from statistics. The conditional probabilities of taking each action in each case (i.e., for each current state s^t and each player character action) in that stored description are then updated with the statistically obtained description to yield an updated strategy description. The update formula combines the strategy description of player behavior in the existing interactive dynamic influence diagram with the description obtained from the interaction data. Updating the strategy description of the player character in the existing interactive dynamic influence diagram from interaction data allows the player's behavior to be predicted more accurately when the behavior script is generated, so that the NPC behaves more intelligently in subsequent fights.

(42) From the result of each action execution recorded in step (32), the frequency-based probability that the current state s^t transitions to state s^{t+1} when the NPC performs a given action and the player performs a given action is computed, together with the expected utility value of the action. The expected utility value here is the sum of the utility values in each particular situation weighted by the frequency with which each situation occurs. The state transition function and the utility function of the monster NPC's influence graph are then updated in a manner similar to step (41). Updating the state transition function and the utility function in the existing influence graph from interaction data describes the game scene in which the NPC is located more accurately, so that the generated NPC behavior is more intelligent in later battles.

(43) The interactive dynamic influence graph is updated with the results of step (41) and step (42), for use in the next pass through step (2) to step (4).
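The update of the state transition function and the utility function in step (42) can be sketched in Python as follows. Since the text only says the update is similar to step (41), the sketch assumes the same convex combination with a rate `alpha`; the function name and the record layout `(s, npc_action, player_action, s_next, utility)` are hypothetical.

```python
from collections import defaultdict


def update_transition_and_utility(transition, utility, records, alpha):
    """Blend observed frequencies and mean utilities into the tables, in place."""
    next_counts = defaultdict(lambda: defaultdict(int))
    util_sums = defaultdict(float)
    util_counts = defaultdict(int)
    for s, a_npc, a_pl, s2, u in records:
        next_counts[(s, a_npc, a_pl)][s2] += 1
        util_sums[(s, a_npc, a_pl, s2)] += u
        util_counts[(s, a_npc, a_pl, s2)] += 1

    for key, counts in next_counts.items():
        total = sum(counts.values())
        old = transition.get(key)
        if not old:
            # No prior entry: adopt the empirical distribution directly.
            transition[key] = {s2: c / total for s2, c in counts.items()}
            continue
        next_states = set(old) | set(counts)
        transition[key] = {
            s2: (1 - alpha) * old.get(s2, 0.0) + alpha * counts.get(s2, 0) / total
            for s2 in next_states
        }

    for key, n in util_counts.items():
        mean_u = util_sums[key] / n
        # Unseen outcome: start from the observed mean.
        utility[key] = (1 - alpha) * utility.get(key, mean_u) + alpha * mean_u
```

Because both the old and the empirical distributions sum to 1, the blended transition probabilities for each (state, NPC action, player action) triple still sum to 1.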

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. An NPC control method based on multi-agent technology is characterized by comprising the following steps:

step (4), updating parameters of the interactive dynamic influence diagram according to the execution result of each action recorded in the step (3), and reconstructing a behavior script of the NPC to be applied to new player interaction;

the specific steps of the step (2) are as follows:

Conditional probability of (2)

wherein the two terms of the weighted sum are the immediate expected utility and the future expected utility, λ is a discount factor that reduces the influence of future utility on the current action, and ER_i is the total utility value, i.e., the weighted sum of the immediate expected utility and the future expected utility of the NPC; here the utility function gives the utility value obtained when, at time t, the NPC performs an action, the state transitions from the current state s^t to the next state s^{t+1}, and the player performs an action; the state transition function gives the probability that, at time t, the action performed by the NPC takes the current state s^t to the next state s^{t+1}; and the formula also refers to all possible actions of the NPC, all possible actions of the player, and the sets of possible states S^t and S^{t+1} at the respective times;

Step (23), the strategy which is obtained in the step (22) and maximizes the expected utility of the NPC is converted into a behavior script format compatible with the current game;

the specific steps of the step (3) are as follows:

Adopting the behavior script obtained in the step (23);

step (32), recording the execution result of each NPC action, including the action performed by the NPC, the change of state, i.e., the transition from state s^t to state s^{t+1}, and the utility value corresponding to the action;

The specific steps of the step (4) are as follows:

step (41), according to the result of each action execution recorded in step (3), counting each action performed by the player character in each state s^t of the NPC; comparing with the set of player behavior patterns to find the most similar player strategy description; and, according to the player behavior strategy description obtained from statistics, updating the conditional probabilities of taking each action in each situation to obtain an updated strategy description;

wherein an action is performed with the player and the expected utility value corresponding to the action is obtained; the state transition function and the utility function of the NPC's interactive dynamic influence diagram are updated; and the update formula uses the utility function in the existing interactive dynamic influence diagram;

2. The multi-agent technology-based NPC control method as claimed in claim 1, wherein in step (1), the specific steps of defining parameters of the interactive dynamic influence map according to the characteristics of the NPC and the characteristics of the scene are as follows: