CN107145948B - NPC control method based on multi-agent technology - Google Patents


Info

Publication number: CN107145948B
Authority: CN (China)
Application number: CN201710237082.5A
Other languages: Chinese (zh)
Other versions: CN107145948A
Inventors: 陈盈科, 唐华锦, 燕锐, 李洪莹
Current assignee: Sichuan University
Original assignee: Sichuan University
Prior art keywords: npc, player, action, state, behavior
Legal status: Expired - Fee Related
Application filed by Sichuan University; priority to CN201710237082.5A; published as CN107145948A; application granted and published as CN107145948B.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63F CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/80 Special adaptations for executing a specific game genre or game mode
    • A63F13/822 Strategy games; Role-playing games
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an NPC control method based on multi-agent technology, in the technical fields of artificial intelligence and computer games. It addresses two problems of the prior art: NPCs cannot reason about or adjust their own behavior automatically, and NPC scripts must be modified frequently based on game developers' experience, which consumes considerable labor. The method comprises: defining for each NPC an automatic reasoning model, namely an interactive dynamic influence diagram, and setting its parameters according to the characteristics of the NPC and of the scene; constructing the NPC's behavior script from the parameterized interactive dynamic influence diagram; having the NPC execute its behavior script while interacting with the player and recording the execution result of each action; and updating the parameters of the interactive dynamic influence diagram from the recorded results and reconstructing the NPC's behavior script for new player interactions. The invention applies artificial intelligence to automatically control the behavior of non-player characters in a multiplayer online role-playing game.

Description

NPC control method based on multi-agent technology
Technical Field
The NPC control method based on multi-agent technology applies artificial intelligence to automatically control the behavior of non-player characters in a multiplayer online role-playing game, and relates to the technical fields of artificial intelligence and computer games.
Background
A multiplayer online role-playing game is a large-scale network game in which each player plays a fictional character, controls that character's activities, and interacts with characters played by many other players, cooperating with them to defeat monsters and complete quests. Unlike one-on-one human-versus-machine confrontation in single-player games, online games let many players cooperate and compete in real time, which greatly enriches the game content and its appeal. By providing high-quality game service, an excellent multiplayer online role-playing game can attract a large number of players and enrich their leisure lives. This has also given rise to an emerging industry that generates enormous economic benefits.
Besides player characters, a game contains many non-player characters (NPCs). Apart from NPCs that provide game functions, such as trading items or issuing quests, games contain a large number of monster NPCs, which are the core of each quest: the player must defeat them to complete it. Making NPC behavior more intelligent, so that game difficulty is tuned sensibly while balancing fairness, challenge, and enjoyment, is an important part of providing excellent online game service. Major game vendors invest considerable human and material resources in each game to achieve this goal.
Existing NPC behavior is mostly controlled by pre-programmed scripts. Because development resources such as staff and time are limited, game developers cannot write handling logic for every situation each NPC may encounter in each scene. To raise the challenge, developers often greatly strengthen NPCs' physical attributes (e.g., attack power and defense), so NPCs tend to behave rigidly yet be exceptionally powerful, which harms the fairness of the game. An important drawback of pre-programmed scripts is that the NPC can only respond passively to players and has no active reasoning capability. A player can therefore infer the fixed behavior pattern of a seemingly strong NPC through repeated trial and error and find a strategy to beat it. More importantly, such a strategy spreads quickly among players, greatly reducing the challenge and enjoyment of the game. Game developers consequently have to modify NPC scripts frequently based on game logs to keep the game interesting; because such frequent modification is constrained by labor cost, it is carried out only for important monsters. Moreover, how to adjust a script effectively and quantitatively in response to players' strategies depends heavily on the developer's experience.
Disclosure of Invention
To address these shortcomings, the invention provides an NPC control method based on multi-agent technology, which solves the prior-art problems that NPCs cannot reason about or adjust their behavior automatically and that NPC scripts must be modified frequently based on game developers' experience, wasting labor.
To achieve this purpose, the invention adopts the following technical solution:
An NPC control method based on multi-agent technology, characterized by comprising the following steps:
Step (1): define an automatic reasoning model, namely an interactive dynamic influence diagram, for each NPC, and define the parameters of the interactive dynamic influence diagram according to the characteristics of the NPC and of the scene;
Step (2): construct the NPC's behavior script from the parameters of the interactive dynamic influence diagram;
Step (3): the NPC executes its behavior script while interacting with the player, and the execution result of each action (i.e., each action specified by the behavior script) is recorded;
Step (4): update the parameters of the interactive dynamic influence diagram according to the execution result of each action recorded in step (3), and reconstruct the NPC's behavior script for application to new player interactions.
Further, in step (1), the parameters of the interactive dynamic influence diagram are defined according to the characteristics of the NPC and of the scene as follows:
Step (11): in the interactive dynamic influence diagram, define the action set $A_i$ from the actions the NPC can execute, and the action set $A_j$ from the actions the player character can execute;
Step (12): define the state set $S^*$ from the properties of the scene in which the NPC is located and from the positions and attributes of the NPC and the player character;
Step (13): based on steps (11) and (12), define the state transition function $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$, a conditional probability function giving the likelihood that state $s_t \in S_t$ at time $t$ transitions to state $s_{t+1} \in S_{t+1}$ at time $t+1$ under NPC action $a_i^t \in A_i$ and player character action $a_j^t \in A_j$;
Step (14): according to the action style and preferences of the NPC character, define the utility function $R_i(s_t, a_i^t, a_j^t, s_{t+1})$, which gives the NPC's utility when, at time $t$, NPC action $a_i^t$ and player character action $a_j^t$ take the state from $s_t \in S_t$ to $s_{t+1} \in S_{t+1}$;
Step (15): initialize the parameters of the state transition function $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$ and the utility function $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ from prior empirical knowledge;
Step (16): based on prior knowledge, the NPC's interactive dynamic influence diagram contains several policy descriptions $m_j$ of the player character's behavior. Each policy description $m_j$ gives the conditional probabilities $P(a_j^t \mid s_t)$ with which the player character performs different actions in different states, i.e., the probability that the player character executes action $a_j^t$ in state $s_t$. The $N$ policy descriptions $m_j^1, \dots, m_j^N$ are stored in the set of player behavior patterns $M_j$, specifically:
$$M_j = \{m_j^1, m_j^2, \dots, m_j^N\}$$
$$m_j^n = \{P_n(a_j^t \mid s_t) : s_t \in S_t,\ a_j^t \in A_j\}, \quad n = 1, \dots, N$$
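The parameters defined in steps (11) to (16) can be held in one small data structure. The sketch below is a minimal Python illustration under stated assumptions: states and actions are discrete string labels, and every probability table is a dictionary. The class name `IDIDParams`, its field names, and the example labels ("near", "attack", "punch", ...) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

State = str
Action = str


@dataclass
class IDIDParams:
    """Hypothetical container for one NPC's step (1) parameters."""
    npc_actions: List[Action]     # A_i: actions the NPC can execute
    player_actions: List[Action]  # A_j: actions the player character can execute
    states: List[State]           # S*: discretised scene/NPC/player states
    # P(s_{t+1} | s_t, a_i, a_j): (s_t, a_i, a_j) -> {s_{t+1}: probability}
    transition: Dict[Tuple[State, Action, Action], Dict[State, float]] = field(default_factory=dict)
    # R_i(s_t, a_i, a_j, s_{t+1}): the NPC's utility for each transition
    utility: Dict[Tuple[State, Action, Action, State], float] = field(default_factory=dict)
    # M_j = {m_j^1, ..., m_j^N}: each m_j^n maps s_t -> {a_j: P_n(a_j | s_t)}
    player_models: List[Dict[State, Dict[Action, float]]] = field(default_factory=list)

    def check(self, tol: float = 1e-9) -> None:
        """Raise ValueError if any conditional distribution does not sum to 1."""
        for key, dist in self.transition.items():
            if abs(sum(dist.values()) - 1.0) > tol:
                raise ValueError(f"transition {key} does not sum to 1")
        for n, model in enumerate(self.player_models):
            for s, dist in model.items():
                if abs(sum(dist.values()) - 1.0) > tol:
                    raise ValueError(f"policy description {n + 1} at state {s!r} does not sum to 1")


# Example prior for step (15); real values would come from designer experience.
params = IDIDParams(
    npc_actions=["attack", "defend"],
    player_actions=["punch", "escape"],
    states=["near", "far"],
    transition={("near", "attack", "punch"): {"near": 0.8, "far": 0.2}},
    utility={("near", "attack", "punch", "near"): 5.0},
    player_models=[{"near": {"punch": 0.7, "escape": 0.3}}],
)
params.check()  # every distribution above sums to 1, so nothing is raised
```

Keeping the tables as plain dictionaries makes the later update steps simple dictionary arithmetic; a production implementation would more likely use arrays indexed by state and action IDs.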
Further, the specific steps of step (2) are as follows:
Step (21): according to the interactive dynamic influence diagram constructed in step (1) and the required length of the NPC behavior script, unroll the interactive dynamic influence diagram into one containing $T$ steps of reasoning;
Step (22): solve the interactive dynamic influence diagram containing $T$-step reasoning obtained in step (21) with a classical dynamic programming algorithm, maximizing the weighted sum of the NPC's immediate expected utility and long-term expected utility, to find the policy that maximizes the expected utility of the NPC character, i.e., the conditional probability $P(a_i^t \mid s_t)$ of the action $a_i^t$ the NPC takes in each situation at each time;
The weighted sum of the NPC's immediate and long-term expected utility is maximized as follows:
$$\max_{P(a_i^t \mid s_t)} ER_i(s_t)$$
$$ER_i(s_t) = \sum_{a_i^t \in A_i} \sum_{a_j^t \in A_j} \sum_{s_{t+1} \in S_{t+1}} P(a_i^t \mid s_t)\, P(a_j^t \mid s_t)\, P(s_{t+1} \mid s_t, a_i^t, a_j^t)\, R_i(s_t, a_i^t, a_j^t, s_{t+1}) + \lambda \sum_{a_i^t \in A_i} \sum_{a_j^t \in A_j} \sum_{s_{t+1} \in S_{t+1}} P(a_i^t \mid s_t)\, P(a_j^t \mid s_t)\, P(s_{t+1} \mid s_t, a_i^t, a_j^t)\, ER_i(s_{t+1})$$
where the first triple sum is the immediate expected utility, the second triple sum is the expected future utility, $\lambda$ is a discount factor that reduces the impact of future utility on the current action, and $ER_i$ is the total utility value, i.e., the weighted sum of the NPC's immediate and future expected utility;
Step (23): convert the policy obtained in step (22), which maximizes the NPC's expected utility, into a behavior script format compatible with the current game.
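Step (22) names only "a classical dynamic programming algorithm". One common choice, sketched below under stated assumptions, is finite-horizon backward induction over the $T$ unrolled steps: the player's per-state action probabilities come from one fixed policy description, and for each state the NPC picks the action with the highest immediate expected utility plus $\lambda$ times the expected future utility. Because it takes an argmax, the resulting NPC policy puts probability 1 on a single action per state. The function name `solve_horizon` and the dictionary layouts are hypothetical.

```python
def solve_horizon(states, npc_actions, player_policy, transition, utility, T, lam):
    """Finite-horizon backward induction (a sketch of step (22)).

    player_policy: s -> {a_j: P(a_j | s)}
    transition:    (s, a_i, a_j) -> {s_next: P(s_next | s, a_i, a_j)}
    utility:       (s, a_i, a_j, s_next) -> R_i
    Returns (values, policies): values[s] is ER_i with T steps to go, and
    policies[k][s] is the NPC action to take k steps after the start.
    """
    value = {s: 0.0 for s in states}  # nothing left to gain with 0 steps to go
    policies = []
    for _ in range(T):
        new_value, rule = {}, {}
        for s in states:
            best_action, best_q = None, float("-inf")
            for a_i in npc_actions:
                q = 0.0
                for a_j, p_j in player_policy.get(s, {}).items():
                    for s_next, p_s in transition.get((s, a_i, a_j), {}).items():
                        r = utility.get((s, a_i, a_j, s_next), 0.0)
                        # immediate utility plus lambda-discounted future utility
                        q += p_j * p_s * (r + lam * value[s_next])
                if q > best_q:
                    best_action, best_q = a_i, q
            new_value[s], rule[s] = best_q, best_action
        value = new_value
        policies.append(rule)
    policies.reverse()  # the rule computed last (T steps to go) applies first
    return value, policies


states = ["near", "far"]
npc_actions = ["attack", "defend"]
player_policy = {"near": {"punch": 1.0}, "far": {"punch": 1.0}}
transition = {
    ("near", "attack", "punch"): {"near": 1.0},
    ("near", "defend", "punch"): {"far": 1.0},
    ("far", "attack", "punch"): {"far": 1.0},
    ("far", "defend", "punch"): {"far": 1.0},
}
utility = {("near", "attack", "punch", "near"): 3.0, ("near", "defend", "punch", "far"): 1.0}
values, policies = solve_horizon(states, npc_actions, player_policy, transition, utility, T=2, lam=0.9)
# values["near"] is approximately 3.0 + 0.9 * 3.0 = 5.7, and policies[0]["near"] is "attack"
```

With two steps to go, attacking while "near" keeps the NPC in the rewarding state, so its value beats defending (1.0 now, nothing later).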
Further, the specific steps of step (3) are as follows:
Step (31): while interacting with player characters, the NPC chooses its action $a_i^t$ in response to the different actions of different player characters according to the behavior script obtained in step (23);
Step (32): record the execution result of each NPC action $a_i^t$: at each time, the action $a_j^t$ performed by the player character, the action $a_i^t$ performed by the NPC, the resulting state change from $s_t$ to $s_{t+1}$, and the utility value $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ of that transition.
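The record kept in step (32) can be sketched as one row per time step. The patent's own data format (FIG. 3) is not reproduced in this text, so the class names `StepRecord` and `ExecutionLog`, the field order, and the CSV columns below are assumptions.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StepRecord:
    """One row of the step (32) execution log (hypothetical layout)."""
    t: int
    state: str          # s_t
    player_action: str  # a_j^t
    npc_action: str     # a_i^t
    next_state: str     # s_{t+1}
    utility: float      # observed utility R_i of this transition


class ExecutionLog:
    """Collects StepRecord rows during one NPC-player encounter."""

    def __init__(self) -> None:
        self.records: List[StepRecord] = []

    def record(self, state: str, player_action: str, npc_action: str,
               next_state: str, utility: float) -> None:
        self.records.append(
            StepRecord(len(self.records), state, player_action, npc_action, next_state, utility))

    def to_csv(self) -> str:
        """Serialise the log with a header row and one row per time step."""
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["t", "s_t", "a_j", "a_i", "s_t+1", "utility"])
        for r in self.records:
            writer.writerow([r.t, r.state, r.player_action, r.npc_action, r.next_state, r.utility])
        return buf.getvalue()


log = ExecutionLog()
log.record("near", "punch", "attack", "near", 3.0)
log.record("near", "escape", "pursue", "far", -1.0)
print(log.to_csv())
```

These rows are exactly the inputs that step (4) needs: (state, player action) pairs for re-estimating the player's policy, and full transitions with utilities for re-estimating the transition and utility functions.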
Further, the specific steps of step (4) are as follows:
Step (41): from the execution results recorded in step (3), count, for each state $s_t \in S_t$ of the NPC, how often the player character performs each action $a_j^t$, and normalize these frequencies to obtain the conditional probability $\hat{P}(a_j^t \mid s_t)$ of each action in each state. This can be regarded as a policy description $\hat{m}_j$ that the NPC constructs from the current player's behavior. Compare $\hat{m}_j$ with the set of player behavior patterns $M_j$ and find the most similar player policy description $m_j^k$, i.e., the one whose $P_k(a_j^t \mid s_t)$ is most similar to $\hat{P}(a_j^t \mid s_t)$:
$$m_j^k = \arg\min_{m_j^n \in M_j} d\big(\hat{m}_j, m_j^n\big)$$
where $d(\cdot,\cdot)$ measures the dissimilarity between two policy descriptions.
Using the player behavior policy description $\hat{m}_j$ obtained from these statistics, update the conditional probabilities $P_k(a_j^t \mid s_t)$ of taking each action in each situation in the most similar policy description $m_j^k$ of the interactive dynamic influence diagram, obtaining the updated policy description $m_j^{k\prime}$:
$$P_k'(a_j^t \mid s_t) = (1 - \alpha)\, P_k(a_j^t \mid s_t) + \alpha\, \hat{P}(a_j^t \mid s_t)$$
where $P_k(a_j^t \mid s_t)$ is the player behavior policy description in the existing interactive dynamic influence diagram, $\hat{P}(a_j^t \mid s_t)$ is the policy description obtained from the interaction data, and $\alpha$ is the policy update rate, which sets how quickly the current policy description is updated;
Step (42): from the execution results recorded in step (3), count the frequency with which the current state transitions from $s_t$ to $s_{t+1}$ when the NPC performs action $a_i^t$ and the player performs action $a_j^t$, giving the conditional probability $\hat{P}(s_{t+1} \mid s_t, a_i^t, a_j^t)$, together with the expected utility value $\hat{R}_i(s_t, a_i^t, a_j^t, s_{t+1})$ of the action, and update the state transition function $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$ and utility function $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ of the NPC's interactive dynamic influence diagram as follows:
$$P'(s_{t+1} \mid s_t, a_i^t, a_j^t) = (1 - \alpha)\, P(s_{t+1} \mid s_t, a_i^t, a_j^t) + \alpha\, \hat{P}(s_{t+1} \mid s_t, a_i^t, a_j^t)$$
$$R_i'(s_t, a_i^t, a_j^t, s_{t+1}) = (1 - \alpha)\, R_i(s_t, a_i^t, a_j^t, s_{t+1}) + \alpha\, \hat{R}_i(s_t, a_i^t, a_j^t, s_{t+1})$$
where $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$ is the conditional probability in the existing interactive dynamic influence diagram and $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ is the utility function in the existing interactive dynamic influence diagram;
Step (43): update the interactive dynamic influence diagram with the results of steps (41) and (42), and use the updated diagram when repeating steps (2) to (4).
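Step (41) can be sketched in Python as follows. The patent text does not say which similarity measure selects the closest stored policy description, so the summed L1 distance below is an assumption, as are the function names `estimate_policy`, `l1_distance`, and `update_most_similar`. The update is the convex blend with policy update rate α, which keeps each per-state distribution normalized when both inputs are normalized.

```python
from collections import Counter, defaultdict


def estimate_policy(pairs):
    """Step (41): (s_t, a_j) observations -> {s_t: {a_j: normalised frequency}}."""
    counts = defaultdict(Counter)
    for s, a_j in pairs:
        counts[s][a_j] += 1
    return {s: {a: n / sum(c.values()) for a, n in c.items()} for s, c in counts.items()}


def l1_distance(p, q):
    """ASSUMED dissimilarity: summed L1 distance over the states observed in p."""
    total = 0.0
    for s, dist in p.items():
        other = q.get(s, {})
        for a in set(dist) | set(other):
            total += abs(dist.get(a, 0.0) - other.get(a, 0.0))
    return total


def update_most_similar(models, observed, alpha):
    """Find the closest stored description m_j^k and blend it toward the observation.

    P_k'(a_j | s) = (1 - alpha) * P_k(a_j | s) + alpha * P_hat(a_j | s)
    Updates models in place and returns the index k.
    """
    k = min(range(len(models)), key=lambda n: l1_distance(observed, models[n]))
    old = models[k]
    new = {s: dict(dist) for s, dist in old.items()}
    for s, dist in observed.items():
        if s not in old:  # no prior for this state: adopt the observation as-is
            new[s] = dict(dist)
            continue
        prev = old[s]
        new[s] = {a: (1 - alpha) * prev.get(a, 0.0) + alpha * dist.get(a, 0.0)
                  for a in set(prev) | set(dist)}
    models[k] = new
    return k


M_j = [{"near": {"punch": 0.9, "escape": 0.1}},  # m_j^1: an aggressive player
       {"near": {"punch": 0.1, "escape": 0.9}}]  # m_j^2: an evasive player
observed = estimate_policy([("near", "escape")] * 3 + [("near", "punch")])
k = update_most_similar(M_j, observed, alpha=0.5)
# observed["near"] is {"escape": 0.75, "punch": 0.25}; the evasive model (k == 1) is closest
```

A larger α makes the stored description track the latest players faster, at the cost of more noise from short encounters.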
In summary, by adopting the above technical solution, the invention has the following beneficial effects:
1. The interactive dynamic influence diagram is one of the recent research results in artificial intelligence. It has a strong capability for modeling individual agents and can accurately characterize player behavior, and model-based script generation with it meets the running-efficiency and script-quality requirements of practical applications;
2. Through the interactive dynamic influence diagram defined in advance for each monster NPC, the invention updates and generates behavior scripts automatically and online, greatly reducing the workload of game developers;
3. Because the interactive dynamic influence diagram of each monster NPC is updated from real online data, the updated NPC scripts are better targeted at players. Furthermore, because each monster NPC shares data and player policy descriptions across multiple game instances, the generated NPC behavior scripts generalize well.
Drawings
FIG. 1 shows the interactive dynamic influence diagram of the invention containing 2-step reasoning;
FIG. 2 is a schematic diagram of a policy description of the invention;
FIG. 3 is an example of the execution data recorded by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Step (1): define an automatic reasoning model, namely an interactive dynamic influence diagram (hereinafter "influence diagram"), for each NPC, and define the parameters of the influence diagram according to the characteristics of the NPC and of the scene.
Step (2): construct the NPC's behavior script from the parameters of the influence diagram.
Step (3): the NPC executes the behavior script while interacting with the player, and each action (i.e., each action specified by the behavior script) is recorded.
Step (4): update the parameters of the influence diagram according to the execution result of each action recorded in step (3), and regenerate the NPC's behavior script for application to new player interactions.
In step (1), the parameters of the influence diagram are defined from the characteristics of the NPC and of the scene as follows.
In the influence diagram, the action set $A_i$ is defined from the NPC's executable actions (e.g., attack, defend, escape, pursue), and the action set $A_j$ from the player character's executable actions (e.g., punch, escape). The state set $S^*$ is defined from the properties of the scene and the positions and attributes of the NPC and the player character, together with the state transition function $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$ from time $t$ to time $t+1$. The NPC's utility function $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ is defined according to the NPC's behavior style and preferences (for example, a fierce NPC is given higher utility for attacking to encourage attacks, while a conservative character receives relatively higher utility for defending).
The states in the state set $S^*$ include basic information about the NPC and the player character, such as their positions and attribute values. The state transition function is a conditional probability function: $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$ quantifies the probability that state $s_t \in S_t$ at time $t$ transitions to state $s_{t+1} \in S_{t+1}$ at time $t+1$ under NPC action $a_i^t$ and player character action $a_j^t$. The utility function quantifies the result of each NPC action: $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ is the utility value obtained when, at time $t$, NPC action $a_i^t$ and player character action $a_j^t$ take the state from $s_t \in S_t$ to $s_{t+1} \in S_{t+1}$. The damage the player character suffers after the NPC acts is the NPC's positive utility; the damage the NPC suffers is its negative utility; the total utility of the action is the sum of the two parts. The utility of an NPC action therefore also depends on the initial state $s_t$, the target state $s_{t+1}$, and the player character's action $a_j^t$.
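The utility rule described above (damage dealt counts positively, damage taken counts negatively, plus a style preference) can be sketched as a function of the states before and after a step. This assumes states carry hypothetical `npc_hp` and `player_hp` fields and that style enters as a per-action bonus; the patent gives neither a state encoding nor bonus values, so `npc_utility`, `AGGRESSIVE`, `CONSERVATIVE`, and the numbers are illustrative only.

```python
def npc_utility(s_t, s_next, npc_action, style_bonus=None):
    """Sketch of R_i(s_t, a_i, a_j, s_{t+1}) for one transition.

    s_t, s_next: dicts with hypothetical 'npc_hp' and 'player_hp' keys.
    The player action a_j enters only through its effect on s_next.
    style_bonus: hypothetical per-action bonus encoding the NPC's temperament.
    """
    dealt = s_t["player_hp"] - s_next["player_hp"]  # positive part: damage to the player
    taken = s_t["npc_hp"] - s_next["npc_hp"]        # negative part: damage to the NPC
    bonus = (style_bonus or {}).get(npc_action, 0.0)
    return dealt - taken + bonus


AGGRESSIVE = {"attack": 2.0}    # a fierce NPC is rewarded for attacking
CONSERVATIVE = {"defend": 2.0}  # a conservative NPC is rewarded for defending

before = {"npc_hp": 100, "player_hp": 80}
after = {"npc_hp": 96, "player_hp": 70}
print(npc_utility(before, after, "attack", AGGRESSIVE))    # 10 - 4 + 2.0 -> 8.0
print(npc_utility(before, after, "attack", CONSERVATIVE))  # 10 - 4 + 0.0 -> 6.0
```

The same transition is worth more to the aggressive NPC than to the conservative one, which is how one model structure can yield different behavior scripts for different monster personalities.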
FIG. 1 shows an influence diagram containing 2-step reasoning (times $t$ and $t+1$). When a 3rd reasoning step is required, the model for time $t+2$ is obtained by copying the $t+1$ time slice (i.e., all nodes with superscript $t+1$) and the links between its nodes; repeating this process builds an influence diagram containing $T$ steps of reasoning. The parameters of the state transition function $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$ and the utility function $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ are initialized from prior empirical knowledge.
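The slice-copying construction can be sketched as building the $T$-step structure from a single-slice template. FIG. 1 itself is not reproduced here, so the node names (a state node, the two action nodes, and a utility node) and the links below are an assumed, simplified structure; only the copy-and-relink procedure follows the text. The function name `unroll` and the "name^t" labeling are hypothetical.

```python
def unroll(slice_nodes, intra_links, inter_links, T):
    """Copy one time slice T times (t = 0 .. T-1) and relink consecutive slices.

    slice_nodes: node names of a single slice, e.g. ["S", "A_i", "A_j", "R_i"].
    intra_links: edges inside a slice, copied into every slice.
    inter_links: edges from slice t to slice t+1, copied between consecutive slices.
    Nodes are labelled "<name>^<t>" (a hypothetical naming scheme).
    """
    nodes = [f"{n}^{t}" for t in range(T) for n in slice_nodes]
    edges = [(f"{a}^{t}", f"{b}^{t}") for t in range(T) for a, b in intra_links]
    edges += [(f"{a}^{t}", f"{b}^{t + 1}") for t in range(T - 1) for a, b in inter_links]
    return nodes, edges


nodes, edges = unroll(
    slice_nodes=["S", "A_i", "A_j", "R_i"],
    intra_links=[("S", "A_i"), ("S", "A_j"), ("S", "R_i"), ("A_i", "R_i"), ("A_j", "R_i")],
    inter_links=[("S", "S"), ("A_i", "S"), ("A_j", "S")],
    T=3,
)
print(len(nodes), len(edges))  # 4 * 3 = 12 nodes, 5 * 3 + 3 * 2 = 21 edges
```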
Based on prior knowledge, the NPC's influence diagram contains several policy descriptions $m_j$ of the player character's behavior. Each policy description $m_j$ gives the conditional probabilities with which the player character performs different actions in different states; specifically, $P(a_j^t \mid s_t)$ is the probability that the player character executes action $a_j^t$ in state $s_t$. The $N$ policy descriptions are stored in the set of player behavior patterns $M_j$:
$$M_j = \{m_j^1, m_j^2, \dots, m_j^N\}$$
$$m_j^n = \{P_n(a_j^t \mid s_t) : s_t \in S_t,\ a_j^t \in A_j\}, \quad n = 1, \dots, N$$
The specific steps of step (2) are as follows.
Step (21): according to the required script length, unroll the influence diagram constructed in step (1) into an influence diagram containing $T$ steps of reasoning. The action $a_i^t$ performed by the NPC at time $t$ produces the utility value $R_i(s_t, a_i^t, a_j^t, s_{t+1})$, which also depends on the current state $s_t$, the target state $s_{t+1}$, and the player character's action $a_j^t$. This utility value quantifies the result of the NPC performing action $a_i^t$ in a particular situation (current state, new state, and player action). In addition, the NPC's actions change the battle situation stochastically, i.e., they cause state transitions $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$, which in turn affect the utility values obtainable in the future. The goal of solving this influence diagram containing $T$-step reasoning is therefore to maximize the weighted sum of the NPC's immediate expected utility and its long-term expected utility, with the long-term part discounted to reduce the impact of future utility on present utility.
Step (22): solve the interactive dynamic influence diagram containing $T$-step reasoning obtained in step (21) with a classical dynamic programming algorithm, maximizing the weighted sum of the NPC's immediate and long-term expected utility, to find the policy that maximizes the expected utility of the NPC character. Here a policy specifies the action $a_i^t$ the NPC should take at each time in each situation (including its own state, the player's state, positions, and so on). The action taken at each time depends on the current state, and the policy is described by the conditional probability function $P(a_i^t \mid s_t)$. The current state depends on the previous state and on the actions previously taken by the player and the NPC, through $P(s_t \mid s_{t-1}, a_i^{t-1}, a_j^{t-1})$; these previous actions of the player and the NPC are also referred to as observation information. The weighted sum of the NPC's immediate and long-term expected utility is maximized as follows:
$$\max_{P(a_i^t \mid s_t)} ER_i(s_t)$$
$$ER_i(s_t) = \sum_{a_i^t \in A_i} \sum_{a_j^t \in A_j} \sum_{s_{t+1} \in S_{t+1}} P(a_i^t \mid s_t)\, P(a_j^t \mid s_t)\, P(s_{t+1} \mid s_t, a_i^t, a_j^t)\, R_i(s_t, a_i^t, a_j^t, s_{t+1}) + \lambda \sum_{a_i^t \in A_i} \sum_{a_j^t \in A_j} \sum_{s_{t+1} \in S_{t+1}} P(a_i^t \mid s_t)\, P(a_j^t \mid s_t)\, P(s_{t+1} \mid s_t, a_i^t, a_j^t)\, ER_i(s_{t+1})$$
where the first triple sum is the immediate expected utility, the second triple sum is the expected future utility, $\lambda$ is a discount factor that reduces the impact of future utility on the current action, and $ER_i$ is the total utility value, i.e., the weighted sum of the NPC's immediate and future expected utility. An expected utility value combines the utility value in each particular case with the probability of that case occurring: for current state $s_t$, NPC action $a_i^t$, target state $s_{t+1}$, and player character action $a_j^t$, the specific utility value $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ is weighted by the probability of that case, $P(a_i^t \mid s_t)\, P(a_j^t \mid s_t)\, P(s_{t+1} \mid s_t, a_i^t, a_j^t)$, and every possible combination must be taken into account.
Step (23): convert the policy obtained in step (22), which maximizes the NPC's expected utility, into a behavior script format compatible with the current game; that is, the policy expressed as a conditional probability function is converted into the game's condition-based behavior script format.
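A minimal sketch of the step (23) conversion, assuming the game accepts an if/elif script keyed on a discretised state label. The script syntax, the `do(...)` call, the `idle` fallback, and the function name `policy_to_script` are hypothetical placeholders for whatever format the target game uses. Mapping each state to its most probable action is one possible conversion; a game could instead emit weighted random choices to preserve the stochastic policy.

```python
def policy_to_script(policy, default_action="idle"):
    """Turn P(a_i | s_t) into an if/elif behavior script (hypothetical syntax).

    policy: {state label: {action: probability}}.
    Each state is mapped to its most probable action.
    """
    if not policy:
        return f'do("{default_action}")'
    lines = []
    for i, (state, dist) in enumerate(sorted(policy.items())):
        action = max(dist, key=dist.get)   # most probable NPC action in this state
        keyword = "if" if i == 0 else "elif"
        lines.append(f'{keyword} state == "{state}":')
        lines.append(f'    do("{action}")')
    lines += ["else:", f'    do("{default_action}")']
    return "\n".join(lines)


print(policy_to_script({"near": {"attack": 0.8, "defend": 0.2}, "far": {"pursue": 1.0}}))
```

Sorting the states makes the generated script deterministic, which keeps regenerated scripts diff-friendly between update rounds.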
The specific steps of step (3) are as follows.
Step (31): while interacting with the player character, the NPC executes the behavior script obtained in step (23) in response to the different actions of different players. The action taken at each time depends on the current state $s_t$, which can also be seen as the information the NPC observes at each time.
Step (32): record the result of each action execution: at each time, the action $a_j^t$ performed by the player character, the action $a_i^t$ performed by the NPC, the state change from $s_t$ to $s_{t+1}$, and the utility value $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ of that transition. The data format is shown in FIG. 3.
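If the policy from step (22) is kept as probabilities rather than collapsed to one action per state, executing it in step (31) amounts to sampling the NPC's action from its conditional distribution for the observed state. The sketch assumes the observed state has already been discretised to a key of the policy table; the function name `choose_npc_action` is hypothetical, and sampling uses the standard library's `random.choices`.

```python
import random


def choose_npc_action(policy, observed_state, rng=random):
    """Step (31) sketch: sample a_i^t from P(a_i | s_t) for the observed state."""
    dist = policy[observed_state]
    actions = list(dist)
    weights = [dist[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


policy = {"near": {"attack": 0.7, "defend": 0.3}, "far": {"pursue": 1.0}}
rng = random.Random(42)  # seeded so a test run is reproducible
print(choose_npc_action(policy, "far", rng))   # always "pursue"
print(choose_npc_action(policy, "near", rng))  # "attack" or "defend"
```

Sampling rather than always taking the best action makes the NPC harder to read, which is the weakness of fixed scripts described in the background section.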
The specific steps of step (4) are as follows.
Step (41): from the execution results recorded in step (32), count, for each state $s_t \in S_t$ of the NPC, how often the player character performs each action $a_j^t$, and normalize the frequencies to obtain the conditional probability $\hat{P}(a_j^t \mid s_t)$ of each action in each state; that is, the NPC constructs a policy description $\hat{m}_j$ of the player from the current player's behavior. Compare $\hat{m}_j$ with the set of player behavior patterns $M_j$ and find the most similar player policy $m_j^k$, i.e., the one whose $P_k(a_j^t \mid s_t)$ is most similar to $\hat{P}(a_j^t \mid s_t)$:
$$m_j^k = \arg\min_{m_j^n \in M_j} d\big(\hat{m}_j, m_j^n\big)$$
where $d(\cdot,\cdot)$ measures the dissimilarity between two policy descriptions.
Using the player behavior policy description $\hat{m}_j$ obtained from the statistics, update the conditional probabilities $P_k(a_j^t \mid s_t)$ of taking each action in each situation (i.e., each current state $s_t$ and each player character action $a_j^t$) in the most similar policy description $m_j^k$ of the interactive dynamic influence diagram, obtaining the updated policy description $m_j^{k\prime}$:
$$P_k'(a_j^t \mid s_t) = (1 - \alpha)\, P_k(a_j^t \mid s_t) + \alpha\, \hat{P}(a_j^t \mid s_t)$$
where $P_k(a_j^t \mid s_t)$ is the player behavior policy description in the existing interactive dynamic influence diagram, $\hat{P}(a_j^t \mid s_t)$ is the policy description obtained from the interaction data, and $\alpha$ is the policy update rate, which sets how quickly the current policy description is updated.
By updating the player policy descriptions in the existing interactive dynamic influence diagram from the interaction data, the player's behavior is predicted more accurately when behavior scripts are generated, so the NPC behaves more intelligently in subsequent fights.
Step (42): from the execution results recorded in step (32), count the frequency with which the current state transitions from $s_t$ to $s_{t+1}$ when the NPC character performs action $a_i^t$ and the player performs action $a_j^t$, giving $\hat{P}(s_{t+1} \mid s_t, a_i^t, a_j^t)$, and the expected utility value $\hat{R}_i(s_t, a_i^t, a_j^t, s_{t+1})$ of the action. This expected utility value is the sum of the utility values observed in each particular situation, weighted by that situation's frequency of occurrence. The state transition function $P(s_{t+1} \mid s_t, a_i^t, a_j^t)$ and utility function $R_i(s_t, a_i^t, a_j^t, s_{t+1})$ of the monster NPC's influence diagram are updated in the same way as in step (41):
$$P'(s_{t+1} \mid s_t, a_i^t, a_j^t) = (1 - \alpha)\, P(s_{t+1} \mid s_t, a_i^t, a_j^t) + \alpha\, \hat{P}(s_{t+1} \mid s_t, a_i^t, a_j^t)$$
$$R_i'(s_t, a_i^t, a_j^t, s_{t+1}) = (1 - \alpha)\, R_i(s_t, a_i^t, a_j^t, s_{t+1}) + \alpha\, \hat{R}_i(s_t, a_i^t, a_j^t, s_{t+1})$$
By updating the state transition function and utility function of the existing influence diagram from the interaction data, the game scene in which the NPC is located is characterized more accurately, so the generated NPC behavior is more intelligent in later battles.
Step (43): update the interactive dynamic influence diagram with the results of steps (41) and (42), and use the updated diagram when repeating steps (2) to (4).
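Step (42) can be sketched in the same style as step (41). This assumes the same α-weighted blend for both tables, estimates the observed utility as the mean utility per transition (the frequency-weighted sum described above), and adopts the empirical estimate directly when the diagram has no prior entry. The function name `update_transition_and_utility` and the tuple layout of the log rows are hypothetical.

```python
from collections import Counter, defaultdict


def update_transition_and_utility(transition, utility, records, alpha):
    """Step (42) sketch: blend empirical statistics into the existing tables.

    records:    (s_t, a_i, a_j, s_next, r) tuples from the execution log.
    transition: (s_t, a_i, a_j) -> {s_next: P}; updated in place.
    utility:    (s_t, a_i, a_j, s_next) -> R_i; updated in place.
    """
    counts = defaultdict(Counter)     # (s, a_i, a_j) -> counts of each s_next
    reward_sum = defaultdict(float)   # (s, a_i, a_j, s_next) -> summed observed utility
    for s, a_i, a_j, s_next, r in records:
        counts[(s, a_i, a_j)][s_next] += 1
        reward_sum[(s, a_i, a_j, s_next)] += r

    for key, c in counts.items():
        total = sum(c.values())
        p_hat = {s_next: n / total for s_next, n in c.items()}
        old = transition.get(key)
        if old is None:
            transition[key] = p_hat
        else:
            transition[key] = {
                s_next: (1 - alpha) * old.get(s_next, 0.0) + alpha * p_hat.get(s_next, 0.0)
                for s_next in set(old) | set(p_hat)
            }
        for s_next, n in c.items():
            ukey = key + (s_next,)
            r_hat = reward_sum[ukey] / n  # mean observed utility of this transition
            if ukey in utility:
                utility[ukey] = (1 - alpha) * utility[ukey] + alpha * r_hat
            else:
                utility[ukey] = r_hat


transition = {("near", "attack", "punch"): {"near": 1.0}}
utility = {("near", "attack", "punch", "near"): 4.0}
log = [("near", "attack", "punch", "near", 2.0), ("near", "attack", "punch", "far", 0.0)]
update_transition_and_utility(transition, utility, log, alpha=0.5)
# transition[("near", "attack", "punch")]: near 0.75, far 0.25
# utility: the "near" entry becomes 3.0 and a new "far" entry is 0.0
```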
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (2)

1. An NPC control method based on multi-agent technology, characterized by comprising the following steps:
step (1), defining an automatic reasoning model, namely an interactive dynamic influence diagram, for each NPC, and defining the parameters of the interactive dynamic influence diagram according to the characteristics of the NPC and of the scene;
step (2), constructing a behavior script of the NPC according to the parameters of the interactive dynamic influence diagram;
step (3), during interaction with the player, the NPC executes its behavior script, and the execution result of each action in the behavior script is recorded;
step (4), updating the parameters of the interactive dynamic influence diagram according to the execution result of each action recorded in step (3), and reconstructing the behavior script of the NPC for use in new interactions with players;
the specific steps of step (2) are as follows:
step (21), according to the interactive dynamic influence diagram constructed in step (1) and the required length of the NPC behavior script, expanding the interactive dynamic influence diagram into an interactive dynamic influence diagram containing T-step reasoning;
step (22), solving the interactive dynamic influence diagram containing T-step reasoning obtained in step (21) with a classical dynamic programming algorithm, maximizing the weighted sum of the immediate expected utility and the long-term expected utility of the NPC, and thereby finding the policy that maximizes the expected utility of the NPC character, namely the conditional probability $\pi(a^i_t\mid s_t)$ of each action $a^i_t$ that the NPC can take at each moment in each situation $s_t$; the formula that maximizes the weighted sum of the immediate expected utility and the long-term expected utility of the NPC is as follows:
$$ER_i(s_t)=\max_{a^i_t\in A_i}\Big[\sum_{a^j_t\in A_j}\sum_{s_{t+1}\in S_{t+1}}P_{m_j}(a^j_t\mid s_t)\,P(s_{t+1}\mid s_t,a^i_t,a^j_t)\,R_i(s_t,a^i_t,a^j_t,s_{t+1})+\lambda\sum_{a^j_t\in A_j}\sum_{s_{t+1}\in S_{t+1}}P_{m_j}(a^j_t\mid s_t)\,P(s_{t+1}\mid s_t,a^i_t,a^j_t)\,ER_i(s_{t+1})\Big]$$
where the first term in the brackets is the immediate expected utility, the second term is the expected future utility, λ is a discount factor that reduces the influence of future utility on the current action, and $ER_i$ is the total utility value, i.e., the weighted sum of the immediate expected utility and the expected future utility of the NPC; here, $R_i(s_t,a^i_t,a^j_t,s_{t+1})$ denotes the utility value obtained at time t when the NPC performs action $a^i_t$, the player performs action $a^j_t$, and the state transitions from the current state $s_t$ to the next state $s_{t+1}$; $P(s_{t+1}\mid s_t,a^i_t,a^j_t)$ denotes the probability that, at time t, the NPC's action $a^i_t$ (together with the player's action $a^j_t$) moves the current state $s_t$ to the next state $s_{t+1}$; $P_{m_j}(a^j_t\mid s_t)$ is the probability of the player's action given by the player strategy description $m_j$; and the expressions range over all possible NPC actions $A_i$, all possible player actions $A_j$, and the sets of possible states $S_t$ and $S_{t+1}$ at each time;
step (23), converting the policy obtained in step (22), which maximizes the expected utility of the NPC, into a behavior script format compatible with the current game;
the specific steps of step (3) are as follows:
step (31), during interaction with player characters, the NPC responds to the different actions of different player characters by taking NPC actions $a^i_t$ according to the behavior script obtained in step (23);
step (32), recording the execution result of each NPC action $a^i_t$, which at each time includes the action $a^j_t$ performed by the player character, the action $a^i_t$ performed by the NPC, the resulting change of state, i.e., the transition from state $s_t$ to state $s_{t+1}$, and the utility value $R_i(s_t,a^i_t,a^j_t,s_{t+1})$ corresponding to the actions;
the specific steps of step (4) are as follows:
step (41), according to the execution result of each action recorded in step (3), counting, in each state $s_t$ of the NPC, the frequency of each action $a^j_t$ performed by the player character, and normalizing these frequencies to obtain the conditional probability $\hat{P}(a^j_t\mid s_t)$ of each behavior in each state, which can be regarded as the strategy description $\hat{m}_j$ that the NPC constructs from the current player's behavior:
$$\hat{m}_j=\{\hat{P}(a^j_t\mid s_t)\mid s_t\in S_t,\ a^j_t\in A_j\}$$
comparing $\hat{m}_j$ with the set of player behavior patterns $M_j$ to find the most similar player strategy description $m_j^{*}$, i.e., the description in $M_j$ whose conditional probabilities $P_{m_j^{n}}(a^j_t\mid s_t)$ are most similar to $\hat{P}(a^j_t\mid s_t)$, where $\operatorname{sim}(\cdot,\cdot)$ denotes the similarity between two strategy descriptions:
$$m_j^{*}=\arg\max_{m_j^{n}\in M_j}\operatorname{sim}\big(m_j^{n},\hat{m}_j\big)$$
according to the statistically obtained player behavior strategy description $\hat{m}_j$, updating the conditional probabilities $P_{m_j^{*}}(a^j_t\mid s_t)$ of taking each action in each situation in the most similar player strategy description $m_j^{*}$ of the interactive dynamic influence diagram, to obtain the updated strategy description $m_j^{*\prime}$, specifically:
$$P_{m_j^{*\prime}}(a^j_t\mid s_t)=(1-\alpha)\,P_{m_j^{*}}(a^j_t\mid s_t)+\alpha\,\hat{P}(a^j_t\mid s_t)$$
where $P_{m_j^{*}}(a^j_t\mid s_t)$ is the strategy description of player behavior in the existing interactive dynamic influence diagram, $\hat{P}(a^j_t\mid s_t)$ is the strategy description obtained from the interaction data, and α is the policy update rate, which indicates how quickly the current strategy description is updated;
step (42), according to the execution result of each action recorded in step (3), counting the frequency with which the current state transitions from $s_t$ to $s_{t+1}$ when the NPC performs action $a^i_t$ and the player performs action $a^j_t$, normalized into the conditional probability $\hat{P}(s_{t+1}\mid s_t,a^i_t,a^j_t)$, together with the expected utility value $\hat{R}_i(s_t,a^i_t,a^j_t,s_{t+1})$ corresponding to the actions, and updating the state transition function $P(s_{t+1}\mid s_t,a^i_t,a^j_t)$ and the utility function $R_i(s_t,a^i_t,a^j_t,s_{t+1})$ of the NPC's interactive dynamic influence diagram; the specific update formulas are as follows:
$$P'(s_{t+1}\mid s_t,a^i_t,a^j_t)=(1-\alpha)\,P(s_{t+1}\mid s_t,a^i_t,a^j_t)+\alpha\,\hat{P}(s_{t+1}\mid s_t,a^i_t,a^j_t)$$
$$R_i'(s_t,a^i_t,a^j_t,s_{t+1})=(1-\alpha)\,R_i(s_t,a^i_t,a^j_t,s_{t+1})+\alpha\,\hat{R}_i(s_t,a^i_t,a^j_t,s_{t+1})$$
where $P(s_{t+1}\mid s_t,a^i_t,a^j_t)$ is the conditional probability in the existing interactive dynamic influence diagram and $R_i(s_t,a^i_t,a^j_t,s_{t+1})$ is the utility function in the existing interactive dynamic influence diagram;
step (43), updating the interactive dynamic influence diagram with the results obtained in step (41) and step (42), for use in the next iteration of step (2) to step (4).
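To illustrate the dynamic-programming solve in step (22), the Python sketch below runs backward induction over a horizon of T steps. It assumes the total utility is computed as ER_i(s_t) = max over NPC actions a_i of the sum, over player actions a_j and next states s_{t+1}, of P_mj(a_j | s_t) · P(s_{t+1} | s_t, a_i, a_j) · [R_i(s_t, a_i, a_j, s_{t+1}) + λ · ER_i(s_{t+1})], where P_mj comes from the player strategy description. The returned policy is deterministic (probability 1 on the maximizing action), one special case of the conditional probability π(a_i | s_t); a complete IDID solver would also track beliefs over states and over the candidate player models, which this sketch omits. Names and data layouts are hypothetical.

```python
def solve_finite_horizon(states, npc_actions, player_model, trans, utility, horizon, lam):
    """Backward induction over `horizon` steps (sketch of step (22)).

    player_model[s] -> {a_j: P_mj(a_j | s)}             player strategy description
    trans[(s, a_i, a_j)] -> {s_next: P(s_next | ...)}    state transition function
    utility[(s, a_i, a_j, s_next)] -> R_i                utility function
    Missing transition/utility entries are treated as zero.
    Returns (values, policy): values[t][s] is ER_i(s) with horizon - t steps
    left; policy[t][s] is the maximizing NPC action at step t.
    """
    next_values = {s: 0.0 for s in states}  # nothing is earned after the horizon
    values, policy = [None] * horizon, [None] * horizon
    for t in reversed(range(horizon)):
        v_t, pi_t = {}, {}
        for s in states:
            best_action, best_q = None, float("-inf")
            for a_i in npc_actions:
                q = 0.0
                for a_j, p_j in player_model[s].items():
                    for s_next, p_s in trans.get((s, a_i, a_j), {}).items():
                        r = utility.get((s, a_i, a_j, s_next), 0.0)
                        # immediate utility plus discounted future utility
                        q += p_j * p_s * (r + lam * next_values[s_next])
                if q > best_q:
                    best_action, best_q = a_i, q
            v_t[s], pi_t[s] = best_q, best_action
        values[t], policy[t] = v_t, pi_t
        next_values = v_t
    return values, policy
```

Each `policy[t]` table is the kind of per-step, per-state decision that step (23) would convert into a behavior script.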
2. The NPC control method based on multi-agent technology according to claim 1, wherein in step (1), the specific steps of defining the parameters of the interactive dynamic influence diagram according to the characteristics of the NPC and of the scene are as follows:
step (11), in the interactive dynamic influence diagram, defining the action set $A_i$ according to the actions the NPC can execute, and defining the action set $A_j$ according to the actions the player character can execute;
step (12), defining the state set $S^{*}$ according to the properties of the scene in which the NPC is located and the positions and properties of the NPC and the player character;
step (13), according to step (11) and step (12), defining the conditional probability function that gives the likelihood that state $s_t\in S_t$ at time t transitions to state $s_{t+1}\in S_{t+1}$ at time t+1 under NPC action $a^i_t$ and player character action $a^j_t$, i.e., the state transition function $P(s_{t+1}\mid s_t,a^i_t,a^j_t)$;
step (14), according to the action style and preferences of the NPC character, defining the utility function $R_i(s_t,a^i_t,a^j_t,s_{t+1})$ of the NPC for transitioning at time t from state $s_t\in S_t$ to state $s_{t+1}\in S_{t+1}$ through NPC action $a^i_t$ and player character action $a^j_t$;
step (15), initializing the parameters of the state transition function $P(s_{t+1}\mid s_t,a^i_t,a^j_t)$ and the utility function $R_i(s_t,a^i_t,a^j_t,s_{t+1})$ according to prior empirical knowledge;
step (16), according to prior knowledge, the interactive dynamic influence diagram of the NPC contains a plurality of strategy descriptions $m_j^{n}$ describing the behavior of the player character; each strategy description $m_j^{n}$ represents the conditional probabilities $P_{m_j^{n}}(a^j_t\mid s_t)$ that the player character performs different actions in different states, i.e., the probability that the player character executes action $a^j_t$ when in state $s_t$; the N strategy descriptions $m_j^{1},\dots,m_j^{N}$ are stored in the set of player behavior patterns $M_j$, specifically:
$$M_j=\{m_j^{1},m_j^{2},\dots,m_j^{N}\}$$
$$m_j^{n}=\{P_{m_j^{n}}(a^j_t\mid s_t)\mid s_t\in S_t,\ a^j_t\in A_j\},\quad n=1,\dots,N$$
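The parameter set defined in steps (11) to (16) can be held in a single structure. The following Python sketch is hypothetical: the class name `IDIDParameters`, the dictionary layouts, and the uniform initialization (a stand-in for the prior knowledge of step (15), which the source does not specify) are illustrative assumptions, not the patent's data format.

```python
from dataclasses import dataclass, field


@dataclass
class IDIDParameters:
    """Hypothetical container for the claim 2 parameters of one NPC."""
    npc_actions: list                                  # A_i, step (11)
    player_actions: list                               # A_j, step (11)
    states: list                                       # S*, step (12)
    trans: dict = field(default_factory=dict)          # (s, a_i, a_j) -> {s_next: prob}, step (13)
    utility: dict = field(default_factory=dict)        # (s, a_i, a_j, s_next) -> R_i, step (14)
    player_models: list = field(default_factory=list)  # M_j = [m_j^1, ..., m_j^N], step (16)

    def init_uniform_transitions(self):
        """Placeholder for step (15): with no prior knowledge available,
        spread each transition's probability evenly over all states."""
        p = 1.0 / len(self.states)
        for s in self.states:
            for a_i in self.npc_actions:
                for a_j in self.player_actions:
                    self.trans[(s, a_i, a_j)] = {s2: p for s2 in self.states}

    def validate(self, tol=1e-9):
        """Return a list of problems; an empty list means every distribution
        sums to 1 and every player model only uses actions from A_j."""
        problems = []
        for key, dist in self.trans.items():
            if abs(sum(dist.values()) - 1.0) > tol:
                problems.append(f"transition {key} sums to {sum(dist.values())}")
        for n, model in enumerate(self.player_models, start=1):
            for s, dist in model.items():
                if abs(sum(dist.values()) - 1.0) > tol:
                    problems.append(f"m_j^{n}[{s}] sums to {sum(dist.values())}")
                if not set(dist) <= set(self.player_actions):
                    problems.append(f"m_j^{n}[{s}] uses unknown actions")
        return problems
```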
CN201710237082.5A 2017-04-12 2017-04-12 NPC control method based on multi-agent technology Expired - Fee Related CN107145948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710237082.5A CN107145948B (en) 2017-04-12 2017-04-12 NPC control method based on multi-agent technology

Publications (2)

Publication Number Publication Date
CN107145948A CN107145948A (en) 2017-09-08
CN107145948B true CN107145948B (en) 2021-05-18

Family

ID=59773612

Country Status (1)

Country Link
CN (1) CN107145948B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107890674A (en) * 2017-11-13 2018-04-10 Hangzhou Electronic Soul Network Technology Co., Ltd. AI behavior calling method and device
CN108920221B (en) * 2018-06-29 2023-01-10 NetEase (Hangzhou) Network Co., Ltd. Game difficulty adjusting method and device, electronic equipment and storage medium
CN110141867B (en) * 2019-04-23 2022-12-02 Guangzhou Duoyi Network Co., Ltd. Game intelligent agent training method and device
CN113144609A (en) * 2021-04-25 2021-07-23 Shanghai Leiming Culture Communication Co., Ltd. Game role behavior algorithm based on genetic algorithm
CN114681919A (en) * 2022-03-29 2022-07-01 miHoYo Technology (Shanghai) Co., Ltd. NPC performance method, device, medium and electronic equipment based on mutual influence of NPC behaviors

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101158897A (en) * 2007-10-09 2008-04-09 Nanjing University Method and system for implementing intelligent non-player characters in interactive games
CN102930338A (en) * 2012-11-13 2013-02-13 Shenyang Xinda Information Technology Co., Ltd. Game non-player character (NPC) action based on neural network
CN105561578A (en) * 2015-12-11 2016-05-11 Beijing Pixel Software Technology Co., Ltd. NPC behavior decision method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP5587311B2 (en) * 2008-07-22 2014-09-10 Sony Online Entertainment LLC System and method for advancing character abilities in simulation

Non-Patent Citations (3)

Title
A Value Equivalence Approach for Solving Interactive Dynamic Influence Diagrams; Ross Conroy et al.; Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems; 2016-05-13; pp. 1162-1170 *
Dynamic Programming and Influence Diagrams; Joseph A. Tatman et al.; IEEE Transactions on Systems, Man, and Cybernetics; 1990-04-30; Vol. 20, No. 2; full text *
Research, Application and Prospects of Multi-Agent-Based Interactive Dynamic Influence Diagrams; Luo Jian et al.; Journal of Xiamen University (Natural Science Edition); 2011-03-31; Vol. 50, No. 2; full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210518