CN111437605B - Method for determining virtual object behaviors and hosting virtual object behaviors - Google Patents

Method for determining virtual object behaviors and hosting virtual object behaviors

Info

Publication number
CN111437605B
CN111437605B
Authority
CN
China
Prior art keywords
virtual object
probability distribution
scene
behavior
game
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010228976.XA
Other languages
Chinese (zh)
Other versions
CN111437605A
Inventor
黄超
周大军
张力柯
荆彦青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010228976.XA
Publication of CN111437605A
Application granted
Publication of CN111437605B
Legal status: Active
Anticipated expiration

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F 13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F 13/55: Controlling game characters or game objects based on the game progress
    • A63F 13/56: Computing the motion of game characters with respect to other game characters, game objects or elements of the game scene, e.g. for simulating the behaviour of a group of virtual soldiers or for path finding
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • A63F 2300/00: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F 2300/60: Methods for processing data by generating or executing the game program
    • A63F 2300/64: Methods for processing data by generating or executing the game program for computing dynamical parameters of game objects, e.g. motion determination or computation of frictional forces for a virtual car
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method, apparatus, device, and computer-readable storage medium for determining virtual object behavior and for hosting virtual object behavior are disclosed. The method of determining virtual object behavior comprises: acquiring, based on a scene image of a virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located; determining a probability distribution of the virtual object behavior over a predetermined behavior set based on the scene features; and determining the virtual object behavior based on the probability distribution. With this method, the behavior distribution of the virtual object artificial intelligence can be computed by a lightweight deep residual network, addressing technical problems in designing virtual object artificial intelligence such as overly long training time, excessive design difficulty, and the inability to handle multi-valued problems.

Description

Method for determining virtual object behaviors and hosting virtual object behaviors
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to a method, apparatus, device, and computer-readable storage medium for determining virtual object behavior. The present disclosure also relates to a method of hosting virtual object behavior in a combat game.
Background
With the development of network technology, human-computer interaction applications such as computer games can provide virtual scenes in which users control virtual objects to perform operations for entertainment. In scenarios such as game guidance, game testing, character hosting, or Non-Player Character (NPC) control, a computer must also automatically determine the operation to be performed by a virtual object and then control that operation. For example, in game hosting, a terminal analyzes the game scene in which a game character is located on behalf of the player and automatically controls the character's operations. In these scenarios, the computer may determine the virtual object's operations through virtual object artificial intelligence. Existing designs of virtual object artificial intelligence generally suffer from technical problems such as long training time, excessive design difficulty, and the inability to handle multi-valued problems.
Disclosure of Invention
Embodiments of the present disclosure provide a method, apparatus, electronic device, and computer-readable storage medium for determining virtual object behavior. Embodiments of the present disclosure also provide a method of hosting virtual object behavior in a combat game.
Embodiments of the present disclosure provide a method of determining virtual object behavior, comprising: acquiring, based on a scene image of a virtual object and using a residual network, scene features characterizing the scene in which the virtual object is located; determining a probability distribution of the virtual object behavior over a predetermined behavior set based on the scene features; and determining the virtual object behavior based on the probability distribution.
Embodiments of the present disclosure provide a method of hosting virtual object behavior in a combat game, comprising: determining, based on the game interface of a virtual object, a probability distribution of the virtual object behavior over a predetermined behavior set, where the virtual object behavior includes the moving direction of the virtual object, the view angle of the virtual object at the next moment, and the view amplitude of the virtual object at the next moment; and hosting the virtual object behavior based on the probability distribution, where, when an attack object appears on the game interface, an extremum of the probability distribution of the virtual object's view angle at the next moment lies close to the view angle facing that attack object.
Embodiments of the present disclosure provide a device for determining virtual object behavior, comprising: a scene feature acquisition module configured to acquire, using a residual network and based on a scene image of a virtual object, scene features characterizing the scene in which the virtual object is located; a probability distribution determination module configured to determine a probability distribution of the virtual object behavior over a predetermined behavior set based on the scene features; and a virtual object behavior determination module configured to determine the virtual object behavior based on the probability distribution.
The embodiment of the disclosure provides a device for determining virtual object behaviors, which comprises: a processor; and a memory storing computer instructions which, when executed by the processor, implement the method described above.
Embodiments of the present disclosure provide a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the above-described method.
The present disclosure proposes a method, apparatus, electronic device, and computer-readable storage medium for determining virtual object behavior, as well as a method of hosting virtual object behavior in a combat game. According to the embodiments of the present disclosure, the behavior distribution of the virtual object artificial intelligence is computed by a lightweight deep residual network, addressing technical problems in designing virtual object artificial intelligence such as overly long training time, excessive design difficulty, and the inability to handle multi-valued problems.
Drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only exemplary embodiments of the present disclosure.
Fig. 1 is an example schematic diagram illustrating a scene image of a virtual object according to an embodiment of the present disclosure.
Fig. 2A is a flowchart illustrating a method of determining virtual object behavior according to an embodiment of the present disclosure.
Fig. 2B is a schematic diagram illustrating a method of determining virtual object behavior according to an embodiment of the present disclosure.
Fig. 2C illustrates a block diagram of an apparatus for determining virtual object behavior according to an embodiment of the present disclosure.
Fig. 3A is a schematic diagram illustrating a residual network and a behavior prediction network according to an embodiment of the present disclosure.
Fig. 3B is a schematic diagram illustrating a first residual module and a second residual module according to an embodiment of the present disclosure.
Fig. 3C is a schematic diagram illustrating an example cascade of first residual modules and second residual modules according to an embodiment of the present disclosure.
Fig. 4A is a flowchart illustrating a training residual network and a behavior prediction network according to an embodiment of the present disclosure.
Fig. 4B is a flowchart illustrating one example of a training residual network and a behavior prediction network according to an embodiment of the present disclosure.
Fig. 5 illustrates a flowchart of a method of hosting virtual object behavior in a combat game, according to an embodiment of the present disclosure.
Fig. 6 is a block diagram illustrating an apparatus for determining virtual object behavior according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, exemplary embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present disclosure and not all of the embodiments of the present disclosure, and that the present disclosure is not limited by the example embodiments described herein.
In the present specification and drawings, substantially the same or similar steps and elements are denoted by the same or similar reference numerals, and repeated descriptions of those steps and elements are omitted. In the description of the present disclosure, the terms "first," "second," and the like are used merely to distinguish the descriptions and are not to be construed as indicating or implying relative importance or order.
For purposes of describing the present disclosure, the following presents concepts related to the present disclosure.
Game artificial intelligence is one type of Artificial Intelligence (AI). Artificial intelligence is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. Game artificial intelligence attempts to understand the essence of how a real player operates in a game and to produce intelligent game machines that react in a manner similar to human intelligence. The present disclosure combines the design principles and implementation methods of various intelligent machines so that game artificial intelligence has perception, reasoning, and decision-making functions in the game.
Game artificial intelligence may simulate a real player manipulating a game character through machine learning. Machine Learning (ML) is a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and the like. By studying how to acquire new knowledge or skills and how to reorganize existing knowledge structures, game artificial intelligence simulates or implements human learning behavior and continuously improves its own performance. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. The present disclosure does not specifically limit the machine learning and deep learning techniques involved.
Game artificial intelligence typically needs to parse the game scene when making decisions. The game scene is usually presented to a real player as two-dimensional or three-dimensional frames. Game artificial intelligence simulates a real player viewing such frames and makes decisions based on them. For this, game artificial intelligence employs Computer Vision (CV) technology. Computer vision is the science of how to make machines "see": it uses cameras and computers, in place of human eyes, to recognize, track, and measure targets, and further processes the resulting graphics into images better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems capable of obtaining information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping.
Alternatively, each of the networks described below may be an artificial intelligence network, in particular a neural network. Typically, such neural networks are implemented as acyclic graphs, with neurons arranged in layers. The neural network model includes an input layer and an output layer separated by at least one hidden layer. The hidden layers transform the input received by the input layer into a representation useful for generating the output in the output layer. Network nodes are fully connected to nodes in adjacent layers via edges, and there are no edges between nodes within a layer. Data received at the nodes of the input layer is propagated to the nodes of the output layer via hidden layers, activation layers, pooling layers, convolution layers, and the like. The input and output of the neural network model may take various forms, which the present disclosure does not limit.
Embodiments of the present disclosure provide solutions related to techniques such as artificial intelligence, computer vision techniques, and machine learning, and are specifically described by the following embodiments.
Fig. 1 is an example schematic diagram illustrating a scenario image 100 of a virtual object according to an embodiment of the present disclosure.
The virtual objects of the present disclosure may be individual game characters in a computer game, each of which can be manipulated by virtual object artificial intelligence or by a real player. Alternatively, the computer game is a role-playing competitive game, such as a human-versus-machine combat game or a multiplayer combat game. A human-versus-machine combat game is one in which the game character of a user account competes against simulated game characters provided by the game in the same scene. A multiplayer combat game is one in which multiple user accounts compete in the same scene. Alternatively, the multiplayer game may be a MOBA (Multiplayer Online Battle Arena) game. The computer game may be a client game or a web game, an online game requiring network support, or an offline game requiring none.
The method provided by the embodiments of the present disclosure can be applied to scenarios such as game guidance, character hosting, NPC control, or game testing in a computer game. In these scenarios, the operations of certain game characters need to be automatically decided and controlled by an electronic device, so that the characters perform reasonable operations in various game scenes, just as a game player would. The electronic device may be a terminal or a server.
Taking the game guidance scenario as an example: to help a novice player quickly become familiar with a game, while the novice player plays, a terminal or server may analyze the game scene in which the novice player's character is located, predict the operation that the character should perform next, and prompt the novice player with the predicted operation as guidance.
Taking the game hosting scenario as an example: when the player is offline or busy, the player's game character can be hosted, so that the terminal or server controls the character in the player's place.
Taking the game testing scenario as an example: a simulated game character can be set up in a game as the opponent of a player's character and controlled by the virtual object artificial intelligence in a terminal or server, replacing a human tester. Test data are obtained by letting the virtual object artificial intelligence play the game, thereby realizing game performance testing.
In the above application scenarios, the game character controlled by the virtual object artificial intelligence may be a player's game character, a simulated game character provided by the game, or an NPC such as a minion or a monster. A scene image is an image showing the application scene (in particular, the game scene). The scene image may include the objects within the virtual object's field of view, i.e., everything the virtual object can "see", including but not limited to enemy characters, friendly teammates, obstacles, bonus props, and so on. The virtual object artificial intelligence decides its next operation by analyzing the information in this field of view. As shown in fig. 1, characters such as enemy object A, enemy object B, and a friendly teammate appear in the field of view of the virtual object artificial intelligence acting as a player character. This information is presented to the virtual object artificial intelligence via the scene image. The virtual object artificial intelligence determines its behavior by analyzing this information: when to shoot, at what angle, how the view angle changes, how the character moves, and so on. In some games, a reference image (e.g., a minimap) may also be displayed on the scene image, abstractly showing the locations of enemy objects, obstacles, friendly teammates, the player character, and so forth.
Currently, virtual object artificial intelligence is mainly trained with algorithms based on the Deep Q-Network (DQN) and/or imitation learning algorithms based on minimum mean square error. The DQN-based algorithm is a deep reinforcement learning algorithm that requires a manually designed reward/penalty function. The virtual object artificial intelligence obtains a sample set of states, actions, and rewards/penalties through constant interaction with the environment. The parameters of its computational model are then determined by maximizing the expected reward of the game and/or minimizing its expected penalty. Training DQN-based virtual object artificial intelligence typically requires a great deal of time, and it is difficult for algorithm designers to design an appropriate reward/penalty function, making it hard to obtain suitable virtual object artificial intelligence. An imitation learning algorithm based on minimum mean square error takes the image as the input of the artificial intelligence network model and then compares and/or fits the output virtual object behavior against recorded behavior of real players. Because such an algorithm trains the model parameters with a minimum mean square error loss when fitting virtual object behavior, it cannot handle the multi-valued problem well. That is, when the virtual object has multiple reasonable decisions in a virtual scene (e.g., in the scene of fig. 1 the virtual object can shoot either enemy object A or enemy object B), an algorithm using minimum mean square error can only output the mean of those behaviors; the virtual object cannot select one of them and therefore cannot attack either object correctly.
The present disclosure proposes a method, apparatus, electronic device, and computer-readable storage medium for determining virtual object behavior, as well as a method of hosting virtual object behavior in a combat game. According to the embodiments of the present disclosure, the probability distribution of virtual object behavior is computed by a lightweight deep residual network, addressing technical problems in designing virtual object artificial intelligence such as overly long training time, excessive design difficulty, and the inability to handle multi-valued problems.
Fig. 2A is a flowchart illustrating a method 200 of determining virtual object behavior in accordance with an embodiment of the present disclosure. Fig. 2B is a schematic diagram illustrating a method 200 of determining virtual object behavior in accordance with an embodiment of the present disclosure. Fig. 2C illustrates a block diagram of an apparatus 2000 to determine virtual object behavior in accordance with an embodiment of the present disclosure.
The operation control method for virtual objects provided by the embodiments of the present disclosure can be applied to human-computer interaction scenarios such as computer games or live streaming. Such scenarios provide virtual scenes and virtual objects for users, and the method provided by the present disclosure can automatically control the operation of the virtual objects in those virtual scenes.
The method 200 of determining virtual object behavior according to embodiments of the present disclosure may be applied to any electronic device. It is understood that the electronic device may be a different kind of hardware device, such as a Personal Digital Assistant (PDA), an audio/video device, a mobile phone, an MP3 player, a personal computer, a laptop computer, a server, etc. For example, the electronic device may be device 2000 of FIG. 2C, which determines virtual object behavior. Hereinafter, the present disclosure is described by taking the apparatus 2000 as an example, and those skilled in the art should understand that the present disclosure is not limited thereto.
Referring to fig. 2C, device 2000 may include a processor 2001 and a memory 2002. The processor 2001 and memory 2002 may be connected by a bus 2003.
The processor 2001 may perform various actions and processes according to programs stored in the memory 2002. In particular, the processor 2001 may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and may be of the x86 architecture or the ARM architecture.
The memory 2002 stores computer instructions that, when executed by the processor 2001, implement the method 200. The memory 2002 may be volatile memory or non-volatile memory, or may include both. Non-volatile memory may be Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. Volatile memory may be Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memory of the methods described in this disclosure is intended to include, without being limited to, these and any other suitable types of memory.
First, in operation 201, the device 2000 may acquire a scene feature characterizing a scene in which a virtual object is located using a residual network based on a scene image of the virtual object.
Referring to fig. 2B, the scene image may be, for example, the scene image 100 shown in fig. 1. The scene image includes various information related to the virtual object, such as the locations of enemy objects, the locations of friendly teammates, and the status of the player character. The residual network can extract this information from the scene image and characterize it as scene features. That is, the scene features characterize the scene in which the virtual object is located, so that the virtual object artificial intelligence can determine the state of the virtual object in the virtual scene from them. For example, the virtual object artificial intelligence can determine whether the virtual object is in a safe state or an attacked state and, when attacked, the degree of damage sustained, which facilitates the subsequent imitation of how a real player perceives damage.
Alternatively, the scene feature may be a multidimensional floating-point vector, for example a 128-dimensional floating-point vector, which fuses various scene information such as scenes involving enemy defensive buildings or scenes in which an enemy object injures the virtual object. Each element of the vector is a floating-point number, and the vector has multiple dimensions, so the scene feature characterizes the scene information numerically, which facilitates subsequent analysis and computation. For example, the scene feature may describe numerically whether the virtual object is under an enemy defensive building or being attacked by one, whether the virtual object is within the damage range of an enemy virtual object's weapon or skill, the distance to the nearest enemy attack (e.g., a bullet or skill), and so on. The scene feature may also fuse information about the virtual object itself, such as the virtual object type, weapon type, virtual object level, or combat power of the virtual object. Taking a computer game as an example, virtual object types may include player characters and non-player characters. The combat power of a virtual object may include at least one of the character's health, mana, attack power, level, equipment, and ammunition count. Of course, the scene information may also include other information that can affect the operation of the virtual object. The present disclosure does not limit the specific way the scene features are characterized or the information they may fuse.
Alternatively, as shown in fig. 2B, the scene image may include a reference image area, such as a minimap area. Operation 201 further comprises: intercepting the reference image area from the scene image, where the reference image area shows the information acquirable by the virtual object in the game, for example the deployment of both sides, map information, enemy object locations, and friendly teammate locations. The reference image area is not limited to the circular form shown in fig. 2B, as long as the information is presented in a predefined form in the scene image. For example, in the reference image area, enemy object locations may be represented by red rectangular dots and friendly teammate locations by blue circular dots; the present disclosure does not limit how information is characterized in the reference image area.
Because the reference image area abstractly represents most of the information acquirable by the virtual object in the scene image, the device 2000 can obtain the scene features characterizing the scene in which the virtual object is located using the residual network based on the reference image area alone, which reduces the number of input parameters of the residual network and makes it lighter and more efficient.
The architecture of the residual network is schematically shown in fig. 2B. The residual network may include at least one convolution layer, and the output of at least one of these convolution layers may be fused/added with the features of a preceding convolution layer. The residual network prevents gradient vanishing/gradient explosion in the neural network, thereby improving the convergence speed of the residual network of the virtual object artificial intelligence. For example, the residual network shown in fig. 2B includes two convolution layers, both with 3×3 convolution kernels and a step size of 1, and each containing C convolution kernels. Suppose the input feature of the residual network is a convolution feature x_1 with spatial dimension H×W and channel dimension C. Passing the input feature x_1 through the two convolution layers yields a new convolution feature x_2, whose spatial dimension is H×W and whose channel dimension is C. Adding x_1 and x_2 gives the final output feature x_3, whose spatial dimension is also H×W and whose channel dimension is also C. Because the output feature x_3 fuses the features x_1 and x_2, it avoids the continuous propagation of errors through the cascade of convolution layers, which would otherwise cause abnormal virtual object behavior. The input feature x_1 may be the reference image area/scene image above, and the output feature x_3 may be the scene feature above.
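To make the fusion concrete, the following is a minimal sketch of a residual block of this form. The patent names no framework; PyTorch, the ReLU activation, and the example sizes are assumptions for illustration, not part of the disclosed design.

```python
import torch
import torch.nn as nn

class SimpleResidualBlock(nn.Module):
    """Two 3x3, stride-1 convolutions with C kernels plus a skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        # Padding 1 keeps the H x W spatial dimension unchanged.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()  # activation is an assumption; the patent does not specify one

    def forward(self, x1: torch.Tensor) -> torch.Tensor:
        x2 = self.conv2(self.relu(self.conv1(x1)))  # new convolution feature x_2
        return x1 + x2                              # output feature x_3 = x_1 + x_2

# Usage: an H x W = 64 x 64 feature map with C = 16 channels.
x1 = torch.randn(1, 16, 64, 64)
x3 = SimpleResidualBlock(16)(x1)
print(x3.shape)  # torch.Size([1, 16, 64, 64]): spatial and channel dimensions preserved
```

The skip connection gives the gradient a direct path around the two convolutions, which is what mitigates the vanishing/exploding gradient problem described above.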
Compared with a traditional DQN-based neural network model, the residual network has a simple structure, few parameters, and a fast convergence speed, and requires no reward/penalty function to be designed, which makes the virtual object artificial intelligence easier to train and apply.
Next, in operation 202, the device 2000 determines the probability distribution of the virtual object behavior over the predetermined behavior set based on the scene features.
As shown in fig. 2B, the probability distribution of the virtual object behavior over the predetermined behavior set may be either a discrete or a continuous probability distribution; the present disclosure does not limit this. The probability distribution indicates the probability that the virtual object behavior turns out to be each behavior in the predetermined behavior set. For example, suppose the virtual object behavior indicates whether the virtual object fires, and the predetermined behavior set contains the two behaviors "fire" and "not fire". Based on the scene features, the device 2000 computes the probability of "fire" as 0.7 and of "not fire" as 0.3. The virtual object then performs the firing operation with probability 0.7 when facing the scene shown in fig. 2B. The virtual object artificial intelligence outputs a random number according to this probability distribution: if "fire" is denoted by 1 and "not fire" by 0, then over many encounters with the same scene about 70% of the outputs will be 1 and about 30% will be 0. The behavior pattern of the virtual object is therefore neither rigid nor easy to predict, which increases the interest of the game.
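A minimal sketch of this sampling step follows; the 0.7/0.3 values come from the example above, and NumPy is an assumed tool rather than part of the patent.

```python
import numpy as np

rng = np.random.default_rng()
behaviors = ["fire", "not fire"]
probabilities = [0.7, 0.3]  # output of the behavior prediction step for this scene

# Over many encounters with the same scene, roughly 70% of the sampled
# behaviors will be "fire" and 30% will be "not fire".
samples = rng.choice(behaviors, size=10, p=probabilities)
print(samples)
```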
Alternatively, the device 2000 may use a behavior prediction network to determine the probability distribution of the virtual object behavior over the predetermined behavior set. The combination of the behavior prediction network and the residual network may be referred to as a mixture density network. In combat games, especially gun-fight games, a real player mainly controls the moving direction of the virtual object, adjusts the angle of the virtual object's view, and adjusts the magnitude of the view angle change. The virtual object behavior may therefore include at least part of: the moving direction of the virtual object, the view angle of the virtual object at the next moment, and the view amplitude of the virtual object at the next moment. Those skilled in the art will appreciate that virtual object behavior may vary from game to game; in a card or board game, for example, it may include the card or piece played. Where the virtual object behavior includes at least part of the moving direction, the next-moment view angle, and the next-moment view amplitude, the behavior prediction network may correspondingly include at least part of a moving direction prediction network, a view angle prediction network, and a view amplitude prediction network.
Optionally, the predetermined behavior set for the moving direction of the virtual object includes moving up, upper-right, right, lower-right, down, lower-left, left, and upper-left. As shown in fig. 2B, a first probability distribution is obtained by inputting the scene feature into the moving direction prediction network; it indicates the probability distribution of the virtual object moving in each of the multiple moving directions. The first probability distribution may be a discrete probability distribution and may have several equal maxima. For example, if the probabilities of moving left and right are both 0.4 and the probabilities of the other directions are all below 0.4, the first probability distribution indicates that the virtual object most likely imitates a player operation of walking left and right.
Alternatively, the view angle value of the virtual object corresponds to the angle the virtual object faces. The predetermined behavior set for the view angle value is the interval of view angle values to which the virtual object can rotate; for example, the view angle value may range from 0 to 1, corresponding to 0 to 360 degrees clockwise. The device 2000 inputs the scene feature into the view angle prediction network to obtain a second probability distribution, which indicates the probability distribution of the virtual object's next-moment view angle value over the view angle value interval. The second probability distribution may be discrete or, as shown in fig. 2B, continuous. As the solid line in fig. 2B shows, the second probability distribution may have several extrema, each extremum representing an optimal solution (optimal strategy) for the view angle of the virtual object in the current virtual scene.
For another example, the scene image of the virtual object may be a combat game interface; when an attack object appears on the interface, an extremum of the second probability distribution lies close to the view angle facing that attack object. In the combat game scene shown in fig. 1, for example, the view angles facing enemy object A and enemy object B become the two extrema of the second probability distribution.
Optionally, the predetermined behavior set for the view amplitude value of the virtual object is the interval of view amplitude values over which the virtual object can change; for example, the view amplitude value may range from 0 to 1, corresponding to the minimum and maximum amplitude. The device 2000 inputs the scene feature into the view amplitude prediction network to obtain a third probability distribution, which indicates the probability distribution of the virtual object's next-moment view amplitude value over the view amplitude value interval. The third probability distribution may be discrete or, as shown in fig. 2B, continuous. As the dashed line in fig. 2B shows, the third probability distribution may have several extrema, each extremum representing an optimal solution (optimal strategy) for the view amplitude of the virtual object in the current virtual scene.
Finally, in operation 203, the device 2000 determines the virtual object behavior based on the probability distribution.
Optionally, the device 2000 determines the moving direction of the virtual object based on the first probability distribution. The virtual object samples its moving behavior randomly according to the first probability distribution rather than always performing the most probable action. In some cases, always performing the most probable action would make the virtual object run into an obstacle, stop in front of it, and be unable to move. Random sampling of the moving behavior according to the first probability distribution prevents the virtual object from getting stuck in the game scene, because with some probability the sampled direction leads it away from the obstacle.
Optionally, the device 2000 determines the view angle value of the virtual object at the next moment based on the second probability distribution. If an enemy appears on the right, the view should move to the right until the enemy appears at the center of the image, so that the virtual object can attack it. Fig. 2B shows two extrema, corresponding respectively to the view angle facing enemy object A and the view angle facing enemy object B. View angle values are output at random according to the second probability distribution, and the output values are, with high probability, close to an extremum, so the virtual object turns its line of sight toward an attack object. Similarly, the device 2000 may determine the view amplitude value of the virtual object at the next moment based on the third probability distribution.
Alternatively, both the second and the third probability distribution may be Gaussian mixture distributions. A Gaussian mixture distribution is a linear combination of Gaussian distributions and fits a wide variety of probability distributions well, so the view-changing behavior of the virtual object can be fitted better with a Gaussian mixture.
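A minimal sketch of sampling a view angle value from such a Gaussian mixture follows. The two components and their weights, means, and standard deviations are illustrative assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng()

# Two components, e.g. the view angles facing enemy objects A and B.
weights = np.array([0.6, 0.4])    # omega_k, summing to 1
means   = np.array([0.25, 0.75])  # mu_k, in the normalized [0, 1] angle interval
stds    = np.array([0.05, 0.05])  # sigma_k

# First pick a component k with probability omega_k, then sample from the
# Gaussian N(mu_k, sigma_k^2); clip to the valid normalized angle interval.
k = rng.choice(len(weights), p=weights)
angle = np.clip(rng.normal(means[k], stds[k]), 0.0, 1.0)
print(k, angle)
```

Sampling this way, the output view angle is usually near one of the extrema, which is how the virtual object turns its line of sight toward one attack object instead of averaging between two of them.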
Because the virtual object determines its behavior according to a probability distribution, it can execute any of several reasonable game strategies when facing the same game scene. Compared with a model trained with a minimum mean square error loss, this better solves the multi-valued problem the virtual object faces.
The method 200 computes the probability distribution of virtual object behavior through a lightweight deep residual network, addressing technical problems in designing virtual object artificial intelligence such as overly long training time, excessive design difficulty, and the inability to handle multi-valued problems.
Fig. 3A is a schematic diagram illustrating a residual network and a behavior prediction network according to an embodiment of the present disclosure. Fig. 3B is a schematic diagram illustrating a first residual module and a second residual module according to an embodiment of the present disclosure. Fig. 3C is a schematic diagram illustrating a first residual module and a second residual module according to an embodiment of the present disclosure.
Referring to fig. 3A, the behavior prediction network includes a movement direction prediction network 302, a view angle prediction network 303, and a view amplitude prediction network 304. The residual network 301 comprises at least one first residual module 3011 and at least one second residual module 3012.
The residual network 301 is used to output the scene characteristics. Assume that a scene feature is a one-dimensional vector comprising N floating point numbers, N being greater than 1. Preferably, N may be equal to 200.
The moving direction prediction network 302 is configured to perform subtask 1, i.e., to output the probability distribution (the first probability distribution) of the virtual object moving in multiple moving directions. Alternatively, the moving direction prediction network 302 includes one fully connected layer whose input is the scene feature and whose output is the probability distribution over the moving directions. Taking 8 moving directions as an example, the first probability distribution is output as an array of 8 floating-point numbers, each representing the probability that the virtual object moves in the corresponding direction. The device 2000 may determine the moving direction of the virtual object based on the first probability distribution.
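A minimal sketch of such a moving direction prediction head follows, assuming the N = 200 scene feature mentioned above. The softmax that normalizes the 8 scores into a probability distribution is an assumption; the patent states only that the layer outputs the probabilities.

```python
import torch
import torch.nn as nn

class MoveDirectionHead(nn.Module):
    """One fully connected layer mapping the scene feature to 8 direction probabilities."""
    def __init__(self, feature_dim: int = 200, num_directions: int = 8):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_directions)

    def forward(self, scene_feature: torch.Tensor) -> torch.Tensor:
        # Softmax (assumed) turns the 8 raw scores into the first probability
        # distribution: 8 floating-point numbers summing to 1.
        return torch.softmax(self.fc(scene_feature), dim=-1)

head = MoveDirectionHead()
probs = head(torch.randn(1, 200))
direction = torch.multinomial(probs, num_samples=1)  # random sampling, not argmax
print(probs.shape, direction.item())
```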
The view angle prediction network 303 is configured to perform subtask 2, i.e., to output the probability distribution (the second probability distribution) of the virtual object's view angle value at the next moment over the view angle value interval. Optionally, the view angle prediction network 303 includes three fully connected layers: one for computing the means μ, one for computing the variances σ, and one for computing the weights ω. For example, the input of each fully connected layer is the scene feature and the output is an array of 32 floating-point numbers; in fig. 3A, the output of each fully connected layer is represented simply by the number of items in its output array. Through the view angle prediction network 303, the device 2000 obtains three arrays {μ_k}, {σ_k}, {ω_k}, where 1 ≤ k ≤ 32. For each k, the mean μ_k and variance σ_k define a Gaussian distribution:

$$\mathcal{N}(x \mid \mu_k, \sigma_k^2) = \frac{1}{\sqrt{2\pi}\,\sigma_k}\exp\left(-\frac{(x-\mu_k)^2}{2\sigma_k^2}\right)$$

The arrays {μ_k} and {σ_k} thus define 32 Gaussian distributions, which are combined with the corresponding weights {ω_k} into the second probability distribution. The second probability distribution is therefore a Gaussian mixture, formed by the following formula:

$$p(x) = \sum_{k=1}^{K} \omega_k \,\mathcal{N}(x \mid \mu_k, \sigma_k^2)$$

where K is the number of Gaussian distributions making up the mixture and ω_k is the weight of the k-th Gaussian distribution, with 0 ≤ ω_k ≤ 1 and

$$\sum_{k=1}^{K} \omega_k = 1$$

μ_k is the mean of the k-th Gaussian distribution and σ_k is its standard deviation. The Gaussian distribution is the form of distribution most commonly found in nature, and a single Gaussian typically has one extremum. A linear combination of multiple Gaussian distributions into a Gaussian mixture can have multiple extrema (each extremum representing an optimal strategy) and can therefore fit the probability distribution of the view angle better.
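A minimal sketch of this three-branch head follows. The softplus and softmax activations that keep σ_k positive and make the weights ω_k sum to 1 are assumptions; the patent specifies only the three fully connected layers and their 32-element outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewAngleHead(nn.Module):
    """Three fully connected layers producing the Gaussian mixture parameters."""
    def __init__(self, feature_dim: int = 200, num_components: int = 32):
        super().__init__()
        self.fc_mu = nn.Linear(feature_dim, num_components)     # means {mu_k}
        self.fc_sigma = nn.Linear(feature_dim, num_components)  # deviations {sigma_k}
        self.fc_omega = nn.Linear(feature_dim, num_components)  # weights {omega_k}

    def forward(self, scene_feature: torch.Tensor):
        mu = self.fc_mu(scene_feature)
        sigma = F.softplus(self.fc_sigma(scene_feature))              # sigma_k > 0 (assumption)
        omega = torch.softmax(self.fc_omega(scene_feature), dim=-1)   # 0 <= omega_k <= 1, sum = 1
        return mu, sigma, omega

mu, sigma, omega = ViewAngleHead()(torch.randn(1, 200))
print(mu.shape, sigma.shape, omega.shape)  # three arrays of 32 values each
```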
It will be appreciated by those skilled in the art that 32 is only one example, and that the view angle prediction network 303 may output a greater or lesser number of means μ, variances σ, weights ω.
The view amplitude prediction network 304 is configured to perform subtask 3, i.e., to output the probability distribution (the third probability distribution) of the virtual object's view amplitude value at the next moment over the view amplitude value interval. Optionally, the view amplitude prediction network 304 likewise includes three fully connected layers: one for computing the means μ, one for computing the variances σ, and one for computing the weights ω. The third probability distribution is constructed from these values in the same way as the second probability distribution, so the construction is not repeated here.
It will be appreciated by those skilled in the art that the first probability distribution, the second probability distribution, and the third probability distribution may all be otherwise defined probability distributions, and that the discrete probability distribution and the mixture gaussian distribution are merely one example, and the disclosure is not limited thereto.
The structures of the first residual module 3011 and the second residual module 3012 may be as shown in fig. 3B.
The spatial dimension of the input feature of the first residual module 3011 is twice the spatial dimension of its output feature, and the channel dimension of the input feature of the first residual module 3011 is half the channel dimension of its output feature. The first residual module can be used as one module in the residual network and is repeatedly called in the design of the residual network, so that the design of the residual network is simpler.
Optionally, the first residual module 3011 includes: a first number of first convolution layers, whose step size is the first step size and whose convolution kernel size is the first size; a second number of second convolution layers, whose step size is the second step size and whose convolution kernel size is the second size; and a second number of third convolution layers, whose step size is the second step size and whose convolution kernel size is the first size. This design of the first residual module lets the residual network extract information more lightly and efficiently: convolution kernels with different step sizes and sizes fuse information from receptive fields of different sizes at different sampling rates, improving the efficiency of the residual network.
It will be appreciated by those skilled in the art that the specific values of the first number, the second number, the first step size, the second step size, the first size, and the second size may be set according to actual circumstances, and the present disclosure does not impose any limitation on the specific values of these parameters.
For ease of understanding, the present disclosure uses as an example a first number of 2, a second number of 1, a first step size of 2, a second step size of 1, a first size of 1×1, and a second size of 3×3.
The number of convolution kernels included in each of the convolution layers in the first residual module 3011 and the second residual module 3012 may be set by those skilled in the art according to the scene image and the game scene. The present disclosure does not limit the number of convolution kernels in a convolution layer.
Referring to fig. 3B, the first residual module 3011 may include a convolution layer a, a convolution layer B, a convolution layer C, and a convolution layer D. The convolution layers A and B are first convolution layers, the convolution layer C is a second convolution layer, and the convolution layer D is a third convolution layer. It is noted that even though both convolution layers a and B belong to the first convolution layer, the number of convolution kernels included in convolution layers a and B may be the same or different. The following explanation is merely for convenience in explaining embodiments of the present disclosure, which does not limit the number of convolution kernels in convolution layers a-D.
Let the spatial dimension of the input features of the first residual module be H×W and the channel dimension be M.
The step size of convolution layer A and convolution layer B is 2, and the size of their convolution kernels is 1×1. Because the step size of convolution layer A and convolution layer B is 2, both the width and the height (i.e., the spatial dimension) of the outputs of these two convolution layers are halved.
Assume the number of convolution kernels of convolution layer A is 2M (a convolution layer has as many output channels as convolution kernels, i.e., the number of kernels equals the number of output channels). Thus, convolution layer A has an input spatial dimension of H×W, an input channel dimension of M, an output spatial dimension of (0.5H)×(0.5W), and an output channel dimension of 2M.
Let the number of convolution kernels of convolution layer B be M. Then convolution layer B has an input spatial dimension of H×W, an input channel dimension of M, an output spatial dimension of (0.5H)×(0.5W), and an output channel dimension of M.
Assume the step size of convolution layer C is 1, its kernel size is 3×3, and its number of kernels is M. The input and output features of convolution layer C have the same spatial and channel dimensions. Since the input feature of convolution layer C is the output feature of convolution layer B, convolution layer C has an input spatial dimension of (0.5H)×(0.5W), an input channel dimension of M, an output spatial dimension of (0.5H)×(0.5W), and an output channel dimension of M.
Assume the step size of convolution layer D is 1, its kernel size is 1×1, and its number of kernels is 2M. Since the input feature of convolution layer D is the output feature of convolution layer C, convolution layer D has an input spatial dimension of (0.5H)×(0.5W), an input channel dimension of M, an output spatial dimension of (0.5H)×(0.5W), and an output channel dimension of 2M.
Adding the output features of convolution layer A and convolution layer D gives the output feature of the first residual module, whose output spatial dimension is (0.5H)×(0.5W) and whose output channel dimension is 2M.
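A minimal PyTorch sketch of the first residual module under the dimensions just walked through follows; the padding choice for the 3×3 layer is an assumption needed to reproduce the stated output sizes.

```python
import torch
import torch.nn as nn

class FirstResidualModule(nn.Module):
    """Input H x W x M, output (H/2) x (W/2) x 2M, per the walk-through above."""
    def __init__(self, m: int):
        super().__init__()
        # Convolution layer A: 1x1 kernel, step size 2, 2M kernels (skip branch).
        self.conv_a = nn.Conv2d(m, 2 * m, kernel_size=1, stride=2)
        # Convolution layer B: 1x1 kernel, step size 2, M kernels.
        self.conv_b = nn.Conv2d(m, m, kernel_size=1, stride=2)
        # Convolution layer C: 3x3 kernel, step size 1, M kernels (padding assumed).
        self.conv_c = nn.Conv2d(m, m, kernel_size=3, stride=1, padding=1)
        # Convolution layer D: 1x1 kernel, step size 1, 2M kernels.
        self.conv_d = nn.Conv2d(m, 2 * m, kernel_size=1, stride=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = conv A branch + (conv B -> conv C -> conv D) branch.
        return self.conv_a(x) + self.conv_d(self.conv_c(self.conv_b(x)))

x = torch.randn(1, 8, 512, 512)   # M = 8, H x W = 512 x 512
y = FirstResidualModule(8)(x)
print(y.shape)                    # torch.Size([1, 16, 256, 256])
```

With M = 8 this reproduces the [16, 8, 8, 16] kernel counts for layers A, B, C, and D used in the example network described later.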
The spatial dimensions and channel dimensions of the input features of the second residual block 3012 are the same as the spatial dimensions and channel dimensions of their output features. Similarly, the second residual module may be called repeatedly in the design of the residual network as a module in the residual network, making the design of the residual network simpler.
The second residual module 3012 includes: a first number of third convolution layers, whose step size is the second step size and whose convolution kernel size is the first size; and a second number of second convolution layers, whose step size is the second step size and whose convolution kernel size is the second size. This design of the second residual module likewise makes the residual network lighter and more efficient at extracting information: convolution kernels with different step sizes and sizes fuse information from receptive fields of different sizes at different sampling rates, improving the efficiency of the residual network.
Referring to fig. 3B, the second residual module 3012 includes a convolution layer E, a convolution layer F, and a convolution layer G, where convolution layers E and G are third convolution layers and convolution layer F is a second convolution layer.
It is noted that even though convolution layers D, E, and G all belong to the third convolution layer type, the numbers of convolution kernels they include may be the same or different. Likewise, even though convolution layers C and F both belong to the second convolution layer type, the numbers of convolution kernels they include may be the same or different. The present disclosure does not limit the number of convolution kernels in these convolution layers.
Let the spatial dimension of the input feature of the second residual module 3012 be H×W and the channel dimension be M.
Assume that the number of convolution kernels of convolution layer E is 0.5M, its step size is 1, and the size of its convolution kernel is 1×1. Thus, convolution layer E has an input spatial dimension of H×W, an input channel dimension of M, an output spatial dimension of H×W, and an output channel dimension of 0.5M.
Assume that the step size of convolution layer F is 1, the size of its convolution kernel is 3×3, and the number of convolution kernels is 0.5M. The input features and output features of convolution layer F have the same spatial and channel dimensions. Since the input feature of convolution layer F is the output feature of convolution layer E, convolution layer F has an input spatial dimension of H×W, an input channel dimension of 0.5M, an output spatial dimension of H×W, and an output channel dimension of 0.5M.
Assume that the step size of convolution layer G is 1, the size of its convolution kernel is 1×1, and the number of convolution kernels is M. Since the input feature of convolution layer G is the output feature of convolution layer F, convolution layer G has an input spatial dimension of H×W, an input channel dimension of 0.5M, an output spatial dimension of H×W, and an output channel dimension of M.
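A matching sketch of the second residual module, under the assumption of an identity shortcut (the standard residual form; the text above fixes only convolution layers E through G):

```python
import torch
import torch.nn as nn

class SecondResidualModule(nn.Module):
    """Keeps spatial and channel dims; bottleneck M -> 0.5M -> M."""
    def __init__(self, m: int):
        super().__init__()
        # Conv E: 1x1, stride 1, squeezes channels to 0.5M.
        self.conv_e = nn.Conv2d(m, m // 2, kernel_size=1, stride=1)
        # Conv F: 3x3, stride 1, padding 1 keeps the spatial dims.
        self.conv_f = nn.Conv2d(m // 2, m // 2, kernel_size=3, stride=1, padding=1)
        # Conv G: 1x1, stride 1, expands channels back to M.
        self.conv_g = nn.Conv2d(m // 2, m, kernel_size=1, stride=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv_e(x))    # (0.5M, H, W)
        out = self.relu(self.conv_f(out))  # (0.5M, H, W)
        out = self.conv_g(out)             # (M, H, W)
        return self.relu(out + x)          # assumed identity shortcut

# Shape check: input and output dims agree.
assert SecondResidualModule(16)(torch.randn(1, 16, 64, 64)).shape == (1, 16, 64, 64)
```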
The first residual module 3011 and the second residual module 3012 may be cascaded with other layers in the residual network in various orders. Fig. 3C illustrates an example cascading style.
In addition to the first residual module 3011 and the second residual module 3012, the residual network also includes a convolution layer, a global average pooling layer, and a fully connected layer. Let the scene image, or the reference image area in the scene image, be 1024×1024 pixels with three RGB input channels.
The scene image or the reference image area in the scene image is input to one convolution layer having 8 convolution kernels, each of size 7×7 with a step size of 2. After this convolution layer, the output feature has a spatial dimension of 512×512 and a channel dimension of 8.
The output feature then passes through a first residual module whose convolution layers A, B, C, and D have [16, 8, 8, 16] convolution kernels, respectively, yielding an output vector with a spatial dimension of 256×256 and a channel dimension of 16.
The output feature then passes through two second residual modules, whose convolution layers E, F, and G have [8, 8, 16] convolution kernels, respectively, again yielding an output vector with a spatial dimension of 256×256 and a channel dimension of 16.
Then, the output vector of the two second residual modules sequentially passes through 1 first residual module, 3 second residual modules, 1 first residual module, 5 second residual modules, 1 first residual module, 2 second residual modules, 1 first residual module, and 2 second residual modules, yielding an output vector with a channel dimension of 256 and a spatial dimension of 16×16.
The output vector of the last second residual module (channel dimension 256, spatial dimension 16×16) is input to a global average pooling layer and a fully connected layer, finally yielding a one-dimensional vector of 200 floating point numbers. This one-dimensional vector is also referred to as the scene vector.
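Reusing the two module sketches above, the example stack from the preceding paragraphs could be assembled as follows; the layer counts and channel widths follow the text, while the padding of the stem convolution and the use of nn.Sequential are assumptions.

```python
import torch
import torch.nn as nn

# Assumes FirstResidualModule and SecondResidualModule from the
# sketches above are in scope.
def build_residual_network(out_floats: int = 200) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=7, stride=2, padding=3),  # 1024 -> 512, 3 -> 8 ch
        nn.ReLU(inplace=True),
        FirstResidualModule(8),                               # 512 -> 256, 8 -> 16 ch
        SecondResidualModule(16), SecondResidualModule(16),
        FirstResidualModule(16),                              # 256 -> 128, 16 -> 32 ch
        *[SecondResidualModule(32) for _ in range(3)],
        FirstResidualModule(32),                              # 128 -> 64, 32 -> 64 ch
        *[SecondResidualModule(64) for _ in range(5)],
        FirstResidualModule(64),                              # 64 -> 32, 64 -> 128 ch
        *[SecondResidualModule(128) for _ in range(2)],
        FirstResidualModule(128),                             # 32 -> 16, 128 -> 256 ch
        *[SecondResidualModule(256) for _ in range(2)],
        nn.AdaptiveAvgPool2d(1),                              # global average pooling
        nn.Flatten(),                                         # (N, 256)
        nn.Linear(256, out_floats),                           # 200-float scene vector
    )

# scene_vec = build_residual_network()(torch.randn(1, 3, 1024, 1024))  # (1, 200)
```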
Fig. 4A is a flowchart illustrating a training residual network and a behavior prediction network according to an embodiment of the present disclosure. Fig. 4B is a flowchart illustrating one example of a training residual network and a behavior prediction network according to an embodiment of the present disclosure.
In operation 401, the device 2000 records a video in which a virtual object is manipulated. For example, the device 2000 may collect about half an hour of gunfight game samples by manually recording the gunfight game, with a sampling rate of 10 frames per second.
In operation 402, the device 2000 acquires a plurality of sample data from the video, each sample data including a game interface sample and, for that sample, a movement direction sample executed by the virtual object, a view angle sample at the next time, and a view magnitude sample at the next time. For example, as shown in fig. 4B, the device 2000 may record a gunfight game sample and then extract virtual object behaviors from it to obtain movement direction samples, view angle samples at the next time, and view magnitude samples at the next time. The movement direction samples cover movement of the virtual object in 8 directions at 45-degree intervals: up, upper right, right, lower right, down, lower left, left, and upper left. The view angle sample at the next time is the view angle value in each frame while the player operates the virtual object, that is, the per-frame view angle value of the game character in the game video. The view magnitude sample at the next time is the view angle magnitude in each frame while the player operates the virtual object, that is, how the view angle value of the game character changes from frame to frame. The device 2000 saves the game video and the corresponding virtual object behaviors. Optionally, as shown in fig. 4B, the device 2000 may also extract a reference image area (i.e., a minimap area) from the game interface as the game interface sample. Optionally, after the sample data is obtained, 80% of it is used to train the residual network and the behavior prediction network (whose combination is also referred to as a mixed density network), and the remaining sample data is used to test the mixed density network. A sketch of how movement deltas might be quantized into the 8 direction labels follows below.
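For illustration, a hypothetical labelling helper that quantizes a per-frame movement delta (dx, dy) into one of the 8 direction classes; the function name and the convention that index 0 means "right" are assumptions, not part of this disclosure.

```python
import math

# The 8 classes at 45-degree intervals, index 0 = right, counter-clockwise.
DIRECTIONS = ["right", "upper right", "up", "upper left",
              "left", "lower left", "down", "lower right"]

def direction_label(dx: float, dy: float) -> int:
    """Maps a per-frame movement delta to the nearest 45-degree sector."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(((angle + 22.5) % 360.0) // 45.0)  # class index 0..7

assert DIRECTIONS[direction_label(1.0, 1.0)] == "upper right"
assert DIRECTIONS[direction_label(0.0, -1.0)] == "down"
```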
In operation 403, the residual network and the behavior prediction network are trained based on the plurality of sample data. The structure of the residual network may be similar to that shown in fig. 3A to 3B. Those skilled in the art will appreciate that, as shown in fig. 4B, a designer of artificial intelligence of a virtual object may also design other lightweight network structures as a residual network according to different games, and then train a mixed density network including the residual network and the behavior prediction network.
Operation 403 further comprises training the parameters of the residual network and the movement direction prediction network by optimizing the class cross entropy loss between the movement direction samples and the movement directions predicted by the movement direction prediction network, based on the game interface samples in the plurality of sample data and the movement direction samples corresponding to those game interface samples.
For example, the class cross entropy loss between a movement direction sample and a movement direction predicted by the movement direction prediction network may be defined as:
$$\mathcal{L}_{\mathrm{CE}} = -\frac{1}{m}\sum_{j=1}^{m}\sum_{i=1}^{C} y_{ji}\,\log p_{ji}$$
where m is the total number of samples, C is the number of categories (e.g., C = 8, representing movement of the game character in 8 directions), and $y_{ji}$ is the label of the i-th class for the j-th sample: $y_{ji}$ is 1 if the class of the j-th sample is i, and 0 otherwise. $p_{ji}$ denotes the predicted probability that the j-th sample belongs to the i-th class. Taking the class cross entropy loss as the objective function, the parameters of the residual network and the movement direction prediction network are trained by optimizing this loss until it converges, so that the residual network and the movement direction prediction network learn the movement strategy of the gunfight game. A minimal computation of this loss is sketched below.
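A minimal sketch of computing this loss with PyTorch's built-in cross entropy; the tensor values are placeholders standing in for the movement direction network's outputs.

```python
import torch
import torch.nn.functional as F

m, C = 32, 8                                   # m samples, C = 8 directions
logits = torch.randn(m, C, requires_grad=True) # placeholder network outputs
labels = torch.randint(0, C, (m,))             # true direction index per sample
loss = F.cross_entropy(logits, labels)         # = -(1/m) * sum_j log p_{j, y_j}
loss.backward()                                # gradients for both networks
```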
Operation 403 further comprises training the parameters of the residual network and the view angle prediction network by optimizing the posterior probability loss between the view angle samples and the probability distribution of view angles predicted by the view angle prediction network, based on the game interface samples in the plurality of sample data and the view angle samples at the next time corresponding to those game interface samples.
For example, the posterior probability loss between a view angle sample and the probability distribution of view angles predicted by the view angle prediction network may be defined as:
$$E = -\sum_{n=1}^{N}\ln\left(\sum_{k=1}^{K}\pi_k\,\mathcal{N}\!\left(x_n \mid \mu_k, \sigma_k^2\right)\right)$$
where $x_n$ is the view angle value of the n-th sample, N is the total number of samples, and $\pi_k$, $\mu_k$, and $\sigma_k$ are the weight, mean, and standard deviation of the k-th of the K Gaussian components predicted by the network. Taking this posterior probability loss as the objective function, the parameters of the residual network and the view angle prediction network are trained by optimizing the loss until it converges, so that the residual network and the view angle prediction network learn the game strategy related to the view angle direction in the gunfight game. Through this mixed Gaussian distribution scheme, the device 2000 can better solve the multi-value problem related to the view angle direction (multiple reasonable strategies exist for the same scene). The loss is sketched in code below.
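A sketch of this posterior probability (negative log-likelihood) loss for a Gaussian mixture, assuming the network emits K weights, means, and standard deviations per sample; the function name and tensor shapes are assumptions.

```python
import torch

def mdn_nll(pi: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor,
            x: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of x under per-sample Gaussian mixtures.

    pi, mu, sigma: (N, K) mixture weights, means, std deviations.
    x:             (N,)   observed view angle values.
    """
    comp = torch.distributions.Normal(mu, sigma)
    log_p = comp.log_prob(x.unsqueeze(-1))           # (N, K): log N(x_n | mu_k, sigma_k)
    log_mix = torch.log(pi + 1e-12) + log_p          # log(pi_k) + log N(...)
    return -torch.logsumexp(log_mix, dim=-1).sum()   # -sum_n ln sum_k pi_k N(...)
```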
Operation 403 further comprises training the parameters of the residual network and the view angle magnitude prediction network by optimizing the posterior probability loss between the view angle magnitude samples and the probability distribution of view angle magnitudes predicted by the view angle magnitude prediction network, based on the game interface samples in the plurality of sample data and the view angle magnitude samples at the next time corresponding to those game interface samples. Similarly, the posterior probability loss between the view angle magnitude samples and the probability distribution of view angle magnitudes predicted by the view angle magnitude prediction network may be defined as:
$$E = -\sum_{n=1}^{N}\ln\left(\sum_{k=1}^{K}\pi_k\,\mathcal{N}\!\left(x_n \mid \mu_k, \sigma_k^2\right)\right)$$
Similarly, taking this posterior probability loss as the objective function (with $x_n$ now denoting the view angle magnitude value of the n-th sample), the parameters of the residual network and the view angle magnitude prediction network are trained by optimizing the loss until it converges, so that the networks learn the game strategy related to the view angle magnitude in the gunfight game. The mixed Gaussian distribution scheme likewise solves the multi-value problem of the view angle magnitude in the game (multiple reasonable strategies exist for the same scene).
In experiments with embodiments of the present disclosure, the device 2000 can complete training of the mixed density network by updating its parameters over 20 iterations (each iteration traverses all training samples). With 10 recorded game rounds of about 3 minutes each, recording takes about 30 minutes; on a GPU, training the mixed density network takes about another half hour, so obtaining the artificial intelligence of the virtual object takes about one hour in total.
Because the mixed density network imitates how the player controls the virtual object, it can learn the artificial intelligence of the gunfight game from a small number of recorded samples, which greatly improves training efficiency. Meanwhile, the lightweight residual model extracts more discriminative abstract features, so the game artificial intelligence achieves better results in the gunfight game. Finally, the mixed density network outputs the parameters of a discrete probability distribution and of a Gaussian mixture distribution, and the view angle and the view magnitude are sampled according to these distributions, which handles the multi-value problem of the game well.
Fig. 5 illustrates a flowchart of a method 500 of hosting virtual object behavior in a combat game, according to an embodiment of the present disclosure.
In a game hosting scenario, when the terminal goes offline or the player is busy, the player can hand over the game character to hosting, so that the terminal or the server controls the game character in the player's place.
In operation 501, the device 2000 may determine a probability distribution of virtual object behavior in a predetermined behavior set based on a game interface of the virtual object. The game interface of the virtual object may be the scene image 100 in fig. 1. A reference image area, such as a small map area, may be included in the scene image.
The probability distribution of the virtual object behavior in the predetermined behavior set may be either a discrete probability distribution or a continuous probability distribution, which is not limited by the present disclosure. The probability distribution indicates the probability of each behavior in the predetermined behavior set occurring. For example, assume that the virtual object behavior indicates whether the virtual object is to fire a gun, and the predetermined behavior set includes the two behaviors "shoot" and "do not shoot". The device 2000 calculates, based on the scene features, a probability of 0.7 for "shoot" and 0.3 for "do not shoot". The virtual object then performs the shooting operation with probability 0.7 when facing the scene shown in fig. 2B. The virtual object artificial intelligence uses this probability distribution to output a random number. Assuming the random number is 1 for shooting and 0 for not shooting, then over many encounters with the same scene, the random number 1 is output about 70% of the time and the random number 0 about 30% of the time. The behavior pattern of the virtual object is therefore neither stiff nor easy to predict, which makes the game more interesting. A sampling sketch follows below.
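A minimal sampling sketch using the illustrative 0.7/0.3 probabilities above; the function name is hypothetical.

```python
import random

def sample_shoot(p_shoot: float = 0.7) -> int:
    """Returns 1 (shoot) with probability p_shoot, else 0 (hold fire)."""
    return 1 if random.random() < p_shoot else 0

# Over many visits to the same scene, roughly 70% of draws are 1 and
# 30% are 0, so the behavior stays plausible but not deterministic.
```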
Optionally, as shown in fig. 2B, the probability distribution of the virtual object behavior in the predetermined behavior set includes a first probability distribution, a second probability distribution, and a third probability distribution. The first probability distribution indicates a probability distribution of a virtual object moving in a plurality of directions of movement. The second probability distribution indicates a probability distribution of view angle values of the virtual object at a next time in a view angle value interval. The third probability distribution indicates a probability distribution of view angle amplitude values of the virtual object at a next time in a view angle amplitude value interval.
In operation 502, the device 2000 may host the virtual object behavior based on the probability distribution. In particular, when an attack object appears on the game interface, the extrema of the probability distribution of the view angle of the virtual object at the next time are close to the view angles facing the attack object.
For example, if an enemy appears on the right side, the view angle needs to move to the right until the enemy appears in the center of the image, which makes it easier for the virtual object to attack the enemy. Fig. 2B shows two extrema of the second probability distribution, corresponding to the view angle facing enemy character A and the view angle facing enemy character B, respectively. View angle values are randomly output according to the second probability distribution, and the output values are, with high probability, close to one of the extrema, so that the virtual object moves its line of sight toward the attack object, as sketched below.
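A sketch of sampling one view angle value from such a Gaussian mixture, assuming the second probability distribution is given as K component weights, means, and standard deviations; the function name and parameter shapes are assumptions.

```python
import torch

def sample_view_angle(pi: torch.Tensor, mu: torch.Tensor,
                      sigma: torch.Tensor) -> torch.Tensor:
    """Draws one view angle from the mixture: pick a component by its
    weight, then sample that Gaussian. pi, mu, sigma have shape (K,)."""
    k = torch.distributions.Categorical(pi).sample()  # component index
    return torch.normal(mu[k], sigma[k])              # angle near a mode

# With modes near the angles facing enemy characters A and B, most
# sampled angles fall close to one of those two extrema.
```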
Since the virtual object determines its behavior according to the probability distribution, it can execute any of several reasonable game strategies when facing the same game scene, thereby better solving the multi-value problem the virtual object faces.
Fig. 6 is a block diagram illustrating an apparatus 2000 that determines virtual object behavior according to an embodiment of the present disclosure.
The device 2000 may include a contextual characteristics acquisition module 601, a probability distribution determination module 602, and a virtual object behavior determination module 603.
The scene feature acquisition module 601 may be configured to acquire, based on a scene image of a virtual object, scene features characterizing a scene in which the virtual object is located using a residual network.
Optionally, a reference image area, such as a small map area, may be included in the scene image. The scene feature acquisition module 601 may also perform: intercepting a reference image area from the scene image, where the reference image area shows the acquirable information of the virtual object in the game. The acquirable information is, for example, the deployment of both friendly and enemy sides, the map regions that have been explored, the positions of enemy objects, the positions of teammates, and so on. Because the reference image area abstractly represents the acquirable information of the virtual object in the scene image, the device 2000 can obtain the scene features characterizing the scene where the virtual object is located using the residual network based on the reference image area alone, thereby reducing the number of input parameters of the residual network and making the residual network lighter and more efficient.
The probability distribution determination module 602 may be configured to determine, based on the scene features, a probability distribution of the virtual object behavior.
Optionally, the device 2000 may utilize a behavior prediction network to determine the probability distribution of the virtual object behavior in a predetermined behavior set. The combination of the behavior prediction network and the residual network may be referred to as a mixed density network. The behavior prediction network includes a movement direction prediction network, a view angle prediction network, and a view magnitude prediction network, and the virtual object behavior includes at least a part of the movement direction of the virtual object, the view angle of the virtual object at the next time, and the view magnitude of the virtual object at the next time. Optionally, the predetermined behavior set for the movement direction of the virtual object includes moving up, upper right, right, lower right, down, lower left, left, and upper left. The predetermined behavior set for the view angle value of the virtual object is the interval of view angle values to which the virtual object can rotate. The predetermined behavior set for the view magnitude value of the virtual object is the interval of view magnitude values by which the view angle of the virtual object can change.
The virtual object behavior determination module 603 may be configured to determine the virtual object behavior based on the probability distribution.
Optionally, the device 2000 determines the movement direction of the virtual object based on the first probability distribution. The virtual object randomly samples its movement according to the first probability distribution, rather than always performing the action with the highest probability. In some cases, always performing the most probable action could cause the virtual object to hit an obstacle and then stop in front of it, unable to move. By randomly sampling the movement behavior according to the first probability distribution, the virtual object can avoid getting stuck in the game scene.
Optionally, the device 2000 determines the view angle value of the virtual object at the next time based on the second probability distribution. If an enemy appears on the right side, the view angle needs to move to the right until the enemy appears in the center of the image, making it easier for the virtual object to attack the enemy. Fig. 2B shows two extrema, corresponding to the view angle facing enemy character A and the view angle facing enemy character B, respectively. View angle values are randomly output according to the second probability distribution, and the output values are, with high probability, close to one of the extrema, so that the virtual object moves its line of sight toward the attack object. Similarly, the device 2000 may determine the view magnitude value of the virtual object at the next time based on the third probability distribution.
Embodiments of the present disclosure provide a computer-readable storage medium having stored thereon computer instructions that, when executed by a processor, implement method 200 and method 500.
According to the method 200 and the method 500, the distribution of virtual object behaviors is computed through a lightweight residual network, which solves technical problems in the design of virtual object artificial intelligence such as excessively long training time, excessive design difficulty, and the inability to handle the multi-value problem.
It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In general, the various example embodiments of the disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The exemplary embodiments of the invention described in detail above are illustrative only and are not limiting. It will be appreciated by those skilled in the art that various modifications and combinations of the embodiments or features thereof can be made without departing from the principles and spirit of the invention, and such modifications are intended to be within the scope of the invention.

Claims (10)

1. A method of determining virtual object behavior, comprising:
acquiring scene characteristics representing a scene where a virtual object is located by utilizing a residual network based on a scene image of the virtual object, wherein the scene image comprises a reference image area, the scene characteristics are multidimensional floating point number vectors, each element in the multidimensional floating point number vectors is a floating point number, the multidimensional floating point number vectors comprise a plurality of dimensions, and the scene characteristics represent scene information in the form of numerical values;
determining a probability distribution of virtual object behaviors in a predetermined behavior set using a behavior prediction network based on the scene characteristics; and
determining the virtual object behavior based on the probability distribution,
wherein the behavior prediction network comprises a movement direction prediction network, a view angle prediction network, and a view magnitude prediction network, and the virtual object behavior comprises at least one of the following behaviors: the moving direction of the virtual object, the visual angle of the virtual object at the next moment and the visual angle amplitude of the virtual object at the next moment;
wherein the acquiring, based on the scene image of the virtual object, scene characteristics that characterize the scene where the virtual object is located further comprises: intercepting a reference image area from the scene image, wherein the reference image area shows the acquirable information of the virtual object in a game; and acquiring, based on the reference image area, the scene characteristics representing the scene where the virtual object is located by using the residual network;
wherein the determining a probability distribution of virtual object behavior using a behavior prediction network and determining the virtual object behavior based on the probability distribution further comprises: inputting the scene characteristics to the movement direction prediction network to obtain a first probability distribution, the first probability distribution indicating a probability distribution of movement of the virtual object in a plurality of movement directions, and determining the movement direction of the virtual object based on the first probability distribution; inputting the scene characteristics to the view angle prediction network to obtain a second probability distribution, wherein the second probability distribution indicates the probability distribution of the view angle value of the virtual object at the next moment in a view angle value interval, and the view angle value of the virtual object at the next moment is determined based on the second probability distribution; and inputting the scene characteristics to the view angle magnitude prediction network to obtain a third probability distribution indicating a probability distribution of the view angle magnitude value of the virtual object at the next moment in a view angle magnitude value interval, and determining the view angle magnitude value of the virtual object at the next moment based on the third probability distribution,
wherein the residual network comprises: at least one first residual module, wherein the spatial dimension of the input feature of the first residual module is twice the spatial dimension of its output feature, and the channel dimension of the input feature of the first residual module is one half of the channel dimension of its output feature; and at least one second residual module, wherein the spatial dimension and the channel dimension of the input feature of the second residual module are the same as the spatial dimension and the channel dimension of its output feature.
2. The method of determining virtual object behavior of claim 1 wherein,
the first residual module includes:
a first number of first convolution layers, the step size of the first convolution layers being a first step size, and the size of the convolution kernel of the first convolution layers being a first size;
a second number of second convolution layers, the step size of the second convolution layers being a second step size, and the size of the convolution kernel of the second convolution layers being a second size;
a second number of third convolution layers, the step size of the third convolution layers being a second step size, and the size of the convolution kernel of the third convolution layers being a first size;
the second residual module includes:
a first number of third convolution layers, the step size of the third convolution layers being a second step size, and the size of the convolution kernel of the third convolution layers being a first size;
a second number of second convolution layers, the step size of the second convolution layers being a second step size, and the size of the convolution kernel of the second convolution layers being a second size.
3. The method of determining virtual object behavior according to claim 1, wherein the virtual object is a virtual object in a game, and the scene image of the virtual object is a game interface, the method further comprising:
recording video of the game interface where the virtual object is manipulated,
obtaining a plurality of sample data from the video, each sample data comprising a game interface sample and a movement direction sample executed by a virtual object for the game interface sample, a view angle sample at a next time and a view amplitude sample at the next time;
training the residual network and the behavior prediction network based on the plurality of sample data.
4. A method of determining virtual object behavior as recited in claim 3, wherein the training the residual network and the behavior prediction network comprises:
training parameters of a residual network and a movement direction prediction network by optimizing class cross entropy loss between the movement direction samples and movement directions predicted by the movement direction prediction network based on a game interface sample and a movement direction sample corresponding to the game interface sample in the plurality of sample data;
training parameters of a residual network and a view angle prediction network by optimizing posterior probability loss between the view angle sample and a probability distribution of view angles predicted by the view angle prediction network based on a game interface sample in the plurality of sample data and a view angle sample at a next time corresponding to the game interface sample; and
based on a game interface sample in the plurality of sample data and a view angle amplitude sample at a next moment corresponding to the game interface sample, training parameters of a residual network and a view angle amplitude prediction network by optimizing posterior probability loss between the view angle amplitude sample and a probability distribution of view angle amplitudes predicted by the view angle amplitude prediction network.
5. The method of determining virtual object behavior according to claim 1, wherein the virtual object is a virtual object manipulated in a game, the scene image of the virtual object is a fight game interface, wherein in case an attack object of the virtual object appears on the fight game interface, an extremum of the second probability distribution is close to a perspective angle facing the attack object.
6. A method of hosting virtual object behavior in a combat game, comprising:
determining scene characteristics representing a scene where a virtual object is located based on a game interface of the virtual object, wherein the game interface comprises a reference image area, the scene characteristics are multidimensional floating point number vectors, each element in the multidimensional floating point number vectors is a floating point number, the multidimensional floating point number vectors comprise a plurality of dimensions, and the scene characteristics represent scene information in the form of numerical values;
determining probability distribution of virtual object behaviors in a preset behavior set based on the scene characteristics, wherein the virtual object behaviors comprise a moving direction of a virtual object, a visual angle of the virtual object at the next moment and a visual angle amplitude of the virtual object at the next moment; and
hosting the virtual object behavior based on the probability distribution;
wherein, in the case that an attack object of the virtual object appears on the game interface, an extremum of probability distribution of a viewing angle of the virtual object at a next time is close to a viewing angle facing the attack object;
wherein determining the probability distribution of the virtual object behavior in the predetermined behavior set comprises: inputting the scene characteristics to a movement direction prediction network to obtain a first probability distribution, the first probability distribution indicating a probability distribution of movement of the virtual object in a plurality of movement directions, and determining a movement direction of the virtual object based on the first probability distribution; inputting the scene characteristics into a view angle prediction network to obtain a second probability distribution, wherein the second probability distribution indicates the probability distribution of the view angle value of the virtual object at the next moment in a view angle value interval, and the view angle value of the virtual object at the next moment is determined based on the second probability distribution; and inputting the scene characteristics into a view angle amplitude prediction network to obtain a third probability distribution, wherein the third probability distribution indicates the probability distribution of the view angle amplitude value of the virtual object at the next moment in a view angle amplitude value interval, and determining the view angle amplitude value of the virtual object at the next moment based on the third probability distribution.
7. The method of hosting virtual object behavior in a combat game of claim 6, wherein the reference image area shows the position and distance of an attacking object or obstacle of the virtual object in view in the combat game.
8. An apparatus for determining virtual object behavior, comprising:
a scene feature acquisition module configured to acquire, based on a scene image of a virtual object, a scene feature characterizing a scene in which the virtual object is located using a residual network, wherein the scene image includes a reference image area, the scene feature is a multidimensional floating point vector, each element in the multidimensional floating point vector is a floating point, the multidimensional floating point vector includes a plurality of dimensions, and the scene feature characterizes scene information in a numerical form;
a probability distribution determination module configured to determine a probability distribution of virtual object behaviors in a predetermined behavior set using a behavior prediction network based on the scene features; and
a virtual object behavior determination module configured to determine the virtual object behavior based on the probability distribution,
the behavior prediction network comprises a moving direction prediction network, a visual angle prediction network and a visual angle amplitude prediction network, and the virtual object behavior comprises a moving direction of a virtual object, a visual angle of the virtual object at the next moment and a visual angle amplitude of the virtual object at the next moment;
wherein the acquiring, based on the scene image of the virtual object, scene features that characterize the scene where the virtual object is located further comprises: intercepting a reference image area from the scene image, wherein the reference image area shows the acquirable information of the virtual object in a game; and acquiring, based on the reference image area, the scene features characterizing the scene where the virtual object is located by using the residual network;
wherein the determining a probability distribution of virtual object behavior using a behavior prediction network and determining the virtual object behavior based on the probability distribution further comprises: inputting the scene features to the movement direction prediction network to obtain a first probability distribution, the first probability distribution indicating a probability distribution of movement of the virtual object in a plurality of movement directions, and determining the movement direction of the virtual object based on the first probability distribution; inputting the scene features to the view angle prediction network to obtain a second probability distribution, wherein the second probability distribution indicates the probability distribution of the view angle value of the virtual object at the next moment in a view angle value interval, and the view angle value of the virtual object at the next moment is determined based on the second probability distribution; and inputting the scene features to the view angle magnitude prediction network to obtain a third probability distribution indicating a probability distribution of the view angle magnitude value of the virtual object at the next moment in a view angle magnitude value interval, and determining the view angle magnitude value of the virtual object at the next moment based on the third probability distribution,
wherein the residual network comprises: at least one first residual module, wherein the spatial dimension of the input feature of the first residual module is twice the spatial dimension of its output feature, and the channel dimension of the input feature of the first residual module is one half of the channel dimension of its output feature; and at least one second residual module, wherein the spatial dimension and the channel dimension of the input feature of the second residual module are the same as the spatial dimension and the channel dimension of its output feature.
9. An apparatus for determining virtual object behavior, comprising:
a processor;
a memory storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-5.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any of claims 1-7.
CN202010228976.XA 2020-03-27 2020-03-27 Method for determining virtual object behaviors and hosting virtual object behaviors Active CN111437605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010228976.XA CN111437605B (en) 2020-03-27 2020-03-27 Method for determining virtual object behaviors and hosting virtual object behaviors

Publications (2)

Publication Number Publication Date
CN111437605A CN111437605A (en) 2020-07-24
CN111437605B true CN111437605B (en) 2023-06-27

Family

ID=71650845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010228976.XA Active CN111437605B (en) 2020-03-27 2020-03-27 Method for determining virtual object behaviors and hosting virtual object behaviors

Country Status (1)

Country Link
CN (1) CN111437605B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112426717A (en) * 2020-09-21 2021-03-02 成都完美天智游科技有限公司 Method and device for generating frame data, storage medium and computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108888958B (en) * 2018-06-22 2023-03-21 深圳市腾讯网络信息技术有限公司 Virtual object control method, device, equipment and storage medium in virtual scene
CN110339569B (en) * 2019-07-08 2022-11-08 深圳市腾讯网域计算机网络有限公司 Method and device for controlling virtual role in game scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40026353

Country of ref document: HK

GR01 Patent grant