CN113577769B - Game character action control method, apparatus, device and storage medium

Info

Publication number
CN113577769B
CN113577769B
Authority
CN
China
Prior art keywords
predicted
game
action
game character
model
Prior art date
Legal status
Active
Application number
CN202110769501.6A
Other languages
Chinese (zh)
Other versions
CN113577769A
Inventor
刘舟
徐键滨
吴梓辉
徐雅
王理平
Current Assignee
Guangzhou Sanqi Jiyao Network Technology Co ltd
Original Assignee
Guangzhou Sanqi Jiyao Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Sanqi Jiyao Network Technology Co ltd
Priority to CN202110769501.6A
Publication of CN113577769A
Application granted
Publication of CN113577769B
Legal status: Active


Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50: Controlling the output signals based on the game progress
    • A63F13/52: Controlling the output signals based on the game progress involving aspects of the displayed game scene
    • A63F13/55: Controlling game characters or game objects based on the game progress

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a game character action control method, which comprises the following steps: acquiring current game state information of the present game character, and acquiring current game state information of a hostile game character in a special game state; predicting the predicted position information of the hostile game character at the next moment according to the current game state information of the hostile game character; inputting the current game state information of the present game character and the predicted position information of the hostile game character into an AI model, and determining the predicted action output strategy currently adopted by the present game character; and controlling the present game character to output the predicted action according to the predicted action output strategy. The invention also discloses a game character action control apparatus, a device, and a computer-readable storage medium. With the embodiments of the invention, the predicted action output strategy of a game character can be determined in combination with the current game states of other characters, so that the predicted output of the character's actions is accurate and reasonable.

Description

Game character action control method, apparatus, device and storage medium
Technical Field
The present invention relates to the field of game character control technologies, and in particular, to a method, an apparatus, a device, and a storage medium for controlling game character actions.
Background
With the continued development of internet technology, the electronic sports industry is receiving more and more attention. In a competitive game, to ensure a good user experience during play, the action that a game character will take at the next moment needs to be predicted; for example, when the user is away (hanging up), the game character still outputs actions normally according to the predicted next action. In existing competitive games, the predicted action is output directly according to the current game state of the present game character, without reasonably combining the current game states of other characters, so the predicted output of character actions is not accurate or reasonable enough. For example, the present game character may release a certain skill at a hostile game character based on its current game state information, but the hostile game character may have moved out of the skill's hit range by the next moment, in which case the predicted action cannot hit the hostile game character.
Disclosure of Invention
The embodiments of the present invention aim to provide a game character action control method, apparatus, device, and storage medium that can determine the predicted action output strategy of a game character in combination with the current game states of other characters, so that the predicted output of the character's actions is accurate and reasonable.
In order to achieve the above object, an embodiment of the present invention provides a game character action control method, including:
acquiring current game state information of the present game character, and acquiring current game state information of a hostile game character in a special game state;
predicting the predicted position information of the hostile game character at the next moment according to the current game state information of the hostile game character;
inputting the current game state information of the present game character and the predicted position information of the hostile game character into an AI model, and determining the predicted action output strategy currently adopted by the present game character;
and controlling the present game character to output the predicted action according to the predicted action output strategy.
As an improvement of the above scheme, the predicted action output strategy includes at least one of a determination strategy of an attack target, a determination strategy of a game action, and a determination strategy of a moving path.
As an improvement of the above scheme, controlling the present game character to output the predicted action according to the predicted action output strategy includes:
when unreasonable actions exist in the predicted actions, acquiring an action limiting mechanism corresponding to the predicted actions;
performing an action limiting operation on the unreasonable actions according to the action limiting mechanism so as to adjust the sampling probability of the unreasonable actions;
and outputting the predicted action after the action limiting operation is performed.
As an improvement of the above scheme, after outputting the predicted action on which the action limiting operation has been performed, the method further includes:
inputting the predicted action on which the action limiting operation has been performed into the AI model as sample data to train the AI model.
As an improvement of the above scheme, the predicted action is a pitch angle of the present game character; then, controlling the present game character to output the predicted action according to the predicted action output strategy further includes:
acquiring an application scene in which the present game character uses a pre-aiming skill;
when the present game character cannot see an attack target while using the pre-aiming skill, outputting the angle predicted by the AI model as the pitch angle;
and when the present game character can see the attack target while using the pre-aiming skill, calculating the preset-position included angle between the present game character and the attack target as the pitch angle.
As an improvement of the above solution, the AI model includes a critic network and an actor network, and the predicted action is output through the actor network.
As an improvement of the above scheme, the AI model is obtained by training with a preset AI model training method, and the AI model training method includes:
acquiring sample data; wherein the sample data includes game state data of a game character and an actual reward;
inputting the game state data into the AI model to generate a predicted reward for the game character;
calculating a loss function of the AI model according to the predicted reward and the actual reward; wherein the actual reward is calculated by a preset reward mechanism;
and optimizing the AI model according to the loss function until the loss function converges.
In order to achieve the above object, an embodiment of the present invention further provides a game character action control apparatus, including:
the game state information acquisition module, used for acquiring current game state information of the present game character and acquiring current game state information of a hostile game character in a special game state;
the predicted position information acquisition module, used for predicting the predicted position information of the hostile game character at the next moment according to the current game state information of the hostile game character;
the predicted action output strategy determining module, used for inputting the current game state information of the present game character and the predicted position information of the hostile game character into an AI model to determine the predicted action output strategy currently adopted by the present game character;
and the predicted action output module, used for controlling the present game character to output the predicted action according to the predicted action output strategy.
To achieve the above object, an embodiment of the present invention further provides a game character action control device including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the game character action control method according to any one of the above embodiments when executing the computer program.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, it controls a device where the computer-readable storage medium is located to execute the game character action control method according to any one of the above embodiments.
Compared with the prior art, the game character action control method, apparatus, device, and storage medium disclosed in the embodiments of the present invention first acquire the current game state information of the present game character and the current game state information of a hostile game character in a special game state; then predict the predicted position information of the hostile game character at the next moment according to the hostile game character's current game state information; and finally input the current game state information of the present game character and the predicted position information of the hostile game character into an AI model to determine the predicted action output strategy currently adopted by the present game character, so as to control the present game character to output the predicted action according to that strategy. Because the game state information of the hostile game character is fully considered when determining the strategy, the hostile game character's position at the next moment can be predicted, the present game character can formulate a corresponding strategy in advance, the skill hit rate of the present game character is improved, and the predicted output of the game character's actions is accurate and reasonable.
Drawings
FIG. 1 is a flow chart of a game character action control method provided by an embodiment of the present invention;
FIG. 2 is a block diagram of a game character action control apparatus according to an embodiment of the present invention;
FIG. 3 is a block diagram of a game character action control device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a game character action control method according to an embodiment of the present invention, where the game character action control method includes:
S1, acquiring current game state information of the present game character, and acquiring current game state information of a hostile game character in a special game state;
S2, predicting the predicted position information of the hostile game character at the next moment according to the current game state information of the hostile game character;
S3, inputting the current game state information of the present game character and the predicted position information of the hostile game character into an AI model, and determining the predicted action output strategy currently adopted by the present game character;
S4, controlling the present game character to output the predicted action according to the predicted action output strategy.
Specifically, in step S1, the game state information includes position information, attribute information, combat state information, skill information, and ray information used to detect obstacles around the AI (for AI characters only). The position information is the current position of the game character. The attribute information includes the weapon type of the game character, such as a firearm, other weapon attribute information (such as the total number of bullets of the weapon), and team information used to distinguish the present game character from hostile game characters. The combat state information includes the life value of the game character, whether an enemy has been killed, whether the character is in a squatting state, whether bullets remain (for AI characters only), whether the character is firing, whether the character is stealthed, and whether the AI is visible to the game character (for AI characters only). The skill information includes skill types encoding the skills in the game character's skill bar, and may further include the current status of each skill in the skill bar; for example, if the game character has three skills in its skill bar, the current status of each is either available or unavailable, and it can be understood that a skill is in the unavailable status while it is cooling down.
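To make this encoding concrete, the sketch below shows one way such game state information could be flattened into a fixed-length vector for model input. It is a minimal illustration in Python; all field names and the layout are assumptions, since the patent only enumerates the kinds of information, not a concrete format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GameState:
    # Hypothetical flat encoding of the state information enumerated above.
    position: List[float]        # current (x, y, z) position of the character
    weapon_type: int             # categorical weapon id, e.g. a firearm
    bullets_total: int           # total bullet count of the weapon
    team_id: int                 # distinguishes the present character from hostiles
    hp: float                    # life value (combat state)
    is_squatting: bool
    is_firing: bool
    is_stealthed: bool
    skill_available: List[bool]  # one availability flag per skill in the skill bar
    obstacle_rays: List[float] = field(default_factory=list)  # AI characters only

    def to_vector(self) -> List[float]:
        # Flatten everything into one numeric vector for the model input.
        return (list(self.position)
                + [float(self.weapon_type), float(self.bullets_total),
                   float(self.team_id), self.hp]
                + [float(self.is_squatting), float(self.is_firing),
                   float(self.is_stealthed)]
                + [float(s) for s in self.skill_available]
                + list(self.obstacle_rays))
```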
It should be noted that, before obtaining the current game state information of the present game character and the hostile game character, it is necessary to detect whether the present game character and the hostile game character meet a preset combat condition, where the combat condition includes: the hostile game character is within the attackable range of the present game character. There may be multiple hostile game characters within the attackable range of the present game character, and when several of them are in the special game state, the position information of more than one hostile game character at the next moment can be predicted at the same time.
Illustratively, the special game state includes at least one of: a blood volume below a preset blood volume threshold, and a skill support volume below a preset skill support volume threshold, where the blood volume represents the life value of the game character and the skill support volume represents the skill availability of the game character. When the blood volume or skill support volume is low, the action behavior of a hostile game character is easily constrained and therefore relatively regular, so for such a hostile game character the accuracy of predicting its position information at the next moment is relatively high. It should be noted that the blood volume threshold and the skill support volume threshold may be set according to different types of game characters, and the present invention is not limited herein.
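As a minimal illustration, and assuming the thresholds are simple per-character-type constants (the patent leaves them game-specific), the special-game-state check reduces to two comparisons:

```python
def in_special_game_state(blood_volume: float, skill_support: float,
                          blood_threshold: float, support_threshold: float) -> bool:
    # The state is "special" when at least one quantity falls below its
    # preset threshold; threshold values are tuned per character type.
    return blood_volume < blood_threshold or skill_support < support_threshold
```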
Specifically, in step S2, after the current game state information of the hostile game character is obtained, the predicted position information of the hostile game character at the next moment is predicted according to that current game state information.
For example, the current game state information of the hostile game character may be input into a preset position prediction model, and the position prediction model outputs the predicted position information of the hostile game character at the next moment according to that information. When sample data are collected during training of the position prediction model, samples in which the enemy's blood volume is below the blood volume threshold and samples in which the skill support volume is below the skill support volume threshold are extracted separately; the game state information of the hostile game character is used as the training data (model input), and the character's position information at the next moment is extracted as the target data (model output). The prediction is performed with a multi-layer perceptron model, trained with the mean square error as the loss function. It should be noted that the specific application of the multi-layer perceptron model and of the mean square error as a loss function may refer to the prior art and is not described here.
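A minimal PyTorch sketch of such a position prediction model follows, using the multi-layer perceptron and mean-square-error training described above; the layer sizes, the optimizer interface, and the 3-dimensional position output are illustrative assumptions, not specified by the patent.

```python
import torch
import torch.nn as nn

class PositionPredictor(nn.Module):
    # Multi-layer perceptron: hostile character's game state vector in,
    # predicted (x, y, z) position at the next moment out.
    def __init__(self, state_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 3),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def train_step(model, optimizer, states, next_positions):
    # states: game states of low-blood / low-skill-support enemies (inputs)
    # next_positions: their observed positions one moment later (targets)
    loss = nn.functional.mse_loss(model(states), next_positions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```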
Specifically, in step S3, the current game state information of the present game character and the predicted position information of the hostile game character are input into an AI model, so that the AI model outputs the predicted action output strategy of the present game character, where the predicted action output strategy includes at least one of a determination strategy of an attack target, a determination strategy of a game action, and a determination strategy of a moving path.
For example, when two hostile game characters (such as hostile game character A and hostile game character B) exist within the attackable range of the present game character, the AI model may determine the primary attack target according to the predicted position information of the hostile game characters, for example, selecting hostile game character A, whose predicted position is closer to the present game character, as the attack target. After determining the attack target, the AI model may formulate the moving path along which the present game character moves toward the attack target and the game action to take, such as attacking with a specified skill.
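For the target-selection example above, a hypothetical helper that picks the hostile character whose predicted position is nearest could look like this (the pair format and id values are illustrative):

```python
import math

def choose_attack_target(own_pos, predicted):
    # predicted: list of (enemy_id, predicted_position) pairs.
    # Returns the id of the enemy predicted to be closest to the present character.
    return min(predicted, key=lambda e: math.dist(own_pos, e[1]))[0]

# e.g. choose_attack_target((0, 0, 0), [("A", (2, 1, 0)), ("B", (6, 4, 0))]) -> "A"
```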
In order to keep the input dimensions of the AI model uniform when the predicted position information is input into it: the positions of hostile game characters that are not in the special game state are not predicted, so no predicted values exist for them; their position slots therefore need to be filled with 0, while the predicted position information of the hostile game characters in the special game state, for which predicted values exist, is input into the AI model.
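A sketch of this zero-filling step, under the assumptions of a fixed enemy ordering and a 3-dimensional position slot per enemy (both illustrative):

```python
import torch

def build_model_input(own_state: torch.Tensor,
                      predicted_positions: dict,
                      num_enemies: int) -> torch.Tensor:
    # predicted_positions maps enemy index -> predicted (x, y, z) tensor for
    # enemies in the special game state; every other slot is padded with 0
    # so the AI model always receives the same input dimension.
    slots = [predicted_positions.get(i, torch.zeros(3)) for i in range(num_enemies)]
    return torch.cat([own_state] + slots)
```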
Optionally, the AI model is obtained by training with a preset AI model training method, and the AI model training method includes steps S31 to S34:
S31, acquiring sample data; wherein the sample data includes game state data of a game character and an actual reward;
S32, inputting the game state data into the AI model to generate a predicted reward for the game character;
S33, calculating a loss function of the AI model according to the predicted reward and the actual reward; wherein the actual reward is calculated by a preset reward mechanism;
S34, optimizing the AI model according to the loss function until the loss function converges.
Specifically, in step S31, the game state data includes: first state information of the game character at the current time step, first action information at the current time step, and second state information at the next time step.
Specifically, in step S32, the AI model in the embodiment of the present invention is a DDPG model, which has two networks: an actor network and a critic network. The first state information and the first action information are input into the critic network to obtain a first state-action value for the current time step; then, the second state information of the next time step is input into the actor network to obtain the predicted action of the next time step, namely second action information, and the second state information and the second action information are input into the critic network to obtain a second state-action value for the next time step; finally, the difference between the first state-action value and the second state-action value is taken as the predicted reward.
It should be noted that the action information refers to the action taken by the virtual character in the environment, and the state information is the result of taking that action as reflected in the state of the game. For example, if the action information is shooting, the corresponding state information is the enemy losing blood, the enemy dying, and so on; if the action information is jumping onto a box, the corresponding state information is the character's height increasing by the height of the box. In the embodiment of the present invention, the state obtained from the game environment once every n frames is called a time step, and it can be understood that the current time step and the next time step are two consecutive time steps. A state-action value refers to the expected return of a state and an action; for example, the return is the accumulated value of the game rewards, and the expected return is the average of the returns over multiple games. The critic network is a neural network whose input is a state and an action and whose output is the state-action value; its parameters are adjusted through backpropagation of the loss function so that its output approaches the actual value.
Specifically, in steps S33 to S34, the reward mechanism indicates that the game character obtains a corresponding reward when certain conditions are satisfied during the game; for example, the reward mechanism includes: giving a positive reward when the operation state of the virtual character is a normal operation, and giving a negative reward when the operation state of the virtual character is an irregular operation. The critic network and the actor network are each further divided into an evaluation network and a target network, where the target network does not participate in training and its parameters are periodically copied from the evaluation network. The predicted action is output through the actor network. Illustratively, the mean of the squared differences between the predicted and actual rewards is taken as the loss function of the critic network, and the loss function of the actor network is the predicted value of the critic network (which the actor maximizes). The critic network loss function satisfies the following formula:
$$L(\theta^{Q}) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2$$

wherein $N$ is the total number of the sample data; $y_i = r_i + \gamma\,Q'\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$; $r_i$ is the reward of the $i$-th time step; $\gamma$ is a hyperparameter, the discount factor; $Q(s, a \mid \theta^{Q})$ is the critic network, whose parameters are $\theta^{Q}$; $\mu(s \mid \theta^{\mu})$ is the actor network, whose parameters are $\theta^{\mu}$; $Q'$ is the target network of the critic network; $\mu'$ is the target network of the actor network. The target network parameters do not participate in training; they are updated from the $Q$ network and the $\mu$ network at regular intervals, and the actor network updates its parameters by maximizing the predicted value of the critic network.
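The following PyTorch sketch implements this critic loss and the periodic target-network copies; the network classes, optimizers, batch format, and the critic's (state, action) call signature are assumptions, since the patent fixes only the DDPG structure and the formula above.

```python
import torch
import torch.nn.functional as F

def ddpg_update(critic, actor, target_critic, target_actor,
                critic_opt, actor_opt, batch, gamma=0.99):
    s, a, r, s_next = batch  # states, actions, rewards, next states

    with torch.no_grad():
        a_next = target_actor(s_next)                  # mu'(s_{i+1} | theta^{mu'})
        y = r + gamma * target_critic(s_next, a_next)  # y_i per the formula above

    critic_loss = F.mse_loss(critic(s, a), y)          # (1/N) sum (y_i - Q(s_i, a_i))^2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # The actor updates its parameters by maximizing the critic's predicted value.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

def sync_targets(critic, actor, target_critic, target_actor):
    # Target networks do not train; their parameters are copied periodically.
    target_critic.load_state_dict(critic.state_dict())
    target_actor.load_state_dict(actor.state_dict())
```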
Specifically, in step S4, after the predicted action output strategy is determined, the present game character is controlled to output the corresponding predicted action according to that strategy. When the present game character outputs the predicted action, the rationality of the predicted action needs to be judged, and if the predicted action is unreasonable, the sampling probability of the unreasonable action needs to be adjusted.
Optionally, controlling the present game character to output the predicted action according to the predicted action output strategy includes steps S411 to S413:
S411, when unreasonable actions exist in the predicted actions, acquiring an action limiting mechanism corresponding to the predicted actions;
S412, performing an action limiting operation on the unreasonable actions according to the action limiting mechanism so as to adjust the sampling probability of the unreasonable actions;
S413, outputting the predicted action after the action limiting operation is performed.
Specifically, the output of the AI model is limited by judging whether the predicted action conforms to normal rules. For discrete actions, the AI model outputs the probability of selecting each action; for unreasonable actions, the sampling probability is adjusted to 0 so that those actions cannot be sampled.
Illustratively, when the predicted action is moving path selection, the unreasonable action is: advancing in a direction where an obstacle exists; the corresponding action limiting mechanism is: adjusting the sampling probability of the forward direction in which an obstacle exists to 0. When the predicted action is attack target selection, the unreasonable action is: selecting an invisible attack target; the corresponding action limiting mechanism is: adjusting the sampling probability of the invisible attack target to 0. When the predicted action is the game action to take, the unreasonable action is: selecting a skill that is in the cooling state; the corresponding action limiting mechanism is: adjusting the sampling probability of the skill in the cooling state to 0.
For example, when the predicted action is selecting an attack target, the AI model may select an enemy that is not visible in the field of view, so the action output of the AI model needs to be processed. The model outputs the probabilities of selecting enemy 1, enemy 2, and enemy 3, and the sum of these probabilities is 1; if enemy 1 is not visible at this moment, the probability of selecting enemy 1 is adjusted to 0, so that only a visible enemy can be selected.
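A sketch of this masking step; the renormalization after zeroing is an assumption (the patent only states that the sampling probability is set to 0 so the action cannot be sampled):

```python
import torch

def mask_invalid_actions(probs: torch.Tensor, valid: torch.Tensor) -> torch.Tensor:
    # probs: model-output probabilities over discrete actions (sum to 1),
    #        e.g. one entry per selectable enemy.
    # valid: boolean mask, False for unreasonable actions (invisible target,
    #        blocked direction, skill on cooldown).
    masked = probs * valid.float()  # unreasonable actions get probability 0
    return masked / masked.sum(dim=-1, keepdim=True)

# Three enemies with enemy 1 invisible:
# mask_invalid_actions(torch.tensor([0.5, 0.3, 0.2]), torch.tensor([False, True, True]))
```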
Further, when the predicted action is the pitch angle of the present game character, controlling the present game character to output the predicted action according to the predicted action output strategy further includes steps S421 to S423:
S421, acquiring an application scene in which the present game character uses a pre-aiming skill;
S422, when the present game character cannot see an attack target while using the pre-aiming skill, outputting the angle predicted by the AI model as the pitch angle;
S423, when the present game character can see the attack target while using the pre-aiming skill, calculating the preset-position included angle between the present game character and the attack target as the pitch angle.
Illustratively, the pitch angle represents the vertical aiming angle the present game character adopts when using its pre-aiming skill. When the present game character cannot see the attack target while using the pre-aiming skill, the angle predicted by the AI model is output as the pitch angle, because without the attack target in its field of view the character cannot calculate the pitch angle directly. When the present game character can see the attack target while using the pre-aiming skill, the pitch angle can be calculated directly; for example, if the preset position is the head, the included angle between the horizontal line and the line connecting the present game character's head and the hostile game character's head is used as the pitch angle.
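A hypothetical helper for the visible-target case, assuming (x, y, z) coordinates with z pointing up and the preset position being the head:

```python
import math

def pitch_angle(own_head, target_head):
    # Angle, in degrees, between the line joining the two heads and the
    # horizontal plane; positive when the target's head is higher.
    dx = target_head[0] - own_head[0]
    dy = target_head[1] - own_head[1]
    dz = target_head[2] - own_head[2]
    return math.degrees(math.atan2(dz, math.hypot(dx, dy)))
```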
Further, after outputting the predicted action on which the action limiting operation has been performed in step S413, the method further includes step S414:
S414, inputting the predicted action on which the action limiting operation has been performed into the AI model as sample data to train the AI model.
For example, the predicted action on which the action limiting operation has been performed may be input into the AI model as sample data to train the AI model. If the action limiting were not performed and the predicted actions were input into the AI model directly, too many invalid samples would be generated and could not be used by the AI model. Therefore, to reduce invalid samples, the predicted actions are limited, which makes the training of the AI model faster; and the better the AI model is trained, the fewer unreasonable actions it outputs.
Compared with the prior art, the game character action control method disclosed in the embodiments of the present invention first acquires the current game state information of the present game character and the current game state information of a hostile game character in a special game state; then predicts the predicted position information of the hostile game character at the next moment according to the hostile game character's current game state information; and finally inputs the current game state information of the present game character and the predicted position information of the hostile game character into an AI model to determine the predicted action output strategy currently adopted by the present game character, so as to control the present game character to output the predicted action according to that strategy. Because the game state information of the hostile game character is fully considered when determining the strategy, the hostile game character's position at the next moment can be predicted, the present game character can formulate a corresponding strategy in advance, the skill hit rate of the present game character is improved, and the predicted output of the game character's actions is accurate and reasonable.
Referring to FIG. 2, FIG. 2 is a block diagram of a game character action control apparatus 10 according to an embodiment of the present invention. The game character action control apparatus 10 includes:
the game state information acquisition module 11, used for acquiring current game state information of the present game character and acquiring current game state information of a hostile game character in a special game state;
the predicted position information acquisition module 12, used for predicting the predicted position information of the hostile game character at the next moment according to the current game state information of the hostile game character;
the predicted action output strategy determining module 13, used for inputting the current game state information of the present game character and the predicted position information of the hostile game character into an AI model, and determining the predicted action output strategy currently adopted by the present game character;
and the predicted action output module 14, used for controlling the present game character to output the predicted action according to the predicted action output strategy.
Specifically, the game state information includes position information, attribute information, combat state information, skill information, and ray information used to detect obstacles around the AI (for AI characters only). The position information is the current position of the game character. The attribute information includes the weapon type of the game character, such as a firearm, other weapon attribute information (such as the total number of bullets of the weapon), and team information used to distinguish the present game character from hostile game characters. The combat state information includes the life value of the game character, whether an enemy has been killed, whether the character is in a squatting state, whether bullets remain (for AI characters only), whether the character is firing, whether the character is stealthed, and whether the AI is visible to the game character (for AI characters only). The skill information includes skill types encoding the skills in the game character's skill bar, and may further include the current status of each skill in the skill bar; for example, if the game character has three skills in its skill bar, the current status of each is either available or unavailable, and it can be understood that a skill is in the unavailable status while it is cooling down.
It should be noted that, before obtaining the current game state information of the present game character and the hostile game character, it is necessary to detect whether the present game character and the hostile game character meet a preset combat condition, where the combat condition includes: the hostile game character is within the attackable range of the present game character. There may be multiple hostile game characters within the attackable range of the present game character, and when several of them are in the special game state, the position information of more than one hostile game character at the next moment can be predicted at the same time.
Illustratively, the special game state includes at least one of: a blood volume below a preset blood volume threshold, and a skill support volume below a preset skill support volume threshold, where the blood volume represents the life value of the game character and the skill support volume represents the skill availability of the game character. When the blood volume or skill support volume is low, the action behavior of a hostile game character is easily constrained and therefore relatively regular, so for such a hostile game character the accuracy of predicting its position information at the next moment is relatively high. It should be noted that the blood volume threshold and the skill support volume threshold may be set according to different types of game characters, and the present invention is not limited herein.
Specifically, after obtaining the current game state information of the hostile game character, the predicted position information acquisition module 12 predicts the predicted position information of the hostile game character at the next moment according to that current game state information.
For example, the current game state information of the hostile game character may be input into a preset position prediction model, and the position prediction model outputs the predicted position information of the hostile game character at the next moment according to that information. When sample data are collected during training of the position prediction model, samples in which the enemy's blood volume is below the blood volume threshold and samples in which the skill support volume is below the skill support volume threshold are extracted separately; the game state information of the hostile game character is used as the training data (model input), and the character's position information at the next moment is extracted as the target data (model output). The prediction is performed with a multi-layer perceptron model, trained with the mean square error as the loss function. It should be noted that the specific application of the multi-layer perceptron model and of the mean square error as a loss function may refer to the prior art and is not described here.
Specifically, the predicted action output strategy determining module 13 inputs the current game state information of the present game character and the predicted position information of the hostile game character into an AI model, so that the AI model outputs the predicted action output strategy of the present game character; the predicted action output strategy includes at least one of a determination strategy of an attack target, a determination strategy of a game action, and a determination strategy of a moving path.
For example, when two hostile game characters (such as hostile game character A and hostile game character B) exist within the attackable range of the present game character, the AI model may determine the primary attack target according to the predicted position information of the hostile game characters, for example, selecting hostile game character A, whose predicted position is closer to the present game character, as the attack target. After determining the attack target, the AI model may formulate the moving path along which the present game character moves toward the attack target and the game action to take, such as attacking with a specified skill.
In order to keep the input dimensions of the AI model uniform when the predicted position information is input into it: the positions of hostile game characters that are not in the special game state are not predicted, so no predicted values exist for them; their position slots therefore need to be filled with 0, while the predicted position information of the hostile game characters in the special game state, for which predicted values exist, is input into the AI model.
Optionally, the AI model is obtained by training with a preset AI model training method, and the AI model training method includes:
S31, acquiring sample data; wherein the sample data includes game state data of a game character and an actual reward;
S32, inputting the game state data into the AI model to generate a predicted reward for the game character;
S33, calculating a loss function of the AI model according to the predicted reward and the actual reward; wherein the actual reward is calculated by a preset reward mechanism;
S34, optimizing the AI model according to the loss function until the loss function converges.
Specifically, in step S31, the game state data includes: first state information of the game character at the current time step, first action information at the current time step, and second state information at the next time step.
Specifically, in step S32, the AI model in the embodiment of the present invention is a DDPG model, which has two networks: an actor network and a critic network. The first state information and the first action information are input into the critic network to obtain a first state-action value for the current time step; then, the second state information of the next time step is input into the actor network to obtain the predicted action of the next time step, namely second action information, and the second state information and the second action information are input into the critic network to obtain a second state-action value for the next time step; finally, the difference between the first state-action value and the second state-action value is taken as the predicted reward.
It should be noted that the action information refers to the action taken by the virtual character in the environment, and the state information is the result of taking that action as reflected in the state of the game. For example, if the action information is shooting, the corresponding state information is the enemy losing blood, the enemy dying, and so on; if the action information is jumping onto a box, the corresponding state information is the character's height increasing by the height of the box. In the embodiment of the present invention, the state obtained from the game environment once every n frames is called a time step, and it can be understood that the current time step and the next time step are two consecutive time steps. A state-action value refers to the expected return of a state and an action; for example, the return is the accumulated value of the game rewards, and the expected return is the average of the returns over multiple games. The critic network is a neural network whose input is a state and an action and whose output is the state-action value; its parameters are adjusted through backpropagation of the loss function so that its output approaches the actual value.
Specifically, in steps S33 to S34, the reward mechanism indicates that the game character obtains a corresponding reward when certain conditions are satisfied during the game; for example, the reward mechanism includes: giving a positive reward when the operation state of the virtual character is a normal operation, and giving a negative reward when the operation state of the virtual character is an irregular operation. The critic network and the actor network are each further divided into an evaluation network and a target network, where the target network does not participate in training and its parameters are periodically copied from the evaluation network. The predicted action is output through the actor network. Illustratively, the mean of the squared differences between the predicted and actual rewards is taken as the loss function of the critic network, and the loss function of the actor network is the predicted value of the critic network (which the actor maximizes). The critic network loss function satisfies the following formula:
$$L(\theta^{Q}) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - Q(s_i, a_i \mid \theta^{Q})\right)^2$$

wherein $N$ is the total number of the sample data; $y_i = r_i + \gamma\,Q'\left(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)$; $r_i$ is the reward of the $i$-th time step; $\gamma$ is a hyperparameter, the discount factor; $Q(s, a \mid \theta^{Q})$ is the critic network, whose parameters are $\theta^{Q}$; $\mu(s \mid \theta^{\mu})$ is the actor network, whose parameters are $\theta^{\mu}$; $Q'$ is the target network of the critic network; $\mu'$ is the target network of the actor network. The target network parameters do not participate in training; they are updated from the $Q$ network and the $\mu$ network at regular intervals, and the actor network updates its parameters by maximizing the predicted value of the critic network.
Specifically, after the predicted action output strategy is determined, the predicted action output module 14 controls the present game character to output the corresponding predicted action according to that strategy. When the present game character outputs the predicted action, the rationality of the predicted action needs to be judged, and if the predicted action is unreasonable, the sampling probability of the unreasonable action needs to be adjusted.
Optionally, the predicted action output module 14 is configured to:
when unreasonable actions exist in the predicted actions, acquire an action limiting mechanism corresponding to the predicted actions;
perform an action limiting operation on the unreasonable actions according to the action limiting mechanism so as to adjust the sampling probability of the unreasonable actions;
and output the predicted action after the action limiting operation is performed.
Specifically, the output of the AI model is limited by judging whether the predicted action conforms to normal rules. For discrete actions, the AI model outputs the probability of selecting each action; for unreasonable actions, the sampling probability is adjusted to 0 so that those actions cannot be sampled.
Illustratively, when the predicted action is moving path selection, the unreasonable action is: advancing in a direction where an obstacle exists; the corresponding action limiting mechanism is: adjusting the sampling probability of the forward direction in which an obstacle exists to 0. When the predicted action is attack target selection, the unreasonable action is: selecting an invisible attack target; the corresponding action limiting mechanism is: adjusting the sampling probability of the invisible attack target to 0. When the predicted action is the game action to take, the unreasonable action is: selecting a skill that is in the cooling state; the corresponding action limiting mechanism is: adjusting the sampling probability of the skill in the cooling state to 0.
For example, when the predicted action is selecting an attack target, the AI model may select an enemy that is not visible in the field of view, so the action output of the AI model needs to be processed. The model outputs the probabilities of selecting enemy 1, enemy 2, and enemy 3, and the sum of these probabilities is 1; if enemy 1 is not visible at this moment, the probability of selecting enemy 1 is adjusted to 0, so that only a visible enemy can be selected.
Further, when the predicted action is the pitch angle of the present game character, the predicted action output module is further configured to:
acquire an application scene in which the present game character uses a pre-aiming skill;
when the present game character cannot see an attack target while using the pre-aiming skill, output the angle predicted by the AI model as the pitch angle;
and when the present game character can see the attack target while using the pre-aiming skill, calculate the preset-position included angle between the present game character and the attack target as the pitch angle.
Illustratively, the pitch angle represents the vertical aiming angle the present game character adopts when using its pre-aiming skill. When the present game character cannot see the attack target while using the pre-aiming skill, the angle predicted by the AI model is output as the pitch angle, because without the attack target in its field of view the character cannot calculate the pitch angle directly. When the present game character can see the attack target while using the pre-aiming skill, the pitch angle can be calculated directly; for example, if the preset position is the head, the included angle between the horizontal line and the line connecting the present game character's head and the hostile game character's head is used as the pitch angle.
Further, after outputting the predicted action on which the action limiting operation has been performed, the predicted action output module 14 is further configured to:
input the predicted action on which the action limiting operation has been performed into the AI model as sample data to train the AI model.
For example, the predicted action on which the action limiting operation has been performed may be input into the AI model as sample data to train the AI model. If the action limiting were not performed and the predicted actions were input into the AI model directly, too many invalid samples would be generated and could not be used by the AI model. Therefore, to reduce invalid samples, the predicted actions are limited, which makes the training of the AI model faster; and the better the AI model is trained, the fewer unreasonable actions it outputs.
Compared with the prior art, the game character action control apparatus 10 disclosed in the embodiments of the present invention first acquires the current game state information of the present game character and the current game state information of a hostile game character in a special game state; then predicts the predicted position information of the hostile game character at the next moment according to the hostile game character's current game state information; and finally inputs the current game state information of the present game character and the predicted position information of the hostile game character into an AI model to determine the predicted action output strategy currently adopted by the present game character, so as to control the present game character to output the predicted action according to that strategy. Because the game state information of the hostile game character is fully considered when determining the strategy, the hostile game character's position at the next moment can be predicted, the present game character can formulate a corresponding strategy in advance, the skill hit rate of the present game character is improved, and the predicted output of the game character's actions is accurate and reasonable.
Referring to FIG. 3, FIG. 3 is a block diagram of a game character action control device 20 according to an embodiment of the present invention. The game character action control device 20 includes: a processor 21, a memory 22, and a computer program stored in the memory 22 and executable on the processor 21. The processor 21, when executing the computer program, implements the steps of the game character action control method embodiments described above. Alternatively, the processor 21 may implement the functions of the modules/units in the above-described apparatus embodiments when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 22 and executed by the processor 21 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of accomplishing specific functions, used to describe the execution of the computer program in the game character action control device 20.
The game character action control device 20 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The game character action control device 20 may include, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will appreciate that the schematic diagram is merely an example of the game character action control device 20 and does not constitute a limitation; the device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the game character action control device 20 may further include input and output devices, network access devices, buses, and the like.
The processor 21 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor. The processor 21 is the control center of the game character action control device 20 and connects the parts of the entire game character action control device 20 using various interfaces and lines.
The memory 22 may be used to store the computer program and/or modules, and the processor 21 implements the various functions of the game character action control device 20 by running or executing the computer program and/or modules stored in the memory 22 and invoking the data stored in the memory 22. The memory 22 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to use (such as audio data or a phone book). In addition, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The modules/units integrated in the game character action control device 20 may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as an independent product. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments through a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when executed by the processor 21, the computer program implements the steps of each of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on.
It should be noted that the above-described apparatus embodiments are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection relation between modules indicates a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims (8)

1. A game character action control method, comprising:
acquiring current game state information of the game character, and acquiring current game state information of a hostile game character in a special game state;
predicting the predicted position information of the hostile game character at the next moment according to the current game state information of the hostile game character;
inputting the current game state information of the game character and the predicted position information of the hostile game character into an AI model, and determining a predicted action output strategy currently adopted by the game character; the predicted action output strategy comprises at least one of an attack target determination strategy, a game action determination strategy, and a moving path determination strategy;
and controlling the game character to output a predicted action according to the predicted action output strategy;
wherein the controlling the game character to output a predicted action according to the predicted action output strategy comprises:
when an unreasonable action exists among the predicted actions, acquiring an action limiting mechanism corresponding to the predicted actions;
performing an action limiting operation on the unreasonable action according to the action limiting mechanism, so as to adjust the sampling probability of the unreasonable action;
and outputting the predicted action after the action limiting operation is completed.
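By way of illustration only, and not as part of the claimed subject matter: one plausible reading of the action limiting operation in claim 1 is that the sampling probability of an unreasonable action is driven to zero before the predicted action is sampled from the actor network's output distribution. The minimal Python sketch below assumes a discrete action space and a softmax policy output; the names limit_and_sample and unreasonable are hypothetical and do not appear in the patent.

    import numpy as np

    def limit_and_sample(action_probs: np.ndarray, unreasonable: np.ndarray) -> int:
        # action_probs: actor-network softmax output, shape (n_actions,)
        # unreasonable: boolean mask, True where an action is unreasonable
        # in the current game state (e.g. firing while out of ammunition)
        limited = np.where(unreasonable, 0.0, action_probs)  # suppress masked actions
        total = limited.sum()
        if total == 0.0:
            # Every action was masked; fall back to a uniform distribution.
            limited = np.full_like(action_probs, 1.0 / len(action_probs))
        else:
            limited = limited / total  # renormalize so probabilities sum to 1
        # Sample the predicted action from the adjusted distribution.
        return int(np.random.choice(len(limited), p=limited))

Sampling from the adjusted distribution, rather than hard-coding a replacement action, keeps the output stochastic while making a masked action impossible to select.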
2. The game character action control method according to claim 1, wherein after outputting the predicted action on which the action limiting operation has been performed, the method further comprises:
inputting the predicted action on which the action limiting operation has been performed into the AI model as sample data, so as to train the AI model.
3. The game character action control method according to claim 1, wherein the predicted action is a pitch angle of the game character; the controlling the game character to output a predicted action according to the predicted action output strategy then further comprises:
acquiring an application scene in which the game character uses a pre-aiming skill;
when the game character does not see an attack target, using the pre-aiming skill and outputting the angle predicted by the AI model as the pitch angle;
and when the game character sees the attack target, using the pre-aiming skill and calculating a preset position included angle between the game character and the attack target as the pitch angle.
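By way of illustration only: claim 3 distinguishes two pre-aiming cases, and the geometric case can be read as the elevation angle between the two characters' positions. The sketch below assumes (x, y, z) world coordinates with z as height; pre_aim_pitch and its parameters are hypothetical names, and the "preset position included angle" may be defined differently in a concrete game.

    import math

    def pre_aim_pitch(model_pitch, self_pos, target_pos, target_visible):
        # model_pitch: pitch angle predicted by the AI model (radians)
        # self_pos / target_pos: (x, y, z) world positions, z is height
        if not target_visible:
            # Target not seen: output the AI-model-predicted angle directly.
            return model_pitch
        # Target seen: compute the included (elevation) angle between positions.
        dx = target_pos[0] - self_pos[0]
        dy = target_pos[1] - self_pos[1]
        dz = target_pos[2] - self_pos[2]
        horizontal = math.hypot(dx, dy)  # distance in the ground plane
        return math.atan2(dz, horizontal)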
4. The game character action control method according to claim 1, wherein the AI model comprises a critic network and an actor network, and the predicted action is output through the actor network.
5. The game character action control method according to any one of claims 1 to 4, wherein the AI model is trained by a preset AI model training method, the AI model training method comprising:
acquiring sample data, wherein the sample data comprises game state data of a game character and an actual reward;
inputting the game state data into the AI model to generate a predicted reward for the game character;
calculating a loss function of the AI model according to the predicted reward and the actual reward, wherein the actual reward is calculated by a preset reward mechanism;
and optimizing the AI model according to the loss function until the loss function converges.
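By way of illustration only: claims 4 and 5 together suggest an actor-critic arrangement in which the critic's predicted reward is fitted to the actual reward produced by the preset reward mechanism. The PyTorch sketch below shows the critic side of one such training step; the network shape, the Adam optimizer, and the mean-squared-error loss are assumptions, not details disclosed by the patent.

    import torch
    import torch.nn as nn

    class Critic(nn.Module):
        # Maps game state data to a predicted reward (a single scalar).
        def __init__(self, state_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1)
            )

        def forward(self, state: torch.Tensor) -> torch.Tensor:
            return self.net(state)

    def train_step(critic, optimizer, states, actual_rewards):
        # One optimization step of claim 5: generate predicted rewards,
        # compute the loss against the actual rewards, and update the model.
        predicted = critic(states).squeeze(-1)
        loss = nn.functional.mse_loss(predicted, actual_rewards)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()  # caller repeats steps until the loss converges

    # Hypothetical usage:
    #   critic = Critic(state_dim=32)
    #   optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

Under this reading, the feedback of restricted actions as sample data in claim 2 would slot into the same loop as additional training samples.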
6. A game character action control device, comprising:
a game state information acquisition module, configured to acquire current game state information of the game character and to acquire current game state information of a hostile game character in a special game state;
a predicted position information acquisition module, configured to predict the predicted position information of the hostile game character at the next moment according to the current game state information of the hostile game character;
a predicted action output strategy determination module, configured to input the current game state information of the game character and the predicted position information of the hostile game character into an AI model and to determine a predicted action output strategy currently adopted by the game character, wherein the predicted action output strategy comprises at least one of an attack target determination strategy, a game action determination strategy, and a moving path determination strategy;
and a predicted action output module, configured to control the game character to output a predicted action according to the predicted action output strategy;
wherein the predicted action output module is configured to:
when an unreasonable action exists among the predicted actions, acquire an action limiting mechanism corresponding to the predicted actions;
perform an action limiting operation on the unreasonable action according to the action limiting mechanism, so as to adjust the sampling probability of the unreasonable action;
and output the predicted action after the action limiting operation is completed.
7. A game character action control apparatus, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor implements the game character action control method according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device on which the computer-readable storage medium is located to perform the game character action control method according to any one of claims 1 to 5.
CN202110769501.6A 2021-07-07 2021-07-07 Game character action control method, apparatus, device and storage medium Active CN113577769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110769501.6A CN113577769B (en) 2021-07-07 2021-07-07 Game character action control method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN113577769A (en) 2021-11-02
CN113577769B (en) 2024-03-08

Family

ID=78246226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110769501.6A Active CN113577769B (en) 2021-07-07 2021-07-07 Game character action control method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN113577769B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115228087A (en) * 2022-07-18 2022-10-25 深圳市大梦龙途文化传播有限公司 Game role control optimization method, device, electronic equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106390456A (en) * 2016-09-30 2017-02-15 腾讯科技(深圳)有限公司 Generating method and generating device for role behaviors in game
KR20180083703A (en) * 2017-01-13 2018-07-23 주식회사 엔씨소프트 Method of decision making for a fighting action game character based on artificial neural networks and computer program therefor
CN111632379A (en) * 2020-04-28 2020-09-08 腾讯科技(深圳)有限公司 Game role behavior control method and device, storage medium and electronic equipment
KR102154828B1 (en) * 2019-05-13 2020-09-10 숭실대학교산학협력단 Method and apparatus for predicting game results
CN112221120A (en) * 2020-10-15 2021-01-15 网易(杭州)网络有限公司 Game state synchronization method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113577769A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
US11291917B2 (en) Artificial intelligence (AI) model training using cloud gaming network
KR102523888B1 (en) Method, Apparatus and Device for Scheduling Virtual Objects in a Virtual Environment
US20220054947A1 (en) Method and apparatus for providing online shooting game
US9679258B2 (en) Methods and apparatus for reinforcement learning
CN108920221B (en) Game difficulty adjusting method and device, electronic equipment and storage medium
CN109529352B (en) Method, device and equipment for evaluating scheduling policy in virtual environment
Zhong et al. Ad-vat+: An asymmetric dueling mechanism for learning and understanding visual active tracking
JP7524451B2 (en) Automated Harassment Monitoring System
CN112791394B (en) Game model training method and device, electronic equipment and storage medium
CN112807681B (en) Game control method, game control device, electronic equipment and storage medium
CN113134233B (en) Control display method and device, computer equipment and storage medium
CN111026272B (en) Training method and device for virtual object behavior strategy, electronic equipment and storage medium
JP7447296B2 (en) Interactive processing method, device, electronic device and computer program for virtual tools
WO2022127277A1 (en) Cheating detection method for shooting game, model training method for shooting game, and device and storage medium
CN113663335B (en) AI model training method, device, equipment and storage medium for FPS game
Poulsen et al. DLNE: A hybridization of deep learning and neuroevolution for visual control
CN113577769B (en) Game character action control method, apparatus, device and storage medium
CN114307160A (en) Method for training intelligent agent
CN113509726B (en) Interaction model training method, device, computer equipment and storage medium
CN117311392A (en) Unmanned aerial vehicle group countermeasure control method and system
CN111265871A (en) Virtual object control method and device, equipment and storage medium
CN114935893B (en) Motion control method and device for aircraft in combat scene based on double-layer model
CN114404976B (en) Training method and device for decision model, computer equipment and storage medium
JP6843410B1 (en) Programs, information processing equipment, methods
CN114611664A (en) Multi-agent learning method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant