CN110496377B - Virtual table tennis player ball hitting training method based on reinforcement learning - Google Patents

Virtual table tennis player ball hitting training method based on reinforcement learning

Info

Publication number
CN110496377B
CN110496377B (application number CN201910763946.6A)
Authority
CN
China
Prior art keywords
ball
table tennis
racket
dimensional
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910763946.6A
Other languages
Chinese (zh)
Other versions
CN110496377A (en)
Inventor
李桂清
曾繁忠
黎子聪
吴自辉
聂勇伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201910763946.6A priority Critical patent/CN110496377B/en
Publication of CN110496377A publication Critical patent/CN110496377A/en
Application granted granted Critical
Publication of CN110496377B publication Critical patent/CN110496377B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • A HUMAN NECESSITIES
    • A63 SPORTS; GAMES; AMUSEMENTS
    • A63B APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B69/00 Training appliances or apparatus for special sports
    • A63B71/00 Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06 Indicating or scoring devices for games or players, or for other sports activities
    • A63B71/0619 Displays, user interfaces and indicating devices, specially adapted for sport equipment, e.g. display mounted on treadmills
    • A63B2071/065 Visualisation of specific exercise parameters
    • A63B2102/00 Application of clubs, bats, rackets or the like to the sporting activity; particular sports involving the use of balls and clubs, bats, rackets, or the like
    • A63B2102/16 Table tennis

Abstract

The invention discloses a virtual table tennis player ball hitting training method based on reinforcement learning, which comprises the following steps: 1) designing a task scene and a task flow; 2) training a ball hitting strategy for the racket using a reinforcement learning method; 3) estimating the motion of each joint of the human body during the stroke using an inverse kinematics algorithm; 4) training a movement strategy for the root node using reinforcement learning. By designing a simple reward function and without any training data, the invention obtains a virtual player that hits the ball with a reasonable posture and high accuracy, and no complex hitting rules need to be designed; at the same time, owing to the low computational cost of the forward pass of the reinforcement learning policy, the virtual player can hit the ball at a stably high frame rate, giving the user a good interactive experience.

Description

Virtual table tennis player ball hitting training method based on reinforcement learning
Technical Field
The invention relates to the field of virtual reality and reinforcement learning, in particular to a virtual table tennis player batting training method based on reinforcement learning.
Background
Virtual reality is one of the intensively researched subjects in the computer field. In recent years, with the advent and development of virtual reality equipment such as the HTC Vive and Oculus, virtual reality technology has reached a new height, and its applications are nearly unlimited; it is now widely used in fields such as the military, education and entertainment. As virtual reality equipment becomes cheaper and more widely available, people interact with virtual reality applications more often and more deeply, and their expectations of quality keep rising: users not only want the perceived virtual scene to be indistinguishable from a real one, but also want more freedom to interact with the virtual scene and to receive feedback that is as realistic as possible. Virtual characters can give the user a sense of immersion through the actions they take in a virtual scene; this sense of immersion and realism derives from the intelligence and reasonableness of those actions, and for virtual applications in which the characters are human, one would like the behavior decisions and actions of the virtual human to be as similar as possible to those of a real human.
Intelligent virtual characters, which can be defined as characters that act autonomously in a virtual environment or generate feedback in response to environmental changes, are one of the research subjects in the field of artificial intelligence. The core of an intelligent virtual character is its action policy: given the state of the environment the character is in, the policy outputs the action to be taken. There is much related work on designing action policies for intelligent virtual characters, and the mainstream approaches fall into two categories: rule-based methods and machine-learning-based methods. A rule-based method means that the action policy of the character is specified by hand, and includes policy design methods based on logic, state machines, strategy trees and the like. The main problem with rule-based methods is that rules become difficult to set when the problem grows more complex. For example, designing a virtual character that shoots inside a maze requires deciding how to move through a complex maze, when to shoot, and where to shoot; writing rules that let the character make such complex decisions requires very intricate character logic and also greatly increases the amount of computation. With the development of machine learning, machine learning methods can greatly simplify the design of complex virtual characters. Reinforcement learning is a branch of machine learning that performs well on tasks with well-defined objectives.
In the task of designing a virtual table tennis player, a rule-based method needs complex hand-designed hitting rules, which are difficult to design and costly to run, while methods based on imitation learning or supervised learning need training data to be collected, so the training cost is high.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an effective, scientific and reasonable virtual table tennis player hitting training method based on reinforcement learning.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: a virtual table tennis player hitting training method based on reinforcement learning comprises the following steps:
1) designing a task scene and a task flow;
2) training a batting strategy of the racket by using a reinforcement learning method;
3) estimating the motion condition of each joint when the human body hits the ball by using an algorithm of inverse kinematics;
4) the mobility policy of the root node is trained using reinforcement learning.
In step 1), designing the task scene means modeling a virtual table tennis player and building a virtual table tennis court in Unity3D, setting the size of the court, the position and size of the table tennis table, the height of the net, the size of the table tennis ball, the size of the racket collision bounding box, and the origin of the world coordinate system;
Designing the task flow specifically comprises: when the table tennis ball touches a wall or the floor, or the bat hits the ball onto the table surface or the net at end B of the table, the round ends and the stroke is deemed to have failed; when the bat hits the ball onto the table surface at end A of the table, the round ends and the stroke is deemed successful; when the round ends, the table tennis bat is reset to its initial position to wait for the next round to start.
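As a concrete illustration of this task flow, the following minimal Python sketch encodes the round-termination and success rules just described; the object and field names (event, ball_hit_wall, and so on) are assumptions for illustration and are not part of the patented Unity3D implementation.

```python
# Hedged sketch of the round logic described above; all names are hypothetical.

def round_result(event):
    """Return 'fail', 'success', or None while the round is still running."""
    # Ball touched a wall or the floor: round over, stroke failed.
    if event.ball_hit_wall or event.ball_hit_floor:
        return "fail"
    # Bat sent the ball onto the table surface at end B, or into the net: failed.
    if event.ball_landed_on == "table_end_B" or event.ball_hit_net:
        return "fail"
    # Bat sent the ball onto the table surface at end A: stroke succeeded.
    if event.ball_landed_on == "table_end_A":
        return "success"
    return None  # round continues

def on_round_end(bat):
    # Reset the bat to its initial position and wait for the next round.
    bat.reset_to_initial_position()
```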
In step 2), training a batting strategy by using a neural network based on a reinforcement learning method, and comprising the following steps:
2.1) design observations
The observation refers to the data collected by the virtual character in the virtual environment. The observation for the racket's ball hitting strategy training is set to 4 three-dimensional vectors {p_ball, v_ball, p_bat, r_bat}, where p_ball is the position of the table tennis ball, v_ball is the velocity of the table tennis ball, p_bat is the position of the table tennis bat, and r_bat is the rotation angle of the table tennis bat divided by 360 degrees. An observation is collected once per frame, and the observations collected over every 3 frames are used as the input of the racket's ball hitting strategy training network;
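To make the observation layout concrete, the sketch below stacks the 4 three-dimensional vectors of the last 3 frames into the 36-dimensional network input mentioned in step 2.4); the helper names and NumPy usage are assumptions, not part of the patent.

```python
import numpy as np
from collections import deque

def frame_observation(ball_pos, ball_vel, bat_pos, bat_rot_deg):
    """One frame of observation: {p_ball, v_ball, p_bat, r_bat} = 12 floats."""
    r_bat = np.asarray(bat_rot_deg, dtype=np.float32) / 360.0  # rotation / 360 degrees
    return np.concatenate([ball_pos, ball_vel, bat_pos, r_bat]).astype(np.float32)

history = deque(maxlen=3)  # append one frame_observation(...) per frame

def network_input():
    """Concatenate the last 3 frames: 3 x 12 = 36-dimensional policy input."""
    assert len(history) == 3, "need 3 collected frames"
    return np.concatenate(list(history))
```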
2.2) design behavior
The behavior is estimated from the observation data. The behavior for the racket's ball hitting strategy training of the virtual character is a 9-dimensional vector {T_ballbat, R_ballbat, S, C, F}, where T_ballbat is the three-dimensional translation of the racket, each component of which is normalized to between 0 and 1 and multiplied by weight coefficients w_x, w_y and w_z that control the moving speed of the racket in the three directions, ensuring a reasonable racket motion speed; R_ballbat is the vector of rotation angles of the racket about the three coordinate axes, multiplied at actual output by weight coefficients w_u, w_v and w_w; S determines the moment of entering the stroke preparation action, and when S is greater than zero the racket enters the preparation action; C selects the stroke action: when C is greater than 0 the ball is hit with a forehand stroke, and when C is less than 0 the ball is hit with a backhand stroke; F determines the hitting force: when the bat collides with the table tennis ball, a force C_F + w_F·F is applied to the ball along the racket-face direction, where C_F is the basic hitting force and w_F is the weight of F; so that F can control the hitting force more precisely, w_F should not be too large; C_F = -0.4 + 0.2 × Z_d, where Z_d is the z-direction value of the desired drop point;
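The following sketch shows how one 9-dimensional behavior vector could be decoded into racket motion and hitting parameters as described above; the bat interface and parameter passing are assumptions, and the weight values are supplied by the caller (the embodiment later gives concrete values such as w_x = 0.25 and w_F = 1).

```python
import numpy as np

def apply_action(action, bat, w_t, w_r, w_f, c_f):
    """Decode the 9-dimensional behavior {T, R, S, C, F} for the racket (sketch).

    w_t = (w_x, w_y, w_z) and w_r = (w_u, w_v, w_w) are the weight coefficients;
    c_f is the basic hitting force, set in the text as c_f = -0.4 + 0.2 * Z_d.
    """
    t = np.asarray(action[0:3]) * np.asarray(w_t)   # scaled 3D translation
    r = np.asarray(action[3:6]) * np.asarray(w_r)   # scaled rotation angles
    s, c, f = action[6], action[7], action[8]
    bat.translate(t)
    bat.rotate(r)
    if s > 0:
        bat.enter_preparation()                     # S > 0: enter preparation action
    bat.stroke = "forehand" if c > 0 else "backhand"  # C selects the stroke type
    bat.hit_force = c_f + w_f * f                   # force applied along the racket face
```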
2.3) design reward function
The reward function for the ball hitting strategy training is set as follows:
R_bat = w_limit·R_limit + w_goal·R_goal + w_support·R_support
The reward function comprises three kinds of terms, namely constraint, goal and auxiliary terms;
R_limit is a function for constraining the behavior of the racket; it penalizes unreasonable situations that should not occur:
[formula image omitted]
R_positionlimit is a function of the racket's range of motion, used to limit the range of motion of the racket:
[formula image omitted]
R_actionlimit is a function of whether the racket has entered the preparation action, used to constrain whether the racket enters the preparation action:
[formula image omitted]
w_limit is the weight of R_limit;
R_goal is a goal-driven function, used to drive the character to complete the game objective:
[formula image omitted]
w_goal is the weight of R_goal;
R_support is an auxiliary function that drives the racket to complete the ball hitting task through a series of prior knowledge:
R_support = R_hit + R_angle + R_height + R_droppoint
R_hit is a function of the hitting behavior; it ensures that when the racket hits the ball, positive feedback is given regardless of whether the ball can be returned to the opponent's table, and that when the racket fails to hit the ball, negative feedback is given:
[formula image omitted]
R_angle is a function of the hitting angle of the table tennis ball, used to measure the hitting angle:
[formula image omitted]
where the two quantities involved are the projection onto the x-z plane of the racket-face normal vector at the moment the ball contacts the bat and the projection onto the x-z plane of the ball's instantaneous velocity at that moment;
R_height is a function of the hitting height and elevation angle of the table tennis ball, used to measure the hitting height and elevation angle:
[formula image omitted]
where h is the height at which the ball contacts the bat, and n_y is the projection onto the y-axis of the racket-face normal direction at the moment the ball contacts the bat;
R_droppoint is a function of the drop point of the table tennis ball, used to measure the drop point:
R_droppoint = 1 - |p_z - Z_g|
where p_z is the z-direction value of the ball's drop point and Z_g is a point in the middle-to-back area of the opponent's table surface; when the ball lands near Z_g it is unlikely to go out of bounds or into the net;
w_support is the weight of R_support;
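To show how these terms combine, here is a hedged sketch of R_bat; the individual term values are passed in already evaluated, since their piecewise definitions appear only as formula images in the original, and the default weights follow the embodiment described later.

```python
def batting_reward(terms, w_limit=100.0, w_goal=2.0, w_support=1.0):
    """Sketch of R_bat = w_limit*R_limit + w_goal*R_goal + w_support*R_support.

    `terms` is a dict of already-evaluated reward terms; treating R_limit as the
    sum of its position and preparation-action parts is an assumption.
    """
    r_limit = terms["position_limit"] + terms["action_limit"]   # R_limit (assumed sum)
    r_goal = terms["goal"]                                      # stroke success / failure
    r_support = (terms["hit"] + terms["angle"]
                 + terms["height"] + terms["droppoint"])        # R_support
    return w_limit * r_limit + w_goal * r_goal + w_support * r_support
```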
2.4) design network and training parameters
The input of the neural network is set to a 36-dimensional vector and the output to a 9-dimensional vector; the whole network comprises 4 hidden layers and 1 output layer, each hidden layer containing 512 neurons. After the network is built, it is trained with the proximal policy optimization (PPO) algorithm to obtain the ball hitting strategy.
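The patent specifies only the layer sizes and the PPO algorithm, not a framework; the sketch below therefore uses PyTorch as an assumed framework to build both policy networks described in this document (the 36-to-9 hitting network here and the 54-to-3 root movement network of step 4.4).

```python
import torch.nn as nn

def make_policy(in_dim, out_dim, hidden=512, n_hidden=4):
    """MLP with n_hidden hidden layers of `hidden` neurons plus one output layer."""
    layers, d = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

# Ball hitting strategy: 36-dim observation -> 9-dim behavior, 4 hidden layers.
batting_policy = make_policy(36, 9, n_hidden=4)
# Root node movement strategy (step 4.4): 54-dim observation -> 3-dim behavior, 3 hidden layers.
root_move_policy = make_policy(54, 3, n_hidden=3)
# Both networks are then optimized with a PPO implementation (training loop not shown).
```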
In step 3), the motion of each joint is estimated using an inverse kinematics algorithm, so that unreasonable stretching and twisting of the posture do not occur; this comprises the following steps:
3.1) simplifying the skeleton of the three-dimensional human body model using a full-body inverse kinematics algorithm, namely the Full Body Biped IK algorithm; the simplified skeleton has 14 joints, namely the crotch, the head, the left and right thighs, the left and right shanks, the left and right soles, the left and right upper arms, the left and right forearms and the left and right palms, and the left and right shoulders, left and right thighs, left and right feet and left and right hands each contain an effector;
3.2) binding the handle of the racket to the end effector of the right hand; when the racket moves, the Full Body Biped IK algorithm treats the right arm as a joint chain and solves the position of each joint point on the right-arm joint chain with the FABRIK algorithm, which solves the inverse kinematics problem iteratively;
3.3) using the Full Body Biped IK algorithm to adjust the positions of all body joint points correspondingly, within a small range, according to the change of the right arm, thereby solving for the positions of all joint points;
3.4) using the Full Body Biped IK algorithm to calculate the motion of each joint point of the three-dimensional human body model under the condition that the right hand holds the racket and the root node is not moved.
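For reference, the FABRIK solve used in step 3.2) on the right-arm joint chain can be sketched as the textbook backward/forward reaching iteration below; this is a generic single-chain version under the assumption of a fixed chain root, not the Full Body Biped IK implementation itself.

```python
import numpy as np

def fabrik(joints, target, tol=1e-3, max_iter=20):
    """Generic FABRIK solve for one joint chain (root fixed at joints[0])."""
    p = [np.asarray(j, dtype=float) for j in joints]
    d = [np.linalg.norm(p[i + 1] - p[i]) for i in range(len(p) - 1)]
    root, target = p[0].copy(), np.asarray(target, dtype=float)
    if np.linalg.norm(target - root) > sum(d):      # target out of reach:
        for i in range(len(d)):                     # stretch the chain toward it
            lam = d[i] / np.linalg.norm(target - p[i])
            p[i + 1] = (1 - lam) * p[i] + lam * target
        return p
    for _ in range(max_iter):
        if np.linalg.norm(p[-1] - target) < tol:
            break
        p[-1] = target                              # backward reaching pass
        for i in range(len(p) - 2, -1, -1):
            lam = d[i] / np.linalg.norm(p[i + 1] - p[i])
            p[i] = (1 - lam) * p[i + 1] + lam * p[i]
        p[0] = root                                 # forward reaching pass
        for i in range(len(p) - 1):
            lam = d[i] / np.linalg.norm(p[i + 1] - p[i])
            p[i + 1] = (1 - lam) * p[i] + lam * p[i + 1]
    return p
```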
In step 4), because inverse kinematics does not move the root node, the movement of the root node is controlled by a root node movement strategy based on reinforcement learning; combined with the inverse kinematics, this makes the overall human body posture more reasonable. It comprises the following steps:
4.1) design observation
The observation for the root node movement strategy training is set to 6 three-dimensional vectors {p_agent, r_agent, n_spine, p_ref, r_ref, v_ref}, where p_agent is the position of the human body model, r_agent is the orientation of the human body model, n_spine is the direction of the human spine, p_ref is the position of the racket, r_ref is the orientation of the racket, and v_ref is the instantaneous velocity of the racket. An observation is collected once per frame, and the observations collected over every 3 frames are used as the input of the root node movement strategy training network;
4.2) design behavior
The behavior for the root node movement strategy training is set to a 3-dimensional vector {t_x, t_z, r_y}, where t_x and t_z represent the movement of the human body model along the x-axis and the z-axis respectively, and r_y is the rotation of the human body model about the y-axis; t_x, t_z and r_y are all automatically normalized into the interval [-1, 1] and multiplied by weight coefficients before output (the coefficients are given as formula images in the original and are not reproduced here);
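A minimal sketch of applying this 3-dimensional behavior to the root node follows; the root interface is hypothetical and the weight coefficients w_tx, w_tz, w_ry stand in for the image-only coefficients mentioned above.

```python
import numpy as np

def apply_root_action(action, root, w_tx, w_tz, w_ry):
    """Apply the root node behavior {t_x, t_z, r_y}, each component in [-1, 1]."""
    t_x, t_z, r_y = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    root.translate(x=w_tx * t_x, z=w_tz * t_z)   # move along the x and z axes
    root.rotate_y(w_ry * r_y)                    # rotate about the y-axis
```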
4.3) design reward function
The reward function for the root node movement strategy training is set as follows:
R_move = w_plimit·R_plimit + w_leave·R_leave + w_pose·R_pose + w_deviation·R_deviation
R_plimit is a function of the human body model's range of motion, used to limit the range of motion of the human body model:
[formula image omitted]
w_plimit is the weight of R_plimit;
R_leave is a function of the distance between the end of the racket handle and the hand; by measuring this distance it prevents the racket from leaving the hand:
[formula image omitted]
where d is the distance between the racket handle and the palm, p_hand is the three-dimensional coordinate of the palm, and p_bat is the three-dimensional coordinate of the racket handle;
w_leave is the weight of R_leave;
R_pose is a function of the stroke posture, used to measure the reasonableness of the stroke posture:
[formula images omitted]
where R_forehand and R_backhand are the reward functions corresponding to the forehand stroke and the backhand stroke respectively; cos α represents the angle between the line connecting the racket-holding hand to the root node and the unit vector in the x direction of the local coordinate system of the three-dimensional human body model; p_hand is the three-dimensional world coordinate of the racket-holding hand, p_root is the three-dimensional world coordinate of the root node, and the x-direction unit vector is (1, 0, 0) in the local coordinate system of the SMPL human body model;
w_pose is the weight of R_pose;
R_deviation is a function of the offset of the human body model's spine; the reward is formulated according to this spinal offset:
[formula images omitted]
where cos β defines the offset of the human spine and is computed from the three-dimensional coordinates of the human body model's neck and root node under the current action and the three-dimensional coordinates of the neck and root node under the initial action;
w_deviation is the weight of R_deviation;
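Both R_pose and R_deviation reduce to the cosine of an angle between two vectors. The sketch below computes cos α (the racket-holding hand relative to the root node, measured against the local x axis) and cos β (the current versus initial neck-to-root direction); the vector directions chosen here, and the mapping from these cosines to reward values, are assumptions, since the exact formulas appear only as images in the original.

```python
import numpy as np

def _cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def cos_alpha(p_hand, p_root, local_x=(1.0, 0.0, 0.0)):
    """Cosine of the angle between the root-to-hand line and the model's local x axis."""
    return _cos(np.subtract(p_hand, p_root), np.asarray(local_x))

def cos_beta(neck_now, root_now, neck_init, root_init):
    """Spinal offset: angle between the current and initial neck-to-root directions."""
    return _cos(np.subtract(neck_now, root_now), np.subtract(neck_init, root_init))
```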
4.4) design network and training parameters
The input of the neural network is set to a 54-dimensional vector and the output to a 3-dimensional vector; the whole network comprises 3 hidden layers and 1 output layer, each hidden layer containing 512 neurons. After the network is built, it is trained with the proximal policy optimization (PPO) algorithm to obtain the root node movement strategy.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The virtual player designed by the invention completes the ball hitting task with reasonable stroke actions and a high success rate, rarely hits the ball into the net or onto its own table surface, and can accurately return an incoming table tennis ball to the opponent's table surface.
2. The invention does not need complex hand-designed hitting rules, so the design difficulty and running cost are low.
3. The invention obtains a virtual player that hits the ball with a reasonable posture and high accuracy by designing a simple reward function, without any training data and without collecting data in advance.
4. Owing to the low computational cost of the forward pass of the reinforcement learning policy, the virtual player designed by the invention can hit the ball at a stably high frame rate, giving the user a good interactive experience.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a task scenario design diagram of the present invention.
Fig. 3 is a schematic view of the racket being too far forward, causing it to leave the hand.
Fig. 4 is a schematic diagram of a reasonable position of the racket.
Fig. 5 is a schematic diagram of a hitting strategy training network according to the present invention.
FIG. 6 is a schematic view of the human body model penetrating the table when positioned too far forward.
Fig. 7 is a schematic view of a ball hitting sequence.
Detailed Description
The present invention will be further described with reference to the following specific examples.
In this embodiment, a virtual table tennis player is designed that can return a table tennis ball to the opponent's table surface with a reasonable ball hitting posture and a high success rate. The main flow is shown in fig. 1, and the virtual table tennis player ball hitting training method based on reinforcement learning includes the following steps:
1) designing task scene and task flow
Designing the task scene: the virtual table tennis player is modeled with the SMPL model, and a virtual table tennis court is built in Unity3D. As shown in (a) of FIG. 2, the court size is 8 m × 16 m with 4 m high walls on all 4 sides, and a table tennis table is placed in the middle of the court; as shown in (b) of FIG. 2, the table size is 2.74 m × 1.525 m × 0.76 m, the net height is 0.1525 m, the diameter of the table tennis ball is 0.04 m, the racket size is 0.158 m × 0.152 m, the racket collision bounding box is 0.16 m × 0.1 m × 0.18 m, and the origin of the world coordinate system is the center of the court.
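For readability, the scene dimensions of this embodiment can be gathered into one set of constants; the sketch below only restates the numbers given above (in meters), and the constant names are arbitrary.

```python
# Scene constants for this embodiment (meters; world origin at the court center).
COURT_SIZE    = (8.0, 16.0)          # court floor, 8 m x 16 m
WALL_HEIGHT   = 4.0                  # walls on all 4 sides
TABLE_SIZE    = (2.74, 1.525, 0.76)  # table length x width x height
NET_HEIGHT    = 0.1525
BALL_DIAMETER = 0.04
RACKET_SIZE   = (0.158, 0.152)
RACKET_BBOX   = (0.16, 0.10, 0.18)   # racket collision bounding box
```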
Designing the task flow: at the beginning of a round, the virtual server serves the ball from a random position at end A of the table toward end B at a randomly assigned speed. When the table tennis ball touches a wall or the floor, or the bat hits the ball onto the table surface or the net at end B, the round ends and the stroke is deemed to have failed. When the bat hits the ball onto the table surface at end A, the round ends and the stroke is deemed successful. When the round ends, the table tennis bat is reset to its initial position to wait for the next round to start.
2) Training a hitting strategy of the racket by using a reinforcement learning method, comprising the following steps of:
2.1) design observations
The observation refers to the data collected by the virtual character in the virtual environment. The observation for the racket's ball hitting strategy training is set to 4 three-dimensional vectors {p_ball, v_ball, p_bat, r_bat}, where p_ball is the position of the table tennis ball, v_ball is the velocity of the table tennis ball, p_bat is the position of the table tennis bat, and r_bat is the rotation angle of the table tennis bat divided by 360 degrees. An observation is collected once per frame, and the observations collected over every 3 frames are used as the input of the racket's ball hitting strategy training network.
2.2) design behavior
The behavior is estimated from the observation data. The behavior for the racket's ball hitting strategy training of the virtual character is a 9-dimensional vector {T_ballbat, R_ballbat, S, C, F}, where T_ballbat is the three-dimensional translation of the racket, each component of which is normalized to between 0 and 1 and multiplied by coefficients w_x, w_y and w_z that control the moving speed of the racket in the three directions, ensuring a relatively reasonable racket motion speed; here w_x = 0.25, w_y = 0.07 and w_z = 0.07. R_ballbat is the vector of rotation angles of the racket about the three coordinate axes, multiplied at actual output by weight coefficients w_u, w_v and w_w; here w_u = 1.5, w_v = 2 and w_w = 0.5. S determines the moment of entering the stroke preparation action, and when S is greater than zero the racket enters the preparation action. C selects the stroke action: when C is greater than 0 the ball is hit with a forehand stroke, and when C is less than 0 with a backhand stroke. F determines the hitting force: when the bat collides with the table tennis ball, a force C_F + w_F·F is applied to the ball along the racket-face direction, where C_F is the basic hitting force and w_F is the weight of F; so that F can control the hitting force more precisely, w_F should not be too large. Here w_F = 1 and C_F = -0.4 + 0.2 × Z_d, where Z_d is the z-direction value of the desired drop point, taking Z_d = 0.8 m, i.e. C_F = 1.2.
2.3) design reward function
R_bat = w_limit·R_limit + w_goal·R_goal + w_support·R_support
The reward function comprises three items of content, namely constraint, target and auxiliary;
R_limit is used to constrain the behavior of the racket, penalizing unreasonable situations that should not occur:
[formula image omitted]
R_positionlimit is used to limit the range of motion of the racket:
[formula image omitted]
Since the racket is attached to the end of the arm, if the racket moves too far forward, beyond the reachable range of the arm, it will twist the arm and may even leave the hand, as shown in fig. 3. The position-limiting function is therefore used to restrict the range of motion of the racket. In the present invention the position is considered reasonable as long as the z-axis coordinate of the racket does not exceed -0.9 (the position z = -0.9 is shown in FIG. 4), so a penalty is triggered only when z > -0.9, i.e. when the racket crosses the red line in FIG. 4. R_positionlimit triggers one check per frame, and the penalties are settled together at the end of the round;
R_actionlimit is used to constrain whether the racket enters the preparation action:
[formula image omitted]
When the bat collides with the table tennis ball, one R_actionlimit check is triggered; if the racket has not entered the preparation action at the moment of collision, a penalty is given;
w_limit is the weight of R_limit, set to 100;
R_goal is a goal-driven function, used to drive the character to complete the game objective:
[formula image omitted]
According to the rules of table tennis, the return is considered failed when the player does not hit the ball, hits it into the net, hits it onto his own half of the table, hits it out of bounds, or when the ball bounces on his own half 0 times or 2 or more times before being struck; the return is considered successful when the ball has bounced exactly once on the player's own half and is then hit within the bounds of the opponent's table surface. R_goal is evaluated once at the end of the round;
w_goal is the weight of R_goal, set to 2;
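The success and failure conditions above amount to a per-round rule check; a hedged sketch follows, where the round_info fields are hypothetical and the +1/-1 magnitudes are placeholders for the values given only as a formula image.

```python
def return_succeeded(round_info):
    """True only if the ball bounced exactly once on the player's own half
    and then landed within the bounds of the opponent's table surface."""
    return (round_info.own_side_bounces == 1
            and round_info.landed_on == "opponent_table")

def r_goal(round_info, reward_success=1.0, penalty_fail=-1.0):
    # Placeholder magnitudes; evaluated once at the end of the round.
    return reward_success if return_succeeded(round_info) else penalty_fail
```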
R_support is an auxiliary function that drives the racket to complete the ball hitting task through a series of prior knowledge:
R_support = R_hit + R_angle + R_height + R_droppoint
R_hit ensures that when the racket hits the ball, positive feedback is given regardless of whether the ball can be returned to the opponent's table, and that when the racket fails to hit the ball, negative feedback is given:
[formula image omitted]
R_angle is used to measure the hitting angle of the table tennis ball:
[formula image omitted]
where the two quantities involved are the projection onto the x-z plane of the racket-face normal vector at the moment the ball contacts the bat and the projection onto the x-z plane of the ball's instantaneous velocity at that moment;
R_height is used to measure the hitting height and elevation angle of the table tennis ball:
[formula image omitted]
where h is the height at which the ball contacts the bat, and n_y is the projection onto the y-axis of the racket-face normal direction at the moment the ball contacts the bat;
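Both auxiliary terms above are built from simple geometric quantities at the moment of contact. The sketch below computes the x-z-plane angle between the racket-face normal and the ball velocity (the quantity behind R_angle) and the contact height together with the y-component of the face normal (the quantities behind R_height); how these map to the actual reward values is given only as images in the original.

```python
import numpy as np

def _project_xz(v):
    """Project a 3D vector onto the x-z plane (drop the y component)."""
    return np.array([v[0], 0.0, v[2]], dtype=float)

def angle_measure(face_normal, ball_velocity):
    """Cosine of the angle between the projected face normal and ball velocity."""
    n, v = _project_xz(face_normal), _project_xz(ball_velocity)
    return float(np.dot(n, v) / (np.linalg.norm(n) * np.linalg.norm(v)))

def height_measure(contact_point, face_normal):
    """Contact height h and the y-projection n_y of the face normal at contact."""
    return float(contact_point[1]), float(face_normal[1])
```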
R_droppoint is used to measure the drop point of the table tennis ball:
R_droppoint = 1 - |p_z - Z_g|
where p_z is the z-direction value of the ball's drop point and Z_g is a point in the middle-to-back area of the opponent's table surface, set to 0.8;
w_support is the weight of R_support, set to 1.
2.4) design network and training parameters
The input of the neural network is set to a 36-dimensional vector and the output to a 9-dimensional vector; the whole network comprises 4 hidden layers and 1 output layer, each hidden layer containing 512 neurons, as shown in fig. 5. After the network is built, it is trained with the proximal policy optimization (PPO) algorithm to obtain the ball hitting strategy.
3) Estimating the motion of each joint of the human body during the stroke using an inverse kinematics algorithm, comprising the following steps:
3.1) simplifying the skeleton of the three-dimensional human body model using the full-body inverse kinematics (Full Body Biped IK) algorithm; the simplified skeleton has 14 joints, namely the crotch, the head, the left and right thighs, the left and right shanks, the left and right soles, the left and right upper arms, the left and right forearms and the left and right palms, and the left and right shoulders, left and right thighs, left and right feet and left and right hands each contain an effector.
3.2) binding the handle of the racket to the end effector of the right hand; when the racket moves, the Full Body Biped IK algorithm treats the right arm as a joint chain and solves the position of each joint point on the right-arm joint chain with the FABRIK algorithm, which solves the inverse kinematics problem iteratively.
3.3) using the Full Body Biped IK algorithm to adjust the positions of all body joint points correspondingly, within a small range, according to the change of the right arm, thereby solving for the positions of all joint points.
3.4) using the Full Body Biped IK algorithm to calculate the motion of each joint point of the three-dimensional human body model under the condition that the right hand holds the racket and the root node is not moved.
4) Training a mobility strategy of a root node by using reinforcement learning, comprising the following steps:
4.1) design Observation
The observation for the root node movement strategy training is set to 6 three-dimensional vectors {p_agent, r_agent, n_spine, p_ref, r_ref, v_ref}, where p_agent is the position of the human body model, r_agent is the orientation of the human body model, n_spine is the direction of the human spine, p_ref is the position of the racket, r_ref is the orientation of the racket, and v_ref is the instantaneous velocity of the racket. An observation is collected once per frame, and the observations collected over every 3 frames are used as the input of the root node movement strategy training network.
4.2) design behavior
The behavior for the root node movement strategy training is set to a 3-dimensional vector {t_x, t_z, r_y}, where t_x and t_z represent the movement of the human body model along the x-axis and the z-axis respectively, and r_y is the rotation of the human body model about the y-axis; t_x, t_z and r_y are all automatically normalized into the interval [-1, 1] and multiplied by weight coefficients before output; the specific coefficient values used in this embodiment are given as formula images in the original and are not reproduced here.
4.3) design reward function
R_move = w_plimit·R_plimit + w_leave·R_leave + w_pose·R_pose + w_deviation·R_deviation
R_plimit is used to limit the range of motion of the human body model:
[formula image omitted]
When the human body model is positioned too far forward it may collide with, or even penetrate, the table tennis table, as shown in fig. 6. The edge of the table is at z = -1.5, so when the z-coordinate of the human body is greater than -1.5 the model receives a large penalty; R_plimit is checked once per frame;
w_plimit is the weight of R_plimit, set to 100;
R_leave prevents the racket from leaving the hand by measuring the distance between the end of the racket handle and the hand:
[formula image omitted]
where d is the distance between the racket handle and the palm, p_hand is the three-dimensional coordinate of the palm, and p_bat is the three-dimensional coordinate of the racket handle;
w_leave is the weight of R_leave, set to 10;
R_pose measures the reasonableness of the stroke posture:
[formula images omitted]
where R_forehand and R_backhand are the reward functions corresponding to the forehand stroke and the backhand stroke respectively; cos α represents the angle between the line connecting the racket-holding hand to the root node and the unit vector in the x direction of the local coordinate system of the three-dimensional human body model; when the racket-holding hand is on the right side of the body, cos α is greater than 0, and when it is on the left side of the body, cos α is less than 0; p_hand is the three-dimensional world coordinate of the racket-holding hand, p_root is the three-dimensional world coordinate of the root node, and the x-direction unit vector is (1, 0, 0) in the local coordinate system of the human body model;
w_pose is the weight of R_pose, set to 1;
R_deviation formulates a reward according to the offset of the human body model's spine:
[formula images omitted]
where cos β defines the offset of the human spine and is computed from the three-dimensional coordinates of the human body model's neck and root node under the current action and the three-dimensional coordinates of the neck and root node under the initial action;
w_deviation is the weight of R_deviation, set to 1.
4.4) design network and training parameters
The input of the neural network is set to a 54-dimensional vector and the output to a 3-dimensional vector; the whole network comprises 3 hidden layers and 1 output layer, each hidden layer containing 512 neurons. After the network is built, it is trained with the proximal policy optimization (PPO) algorithm to obtain the root node movement strategy.
Experiments show that the method is feasible: when facing an incoming table tennis ball, the racket, following the trained strategy, hits the ball to within the bounds of the opponent's table surface along a reasonable motion trajectory. Meanwhile, the moving racket is bound to the right palm of the human body model, the positions of all joints of the human body are solved by inverse kinematics, and the root node is then moved according to the root node movement strategy obtained by reinforcement learning training, so that the whole stroke is completed. FIG. 7 shows the action sequences of two strokes (a forehand stroke and a backhand stroke) in which the bat hits the table tennis ball and completes the return: line 1 of (a) in FIG. 7 shows the bat sequence of the forehand stroke, and line 1 of (b) in FIG. 7 shows the bat sequence of the backhand stroke. Then the motion of each joint point of the human body is solved using inverse kinematics: line 2 of (a) in FIG. 7 shows the action sequence of the virtual player hitting a forehand in the static state, and line 2 of (b) in FIG. 7 shows the action sequence of the virtual player hitting a backhand in the static state. Finally, the reinforcement-learning-based root node movement algorithm moves the root node so that the virtual player keeps a reasonable stroke action while moving: line 3 of (a) in FIG. 7 shows the action sequence of the virtual player hitting a forehand while moving, and line 3 of (b) in FIG. 7 shows the action sequence of the virtual player hitting a backhand while moving.
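Putting the three trained components together, the per-frame control loop of this embodiment can be sketched as follows; every object and method name here is an assumption used only to show the order in which the ball hitting strategy, the inverse kinematics solve and the root node movement strategy are applied.

```python
def control_step(env, batting_policy, root_policy, ik_solver, model, bat):
    # 1) Racket ball hitting strategy: 36-dim observation -> 9-dim racket behavior.
    bat_action = batting_policy(env.batting_observation())      # trained with PPO
    env.apply_bat_action(bat_action)

    # 2) Inverse kinematics: with the racket bound to the right palm, solve the
    #    joint positions of the human body model while the root node stays fixed.
    ik_solver.solve(model, right_hand_target=bat.handle_position())

    # 3) Root node movement strategy: 54-dim observation -> 3-dim root behavior,
    #    moving and rotating the root so the overall posture stays reasonable.
    root_action = root_policy(env.root_observation())
    env.apply_root_action(root_action)
```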
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; changes made according to the shape and principle of the present invention should therefore be covered within the protection scope of the present invention.

Claims (4)

1. A virtual table tennis player hitting training method based on reinforcement learning is characterized by comprising the following steps:
1) designing a task scene and a task flow;
2) the method based on reinforcement learning, which uses a neural network to train the batting strategy, comprises the following steps:
2.1) design observations
The observation refers to the data collected by the virtual character in the virtual environment. The observation for the racket's ball hitting strategy training is set to 4 three-dimensional vectors {p_ball, v_ball, p_bat, r_bat}, where p_ball is the position of the table tennis ball, v_ball is the velocity of the table tennis ball, p_bat is the position of the table tennis bat, and r_bat is the rotation angle of the table tennis bat divided by 360 degrees. An observation is collected once per frame, and the observations collected over every 3 frames are used as the input of the racket's ball hitting strategy training network;
2.2) design behavior
The behavior is estimated from the observation data. The behavior for the racket's ball hitting strategy training of the virtual character is a 9-dimensional vector {T_ballbat, R_ballbat, S, C, F}, where T_ballbat is the three-dimensional translation of the racket, each component of which is normalized to between 0 and 1 and multiplied by weight coefficients w_x, w_y and w_z that control the moving speed of the racket in the three directions, ensuring a reasonable racket motion speed; R_ballbat is the vector of rotation angles of the racket about the three coordinate axes, multiplied at actual output by weight coefficients w_u, w_v and w_w; S determines the moment of entering the stroke preparation action, and when S is greater than zero the racket enters the preparation action; C selects the stroke action: when C is greater than 0 the ball is hit with a forehand stroke, and when C is less than 0 the ball is hit with a backhand stroke; F determines the hitting force: when the bat collides with the table tennis ball, a force C_F + w_F·F is applied to the ball along the racket-face direction, where C_F is the basic hitting force and w_F is the weight of F, set as C_F = -0.4 + 0.2 × Z_d, where Z_d is the z-direction value of the desired drop point;
2.3) design reward function
The reward function for the ball hitting strategy training is set as follows:
R_bat = w_limit·R_limit + w_goal·R_goal + w_support·R_support
The reward function comprises three kinds of terms, namely constraint, goal and auxiliary terms;
R_limit is a function for constraining the behavior of the racket; it penalizes unreasonable situations that should not occur:
[formula image omitted]
R_positionlimit is a function of the racket's range of motion, used to limit the range of motion of the racket:
[formula image omitted]
R_actionlimit is a function of whether the racket has entered the preparation action, used to constrain whether the racket enters the preparation action:
[formula image omitted]
w_limit is the weight of R_limit;
R_goal is a goal-driven function, used to drive the character to complete the game objective:
[formula image omitted]
w_goal is the weight of R_goal;
R_support is an auxiliary function that drives the racket to complete the ball hitting task through a series of prior knowledge:
R_support = R_hit + R_angle + R_height + R_droppoint
R_hit is a function of the hitting behavior; it ensures that when the racket hits the ball, positive feedback is given regardless of whether the ball can be returned to the opponent's table, and that when the racket fails to hit the ball, negative feedback is given:
[formula image omitted]
R_angle is a function of the hitting angle of the table tennis ball, used to measure the hitting angle:
[formula image omitted]
where the two quantities involved are the projection onto the x-z plane of the racket-face normal vector at the moment the ball contacts the bat and the projection onto the x-z plane of the ball's instantaneous velocity at that moment;
R_height is a function of the hitting height and elevation angle of the table tennis ball, used to measure the hitting height and elevation angle:
[formula image omitted]
where h is the height at which the ball contacts the bat, and n_y is the projection onto the y-axis of the racket-face normal direction at the moment the ball contacts the bat;
R_droppoint is a function of the drop point of the table tennis ball, used to measure the drop point:
R_droppoint = 1 - |p_z - Z_g|
where p_z is the z-direction value of the ball's drop point and Z_g is a point in the middle-to-back area of the opponent's table surface; when the ball lands near Z_g it is unlikely to go out of bounds or into the net;
w_support is the weight of R_support;
2.4) design network and training parameters
Setting the input of the neural network as a 36-dimensional vector and the output as a 9-dimensional vector, wherein the whole network comprises 4 hidden layers and 1 output layer, each hidden layer containing 512 neurons; after the network is established, training the neural network with the proximal policy optimization (PPO) algorithm to obtain the ball hitting strategy;
3) estimating the motion condition of each joint when the human body hits the ball by using an algorithm of inverse kinematics;
4) the mobility policy of the root node is trained using reinforcement learning.
2. The virtual table tennis player ball-hitting training method based on reinforcement learning of claim 1, wherein: in step 1), designing the task scene means modeling a virtual table tennis player and building a virtual table tennis court in Unity3D, setting the size of the court, the position and size of the table tennis table, the height of the net, the size of the table tennis ball, the size of the racket collision bounding box and the origin of the world coordinate system;
designing the task flow specifically comprises: when the table tennis ball touches a wall or the floor, or the bat hits the ball onto the table surface or the net at end B of the table, the round ends and the stroke is deemed to have failed; when the bat hits the ball onto the table surface at end A of the table, the round ends and the stroke is deemed successful; when the round ends, the table tennis bat is reset to its initial position to wait for the next round to start.
3. The virtual table tennis player ball-hitting training method based on reinforcement learning of claim 1, wherein: in step 3), the motion of each joint is estimated using an inverse kinematics algorithm, so that unreasonable stretching and twisting of the posture do not occur, comprising the following steps:
3.1) simplifying the skeleton of the three-dimensional human body model using a full-body inverse kinematics algorithm, namely the Full Body Biped IK algorithm; the simplified skeleton has 14 joints, namely the crotch, the head, the left and right thighs, the left and right shanks, the left and right soles, the left and right upper arms, the left and right forearms and the left and right palms, and the left and right shoulders, left and right thighs, left and right feet and left and right hands each contain an effector;
3.2) binding the handle of the racket to the end effector of the right hand; when the racket moves, the Full Body Biped IK algorithm treats the right arm as a joint chain and solves the position of each joint point on the right-arm joint chain with the FABRIK algorithm, which solves the inverse kinematics problem iteratively;
3.3) using the Full Body Biped IK algorithm to adjust the positions of all body joint points correspondingly, within a small range, according to the change of the right arm, thereby solving for the positions of all joint points;
3.4) using the Full Body Biped IK algorithm to calculate the motion of each joint point of the three-dimensional human body model under the condition that the right hand holds the racket and the root node is not moved.
4. The virtual table tennis player ball-hitting training method based on reinforcement learning of claim 1, wherein: in step 4), because inverse kinematics does not move the root node, the movement of the root node is controlled by a root node movement strategy based on reinforcement learning, which, combined with the inverse kinematics, makes the overall human body posture more reasonable, comprising the following steps:
4.1) design observation
The observation for the root node movement strategy training is set to 6 three-dimensional vectors {p_agent, r_agent, n_spine, p_ref, r_ref, v_ref}, where p_agent is the position of the human body model, r_agent is the orientation of the human body model, n_spine is the direction of the human spine, p_ref is the position of the racket, r_ref is the orientation of the racket, and v_ref is the instantaneous velocity of the racket; an observation is collected once per frame, and the observations collected over every 3 frames are used as the input of the root node movement strategy training network;
4.2) design behavior
The behavior for the root node movement strategy training is set to a 3-dimensional vector {t_x, t_z, r_y}, where t_x and t_z represent the movement of the human body model along the x-axis and the z-axis respectively, and r_y is the rotation of the human body model about the y-axis; t_x, t_z and r_y are all automatically normalized into the interval [-1, 1] and multiplied by weight coefficients before output (the coefficients are given as formula images in the original and are not reproduced here);
4.3) designing a reward function, wherein the reward function for the root node movement strategy training is set as follows:
R_move = w_plimit·R_plimit + w_leave·R_leave + w_pose·R_pose + w_deviation·R_deviation
R_plimit is a function of the human body model's range of motion, used to limit the range of motion of the human body model:
[formula image omitted]
w_plimit is the weight of R_plimit;
R_leave is a function of the distance between the end of the racket handle and the hand; by measuring this distance it prevents the racket from leaving the hand:
[formula image omitted]
where d is the distance between the racket handle and the palm, p_hand is the three-dimensional coordinate of the palm, and p_bat is the three-dimensional coordinate of the racket handle;
w_leave is the weight of R_leave;
R_pose is a function of the stroke posture, used to measure the reasonableness of the stroke posture:
[formula images omitted]
where R_forehand and R_backhand are the reward functions corresponding to the forehand stroke and the backhand stroke respectively; cos α represents the angle between the line connecting the racket-holding hand to the root node and the unit vector in the x direction of the local coordinate system of the three-dimensional human body model; p_hand is the three-dimensional world coordinate of the racket-holding hand, p_root is the three-dimensional world coordinate of the root node, and the x-direction unit vector is (1, 0, 0) in the local coordinate system of the SMPL human body model;
w_pose is the weight of R_pose;
R_deviation is a function of the offset of the human body model's spine; the reward is formulated according to this spinal offset:
[formula images omitted]
where cos β defines the offset of the human spine and is computed from the three-dimensional coordinates of the human body model's neck and root node under the current action and the three-dimensional coordinates of the neck and root node under the initial action;
w_deviation is the weight of R_deviation;
4.4) design network and training parameters
Setting the input of the neural network as a 54-dimensional vector and the output as a 3-dimensional vector, wherein the whole network comprises 3 hidden layers and 1 output layer, each hidden layer containing 512 neurons; after the network is established, training the neural network with the proximal policy optimization (PPO) algorithm to obtain the root node movement strategy.
CN201910763946.6A 2019-08-19 2019-08-19 Virtual table tennis player ball hitting training method based on reinforcement learning Active CN110496377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910763946.6A CN110496377B (en) 2019-08-19 2019-08-19 Virtual table tennis player ball hitting training method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910763946.6A CN110496377B (en) 2019-08-19 2019-08-19 Virtual table tennis player ball hitting training method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110496377A CN110496377A (en) 2019-11-26
CN110496377B true CN110496377B (en) 2020-07-28

Family

ID=68588315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910763946.6A Active CN110496377B (en) 2019-08-19 2019-08-19 Virtual table tennis player ball hitting training method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110496377B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111546332A (en) * 2020-04-23 2020-08-18 上海电机学院 Table tennis robot system based on embedded equipment and application
CN113312840B (en) * 2021-05-25 2023-02-17 广州深灵科技有限公司 Badminton playing method and system based on reinforcement learning
CN113625876B (en) * 2021-08-10 2024-04-02 浙江大学 Immersion-based badminton tactic analysis method
CN114417618A (en) * 2022-01-21 2022-04-29 北京理工大学 Virtual reality assisted assembly complexity evaluation system
CN114841362A (en) * 2022-03-30 2022-08-02 山东大学 Method for collecting imitation learning data by using virtual reality technology
CN114609918B (en) * 2022-05-12 2022-08-02 齐鲁工业大学 Four-footed robot motion control method, system, storage medium and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549237A (en) * 2018-05-16 2018-09-18 华南理工大学 Preview based on depth enhancing study controls humanoid robot gait's planing method
CN108983804A (en) * 2018-08-27 2018-12-11 燕山大学 A kind of biped robot's gait planning method based on deeply study
CN109345614A (en) * 2018-09-20 2019-02-15 山东师范大学 The animation simulation method of AR augmented reality large-size screen monitors interaction based on deeply study
CN109540151A (en) * 2018-03-25 2019-03-29 哈尔滨工程大学 A kind of AUV three-dimensional path planning method based on intensified learning
CN109760046A (en) * 2018-12-27 2019-05-17 西北工业大学 Robot for space based on intensified learning captures Tumbling Target motion planning method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5750657B2 (en) * 2011-03-30 2015-07-22 株式会社国際電気通信基礎技術研究所 Reinforcement learning device, control device, and reinforcement learning method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109540151A (en) * 2018-03-25 2019-03-29 哈尔滨工程大学 A kind of AUV three-dimensional path planning method based on intensified learning
CN108549237A (en) * 2018-05-16 2018-09-18 华南理工大学 Preview based on depth enhancing study controls humanoid robot gait's planing method
CN108983804A (en) * 2018-08-27 2018-12-11 燕山大学 A kind of biped robot's gait planning method based on deeply study
CN109345614A (en) * 2018-09-20 2019-02-15 山东师范大学 The animation simulation method of AR augmented reality large-size screen monitors interaction based on deeply study
CN109760046A (en) * 2018-12-27 2019-05-17 西北工业大学 Robot for space based on intensified learning captures Tumbling Target motion planning method

Also Published As

Publication number Publication date
CN110496377A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
CN110496377B (en) Virtual table tennis player ball hitting training method based on reinforcement learning
CN111260762B (en) Animation implementation method and device, electronic equipment and storage medium
Chen et al. A system for general in-hand object re-orientation
Juliani et al. Unity: A general platform for intelligent agents
WO2021143289A1 (en) Animation processing method and apparatus, and computer storage medium and electronic device
CN111223170B (en) Animation generation method and device, electronic equipment and storage medium
Yang et al. Ball motion control in the table tennis robot system using time-series deep reinforcement learning
Zhu et al. Towards high level skill learning: Learn to return table tennis ball using monte-carlo based policy gradient method
CN111283700A (en) Table tennis service robot, table tennis service method and computer-readable storage medium
Wang et al. Movevr: Enabling multiform force feedback in virtual reality using household cleaning robot
Schwab et al. Learning skills for small size league robocup
CN116362133A (en) Framework-based two-phase flow network method for predicting static deformation of cloth in target posture
Wang et al. RETRACTED ARTICLE: Optimization analysis of sport pattern driven by machine learning and multi-agent
Hecker Physics in computer games (title only)
CN114565050A (en) Game artificial intelligence action planning method and system
CN102446359A (en) Small ball sport processing method based on computer and system thereof
Bai et al. Wrighteagle and UT Austin villa: RoboCup 2011 simulation league champions
CN102004552A (en) Tracking point identification based method and system for increasing on-site sport experience of users
Zicong et al. Training a virtual tabletennis player based on reinforcement learning
Sasaki et al. Exemposer: Predicting poses of experts as examples for beginners in climbing using a neural network
US20120223953A1 (en) Kinematic Engine for Adaptive Locomotive Control in Computer Simulations
CN109200575A (en) The method and system for reinforcing the movement experience of user scene of view-based access control model identification
Ding et al. Goalseye: Learning high speed precision table tennis on a physical robot
Chen Research on the VR Technology in Basketball Training [J]
CN109821243A (en) A method of simulation reappears shooting process

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant