CN112330778A - Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction - Google Patents

Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction

Info

Publication number
CN112330778A
Authority
CN
China
Prior art keywords
animation
action
role
augmented reality
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010977580.5A
Other languages
Chinese (zh)
Inventor
蔡顺蛟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Jieheng Software Technology Co Ltd
Original Assignee
Jiangsu Jieheng Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Jieheng Software Technology Co Ltd filed Critical Jiangsu Jieheng Software Technology Co Ltd
Priority to CN202010977580.5A priority Critical patent/CN112330778A/en
Publication of CN112330778A publication Critical patent/CN112330778A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics
    • G06T 19/006 Mixed reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction. The method collects the motion data of professionals, segments the data and uses it as a reference action set; it then constructs two masked convolutional neural network frameworks, taking the state, action and goal of the animated character as the input of the first network, and taking the state, action and goal of the lower limbs, which have a wider range of motion, together with the action of the first network as the input of the second network, so that the two networks assist each other and the learning rate is accelerated; the animated character is driven by a PD controller and either embedded directly into the AR augmented reality system or combined with the original animated character in that system. The method constructs a virtual animated character; by feeding reward and punishment information back to the character, the character learns whether its behaviour is correct, and after long-term learning it finally acquires the ability to make decisions autonomously.

Description

Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction
Technical Field
The invention relates to an animation simulation method, in particular to an animation simulation method for AR augmented reality large-screen interaction.
Background
As a promotional tool, AR large-screen interaction has irreplaceable advantages in exhibitions, product launches, shopping malls and other places with heavy foot traffic. Technically, AR large-screen interaction involves image recognition, face recognition, posture recognition and gesture recognition. In terms of interaction modes, a series of interactive forms derived from it, such as AR interactive games and AR interactive photography, perform well.
At present, the motions of computer-animated characters are created mainly by drawing key frames manually or by applying motion-capture data to the characters. Manual key-frame drawing must be designed frame by frame; it is time-consuming, labour-intensive and difficult to produce, and the resulting character motions tend to be simplistic or unrealistic.
Physics-based simulation of natural motion has been the subject of extensive research for decades, with quality requirements driven largely by film, visual effects and gaming applications. Over the years, a great deal of work on physics-based character animation has developed controllers that can generate robust, natural motions for a large number of tasks and characters. These methods typically incorporate human insight into task-specific control structures, which provides a strong inductive bias for the motions an agent can perform. However, because of these design decisions, such controllers are usually specific to a particular agent or task; most of them are trained on a single action, and coherent multi-action training has not yet been put into practice.
Disclosure of Invention
The invention aims to provide an animation simulation method for AR augmented reality large-screen interaction based on deep reinforcement learning, which is used for performing accurate action simulation.
In order to solve the technical problems, the technical scheme of the invention is as follows: the deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction comprises the following steps:
Step one, acquiring the motion data of professionals through a motion-capture system, uniformly converting the data acquired by different devices into BVH format, segmenting the data with code, extracting the useful values, and combining and sorting them according to the joint order defined for each action, so as to serve as the reference action set;
step two, using a deep reinforcement learning algorithm, each strategy is represented by two neural networks that interact with each other, and two masked convolutional neural network frameworks are first constructed: the first layer of the first network is a fully connected layer of 512 units with a bilinear phase transformation applied to its input, and the second layer consists of two groups of six 256-unit linear output layers; the upper group is the critic sub-network, used to predict a value function for each actor, and the lower group is the actor sub-networks, used to select the action to perform in a given state; the reference action set from step one is used by the critic-actor strategy for decision training to obtain the output action a_i, which drives the animated character to imitate the motion posture of the professional; the second network consists of two fully connected layers of 256 units and, by handling a selected region of the character separately, accelerates learning when the lower limbs need to move over a large range;
step three, through the adaptive style and the initially defined action set in the strategy, enabling the animation to generate actions different from the reference actions in special scenes, so as to better adapt to new environments;
step four, at test time, driving the animated character through the PD controller and either embedding it directly into the AR augmented reality system or combining it with the original animated character in the system, so as to control the character's motion;
step five, the animated character in the AR augmented reality system imitates the standard motions of a real person, making the character more lifelike; at the same time, the character can autonomously decide whether to move to the left or to the right, and the motion imitation is carried out accurately.
As a preferred technical solution, in step one the criterion for "segmenting the motion data" is: dividing the continuous motion into a number of independent 5-second motion segments, dividing each segment evenly into 10 parts of 0.5 seconds each, and extracting the middle frame of each part as the tuple data of one motion; the reference action set stores the motion postures of several professionals for the animated character to learn and imitate, and the reference actions form part of the goal and the reward function.
As a preferred technical solution, in step two the "bilinear phase transformation" constructs Φ = (Φ0, Φ1, Φ2, Φ3, Φ4)^T with Φi ∈ (0, 1) in order to keep the low-level controller (LLC) synchronized with the reference motion; for example, when the phase lies in (0, 0.2), Φ0 = 1, and otherwise Φ0 = 0.
As a preferred technical solution, in step two the "critic-actor strategy" is: at the beginning of each episode, the initial state s is sampled uniformly from the reference motion set or the initially defined action set, and each episode is simulated until a fixed time horizon is reached or a termination condition is triggered; once a batch of data has been collected, mini-batches are sampled from the data set and used to update the policy and the value function; the value function is updated using the TD-computed target value, the state and the reward function; with probability α the critic-actor pair with the largest reward among the four is selected, and with probability 1-α one of the remaining three pairs is selected at random, with α adjusted over the course of training.
As a preferred technical solution, in step three the "adaptive style" is generated by the animated character adjusting itself automatically according to the actual scene and mainly addresses decisions in rare situations, and the "initially defined action set" is set in advance from collected data of simple actions commonly used by players; the initially defined action set gives the animated character a better basis for learning, while the adaptive style gives it better improvisation ability and robustness, so that it can learn how to respond in many rare scenes.
As a preferred technical solution, in step four the PD controller defines three joint-moment settings, 24, 32 and 40 respectively, according to the complexity of the animated character in AR augmented reality, to control the motion of the animation.
Due to the adoption of the above technical scheme, the deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction comprises the following steps: step one, acquiring the motion data of professionals through a motion-capture system, uniformly converting the data acquired by different devices into BVH format, and segmenting the data to serve as the reference action set; step two, using a deep reinforcement learning algorithm, each strategy is represented by two neural networks that interact with each other, and two masked convolutional neural network frameworks are first constructed: the first layer of the first network is a fully connected layer of 512 units with a bilinear phase transformation applied to its input, and the second layer consists of two groups of six 256-unit linear output layers; the upper group is the critic sub-network, used to predict a value function for each actor, and the lower group is the actor sub-networks, used to select the action to perform in a given state; the reference action set from step one is used by the critic-actor strategy for decision training to obtain the output action a_i, which drives the animated character to imitate the motion posture of the professional; the second network consists of two fully connected layers of 256 units and, by handling a selected region of the character separately, accelerates learning when the lower limbs need to move over a large range; step three, through the adaptive style and the initially defined action set in the strategy, enabling the animation to generate actions different from the reference actions in special scenes, so as to better adapt to new environments; step four, at test time, driving the animated character through the PD controller and either embedding it directly into the AR augmented reality system or combining it with the original animated character in the system, so as to control the character's motion; step five, the animated character in the AR augmented reality system imitates the standard motions of a real person, making the character more lifelike. The invention constructs a virtual animated character; through the reward and punishment information fed back to it, the character learns whether its behaviour is correct, and after long-term learning it finally acquires the ability to make decisions autonomously.
Drawings
The drawings are only for purposes of illustrating and explaining the present invention and are not to be construed as limiting the scope of the present invention. Wherein:
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a schematic diagram of the character orientation classes used in the simulated table tennis example according to an embodiment of the invention.
Detailed Description
The invention is further illustrated below with reference to the figures and embodiments. In the following detailed description, certain exemplary embodiments of the present invention are described by way of illustration only. It goes without saying that a person skilled in the art will realize that the described embodiments can be modified in various ways without departing from the spirit and scope of the present invention. Accordingly, the drawings and description are illustrative in nature and are not intended to limit the scope of the claims.
An animation simulation method of AR augmented reality large-screen interaction based on deep reinforcement learning is shown in FIG. 1 and comprises the following steps:
Step one, acquiring the motion data of professionals through a motion-capture system, uniformly converting the data acquired by different devices into BVH format, segmenting the data with code, extracting the useful values, and combining and sorting them according to the joint order defined for each action, so as to serve as the reference action set;
the "dividing the motion data" criterion is: dividing continuous motion into a plurality of 5-second independent motion segments, averagely dividing each motion segment into 10 parts, namely each 0.5 second, extracting intermediate data to serve as meta-group data of one motion, and storing motion postures of a plurality of professionals in the reference motion set for the animation character to study and imitate; the reference action will be a component of the goal and reward functions.
Step two, using a deep reinforcement learning algorithm, each strategy is represented by two neural networks that interact with each other, and two masked convolutional neural network frameworks are first constructed: the first layer of the first network is a fully connected layer of 512 units with a bilinear phase transformation applied to its input, and the second layer consists of two groups of six 256-unit linear output layers; the upper group is the critic sub-network, used to predict a value function for each actor, and the lower group is the actor sub-networks, used to select the action to perform in a given state; the reference action set from step one is used by the critic-actor strategy for decision training to obtain the output action a_i, which drives the animated character to imitate the motion posture of the professional; the second network consists of two fully connected layers of 256 units and, by handling a selected region of the character separately, accelerates learning when the lower limbs need to move over a large range;
the "bilinear phase change" is to keep the LLC synchronized with the reference motion, and construct Φ ═ T (Φ 0, Φ 1, Φ 2, Φ 3, Φ 4), Φ i ∈ (0, 1); where Φ 0 ∈ (0,0.2), Φ 0 is equal to 1, and otherwise, 0.
The critic-actor strategy is as follows: at the beginning of each episode, the initial state s is sampled uniformly from the reference motion set or the initially defined action set, and each episode is simulated until a fixed time horizon is reached or a termination condition is triggered; once a batch of data has been collected, mini-batches are sampled from the data set and used to update the policy and the value function; the value function is updated using the TD-computed target value, the state and the reward function; with probability α the critic-actor pair with the largest reward among the four is selected, and with probability 1-α one of the remaining three pairs is selected at random, with α adjusted over the course of training.
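The selection rule among the four critic-actor pairs and the one-step TD target described above can be sketched as follows; the discount factor and function names are illustrative assumptions:

```python
import random

def td_target(reward: float, next_value: float, gamma: float = 0.95) -> float:
    # One-step TD target used to update the value function; gamma is an assumed discount factor.
    return reward + gamma * next_value

def select_pair(pair_returns: list, alpha: float) -> int:
    # With probability alpha pick the critic-actor pair with the largest return,
    # otherwise pick one of the remaining pairs uniformly at random.
    best = max(range(len(pair_returns)), key=lambda i: pair_returns[i])
    if random.random() < alpha:
        return best
    others = [i for i in range(len(pair_returns)) if i != best]
    return random.choice(others)
```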
Step three, through the adaptive style and the initially defined action set in the strategy, enabling the animation to generate actions different from the reference actions in special scenes, so as to better adapt to new environments;
the self-adaptive style is generated by automatically adjusting the animation role according to the actual scene, mainly solves the decision under the rare condition, and the initial defined action set is set in advance according to the collected common simple action data of the player; the animation role can have a better learning basis through the initial definition of the action set, and the adaptive style can enable the animation role to have better random strain capability and robustness to learn coping actions under a plurality of rare scenes.
Step four, at test time, driving the animated character through the PD controller and either embedding it directly into the AR augmented reality system or combining it with the original animated character in the system, so as to control the character's motion;
the PD controller will define three joint moments, 24, 32 and 40 respectively, to control the motion of the animation according to the complexity of the animated character in AR augmented reality. The PD controller adopts the existing technology that is well-established in the art, and is not described herein.
Step five, the animated character in the AR augmented reality system imitates the standard motions of a real person, making the character more lifelike; at the same time, the character can autonomously decide whether to move to the left or to the right, and the motion imitation is carried out accurately.
The invention aims to provide an animation simulation method for AR augmented reality large-screen interaction based on deep reinforcement learning. The following description takes a table tennis simulation as an example.
An animation simulation method for AR augmented reality table tennis large-screen interaction based on deep reinforcement learning, as shown in FIG. 1 and FIG. 2, specifically comprises the following steps:
Step one, acquiring the motion data of professionals through a motion-capture system, uniformly converting the data acquired by different devices into BVH format, segmenting the data with code, extracting the useful values, and combining and sorting them according to the joint order defined for each action, so as to serve as the reference action set; the continuous motion is divided into independent 5-second motion segments, each segment is divided evenly into 10 parts of 0.5 seconds each, and the middle frame of each part is extracted as the motion tuple data, which is stored in txt format. The reference action set stores the stroke postures of several table tennis players for the animated character to consult and imitate.
Step two, using a deep reinforcement learning algorithm, namely a critic-actor algorithm based on a fully incremental natural gradient, two masked convolutional neural network frameworks are constructed. The first layer of the first network is a fully connected layer of 512 units; the second layer consists of two groups of six 256-unit linear output layers. The upper group is the critic sub-network, which predicts the value function of each actor and has 4 outputs; the lower group is 4 actor sub-networks, each with one output, which select the action to perform in a given state. ReLU activations are used for all hidden units. The first network takes the animated character's state s, the previous action a_(i-1) and the reference action, i.e. the goal g, as input; the reference action set from step one forms part of the goal and the reward function and is used by the critic-actor strategy for decision training to obtain the output action a_i, which drives the animated character to imitate the motion posture of the professional. The second network is much simpler than the first, so a simple network with two fully connected layers of 256 units is chosen to handle the movement position of the animated character separately. It takes as input the state s' (the position of the animated character, and the origin and direction of the incoming ball), the action a'_(i-1) (the direction of the character's previous movement step) and the goal g' (the position contained in the output action a_i of the first network), and, working together with the first network and updating through continuous forward and backward passes, trains a strategy that outputs the position of the incoming ball and the movement position of the animated character.
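One possible PyTorch reading of this two-network layout is sketched below; the 512/256 layer widths, ReLU activations, four critic outputs and four single-output actor heads follow the description above, while the input dimensions, class names and wiring details are our assumptions:

```python
import torch
import torch.nn as nn

class PrimaryNet(nn.Module):
    """First network: a shared 512-unit layer, a 4-output critic head and
    four single-output actor heads built from 256-unit layers."""
    def __init__(self, state_dim: int, action_dim: int, goal_dim: int):
        super().__init__()
        in_dim = state_dim + action_dim + goal_dim  # s, a_(i-1), g
        self.shared = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.critic = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 4))
        self.actors = nn.ModuleList(
            [nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1)) for _ in range(4)]
        )

    def forward(self, s, a_prev, g):
        h = self.shared(torch.cat([s, a_prev, g], dim=-1))
        values = self.critic(h)                                    # one value estimate per actor
        actions = torch.cat([actor(h) for actor in self.actors], dim=-1)
        return values, actions

class LowerBodyNet(nn.Module):
    """Second network: two 256-unit fully connected layers that map
    (s', a'_(i-1), g') to the character's movement position."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, x):
        return self.net(x)
```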
The animated character's state, reward and goal data are passed through the bilinear phase transformation, i.e. Φ = (Φ0, Φ1, Φ2, Φ3, Φ4)^T with Φi ∈ (0, 1), where Φ0 = 1 if the phase lies in (0, 0.2) and Φ0 = 0 otherwise; afterwards, outliers are removed and some tuples are discarded, namely tuples that occur rarely and have little influence on the result, in order to reduce the data volume. After this processing the data serve as the input of the convolutional neural network. The state includes the position, direction, speed and spin of the incoming ball, the position of the character, and the speed and angle of each joint; the action consists of the current orientation (classified into only 4 types in the horizontal direction, as shown in FIG. 2) and the angle and speed of each joint; the reference action serves as the goal to guide the character's learning and is also part of the reward function. The reward function is r = wc·rc + ww·rw + wt·rt + c, where rc is the difference between the actual motion and the reference motion (the goal), rw is the difference in joint angular velocity, and rt is the difference between the actual frame speed and that of the 0.5-second reference motion frame, with wc = -0.75, ww = -0.15, wt = -0.1 and c = 1. At the beginning of each episode, the initial state s is sampled uniformly from the reference motion set or the initially defined action set, and each episode is simulated until a fixed time horizon is reached or a termination condition is triggered. Once a batch of data has been collected, mini-batches are sampled from the data set and used to update the policy and the value function. The value function is updated using the TD-computed target value, the state and the reward function; with probability α the critic-actor pair with the largest reward among the four is selected, and with probability 1-α one of the remaining three pairs is selected at random, with α adjusted over the course of training. The reference action set from step one is used by the critic-actor strategy to learn and imitate motions, producing actions closer to the reference actions, so as to drive the animated character to imitate the motion posture of a professional table tennis player and make fast, intelligent analysis and action decisions about the incoming ball.
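The imitation reward quoted above can be written out directly; rc, rw and rt are the pose, joint-angular-velocity and frame-timing differences computed elsewhere, and only the weights come from the text:

```python
def imitation_reward(rc: float, rw: float, rt: float) -> float:
    # r = wc*rc + ww*rw + wt*rt + c with the weights quoted above;
    # the negative weights penalize deviation from the reference motion.
    wc, ww, wt, c = -0.75, -0.15, -0.10, 1.0
    return wc * rc + ww * rw + wt * rt + c
```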
Step three, through the adaptive style in the strategy, the animation can be stylized for special scenes by simply modifying the reward function, generating actions different from the reference actions, so that the animated character has better improvisation ability and learns how to respond in many rare scenes. The initially defined action set is set in advance from collected data of simple actions commonly used by table tennis players. It gives the animated character a better basis for learning, making the reference actions easier to learn while reducing the amount of learning and the data storage space, thereby improving learning efficiency and speed.
Step four, at test time, the PD controller drives the animated character, which is either embedded directly into the AR augmented reality system or combined with the original animated character in the system, so as to control the character's motion. Depending on the complexity of the animated character in AR augmented reality, three joint-moment settings, 24, 32 and 40 respectively, are defined to control the motion of the animation. The method is also highly robust: the differences between the 24, 32 and 40 joint-moment settings can be learned simply, so that the method can be applied to other similar models.
Through these steps, the animated character in the AR augmented reality system can imitate the standard motions of a real person, making the character more vivid and lifelike and allowing it to serve well as a practice partner and for professional guidance.
The method uses a deep reinforcement learning algorithm, an improved critic-actor algorithm based on a fully incremental natural gradient in which two networks are trained cooperatively. By capturing the stroke postures of professional table tennis players and training the convolutional neural networks, it makes the stroke motions and body postures of the animated character on the AR augmented reality screen more realistic and faithful to the motions of the professional, so that the user can understand the motions or learn them in a targeted way while experiencing the system, and imitate more standard motions and postures. Through practice and observation, the user can also learn the knowledge and skills of table tennis and other ball games, such as the posture with which to receive a spin serve or chop the ball.
Deep reinforcement learning combines the advantages of deep neural networks and reinforcement learning; it can solve the perception and decision problems of an agent in a complex, high-dimensional state space, and has made breakthrough progress in fields such as games, robotics and recommendation systems. The invention constructs a virtual animated character; through the reward and punishment information fed back to it, the character learns whether its behaviour is correct, and after long-term learning it finally acquires the ability to make decisions autonomously.
The AR augmented reality system may be equipped with a scoring system that reflects the quality of each shot by scoring the posture, speed and so on. Difficulty levels can be provided so that users at different skill levels all obtain a good experience. While entertaining, the system can also serve as a standard practice partner and provide professional guidance.
The invention, an animation simulation method for AR augmented reality large-screen interaction based on deep reinforcement learning, is described here using table tennis large-screen interaction as an example and has the following advantages:
(1) The method provided by the invention gives the animated character autonomous decision-making capability: through large amounts of data and learning, it can select and execute different motion postures according to different incoming-ball conditions, and its motions are comparable to those of professional table tennis players and highly lifelike.
(2) The method provided by the invention enables the animated character to execute and switch between different actions continuously; because stroke actions are highly similar to one another, the blending between multiple skills is very good.
(3) The convolutional neural networks are highly robust and transfer well to other similar models.
(4) Used in AR augmented reality, the invention can serve as a standard practice partner and provide professional motion guidance while entertaining; difficulty levels can be set so that users at different skill levels all obtain a good experience; through practice and observation, users can also learn the knowledge and skills of table tennis and other ball games, such as the posture with which to serve and receive spin balls or to chop the ball, giving the system good educational and promotional value.
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are presented in the specification and drawings only to illustrate the principles of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (6)

1. The deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction is characterized by comprising the following steps: step one, acquiring the motion data of professionals through a motion-capture system, uniformly converting the data acquired by different devices into BVH format, segmenting the data with code, extracting the useful values, and combining and sorting them according to the joint order defined for each action, so as to serve as the reference action set; step two, using a deep reinforcement learning algorithm, each strategy is represented by two neural networks that interact with each other, and two masked convolutional neural network frameworks are first constructed: the first layer of the first network is a fully connected layer of 512 units with a bilinear phase transformation applied to its input, and the second layer consists of two groups of six 256-unit linear output layers; the upper group is the critic sub-network, used to predict a value function for each actor, and the lower group is the actor sub-networks, used to select the action to perform in a given state; the reference action set from step one is used by the critic-actor strategy for decision training to obtain the output action a, which drives the animated character to imitate the motion posture of the professional; the second network consists of two fully connected layers of 256 units and, by handling a selected region of the character separately, accelerates learning when the lower limbs need to move over a large range; step three, through the adaptive style and the initially defined action set in the strategy, enabling the animation to generate actions different from the reference actions in special scenes, so as to better adapt to new environments; step four, at test time, driving the animated character through the PD controller and either embedding it directly into the AR augmented reality system or combining it with the original animated character in the system, so as to control the character's motion; step five, the animated character in the AR augmented reality system imitates the standard motions of a real person, making the character more lifelike; at the same time, the character can autonomously decide whether to move to the left or to the right, and the motion imitation is carried out accurately.
2. The deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction of claim 1, wherein in step one the criterion for "segmenting the motion data" is: dividing the continuous motion into a number of independent 5-second motion segments, dividing each segment evenly into 10 parts of 0.5 seconds each, and extracting the middle frame of each part as the tuple data of one motion; the reference action set stores the motion postures of several professionals for the animated character to learn and imitate, and the reference actions form part of the goal and the reward function.
3. The deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction of claim 1, wherein in step two the bilinear phase transformation constructs Φ = (Φ0, Φ1, Φ2, Φ3, Φ4)^T with Φi ∈ (0, 1) in order to keep the LLC synchronized with the reference motion; for example, when the phase lies in (0, 0.2), Φ0 = 1, and otherwise Φ0 = 0.
4. The deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction of claim 1, wherein in step two the critic-actor strategy is: at the beginning of each episode, the initial state s is sampled uniformly from the reference motion set or the initially defined action set, and each episode is simulated until a fixed time horizon is reached or a termination condition is triggered; once a batch of data has been collected, mini-batches are sampled from the data set and used to update the policy and the value function; the value function is updated using the TD-computed target value, the state and the reward function; with probability α the critic-actor pair with the largest reward among the four is selected, and with probability 1-α one of the remaining three pairs is selected at random, with α adjusted over the course of training.
5. The deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction of claim 1, wherein in step three the adaptive style is generated by the animated character adjusting itself automatically according to the actual scene and mainly addresses decisions in rare situations, and the initially defined action set is set in advance from collected data of simple actions commonly used by players; the initially defined action set gives the animated character a better basis for learning, while the adaptive style gives it better improvisation ability and robustness, so that it can learn how to respond in many rare scenes.
6. The method as claimed in claim 1, wherein in step four the PD controller defines three joint-moment settings, 24, 32 and 40 respectively, according to the complexity of the animated character in AR augmented reality, to control the motion of the animation.
CN202010977580.5A 2020-09-17 2020-09-17 Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction Withdrawn CN112330778A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010977580.5A CN112330778A (en) 2020-09-17 2020-09-17 Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010977580.5A CN112330778A (en) 2020-09-17 2020-09-17 Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction

Publications (1)

Publication Number Publication Date
CN112330778A true CN112330778A (en) 2021-02-05

Family

ID=74303534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010977580.5A Withdrawn CN112330778A (en) 2020-09-17 2020-09-17 Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction

Country Status (1)

Country Link
CN (1) CN112330778A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312840A (en) * 2021-05-25 2021-08-27 广州深灵科技有限公司 Badminton playing method and system based on reinforcement learning
CN116016985A (en) * 2023-01-04 2023-04-25 北京蔚领时代科技有限公司 Data synchronization method and system for multi-user remote collaborative concert
WO2024108977A1 (en) * 2022-11-22 2024-05-30 华为云计算技术有限公司 Augmented reality simulation method and ar device


Similar Documents

Publication Publication Date Title
CN109345614B (en) Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction
CN111260762B (en) Animation implementation method and device, electronic equipment and storage medium
CN112330778A (en) Deep reinforcement learning-based animation simulation method for AR augmented reality large-screen interaction
CN102362293B (en) Chaining animations
CN102473320B (en) Bringing a visual representation to life via learned input from the user
US10885691B1 (en) Multiple character motion capture
CN100440257C (en) 3-D visualising method for virtual crowd motion
CN102207771A (en) Intention deduction of users participating in motion capture system
CN103207667B (en) A kind of control method of human-computer interaction and its utilization
CN111223170A (en) Animation generation method and device, electronic equipment and storage medium
US11816772B2 (en) System for customizing in-game character animations by players
Barakonyi et al. Augmented reality agents in the development pipeline of computer entertainment
CN108665555A (en) A kind of autism interfering system incorporating real person's image
CN115278082B (en) Video shooting method, video shooting device and electronic equipment
Navarro-Newball et al. Gesture based human motion and game principles to aid understanding of science and cultural practices
Noser et al. Playing games through the virtual life network
CN117065363A (en) Football game and match real-time prediction system based on deep learning
CN111773669B (en) Method and device for generating virtual object in virtual environment
CN114581835A (en) Intelligent video teaching method and system for realizing motion recognition
CN109584376B (en) Composition teaching method, device and equipment based on VR technology and storage medium
Katz et al. Virtual reality
WO2022229639A2 (en) Computer-implemented method for controlling a virtual avatar
CN118021271B (en) Body-building monitoring device and body-building monitoring method based on virtual reality
WO2024060833A1 (en) Image processing method
Lan Simulation of Animation Character High Precision Design Model Based on 3D Image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20210205)