CN114841362A - Method for collecting imitation learning data by using virtual reality technology - Google Patents

Method for collecting imitation learning data by using virtual reality technology

Info

Publication number
CN114841362A
Authority
CN
China
Prior art keywords
learning
virtual
virtual reality
model object
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210331565.2A
Other languages
Chinese (zh)
Inventor
王春鹏
石翔慧
盖新宇
张岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202210331565.2A priority Critical patent/CN114841362A/en
Publication of CN114841362A publication Critical patent/CN114841362A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics
    • G06T19/006: Mixed reality

Abstract

The invention discloses a method for collecting imitation learning data using virtual reality technology, which belongs to the technical field of virtual reality and comprises the following steps: step one: acquiring scene image data, imitating a real scene, and building a virtual scene in a three-dimensional engine; step two: setting up, to scale, at least one operable virtual model object in the virtual scene as the agent. Compared with the traditional approach of human demonstration via keyboard, the method combines imitation learning with virtual reality technology, providing a feasible scheme for training agents with complex behavior, facilitating the collection of imitation learning data, and improving model training efficiency; imitation learning data collection is realized with virtual reality, and model training efficiency is improved.

Description

Method for collecting imitation learning data by using virtual reality technology
Technical Field
The invention relates to the technical field of virtual reality, and in particular to a method for collecting imitation learning data by using virtual reality technology.
Background
In recent years, with continuous breakthroughs in artificial intelligence and the maturing of the related algorithms, AI agents have gradually penetrated various fields and shown good application results. Unity Machine Learning Agents (ML-Agents) is an open-source Unity plugin that allows users to train intelligent agents in game and simulation environments; agents can be trained with reinforcement learning, imitation learning, neuroevolution, or other machine learning methods, and controlled through a simple, easy-to-use Python API.
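As a point of reference for the Python API mentioned above, the following is a minimal sketch of driving a Unity build through the ML-Agents low-level Python API (mlagents_envs); the executable name is an assumption, and random actions are used only to exercise the loop:

```python
# Minimal sketch of the ML-Agents low-level Python API (mlagents_envs).
# The executable name "TennisScene" is an illustrative assumption.
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

env = UnityEnvironment(file_name="TennisScene")  # hypothetical Unity build
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(10):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    n = len(decision_steps)
    # Random continuous actions in [-1, 1], one row per requesting agent.
    action = ActionTuple(continuous=np.random.uniform(
        -1.0, 1.0, size=(n, spec.action_spec.continuous_size)))
    env.set_actions(behavior_name, action)
    env.step()
env.close()
```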
Reinforcement learning can train agents that far exceed human performance by interacting with the environment to maximize return, but training times are often very long. Imitation learning, by contrast, extracts knowledge from demonstrations by human experts or artificially created agents in order to replicate their behavior. Combining imitation learning, that is, performing reinforcement learning on top of human demonstrations, can greatly reduce training time and improve efficiency.
But for agents with complex behavior, it is difficult or even impossible to produce demonstrations with a keyboard, and demonstration quality suffers. Model performance depends heavily on demonstration quality, which makes training a complex agent with imitation learning impractical, and inefficient, because the desired effect is reached only after a large amount of training time.
Disclosure of Invention
The invention aims to remedy the defects in the prior art by providing a method for collecting imitation learning data using virtual reality technology.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for mock learning data collection using virtual reality technology, comprising the steps of:
the method comprises the following steps: acquiring scene image data, imitating a real scene, and building a virtual scene in a three-dimensional engine;
step two: setting up at least one operation virtual model object as a proxy in a virtual scene in a geometric mode;
step three: according to a specific target to be realized, compiling codes by using a Unity plug-in ML-Agents to complete state input, reward setting and action output of the intelligent agent;
step four: configuring reinforcement learning training parameters, training and checking effects;
step five: configuring simulation learning training parameters, and performing human demonstration by using a virtual reality tracker to finish data collection of simulation learning;
step six: performing simulation learning, training and checking effects on the basis of human demonstration;
step seven: and analyzing the result to obtain an optimal scheme.
Further, for step three, the virtual environment is built in Unity 3D.
Further, in step three, the vector action space of the virtual model object is of the Continuous type, with 5 variable action parameters in total: movement of the virtual model object along the x-, y-, and z-axes, and rotation of the virtual model object about its x- and z-axes.
Further, in step three, the Unity plug-in ML-Agents is used to complete the state input, reward setting, and action output of the virtual model object.
Further, in step six, the virtual model object performs reinforcement learning on the basis of the demonstration, cyclically training the policy model on the continuously input states, reward information, and action outputs.
Further, in step six, a virtual model object using the pure reinforcement learning model is trained in comparison with a virtual model object using the reinforcement learning + imitation learning model, where adding a human demonstration accelerates arriving at the policy result.
Further, in step five, a VR controller is used for data collection; its rotation and movement parameters yield the control parameters, completing data collection (see the sketch following this list).
Further, in step five, the imitation learning training demonstration uses an imitation learning algorithm combining the BC algorithm and the GAIL algorithm.
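To illustrate how a VR controller's pose could be mapped onto the 5-parameter continuous action space described above, the following sketch uses a hypothetical read_vr_pose() helper; actual pose acquisition depends on the VR SDK in use and is not specified by the patent:

```python
# Sketch: map frame-to-frame VR controller pose deltas to the 5 continuous
# action parameters (x/y/z translation, rotation about x and z).
# read_vr_pose() is a hypothetical placeholder for the VR SDK call.
import numpy as np

def read_vr_pose():
    # Placeholder: a real setup would query the VR SDK here.
    return np.zeros(3), 0.0, 0.0  # position (x, y, z), rotation about x, z

prev_pos, prev_rx, prev_rz = np.zeros(3), 0.0, 0.0

def vr_to_action():
    global prev_pos, prev_rx, prev_rz
    pos, rx, rz = read_vr_pose()
    # Pose deltas since the last frame become the 5-dim continuous action.
    action = np.concatenate([pos - prev_pos, [rx - prev_rx, rz - prev_rz]])
    prev_pos, prev_rx, prev_rz = pos, rx, rz
    return np.clip(action, -1.0, 1.0)  # ML-Agents expects values in [-1, 1]
```

In step five, each frame's output of vr_to_action() would be fed to the agent as its heuristic action and recorded by the Demonstration Recorder as one state-action pair.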
In summary, compared with the traditional approach of human demonstration via keyboard, the method combines imitation learning with virtual reality technology and provides a feasible scheme for training agents with complex behavior, which is of great significance;
combining imitation learning with other tools capable of controlling the agent (such as a depth camera) can replace keyboard operation, facilitating the collection of imitation learning data and improving model training efficiency;
imitation learning data collection can thus be realized with virtual reality, improving model training efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a logic diagram of the method for imitation learning data collection using virtual reality technology according to the present invention;
FIG. 2 is a schematic illustration of the demonstration of embodiment two of the method for imitation learning data collection using virtual reality technology according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The present invention will be described in detail below in order to better understand the technical solution of the present invention.
Unity ML-Agents is an open-source Unity plugin that allows users to train intelligent agents in game and simulation environments; the agents may be trained using reinforcement learning, imitation learning, neuroevolution, or other machine learning methods, controlled through a simple, easy-to-use Python API. It supports several deep reinforcement learning algorithms (PPO, SAC, MA-POCA, self-play), and supports learning from demonstration through two imitation learning algorithms (BC and GAIL).
proximal Policy Optimization (near-end Policy Optimization algorithm), PPO for short, is a Policy gradient method for reinforcement learning, and allowsParallel agent interactions with the environment are sampled and agent objectives are optimized by stochastic gradient descent. The core idea is that the action probability of the above strategy is divided by the action probability of the current strategy
Figure BDA0003573250710000051
The objective function is constrained to ensure that large policy updates do not occur. Using PPO optimized clipping instead of the objective loss function:
Figure BDA0003573250710000052
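For concreteness, a compact sketch of the clipped surrogate objective above, written in PyTorch; ML-Agents implements this internally, so the function name and tensor shapes here are illustrative only:

```python
# Sketch of the PPO clipped surrogate loss from the formula above.
# Inputs are per-step tensors of log-probabilities and advantage estimates.
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, epsilon=0.2):
    ratio = torch.exp(log_probs - old_log_probs)  # r_t(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Negated because optimizers minimize; the objective itself is maximized.
    return -torch.min(unclipped, clipped).mean()
```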
example one
Referring to fig. 1, a method for imitation learning data collection using virtual reality technology,
the method comprises the following steps:
the method comprises the following steps: acquiring scene image data, imitating a real scene, and building a virtual scene in a three-dimensional engine;
step two: setting up at least one operation virtual model object as a proxy in a virtual scene in a geometric mode;
step three: according to a specific target to be realized, writing codes by using a Unity plug-in ML-Agents to complete state input, reward setting and action output of the intelligent agent;
step four: training for reinforcement learning: configuring reinforcement learning training parameters, training and checking effects;
step five: training to simulate learning: configuring simulation learning training parameters, and performing human demonstration by using a virtual reality tracker to finish data collection of simulation learning;
step six: training of imitation learning is carried out on the basis of demonstration, and the simulation learning + reinforcement learning is realized: performing simulation learning, training and checking effects on the basis of human demonstration;
step seven: and analyzing the result to obtain an optimal scheme.
The analysis mode is to compare the effect of model training by adopting a reinforcement learning mode and a mode of simulating learning and reinforcement learning.
Example two
Referring to fig. 2, building on embodiment one, a tennis scene is built as an example; the method specifically includes the following steps:
Step one: acquiring tennis court scene image data, imitating the real scene, and building a virtual tennis court scene in a three-dimensional engine;
Step two: setting up, to scale, two operable virtual model objects in the virtual tennis scene as agents, which share policy model parameters according to their agent rules;
specifically, agents with the same agent rules use the same policy model, and otherwise use different policy models.
It should be noted that the policy model adopts Proximal Policy Optimization (PPO), in which the ratio of the action probability under the current policy to the action probability under the previous policy,

$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},$$

constrains the objective function, ensuring that excessively large policy updates do not occur.
PPO uses the clipped surrogate objective in place of the usual policy-gradient loss:

$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right].$$
Step three: according to the specific target to be realized, writing code with the Unity plug-in ML-Agents to complete the state input, reward setting, and action output of the agent;
Step four: reinforcement learning training: configuring reinforcement learning training parameters, training, and checking the effect;
Step five: imitation learning training: configuring imitation learning training parameters, performing a human demonstration with a virtual reality tracker, and completing the data collection for imitation learning, adopting the reinforcement learning + imitation learning (BC + GAIL) mode;
when performing imitation learning training, the agent's Behavior Type is set to Heuristic Only, and a Demonstration Recorder component is added to record the human demonstration.
The vector action space of the virtual model object is of the Continuous type, with 5 variable action parameters in total: movement of the virtual model object along the x-, y-, and z-axes, and rotation of the virtual model object about its x- and z-axes.
It should be noted that the imitation learning training demonstration uses an imitation learning algorithm that employs the BC algorithm and the GAIL algorithm simultaneously, as further illustrated and described below:
Behavior Cloning, BC for short;
Generative Adversarial Imitation Learning, GAIL for short; the two may be used together.
BC: typically used as pre-training. The idea is to train the agent's policy network to be as close as possible to the behavior pattern of the human demonstration data; that is, given the same state input s, it should produce a similar output a.
The optimization target is no different from supervised learning: each state s corresponds to the input features, the action output by the expert corresponds to the label, and the model's output a only needs to approach the label:
$$\min_\theta \ \mathbb{E}_{(s,\,a^{*}) \sim \mathcal{D}}\left[\,\lVert \pi_\theta(s) - a^{*} \rVert^{2}\,\right]$$

(written here as a mean-squared-error objective over the demonstration set D, matching the continuous action space).
Before using BC, it should be ensured that a demonstration file has been recorded, i.e., that the demonstration data has been collected;
some teaching data, namely a number of state-action pairs, must be collected and used as training data for the policy network so that it can imitate the behavior.
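A minimal sketch of the BC optimization just described, treating the demonstration state-action pairs as a supervised dataset; network sizes (8-dim state, 5-dim action) and the data source are assumptions:

```python
# Sketch: behavior cloning as supervised regression on demo pairs.
# State dim 8 and action dim 5 are illustrative assumptions.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 5))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def bc_update(states, expert_actions):
    """One gradient step pushing pi_theta(s) toward the expert label a*."""
    pred = policy(states)                  # model output a
    loss = loss_fn(pred, expert_actions)   # distance to the label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```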
Generative Adversarial Imitation Learning, GAIL for short, can learn a policy directly from expert data. GAIL belongs to the field of inverse reinforcement learning (inverse RL): whereas reinforcement learning learns the optimal policy from the reward and next state s_{t+1} given by the environment,
inverse reinforcement learning enhances the policy's reward signal from recorded expert demonstrations of given s_t and a_t; its advantage is that it does not directly supervise the policy, making it more general.
The goal of the GAIL algorithm is to find a saddle point (π, D) in the following equation:
$$\mathbb{E}_{\pi}\left[\log D(s,a)\right] + \mathbb{E}_{\pi_E}\left[\log\left(1 - D(s,a)\right)\right] - \lambda H(\pi)$$

(the standard GAIL objective, where π_E is the expert policy and H(π) is the policy's entropy).
Two approximation functions are defined to represent π and D, namely π_θ and D_ω : S × A → (0, 1). An Adam optimizer performs gradient ascent on ω, and a TRPO step performs gradient descent on θ.
GAIL is equivalent to an additional intrinsic reward and can be trained in combination with the extrinsic reward of reinforcement learning. The higher the expert reward is set, the more the agent tends to mimic the expert's behavior in the environment. By setting a reasonable upper limit on this reward, the agent imitates the expert's behavior to a certain extent while still exploring the environment, so as to find a better policy.
From the above, BC and GAIL can significantly enhance the effect of reinforcement learning. Behavioral Cloning amounts to pre-training and is used only in the early stage; Generative Adversarial Imitation Learning can run through the whole reinforcement learning process, equivalent to adding an intrinsic reward: the closer the policy is to the expert demonstration, the greater the reward, so the agent can explore better solutions.
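As a sketch of the intrinsic reward GAIL adds, the following uses the common convention matching the saddle-point objective above, where D scores policy samples and the policy's reward is -log D(s, a); network sizes are assumptions, and ML-Agents' internal implementation may differ:

```python
# Sketch: GAIL discriminator and the intrinsic reward it induces.
# State dim 8 and action dim 5 are illustrative assumptions.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(8 + 5, 64), nn.ReLU(),
                     nn.Linear(64, 1), nn.Sigmoid())  # D: S x A -> (0, 1)

def gail_reward(state, action):
    """Intrinsic reward -log D(s, a): high when the sample looks expert-like."""
    d = disc(torch.cat([state, action], dim=-1))
    return -torch.log(d + 1e-8)  # added to the extrinsic RL reward
```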
In the present application, the time needed to achieve the same result is greatly reduced with the reinforcement learning + BC + GAIL method.
Step six: training on the basis of the demonstration, realizing imitation learning + reinforcement learning: performing imitation learning on the basis of the human demonstration, training, and checking the effect;
specifically, the basic demonstration parameters are input through the VR controller (i.e., the human demonstration), after which the model continues training gradually without further human demonstration.
Step seven: analyzing the results to obtain the optimal scheme.
The scheme found by the analysis to accumulate the most reward is the optimal action scheme.
In a specific embodiment of the present application, for step three, the virtual environment is built in Unity 3D.
In a specific embodiment of the present application, in step three, the Unity plug-in ML-Agents is used to complete the state input, reward setting, and action output of the virtual model object.
In a specific embodiment of the present application, in step six, the virtual model object performs reinforcement learning on the basis of the demonstration, cyclically training the policy model on the continuously input states, reward information, and action outputs.
In a specific embodiment of the present application, in step six, the virtual model object using the pure reinforcement learning model is trained in comparison with the virtual model object using the reinforcement learning + imitation learning model, where adding a human demonstration accelerates arriving at the policy result.
In a specific embodiment of the present application, in step five, a VR controller is used for data collection; control parameters are obtained from the rotation and movement parameters of the VR controller, completing data collection.
The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any equivalent substitution or change that a person skilled in the art could make within the technical scope disclosed by the present invention, according to the technical solutions of the present invention and their inventive concept, shall fall within the scope of the present invention.

Claims (8)

1. A method for imitation learning data collection using virtual reality technology, characterized by comprising the following steps:
step one: acquiring scene image data, imitating a real scene, and building a virtual scene in a three-dimensional engine;
step two: setting up, to scale, at least one operable virtual model object in the virtual scene as the agent;
step three: according to the specific target to be realized, writing code with the Unity plug-in ML-Agents to complete the state input, reward setting, and action output of the agent;
step four: configuring reinforcement learning training parameters, training, and checking the effect;
step five: configuring imitation learning training parameters, and performing a human demonstration with a virtual reality tracker to complete the data collection for imitation learning;
step six: performing imitation learning on the basis of the human demonstration, training, and checking the effect;
step seven: analyzing the results to obtain the optimal scheme.
2. The method for imitation learning data collection using virtual reality technology according to claim 1, characterized in that, for step three, the virtual environment is built in Unity 3D.
3. The method for imitation learning data collection using virtual reality technology according to claim 2, characterized in that, in step three, the vector action space of the virtual model object is of the Continuous type, with 5 variable action parameters in total, including movement of the virtual model object along the x-, y-, and z-axes and rotation of the virtual model object about its x- and z-axes.
4. The method for imitation learning data collection using virtual reality technology according to claim 3, characterized in that, for step three, the Unity plug-in ML-Agents is used to complete the state input, reward setting, and action output of the virtual model object.
5. The method for imitation learning data collection using virtual reality technology according to claim 4, characterized in that, in step six, the virtual model object performs reinforcement learning on the basis of the demonstration, cyclically training the policy model on the continuously input states, reward information, and action outputs.
6. The method for imitation learning data collection using virtual reality technology according to claim 5, characterized in that, in step six, the virtual model object using the reinforcement learning model is trained in comparison with the virtual model object using the reinforcement learning + imitation learning model, where adding a human demonstration accelerates arriving at the policy result.
7. The method for imitation learning data collection using virtual reality technology according to claim 6, characterized in that, in step five, a VR controller is used for data collection, and control parameters are obtained from the rotation and movement parameters of the VR controller, completing data collection.
8. The method for imitation learning data collection using virtual reality technology according to claim 7, characterized in that, in step five, the imitation learning training demonstration uses an imitation learning algorithm that employs the BC algorithm and the GAIL algorithm simultaneously.
CN202210331565.2A 2022-03-30 2022-03-30 Method for collecting imitation learning data by using virtual reality technology Pending CN114841362A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210331565.2A CN114841362A (en) 2022-03-30 2022-03-30 Method for collecting imitation learning data by using virtual reality technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210331565.2A CN114841362A (en) 2022-03-30 2022-03-30 Method for collecting imitation learning data by using virtual reality technology

Publications (1)

Publication Number Publication Date
CN114841362A true CN114841362A (en) 2022-08-02

Family

ID=82564011

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210331565.2A Pending CN114841362A (en) 2022-03-30 2022-03-30 Method for collecting imitation learning data by using virtual reality technology

Country Status (1)

Country Link
CN (1) CN114841362A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858574A (en) * 2018-12-14 2019-06-07 启元世界(北京)信息技术服务有限公司 The autonomous learning method and system of intelligent body towards man-machine coordination work
CN110496377A (en) * 2019-08-19 2019-11-26 华南理工大学 A kind of virtual table tennis forehand hit training method based on intensified learning
CN111983922A (en) * 2020-07-13 2020-11-24 广州中国科学院先进技术研究所 Robot demonstration teaching method based on meta-simulation learning
CN112162564A (en) * 2020-09-25 2021-01-01 南京大学 Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm
CN113677485A (en) * 2019-01-23 2021-11-19 谷歌有限责任公司 Efficient adaptation of robot control strategies for new tasks using meta-learning based on meta-mimic learning and meta-reinforcement learning
CN114021330A (en) * 2021-10-28 2022-02-08 武汉中海庭数据技术有限公司 Simulated traffic scene building method and system and intelligent vehicle control method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858574A (en) * 2018-12-14 2019-06-07 启元世界(北京)信息技术服务有限公司 The autonomous learning method and system of intelligent body towards man-machine coordination work
CN113677485A (en) * 2019-01-23 2021-11-19 谷歌有限责任公司 Efficient adaptation of robot control strategies for new tasks using meta-learning based on meta-mimic learning and meta-reinforcement learning
CN110496377A (en) * 2019-08-19 2019-11-26 华南理工大学 A kind of virtual table tennis forehand hit training method based on intensified learning
CN111983922A (en) * 2020-07-13 2020-11-24 广州中国科学院先进技术研究所 Robot demonstration teaching method based on meta-simulation learning
CN112162564A (en) * 2020-09-25 2021-01-01 南京大学 Unmanned aerial vehicle flight control method based on simulation learning and reinforcement learning algorithm
CN114021330A (en) * 2021-10-28 2022-02-08 武汉中海庭数据技术有限公司 Simulated traffic scene building method and system and intelligent vehicle control method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ROHIT JENA et al.: "Augmenting GAIL with BC for sample efficient imitation learning", arXiv, 9 November 2020 (2020-11-09), pages 3-5 *

Similar Documents

Publication Publication Date Title
Risi et al. Increasing generality in machine learning through procedural content generation
CN113688977B (en) Human-computer symbiotic reinforcement learning method and device oriented to countermeasure task, computing equipment and storage medium
Song et al. Arena: A general evaluation platform and building toolkit for multi-agent intelligence
CN110516389A (en) Learning method, device, equipment and the storage medium of behaviour control strategy
Liapis et al. Sentient World: Human-Based Procedural Cartography: An Experiment in Interactive Sketching and Iterative Refining
El Gourari et al. The implementation of deep reinforcement learning in e-learning and distance learning: Remote practical work
CN111282272A (en) Information processing method, computer readable medium and electronic device
Xu et al. Composite Motion Learning with Task Control
Yang et al. Adaptive inner-reward shaping in sparse reward games
Rowe et al. Toward automated scenario generation with deep reinforcement learning in gift
CN114841362A (en) Method for collecting imitation learning data by using virtual reality technology
CN115797517A (en) Data processing method, device, equipment and medium of virtual model
Espinosa Leal et al. Reinforcement learning for extended reality: designing self-play scenarios
CN114186696A (en) Visual system and method for AI training teaching
Dinerstein et al. Learning policies for embodied virtual agents through demonstration
Browne et al. Guest editorial: General games
CN112017265A (en) Virtual human motion simulation method based on graph neural network
Feng et al. Recognizing Multiplayer Behaviors Using Synthetic Training Data
Li Design and implement of soccer player AI training system using unity ML-agents
Kang et al. Animation Character Generation and Optimization Algorithm Based on Computer Aided Design and Virtual Reality
Zhu et al. Deep neuro-evolution: Evolving neural network for character locomotion controller
Jianbo et al. Design of amazon chess evaluation function based on reinforcement learning
Kanervisto Advances in deep learning for playing video games
Davies et al. Modelling pervasive environments using bespoke and commercial game-based simulators
Plechawska-Wójcik et al. Professionalized master theses as a result of cooperation between university and industry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination