CN110390845A - Robot training method and apparatus, storage medium, and computer system in a virtual environment - Google Patents
Robot training method and apparatus, storage medium, and computer system in a virtual environment
- Publication number: CN110390845A (application CN201810349138.0A)
- Authority: CN (China)
- Prior art keywords: virtual environment, training, movement, deep learning, action command
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G — PHYSICS
- G09 — EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B — EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B9/00 — Simulators for teaching or training purposes
Abstract
This disclosure relates to the field of computer technology, and in particular to a robot training method in a virtual environment, a robot training apparatus in a virtual environment, a storage medium, and a computer system. The method includes: in a virtual environment in which a robot is simulated as a training object, invoking a predetermined motion rule to make the training object move in the virtual environment; obtaining simulation data from the motion of the training object using a deep learning system; taking the simulation data as input, so that the deep learning system generates an action command from the input; and making the training object move in the virtual environment according to the action command. The disclosure enables repeated training of robot motion in a virtual environment, thereby effectively improving training efficiency.
Description
Technical field
This disclosure relates to the field of computer technology, and in particular to a robot training method in a virtual environment, a robot training apparatus in a virtual environment, a storage medium, and a computer system.
Background technique
As robot technology matures, robots have begun to replace human workers in more and more fields, especially for highly repetitive work such as cargo handling and product assembly.

In the prior art, to ensure that a robot can complete its work normally, a simulated environment similar to the robot's actual working environment is usually built in a laboratory before the robot is put into operation, and the robot's motions are trained there. In practice, however, constrained by the robot's movement speed and by the time cost of repeatedly resetting the scene for each training run, simulation training consumes a great deal of time and training efficiency is extremely low. Moreover, the execution speed of the training algorithms is often inconsistent with the robot's real working speed, which further limits training efficiency. In addition, because of objective constraints such as laboratory space and funding, and because the robot's actual working environment keeps changing, the simulated environment in which the robot is trained often differs substantially from the actual working environment. This leads to unsatisfactory training results, so that the robot moves inaccurately in the actual working environment and may even cause dangerous situations.
It should be noted that the information disclosed in the background section above is provided only to enhance understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The present disclosure aims to provide a robot training method in a virtual environment, a robot training apparatus in a virtual environment, a storage medium, and a computer system, thereby overcoming, at least to some extent, the low training efficiency and inaccurate training results caused by the limitations and defects of the related art.

Other features and advantages of the disclosure will become apparent from the following detailed description, or may be learned in part through practice of the disclosure.
According to a first aspect of the disclosure, a robot training method in a virtual environment is provided, comprising:

in a virtual environment in which a robot is simulated as a training object, invoking a predetermined motion rule to make the training object move in the virtual environment;

obtaining simulation data from the motion of the training object using a deep learning system;

taking the simulation data as input, so that the deep learning system generates an action command from the input;

making the training object move in the virtual environment according to the action command.
In an exemplary embodiment of the disclosure, before the step of making the training object move in the virtual environment, the method further comprises: receiving a user's configuration of the training object in the virtual environment.
In an exemplary embodiment of the disclosure, receiving the user's configuration of the training object in the virtual environment comprises: receiving the user's configuration of the type and/or position of the training object in the virtual environment.
In an exemplary embodiment of the disclosure, receiving the user's configuration of the type of the training object in the virtual environment comprises:

displaying a selectable object type list, which may be a text list or an icon list;

in response to the user's selection of an object type in the list, configuring the selected object type in the virtual environment.
In an exemplary embodiment of the disclosure, receiving the user's configuration of the position of the training object in the virtual environment comprises:

in response to the user dragging the training object in the virtual environment, moving the training object to the position to which it is dragged.
In an exemplary embodiment of the disclosure, before the step of making the training object move in the virtual environment, the method further comprises:

receiving a motion rule preset by the user.
In an exemplary embodiment of the disclosure, taking the simulation data as input so that the deep learning system generates an action command from the input comprises: causing the deep learning system to look up an action reward list, to obtain the action with the maximum reward value corresponding to the input simulation data.
In an exemplary embodiment of the disclosure, after obtaining the simulation data using the deep learning system, the method further comprises:

calculating a reward value corresponding to the currently obtained simulation data, based on the simulation data currently obtained by the deep learning system, the previously obtained simulation data, and its corresponding reward value;

adding the simulation data currently obtained by the deep learning system, the detected current action of the training object, and the calculated reward value corresponding to the currently obtained simulation data to the reward list.
In an exemplary embodiment of the disclosure, after the training object is made to move in the virtual environment, the method further comprises:

identifying a robot accident using the deep learning system;

generating a correction action command corresponding to the robot accident;

making the training object move in the virtual environment according to the correction action command.
In an exemplary embodiment of the disclosure, the method further comprises: receiving a user's configuration of a training goal in the virtual environment; and after making the training object move in the virtual environment according to the action command, the method further comprises:

judging whether the configured training goal has been reached;

determining that training has ended when the configured training goal has not been reached after the predetermined motion rule has been fully invoked.
According to a second aspect of the disclosure, a robot training apparatus in a virtual environment is provided, comprising:

a training object control module, configured to invoke a predetermined motion rule in a virtual environment in which a robot is simulated as a training object, to make the training object move in the virtual environment;

a simulation data acquisition module, configured to obtain simulation data from the motion of the training object using a deep learning system;

an action command acquisition module, configured to take the simulation data as input, so that the deep learning system generates an action command from the input;

an action command execution module, configured to make the training object move in the virtual environment according to the action command.
According to a third aspect of the disclosure, a storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the robot training method in a virtual environment described above.

According to a fourth aspect of the disclosure, a computer system is provided, comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to perform, by executing the executable instructions, the robot training method in a virtual environment described above.
In the robot training method in a virtual environment provided by an embodiment of the disclosure, a robot is established as a training object in the virtual environment, and simulation data are obtained during the motion of the training object. This makes it easy to set up a training environment consistent with the robot's actual working environment, and to adjust the training environment promptly as the actual working environment changes, thereby effectively ensuring the accuracy of the training environment and of the training results, and effectively reducing the time and financial cost of building a physical robot and its simulated environment. In addition, by using the deep learning system to generate action commands for the training object in the virtual environment from the simulation data obtained from the virtual environment, repeated training of robot motion is achieved, effectively improving training efficiency.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure. Obviously, the drawings in the following description are only some of the embodiments of the disclosure; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 schematically shows a robot training method in a virtual environment;

Fig. 2 schematically shows a robot training apparatus in a virtual environment in an exemplary embodiment of the disclosure;

Fig. 3 schematically shows a computer system in an exemplary embodiment of the disclosure;

Fig. 4 schematically shows another robot training apparatus in a virtual environment in an exemplary embodiment of the disclosure.
Specific embodiments

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In addition, the drawings are merely schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a robot training method in a virtual environment, which can be applied to the simulation training of robots — for example, the action training of handling robots used to transport goods in an unmanned warehouse, industrial robots used on production lines to identify, assemble, or inspect products, and industrial robots used in manufacturing to perform actions such as welding and cutting. Referring to Fig. 1, the robot training method in a virtual environment may include the following steps:

S101: in a virtual environment in which a robot is simulated as a training object, invoking a predetermined motion rule to make the training object move in the virtual environment;

S102: obtaining simulation data from the motion of the training object using a deep learning system;

S103: taking the simulation data as input, so that the deep learning system generates an action command from the input;

S104: making the training object move in the virtual environment according to the action command.
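Steps S101–S104 form a closed loop between the 3D engine and the deep learning system. A minimal Python sketch of that loop is given below; all class names, the toy forklift state, and the fixed action are hypothetical illustrations, not part of the disclosure:

```python
# Hypothetical sketch of the S101-S104 loop. The engine and learner
# interfaces below are illustrative assumptions.

class VirtualEnvironment:
    """Stands in for the 3D engine hosting the training object."""
    def __init__(self):
        self.state = {"distance_m": 2.0}  # e.g. forklift-to-pallet distance

    def apply(self, command):
        # S101/S104: move the training object per a rule or an action command;
        # here each application simply advances the forklift by 0.1 m.
        self.state["distance_m"] = round(self.state["distance_m"] - 0.1, 2)

    def observe(self):
        # S102: stands in for extracting simulation data from the rendered scene.
        return dict(self.state)

class DeepLearningSystem:
    def act(self, simulation_data):
        # S103: generate an action command from the simulation data
        # (fixed here for illustration).
        return {"lift_deg": 20, "advance": True}

env = VirtualEnvironment()
learner = DeepLearningSystem()
env.apply("predetermined motion rule")   # S101: initial rule-driven motion
for _ in range(3):                       # repeated training iterations
    data = env.observe()                 # S102
    command = learner.act(data)          # S103
    env.apply(command)                   # S104
print(env.observe())                     # distance shrinks as the loop iterates
```

The point of the sketch is the data flow: the engine produces simulation data, the learner turns it into a command, and the command is fed back into the engine, so each iteration of the loop is one training step.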
In the robot training method in a virtual environment provided by this example embodiment, on the one hand, by establishing the robot as a training object in the virtual environment and obtaining simulation data during the motion of the training object, a training environment consistent with the robot's actual working environment can easily be set up, and the training environment can be adjusted promptly as the actual working environment changes. This effectively ensures the accuracy of the training environment and of the training results, and effectively reduces the time and financial cost of building a physical robot and its simulated environment. On the other hand, by using the deep learning system to generate action commands for the training object in the virtual environment from the simulation data obtained from the virtual environment, repeated training of robot motion is achieved, effectively improving training efficiency. In addition, training the robot in a virtual environment can effectively reduce or avoid safety accidents.
In the following, each step of the robot training method in a virtual environment in this example embodiment is described in detail with reference to the drawings and embodiments.
Step S100: receiving a user's configuration of the training object in the virtual environment.
In this example embodiment, the virtual environment and the training object can be created with a 3D engine as an equal-proportion replica of the robot's actual work scene, in strict conformance with it. This guarantees the accuracy of the training object's training scene, and in turn the validity of the robot's training results in the virtual environment.
Before the training object is made to move, the user may first configure parameters of the training object according to actual needs. For example, the configuration parameters may include the type and/or position of the training object in the virtual environment.
Specifically, the configuration of the type of the training object may include:

S1001: displaying a selectable object type list;

S1002: in response to the user's selection of an object type in the list, configuring the selected object type in the virtual environment.
When the training object is configured using the 3D engine, its type can be selected from a list provided in the 3D engine. The list may take the form of a text list or an icon list, and the disclosure places no particular limitation on this. When the user selects one or several items in the object type list, objects of the selected type are configured in the virtual environment.
For example, if the training object is a handling robot in an unmanned warehouse, the user can configure the handling robot in the object type list according to parameters such as model, power, size, and color; if the training object is an automatic forklift, the user can configure the automatic forklift in the object type list according to parameters such as model and power; if the training object is a robotic arm on an industrial production line, it can be selected according to its model and specific function.
In addition, the configuration of the position of the training object in the virtual environment may include:

S1003: in response to the user dragging the training object in the virtual environment, moving the training object to the position to which it is dragged.
In this example embodiment, after the training object is determined, its initial position in the virtual environment, and its positional relationship with the other objects in the virtual environment, can be set according to the user's operation. For example, the position between the automatic forklift and the pallet in the virtual environment and the initial orientation of the automatic forklift are determined according to the user's operation; or the position between the handling robot and the shelf; or parameters such as the distance between the robotic arm and the production line.
On this basis, in this example embodiment, before the training object is made to move in the virtual environment, the above method may further include:

S1004: receiving a motion rule preset by the user.
After the training scene in the virtual environment, the training object, and the position of the training object in the training scene have been determined using the 3D engine, the user can also configure the motion rule of the training object — for example, the motion rule of the automatic forklift, handling robot, or robotic arm described above.
In this embodiment, the motion rule is illustrated with the automatic forklift as an example. Specifically, the motion rule of the automatic forklift in the virtual environment can be set to include: making the automatic forklift advance straight toward the pallet in the virtual environment, align with the gap under the simulated pallet, and fork it up; or making the simulated forklift travel clockwise or counterclockwise around the simulated pallet, align with the gap under the simulated pallet, and fork it up.
Of course, in other exemplary embodiments of the disclosure, more numerous and more detailed motion rules can also be set, such as the movement speed and movement angle of a robotic arm. The motion rule can be configured according to the actual working state and working scene of the training object, and the disclosure places no particular limitation on the specific content of the training object's motion rule. By setting the motion rule of the training object in advance according to its actual working state, the motion of the training object can be made to match the practical application scene, which effectively guarantees the accuracy of the training results.
Step S101: in the virtual environment in which the robot is simulated as the training object, invoking the predetermined motion rule to make the training object move in the virtual environment.

In this example embodiment, after the parameters of the training object, the application scene, and the motion rule have been configured, the motion rule can be invoked to make the training object start moving in the virtual environment — for example, making the automatic forklift start moving according to the motion rule described above.
Step S102: obtaining simulation data from the motion of the training object using the deep learning system.

In this example embodiment, after the training object starts moving according to the predetermined motion rule, the deep learning system can obtain simulation data from the simulation video or simulation images generated by the 3D engine — for example, by extracting the motion state and motion parameters of the training object from the simulation video or simulation images.

This embodiment illustrates the acquisition of simulation data with the automatic forklift described above. Specifically, images can be captured from the simulation video at preset time intervals, for example one simulation image per second, and image recognition technology can be used to identify specific parameters such as the automatic forklift's current direction of motion, the distance between the automatic forklift and the pallet, and the lift angle or lift height of the automatic forklift's forks. These parameters serve as the simulation data obtained by the deep learning system.
Of course, in other exemplary embodiments of the disclosure, the simulation data obtained by the deep learning system can also be data such as the direction of motion of the handling robot and its distance to the shelf, or the distance between the robotic arm and the product operation point and its current rotation angle. The disclosure places no particular limitation on this.
Step S103: taking the simulation data as input, so that the deep learning system generates an action command from the input.

In this example embodiment, the simulation data described above can be used as the input data of the deep learning system, and an action reward list can be configured for the deep learning system. Step S103 may then specifically include: causing the deep learning system to look up the action reward list, to obtain the action with the maximum reward value corresponding to the input simulation data.

After the deep learning system obtains the input at the current moment, it can obtain the optimal action command — that is, the action command with the maximum reward value — from the action reward list according to the input.
For example, for the automatic forklift described above, as shown in Table 1 below, the action reward list may include: the current distance between the automatic forklift and the pallet, the action command corresponding to that distance, and the reward value corresponding to that action command.
| State (distance to pallet) | Action command | Reward value |
| 2 meters | lift up 15°, advance | 0.835 |
| 2 meters | lift up 20°, advance | 0.9 |
| 2 meters | lift up 25°, advance | 0.73 |
| 2 meters | lift up 30°, advance | 0.65 |
| 1.9 meters | lift up 20°, advance | 0.534 |

Table 1
For example, when it is detected that the distance between the automatic forklift and the pallet at the current moment is 2 meters, the maximum reward value is 0.9, and the corresponding action command is: lift up 20°, advance.
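The lookup described above amounts to a table keyed by state that returns the highest-reward action. A minimal sketch with the values of Table 1 follows; the dictionary layout is an assumption made for illustration:

```python
# Action reward list keyed by state (distance to pallet, in meters),
# mapping to (action command, reward value) pairs. Values mirror Table 1.
reward_list = {
    2.0: [("lift up 15°, advance", 0.835),
          ("lift up 20°, advance", 0.9),
          ("lift up 25°, advance", 0.73),
          ("lift up 30°, advance", 0.65)],
    1.9: [("lift up 20°, advance", 0.534)],
}

def best_action(distance_m):
    # S103: return the action command with the maximum reward value
    # for the given state.
    return max(reward_list[distance_m], key=lambda pair: pair[1])

command, reward = best_action(2.0)
print(command, reward)  # lift up 20°, advance 0.9
```

This is essentially a tabular value lookup (a Q-table-style policy): the "deep learning system" of the disclosure would populate and refine these reward values, while action selection itself is a simple argmax over the entries for the current state.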
Step S104: making the training object move in the virtual environment according to the action command.

In this example embodiment, after the deep learning system generates an action command from the input at the current moment, the action command is sent to the 3D engine, so that the training object in the virtual environment moves according to the action command. For example, the automatic forklift described above can execute the action command: lift up 20°, advance.

By establishing data interaction between the deep learning system and the 3D engine, the simulation data of the training object generated in the virtual environment serve as the input data of the deep learning system, and the deep learning system feeds action commands back to the training object in the virtual environment according to that input data, thereby achieving iterative training of the robot in the virtual environment.
On this basis, in other exemplary embodiments of the disclosure, the method can also include:

Step S1041: calculating a reward value corresponding to the currently obtained simulation data, based on the simulation data currently obtained by the deep learning system, the previously obtained simulation data, and its corresponding reward value;

Step S1042: adding the simulation data currently obtained by the deep learning system, the detected current action of the training object, and the calculated reward value corresponding to the currently obtained simulation data to the reward list.
For example, for the automatic forklift described above, after the automatic forklift executes the action command "lift up 20°, advance" in the virtual environment, its parameters in the simulation image of the next second can be obtained. For instance, in the simulation image of the next moment after executing the action command "lift up 20°, advance", it is recognized that the distance between the automatic forklift and the pallet is 1.9 meters, and that the forks are lifted up 18° while advancing. Although the previous action command was "lift up 20°, advance", completing the action fully takes a certain amount of time, so at this moment the forklift has lifted 18° and advanced 0.1 meters.
After the parameters at this moment are recognized, a reward value can be calculated from the recognized state and action according to a preset method. For example, if the direction is correct, each additional 1° of lift adds 0.01 to the reward value; if the direction is wrong, 0.01 is subtracted; and so on. The disclosure places no particular limitation on the specific method of calculating the reward value.

If the reward value of the above action calculated with the above method is 0.93, the record can be added to the action reward list, as shown in Table 2.
| State (distance to pallet) | Action command | Reward value |
| 2 meters | lift up 15°, advance | 0.835 |
| 2 meters | lift up 20°, advance | 0.9 |
| 2 meters | lift up 25°, advance | 0.73 |
| 2 meters | lift up 30°, advance | 0.65 |
| 1.9 meters | lift up 20°, advance | 0.534 |
| 1.9 meters | lift up 18°, advance | 0.93 |

Table 2
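Steps S1041–S1042 can be sketched as follows. The ±0.01-per-degree scoring rule follows the example in the text; the previous reward value of 0.75 and the flat record layout are illustrative assumptions, chosen so the arithmetic lands on the 0.93 of the worked example:

```python
# Illustrative sketch of S1041-S1042: score the action actually observed
# between two consecutive frames, then append the new record to the list.

def score(prev, curr):
    # +0.01 per degree of lift when the heading is correct, -0.01 otherwise
    # (the rule given as an example in the text).
    lifted = curr["fork_angle_deg"] - prev["fork_angle_deg"]
    sign = 1 if curr["heading_correct"] else -1
    return prev["reward"] + sign * 0.01 * lifted

reward_list = [
    # (distance in meters, action command, reward value) — cf. Table 2
    (2.0, "lift up 20°, advance", 0.9),
]

prev = {"fork_angle_deg": 0, "reward": 0.75}   # assumed previous reward value
curr = {"fork_angle_deg": 18, "heading_correct": True,
        "distance_m": 1.9, "action": "lift up 18°, advance"}

new_reward = round(score(prev, curr), 2)       # 0.75 + 18 * 0.01
reward_list.append((curr["distance_m"], curr["action"], new_reward))
print(new_reward)  # 0.93, matching the worked example above
```

Because the scored action is the one actually detected in the next frame (18° of lift, not the commanded 20°), the list grows with state–action pairs the training object can really reach, which is what makes later lookups in step S103 meaningful.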
In other exemplary embodiments of the disclosure, after the training object is made to move in the virtual environment, the above method can also include:

Step S1051: identifying a robot accident using the deep learning system;

Step S1052: generating a correction action command corresponding to the robot accident;

Step S1053: making the training object move in the virtual environment according to the correction action command.
If a robot accident is identified from the simulated images during robot training, for example an electric forklift hitting a pallet, a transfer robot hitting products on a shelf, a mechanical arm hitting the assembly line, an electric forklift or transfer robot moving in an entirely wrong direction, or a mechanical arm rotating beyond its predetermined angle range, a corresponding correction action command can be generated according to the specific mistake, and the training object is made to execute that correction action command. For example, the correction action may be rotating to a specified direction, moving forward or backward to a specified position or by a specified distance, or returning to the initial position of the movement. The present disclosure places no particular limitation on this.
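Steps S1051 to S1053 can be pictured as a dispatch from identified accident type to correction action command. The accident labels and command strings below are illustrative placeholders standing in for whatever the deep learning system actually reports, not names from the disclosure:

```python
# Illustrative mapping from identified accident type to a correction command.
CORRECTIONS = {
    "hit_pallet": "move backward to initial position",
    "wrong_direction": "rotate to specified direction",
    "arm_out_of_range": "return to start of movement",
}

def correct(accident_type):
    """Generate the correction action command for an identified accident (S1052)."""
    command = CORRECTIONS.get(accident_type)
    if command is None:
        raise ValueError(f"no correction defined for accident: {accident_type}")
    return command
```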
The above identification of robot accidents may also be performed by judging parameters such as distance and angle; alternatively, existing accident images may be saved into the deep learning system and used to judge the type of accident.
By identifying robot accidents and setting corresponding correction action commands, the motion process of the training object can be brought closer to the real working scene, effectively improving the validity and completeness of the robot training result. Moreover, this also ensures that the training process can proceed effectively and continuously.
In addition, in this exemplary embodiment, after receiving the user's configuration of the training objective in the virtual environment and making the training object move in the virtual environment according to the action command, the above method may further include:
Step S1061: judging whether the configured training objective has been reached;
Step S1062: determining that training has ended if the configured training objective has not been reached after the predetermined movement rule has been fully invoked.
After the training object executes an action command in the virtual environment, it can be judged whether the training objective has been completed. If the training objective has not been completed, training continues according to the preset movement rule. If training cannot continue because of a robot accident or for other reasons, a rollback or reset operation can be performed to restore the virtual environment and the training object to some time node in the historical record and continue training from there. If the predetermined movement rule has been completed and the preset training objective has been reached, training can be concluded. If the configured training objective has still not been reached after the predetermined movement rule has been fully invoked, training can likewise be determined to have ended.
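The termination logic of steps S1061 and S1062, together with the rollback behavior described above, can be sketched as a loop. The hooks `objective_reached`, `rule_exhausted`, `step`, `accident`, and `rollback` are assumed stand-ins for the disclosure's training objective check, movement rule, action execution, accident detection, and history restore:

```python
def run_training(objective_reached, rule_exhausted, step, accident, rollback):
    """Run training until the configured objective is reached (S1061) or the
    predetermined movement rule is exhausted without reaching it (S1062)."""
    while not objective_reached():
        if rule_exhausted():
            return "ended: rule exhausted"  # S1062: objective not reached
        if accident():
            rollback()  # restore environment and object to a history node
            continue
        step()          # execute the next action command per the movement rule
    return "ended: objective reached"

# A toy run: the objective is reached after three successful steps.
state = {"steps": 0}
result = run_training(
    objective_reached=lambda: state["steps"] >= 3,
    rule_exhausted=lambda: False,
    step=lambda: state.update(steps=state["steps"] + 1),
    accident=lambda: False,
    rollback=lambda: None,
)
```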
In addition, in other exemplary embodiments of the present disclosure, the method may also train the robot using other machine learning algorithms, for example a random forest algorithm or a logistic regression algorithm.
The robot training method under a virtual environment provided by the present disclosure uses a 3D engine to build a virtual environment that strictly matches the actual application scene, and trains the robot in that virtual environment. A deep learning algorithm generates corresponding action commands according to the robot's motion state in the virtual environment, enabling the robot to train iteratively, continuously revising its actions using the action commands fed back by the deep learning system so as to obtain the robot's optimal action commands. This allows the robot to learn quickly and complete the training objective rapidly, reducing time cost. In addition, training the robot in a virtual environment effectively reduces the accidents and danger caused by inaccurate robot movements.
It should be noted that the above drawings are only schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the temporal order of these processes, and that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Further, as shown in Fig. 2, this exemplary embodiment also provides a robot training device 20 under a virtual environment, comprising: a training object control module 201, a simulation data acquisition module 202, an action command acquisition module 203, and an action command execution module 204. Wherein:
The training object control module 201 may be used to simulate a robot as a training object in a virtual environment, invoke a predetermined movement rule, and make the training object move in the virtual environment.
The simulation data acquisition module 202 may be used to acquire simulation data from the movement of the training object using a deep learning system.
The action command acquisition module 203 may be used to take the simulation data as input, so that the deep learning system generates an action command according to that input.
The action command execution module 204 may be used to make the training object move in the virtual environment according to the action command.
The specific details of each module in the above robot training device under a virtual environment have already been described in detail in the corresponding robot training method under a virtual environment, and are therefore not repeated here.
It should be noted that although several modules or units of the action-executing device are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
In an exemplary embodiment of the present disclosure, a computer system capable of implementing the above method is also provided. Those of ordinary skill in the art will understand that various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, various aspects of the present invention may take the following forms: a fully hardware embodiment, a fully software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may collectively be referred to here as a "circuit", "module", or "system".
The computer system 600 according to this embodiment of the present invention is described below with reference to Fig. 3. The computer system 600 shown in Fig. 3 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 3, the computer system 600 takes the form of a general-purpose computing device. The components of the computer system 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610).
The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps of the various exemplary embodiments of the present invention described in the above "Exemplary Methods" section of this specification. For example, the processing unit 610 may perform step S101 shown in Fig. 1: simulating a robot as a training object in a virtual environment, invoking a predetermined movement rule, and making the training object move in the virtual environment; S102: acquiring simulation data from the movement of the training object using a deep learning system; S103: using the simulation data as input so that the deep learning system generates an action command according to the input; and S104: making the training object move in the virtual environment according to the action command.
The storage unit 620 may include readable media in the form of volatile memory, such as a random access memory (RAM) 6201 and/or a cache memory 6202, and may further include a read-only memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, the processing unit, or a local bus using any of a variety of bus structures.
The computer system 600 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the computer system 600, and/or with any device (such as a router, a modem, etc.) that enables the computer system 600 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 650. Moreover, the computer system 600 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the computer system 600 via the bus 630. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the computer system 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which is stored a program product capable of implementing the above method of this specification. In some possible embodiments, various aspects of the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the above "Exemplary Methods" section of this specification.
As shown in Fig. 4, a program product 800 for implementing the above method according to an embodiment of the present invention is described; it may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are only schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the temporal order of these processes, and that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Those skilled in the art, after considering the specification and practicing the invention disclosed here, will readily conceive of other embodiments of the present disclosure. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The description and examples are to be regarded as illustrative only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (13)
1. A robot training method under a virtual environment, characterized by comprising:
simulating a robot as a training object in a virtual environment, invoking a predetermined movement rule, and making the training object move in the virtual environment;
acquiring simulation data from the movement of the training object using a deep learning system;
using the simulation data as input, so that the deep learning system generates an action command according to the input; and
making the training object move in the virtual environment according to the action command.
2. The method according to claim 1, wherein before the step of making the training object move in the virtual environment, the method further comprises:
receiving a user's configuration of the training object in the virtual environment.
3. The method according to claim 2, wherein receiving the user's configuration of the training object in the virtual environment comprises: receiving the user's configuration of the type and/or position of the training object in the virtual environment.
4. The method according to claim 3, wherein receiving the user's configuration of the type of the training object in the virtual environment comprises:
displaying a selectable object type list; and
in response to the user's selection of an object type in the list, configuring the selected object type in the virtual environment.
5. The method according to claim 3, wherein receiving the user's configuration of the position of the training object in the virtual environment comprises:
in response to the user dragging the training object in the virtual environment, moving the training object to the position to which it is dragged.
6. The method according to claim 1, wherein before the step of making the training object move in the virtual environment, the method further comprises:
receiving a movement rule preset by the user.
7. The method according to claim 1, wherein using the simulation data as input so that the deep learning system generates an action command according to the input comprises: causing the deep learning system to search an action reward list to obtain the action with the maximum return value corresponding to the input simulation data.
8. The method according to claim 7, wherein after acquiring the simulation data using the deep learning system, the method further comprises:
calculating the return value corresponding to the currently acquired simulation data, based on the simulation data currently acquired by the deep learning system, the last acquired simulation data, and its corresponding return value; and
adding the simulation data currently acquired by the deep learning system, the detected current action of the training object, and the calculated return value corresponding to the currently acquired simulation data to the reward list.
9. The method according to claim 1, wherein after making the training object move in the virtual environment, the method further comprises:
identifying a robot accident using the deep learning system;
generating a correction action command corresponding to the robot accident; and
making the training object move in the virtual environment according to the correction action command.
10. The method according to claim 1, further comprising: receiving a user's configuration of a training objective in the virtual environment; and
after making the training object move in the virtual environment according to the action command, the method further comprises:
judging whether the configured training objective has been reached; and
determining that training has ended if the configured training objective has not been reached after the predetermined movement rule has been fully invoked.
11. A robot training device under a virtual environment, characterized by comprising:
a training object control module, for simulating a robot as a training object in a virtual environment, invoking a predetermined movement rule, and making the training object move in the virtual environment;
a simulation data acquisition module, for acquiring simulation data from the movement of the training object using a deep learning system;
an action command acquisition module, for using the simulation data as input so that the deep learning system generates an action command according to the input; and
an action command execution module, for making the training object move in the virtual environment according to the action command.
12. A storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the robot training method under a virtual environment according to any one of claims 1 to 11 is implemented.
13. A computer system, characterized by comprising:
a processor; and
a memory, for storing instructions executable by the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the robot training method under a virtual environment according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810349138.0A CN110390845A (en) | 2018-04-18 | 2018-04-18 | Robotic training method and device, storage medium and computer system under virtual environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810349138.0A CN110390845A (en) | 2018-04-18 | 2018-04-18 | Robotic training method and device, storage medium and computer system under virtual environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390845A true CN110390845A (en) | 2019-10-29 |
Family
ID=68283156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810349138.0A Pending CN110390845A (en) | 2018-04-18 | 2018-04-18 | Robotic training method and device, storage medium and computer system under virtual environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390845A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111580411A (en) * | 2020-04-27 | 2020-08-25 | 珠海格力电器股份有限公司 | Control parameter optimization method, device and system |
CN112338920A (en) * | 2020-11-04 | 2021-02-09 | 中国联合网络通信集团有限公司 | Data processing method, device and equipment |
CN112427843A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Ship multi-mechanical-arm welding spot cooperative welding method based on QMIX reinforcement learning algorithm |
CN112434464A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm |
CN115186370A (en) * | 2022-05-18 | 2022-10-14 | 广东海洋大学 | Engineering forklift transfer learning system based on deep learning training model |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120229446A1 (en) * | 2011-03-07 | 2012-09-13 | Avaya Inc. | Method and system for topic based virtual environments and expertise detection |
CN103258338A (en) * | 2012-02-16 | 2013-08-21 | 克利特股份有限公司 | Method and system for driving simulated virtual environments with real data |
US20150100530A1 (en) * | 2013-10-08 | 2015-04-09 | Google Inc. | Methods and apparatus for reinforcement learning |
CN106327942A (en) * | 2016-10-21 | 2017-01-11 | 上海申电教育培训有限公司 | Distributed electric power training system based on virtual reality |
US20170028553A1 (en) * | 2015-07-31 | 2017-02-02 | Fanuc Corporation | Machine learning device, robot controller, robot system, and machine learning method for learning action pattern of human |
CN106997243A (en) * | 2017-03-28 | 2017-08-01 | 北京光年无限科技有限公司 | Speech scene monitoring method and device based on intelligent robot |
US20170285584A1 (en) * | 2016-04-04 | 2017-10-05 | Fanuc Corporation | Machine learning device that performs learning using simulation result, machine system, manufacturing system, and machine learning method |
WO2018006364A1 (en) * | 2016-07-07 | 2018-01-11 | 深圳狗尾草智能科技有限公司 | Robot training method and device based on virtual environment |
US9880553B1 (en) * | 2015-04-28 | 2018-01-30 | Hrl Laboratories, Llc | System and method for robot supervisory control with an augmented reality user interface |
US20180089572A1 (en) * | 2012-08-02 | 2018-03-29 | Artifical Solutions Iberia S.L. | Hybrid approach for developing, optimizing, and executing conversational interaction applications |
- 2018-04-18: application CN201810349138.0A filed (publication CN110390845A), status Pending
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120229446A1 (en) * | 2011-03-07 | 2012-09-13 | Avaya Inc. | Method and system for topic based virtual environments and expertise detection |
CN103258338A (en) * | 2012-02-16 | 2013-08-21 | 克利特股份有限公司 | Method and system for driving simulated virtual environments with real data |
US20130218542A1 (en) * | 2012-02-16 | 2013-08-22 | Crytek Gmbh | Method and system for driving simulated virtual environments with real data |
US20180089572A1 (en) * | 2012-08-02 | 2018-03-29 | Artifical Solutions Iberia S.L. | Hybrid approach for developing, optimizing, and executing conversational interaction applications |
US20150100530A1 (en) * | 2013-10-08 | 2015-04-09 | Google Inc. | Methods and apparatus for reinforcement learning |
US9880553B1 (en) * | 2015-04-28 | 2018-01-30 | Hrl Laboratories, Llc | System and method for robot supervisory control with an augmented reality user interface |
US20170028553A1 (en) * | 2015-07-31 | 2017-02-02 | Fanuc Corporation | Machine learning device, robot controller, robot system, and machine learning method for learning action pattern of human |
US20170285584A1 (en) * | 2016-04-04 | 2017-10-05 | Fanuc Corporation | Machine learning device that performs learning using simulation result, machine system, manufacturing system, and machine learning method |
WO2018006364A1 (en) * | 2016-07-07 | 2018-01-11 | 深圳狗尾草智能科技有限公司 | Robot training method and device based on virtual environment |
CN106327942A (en) * | 2016-10-21 | 2017-01-11 | 上海申电教育培训有限公司 | Distributed electric power training system based on virtual reality |
CN106997243A (en) * | 2017-03-28 | 2017-08-01 | 北京光年无限科技有限公司 | Speech scene monitoring method and device based on intelligent robot |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111580411A (en) * | 2020-04-27 | 2020-08-25 | 珠海格力电器股份有限公司 | Control parameter optimization method, device and system |
CN112338920A (en) * | 2020-11-04 | 2021-02-09 | 中国联合网络通信集团有限公司 | Data processing method, device and equipment |
CN112338920B (en) * | 2020-11-04 | 2022-04-15 | 中国联合网络通信集团有限公司 | Data processing method, device and equipment |
CN112427843A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Ship multi-mechanical-arm welding spot cooperative welding method based on QMIX reinforcement learning algorithm |
CN112434464A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm |
CN112434464B (en) * | 2020-11-09 | 2021-09-10 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG algorithm |
WO2022095278A1 (en) * | 2020-11-09 | 2022-05-12 | 中国船舶重工集团公司第七一六研究所 | Qmix reinforcement learning algorithm-based ship welding spots collaborative welding method using multiple manipulators |
CN115186370A (en) * | 2022-05-18 | 2022-10-14 | 广东海洋大学 | Engineering forklift transfer learning system based on deep learning training model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390845A (en) | Robotic training method and device, storage medium and computer system under virtual environment | |
US11220002B2 (en) | Robot simulation device | |
CN107992252A (en) | Information cuing method, device, electronic equipment and storage medium | |
CN108037888B (en) | Skill control method, skill control device, electronic equipment and storage medium | |
CN107967096A (en) | Destination object determines method, apparatus, electronic equipment and storage medium | |
CN108159697B (en) | Virtual object transmission method and device, storage medium and electronic equipment | |
CN107823884A (en) | Destination object determines method, apparatus, electronic equipment and storage medium | |
CN114139637B (en) | Multi-agent information fusion method and device, electronic equipment and readable storage medium | |
CN107656620A (en) | Virtual object control method, device, electronic equipment and storage medium | |
CN112416323B (en) | Control code generation method, operation method, device, equipment and storage medium | |
CN110090444A (en) | Behavior record creation method, device, storage medium and electronic equipment in game | |
WO2021138260A1 (en) | Transformation mode switching for a real-time robotic control system | |
KR20160052952A (en) | Block and user terminal for modeling 3d shape and the method for modeling 3d shape using the same | |
CN108170295A (en) | Virtual camera condition control method, device, electronic equipment and storage medium | |
CN105068653A (en) | Method and apparatus for determining touch event in virtual space | |
Liang et al. | Trajectory-based skill learning for overhead construction robots using generalized cylinders with orientation | |
KR102529023B1 (en) | Training processing device, intermediary device, training system and training processing method | |
CN114578712B (en) | Multifunctional underwater autonomous vehicle cluster simulation system | |
CN110502838A (en) | Spare parts management strategy optimization model based on emulation | |
US20220172107A1 (en) | Generating robotic control plans | |
KR20220100876A (en) | An associative framework for robotic control systems | |
CN116922379B (en) | Vision-based mechanical arm obstacle avoidance method, system, electronic equipment and storage medium | |
US20210187746A1 (en) | Task planning accounting for occlusion of sensor observations | |
US20240025035A1 (en) | Robotic simulations using multiple levels of fidelity | |
US20220402126A1 (en) | Systems, computer program products, and methods for building simulated worlds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191029 |