CN110390845A - Robot training method and apparatus, storage medium, and computer system in a virtual environment - Google Patents
Robot training method and apparatus, storage medium, and computer system in a virtual environment
- Publication number: CN110390845A (application CN201810349138.0A)
- Authority: CN (China)
- Prior art keywords: virtual environment, training, movement, deep learning, action command
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G — PHYSICS
- G09 — EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B — EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B9/00 — Simulators for teaching or training purposes
Abstract
This disclosure relates to the field of computer technology, and in particular to a robot training method in a virtual environment, a robot training apparatus in a virtual environment, a storage medium, and a computer system. The method includes: in a virtual environment in which a robot is simulated as a training object, invoking a predetermined motion rule to make the training object move in the virtual environment; obtaining simulation data from the motion of the training object using a deep learning system; taking the simulation data as input, so that the deep learning system generates an action command from the input; and making the training object move in the virtual environment according to the action command. The disclosure enables repeated training of robot motion in a virtual environment, thereby effectively improving training efficiency.
Description
Technical field
This disclosure relates to the field of computer technology, and in particular to a robot training method in a virtual environment, a robot training apparatus in a virtual environment, a storage medium, and a computer system.
Background technique
As robot technology matures, robots have begun to replace human workers in more and more fields, especially for highly repetitive work such as cargo handling and product assembly.

In the prior art, to ensure that a robot can complete its work normally, a simulated environment similar to the robot's actual working environment is usually built in a laboratory before the robot is put into operation, and the robot's motions are trained there. In practice, however, constrained by the robot's movement speed and by the time cost of repeatedly resetting the scene for each training run, simulation training consumes a great deal of time and training efficiency is extremely low. Moreover, the execution speed of the training algorithms is often inconsistent with the robot's real working speed, which further limits training efficiency. In addition, because of objective constraints such as laboratory space and funding, and because the robot's actual working environment keeps changing, the simulated environment in which the robot is trained often differs substantially from the actual working environment. This leads to unsatisfactory training results, so that the robot moves inaccurately in the actual working environment and may even cause dangerous situations.
It should be noted that the information disclosed in the background section above is provided only to enhance understanding of the background of the disclosure, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
The present disclosure aims to provide a robot training method in a virtual environment, a robot training apparatus in a virtual environment, a storage medium, and a computer system, thereby overcoming, at least to some extent, the low training efficiency and inaccurate training results caused by the limitations and defects of the related art.

Other features and advantages of the disclosure will become apparent from the following detailed description, or may be learned in part through practice of the disclosure.
According to a first aspect of the disclosure, a robot training method in a virtual environment is provided, comprising:

in a virtual environment in which a robot is simulated as a training object, invoking a predetermined motion rule to make the training object move in the virtual environment;

obtaining simulation data from the motion of the training object using a deep learning system;

taking the simulation data as input, so that the deep learning system generates an action command from the input;

making the training object move in the virtual environment according to the action command.
In an exemplary embodiment of the disclosure, before the step of making the training object move in the virtual environment, the method further comprises: receiving a user's configuration of the training object in the virtual environment.
In an exemplary embodiment of the disclosure, receiving the user's configuration of the training object in the virtual environment comprises: receiving the user's configuration of the type and/or position of the training object in the virtual environment.
In an exemplary embodiment of the disclosure, receiving the user's configuration of the type of the training object in the virtual environment comprises:

displaying a selectable object type list, which may be a text list or an icon list;

in response to the user's selection of an object type in the list, configuring the selected object type in the virtual environment.
In an exemplary embodiment of the disclosure, receiving the user's configuration of the position of the training object in the virtual environment comprises:

in response to the user dragging the training object in the virtual environment, moving the training object to the position to which it is dragged.
In an exemplary embodiment of the disclosure, before the step of making the training object move in the virtual environment, the method further comprises:

receiving a motion rule preset by the user.
In an exemplary embodiment of the disclosure, taking the simulation data as input so that the deep learning system generates an action command from the input comprises: causing the deep learning system to look up an action reward list, to obtain the action with the maximum reward value corresponding to the input simulation data.
In an exemplary embodiment of the disclosure, after obtaining the simulation data using the deep learning system, the method further comprises:

calculating a reward value corresponding to the currently obtained simulation data, based on the simulation data currently obtained by the deep learning system, the previously obtained simulation data, and its corresponding reward value;

adding the simulation data currently obtained by the deep learning system, the detected current action of the training object, and the calculated reward value corresponding to the currently obtained simulation data to the reward list.
In an exemplary embodiment of the disclosure, after the training object is made to move in the virtual environment, the method further comprises:

identifying a robot accident using the deep learning system;

generating a correction action command corresponding to the robot accident;

making the training object move in the virtual environment according to the correction action command.
In an exemplary embodiment of the disclosure, the method further comprises: receiving a user's configuration of a training goal in the virtual environment; and after making the training object move in the virtual environment according to the action command, the method further comprises:

judging whether the configured training goal has been reached;

determining that training has ended when the configured training goal has not been reached after the predetermined motion rule has been fully invoked.
According to a second aspect of the disclosure, a robot training apparatus in a virtual environment is provided, comprising:

a training object control module, configured to invoke a predetermined motion rule in a virtual environment in which a robot is simulated as a training object, to make the training object move in the virtual environment;

a simulation data acquisition module, configured to obtain simulation data from the motion of the training object using a deep learning system;

an action command acquisition module, configured to take the simulation data as input, so that the deep learning system generates an action command from the input;

an action command execution module, configured to make the training object move in the virtual environment according to the action command.
According to a third aspect of the disclosure, a storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the robot training method in a virtual environment described above.

According to a fourth aspect of the disclosure, a computer system is provided, comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to perform, by executing the executable instructions, the robot training method in a virtual environment described above.
In the robot training method in a virtual environment provided by an embodiment of the disclosure, a robot is established as a training object in the virtual environment, and simulation data are obtained during the motion of the training object. This makes it easy to set up a training environment consistent with the robot's actual working environment, and to adjust the training environment promptly as the actual working environment changes, thereby effectively ensuring the accuracy of the training environment and of the training results, and effectively reducing the time and financial cost of building a physical robot and its simulated environment. In addition, by using the deep learning system to generate action commands for the training object in the virtual environment from the simulation data obtained from the virtual environment, repeated training of robot motion is achieved, effectively improving training efficiency.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and, together with the description, serve to explain the principles of the disclosure. Obviously, the drawings in the following description are only some of the embodiments of the disclosure; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 schematically shows a robot training method in a virtual environment;

Fig. 2 schematically shows a robot training apparatus in a virtual environment in an exemplary embodiment of the disclosure;

Fig. 3 schematically shows a computer system in an exemplary embodiment of the disclosure;

Fig. 4 schematically shows another robot training apparatus in a virtual environment in an exemplary embodiment of the disclosure.
Specific embodiments

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In addition, the drawings are merely schematic illustrations of the disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
This example embodiment first provides a robot training method in a virtual environment, which can be applied to the simulation training of robots — for example, the action training of handling robots used to transport goods in an unmanned warehouse, industrial robots used on production lines to identify, assemble, or inspect products, and industrial robots used in manufacturing to perform actions such as welding and cutting. Referring to Fig. 1, the robot training method in a virtual environment may include the following steps:

S101: in a virtual environment in which a robot is simulated as a training object, invoking a predetermined motion rule to make the training object move in the virtual environment;

S102: obtaining simulation data from the motion of the training object using a deep learning system;

S103: taking the simulation data as input, so that the deep learning system generates an action command from the input;

S104: making the training object move in the virtual environment according to the action command.
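Steps S101–S104 form a closed loop between the 3D engine and the deep learning system. A minimal Python sketch of that loop is given below; all class names, the toy forklift state, and the fixed action are hypothetical illustrations, not part of the disclosure:

```python
# Hypothetical sketch of the S101-S104 loop. The engine and learner
# interfaces below are illustrative assumptions.

class VirtualEnvironment:
    """Stands in for the 3D engine hosting the training object."""
    def __init__(self):
        self.state = {"distance_m": 2.0}  # e.g. forklift-to-pallet distance

    def apply(self, command):
        # S101/S104: move the training object per a rule or an action command;
        # here each application simply advances the forklift by 0.1 m.
        self.state["distance_m"] = round(self.state["distance_m"] - 0.1, 2)

    def observe(self):
        # S102: stands in for extracting simulation data from the rendered scene.
        return dict(self.state)

class DeepLearningSystem:
    def act(self, simulation_data):
        # S103: generate an action command from the simulation data
        # (fixed here for illustration).
        return {"lift_deg": 20, "advance": True}

env = VirtualEnvironment()
learner = DeepLearningSystem()
env.apply("predetermined motion rule")   # S101: initial rule-driven motion
for _ in range(3):                       # repeated training iterations
    data = env.observe()                 # S102
    command = learner.act(data)          # S103
    env.apply(command)                   # S104
print(env.observe())                     # distance shrinks as the loop iterates
```

The point of the sketch is the data flow: the engine produces simulation data, the learner turns it into a command, and the command is fed back into the engine, so each iteration of the loop is one training step.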
In the robot training method in a virtual environment provided by this example embodiment, on the one hand, by establishing the robot as a training object in the virtual environment and obtaining simulation data during the motion of the training object, a training environment consistent with the robot's actual working environment can easily be set up, and the training environment can be adjusted promptly as the actual working environment changes. This effectively ensures the accuracy of the training environment and of the training results, and effectively reduces the time and financial cost of building a physical robot and its simulated environment. On the other hand, by using the deep learning system to generate action commands for the training object in the virtual environment from the simulation data obtained from the virtual environment, repeated training of robot motion is achieved, effectively improving training efficiency. In addition, training the robot in a virtual environment can effectively reduce or avoid safety accidents.
In the following, each step of the robot training method in a virtual environment in this example embodiment is described in detail with reference to the drawings and embodiments.
Step S100: receiving a user's configuration of the training object in the virtual environment.
In this example embodiment, the virtual environment and the training object can be created with a 3D engine as an equal-proportion replica of the robot's actual work scene, in strict conformance with it. This guarantees the accuracy of the training object's training scene, and in turn the validity of the robot's training results in the virtual environment.
Before the training object is made to move, the user may first configure parameters of the training object according to actual needs. For example, the configuration parameters may include the type and/or position of the training object in the virtual environment.
Specifically, the configuration of the type of the training object may include:

S1001: displaying a selectable object type list;

S1002: in response to the user's selection of an object type in the list, configuring the selected object type in the virtual environment.
When the training object is configured using the 3D engine, its type can be selected from a list provided in the 3D engine. The list may take the form of a text list or an icon list, and the disclosure places no particular limitation on this. When the user selects one or several items in the object type list, objects of the selected type are configured in the virtual environment.
For example, if the training object is a handling robot in an unmanned warehouse, the user can configure the handling robot in the object type list according to parameters such as model, power, size, and color; if the training object is an automatic forklift, the user can configure the automatic forklift in the object type list according to parameters such as model and power; if the training object is a robotic arm on an industrial production line, it can be selected according to its model and specific function.
In addition, the configuration of the position of the training object in the virtual environment may include:

S1003: in response to the user dragging the training object in the virtual environment, moving the training object to the position to which it is dragged.
In this example embodiment, after the training object is determined, its initial position in the virtual environment, and its positional relationship with the other objects in the virtual environment, can be set according to the user's operation. For example, the position between the automatic forklift and the pallet in the virtual environment and the initial orientation of the automatic forklift are determined according to the user's operation; or the position between the handling robot and the shelf; or parameters such as the distance between the robotic arm and the production line.
On this basis, in this example embodiment, before the training object is made to move in the virtual environment, the above method may further include:

S1004: receiving a motion rule preset by the user.
After the training scene in the virtual environment, the training object, and the position of the training object in the training scene have been determined using the 3D engine, the user can also configure the motion rule of the training object — for example, the motion rule of the automatic forklift, handling robot, or robotic arm described above.
In this embodiment, the motion rule is illustrated with the automatic forklift as an example. Specifically, the motion rule of the automatic forklift in the virtual environment can be set to include: making the automatic forklift advance straight toward the pallet in the virtual environment, align with the gap under the simulated pallet, and fork it up; or making the simulated forklift travel clockwise or counterclockwise around the simulated pallet, align with the gap under the simulated pallet, and fork it up.
Of course, in other exemplary embodiments of the disclosure, more numerous and more detailed motion rules can also be set, such as the movement speed and movement angle of a robotic arm. The motion rule can be configured according to the actual working state and working scene of the training object, and the disclosure places no particular limitation on the specific content of the training object's motion rule. By setting the motion rule of the training object in advance according to its actual working state, the motion of the training object can be made to match the practical application scene, which effectively guarantees the accuracy of the training results.
Step S101: in the virtual environment in which the robot is simulated as the training object, invoking the predetermined motion rule to make the training object move in the virtual environment.

In this example embodiment, after the parameters of the training object, the application scene, and the motion rule have been configured, the motion rule can be invoked to make the training object start moving in the virtual environment — for example, making the automatic forklift start moving according to the motion rule described above.
Step S102: obtaining simulation data from the motion of the training object using the deep learning system.

In this example embodiment, after the training object starts moving according to the predetermined motion rule, the deep learning system can obtain simulation data from the simulation video or simulation images generated by the 3D engine — for example, by extracting the motion state and motion parameters of the training object from the simulation video or simulation images.

This embodiment illustrates the acquisition of simulation data with the automatic forklift described above. Specifically, images can be captured from the simulation video at preset time intervals, for example one simulation image per second, and image recognition technology can be used to identify specific parameters such as the automatic forklift's current direction of motion, the distance between the automatic forklift and the pallet, and the lift angle or lift height of the automatic forklift's forks. These parameters serve as the simulation data obtained by the deep learning system.
Of course, in other exemplary embodiments of the disclosure, the simulation data obtained by the deep learning system can also be data such as the direction of motion of the handling robot and its distance to the shelf, or the distance between the robotic arm and the product operation point and its current rotation angle. The disclosure places no particular limitation on this.
Step S103: taking the simulation data as input, so that the deep learning system generates an action command from the input.

In this example embodiment, the simulation data described above can be used as the input data of the deep learning system, and an action reward list can be configured for the deep learning system. Step S103 may then specifically include: causing the deep learning system to look up the action reward list, to obtain the action with the maximum reward value corresponding to the input simulation data.

After the deep learning system obtains the input at the current moment, it can obtain the optimal action command — that is, the action command with the maximum reward value — from the action reward list according to the input.
For example, for the automatic forklift described above, as shown in Table 1 below, the action reward list may include: the current distance between the automatic forklift and the pallet, the action command corresponding to that distance, and the reward value corresponding to that action command.
| State (distance to pallet) | Action command | Reward value |
| 2 meters | lift up 15°, advance | 0.835 |
| 2 meters | lift up 20°, advance | 0.9 |
| 2 meters | lift up 25°, advance | 0.73 |
| 2 meters | lift up 30°, advance | 0.65 |
| 1.9 meters | lift up 20°, advance | 0.534 |

Table 1
For example, when it is detected that the distance between the automatic forklift and the pallet at the current moment is 2 meters, the maximum reward value is 0.9, and the corresponding action command is: lift up 20°, advance.
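The lookup described above amounts to a table keyed by state that returns the highest-reward action. A minimal sketch with the values of Table 1 follows; the dictionary layout is an assumption made for illustration:

```python
# Action reward list keyed by state (distance to pallet, in meters),
# mapping to (action command, reward value) pairs. Values mirror Table 1.
reward_list = {
    2.0: [("lift up 15°, advance", 0.835),
          ("lift up 20°, advance", 0.9),
          ("lift up 25°, advance", 0.73),
          ("lift up 30°, advance", 0.65)],
    1.9: [("lift up 20°, advance", 0.534)],
}

def best_action(distance_m):
    # S103: return the action command with the maximum reward value
    # for the given state.
    return max(reward_list[distance_m], key=lambda pair: pair[1])

command, reward = best_action(2.0)
print(command, reward)  # lift up 20°, advance 0.9
```

This is essentially a tabular value lookup (a Q-table-style policy): the "deep learning system" of the disclosure would populate and refine these reward values, while action selection itself is a simple argmax over the entries for the current state.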
Step S104: making the training object move in the virtual environment according to the action command.

In this example embodiment, after the deep learning system generates an action command from the input at the current moment, the action command is sent to the 3D engine, so that the training object in the virtual environment moves according to the action command. For example, the automatic forklift described above can execute the action command: lift up 20°, advance.

By establishing data interaction between the deep learning system and the 3D engine, the simulation data of the training object generated in the virtual environment serve as the input data of the deep learning system, and the deep learning system feeds action commands back to the training object in the virtual environment according to that input data, thereby achieving iterative training of the robot in the virtual environment.
On this basis, in other exemplary embodiments of the disclosure, the method can also include:

Step S1041: calculating a reward value corresponding to the currently obtained simulation data, based on the simulation data currently obtained by the deep learning system, the previously obtained simulation data, and its corresponding reward value;

Step S1042: adding the simulation data currently obtained by the deep learning system, the detected current action of the training object, and the calculated reward value corresponding to the currently obtained simulation data to the reward list.
For example, for the automatic forklift described above, after the automatic forklift executes the action command "lift up 20°, advance" in the virtual environment, its parameters in the simulation image of the next second can be obtained. For instance, in the simulation image of the next moment after executing the action command "lift up 20°, advance", it is recognized that the distance between the automatic forklift and the pallet is 1.9 meters, and that the forks are lifted up 18° while advancing. Although the previous action command was "lift up 20°, advance", completing the action fully takes a certain amount of time, so at this moment the forklift has lifted 18° and advanced 0.1 meters.
After the parameters at this moment are recognized, a reward value can be calculated from the recognized state and action according to a preset method. For example, if the direction is correct, each additional 1° of lift adds 0.01 to the reward value; if the direction is wrong, 0.01 is subtracted; and so on. The disclosure places no particular limitation on the specific method of calculating the reward value.

If the reward value of the above action calculated with the above method is 0.93, the record can be added to the action reward list, as shown in Table 2.
| State (distance to pallet) | Action command | Reward value |
| 2 meters | lift up 15°, advance | 0.835 |
| 2 meters | lift up 20°, advance | 0.9 |
| 2 meters | lift up 25°, advance | 0.73 |
| 2 meters | lift up 30°, advance | 0.65 |
| 1.9 meters | lift up 20°, advance | 0.534 |
| 1.9 meters | lift up 18°, advance | 0.93 |

Table 2
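Steps S1041–S1042 can be sketched as follows. The ±0.01-per-degree scoring rule follows the example in the text; the previous reward value of 0.75 and the flat record layout are illustrative assumptions, chosen so the arithmetic lands on the 0.93 of the worked example:

```python
# Illustrative sketch of S1041-S1042: score the action actually observed
# between two consecutive frames, then append the new record to the list.

def score(prev, curr):
    # +0.01 per degree of lift when the heading is correct, -0.01 otherwise
    # (the rule given as an example in the text).
    lifted = curr["fork_angle_deg"] - prev["fork_angle_deg"]
    sign = 1 if curr["heading_correct"] else -1
    return prev["reward"] + sign * 0.01 * lifted

reward_list = [
    # (distance in meters, action command, reward value) — cf. Table 2
    (2.0, "lift up 20°, advance", 0.9),
]

prev = {"fork_angle_deg": 0, "reward": 0.75}   # assumed previous reward value
curr = {"fork_angle_deg": 18, "heading_correct": True,
        "distance_m": 1.9, "action": "lift up 18°, advance"}

new_reward = round(score(prev, curr), 2)       # 0.75 + 18 * 0.01
reward_list.append((curr["distance_m"], curr["action"], new_reward))
print(new_reward)  # 0.93, matching the worked example above
```

Because the scored action is the one actually detected in the next frame (18° of lift, not the commanded 20°), the list grows with state–action pairs the training object can really reach, which is what makes later lookups in step S103 meaningful.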
In other exemplary embodiments of the disclosure, after the training object is made to move in the virtual environment, the above method can also include:

Step S1051: identifying a robot accident using the deep learning system;

Step S1052: generating a correction action command corresponding to the robot accident;

Step S1053: making the training object move in the virtual environment according to the correction action command.
If a robot accident is identified from the simulated images during robot training, for example an electric forklift hitting a pallet, a transfer robot hitting products on a shelf, a mechanical arm hitting the assembly line, an electric forklift or transfer robot moving in an entirely wrong direction, or a mechanical arm rotating beyond its predetermined angle range, a corresponding correction action command can be generated according to the specific mistake, and the training object is made to execute that correction action command. For example, the correction action may be rotating to a specified direction, moving forward or backward to a specified position or by a specified distance, or returning to the initial position of the movement. The present disclosure places no particular limitation on this.
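Steps S1051 to S1053 can be pictured as a dispatch from identified accident type to correction action command. The accident labels and command strings below are illustrative placeholders standing in for whatever the deep learning system actually reports, not names from the disclosure:

```python
# Illustrative mapping from identified accident type to a correction command.
CORRECTIONS = {
    "hit_pallet": "move backward to initial position",
    "wrong_direction": "rotate to specified direction",
    "arm_out_of_range": "return to start of movement",
}

def correct(accident_type):
    """Generate the correction action command for an identified accident (S1052)."""
    command = CORRECTIONS.get(accident_type)
    if command is None:
        raise ValueError(f"no correction defined for accident: {accident_type}")
    return command
```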
The above identification of robot accidents may also be performed by judging parameters such as distance and angle; alternatively, existing accident images may be saved into the deep learning system and used to judge the type of accident.
By identifying robot accidents and setting corresponding correction action commands, the motion process of the training object can be brought closer to the real working scene, effectively improving the validity and completeness of the robot training result. Moreover, this also ensures that the training process can proceed effectively and continuously.
In addition, in this exemplary embodiment, after receiving the user's configuration of the training objective in the virtual environment and making the training object move in the virtual environment according to the action command, the above method may further include:
Step S1061: judging whether the configured training objective has been reached;
Step S1062: determining that training has ended if the configured training objective has not been reached after the predetermined movement rule has been fully invoked.
After the training object executes an action command in the virtual environment, it can be judged whether the training objective has been completed. If the training objective has not been completed, training continues according to the preset movement rule. If training cannot continue because of a robot accident or for other reasons, a rollback or reset operation can be performed to restore the virtual environment and the training object to some time node in the historical record and continue training from there. If the predetermined movement rule has been completed and the preset training objective has been reached, training can be concluded. If the configured training objective has still not been reached after the predetermined movement rule has been fully invoked, training can likewise be determined to have ended.
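The termination logic of steps S1061 and S1062, together with the rollback behavior described above, can be sketched as a loop. The hooks `objective_reached`, `rule_exhausted`, `step`, `accident`, and `rollback` are assumed stand-ins for the disclosure's training objective check, movement rule, action execution, accident detection, and history restore:

```python
def run_training(objective_reached, rule_exhausted, step, accident, rollback):
    """Run training until the configured objective is reached (S1061) or the
    predetermined movement rule is exhausted without reaching it (S1062)."""
    while not objective_reached():
        if rule_exhausted():
            return "ended: rule exhausted"  # S1062: objective not reached
        if accident():
            rollback()  # restore environment and object to a history node
            continue
        step()          # execute the next action command per the movement rule
    return "ended: objective reached"

# A toy run: the objective is reached after three successful steps.
state = {"steps": 0}
result = run_training(
    objective_reached=lambda: state["steps"] >= 3,
    rule_exhausted=lambda: False,
    step=lambda: state.update(steps=state["steps"] + 1),
    accident=lambda: False,
    rollback=lambda: None,
)
```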
In addition, in other exemplary embodiments of the present disclosure, the method may also train the robot using other machine learning algorithms, for example a random forest algorithm or a logistic regression algorithm.
The robot training method under a virtual environment provided by the present disclosure uses a 3D engine to build a virtual environment that strictly matches the actual application scene, and trains the robot in that virtual environment. A deep learning algorithm generates corresponding action commands according to the robot's motion state in the virtual environment, enabling the robot to train iteratively, continuously revising its actions using the action commands fed back by the deep learning system so as to obtain the robot's optimal action commands. This allows the robot to learn quickly and complete the training objective rapidly, reducing time cost. In addition, training the robot in a virtual environment effectively reduces the accidents and danger caused by inaccurate robot movements.
It should be noted that the above drawings are only schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the temporal order of these processes, and that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Further, as shown in Fig. 2, this exemplary embodiment also provides a robot training device 20 under a virtual environment, comprising: a training object control module 201, a simulation data acquisition module 202, an action command acquisition module 203, and an action command execution module 204. Wherein:
The training object control module 201 may be used to simulate a robot as a training object in a virtual environment, invoke a predetermined movement rule, and make the training object move in the virtual environment.
The simulation data acquisition module 202 may be used to acquire simulation data from the movement of the training object using a deep learning system.
The action command acquisition module 203 may be used to take the simulation data as input, so that the deep learning system generates an action command according to that input.
The action command execution module 204 may be used to make the training object move in the virtual environment according to the action command.
The specific details of each module in the above robot training device under a virtual environment have already been described in detail in the corresponding robot training method under a virtual environment, and are therefore not repeated here.
It should be noted that although several modules or units of the action-executing device are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
In an exemplary embodiment of the present disclosure, a computer system capable of implementing the above method is also provided. Those of ordinary skill in the art will understand that various aspects of the present invention may be implemented as a system, a method, or a program product. Therefore, various aspects of the present invention may take the following forms: a fully hardware embodiment, a fully software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may collectively be referred to here as a "circuit", "module", or "system".
The computer system 600 according to this embodiment of the present invention is described below with reference to Fig. 3. The computer system 600 shown in Fig. 3 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 3, the computer system 600 takes the form of a general-purpose computing device. The components of the computer system 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, and a bus 630 connecting the different system components (including the storage unit 620 and the processing unit 610).
The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps of the various exemplary embodiments of the present invention described in the above "Exemplary Methods" section of this specification. For example, the processing unit 610 may perform step S101 shown in Fig. 1: simulating a robot as a training object in a virtual environment, invoking a predetermined movement rule, and making the training object move in the virtual environment; S102: acquiring simulation data from the movement of the training object using a deep learning system; S103: using the simulation data as input so that the deep learning system generates an action command according to the input; and S104: making the training object move in the virtual environment according to the action command.
The storage unit 620 may include readable media in the form of volatile memory, such as a random access memory (RAM) 6201 and/or a cache memory 6202, and may further include a read-only memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, the processing unit, or a local bus using any of a variety of bus structures.
The computer system 600 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the computer system 600, and/or with any device (such as a router, a modem, etc.) that enables the computer system 600 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 650. Moreover, the computer system 600 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the computer system 600 via the bus 630. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the computer system 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which is stored a program product capable of implementing the above method of this specification. In some possible embodiments, various aspects of the present invention may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the above "Exemplary Methods" section of this specification.
As shown in Fig. 4, a program product 800 for implementing the above method according to an embodiment of the present invention is described; it may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are only schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the temporal order of these processes, and that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Those skilled in the art, after considering the specification and practicing the invention disclosed here, will readily conceive of other embodiments of the present disclosure. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The description and examples are to be regarded as illustrative only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (13)
1. A robot training method under a virtual environment, characterized by comprising:
simulating a robot as a training object in a virtual environment, invoking a predetermined movement rule, and making the training object move in the virtual environment;
acquiring simulation data from the movement of the training object using a deep learning system;
using the simulation data as input, so that the deep learning system generates an action command according to the input; and
making the training object move in the virtual environment according to the action command.
2. The method according to claim 1, wherein before the step of making the training object move in the virtual environment, the method further comprises:
receiving a user's configuration of the training object in the virtual environment.
3. The method according to claim 2, wherein receiving the user's configuration of the training object in the virtual environment comprises: receiving the user's configuration of the type and/or position of the training object in the virtual environment.
4. The method according to claim 3, wherein receiving the user's configuration of the type of the training object in the virtual environment comprises:
displaying a selectable object type list; and
in response to the user's selection of an object type in the list, configuring the selected object type in the virtual environment.
5. The method according to claim 3, wherein receiving the user's configuration of the position of the training object in the virtual environment comprises:
in response to the user dragging the training object in the virtual environment, moving the training object to the position to which it is dragged.
6. The method according to claim 1, wherein before the step of making the training object move in the virtual environment, the method further comprises:
receiving a movement rule preset by the user.
7. The method according to claim 1, wherein using the simulation data as input so that the deep learning system generates an action command according to the input comprises: causing the deep learning system to search an action reward list to obtain the action with the maximum return value corresponding to the input simulation data.
8. The method according to claim 7, wherein after acquiring the simulation data using the deep learning system, the method further comprises:
calculating the return value corresponding to the currently acquired simulation data, based on the simulation data currently acquired by the deep learning system, the last acquired simulation data, and its corresponding return value; and
adding the simulation data currently acquired by the deep learning system, the detected current action of the training object, and the calculated return value corresponding to the currently acquired simulation data to the reward list.
9. The method according to claim 1, wherein after making the training object move in the virtual environment, the method further comprises:
identifying a robot accident using the deep learning system;
generating a correction action command corresponding to the robot accident; and
making the training object move in the virtual environment according to the correction action command.
10. The method according to claim 1, further comprising: receiving a user's configuration of a training objective in the virtual environment; and
after making the training object move in the virtual environment according to the action command, the method further comprises:
judging whether the configured training objective has been reached; and
determining that training has ended if the configured training objective has not been reached after the predetermined movement rule has been fully invoked.
11. A robot training device under a virtual environment, characterized by comprising:
a training object control module, for simulating a robot as a training object in a virtual environment, invoking a predetermined movement rule, and making the training object move in the virtual environment;
a simulation data acquisition module, for acquiring simulation data from the movement of the training object using a deep learning system;
an action command acquisition module, for using the simulation data as input so that the deep learning system generates an action command according to the input; and
an action command execution module, for making the training object move in the virtual environment according to the action command.
12. A storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the robot training method under a virtual environment according to any one of claims 1 to 11 is implemented.
13. A computer system, characterized by comprising:
a processor; and
a memory, for storing instructions executable by the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the robot training method under a virtual environment according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810349138.0A CN110390845A (en) | 2018-04-18 | 2018-04-18 | Robotic training method and device, storage medium and computer system under virtual environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810349138.0A CN110390845A (en) | 2018-04-18 | 2018-04-18 | Robotic training method and device, storage medium and computer system under virtual environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390845A true CN110390845A (en) | 2019-10-29 |
Family
ID=68283156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810349138.0A Pending CN110390845A (en) | 2018-04-18 | 2018-04-18 | Robotic training method and device, storage medium and computer system under virtual environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110390845A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111580411A (en) * | 2020-04-27 | 2020-08-25 | 珠海格力电器股份有限公司 | Control parameter optimization method, device and system |
CN112338920A (en) * | 2020-11-04 | 2021-02-09 | 中国联合网络通信集团有限公司 | Data processing method, device and equipment |
CN112427843A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Ship multi-mechanical-arm welding spot cooperative welding method based on QMIX reinforcement learning algorithm |
CN112434464A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm |
CN115186370A (en) * | 2022-05-18 | 2022-10-14 | 广东海洋大学 | Engineering forklift transfer learning system based on deep learning training model |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120229446A1 (en) * | 2011-03-07 | 2012-09-13 | Avaya Inc. | Method and system for topic based virtual environments and expertise detection |
CN103258338A (en) * | 2012-02-16 | 2013-08-21 | 克利特股份有限公司 | Method and system for driving simulated virtual environments with real data |
US20150100530A1 (en) * | 2013-10-08 | 2015-04-09 | Google Inc. | Methods and apparatus for reinforcement learning |
CN106327942A (en) * | 2016-10-21 | 2017-01-11 | 上海申电教育培训有限公司 | Distributed electric power training system based on virtual reality |
US20170028553A1 (en) * | 2015-07-31 | 2017-02-02 | Fanuc Corporation | Machine learning device, robot controller, robot system, and machine learning method for learning action pattern of human |
CN106997243A (en) * | 2017-03-28 | 2017-08-01 | 北京光年无限科技有限公司 | Speech scene monitoring method and device based on intelligent robot |
US20170285584A1 (en) * | 2016-04-04 | 2017-10-05 | Fanuc Corporation | Machine learning device that performs learning using simulation result, machine system, manufacturing system, and machine learning method |
WO2018006364A1 (en) * | 2016-07-07 | 2018-01-11 | 深圳狗尾草智能科技有限公司 | Robot training method and device based on virtual environment |
US9880553B1 (en) * | 2015-04-28 | 2018-01-30 | Hrl Laboratories, Llc | System and method for robot supervisory control with an augmented reality user interface |
US20180089572A1 (en) * | 2012-08-02 | 2018-03-29 | Artifical Solutions Iberia S.L. | Hybrid approach for developing, optimizing, and executing conversational interaction applications |
- 2018-04-18: application CN201810349138.0A filed (publication CN110390845A), status Pending
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120229446A1 (en) * | 2011-03-07 | 2012-09-13 | Avaya Inc. | Method and system for topic based virtual environments and expertise detection |
CN103258338A (en) * | 2012-02-16 | 2013-08-21 | 克利特股份有限公司 | Method and system for driving simulated virtual environments with real data |
US20130218542A1 (en) * | 2012-02-16 | 2013-08-22 | Crytek Gmbh | Method and system for driving simulated virtual environments with real data |
US20180089572A1 (en) * | 2012-08-02 | 2018-03-29 | Artifical Solutions Iberia S.L. | Hybrid approach for developing, optimizing, and executing conversational interaction applications |
US20150100530A1 (en) * | 2013-10-08 | 2015-04-09 | Google Inc. | Methods and apparatus for reinforcement learning |
US9880553B1 (en) * | 2015-04-28 | 2018-01-30 | Hrl Laboratories, Llc | System and method for robot supervisory control with an augmented reality user interface |
US20170028553A1 (en) * | 2015-07-31 | 2017-02-02 | Fanuc Corporation | Machine learning device, robot controller, robot system, and machine learning method for learning action pattern of human |
US20170285584A1 (en) * | 2016-04-04 | 2017-10-05 | Fanuc Corporation | Machine learning device that performs learning using simulation result, machine system, manufacturing system, and machine learning method |
WO2018006364A1 (en) * | 2016-07-07 | 2018-01-11 | 深圳狗尾草智能科技有限公司 | Robot training method and device based on virtual environment |
CN106327942A (en) * | 2016-10-21 | 2017-01-11 | 上海申电教育培训有限公司 | Distributed electric power training system based on virtual reality |
CN106997243A (en) * | 2017-03-28 | 2017-08-01 | 北京光年无限科技有限公司 | Speech scene monitoring method and device based on intelligent robot |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111580411A (en) * | 2020-04-27 | 2020-08-25 | 珠海格力电器股份有限公司 | Control parameter optimization method, device and system |
CN112338920A (en) * | 2020-11-04 | 2021-02-09 | 中国联合网络通信集团有限公司 | Data processing method, device and equipment |
CN112338920B (en) * | 2020-11-04 | 2022-04-15 | 中国联合网络通信集团有限公司 | Data processing method, device and equipment |
CN112427843A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Ship multi-mechanical-arm welding spot cooperative welding method based on QMIX reinforcement learning algorithm |
CN112434464A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm |
CN112434464B (en) * | 2020-11-09 | 2021-09-10 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG algorithm |
WO2022095278A1 (en) * | 2020-11-09 | 2022-05-12 | 中国船舶重工集团公司第七一六研究所 | Qmix reinforcement learning algorithm-based ship welding spots collaborative welding method using multiple manipulators |
CN115186370A (en) * | 2022-05-18 | 2022-10-14 | 广东海洋大学 | Engineering forklift transfer learning system based on deep learning training model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390845A (en) | Robotic training method and device, storage medium and computer system under virtual environment | |
US11220002B2 (en) | Robot simulation device | |
CN107992252A (en) | Information cuing method, device, electronic equipment and storage medium | |
CN108037888B (en) | Skill control method, skill control device, electronic equipment and storage medium | |
CN107967096A (en) | Destination object determines method, apparatus, electronic equipment and storage medium | |
CN108159697B (en) | Virtual object transmission method and device, storage medium and electronic equipment | |
CN107823884A (en) | Destination object determines method, apparatus, electronic equipment and storage medium | |
CN114139637B (en) | Multi-agent information fusion method and device, electronic equipment and readable storage medium | |
CN107656620A (en) | Virtual object control method, device, electronic equipment and storage medium | |
CN112416323B (en) | Control code generation method, operation method, device, equipment and storage medium | |
CN110090444A (en) | Behavior record creation method, device, storage medium and electronic equipment in game | |
WO2021138260A1 (en) | Transformation mode switching for a real-time robotic control system | |
KR20160052952A (en) | Block and user terminal for modeling 3d shape and the method for modeling 3d shape using the same | |
CN108170295A (en) | Virtual camera condition control method, device, electronic equipment and storage medium | |
CN105068653A (en) | Method and apparatus for determining touch event in virtual space | |
Liang et al. | Trajectory-based skill learning for overhead construction robots using generalized cylinders with orientation | |
KR102529023B1 (en) | Training processing device, intermediary device, training system and training processing method | |
CN114578712B (en) | Multifunctional underwater autonomous vehicle cluster simulation system | |
CN110502838A (en) | Spare parts management strategy optimization model based on emulation | |
US20220172107A1 (en) | Generating robotic control plans | |
KR20220100876A (en) | An associative framework for robotic control systems | |
CN116922379B (en) | Vision-based mechanical arm obstacle avoidance method, system, electronic equipment and storage medium | |
US20210187746A1 (en) | Task planning accounting for occlusion of sensor observations | |
US20240025035A1 (en) | Robotic simulations using multiple levels of fidelity | |
US20220402126A1 (en) | Systems, computer program products, and methods for building simulated worlds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191029 |