CN109760050A - Robot behavior training method, device, system, storage medium and equipment - Google Patents


Info

Publication number
CN109760050A
CN109760050A (Application No. CN201910028901.4A)
Authority
CN
China
Prior art keywords
data
robot
behavioral
behavior
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910028901.4A
Other languages
Chinese (zh)
Inventor
何德裕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Luban's Robot (shenzhen) Co Ltd
Original Assignee
Luban's Robot (shenzhen) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Luban's Robot (shenzhen) Co Ltd
Priority to CN201910028901.4A
Publication of CN109760050A
Priority to CN201910509236.0A
Legal status: Pending


Classifications

    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B25 — HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J — MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00 — Accessories fitted to manipulators, e.g. for monitoring, for viewing; safety devices combined with or specially adapted for use in connection with manipulators
    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B25 — HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J — MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 — Programme-controlled manipulators
    • B25J9/16 — Programme controls
    • B25J9/1628 — Programme controls characterised by the control loop
    • B25J9/163 — Programme controls characterised by the control loop: learning, adaptive, model-based, rule-based expert control

Abstract

This application relates to a robot behavior training method, device, system, storage medium and robot. The robot behavior training method includes: obtaining decision data from the course of an expert-performed action; training an initial model on the decision data to obtain a pre-trained model; and performing autonomous learning based on the pre-trained model to obtain a robot behavior model. With the technical solution of the present invention, the adaptability and accuracy of the behavior actions of the trained robot model are improved.

Description

Robot behavior training method, device, system, storage medium and equipment
Technical field
This application relates to the technical field of robot control, and in particular to a robot behavior training method, device, system, storage medium and equipment.
Background art
With advances in science and technology, society as a whole is developing in the direction of intelligence and automation. More and more behaviors rely on robots — for example, action behaviors such as grasping, assembling, and driving an object to move are performed by a robot.
Artificial intelligence brings unlimited possibilities for the future development of robots: by training a neural network model, a robot controlled by that network model can autonomously learn to perform various actions.
It should be understood, however, that performing robot behavior training with machine-learning-based methods still suffers from problems such as excessive dependence on training data and poor learning results.
Summary of the invention
Based on this, the present invention provides a robot behavior training method, device, system, storage medium and equipment.
A first aspect of the present invention provides a robot behavior training method, which includes:
obtaining decision data from the course of an expert-performed action, the decision data including multiple pieces of behavior data and corresponding observation data;
performing model autonomous learning based on the decision data to obtain a robot behavior model.
Further, performing model autonomous learning based on the decision data to obtain a robot behavior model includes:
training an initial model based on the decision data to obtain a pre-trained model;
performing autonomous learning on the pre-trained model to obtain the robot behavior model.
Alternatively, performing model autonomous learning based on the decision data to obtain a robot behavior model includes:
performing autonomous learning on the initial model directly, based on the decision data, to obtain the robot behavior model.
Further, obtaining the decision data from the course of the expert-performed action includes:
obtaining the behavior data at multiple current moments during the expert-performed action;
obtaining the observation data at the multiple current moments, sent by a first sensor during the expert-performed action, where the behavior data at each current moment corresponds to the observation data at that current moment.
Further, obtaining the decision data from the course of the expert-performed action may instead include:
obtaining related information about the behavior data at multiple current moments, sent by a second sensor during the expert-performed action;
obtaining, according to the related information, the behavior data of multiple previous moments;
obtaining the observation data at the multiple previous moments, sent by the first sensor during the expert-performed action, where the behavior data at each previous moment corresponds to the observation data at that previous moment.
Further, the observation data include:
image data, or the pose or position data of the robot derived from the image; force feedback data; motion feedback data of the driving units; ranging data; speed or acceleration measurement data; current or voltage measurement data; time data; and/or temperature data.
Further, the behavior data include: a target pose or position, the motion amount of each driving unit of the robot, or the motion amount of the robot.
Further, the behavior includes:
grabbing a target object from bulk or from regularly placed objects;
assembling a target object;
putting down a target object; and/or
moving from one position to another.
A second aspect of the present invention provides a robot behavior training device, which includes:
a decision data obtaining module, for obtaining the decision data from the course of an expert-performed action, the decision data including multiple pieces of behavior data and corresponding observation data;
a behavior model generation module, for performing model autonomous learning based on the decision data to obtain a robot behavior model.
A third aspect of the present invention provides a robot behavior training system, including:
a behavior data generating means, for generating behavior data and sending the behavior data to a control device;
a first sensor, for obtaining the observation data corresponding to the behavior data and sending the observation data to the control device;
the control device, for obtaining the decision data from the course of an expert-performed action, the decision data including multiple pieces of the behavior data and the corresponding observation data, and for performing model autonomous learning based on the decision data to obtain a robot behavior model.
Further, the robot behavior training system also includes:
a robot, for performing the expert behavior under teaching.
Further, the first sensor includes:
an image sensor, for obtaining image data of the robot at a certain moment;
a force sensor, for obtaining force feedback data of the robot at a certain moment;
an encoder, for obtaining motion feedback data of a driving unit of the robot at a certain moment;
a range finder, for obtaining distance-related ranging data of the robot at a certain moment;
a speed or acceleration measurer, for obtaining speed or acceleration measurement data of the robot at a certain moment;
a current or voltage measurer, for obtaining current or voltage measurement data of the robot at a certain moment;
a timer, for obtaining specific time data at a certain moment;
a temperature sensor, for obtaining temperature data of the robot at a certain moment.
Further, the behavior data generating means includes a control unit, for generating the behavior data.
Further, the behavior data generating means may instead include a second sensor and a control unit:
the second sensor, for obtaining related information about the behavior data at multiple current moments and sending the related information to the control unit;
the control unit, for obtaining the behavior data of multiple previous moments according to the related information.
Further, the behavior data include: a target pose or position, the motion amount of each driving unit of the robot, or the motion amount of the robot.
A fourth aspect of the present invention provides a robot system, which includes the robot behavior training system described in any of the above items.
A fifth aspect of the present invention provides a computer device, including a memory and a processor, the memory storing a computer program, the processor implementing the robot training method described in any of the above items when executing the computer program.
A sixth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the robot training method described in any of the above items when executed by a processor.
With the technical method of the invention, model autonomous learning is performed based on the decision data to obtain the robot behavior model, and therefore the adaptability and accuracy with which the trained robot model completes behavior actions under all kinds of conditions are improved.
Brief description of the drawings
Fig. 1 is a first flow diagram of the robot behavior training method in one embodiment;
Fig. 2 is a second flow diagram of the robot behavior training method in one embodiment;
Fig. 3 is a third flow diagram of the robot behavior training method in one embodiment;
Fig. 4 is a fourth flow diagram of the robot behavior training method in one embodiment;
Fig. 5 is a first structural diagram of an embodiment of the robot system;
Fig. 6 is a second structural diagram of an embodiment of the robot system;
Fig. 7 is a first structural block diagram of the robot training device;
Fig. 8 is a second structural block diagram of the robot training device;
Fig. 9 is a third structural block diagram of the robot training device;
Fig. 10 is a fourth structural block diagram of the robot training device;
Fig. 11 is a first structural block diagram of the robot training system in one embodiment;
Fig. 12 is a second structural block diagram of the robot training system in one embodiment;
Fig. 13 is a first structural block diagram of the behavior data generating means of the robot in one embodiment;
Fig. 14 is a second structural block diagram of the behavior data generating means of the robot in one embodiment.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the application more clearly understood, the application is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are only used to explain the application, not to limit it.
In one embodiment, as shown in Fig. 1, a robot behavior training method is provided, including the following steps:
Step S100: obtain the decision data from the course of an expert-performed action, where the decision data consists of multiple pieces of observation data and the corresponding behavior data.
Specifically, the decision data refers to the collected set of the observation data acquired at each moment together with the behavior data acquired at the same moment.
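The per-moment pairing of observation data and behavior data described above can be sketched as a simple data structure. This is only an illustration of the pairing; the field names and layout are assumptions, not part of the patent:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DecisionRecord:
    """One (observation, behavior) pair acquired at a single moment."""
    observation: List[float]  # e.g. pose, force feedback, ranging data flattened into one vector
    behavior: List[float]     # e.g. target pose or per-driving-unit motion amounts

@dataclass
class DecisionData:
    """The set of per-moment records collected over one expert-performed action."""
    records: List[DecisionRecord] = field(default_factory=list)

    def add(self, observation, behavior):
        self.records.append(DecisionRecord(list(observation), list(behavior)))

demo = DecisionData()
demo.add([0.1, 0.2, 0.3], [1.0, 0.0])
demo.add([0.4, 0.5, 0.6], [0.0, 1.0])
print(len(demo.records))  # → 2
```

Each record keeps the moment's observation and behavior together, matching the correspondence the method relies on during training.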
Specifically, the action process can include, but is not limited to: the action of grabbing a target object from bulk or from regularly arranged objects (as shown in Fig. 6); the action of assembling a target object (as shown in Fig. 5); the action of putting down an object (drawing omitted); the action of moving from one position to another (drawing omitted); or a combination of some or all of the above actions.
In one embodiment, the decision data are obtained during an action process in which the robot is taught to perform the expert behavior. Specifically, an operator may guide the robot directly, or control instructions generated by a controller may drive the robot to perform the expert behavior. For example: the robot completes the assembly action of building blocks while being guided by the operator; or, as another example, the robot completes the assembly action of building blocks according to the motion-amount instructions for each driving unit sent by the controller.
Further, in some embodiments, in the case where instructions generated by the behavior data generating means drive the robot to perform the expert behavior:
The behavior data can include, but are not limited to: for each step of the expert-performed action, the target pose (X, Y, Z, U, V, W coordinates) or position (X, Y coordinates) output by the controller for the robot to execute that step; or the motion amount (rotation amount and/or translation amount) of the corresponding driving units of the robot, calculated from the target pose or position based on the kinematic equations; or the motion amount of the robot itself.
The control device obtains the observation data acquired and sent by the first sensor. Specifically, the observation data can include, but are not limited to: the image data acquired and sent by the image sensor, or the pose or position of the robot (e.g. the robot's end effector) extracted from that image data; the ranging data acquired and sent by the distance sensor; the force (force/torque) feedback data acquired and sent by the force sensor; the driving-unit motion amount (rotation amount and/or translation amount) data of the robot acquired and sent by the encoder; the speed or acceleration data acquired and sent by the speed or acceleration measurer; the current or voltage measurement data acquired and sent by the current or voltage measurer; the time data acquired and sent by the timer; and the temperature data acquired and sent by the thermometer.
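As a minimal sketch of how the heterogeneous sensor readings listed above might be flattened into one observation vector per moment — the sensor names, dict layout, and ordering are illustrative assumptions, not part of the patent:

```python
def build_observation(readings: dict) -> list:
    """Flatten one moment's sensor readings into a fixed-order observation vector.

    `readings` maps sensor names to lists of floats; missing sensors are skipped.
    A fixed ordering matters because a trained model sees features by position.
    """
    sensor_order = [
        "pose",     # pose/position extracted from the image sensor
        "force",    # force/torque feedback
        "encoder",  # driving-unit motion amounts
        "ranging",  # distance data
        "speed",    # speed or acceleration
        "current",  # current or voltage
    ]
    obs = []
    for name in sensor_order:
        obs.extend(readings.get(name, []))
    return obs

obs = build_observation({"pose": [0.1, 0.2, 0.3], "force": [5.0], "encoder": [30.0, 45.0]})
print(obs)  # → [0.1, 0.2, 0.3, 5.0, 30.0, 45.0]
```

In a real system the per-sensor dimensionality would be fixed in advance so that absent readings are padded rather than skipped.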
For example, as shown in Fig. 5, take training the robot to perform an assembly behavior (e.g. assembling object M2 onto object M1): multiple groups of decision data are obtained during the expert-performed action. Specifically, the behavior data can be the target pose or position, or the driving-unit motion amount, of the robot's next step, output by the behavior data generating means and acquired at a certain moment; and the corresponding observation data can be the image data, pose or position, force feedback data, encoder feedback data, speed or acceleration data and/or current or voltage data, etc., sent by each first sensor and acquired at that same moment. During the expert-performed action, the acquired groups of decision data are sent to the control unit of the robot. In some embodiments, the groups of decision data from the expert-performed action need to include at least the decision data acquired under the successfully-assembled state.
In some embodiments, when the control device obtains image data, it can use the image data directly as observation data, or it can extract the pose or position of the robot from the image data and use that pose or position as observation data.
Further, in some embodiments, in the case where the operator guides the robot to perform the expert behavior:
Since in this case there is no explicit behavior command that can serve as the behavior data, the behavior data, or information related to it, can be obtained indirectly through certain second sensors. The first sensor and the second sensor may then include sensors of the same type, such as an image sensor and an encoder. In some embodiments, identical sensors among the first and second sensors can be merged into one sensor; that is, the data obtained can serve both as behavior data and as state (observation) data. For example: the motion amount data of a driving unit sent by the encoder at the current moment can serve as the observation data of the current moment and also as the behavior data of the previous moment. As another example: the pose or position of the robot obtained from the image acquired and sent by the image sensor at the current moment may serve as the behavior data of the previous moment, and again as the observation data of the robot at the current moment.
For example, as shown in Fig. 6, take training the robot to grab a target object from bulk, where "bulk" means that multiple target objects M are scattered in an irregular state. Multiple groups of decision data (behavior data and the corresponding observation data) are obtained during the expert-performed action. Specifically, the behavior data at a certain current moment can be the pose or position of the robot extracted from the image sent by the image sensor at the next moment, or the motion amount of the robot obtained from the poses or positions extracted from the images at the current moment and the next moment. The observation data at the current moment can then be the information sent by each first sensor and acquired at the current moment, such as: the force feedback data of the force sensor (e.g. a pressure sensor arranged on the fingers obtains the magnitude and/or direction of the force when the grasping action is completed; or a multi-dimensional force sensor arranged at the output end of the robot's terminal axis obtains the change of the output end's force or torque during grasping); the driving-unit feedback data (e.g. the angle by which a motor rotates or moves); the speed or acceleration data (the speed or acceleration of the robot during the motion); and/or the current or voltage data (e.g. the current or voltage value input to a motor), etc. In addition, the pose or position data of the robot at the current moment can also be extracted from the image data at the current moment.
Specifically, the behavior data can include, but are not limited to: a target pose or position, the motion amount of each driving unit of the robot, or the motion amount of the robot.
In some embodiments, the groups of decision data from the expert-performed action need to include at least the decision data acquired at the moment when grabbing succeeds.
As another example: take training the robot to move (translate and/or rotate) from one position to another. Multiple groups of decision data are obtained during the expert-performed action. Specifically, the behavior data may include the pose of the robot's actuator extracted from the images obtained by the image sensor at each moment of the robot's movement; and the corresponding observation data at that moment may include, for example, the distance-to-target-position information fed back as ranging data — e.g. a rangefinder (such as an infrared rangefinder) installed on the robot feeds back the distance to the target position — as well as driving-unit feedback data, speed or acceleration data, etc. Specifically, the groups of decision data from the expert-performed action need to include at least the decision data acquired at the moment when the target position is reached.
In one embodiment, the decision data are obtained while the expert performs the action himself.
Specifically, the expert can be an operator or another robot. For example: obtain the decision data of an operator carrying out an assembly behavior. Specifically, by obtaining the image data of the operator performing the assembly process, shot and sent by the image sensor at multiple current moments, the behavior data of the operator at the previous moments and the observation data at the current moments during the assembly process can be obtained. In addition, a force sensor can be installed on the person's hand, and the force sensor can feed back the observation data during the person's assembly action.
Specifically, during the execution of the various expert behaviors such as grabbing an object or assembling, the image data obtained under the multiple states can be 3D images, 2D images or video images. The image sensor can include, but is not limited to: a camera, a video camera, a scanner, or other equipment with the relevant functions (a mobile phone, a computer, etc.). There can be one or more image sensors.
Specifically, the image sensor can be arranged on the robot or fixed at a certain position outside the robot; the image sensor, the image sensor together with the robot (so-called "eye-hand"), and the robot are calibrated in advance.
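The eye-hand calibration mentioned above yields a fixed transform between the camera frame and the robot base frame; applying it can be sketched with a homogeneous 4×4 matrix. The matrix values below are illustrative, not calibration results from the patent:

```python
import numpy as np

def camera_to_base(T_base_cam: np.ndarray, p_cam: np.ndarray) -> np.ndarray:
    """Map a 3-D point from the camera frame to the robot base frame.

    T_base_cam: 4x4 homogeneous transform obtained from eye-hand calibration.
    p_cam: 3-vector in the camera frame.
    """
    p_h = np.append(p_cam, 1.0)          # homogeneous coordinates
    return (T_base_cam @ p_h)[:3]

# Example calibration: camera frame translated by (0.5, 0, 1.0) m from the base, no rotation.
T = np.eye(4)
T[:3, 3] = [0.5, 0.0, 1.0]
print(camera_to_base(T, np.array([0.1, 0.2, 0.3])))  # → [0.6 0.2 1.3]
```

Poses extracted from images are expressed in the camera frame; the calibration transform is what lets them serve as behavior or observation data in the robot's own coordinates.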
Specifically, the robot can be any of various types of manipulators formed by multiple joints and links connected in series or in parallel, each joint being a driving unit — for example serial manipulators such as four-axis robots and six-axis robots, or parallel manipulators. In some embodiments, an end effector is also fixed at the output end of the terminal axis of the manipulator; the end effector can be a sucker or a clamping jaw, etc. In some embodiments, the motion amount of the robot in the above examples can refer to the motion amount of any part of the robot, e.g. the motion amount of the end effector.
Step S200: based on the decision data, perform model autonomous learning to obtain the robot behavior model.
With the learning method above, since model autonomous learning is performed based on the decision data to obtain the robot behavior model, the adaptability and accuracy with which the trained robot model completes behavior actions under all kinds of conditions are improved.
As shown in Fig. 2, in some embodiments, step S200 includes the following method steps:
S210: based on the decision data, train an initial model to obtain a pre-trained model.
Using the state (observation) data as features and the behavior data as labels, learning proceeds as classification (for discrete actions) or regression (for continuous actions), and the parameters of the initial model are continually updated to obtain the pre-trained model.
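The supervised pre-training step above — observations as features, behavior data as regression targets for continuous actions — can be sketched with a least-squares linear policy. This is a deliberately minimal stand-in for the neural-network initial model, under the assumption of a linear expert, not the patent's implementation:

```python
import numpy as np

def pretrain_policy(observations: np.ndarray, behaviors: np.ndarray) -> np.ndarray:
    """Fit W so that observations @ W ≈ behaviors (continuous-action regression).

    A linear model stands in for the initial model here; a neural network
    trained by gradient descent would use the same feature/label arrangement.
    """
    W, *_ = np.linalg.lstsq(observations, behaviors, rcond=None)
    return W

# Toy expert decision data: action = 2 * observation (one feature, one action dimension).
obs = np.array([[1.0], [2.0], [3.0]])
act = np.array([[2.0], [4.0], [6.0]])
W = pretrain_policy(obs, act)
print(np.round(W, 3))  # → [[2.]]
```

For discrete actions the same data arrangement would feed a classifier instead of a regressor, as the text notes.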
S230: perform autonomous learning based on the pre-trained model to obtain the robot behavior model.
The autonomous learning process lets the robot generate some action trajectories based on the pre-trained model, then defines a standard for judging the difference between these trajectories and the expert action trajectories collected during teaching, and then updates the strategy of the pre-trained model according to this difference, so that the trajectories it generates next time are closer to the expert's behavior — until, according to the standard, the action trajectories generated from the pre-trained model are judged close enough to the expert's action trajectories. The model obtained at that point is the final robot behavior model.
Specifically, the standard described in the above embodiment can be obtained by various methods such as empirical values, machine learning or random values; in some embodiments, this standard can be represented by a fitted neural network.
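A hedged sketch of the loop just described: generate trajectories from the current policy, score their deviation from the expert trajectories with a "standard" (here a plain mean-squared difference rather than the fitted neural network the text mentions), and nudge the policy until the difference is small enough. Everything below is an illustrative toy with a scalar policy, not the patent's algorithm:

```python
import numpy as np

def rollout(policy_w, obs_seq):
    """Generate an action trajectory by applying the policy to each observation."""
    return np.array([policy_w * o for o in obs_seq])

def standard(traj, expert_traj):
    """The judging standard: mean squared difference to the expert trajectory."""
    return float(np.mean((traj - expert_traj) ** 2))

def autonomous_learning(policy_w, obs_seq, expert_traj, lr=0.05, tol=1e-6, max_iters=1000):
    """Update the policy until its trajectories are close enough to the expert's."""
    for _ in range(max_iters):
        traj = rollout(policy_w, obs_seq)
        if standard(traj, expert_traj) < tol:
            break  # close enough: this policy is the final behavior model
        # Gradient of the standard with respect to the scalar policy weight.
        grad = float(np.mean(2 * (traj - expert_traj) * obs_seq))
        policy_w -= lr * grad
    return policy_w

obs_seq = np.array([1.0, 2.0, 3.0])
expert_traj = 2.0 * obs_seq          # the expert multiplies observations by 2
w = autonomous_learning(0.0, obs_seq, expert_traj)
print(round(w, 2))  # → 2.0
```

Replacing `standard` with a learned discriminator network would recover the neural-network variant of the standard that the text describes.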
With the learning method above, the initial model is trained on the decision data obtained from the expert behavior to obtain a pre-trained model, autonomous learning is performed based on the pre-trained model, and the machine-learned model is finally obtained; therefore the adaptability and accuracy with which the trained robot model completes behavior actions under all kinds of conditions are improved. In addition, the training time of the robot behavior model can be reduced.
In some embodiments, step S200 instead includes the following method step:
S220: based on the decision data, perform autonomous learning on the initial model to obtain the robot behavior model.
Here the autonomous learning process lets the robot generate some action trajectories based on the initial model, defines a standard for judging the difference between these trajectories and the expert action trajectories obtained during teaching, and then updates the strategy of the initial model according to this difference, so that the trajectories it generates next time are closer to the expert's behavior — until, according to the standard, the action trajectories generated from the initial model are judged close enough to the expert's action trajectories. The model obtained at that point is the final robot behavior model.
As shown in Fig. 3, in some embodiments, obtaining the decision data from the expert-performed action in step S100 may include the following method steps:
S110: obtain the behavior data at multiple current moments during the expert-performed action;
S130: obtain the observation data at the multiple current moments, sent by the first sensor during the expert-performed action, where the behavior data at each current moment corresponds to the observation data at that current moment.
As shown in Fig. 4, in some embodiments, obtaining the decision data from the expert-performed action in step S100 may include the following method steps:
S120: obtain the related information about the behavior data at multiple current moments, sent by the second sensor during the expert-performed action;
S140: obtain, according to the related information, the behavior data of multiple previous moments.
For example: when the related information is image information sent by the image sensor, the image information is parsed to generate the pose or position of the robot, or the motion amount of the robot is generated from the poses or positions at the current moment and the next moment, and this is used as the behavior data of the previous moment.
As another example: when the related information is the motion amount of each driving unit sent by the encoder, the motion amount information is used directly as the behavior data of the previous moment.
S160: obtain the observation data at the multiple previous moments, sent by the first sensor during the expert-performed action, where the behavior data at each previous moment corresponds to the observation data at that previous moment.
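Deriving the previous moment's behavior data from images at consecutive moments, as in the first example above, amounts to differencing the extracted poses. A minimal sketch — the (X, Y, Z, U, V, W) tuple layout is an assumption for illustration:

```python
def motion_amount(pose_prev, pose_curr):
    """Behavior data of the previous moment: the pose change that took the
    robot from pose_prev (previous moment) to pose_curr (current moment).

    Poses are (X, Y, Z, U, V, W) tuples as extracted from the images.
    """
    return tuple(c - p for p, c in zip(pose_prev, pose_curr))

prev = (0.10, 0.20, 0.30, 0.0, 0.0, 90.0)
curr = (0.15, 0.20, 0.25, 0.0, 0.0, 95.0)
delta = tuple(round(v, 6) for v in motion_amount(prev, curr))
print(delta)  # → (0.05, 0.0, -0.05, 0.0, 0.0, 5.0)
```

When the related information is encoder motion amounts instead, as in the second example, no differencing is needed — the reading is used directly.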
It should be understood that although the steps in the flow charts of Figs. 1, 2, 3 and 4 are shown sequentially as indicated by the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated otherwise herein, there is no strict ordering restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in Figs. 1, 2, 3 and 4 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but can be executed at different times, and their execution order is not necessarily sequential — they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 7, a robot behavior training device is provided. The robot behavior training device includes a decision data obtaining module 100 and a behavior model generation module 200.
The decision data obtaining module 100 is used for obtaining the decision data from the course of an expert-performed action, the decision data including multiple pieces of behavior data and corresponding observation data.
The behavior model generation module 200 is used for performing model autonomous learning based on the decision data to obtain the robot behavior model.
As shown in Fig. 8, in some embodiments, the behavior model generation module 200 includes a pre-trained model generating unit 210 and a first behavior generating unit 230.
The pre-trained model generating unit 210 is used for training the initial model based on the decision data to obtain the pre-trained model.
The first behavior generating unit 230 is used for performing autonomous learning on the pre-trained model to obtain the robot behavior model.
In some embodiments, the behavior model generation module 200 instead includes a second behavior generating unit 220, used for performing autonomous learning on the initial model based on the decision data to obtain the robot behavior model.
As shown in Fig. 9, in some embodiments, the decision data obtaining module 100 includes a current behavior data generating section 110 and a current observation data generating section 130.
The current behavior data generating section 110 is used for obtaining the behavior data at multiple current moments during the expert-performed action.
The current observation data generating section 130 is used for obtaining the observation data at the multiple current moments, sent by the first sensor during the expert-performed action, where the behavior data at each current moment corresponds to the observation data at that current moment.
As shown in Fig. 10, in some embodiments, the decision data obtaining module 100 includes a current information obtaining unit 120, a previous behavior data generating section 140 and a previous observation data generating section 160.
The current information obtaining unit 120 is used for obtaining the related information about the behavior data at multiple current moments, sent by the second sensor during the expert-performed action.
The previous behavior data generating section 140 is used for obtaining the behavior data of multiple previous moments according to the related information.
The previous observation data generating section 160 is used for obtaining the observation data at the multiple previous moments, sent by the first sensor during the expert-performed action, where the behavior data at each previous moment corresponds to the observation data at that previous moment.
For specific limitations of the robot behavior training device, reference may be made to the limitations of the robot behavior training method above, which are not repeated here. Each module in the above robot behavior training device can be implemented in whole or in part by software, hardware, or a combination thereof. Each of the above modules can be embedded in hardware form in, or independent of, the processor of a computer device, and can also be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, as shown in Fig. 5, 6 or 11, a robot behavior training system is provided, including a control device 400, a first sensor 500 and a behavior data generating device 600.
The behavior data generating device 600 is configured to generate behavior data and send the behavior data to the control device.
The first sensor 500 is configured to obtain the observation data during the expert's action process and send the observation data to the control device.
The control device 400 is configured to obtain the decision data in the expert's action process, wherein the decision data includes multiple behavior data and the corresponding observation data; and, based on the decision data, to perform model autonomous learning to obtain a robot behavior model.
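As one illustrative reading of the control device's "model autonomous learning" step, a minimal behavior-cloning sketch fits a linear policy to the expert's (observation, behavior) pairs by least squares. The function names and the choice of a linear model are assumptions for illustration, not the algorithm prescribed by the patent:

```python
import numpy as np

def behavior_cloning(decision_data):
    """Fit a linear policy a ≈ W @ o to expert (observation, behavior) pairs
    by ordinary least squares — a minimal stand-in for model training."""
    O = np.array([o for o, _ in decision_data], dtype=float)  # (N, obs_dim)
    A = np.array([a for _, a in decision_data], dtype=float)  # (N, act_dim)
    W, *_ = np.linalg.lstsq(O, A, rcond=None)
    return W.T  # (act_dim, obs_dim): maps an observation to a behavior datum

def act(W, observation):
    """Apply the learned behavior model to one observation."""
    return W @ np.asarray(observation, dtype=float)
```

On a consistent demonstration set, the fitted model reproduces the expert's behavior data for the observations it was trained on.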
For specific limitations of the control device, reference may be made to the limitations of the robot behavior training method above, which are not repeated here.
In one embodiment, as shown in Fig. 5, 6 or 12, the robot training system further includes a robot 700 for executing the expert's behavior under teaching.
Specifically, the first sensor includes, but is not limited to:
an image sensor, for obtaining the image data of the robot at a certain moment;
a force sensor, for obtaining the force feedback data of the robot at a certain moment;
an encoder, for obtaining the motion feedback data of a driving unit of the robot at a certain moment;
a range finder, for obtaining the distance-related ranging data of the robot at a certain moment;
a speed or acceleration measurer, for obtaining the speed or acceleration measurement data of the robot at a certain moment;
a current or voltage measurer, for obtaining the current or voltage measurement data of the robot at a certain moment;
a timer, for obtaining the specific time data at a certain moment;
a temperature sensor, for obtaining the temperature data of the robot at a certain moment.
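The heterogeneous sensors listed above can be polled into a single time-stamped observation record per moment. The dictionary-of-callables layout below is a hypothetical sketch, not an interface defined by the patent:

```python
import time

def collect_observation(sensors):
    """Poll every attached sensor once and assemble one time-stamped observation.

    `sensors` maps a sensor name to a zero-argument reader callable
    (e.g. image, force, encoder, range, speed, current, temperature).
    """
    record = {"timestamp": time.time()}  # the timer's role: when the sample was taken
    for name, read in sensors.items():
        record[name] = read()
    return record
```

For example, `collect_observation({"force": read_force, "image": grab_frame})` (with `read_force` and `grab_frame` being hypothetical driver functions) yields one observation datum for the current moment.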
As shown in Figure 13, in some embodiments, the behavior data generating device 600 includes a control unit 610;
The control unit 610 is configured to generate the behavior data.
As shown in Figure 14, in some embodiments, the behavior data generating device 600 includes a second sensor 620 and the control unit 610;
The second sensor 620 is configured to obtain the information relevant to the behavior data at multiple current times and send the relevant information to the control unit;
The control unit 610 is configured to parse the relevant information to generate the behavior data at multiple last moments.
Specifically, the second sensor 620 may include, but is not limited to, an image sensor and an encoder.
It should be noted that when the first sensor 500 also includes an image sensor and an encoder, these may be provided separately and independently of the image sensor and encoder included in the second sensor 620; alternatively, the image sensor and encoder may be shared. That is, by parsing the relevant information captured by the image sensor and encoder at a certain current time, both the behavior data of the last moment and the observation data of the current time can be generated.
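One way to realize the shared-sensor case — a single image-sensor-plus-encoder reading yielding both the last moment's behavior data and the current moment's observation data — is to take the encoder delta as the motion executed since the previous moment. This is an illustrative interpretation, not the patent's mandated parsing:

```python
def derive_step(prev_encoder, curr_encoder, curr_image):
    """From one shared reading at the current moment, derive:
    - the behavior data of the last moment (joint motion executed since then), and
    - the observation data of the current moment (the freshly captured image)."""
    behavior_last = [c - p for p, c in zip(prev_encoder, curr_encoder)]
    observation_now = curr_image
    return behavior_last, observation_now
```

Chaining such calls over consecutive readings produces exactly the last-moment behavior/observation pairs described above.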
Specifically, the control device 400 and the control unit 610 may be provided separately and independently, or may be combined into one device (for example, as shown in Figs. 5 and 6, the control device 400 and the control unit 610 are merged, and the control device 400 uniformly implements the robot behavior training method of the control device 400 and the behavior data generation method of the control unit 610, etc.).
The control device 400 and the control unit 610 may each be a programmable logic controller (PLC), a field-programmable gate array (FPGA), a personal computer (PC), an industrial personal computer (IPC), a server, or the like. The control device generates program instructions according to a pre-fixed program, in combination with manually input information or parameters and/or data collected by the external first sensor and/or second sensor (such as an image sensor).
In one embodiment, the present invention also provides a robot system including the robot behavior training system described in the above embodiments. For the related description of the robot behavior training system, reference may be made to the above embodiments, which is not repeated here.
It should be noted that the robot and/or sensors mentioned in the above robot behavior training method, behavior training control device, behavior training system or robot system may be a real robot and sensors in a real environment, or a virtual robot and/or sensors on a simulation platform which, through the simulated environment, achieve the effect of connecting a real agent and/or sensors. The control device whose behavior training has been completed in the virtual environment can then be transplanted to the real environment to control or retrain a real robot and sensors, which saves the resources and time of the training process.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following steps: obtaining the decision data in the action process of executing an expert's behavior; obtaining the auxiliary decision data in the action process of executing mistakes; and training an initial model based on the decision data and the auxiliary decision data to obtain a robot behavior learning model.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented: obtaining the decision data in the action process of executing an expert's behavior; obtaining the auxiliary decision data in the action process of executing mistakes; and training an initial model based on the decision data and the auxiliary decision data to obtain a robot behavior learning model.
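Training an initial model on both expert decision data and mistake ("auxiliary") decision data can be sketched under the assumption that mistake samples serve as negative examples whose recorded actions the model is pushed away from. Both the weighting scheme and the linear model are hypothetical illustrations, not the procedure specified by the patent:

```python
import numpy as np

def fit_with_mistakes(expert, mistakes, lr=0.1, epochs=2000, neg_weight=0.5):
    """Gradient-descent fit of a linear policy a = W @ o that is attracted to
    expert (observation, behavior) pairs and repelled from mistake pairs."""
    Xe = np.array([o for o, _ in expert], dtype=float)
    Ae = np.array([a for _, a in expert], dtype=float)
    Xm = np.array([o for o, _ in mistakes], dtype=float)
    Am = np.array([a for _, a in mistakes], dtype=float)
    W = np.zeros((Ae.shape[1], Xe.shape[1]))
    for _ in range(epochs):
        grad = (Xe @ W.T - Ae).T @ Xe / len(Xe)                   # pull toward expert actions
        grad -= neg_weight * (Xm @ W.T - Am).T @ Xm / len(Xm)     # push away from mistake actions
        W -= lr * grad
    return W
```

Because the mistake term subtracts from the gradient, the fitted policy deliberately overshoots away from the recorded erroneous actions; `neg_weight` must stay below 1 relative to the expert data for the iteration to converge.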
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be completed by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM), etc.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, they should all be considered within the scope of this specification.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
Unless otherwise defined, all technical and scientific terms used in this specification have the same meanings as commonly understood by those skilled in the technical field of the present invention. The terms used in the description of the invention in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present invention.
The terms "first", "second", "third", "S110", "S120", "S130", etc. (if present) in the claims, the specification and the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include", "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or robot that includes a series of steps or modules is not necessarily limited to the steps or modules clearly listed, but may include other steps or modules not clearly listed or inherent to the process, method, system, product or robot.
It should be noted that those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the structures and modules involved are not necessarily essential to the present invention.
The above embodiments only express several implementations of the present invention, and the descriptions thereof are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the inventive concept, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the appended claims.

Claims (18)

1. A robot behavior training method, characterized in that the robot behavior training method includes:
obtaining the decision data in an expert's action process; wherein the decision data includes multiple behavior data and corresponding observation data;
based on the decision data, performing model autonomous learning to obtain a robot behavior model.
2. The robot behavior training method according to claim 1, characterized in that performing model autonomous learning based on the decision data to obtain the robot behavior model includes:
training an initial model based on the decision data to obtain a pre-trained model;
performing autonomous learning on the pre-trained model to obtain the robot behavior model.
3. The robot behavior training method according to claim 1, characterized in that performing model autonomous learning based on the decision data to obtain the robot behavior model includes:
based on the decision data, performing autonomous learning on an initial model to obtain the robot behavior model.
4. The robot behavior training method according to claim 1, 2 or 3, characterized in that obtaining the decision data in the expert's action process includes:
obtaining the behavior data at multiple current times in the expert's action process;
obtaining the observation data at the multiple current times sent by a first sensor in the expert's action process; wherein the behavior data at each current time corresponds to the observation data at that current time.
5. The robot behavior training method according to claim 1, 2 or 3, characterized in that obtaining the decision data in the expert's action process includes:
obtaining the information relevant to the behavior data at multiple current times, sent by a second sensor in the expert's action process;
obtaining the behavior data at multiple last moments according to the relevant information;
obtaining the observation data at the multiple last moments sent by a first sensor in the expert's action process; wherein the behavior data at each last moment corresponds to the observation data at that last moment.
6. The robot behavior training method according to claim 1, 2 or 3, characterized in that the observation data include:
an image, or robot pose or position data generated according to the image; force feedback data; motion amount feedback data of a driving unit; ranging data; speed or acceleration measurement data; current or voltage measurement data; time data; and/or temperature data.
7. The robot behavior training method according to claim 1, 2 or 3, characterized in that the behavior data include: a target object pose or position, a motion amount of each driving unit of the robot, or a motion amount of the robot.
8. The robot behavior training method according to claim 1, 2 or 3, characterized in that the behavior includes:
grabbing an object from bulk material or from regularly placed objects;
assembling an object;
dropping a target object; and/or
moving from one position to another position.
9. A robot behavior training device, characterized in that the robot behavior training device includes:
a decision data obtaining module, configured to obtain the decision data in an expert's action process, wherein the decision data includes multiple behavior data and corresponding observation data;
a behavior model generating module, configured to perform model autonomous learning based on the decision data to obtain a robot behavior model.
10. A robot behavior training system, characterized by comprising:
a behavior data generating device, configured to generate behavior data and send the behavior data to a control device;
a first sensor, configured to obtain the observation data corresponding to the behavior data and send the observation data to the control device;
the control device, configured to obtain the decision data in an expert's action process, wherein the decision data includes multiple of the behavior data and the corresponding observation data, and to perform model autonomous learning based on the decision data to obtain a robot behavior model.
11. The robot behavior training system according to claim 10, characterized in that the robot behavior training system further includes:
a robot, configured to execute the expert's behavior under teaching.
12. The robot behavior training system according to claim 10 or 11, characterized in that the first sensor includes:
an image sensor, for obtaining the image data of the robot at a certain moment;
a force sensor, for obtaining the force feedback data of the robot at a certain moment;
an encoder, for obtaining the motion feedback data of a driving unit of the robot at a certain moment;
a range finder, for obtaining the distance-related ranging data of the robot at a certain moment;
a speed or acceleration measurer, for obtaining the speed or acceleration measurement data of the robot at a certain moment;
a current or voltage measurer, for obtaining the current or voltage measurement data of the robot at a certain moment;
a timer, for obtaining the specific time data at a certain moment;
a temperature sensor, for obtaining the temperature data of the robot at a certain moment.
13. The robot behavior training system according to claim 10 or 11, characterized in that the behavior data generating device includes a control unit;
the control unit is configured to generate the behavior data.
14. The robot behavior training system according to claim 10 or 11, characterized in that the behavior data generating device includes a second sensor and a control unit;
the second sensor is configured to obtain the information relevant to the behavior data at multiple current times and send the relevant information to the control unit;
the control unit is configured to obtain the behavior data at multiple last moments according to the relevant information.
15. The robot behavior training system according to claim 10 or 11, characterized in that the behavior data includes: a target object pose or position, a motion amount of each driving unit of the robot, or a motion amount of the robot.
16. A robot system, characterized in that the robot system includes the robot behavior training system according to any one of claims 10-12.
17. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that the processor implements the robot behavior training method according to any one of claims 1-8 when executing the computer program.
18. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the robot behavior training method according to any one of claims 1-8.
CN201910028901.4A 2019-01-12 2019-01-12 Robot behavior training method, device, system, storage medium and equipment Pending CN109760050A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910028901.4A CN109760050A (en) 2019-01-12 2019-01-12 Robot behavior training method, device, system, storage medium and equipment
CN201910509236.0A CN110293560A (en) 2019-01-12 2019-06-13 Robot behavior training, planing method, device, system, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910028901.4A CN109760050A (en) 2019-01-12 2019-01-12 Robot behavior training method, device, system, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN109760050A true CN109760050A (en) 2019-05-17

Family

ID=66452699

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910028901.4A Pending CN109760050A (en) 2019-01-12 2019-01-12 Robot behavior training method, device, system, storage medium and equipment
CN201910509236.0A Pending CN110293560A (en) 2019-01-12 2019-06-13 Robot behavior training, planing method, device, system, storage medium and equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910509236.0A Pending CN110293560A (en) 2019-01-12 2019-06-13 Robot behavior training, planing method, device, system, storage medium and equipment

Country Status (1)

Country Link
CN (2) CN109760050A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110456644A (en) * 2019-08-13 2019-11-15 北京地平线机器人技术研发有限公司 Determine the method, apparatus and electronic equipment of the execution action message of automation equipment
CN110532320A (en) * 2019-08-01 2019-12-03 立旃(上海)科技有限公司 Training data management method and device based on block chain
CN112230647A (en) * 2019-06-28 2021-01-15 鲁班嫡系机器人(深圳)有限公司 Intelligent power system behavior model, training method and device for trajectory planning
CN112847337A (en) * 2020-12-24 2021-05-28 珠海新天地科技有限公司 Method for autonomous operation of application program by industrial robot
CN113386133A (en) * 2021-06-10 2021-09-14 贵州恰到科技有限公司 Control method of reinforcement learning robot
WO2022012265A1 (en) * 2020-07-13 2022-01-20 Guangzhou Institute Of Advanced Technology, Chinese Academy Of Sciences Robot learning from demonstration via meta-imitation learning

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112873214B (en) * 2021-03-18 2024-02-23 中国工程物理研究院机械制造工艺研究所 Robot state real-time monitoring system and method based on acceleration information
CN113876437B (en) * 2021-09-13 2024-02-23 上海微创医疗机器人(集团)股份有限公司 Storage medium, robot system, and computer device

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102200787B (en) * 2011-04-18 2013-04-17 重庆大学 Robot behaviour multi-level integrated learning method and robot behaviour multi-level integrated learning system
US9070083B2 (en) * 2011-12-13 2015-06-30 Iucf-Hyu Industry-University Cooperation Foundation Hanyang University Method for learning task skill and robot using thereof
US9384443B2 (en) * 2013-06-14 2016-07-05 Brain Corporation Robotic training apparatus and methods
US9463571B2 (en) * 2013-11-01 2016-10-11 Brian Corporation Apparatus and methods for online training of robots
CN104924313B (en) * 2015-05-13 2017-03-01 北京工业大学 There is teach-by-doing teaching mechanical arm system and the method for learning by imitation mechanism
CN106773659A (en) * 2015-11-20 2017-05-31 哈尔滨工大天才智能科技有限公司 A kind of robot learning by imitation method based on Gaussian process
JP6616170B2 (en) * 2015-12-07 2019-12-04 ファナック株式会社 Machine learning device, laminated core manufacturing apparatus, laminated core manufacturing system, and machine learning method for learning stacking operation of core sheet
US20170249561A1 (en) * 2016-02-29 2017-08-31 GM Global Technology Operations LLC Robot learning via human-demonstration of tasks with force and position objectives
US10807233B2 (en) * 2016-07-26 2020-10-20 The University Of Connecticut Skill transfer from a person to a robot
CN106454108B (en) * 2016-11-04 2019-05-03 北京百度网讯科技有限公司 Track up method, apparatus and electronic equipment based on artificial intelligence
CN106938470B (en) * 2017-03-22 2017-10-31 华中科技大学 A kind of device and method of Robot Force control teaching learning by imitation
KR102010129B1 (en) * 2017-05-26 2019-08-12 한국과학기술원 Method and apparatus for emulating behavior of robot
CN108229678B (en) * 2017-10-24 2021-04-06 深圳市商汤科技有限公司 Network training method, operation control method, device, storage medium and equipment
CN108115681B (en) * 2017-11-14 2020-04-07 深圳先进技术研究院 Simulation learning method and device for robot, robot and storage medium
CN109102525B (en) * 2018-07-19 2021-06-18 浙江工业大学 Mobile robot following control method based on self-adaptive posture estimation
CN108927806A (en) * 2018-08-13 2018-12-04 哈尔滨工业大学(深圳) A kind of industrial robot learning method applied to high-volume repeatability processing
CN109697458A (en) * 2018-11-27 2019-04-30 深圳前海达闼云端智能科技有限公司 Control equipment mobile method, apparatus, storage medium and electronic equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112230647A (en) * 2019-06-28 2021-01-15 鲁班嫡系机器人(深圳)有限公司 Intelligent power system behavior model, training method and device for trajectory planning
CN110532320A (en) * 2019-08-01 2019-12-03 立旃(上海)科技有限公司 Training data management method and device based on block chain
CN110532320B (en) * 2019-08-01 2023-06-27 立旃(上海)科技有限公司 Training data management method and device based on block chain
CN110456644A (en) * 2019-08-13 2019-11-15 北京地平线机器人技术研发有限公司 Determine the method, apparatus and electronic equipment of the execution action message of automation equipment
CN110456644B (en) * 2019-08-13 2022-12-06 北京地平线机器人技术研发有限公司 Method and device for determining execution action information of automation equipment and electronic equipment
WO2022012265A1 (en) * 2020-07-13 2022-01-20 Guangzhou Institute Of Advanced Technology, Chinese Academy Of Sciences Robot learning from demonstration via meta-imitation learning
CN112847337A (en) * 2020-12-24 2021-05-28 珠海新天地科技有限公司 Method for autonomous operation of application program by industrial robot
CN113386133A (en) * 2021-06-10 2021-09-14 贵州恰到科技有限公司 Control method of reinforcement learning robot

Also Published As

Publication number Publication date
CN110293560A (en) 2019-10-01

Similar Documents

Publication Publication Date Title
CN109760050A (en) Robot behavior training method, device, system, storage medium and equipment
Matulis et al. A robot arm digital twin utilising reinforcement learning
Sadeghi et al. Sim2real viewpoint invariant visual servoing by recurrent control
Billard et al. Learning from humans
CN110000785B (en) Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment
Pervez et al. Learning deep movement primitives using convolutional neural networks
Khalil et al. Dexterous robotic manipulation of deformable objects with multi-sensory feedback-a review
Sadeghi et al. Sim2real view invariant visual servoing by recurrent control
CN113826051A (en) Generating digital twins of interactions between solid system parts
Fu et al. Active learning-based grasp for accurate industrial manipulation
JP2022061022A (en) Technique of assembling force and torque guidance robot
CN109784400A (en) Intelligent body Behavioral training method, apparatus, system, storage medium and equipment
Kurrek et al. Ai motion control–a generic approach to develop control policies for robotic manipulation tasks
Xu et al. Dexterous manipulation from images: Autonomous real-world rl via substep guidance
Stan et al. Reinforcement learning for assembly robots: A review
Su et al. A ROS based open source simulation environment for robotics beginners
Cipriani et al. Applications of learning algorithms to industrial robotics
Shareef et al. Generalizing a learned inverse dynamic model of KUKA LWR IV+ for load variations using regression in the model space
Oguz et al. Hybrid human motion prediction for action selection within human-robot collaboration
Pairet et al. Learning and generalisation of primitives skills towards robust dual-arm manipulation
Claassens An RRT-based path planner for use in trajectory imitation
CN113927593B (en) Mechanical arm operation skill learning method based on task decomposition
Benotsmane et al. Survey on artificial intelligence algorithms used in industrial robotics
Rocchi et al. A generic simulator for underactuated compliant hands
Li et al. An automatic robot skills learning system from robot’s real-world demonstrations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190517

WD01 Invention patent application deemed withdrawn after publication