CN109760050A - Robot behavior training method, device, system, storage medium and equipment - Google Patents
- Publication number
- CN109760050A (application number CN201910028901.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- robot
- behavioral
- behavior
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
Abstract
This application relates to a robot behavior training method, device, system, storage medium and robot. The robot behavior training method includes: obtaining decision data from the process of executing an expert behavior; training an initial model based on the decision data to obtain a pre-trained model; and performing autonomous learning based on the pre-trained model to obtain a robot behavior model. With the technical solution of the present invention, the adaptability and accuracy of the behavior actions of the trained robot model are improved.
Description
Technical field
This application relates to the technical field of robot control, and more particularly to a robot behavior training method, device, system, storage medium and equipment.
Background technique
With the advance of science and technology, society as a whole is developing toward intelligence and automation. More and more tasks depend on robots, for example: performing grasping actions, assembly actions, actions that move objects, and other action behaviors.

Artificial intelligence brings unlimited possibilities to the future development of robots. By training a neural network model, a robot controlled by the network model can autonomously learn to execute various actions.

However, it should be understood that robot behavior training based on machine learning still suffers from problems such as excessive dependence on training data and poor learning performance.
Summary of the invention
In view of this, the present invention provides a robot behavior training method, device, system, storage medium and equipment.
A first aspect of the present invention provides a robot behavior training method, the robot behavior training method including:

obtaining decision data from the process of executing an expert behavior, wherein the decision data includes multiple pieces of behavioral data and corresponding observation data; and

performing model autonomous learning based on the decision data to obtain a robot behavior model.
Further, performing model autonomous learning based on the decision data to obtain the robot behavior model includes:

training an initial model based on the decision data to obtain a pre-trained model; and

performing autonomous learning on the pre-trained model to obtain the robot behavior model.
Further, performing model autonomous learning based on the decision data to obtain the robot behavior model includes:

performing autonomous learning on the initial model based on the decision data to obtain the robot behavior model.
Further, obtaining the decision data from the process of executing the expert behavior includes:

obtaining behavioral data at multiple current moments during the execution of the expert behavior; and

obtaining observation data at the multiple current moments sent by a first sensor during the execution of the expert behavior, wherein the behavioral data at each current moment corresponds to the observation data at that current moment.
Further, obtaining the decision data from the process of executing the expert behavior includes:

obtaining, for multiple current moments, information related to the behavioral data sent by a second sensor during the execution of the expert behavior;

obtaining the behavioral data of multiple previous moments according to the related information; and

obtaining observation data for the multiple previous moments sent by the first sensor during the execution of the expert behavior, wherein the behavioral data at each previous moment corresponds to the observation data at that previous moment.
Further, the observation data includes:

an image, or pose or position data of the robot derived from the image; force feedback data; motion feedback data of a driving unit; ranging data; velocity or acceleration measurement data; current or voltage measurement data; time data; and/or temperature data.
Further, the behavioral data includes: a target pose or position, the motion amount of each driving unit of the robot, or the motion amount of the robot.
Further, the behavior includes:

grasping an object from objects placed in bulk or in a regular arrangement;

assembling an object;

putting down a target object; and/or

moving from one position to another position.
A second aspect of the present invention provides a robot behavior training device, the robot behavior training device including:

a decision data obtaining module, configured to obtain decision data from the process of executing an expert behavior, wherein the decision data includes multiple pieces of behavioral data and corresponding observation data; and

a behavior model generation module, configured to perform model autonomous learning based on the decision data to obtain a robot behavior model.
A third aspect of the present invention provides a robot behavior training system, including:

a behavioral data generating means, configured to generate behavioral data and send the behavioral data to a control device;

a first sensor, configured to obtain observation data corresponding to the behavioral data and send the observation data to the control device; and

the control device, configured to obtain decision data from the process of executing an expert behavior, wherein the decision data includes multiple pieces of the behavioral data and the corresponding observation data, and to perform model autonomous learning based on the decision data to obtain a robot behavior model.
Further, the robot behavior training system further includes:

a robot, configured to execute the expert behavior under teaching.
Further, the first sensor includes:

an image sensor, configured to obtain image data of the robot at a given moment;

a force sensor, configured to obtain force feedback data of the robot at a given moment;

an encoder, configured to obtain motion feedback data of a driving unit of the robot at a given moment;

a range finder, configured to obtain distance-related ranging data of the robot at a given moment;

a velocity or acceleration measuring device, configured to obtain velocity or acceleration measurement data of the robot at a given moment;

a current or voltage measuring device, configured to obtain current or voltage measurement data of the robot at a given moment;

a timer, configured to obtain specific time data at a given moment; and

a temperature sensor, configured to obtain temperature data of the robot at a given moment.
Further, the behavioral data generating means includes a control unit, the control unit being configured to generate the behavioral data.
Further, the behavioral data generating means includes a second sensor and a control unit;

the second sensor is configured to obtain, for multiple current moments, the information related to the behavioral data and send the related information to the control unit; and

the control unit is configured to obtain the behavioral data of multiple previous moments according to the related information.
Further, the behavioral data includes: a target pose or position, the motion amount of each driving unit of the robot, or the motion amount of the robot.
A fourth aspect of the present invention provides a robot system, the robot system including the robot behavior training system described in any of the above items.
A fifth aspect of the present invention provides a computer device, including a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the robot training method described in any of the above items.
A sixth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the robot training method described in any of the above items.
With the technical method of the present invention, since model autonomous learning is performed based on the decision data to obtain the robot behavior model, the adaptability and accuracy with which the trained robot model completes behavior actions under various conditions are improved.
Brief description of the drawings
Fig. 1 is a first schematic flowchart of a robot behavior training method in one embodiment;

Fig. 2 is a second schematic flowchart of the robot behavior training method in one embodiment;

Fig. 3 is a third schematic flowchart of the robot behavior training method in one embodiment;

Fig. 4 is a fourth schematic flowchart of the robot behavior training method in one embodiment;

Fig. 5 is a first structural schematic diagram of an embodiment of a robot system;

Fig. 6 is a second structural schematic diagram of an embodiment of the robot system;

Fig. 7 is a first structural block diagram of a robot training device;

Fig. 8 is a second structural block diagram of the robot training device;

Fig. 9 is a third structural block diagram of the robot training device;

Fig. 10 is a fourth structural block diagram of the robot training device;

Fig. 11 is a first structural block diagram of a robot training system in one embodiment;

Fig. 12 is a second structural block diagram of the robot training system in one embodiment;

Fig. 13 is a first structural block diagram of the behavioral data generating means of a robot in one embodiment;

Fig. 14 is a second structural block diagram of the behavioral data generating means of the robot in one embodiment.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the application more clearly understood, the application is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are only used to explain the application and are not intended to limit the application.
In one embodiment, as shown in Fig. 1, a robot behavior training method is provided, the robot behavior training method including the following steps:

Step S100: obtain decision data from the process of executing an expert behavior, wherein the decision data consists of multiple pieces of observation data and corresponding behavioral data.

Specifically, the decision data refers to the set formed by collecting, for each moment, the observation data acquired at that moment together with the behavioral data acquired at the same moment.
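As a concrete illustration (not part of the patent text, and with hypothetical names), such per-moment decision data can be represented as a simple record pairing the observation acquired at a moment with the behavioral data acquired at the same moment:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecisionDatum:
    """One (observation, behavior) pair captured at a single moment."""
    observation: List[float]  # e.g. pose, force feedback, encoder readings
    behavior: List[float]     # e.g. target pose or per-joint motion amount


def collect_decision_data(observations, behaviors):
    """Zip same-moment observations and behaviors into decision data."""
    assert len(observations) == len(behaviors), "streams must align by moment"
    return [DecisionDatum(o, b) for o, b in zip(observations, behaviors)]


# Two moments of synthetic data: observation vectors paired with actions.
data = collect_decision_data([[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]])
```

The key property the patent relies on is only the pairing itself: each behavior is stored with the observation of the same moment.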
Specifically, the action process may include, but is not limited to: the action of grasping a target object from objects placed in bulk or in a regular arrangement (as shown in Fig. 6); the action of assembling a target object (as shown in Fig. 5); the action of putting down an object (drawing omitted); the action of moving from one position to another position (drawing omitted); or a combination of some or all of the above actions.
In one embodiment, the decision data is obtained during the process in which a robot is taught to execute the expert behavior. Specifically, an operator may guide the robot, or a control instruction generated by a controller may drive the robot, to execute the expert behavior. For example: the robot completes the assembly action of building blocks while guided by the operator; or, as another example, the assembly action of building blocks is completed according to motion amount instructions for each driving unit of the robot sent by the controller.
Further, in some embodiments, in the case where the robot is driven to execute the expert behavior by instructions generated by the behavioral data generating means:

The behavioral data may include, but is not limited to: for each step of the action process, the target pose (X, Y, Z, U, V, W coordinates) or position (X, Y coordinates) of the robot output through the controller; or the motion amount (rotation amount and/or translation amount) of each driving unit of the robot, calculated from the target pose or position based on the kinematic equations; or the motion amount of the robot.
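The step "calculate driving-unit motion amounts from a target position based on the kinematic equations" can be illustrated with a minimal sketch. The patent does not specify the robot's geometry, so the following assumes a hypothetical planar two-link arm with a closed-form inverse-kinematics solution; joint angles here stand in for driving-unit motion amounts:

```python
import math


def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Joint angles (rad) placing a planar 2-link arm's tip at (x, y).

    Elbow-down closed-form solution from the law of cosines; raises
    ValueError if the target lies outside the arm's reachable workspace.
    """
    d2 = x * x + y * y
    cos_q2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if not -1.0 <= cos_q2 <= 1.0:
        raise ValueError("target out of reach")
    q2 = math.acos(cos_q2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2),
                                       l1 + l2 * math.cos(q2))
    return q1, q2


# Fully stretched arm: the target (l1 + l2, 0) needs zero joint motion.
q1, q2 = two_link_ik(2.0, 0.0)
```

A real manipulator (four-axis, six-axis, parallel) would use its own kinematic model, but the role in the data pipeline is the same: target pose in, per-driving-unit motion amounts out.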
The control device obtains the observation data obtained and sent by the first sensor. Specifically, the observation data may include, but is not limited to: image data obtained and sent by an image sensor, or the pose or position of the robot (e.g. of the robot's end effector) extracted from that image data; ranging data obtained and sent by a distance sensor; force (force/torque) feedback data obtained and sent by a force sensor; driving unit motion amount (rotation amount and/or translation amount) data obtained and sent by an encoder; velocity or acceleration data obtained and sent by a velocity or acceleration measuring device; current or voltage measurement data obtained and sent by a current or voltage measuring device; time data obtained and sent by a timer; and temperature data obtained and sent by a thermometer.
For example: as shown in Fig. 5, taking training the robot to perform an assembly behavior (e.g. assembling object M2 onto object M1) as an example, multiple groups of decision data are obtained during the execution of the expert behavior. Specifically, the behavioral data may be the next-step target pose or position of the robot, or the motion amount of the driving units, output by the behavioral data generating means at a given moment; and the corresponding observation data may be the image data, pose or position, force feedback data, encoder feedback data, velocity or acceleration data and/or current or voltage data sent by each first sensor at that moment. During the execution of the expert behavior, the multiple groups of decision data obtained are sent to the control unit of the robot. In some embodiments, the multiple groups of decision data obtained during the execution of the expert behavior must include at least the decision data under the assembly-success state.
In some embodiments, when the control device obtains image data, the image data may be used directly as observation data, or the pose or position of the robot may first be extracted from the image data and then used as the observation data.
Further, in some embodiments, in the case where the operator guides the robot to execute the expert behavior:

Since in this case there is no explicit behavior instruction to serve as the behavioral data, in order to obtain the behavioral data, the behavioral data or information related to it can be obtained indirectly through certain second sensors. In this situation, the first sensor and the second sensor may include sensors of the same type, for example an image sensor and an encoder. In some embodiments, identical sensors among the first sensor and the second sensor can be merged into one sensor; that is, the data obtained can serve both as behavioral data and as observation data. For example: the motion amount data of a driving unit sent by the encoder at the current moment can serve as the observation data of the current moment and can also serve as the behavioral data of the previous moment. As another example: the pose or position of the robot obtained from the image acquired and sent by the image sensor at the current moment may serve as the behavioral data of the previous moment, and may also serve as the observation data of the robot at the current moment.
For example: as shown in Fig. 6, take training the robot to grasp objects from bulk as an example, where bulk means that multiple objects M are scattered in an irregular state. Multiple groups of decision data (behavioral data and corresponding observation data) are obtained during the execution of the expert behavior. Specifically, the behavioral data at a given current moment may be the pose or position of the robot at the next moment, extracted from the image sent by the image sensor; or the motion amount of the robot, obtained from the poses or positions of the robot extracted from the images at the current moment and the next moment. The observation data at the current moment may be the information sent by each first sensor at the current moment, for example: force feedback data from a force sensor (e.g. a pressure sensor arranged on a finger obtains the magnitude and/or direction of the force when the grasping action is completed; or a multi-dimensional force sensor arranged at the output end of the robot's terminal axis obtains the change of force or torque at the output end during grasping), driving unit feedback data (e.g. the angle by which a motor rotates or moves), velocity or acceleration data (the velocity or acceleration of the robot during motion) and/or current or voltage data (e.g. the current or voltage value input to a motor), etc. In addition, the pose or position data of the robot at the current moment can also be extracted from the image data of the current moment.

Specifically, the behavioral data may include, but is not limited to: a target pose or position, the motion amount of each driving unit of the robot, or the motion amount of the robot.

In some embodiments, the multiple groups of decision data obtained during the execution of the expert behavior must include at least the decision data at the moment of a successful grasp.
As another example: take training the robot to move (translate and/or rotate) from one position to another position as an example. Multiple groups of decision data are obtained during the execution of the expert behavior. Specifically, the behavioral data may include the pose of the robot's actuator, extracted from the image obtained by the image sensor at each moment of the robot's motion; and the observation data corresponding to each moment may include, for example: distance information to the target position fed back as ranging data, such as from a range finder (e.g. an infrared range finder) installed on the robot; driving unit feedback data; velocity or acceleration data, etc. Specifically, the multiple groups of decision data obtained during the execution of the expert behavior must include at least the decision data at the moment of reaching the target position.
In one embodiment, the decision data is obtained while the expert executes the behavior in person.

Specifically, the expert may be an operator or another robot. For example: the decision data is obtained while an operator performs an assembly behavior. Specifically, image data of the operator performing the assembly process, captured and sent by image sensors at multiple current moments, can be obtained so as to derive the operator's behavioral data for the previous moment of the assembly process and the observation data for the current moment. In addition, a force sensor can be installed on the person's hand, and the force sensor can feed back the observation data during the person's hand-held assembly action, etc.
Specifically, during the execution of the various expert behaviors such as object grasping and assembly, the image data obtained under multiple states may be 3D images, 2D images or video images. The image sensor may include, but is not limited to: a camera, a video camera, a scanner, or other equipment with related functions (a mobile phone, a computer, etc.). The number of image sensors may be any number greater than or equal to 1.
Specifically, the image sensor may be arranged on the robot or fixed at a certain position outside the robot; the image sensor, the image sensor relative to the robot (so-called "hand-eye" calibration), and the robot are calibrated in advance.
Specifically, the robot may be any of various types of manipulators formed by multiple joints and links connected in series or in parallel, each joint being a driving unit, for example: serial manipulators such as four-axis robots and six-axis robots, or parallel manipulators. In some embodiments, an end effector is also fixed to the output end of the terminal axis of the manipulator; the end effector may be a suction cup, a clamping jaw, or the like. In some embodiments, the motion amount of the robot in the above examples may refer to the motion amount of any part of the robot, for example: the motion amount of the end effector.
Step S200: perform model autonomous learning based on the decision data to obtain the robot behavior model.

By using the above learning method, since model autonomous learning is performed based on the decision data to obtain the robot behavior model, the adaptability and accuracy with which the trained robot model completes behavior actions under various conditions are improved.
As shown in Fig. 2, in some embodiments, step S200 includes the following method steps:

S210: train an initial model based on the decision data to obtain a pre-trained model.

Using the observation (state) data as features and the behavioral data as labels, learning is performed as classification (for discrete actions) or regression (for continuous actions), and the parameters of the initial model are continuously updated to obtain the pre-trained model.
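As a hedged sketch of this supervised pre-training step (behavioral cloning), assuming the continuous-action regression case and a deliberately tiny linear model trained by gradient descent — a stand-in for whatever initial network the patent envisions, with all names hypothetical:

```python
def pretrain(decision_data, lr=0.1, epochs=500):
    """Fit action = w * obs + b by per-sample gradient descent on MSE.

    decision_data: list of (observation, behavior) pairs, scalars here
    for simplicity; a real model would be a neural network over vectors.
    Returns the fitted parameters (w, b) of the pre-trained model.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for obs, act in decision_data:
            err = (w * obs + b) - act  # regression residual
            w -= lr * err * obs        # gradient step on the weight
            b -= lr * err              # gradient step on the bias
    return w, b


# Expert demonstrations whose action is exactly 2 * obs + 1.
demos = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = pretrain(demos)
```

The point is the mapping the patent describes: observations in as features, expert actions in as labels, model parameters updated until the model imitates the demonstrations.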
S230: perform autonomous learning based on the pre-trained model to obtain the robot behavior model.

The autonomous learning process lets the robot generate some action trajectories based on the pre-trained model, then defines a criterion to judge the difference between these trajectories and the expert action trajectories collected during teaching, and then updates the policy of the pre-trained model according to this difference, so that the trajectories it generates next time are closer to the expert behavior, until the action trajectories generated based on the pre-trained model are judged, according to the criterion, to be sufficiently close to the expert action trajectories; the model obtained at that point is the final robot behavior model.

Specifically, the criterion described in the above embodiments can be obtained based on empirical values, machine learning, random values and other methods; in some embodiments, this criterion can be represented by a trained neural network.
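The generate-compare-update loop above can be sketched minimally as follows. This is an assumption-laden illustration, not the patent's algorithm: the policy is a single parameter, and the criterion is a plain mean absolute gap between trajectories, standing in for the learned neural-network criterion the text mentions:

```python
def rollout(step, start=0.0, steps=5):
    """Generate an action trajectory: each move advances by `step`."""
    states, s = [start], start
    for _ in range(steps):
        s += step
        states.append(s)
    return states


def criterion(traj_a, traj_b):
    """Stand-in criterion: mean absolute gap between two trajectories."""
    return sum(abs(a - b) for a, b in zip(traj_a, traj_b)) / len(traj_a)


def autonomous_learning(expert_traj, gain=0.5, tol=1e-3, max_iters=100):
    """Nudge the policy until its rollout is close enough to the expert's."""
    step = 0.0  # policy parameter: constant displacement per move
    for _ in range(max_iters):
        gap = criterion(rollout(step), expert_traj)
        if gap < tol:          # criterion satisfied: training is done
            break
        # Update the policy toward the expert's average per-move displacement.
        expert_step = (expert_traj[-1] - expert_traj[0]) / (len(expert_traj) - 1)
        step += gain * (expert_step - step)
    return step


expert = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # expert advances +1 per move
learned_step = autonomous_learning(expert)
```

The structure mirrors the patent's description: generate trajectories, score their difference from the expert's with a criterion, update the policy, and stop when the criterion is met.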
By using the above learning method, since the initial model is trained with the decision data obtained from the expert behavior to obtain the pre-trained model, autonomous learning is performed based on the pre-trained model, and the machine-learned model is finally obtained, the adaptability and accuracy with which the trained robot model completes behavior actions under various conditions are improved. In addition, the training time of the robot behavior model can be reduced.
In some embodiments, step S200 includes the following method step:

S220: perform autonomous learning on the initial model based on the decision data to obtain the robot behavior model.

The autonomous learning process lets the robot generate some action trajectories based on the initial model, defines a criterion to judge the difference between these trajectories and the expert action trajectories obtained during teaching, and then updates the policy of the model according to this difference, so that the trajectories it generates next time are closer to the expert behavior, until the action trajectories generated based on the initial model are judged, according to the criterion, to be sufficiently close to the expert action trajectories; the model obtained at that point is the final robot behavior model.
As shown in Fig. 3, in some embodiments, obtaining the decision data from the process of executing the expert behavior in step S100 may include the following method steps:

S110: obtain the behavioral data at multiple current moments during the execution of the expert behavior.

S130: obtain the observation data at the multiple current moments sent by the first sensor during the execution of the expert behavior, wherein the behavioral data at each current moment corresponds to the observation data at that current moment.
As shown in Fig. 4, in some embodiments, obtaining the decision data from the process of executing the expert behavior in step S100 may include the following method steps:

S120: obtain the information related to the behavioral data at multiple current moments sent by the second sensor during the execution of the expert behavior.

S140: obtain the behavioral data of multiple previous moments according to the related information.

For example: when the related information is image information sent by the image sensor, the image information is parsed to generate the pose or position of the robot, or the motion amount of the robot is generated from the poses or positions at the current moment and the next moment, to serve as the behavioral data of the previous moment.

As another example: when the related information is the motion amount of each driving unit sent by the encoder, the motion amount information is used directly as the behavioral data of the previous moment.
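For the image branch above, deriving a motion amount from two successively extracted poses can be a simple per-axis difference. A minimal sketch with hypothetical names, assuming (X, Y, Z, U, V, W) pose tuples:

```python
def motion_amount(pose_now, pose_next):
    """Per-axis motion amount between two successive poses.

    Poses are (X, Y, Z, U, V, W) tuples extracted from images at the
    current and next moments; their difference serves as the behavioral
    data attributed to the earlier moment.
    """
    return tuple(n - c for c, n in zip(pose_now, pose_next))


delta = motion_amount((0.0, 0.0, 0.1, 0.0, 0.0, 0.0),
                      (0.5, 0.2, 0.1, 0.0, 0.0, 10.0))
```

A real system would handle orientation wrap-around and units, but the pipeline role is the same: two parsed poses in, one motion-amount label out.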
S160: obtain the observation data of the multiple previous moments sent by the first sensor during the execution of the expert behavior, wherein the behavioral data at each previous moment corresponds to the observation data at that previous moment.
It should be understood that although the steps in the flowcharts of Figs. 1, 2, 3 and 4 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restricting the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 1, 2, 3 and 4 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 7, a robot behavior training device is provided, the robot behavior training device including a decision data obtaining module 100 and a behavior model generation module 200.

The decision data obtaining module 100 is configured to obtain decision data from the process of executing an expert behavior, wherein the decision data includes multiple pieces of behavioral data and corresponding observation data.

The behavior model generation module 200 is configured to perform model autonomous learning based on the decision data to obtain the robot behavior model.
As shown in Fig. 8, in some embodiments, the behavior model generation module 200 includes a pre-trained model generating unit 210 and a first behavior generating unit 230.

The pre-trained model generating unit 210 is configured to train an initial model based on the decision data to obtain a pre-trained model.

The first behavior generating unit 230 is configured to perform autonomous learning on the pre-trained model to obtain the robot behavior model.
In some embodiments, the behavior model generation module 200 includes a second behavior generating unit 220.

The second behavior generating unit 220 is configured to perform autonomous learning on the initial model based on the decision data to obtain the robot behavior model.
As shown in Fig. 9, in some embodiments, the decision data obtaining module 100 includes a current behavior data generating unit 110 and a current observation data generating unit 130.

The current behavior data generating unit 110 is configured to obtain the behavioral data at multiple current moments during the execution of the expert behavior.

The current observation data generating unit 130 is configured to obtain the observation data at the multiple current moments sent by the first sensor during the execution of the expert behavior, wherein the behavioral data at each current moment corresponds to the observation data at that current moment.
As shown in Fig. 10, in some embodiments, the decision data obtaining module 100 includes a current information obtaining unit 120, a previous behavior data generating unit 140 and a previous observation data generating unit 160.

The current information obtaining unit 120 is configured to obtain the information related to the behavioral data at multiple current moments sent by the second sensor during the execution of the expert behavior.

The previous behavior data generating unit 140 is configured to obtain the behavioral data of multiple previous moments according to the related information.

The previous observation data generating unit 160 is configured to obtain the observation data of the multiple previous moments sent by the first sensor during the execution of the expert behavior, wherein the behavioral data at each previous moment corresponds to the observation data at that previous moment.
For specific limitations of the robot behavior training device, reference may be made to the limitations of the robot behavior training method above; details are not repeated here. Each module in the above robot behavior training device may be implemented in whole or in part by software, hardware, or a combination thereof. Each of the above modules may be embedded in, or independent of, a processor in a computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to each of the above modules.
In one embodiment, as shown in Fig. 5, 6 or 11, a robot behavior training system is provided, including a control device 400, a first sensor 500 and a behavioral data generating means 600.

The behavioral data generating means 600 is configured to generate behavioral data and send the behavioral data to the control device.

The first sensor 500 is configured to obtain the observation data during the execution of the expert behavior and send the observation data to the control device.

The control device 400 is configured to obtain decision data from the process of executing the expert behavior, wherein the decision data includes multiple pieces of the behavioral data and the corresponding observation data, and to perform model autonomous learning based on the decision data to obtain the robot behavior model.
For specific limitations of the control device, reference may be made to the limitations of the robot behavior training method above; details are not repeated here.
In one embodiment, as shown in Fig. 5, 6 or 12, the robot training system further includes a robot 700, configured to execute the expert behavior under teaching.
Specifically, the first sensor includes, but is not limited to:
an image sensor, for obtaining image data of the robot at a certain moment;
a force sensor, for obtaining force feedback data of the robot at a certain moment;
an encoder, for obtaining motion feedback data of a driving unit of the robot at a certain moment;
a range finder, for obtaining distance-related ranging data of the robot at a certain moment;
a velocity or acceleration measurer, for obtaining velocity or acceleration measurement data of the robot at a certain moment;
a current or voltage measurer, for obtaining current or voltage measurement data of the robot at a certain moment;
a timer, for obtaining specific time data at a certain moment;
a temperature sensor, for obtaining temperature data of the robot at a certain moment.
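For illustration only, the heterogeneous readings listed above could be bundled into a single per-moment observation record. The field names and types below are assumptions, since the patent does not prescribe a data layout:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ObservationData:
    """One per-moment observation record; every sensor field is optional,
    since a given system may include only a subset of the listed sensors."""
    timestamp: float                               # from the timer
    image: Optional[List[float]] = None            # from the image sensor
    force_feedback: Optional[List[float]] = None   # from the force sensor
    encoder_motion: Optional[List[float]] = None   # driving-unit feedback
    ranging: Optional[float] = None                # from the range finder
    velocity: Optional[List[float]] = None         # velocity/acceleration measurer
    current: Optional[float] = None                # current/voltage measurer
    temperature: Optional[float] = None            # from the temperature sensor

obs = ObservationData(timestamp=0.04, ranging=1.25, temperature=36.5)
print(obs.ranging)  # 1.25
```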
As shown in Fig. 13, in some embodiments, the behavioral data generating device 600 includes a control unit 610; the control unit 610 is configured to generate the behavioral data.
As shown in Fig. 14, in some embodiments, the behavioral data generating device 600 includes a second sensor 620 and the control unit 610. The second sensor 620 is configured to obtain information related to the behavioral data at multiple current moments and send it to the control unit. The control unit 610 is configured to parse the related information and generate the behavioral data of multiple previous moments.
Specifically, the second sensor 620 may include, but is not limited to, an image sensor and an encoder.
It should be noted that when the first sensor 500 includes, for example, an image sensor and an encoder, these may be provided separately and independently from the image sensor and encoder included in the second sensor 620. Alternatively, the image sensor and encoder may be shared: by parsing the related information captured by the image sensor and encoder at a certain current moment, the behavioral data of the previous moment can be generated, and the observation data of the current moment can also be generated.
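A minimal sketch of the shared-sensor case, under the assumption that the behavioral data of the previous moment is taken to be the encoder motion executed since then; the function name and data shapes are illustrative, not from the patent:

```python
def parse_shared_sensor(prev_encoder, curr_encoder, curr_image):
    """From readings taken at the current moment by a shared encoder and
    image sensor, derive both the behavioral data of the previous moment
    (the motion just executed) and the observation data of the current moment."""
    # Behavioral data of the previous moment: motion amount since that moment.
    behavior_prev = [c - p for c, p in zip(curr_encoder, prev_encoder)]
    # Observation data of the current moment: image plus current encoder reading.
    observation_curr = {"image": curr_image, "encoder": curr_encoder}
    return behavior_prev, observation_curr

behavior, observation = parse_shared_sensor([0, 5], [1, 7], "frame_t")
print(behavior)  # [1, 2]
```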
Specifically, the control device 400 and the control unit 610 may be provided separately and independently, or may be combined into one device (for example, as shown in Figs. 5 and 6, the control device 400 and the control unit 610 are merged, and the control device 400 uniformly implements the robot behavior training method of the control device 400, the behavioral data generation method of the control unit 610, and the like).
The control device 400 and the control unit 610 may each be a programmable logic controller (Programmable Logic Controller, PLC), a field-programmable gate array (Field-Programmable Gate Array, FPGA), a computer (Personal Computer, PC), an industrial control computer (Industrial Personal Computer, IPC), a server, or the like. The control device generates program instructions according to a program fixed in advance, in combination with manually input information or parameters, and/or data acquired by the external first sensor and/or second sensor (such as an image sensor).
In one embodiment, the present invention also provides a robot system including the robot behavior training system described in the above embodiments. For related descriptions of the robot behavior training system, reference is made to the above embodiments; details are not repeated here.
It should be noted that the robots and/or sensors mentioned in the above robot behavior training method, behavior training control device, behavior training system, robot system, and so on may be real robots and sensors in a real environment, or may be virtual robots and/or sensors on a simulation platform, where the simulated environment achieves the effect of connecting real agents and/or sensors. A control device that has completed behavior training in the virtual environment can then be transplanted to the real environment to control or retrain the real robots and sensors, which can save the resources and time of the training process.
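The transplant workflow described above can be sketched structurally. Everything below (class names, the notion of a rollout, the step counts) is an illustrative assumption rather than the patent's method:

```python
class SimulatedEnv:
    """Stand-in for virtual robots/sensors on a simulation platform."""
    def rollout(self, policy):
        return True   # a real platform would simulate and report an outcome

class RealEnv:
    """Stand-in for the real robot and sensors."""
    def rollout(self, policy):
        return True

def train(policy, env, steps):
    # Placeholder loop: a real implementation would update `policy`
    # from the outcome of each rollout.
    for _ in range(steps):
        env.rollout(policy)
    return policy

policy = {"weights": [0.0]}
policy = train(policy, SimulatedEnv(), steps=1000)  # cheap training in simulation
policy = train(policy, RealEnv(), steps=10)         # short real-world retraining
```

The point of the structure is only that the bulk of training happens against the cheap simulated environment, with a much shorter retraining pass on the real hardware.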
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following steps: obtaining decision data during the action process of an expert; obtaining auxiliary decision data during an action process in which a mistake is executed; and training an initial model based on the decision data and the auxiliary decision data to obtain a robot behavior learning model.
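One plausible reading of combining expert decision data with auxiliary decision data from erroneous executions is to use the mistakes as negative examples. In the sketch below, the library, the feature layout, and the scoring approach are all assumptions, not the patent's prescribed method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Expert decision data and auxiliary decision data from erroneous executions,
# reduced here to fixed-length (observation + action) feature vectors.
expert_pairs = rng.random((100, 8))
error_pairs = rng.random((40, 8))

X = np.vstack([expert_pairs, error_pairs])
y = np.array([1] * len(expert_pairs) + [0] * len(error_pairs))

# A scorer rating decisions as expert-like (1) or mistake-like (0); at run
# time the robot could pick the candidate action with the highest score.
scorer = LogisticRegression(max_iter=1000).fit(X, y)
scores = scorer.predict_proba(expert_pairs[:1])
print(scores.shape)  # (1, 2)
```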
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: obtaining decision data during the action process of an expert; obtaining auxiliary decision data during an action process in which a mistake is executed; and training an initial model based on the decision data and the auxiliary decision data to obtain a robot behavior learning model.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by instructing relevant hardware through a computer program; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combination of these technical features, the combination should be considered within the scope of this specification.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the description of the invention in this specification are for the purpose of describing specific embodiments only and are not intended to limit the present invention.
The terms "first", "second", "third", "S110", "S120", "S130", etc. (if present) in the claims, specification, and above drawings of the present invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include", "have", and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or robot that includes a series of steps or modules is not necessarily limited to the steps or modules explicitly listed, but may include other steps or modules not explicitly listed or inherent to such a process, method, system, product, or robot.
It should be noted that those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the structures and modules involved are not necessarily essential to the present invention.
The above embodiments express only several implementations of the present invention, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the inventive concept, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the appended claims.
Claims (18)
1. A robot behavior training method, characterized in that the robot behavior training method comprises:
obtaining decision data during an action process of an expert; wherein the decision data comprises multiple pieces of behavioral data and corresponding observation data; and
performing model autonomous learning based on the decision data to obtain a robot behavior model.
2. The robot behavior training method according to claim 1, characterized in that performing model autonomous learning based on the decision data to obtain a robot behavior model comprises:
training an initial model based on the decision data to obtain a pretrained model; and
performing autonomous learning on the pretrained model to obtain the robot behavior model.
3. The robot behavior training method according to claim 1, characterized in that performing model autonomous learning based on the decision data to obtain a robot behavior model comprises:
performing autonomous learning on an initial model based on the decision data to obtain the robot behavior model.
4. The agent behavior training method according to claim 1, 2 or 3, characterized in that obtaining decision data during the action process of the expert comprises:
obtaining behavioral data at multiple current moments during the action process of the expert; and
obtaining observation data at the multiple current moments sent by a first sensor during the action process of the expert; wherein the behavioral data at a current moment corresponds to the observation data at that current moment.
5. The agent behavior training method according to claim 1, 2 or 3, characterized in that obtaining decision data during the action process of the expert comprises:
obtaining related information of the behavioral data at multiple current moments sent by a second sensor during the action process of the expert;
obtaining, according to the related information, the behavioral data of multiple previous moments; and
obtaining observation data of the multiple previous moments sent by a first sensor during the action process of the expert; wherein the behavioral data of a previous moment corresponds to the observation data of that previous moment.
6. The robot behavior training method according to claim 1, 2 or 3, characterized in that the observation data comprises:
an image, or robot pose or position data generated from the image; force feedback data; motion amount feedback data of a driving unit; ranging data; velocity or acceleration measurement data; current or voltage measurement data; time data; and/or temperature data.
7. The robot behavior training method according to claim 1, 2 or 3, characterized in that the behavioral data comprises:
a pose or position of a target object, a motion amount of each driving unit of the robot, or a motion amount of the robot.
8. The robot behavior training method according to claim 1, 2 or 3, characterized in that the behavior comprises:
grabbing an object from bulk objects or regularly placed objects;
assembling an object;
dropping a target object; and/or
moving from one position to another position.
9. A robot behavior training device, characterized in that the robot behavior training device comprises:
a decision data obtaining module, configured to obtain decision data during an action process of an expert; wherein the decision data comprises multiple pieces of behavioral data and corresponding observation data; and
a behavior model generation module, configured to perform model autonomous learning based on the decision data to obtain a robot behavior model.
10. A robot behavior training system, characterized by comprising:
a behavioral data generating device, configured to generate behavioral data and send the behavioral data to a control device;
a first sensor, configured to obtain observation data corresponding to the behavioral data and send the observation data to the control device; and
the control device, configured to obtain decision data during an action process of an expert, wherein the decision data comprises multiple pieces of the behavioral data and the corresponding observation data, and to perform model autonomous learning based on the decision data to obtain a robot behavior model.
11. The robot behavior training system according to claim 10, characterized in that the robot behavior training system further comprises:
a robot, configured to execute the behavior of the expert under teaching.
12. The robot behavior training system according to claim 10 or 11, characterized in that the sensor comprises:
an image sensor, for obtaining image data of the robot at a certain moment;
a force sensor, for obtaining force feedback data of the robot at a certain moment;
an encoder, for obtaining motion feedback data of a driving unit of the robot at a certain moment;
a range finder, for obtaining distance-related ranging data of the robot at a certain moment;
a velocity or acceleration measurer, for obtaining velocity or acceleration measurement data of the robot at a certain moment;
a current or voltage measurer, for obtaining current or voltage measurement data of the robot at a certain moment;
a timer, for obtaining specific time data at a certain moment;
a temperature sensor, for obtaining temperature data of the robot at a certain moment.
13. The agent training system according to claim 10 or 11, characterized in that the behavioral data generating device comprises a control unit;
the control unit is configured to generate the behavioral data.
14. The agent training system according to claim 10 or 11, characterized in that the behavioral data generating device comprises a second sensor and a control unit;
the second sensor is configured to obtain related information of the behavioral data at multiple current moments and send the related information to the control unit; and
the control unit is configured to obtain the behavioral data of multiple previous moments according to the related information.
15. The agent training system according to claim 10 or 11, characterized in that the behavioral data comprises: a pose or position of a target object, a motion amount of each driving unit of the robot, or a motion amount of the robot.
16. A robot system, characterized in that the robot system comprises the robot behavior training system according to any one of claims 10 to 12.
17. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the robot training method according to any one of claims 1 to 8.
18. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the robot training method according to any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910028901.4A CN109760050A (en) | 2019-01-12 | 2019-01-12 | Robot behavior training method, device, system, storage medium and equipment |
CN201910509236.0A CN110293560A (en) | 2019-01-12 | 2019-06-13 | Robot behavior training, planing method, device, system, storage medium and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910028901.4A CN109760050A (en) | 2019-01-12 | 2019-01-12 | Robot behavior training method, device, system, storage medium and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109760050A true CN109760050A (en) | 2019-05-17 |
Family
ID=66452699
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910028901.4A Pending CN109760050A (en) | 2019-01-12 | 2019-01-12 | Robot behavior training method, device, system, storage medium and equipment |
CN201910509236.0A Pending CN110293560A (en) | 2019-01-12 | 2019-06-13 | Robot behavior training, planing method, device, system, storage medium and equipment |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910509236.0A Pending CN110293560A (en) | 2019-01-12 | 2019-06-13 | Robot behavior training, planing method, device, system, storage medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN109760050A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110456644A (en) * | 2019-08-13 | 2019-11-15 | 北京地平线机器人技术研发有限公司 | Determine the method, apparatus and electronic equipment of the execution action message of automation equipment |
CN110532320A (en) * | 2019-08-01 | 2019-12-03 | 立旃(上海)科技有限公司 | Training data management method and device based on block chain |
CN112230647A (en) * | 2019-06-28 | 2021-01-15 | 鲁班嫡系机器人(深圳)有限公司 | Intelligent power system behavior model, training method and device for trajectory planning |
CN112847337A (en) * | 2020-12-24 | 2021-05-28 | 珠海新天地科技有限公司 | Method for autonomous operation of application program by industrial robot |
CN113386133A (en) * | 2021-06-10 | 2021-09-14 | 贵州恰到科技有限公司 | Control method of reinforcement learning robot |
WO2022012265A1 (en) * | 2020-07-13 | 2022-01-20 | Guangzhou Institute Of Advanced Technology, Chinese Academy Of Sciences | Robot learning from demonstration via meta-imitation learning |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112873214B (en) * | 2021-03-18 | 2024-02-23 | 中国工程物理研究院机械制造工艺研究所 | Robot state real-time monitoring system and method based on acceleration information |
CN113876437B (en) * | 2021-09-13 | 2024-02-23 | 上海微创医疗机器人(集团)股份有限公司 | Storage medium, robot system, and computer device |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102200787B (en) * | 2011-04-18 | 2013-04-17 | 重庆大学 | Robot behaviour multi-level integrated learning method and robot behaviour multi-level integrated learning system |
US9070083B2 (en) * | 2011-12-13 | 2015-06-30 | Iucf-Hyu Industry-University Cooperation Foundation Hanyang University | Method for learning task skill and robot using thereof |
US9384443B2 (en) * | 2013-06-14 | 2016-07-05 | Brain Corporation | Robotic training apparatus and methods |
US9463571B2 * | 2013-11-01 | 2016-10-11 | Brain Corporation | Apparatus and methods for online training of robots |
CN104924313B (en) * | 2015-05-13 | 2017-03-01 | 北京工业大学 | There is teach-by-doing teaching mechanical arm system and the method for learning by imitation mechanism |
CN106773659A (en) * | 2015-11-20 | 2017-05-31 | 哈尔滨工大天才智能科技有限公司 | A kind of robot learning by imitation method based on Gaussian process |
JP6616170B2 (en) * | 2015-12-07 | 2019-12-04 | ファナック株式会社 | Machine learning device, laminated core manufacturing apparatus, laminated core manufacturing system, and machine learning method for learning stacking operation of core sheet |
US20170249561A1 (en) * | 2016-02-29 | 2017-08-31 | GM Global Technology Operations LLC | Robot learning via human-demonstration of tasks with force and position objectives |
US10807233B2 (en) * | 2016-07-26 | 2020-10-20 | The University Of Connecticut | Skill transfer from a person to a robot |
CN106454108B (en) * | 2016-11-04 | 2019-05-03 | 北京百度网讯科技有限公司 | Track up method, apparatus and electronic equipment based on artificial intelligence |
CN106938470B (en) * | 2017-03-22 | 2017-10-31 | 华中科技大学 | A kind of device and method of Robot Force control teaching learning by imitation |
KR102010129B1 (en) * | 2017-05-26 | 2019-08-12 | 한국과학기술원 | Method and apparatus for emulating behavior of robot |
CN108229678B (en) * | 2017-10-24 | 2021-04-06 | 深圳市商汤科技有限公司 | Network training method, operation control method, device, storage medium and equipment |
CN108115681B (en) * | 2017-11-14 | 2020-04-07 | 深圳先进技术研究院 | Simulation learning method and device for robot, robot and storage medium |
CN109102525B (en) * | 2018-07-19 | 2021-06-18 | 浙江工业大学 | Mobile robot following control method based on self-adaptive posture estimation |
CN108927806A (en) * | 2018-08-13 | 2018-12-04 | 哈尔滨工业大学(深圳) | A kind of industrial robot learning method applied to high-volume repeatability processing |
CN109697458A (en) * | 2018-11-27 | 2019-04-30 | 深圳前海达闼云端智能科技有限公司 | Control equipment mobile method, apparatus, storage medium and electronic equipment |
2019
- 2019-01-12 CN CN201910028901.4A patent/CN109760050A/en active Pending
- 2019-06-13 CN CN201910509236.0A patent/CN110293560A/en active Pending
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112230647A (en) * | 2019-06-28 | 2021-01-15 | 鲁班嫡系机器人(深圳)有限公司 | Intelligent power system behavior model, training method and device for trajectory planning |
CN110532320A (en) * | 2019-08-01 | 2019-12-03 | 立旃(上海)科技有限公司 | Training data management method and device based on block chain |
CN110532320B (en) * | 2019-08-01 | 2023-06-27 | 立旃(上海)科技有限公司 | Training data management method and device based on block chain |
CN110456644A (en) * | 2019-08-13 | 2019-11-15 | 北京地平线机器人技术研发有限公司 | Determine the method, apparatus and electronic equipment of the execution action message of automation equipment |
CN110456644B (en) * | 2019-08-13 | 2022-12-06 | 北京地平线机器人技术研发有限公司 | Method and device for determining execution action information of automation equipment and electronic equipment |
WO2022012265A1 (en) * | 2020-07-13 | 2022-01-20 | Guangzhou Institute Of Advanced Technology, Chinese Academy Of Sciences | Robot learning from demonstration via meta-imitation learning |
CN112847337A (en) * | 2020-12-24 | 2021-05-28 | 珠海新天地科技有限公司 | Method for autonomous operation of application program by industrial robot |
CN113386133A (en) * | 2021-06-10 | 2021-09-14 | 贵州恰到科技有限公司 | Control method of reinforcement learning robot |
Also Published As
Publication number | Publication date |
---|---|
CN110293560A (en) | 2019-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109760050A (en) | Robot behavior training method, device, system, storage medium and equipment | |
Matulis et al. | A robot arm digital twin utilising reinforcement learning | |
Sadeghi et al. | Sim2real viewpoint invariant visual servoing by recurrent control | |
Billard et al. | Learning from humans | |
CN110000785B (en) | Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment | |
Pervez et al. | Learning deep movement primitives using convolutional neural networks | |
Khalil et al. | Dexterous robotic manipulation of deformable objects with multi-sensory feedback-a review | |
Sadeghi et al. | Sim2real view invariant visual servoing by recurrent control | |
CN113826051A (en) | Generating digital twins of interactions between solid system parts | |
Fu et al. | Active learning-based grasp for accurate industrial manipulation | |
JP2022061022A (en) | Technique of assembling force and torque guidance robot | |
CN109784400A (en) | Intelligent body Behavioral training method, apparatus, system, storage medium and equipment | |
Kurrek et al. | Ai motion control–a generic approach to develop control policies for robotic manipulation tasks | |
Xu et al. | Dexterous manipulation from images: Autonomous real-world rl via substep guidance | |
Stan et al. | Reinforcement learning for assembly robots: A review | |
Su et al. | A ROS based open source simulation environment for robotics beginners | |
Cipriani et al. | Applications of learning algorithms to industrial robotics | |
Shareef et al. | Generalizing a learned inverse dynamic model of KUKA LWR IV+ for load variations using regression in the model space | |
Oguz et al. | Hybrid human motion prediction for action selection within human-robot collaboration | |
Pairet et al. | Learning and generalisation of primitives skills towards robust dual-arm manipulation | |
Claassens | An RRT-based path planner for use in trajectory imitation | |
CN113927593B (en) | Mechanical arm operation skill learning method based on task decomposition | |
Benotsmane et al. | Survey on artificial intelligence algorithms used in industrial robotics | |
Rocchi et al. | A generic simulator for underactuated compliant hands | |
Li et al. | An automatic robot skills learning system from robot’s real-world demonstrations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20190517 |