CN109784400A - Agent behavior training method, apparatus, system, storage medium and device - Google Patents
- Publication number
- CN109784400A CN109784400A CN201910028902.9A CN201910028902A CN109784400A CN 109784400 A CN109784400 A CN 109784400A CN 201910028902 A CN201910028902 A CN 201910028902A CN 109784400 A CN109784400 A CN 109784400A
- Authority
- CN
- China
- Prior art keywords
- data
- decision
- auxiliary
- intelligent body
- behavioral
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
This application relates to an agent behavior training method. The method includes: obtaining decision data from an action process performed by an expert, where the decision data includes a set of multiple pairs of decision behavior data and corresponding decision observation data; obtaining auxiliary data from an auxiliary action process, where the auxiliary data includes a set of multiple pairs of auxiliary behavior data and corresponding auxiliary observation data; and performing autonomous model learning based on the decision data and the auxiliary data to obtain an agent behavior model. The technical solution of the present invention improves the success rate of agent behavior training, saves model training time, and improves the adaptability and accuracy of the agent model in various situations.
Description
Technical field
This application relates to the field of device control technology, and in particular to an agent behavior training method, apparatus, system, storage medium and device.
Background art

With advances in science and technology, society as a whole is developing toward intelligence and automation, and more and more behaviors rely on agents for their realization, for example grasping actions, assembly actions, and actions that move objects.

Artificial intelligence brings unlimited possibilities to the future development of agents. Neural network models can be trained by various methods such as supervised, semi-supervised, reinforcement, or imitation learning, so that an agent controlled by the network model can autonomously learn to perform various actions.

Imitation learning means learning from examples provided by a demonstrator. Multiple groups of the expert's decision data are obtained from the demonstration process; each group of decision data includes state data and corresponding action data, and all state-action pairs are gathered into a new set. The states can then be used as features and the actions as labels, and classification learning (for discrete actions) or regression learning (for continuous actions) is performed to obtain an optimal policy model.

However, it should be noted that during neural network training, imitation learning alone fails to produce good training results in many situations.
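The imitation-learning recipe described in this background section (states as features, actions as labels, regression for continuous actions) can be illustrated with a minimal sketch. The synthetic data and the linear least-squares policy below are invented for illustration only and are not the training procedure claimed by this application:

```python
import numpy as np

# Hypothetical demonstration data: each row pairs an observed state
# (feature) with the expert's continuous action (label).
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))            # observation vectors
true_w = np.array([[0.5], [-1.0], [0.3], [2.0]])
actions = states @ true_w                      # expert actions (noiseless here)

# Behavior cloning as regression: fit a linear policy by least squares.
w, *_ = np.linalg.lstsq(states, actions, rcond=None)

# The learned policy maps a new observation to a predicted action.
policy = lambda s: s @ w
```

Because the synthetic actions are exactly linear in the states, the fitted policy recovers the generating weights; with real demonstrations one would use a richer model (e.g., a neural network) and noisy labels.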
Summary of the invention
Based on this, the present invention provides an agent behavior training method, apparatus, system, storage medium and device.

A first aspect of the present invention provides an agent behavior training method, which includes:

obtaining decision data from an action process performed by an expert, where the decision data includes multiple items of decision behavior data and corresponding decision observation data;

obtaining auxiliary data from an auxiliary action process, where the auxiliary data includes multiple items of auxiliary behavior data and corresponding auxiliary observation data; and

performing autonomous model learning based on the decision data and the auxiliary data to obtain an agent behavior model.

Further, performing autonomous model learning based on the decision data and the auxiliary data to obtain the agent behavior model includes:

training an initial model based on the decision data and the auxiliary data to obtain a pre-trained model; and

performing autonomous learning on the pre-trained model to obtain the agent behavior model.

Further, performing autonomous model learning based on the decision data and the auxiliary data to obtain the agent behavior model includes:

performing autonomous learning on the initial model based on the decision data and the auxiliary data to obtain the agent behavior model.
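The decision data and auxiliary data described in the method above are sets of (behavior, observation) pairs. A minimal container for such data, with all class and field names invented for illustration, might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Sample:
    observation: tuple   # sensor readings at one moment
    behavior: tuple      # the action taken at (or following) that moment

@dataclass
class TrainingData:
    decision: list = field(default_factory=list)   # expert demonstration pairs
    auxiliary: list = field(default_factory=list)  # auxiliary (e.g. erroneous) pairs

ds = TrainingData()
ds.decision.append(Sample(observation=(0.1, 0.2), behavior=(1.0,)))
ds.auxiliary.append(Sample(observation=(0.3, 0.4), behavior=(-1.0,)))
```

Keeping the two sets separate matches the method's later step of treating expert and auxiliary samples differently during training.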
Further, obtaining the decision data from the action process performed by the expert includes:

obtaining the decision behavior data for multiple current moments during the action process performed by the expert; and

obtaining the decision observation data for the multiple current moments sent by a first sensor during the action process performed by the expert, where the decision behavior data of a current moment corresponds to the decision observation data of that moment;

or

obtaining related information of the decision behavior data for multiple current moments sent by a second sensor during the action process performed by the expert;

parsing the related information to generate the decision behavior data of multiple previous moments; and

obtaining the decision observation data for the multiple previous moments sent by the first sensor during the action process performed by the expert, where the decision behavior data of a previous moment corresponds to the decision observation data of that moment.

Further, obtaining the auxiliary data from the auxiliary action process includes:

obtaining the auxiliary behavior data for multiple current moments during the auxiliary action process; and

obtaining the auxiliary observation data for the multiple current moments sent by the first sensor during the auxiliary action process, where the auxiliary behavior data of a current moment corresponds to the auxiliary observation data of that moment;

or

obtaining related information of the auxiliary behavior data for multiple current moments sent by the second sensor during the auxiliary action process;

obtaining the behavior data of multiple previous moments according to the related information; and

obtaining the auxiliary observation data for the multiple previous moments sent by the first sensor during the auxiliary action process, where the auxiliary behavior data of a previous moment corresponds to the auxiliary observation data of that moment.
A second aspect of the present invention provides an agent behavior training control device, which includes:

a decision data acquisition module, configured to obtain the decision data from the action process performed by the expert, where the decision data includes multiple items of decision behavior data and corresponding decision observation data;

an auxiliary data acquisition module, configured to obtain the auxiliary data from the auxiliary action process, where the auxiliary data includes multiple items of auxiliary behavior data and corresponding auxiliary observation data; and

a behavior model generation module, configured to perform autonomous model learning based on the decision data and the auxiliary data to obtain the agent behavior model.
A third aspect of the present invention provides an agent behavior training system, which includes:

a behavior data generating means, configured to generate the decision behavior data and the auxiliary behavior data and send them to the control device;

a first sensor, configured to obtain the decision observation data and the auxiliary observation data and send them to the control device; and

a control device, configured to obtain the decision data from the action process performed by the expert, where the decision data includes multiple items of decision behavior data and corresponding decision observation data; obtain the auxiliary data from the auxiliary action process, where the auxiliary data includes multiple items of auxiliary behavior data and corresponding auxiliary observation data; and perform autonomous model learning based on the decision data and the auxiliary data to obtain the agent behavior model.

Further, the agent behavior training system also includes:

an agent, configured to execute the expert behavior and the auxiliary behavior under teaching.
Further, the first sensor includes:

an image sensor, configured to obtain image data of the agent at a certain moment;

a force sensor, configured to obtain force feedback data of the agent at a certain moment;

an encoder, configured to obtain motion feedback data of a driving unit of the agent at a certain moment;

a range finder, configured to obtain distance-related ranging data of the agent at a certain moment;

a speed or acceleration measuring device, configured to obtain speed or acceleration measurement data of the agent at a certain moment;

a current or voltage measuring device, configured to obtain current or voltage measurement data of the agent at a certain moment;

a timer, configured to obtain the specific time data of a certain moment; and/or

a temperature sensor, configured to obtain temperature data of the agent at a certain moment.
Further, the behavior data generating means includes a control unit, configured to generate the decision behavior data and the auxiliary behavior data.

Further, the behavior data generating means includes a second sensor and a control unit;

the second sensor is configured to obtain related information of the decision behavior data and the auxiliary behavior data for multiple current moments; and

the control unit is configured to obtain the behavior data of multiple previous moments according to the related information.

Further, the second sensor includes an image sensor and an encoder.
A fourth aspect of the present invention provides a robot system, which includes the agent behavior training system of any of the above items.

A fifth aspect of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the agent behavior training method of any of the above items when executing the computer program.

A sixth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program implements the agent behavior training method of any of the above items when executed by a processor.
Because the auxiliary data of the auxiliary behavior and the decision data of the expert behavior are jointly input into the initial model during model training, model training time is saved and the adaptability and accuracy of the agent model in various situations are improved.
Description of drawings

Fig. 1 is a first flow diagram of an agent behavior training method in one embodiment;

Fig. 2 is a second flow diagram of an agent behavior training method in one embodiment;

Fig. 3 is a third flow diagram of an agent behavior training method in one embodiment;

Fig. 4 is a fourth flow diagram of an agent behavior training method in one embodiment;

Fig. 5 is a fifth flow diagram of an agent behavior training method in one embodiment;

Fig. 6 is a sixth flow diagram of an agent behavior training method in one embodiment;

Fig. 7 is a first structural diagram of an embodiment of a robot system;

Fig. 8 is a second structural diagram of an embodiment of a robot system;

Fig. 9 is a first structural block diagram of an agent training device;

Fig. 10 is a second structural block diagram of an agent training device;

Fig. 11 is a first structural block diagram of an agent training system in one embodiment;

Fig. 12 is a second structural block diagram of an agent training system in one embodiment;

Fig. 13 is a first structural block diagram of a behavior data generating means of a robot in one embodiment;

Fig. 14 is a second structural block diagram of a behavior data generating means of a robot in one embodiment.
Specific embodiments

In order to make the objects, technical solutions and advantages of the application more clearly understood, the application is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are only used to explain the application and are not intended to limit it.
In one embodiment, as shown in Fig. 1, an agent behavior training method is provided, including the following steps:

Step S100: obtain the decision data from the action process performed by an expert.
Specifically, the agent may be an independent computer control device that realizes machine learning, or any of various automation devices (for example, in the industrial, medical, entertainment, transportation, or service fields) or measuring devices that include such a control device. For convenience of description, this embodiment is further described taking a robot as the agent, where a robot can be regarded as an advanced automation device. In some embodiments, the amount of motion of the robot in the following examples may refer to the amount of motion of any part of the robot, such as the amount of motion of the end effector.

Specifically, as shown in Fig. 3 or Fig. 4, a robot can form various types of manipulators by connecting multiple joints and links in series or in parallel, each joint being a driving unit; examples include serial manipulators such as four-axis and six-axis robots, and parallel manipulators. In some embodiments, an end effector, such as a suction cup or a gripper, is fixed to the output end of the terminal shaft of the manipulator; therefore, the end of the manipulator described in this embodiment may refer to the end effector of the manipulator, or to the terminal shaft of the manipulator.
Here, decision data refers to the aggregated set of pairs formed by the observation data obtained at a certain moment and the corresponding behavior data obtained at that moment.
Specifically, the action process may include, but is not limited to: the action of grabbing a target object from bulk or regularly arranged objects (as shown in Fig. 8); the action of assembling a target object (as shown in Fig. 7); the action of putting down an object (drawing omitted); the action of moving from one position to another (drawing omitted); or a combination of some or all of the above actions.
In one embodiment, the decision data is obtained during an action process in which the robot performs the expert behavior under teaching.

Specifically, the robot may be guided by an operator, or driven by control instructions generated by a controller, to perform the expert behavior. For example, the robot completes the assembly action of building blocks under the guidance of an operator; as another example, it completes the assembly action of building blocks according to motion-amount instructions sent by the controller to each driving unit of the robot.
Further, in some embodiments, in the case where the robot performs the expert behavior driven by instructions generated by the behavior data generating means:

the behavior data may include, but is not limited to: the target pose (X, Y, Z, U, V, W coordinates) or position (X, Y coordinates) of each step executed by the robot, as output by the controller for each step of the expert action process; or the amounts of motion (rotation and/or translation) of the corresponding driving units of the robot, calculated from the target pose or position based on the kinematic equations; or the amount of motion of the robot.
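As a hedged illustration of converting a target position into driving-unit motion amounts via kinematic equations, here is a closed-form inverse-kinematics sketch for a hypothetical planar two-link arm. The link lengths and function names are invented for illustration; the manipulators contemplated in this application may have four, six, or more axes with more involved kinematics:

```python
import math

def two_link_ik(x, y, l1=1.0, l2=1.0):
    """Joint angles (radians) placing a planar 2-link arm's end at (x, y)."""
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1:
        raise ValueError("target out of reach")
    theta2 = math.acos(c2)                       # elbow-down solution
    k1 = l1 + l2 * math.cos(theta2)
    k2 = l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)
    return theta1, theta2

def two_link_fk(theta1, theta2, l1=1.0, l2=1.0):
    """Forward kinematics: end position from the joint angles."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y
```

The forward-kinematics check mirrors how a controller could validate the computed joint motion amounts against the commanded target pose.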
The control device obtains the observation data acquired and sent by the first sensor. Specifically, the observation data may include, but is not limited to: image data acquired and sent by the image sensor, or the pose or position of the robot (e.g., of the robot's end effector) extracted from that image data; ranging data acquired and sent by the distance measuring sensor; force (force/torque) feedback data acquired and sent by the force sensor; the amount-of-motion (rotation and/or translation) data of the robot's driving units acquired and sent by the encoder; speed or acceleration data acquired and sent by the speed or acceleration measuring device; current or voltage measurement data acquired and sent by the current or voltage measuring device; time data acquired and sent by the timer; and temperature data acquired and sent by the thermometer.
For example, as shown in Fig. 7, taking training the robot in an assembly behavior (e.g., assembling object M2 onto object M1) as an example: multiple groups of decision data are obtained during the action process performed by the expert. Specifically, the behavior data may be the target pose or position of the robot's next step, or the amount of motion of the driving units, output by the behavior data generating means at a certain moment; and the corresponding observation data may be the image data (or pose or position), force feedback data, encoder feedback data, speed or acceleration data, and/or current or voltage data sent by each first sensor at that moment. During the action process performed by the expert, the multiple groups of decision data obtained are sent to the control unit of the robot. In some embodiments, the multiple groups of decision data in the expert's action process need to include at least the decision data under the successful-assembly state.

In some embodiments, when the control device obtains image data, it may use the image data directly as observation data, or extract the robot's pose or position from the image data and use that pose or position as observation data.
Further, in some embodiments, in the case where the operator guides the robot to perform the expert behavior:

since in this case there is no explicit behavior instruction to serve as behavior data, the behavior data, or related information about it, can be obtained indirectly through certain second sensors. Here, the first sensor and the second sensor may include sensors of the same type, for example, an image sensor and an encoder. In some embodiments, identical sensors among the first and second sensors can be merged into one sensor; that is, the data obtained can serve both as behavior data and as observation data. For example, the motion-amount data of the driving unit sent by the encoder at the current moment can serve as the observation data of the current moment and also as the behavior data of the previous moment. As another example, the pose or position of the robot obtained from the image acquired by the image sensor at the current moment may serve as the behavior data of the previous moment, and also as the observation data of the robot at the current moment.
For example, as shown in Fig. 8, take training the robot to grasp objects from bulk, where "bulk" means that multiple objects M are scattered in an irregular state. Multiple groups of decision data (behavior data and corresponding observation data) are obtained during the action process performed by the expert. Specifically, the behavior data of a certain current moment may be the pose or position of the robot extracted from the image sent by the image sensor at the next moment, or the amount of motion of the robot obtained from the robot poses or positions extracted from the images of the current moment and the next moment. The observation data of the current moment may be the information sent by each first sensor at that moment, such as: force feedback data from the force sensor (e.g., pressure sensors arranged on the fingers obtain the magnitude and/or direction of force when completing the grasping action, or a multi-dimensional force sensor arranged at the output end of the robot's terminal shaft obtains the change of force or torque at the output end during grasping), driving unit feedback data (e.g., the angle of motor rotation or movement), speed or acceleration data (the robot's speed or acceleration during motion), and/or current or voltage data (e.g., the current or voltage value input to the motor). In addition, the robot's pose or position at the current moment can also be extracted from the image data of the current moment.

Specifically, the behavior data may include, but is not limited to: the target pose or position, or the amount of motion of each driving unit of the robot.

In some embodiments, the multiple groups of decision data in the expert's action process need to include at least the decision data at the moment of a successful grasp.
As another example, take training the robot to move (translate and/or rotate) from one position to another. Multiple groups of decision data are obtained during the action process performed by the expert. Specifically, the behavior data may include the pose of the robot's actuator extracted from the images acquired by the image sensor at each moment of the robot's motion; and the corresponding observation data of each moment may include, for example, the distance to the target position fed back as ranging data (e.g., a rangefinder such as an infrared rangefinder installed on the robot feeds back the distance to the target position), driving unit feedback data, speed or acceleration data, and so on. Specifically, the multiple groups of decision data in the expert's action process need to include at least the decision data at the moment of reaching the target position.
In one embodiment, the decision data is obtained from an action process in which the expert itself performs the action.

Specifically, the expert may be an operator or another robot. For example, the decision data of an operator realizing an assembly behavior may be obtained: image data of the operator executing the assembly process, shot and sent by the image sensors at multiple current moments, may be used to obtain the operator's behavior data of the previous moment and observation data of the current moment during the assembly process. In addition, force sensors may be installed on the person's hand, and the force sensors may feed back observation data during the person's hand-held assembly action process.

Specifically, during the execution of various expert behaviors such as object grasping and assembly, the image data obtained under multiple states may be 3D images, 2D images, or video images. The image sensor may include, but is not limited to: a camera, a video camera, a scanner, or other devices with related functions (mobile phones, computers, etc.). The number of image sensors may be any number greater than or equal to 1.

Specifically, the image sensor may be arranged on the robot or fixed at a certain position outside the robot; the image sensor, the image sensor together with the robot (so-called "hand-eye"), and the robot are calibrated in advance.
Step S200: obtain the auxiliary data from the auxiliary action process.

Specifically, the auxiliary data includes a set of multiple pairs of auxiliary behavior data and corresponding auxiliary observation data. Multiple groups of status data and corresponding action data, obtained during action processes that clearly assist in achieving the predetermined action purpose, are input into the model as auxiliary data. Specifically, when executing a certain trajectory to reach a certain destination, data obtained during certain auxiliary action trajectories, such as erroneous expert-behavior trajectories that may bump into obstacles (i.e., auxiliary behavior), can be used as auxiliary data.

In some embodiments, the decision data described in the above examples can be assigned a corresponding positive value, and the auxiliary data a corresponding negative value.
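The positive/negative value assignment above can be sketched as signed sample weights in a loss function, so that the model is pulled toward expert samples and pushed away from auxiliary ones. The weight magnitudes and function names below are invented for illustration:

```python
import numpy as np

def weighted_imitation_loss(pred, target, weights):
    """Per-sample squared error scaled by signed sample weights."""
    per_sample = np.sum((pred - target) ** 2, axis=1)
    return float(np.mean(weights * per_sample))

pred    = np.array([[0.0], [0.0]])
target  = np.array([[1.0], [1.0]])
# +1.0 for a decision (expert) sample, -1.0 for an auxiliary sample.
weights = np.array([1.0, -1.0])

loss = weighted_imitation_loss(pred, target, weights)  # the two errors cancel here
```

Minimizing such a loss rewards matching expert behavior and penalizes matching the auxiliary (erroneous) behavior, which is one way to read the patent's positive/negative value assignment.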
Step S300: based on the decision data and the auxiliary data, train the initial model to obtain a pre-trained model, and perform autonomous learning based on the pre-trained model to obtain the agent behavior model.

The autonomous learning process lets the agent generate some action trajectories based on the pre-trained model, then defines a standard to judge the difference between these trajectories and the expert action trajectories obtained during teaching, and then updates the strategy of the pre-trained model according to this difference so that the trajectories it generates next time are closer to the expert behavior, until, as judged by the standard, the action trajectories generated based on the pre-trained model are close enough to the expert's action trajectories; the model obtained at that point is the final agent behavior model.

Specifically, the standard described in the above examples can be obtained by various methods such as empirical values, machine learning, or random values; in some embodiments, this standard can be represented by a fitted neural network.

By using the above learning method, since the auxiliary data of the auxiliary behavior and the decision data of the expert behavior are jointly input into the initial model during model training, model training time is saved and the adaptability and accuracy of the agent model in various situations are improved.
In some embodiments, step S300 includes the following method steps:

S310: based on the decision data and the auxiliary data, train the initial model to obtain the pre-trained model.

The decision observation data and auxiliary observation data are used as features, and the decision behavior data and auxiliary behavior data as labels, for classification learning (for discrete actions) or regression learning (for continuous actions); the parameters of the initial model are continuously updated to obtain the pre-trained model.

S320: perform autonomous learning based on the pre-trained model to obtain the agent behavior model.

The autonomous learning process is as described above: the agent generates action trajectories based on the pre-trained model, a standard judges the difference between these trajectories and the expert action trajectories obtained during teaching, and the strategy of the pre-trained model is updated according to this difference until the generated trajectories are close enough to the expert's; the model obtained at that point is the final agent behavior model. The standard can likewise be obtained by empirical values, machine learning, random values, or, in some embodiments, be represented by a fitted neural network.
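The generate-judge-update loop of S320 can be sketched abstractly as follows. The linear policy, the "standard" (here a simple mean-squared-difference threshold rather than a fitted neural network), and the gradient update are all simplified stand-ins invented for illustration:

```python
import numpy as np

def autonomous_learning(expert_traj, init_policy, lr=0.5, tol=1e-3, max_iters=1000):
    """Iteratively nudge a linear policy toward expert behavior.

    expert_traj: array whose rows are (state..., expert_action).
    init_policy: initial weight vector mapping state -> action.
    The 'standard' here is mean squared trajectory difference vs. `tol`.
    """
    w = init_policy.copy()
    states, expert_actions = expert_traj[:, :-1], expert_traj[:, -1]
    for _ in range(max_iters):
        actions = states @ w                 # generate a trajectory
        diff = actions - expert_actions      # judge against the expert teaching
        if np.mean(diff ** 2) < tol:         # standard satisfied: stop
            break
        grad = states.T @ diff / len(states) # update the strategy
        w -= lr * grad
    return w

# Synthetic teaching data generated by a hypothetical expert policy.
states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
expert_w = np.array([0.7, -0.2])
traj = np.column_stack([states, states @ expert_w])
learned = autonomous_learning(traj, init_policy=np.zeros(2))
```

The loop terminates when the standard judges the generated trajectory close enough to the expert's, mirroring the stopping condition described in the text.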
By using the above learning method, the initial model is trained on the decision data obtained from the expert's behavior to obtain the pre-trained model, autonomous learning is performed based on the pre-trained model, and the learned agent model is finally obtained; therefore, the adaptability and accuracy of the trained agent model in completing behavior actions in various situations are improved. In addition, the training time of the agent behavior model can be reduced.
In some embodiments, step S300 includes the following method step:
S330: based on the decision data, carry out autonomous learning of the initial model to obtain the intelligent body behavior model.
The autonomous learning process lets the intelligent body generate action trajectories based on the initial model, defines a standard for judging the difference between these trajectories and the expert action trajectories obtained during teaching, and then updates the strategy of the initial model according to this difference, so that the trajectories it generates next time are closer to the expert's behavior. When, according to the standard, the action trajectories generated by the initial model are judged close enough to the expert's action trajectories, the model obtained is the final intelligent body behavior model.
As shown in Fig. 3, in some embodiments, step S100, obtaining the decision data in the action process of the expert, includes the following method steps:
S110: obtain the decision behavior data at multiple current moments in the action process of the expert;
S130: obtain the decision observation data at the multiple current moments sent by the first sensor in the action process of the expert; wherein the decision behavior data at a current moment corresponds to the decision observation data at that current moment.
As shown in Fig. 4, in some embodiments, step S100, obtaining the decision data in the action process of the expert, includes the following method steps:
S120: obtain the relevant information of the decision behavior data at multiple current moments sent by the second sensor in the action process of the expert;
S140: obtain the decision behavior data at multiple last moments according to the relevant information;
S160: obtain the decision observation data at the multiple last moments sent by the first sensor in the action process of the expert; wherein the decision behavior data at a last moment corresponds to the decision observation data at that last moment.
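Steps S120-S160 can be illustrated as follows: relevant information reported at each current moment (here, hypothetical encoder positions) is parsed into the behavior executed at the last moment, and each recovered behavior is paired with the first sensor's observation from that same last moment. All names and readings are illustrative assumptions.

```python
def align_last_moment(encoder_positions, observations):
    """Pair each recovered last-moment action with the observation
    recorded by the first sensor at that same last moment."""
    pairs = []
    for t in range(1, len(encoder_positions)):
        # Relevant information at moment t yields the behavior of moment t-1.
        action_last = encoder_positions[t] - encoder_positions[t - 1]
        pairs.append((observations[t - 1], action_last))
    return pairs

positions = [0.0, 1.0, 3.0, 6.0]   # hypothetical second-sensor readings
obs = ["o0", "o1", "o2", "o3"]     # hypothetical first-sensor observations
data = align_last_moment(positions, obs)
```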
As shown in Fig. 5, in some embodiments, step S200, obtaining the auxiliary data in the action process of executing the auxiliary behavior, includes the following method steps:
S210: obtain the auxiliary behavior data at multiple current moments in the action process of executing the auxiliary behavior;
S230: obtain the supplementary observation data at the multiple current moments sent by the first sensor in the action process of executing the auxiliary behavior; wherein the auxiliary behavior data at a current moment corresponds to the supplementary observation data at that current moment; or
As shown in Fig. 6, in some embodiments, step S200, obtaining the auxiliary data in the action process of executing the auxiliary behavior, includes the following method steps:
S220: obtain the relevant information of the auxiliary behavior data at multiple current moments sent by the second sensor in the action process of executing the auxiliary behavior;
S230: obtain the auxiliary behavior data at multiple last moments according to the relevant information;
S240: obtain the supplementary observation data at the multiple last moments sent by the first sensor in the action process of executing the auxiliary behavior; wherein the auxiliary behavior data at a last moment corresponds to the supplementary observation data at that last moment.
It should be understood that, although the steps in the flowcharts of Figs. 1, 2, 3, 4, 5 and 6 are displayed in the sequence indicated by the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated otherwise herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in Figs. 1, 2, 3, 4, 5 and 6 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is also not necessarily sequential, and they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 9, an intelligent body behavior training control device is provided. The intelligent body behavior training device includes a decision data obtaining module 100, an auxiliary data obtaining module 200 and a behavior model generation module 300.
The decision data obtaining module 100 is configured to obtain the decision data in the action process of the expert.
The auxiliary data obtaining module 200 is configured to obtain the auxiliary data in the action process of executing the auxiliary behavior.
The behavior model generation module 300 is configured to train the initial model based on the decision data and the auxiliary data to obtain the intelligent body behavior model.
As shown in Fig. 10, in some embodiments, the behavior model generation module 300 includes a pre-trained model generating unit 310 and a first behavior generating unit 320.
The pre-trained model generating unit 310 is configured to train the initial model based on the decision data and the auxiliary data to obtain the pre-trained model.
The first behavior generating unit 320 is configured to carry out autonomous learning on the pre-trained model to obtain the intelligent body behavior model.
In some embodiments, the behavior model generation module 300 includes a second behavior generating unit 330.
The second behavior generating unit 330 is configured to carry out autonomous learning of the initial model based on the decision data and the auxiliary data to obtain the intelligent body behavior model.
In some embodiments, the decision data obtaining module includes a current decision behavior data generation module and a current decision observation data generation module.
The current decision behavior data generation module is configured to obtain the decision behavior data at multiple current moments in the action process of the expert.
The current decision observation data generation module is configured to obtain the decision observation data at the multiple current moments sent by the first sensor in the action process of the expert; wherein the decision behavior data at a current moment corresponds to the decision observation data at that current moment.
In some embodiments, the decision data obtaining module includes a current decision relevant information acquisition unit, a last decision behavior data generating unit and a last decision observation data generating unit.
The current decision relevant information acquisition unit is configured to obtain the relevant information of the decision behavior data at multiple current moments sent by the second sensor in the action process of the expert.
The last decision behavior data generating unit is configured to obtain the decision behavior data at multiple last moments according to the relevant information.
The last decision observation data generating unit is configured to obtain the decision observation data at the multiple last moments sent by the first sensor in the action process of the expert; wherein the decision behavior data at a last moment corresponds to the decision observation data at that last moment.
In some embodiments, the auxiliary data obtaining module includes a current auxiliary behavior data generation module and a current supplementary observation data generation module.
The current auxiliary behavior data generation module is configured to obtain the auxiliary behavior data at multiple current moments in the action process of executing the auxiliary behavior.
The current supplementary observation data generation module is configured to obtain the supplementary observation data at the multiple current moments sent by the first sensor in the action process of executing the auxiliary behavior; wherein the auxiliary behavior data at a current moment corresponds to the supplementary observation data at that current moment.
In some embodiments, the auxiliary data obtaining module includes a current auxiliary relevant information acquisition unit, a last auxiliary behavior data generating unit and a last supplementary observation data generating unit.
The current auxiliary relevant information acquisition unit is configured to obtain the relevant information of the auxiliary behavior data at multiple current moments sent by the second sensor in the action process of executing the auxiliary behavior.
The last auxiliary behavior data generating unit is configured to obtain the auxiliary behavior data at multiple last moments according to the relevant information.
The last supplementary observation data generating unit is configured to obtain the supplementary observation data at the multiple last moments sent by the first sensor in the action process of executing the auxiliary behavior; wherein the auxiliary behavior data at a last moment corresponds to the supplementary observation data at that last moment.
For the specific limitations of the intelligent body behavior training control device, reference may be made to the limitations of the intelligent body behavior training method above, which are not repeated here. Each module in the above intelligent body behavior training control device may be implemented wholly or partly by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in a computer device, or stored in software form in a memory in the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, as shown in Fig. 11, an intelligent body behavior training system is provided, including a control device 400, a first sensor 500 and a behavior data generating means 600.
The control device 400 is configured to obtain the decision data in the action process of the expert, the decision data including multiple decision behavior data and corresponding decision observation data; to obtain the auxiliary data in the action process of executing the auxiliary behavior, the auxiliary data including multiple auxiliary behavior data and corresponding supplementary observation data; and to carry out model autonomous learning based on the decision data and the auxiliary data to obtain the intelligent body behavior model.
The behavior data generating means 600 is configured to generate the decision behavior data and the auxiliary behavior data, and to send the decision behavior data and the auxiliary behavior data to the control device.
The first sensor 500 is configured to obtain the decision observation data and the supplementary observation data in the expert's action process, and to send the decision observation data and the supplementary observation data to the control device.
In one embodiment, as shown in Fig. 12, the intelligent body training system further includes an intelligent body 700, for executing the behavior of the expert and the auxiliary behavior under teaching.
Specifically, the first sensor 500 includes but is not limited to:
an imaging sensor, for obtaining image data of the intelligent body at a certain moment;
a force sensor, for obtaining force feedback data of the intelligent body at a certain moment;
an encoder, for obtaining motion feedback data of a driving unit of the intelligent body at a certain moment;
a range finder, for obtaining distance-related ranging data of the intelligent body at a certain moment;
a speed or acceleration measurer, for obtaining speed or acceleration measurement data of the intelligent body at a certain moment;
a current or voltage measurer, for obtaining current or voltage measurement data of the intelligent body at a certain moment;
a timer, for obtaining specific time data at a certain moment;
a temperature sensor, for obtaining temperature data of the intelligent body at a certain moment.
As shown in Fig. 13, in some embodiments, the behavior data generating means 600 includes a control unit 610.
The control unit 610 is configured to generate the decision behavior data and the auxiliary behavior data.
As shown in Fig. 14, in some embodiments, the behavior data generating means 600 includes a second sensor 620 and a control unit 610.
The second sensor 620 is configured to obtain the relevant information of the decision behavior data and the auxiliary behavior data at multiple current moments, and to send it to the control unit.
The control unit 610 is configured to parse the relevant information and generate the decision behavior data or the auxiliary behavior data at multiple last moments.
Specifically, the second sensor 620 may include but is not limited to an imaging sensor and an encoder.
It should be noted that when the first sensor 500 also includes, for example, an imaging sensor and an encoder, these may be arranged separately and independently of the imaging sensor and encoder included in the second sensor 620; alternatively, the imaging sensor and encoder may be shared, i.e., by parsing the relevant information captured by the imaging sensor and encoder at a certain current moment, both the decision behavior data and auxiliary behavior data of the last moment and the decision observation data and supplementary observation data of the current moment can be generated.
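The shared-sensor arrangement can be sketched as follows: a single current-moment reading serves double duty, yielding both the current observation and, by differencing against the previous reading, the behavior executed at the last moment. The readings and names below are hypothetical.

```python
def parse_shared_reading(prev_reading, curr_reading):
    """One shared encoder/imaging reading serves double duty."""
    observation_now = curr_reading              # current-moment observation data
    action_last = curr_reading - prev_reading   # last-moment behavior data
    return observation_now, action_last

obs_now, act_last = parse_shared_reading(prev_reading=1.5, curr_reading=2.0)
```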
Specifically, the control device 400 and the control unit 610 may be provided separately and independently, or may be combined into one device (for example, as shown in Figs. 5 and 6, the control device 400 and the control unit 610 are merged, and the control device 400 uniformly implements the robot behavior training method of the control device 400 and the behavior data generation method of the control unit 610, etc.).
The control device 400 and the control unit 610 may be a programmable logic controller (Programmable Logic Controller, PLC), a field programmable gate array (Field-Programmable Gate Array, FPGA), a computer (Personal Computer, PC), an industrial control computer (Industrial Personal Computer, IPC), a server, etc. The control device generates program instructions according to a program fixed in advance, in combination with manually input information or parameters and/or data collected by the external first sensor and/or second sensor (such as an imaging sensor).
For the specific limitations of the control device, reference may be made to the limitations of the intelligent body training method above, which are not repeated here. Each module in the above control device may be implemented wholly or partly by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in a computer device, or stored in software form in a memory in the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one embodiment, the present invention also provides a multi-agent system including the intelligent body behavior training system described in the above embodiments.
For the associated description of the intelligent body behavior training system, refer to the above embodiments; it is not repeated here.
It should be noted that the intelligent body and/or sensors mentioned in the above intelligent body behavior training method, behavior training control device, behavior training system or intelligent system may be a real intelligent body and sensors in a real environment, or may be a virtual intelligent body and/or sensors on a simulation platform that achieve, through a simulated environment, the effect of connecting a real intelligent body and/or sensors. The control device that has completed behavior training in the virtual environment can then be transplanted to the real environment to control or retrain the real intelligent body and sensors, which can save the resources and time of the training process.
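The virtual-to-real transplanting described above can be sketched as pre-training on simulated teaching data and then briefly retraining the same parameters on real-world data. Everything below (the one-parameter policy, the toy data sets, the update rule) is an illustrative assumption, not the disclosed implementation.

```python
def train(policy, samples, lr=0.1, steps=200):
    """One-parameter gradient fit of a linear policy to (obs, action) pairs."""
    for _ in range(steps):
        for obs, action in samples:
            # Squared-error gradient step toward the demonstrated action.
            policy -= lr * 2.0 * (policy * obs - action) * obs
    return policy

sim_data = [(1.0, 1.9), (2.0, 3.8)]    # simulated teaching data (action = 1.9*obs)
real_data = [(1.0, 2.0), (2.0, 4.0)]   # scarcer real-world data (action = 2.0*obs)

pretrained = train(0.0, sim_data)                     # trained entirely in simulation
transferred = train(pretrained, real_data, steps=20)  # brief retraining on real data
```

Because the transplanted parameters start close to the real-environment optimum, the retraining phase needs far fewer steps than training from scratch, which is the resource and time saving the embodiment describes.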
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor performs the steps of: obtaining the decision data in the action process of the expert; obtaining the auxiliary data in the action process of executing the auxiliary behavior; and training the initial model based on the decision data and the auxiliary data to obtain the intelligent body behavior model.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are performed: obtaining the decision data in the action process of the expert; obtaining the auxiliary data in the action process of executing the auxiliary behavior; and training the initial model based on the decision data and the auxiliary data to obtain the intelligent body behavior model.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be completed by instructing relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM), etc.
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be implemented either in the form of hardware or in the form of a software functional unit.
Unless otherwise defined, all technical and scientific terms used in this specification have the same meaning as commonly understood by those skilled in the technical field to which the present invention belongs. The terms used in the description of the present invention in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present invention.
The terms "first", "second", "third", "S110", "S120", "S130", etc. (if present) in the claims, the specification and the above drawings of the present invention are used to distinguish similar objects and need not describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or robot that includes a series of steps or modules is not necessarily limited to the steps or modules clearly listed, but may include other steps or modules that are not clearly listed or are inherent to the process, method, system, product or robot.
It should be noted that those skilled in the art will also know that the embodiments described in this specification are preferred embodiments, and the structures and modules involved are not necessarily essential to the present invention.
The above embodiments only express several embodiments of the present invention, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the present invention, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the appended claims.
Claims (15)
1. An intelligent body behavior training method, characterized in that the intelligent body behavior training method includes:
obtaining the decision data in the action process of the expert; wherein the decision data includes multiple decision behavior data and corresponding decision observation data;
obtaining the auxiliary data in the action process of executing the auxiliary behavior; wherein the auxiliary data includes multiple auxiliary behavior data and corresponding supplementary observation data; and
carrying out model autonomous learning based on the decision data and the auxiliary data to obtain an intelligent body behavior model.
2. The intelligent body behavior training method according to claim 1, characterized in that carrying out model autonomous learning based on the decision data and the auxiliary data to obtain the intelligent body behavior model includes:
training an initial model based on the decision data and the auxiliary data to obtain a pre-trained model; and
carrying out autonomous learning on the pre-trained model to obtain the intelligent body behavior model.
3. The intelligent body behavior training method according to claim 1, characterized in that carrying out model autonomous learning based on the decision data and the auxiliary data to obtain the intelligent body behavior model includes:
carrying out autonomous learning of an initial model based on the decision data and the auxiliary data to obtain the intelligent body behavior model.
4. The intelligent body behavior training method according to claim 1, 2 or 3, characterized in that obtaining the decision data in the action process of the expert includes:
obtaining the decision behavior data at multiple current moments in the action process of the expert; and
obtaining the decision observation data at the multiple current moments sent by a first sensor in the action process of the expert; wherein the decision behavior data at a current moment corresponds to the decision observation data at that current moment; or
obtaining the relevant information of the decision behavior data at multiple current moments sent by a second sensor in the action process of the expert;
parsing the relevant information to generate the decision behavior data at multiple last moments; and
obtaining the decision observation data at the multiple last moments sent by the first sensor in the action process of the expert; wherein the decision behavior data at a last moment corresponds to the decision observation data at that last moment.
5. The intelligent body behavior training method according to claim 1, 2 or 3, characterized in that obtaining the auxiliary data in the action process of executing the auxiliary behavior includes:
obtaining the auxiliary behavior data at multiple current moments in the action process of executing the auxiliary behavior; and
obtaining the supplementary observation data at the multiple current moments sent by the first sensor in the action process of executing the auxiliary behavior; wherein the auxiliary behavior data at a current moment corresponds to the supplementary observation data at that current moment; or
obtaining the relevant information of the auxiliary behavior data at multiple current moments sent by the second sensor in the action process of executing the auxiliary behavior;
obtaining the auxiliary behavior data at multiple last moments according to the relevant information; and
obtaining the supplementary observation data at the multiple last moments sent by the first sensor in the action process of executing the auxiliary behavior; wherein the auxiliary behavior data at a last moment corresponds to the supplementary observation data at that last moment.
6. An intelligent body behavior training control device, characterized in that the intelligent body behavior training control device includes:
a decision data obtaining module, for obtaining the decision data in the action process of the expert; wherein the decision data includes multiple decision behavior data and corresponding decision observation data;
an auxiliary data obtaining module, for obtaining the auxiliary data in the action process of executing the auxiliary behavior; wherein the auxiliary data includes multiple auxiliary behavior data and corresponding supplementary observation data; and
a behavior model generation module, for carrying out model autonomous learning based on the decision data and the auxiliary data to obtain an intelligent body behavior model.
7. An intelligent body behavior training system, characterized in that the intelligent body behavior training system includes:
a behavior data generating means, for generating the decision behavior data and the auxiliary behavior data, and sending the decision behavior data and the auxiliary behavior data to the control device;
a first sensor, for obtaining the decision observation data and the supplementary observation data, and sending the decision observation data and the supplementary observation data to the control device; and
a control device, for obtaining the decision data in the action process of the expert, wherein the decision data includes multiple decision behavior data and corresponding decision observation data; obtaining the auxiliary data in the action process of executing the auxiliary behavior, wherein the auxiliary data includes multiple auxiliary behavior data and corresponding supplementary observation data; and carrying out model autonomous learning based on the decision data and the auxiliary data to obtain an intelligent body behavior model.
8. The intelligent body behavior training system according to claim 7, characterized in that the intelligent body behavior training system further includes:
an intelligent body, for executing the behavior of the expert and the auxiliary behavior under teaching.
9. The intelligent body behavior training system according to claim 7 or 8, characterized in that the first sensor includes:
an imaging sensor, for obtaining image data of the intelligent body at a certain moment;
a force sensor, for obtaining force feedback data of the intelligent body at a certain moment;
an encoder, for obtaining motion feedback data of a driving unit of the intelligent body at a certain moment;
a range finder, for obtaining distance-related ranging data of the intelligent body at a certain moment;
a speed or acceleration measurer, for obtaining speed or acceleration measurement data of the intelligent body at a certain moment;
a current or voltage measurer, for obtaining current or voltage measurement data of the intelligent body at a certain moment;
a timer, for obtaining specific time data at a certain moment; and/or
a temperature sensor, for obtaining temperature data of the intelligent body at a certain moment.
10. The intelligent body behavior training system according to claim 7 or 8, characterized in that the behavior data generating means includes a control unit;
the control unit being for generating the decision behavior data and the auxiliary behavior data.
11. The intelligent body behavior training system according to claim 7 or 8, characterized in that the behavior data generating means includes a second sensor and a control unit;
the second sensor being for obtaining the relevant information of the decision behavior data and the auxiliary behavior data at multiple current moments; and
the control unit being for obtaining the behavior data at multiple last moments according to the relevant information.
12. The intelligent body behavior training system according to claim 11, characterized in that the second sensor includes an imaging sensor and an encoder.
13. A multi-agent system, characterized in that the multi-agent system includes the intelligent body behavior training system of any one of claims 7-12.
14. A computer device, including a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the intelligent body behavior training method of any one of claims 1-5.
15. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the intelligent body behavior training method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910028902.9A CN109784400A (en) | 2019-01-12 | 2019-01-12 | Intelligent body Behavioral training method, apparatus, system, storage medium and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109784400A true CN109784400A (en) | 2019-05-21 |
Family
ID=66500352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910028902.9A Pending CN109784400A (en) | 2019-01-12 | 2019-01-12 | Intelligent body Behavioral training method, apparatus, system, storage medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109784400A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112230647A (en) * | 2019-06-28 | 2021-01-15 | 鲁班嫡系机器人(深圳)有限公司 | Intelligent power system behavior model, training method and device for trajectory planning |
CN112287728A (en) * | 2019-07-24 | 2021-01-29 | 鲁班嫡系机器人(深圳)有限公司 | Intelligent agent trajectory planning method, device, system, storage medium and equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10301616A (en) * | 1997-04-25 | 1998-11-13 | Tokico Ltd | Teaching device for robot |
CN105291111A (en) * | 2015-11-27 | 2016-02-03 | 深圳市神州云海智能科技有限公司 | Patrol robot |
US20170028553A1 (en) * | 2015-07-31 | 2017-02-02 | Fanuc Corporation | Machine learning device, robot controller, robot system, and machine learning method for learning action pattern of human |
JP2017030135A (en) * | 2015-07-31 | 2017-02-09 | ファナック株式会社 | Machine learning apparatus, robot system, and machine learning method for learning workpiece take-out motion |
CN108115681A (en) * | 2017-11-14 | 2018-06-05 | 深圳先进技术研究院 | Learning by imitation method, apparatus, robot and the storage medium of robot |
US20180215039A1 (en) * | 2017-02-02 | 2018-08-02 | Brain Corporation | Systems and methods for assisting a robotic apparatus |
2019-01-12: CN CN201910028902.9A patent/CN109784400A/en, active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109760050A (en) | Robot behavior training method, device, system, storage medium and equipment | |
Kaspar et al. | Sim2real transfer for reinforcement learning without dynamics randomization | |
Mayer et al. | A system for robotic heart surgery that learns to tie knots using recurrent neural networks | |
Paxton et al. | Prospection: Interpretable plans from language by predicting the future | |
Lin et al. | Evolutionary digital twin: A new approach for intelligent industrial product development | |
CN109784400A (en) | Intelligent body Behavioral training method, apparatus, system, storage medium and equipment | |
Bazzi et al. | Robustness in human manipulation of dynamically complex objects through control contraction metrics | |
Kim et al. | Learning and generalization of dynamic movement primitives by hierarchical deep reinforcement learning from demonstration | |
Kurrek et al. | Ai motion control–a generic approach to develop control policies for robotic manipulation tasks | |
Xu et al. | Dexterous manipulation from images: Autonomous real-world rl via substep guidance | |
Su et al. | A ROS based open source simulation environment for robotics beginners | |
Pairet et al. | Learning and generalisation of primitives skills towards robust dual-arm manipulation | |
Tjomsland et al. | Human-robot collaboration via deep reinforcement learning of real-world interactions | |
Nehmzow | Flexible control of mobile robots through autonomous competence acquisition | |
De Magistris et al. | Teaching a robot pick and place task using recurrent neural network | |
Mohan et al. | Towards reasoning and coordinating action in the mental space | |
Claassens | An RRT-based path planner for use in trajectory imitation | |
Nawrocka et al. | Neural network control for robot manipulator | |
CN113927593B (en) | Mechanical arm operation skill learning method based on task decomposition | |
Benotsmane et al. | Survey on artificial intelligence algorithms used in industrial robotics | |
Rocchi et al. | A generic simulator for underactuated compliant hands | |
Ramírez et al. | Human behavior learning in joint space using dynamic time warping and neural networks | |
Ichiwara et al. | Multimodal time series learning of robots based on distributed and integrated modalities: Verification with a simulator and actual robots | |
Arie et al. | Reinforcement learning of a continuous motor sequence with hidden states | |
Fallas-Hernández et al. | OSCAR: A low-cost, open-source robotic platform design for cognitive research |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2019-05-21