CN110516389A - Learning method, apparatus, device, and storage medium for a behavior control policy
- Publication number
- CN110516389A (application CN201910820695.0A)
- Authority
- CN
- China
- Prior art keywords
- target object
- demonstration
- joint
- behavioral data
- neural network
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
This application discloses a learning method, apparatus, computer device, and storage medium for a behavior control policy. The method comprises: sampling, from a demonstration behavior data sequence, a demonstration behavior data segment that contains at least two pieces of demonstration behavior data; setting, according to the segment, the initial state information of each joint of a target object simulated in a physics simulator, and using a neural network model to be trained to determine the force data for each joint of the target object; controlling the motion of each joint of the simulated target object so that the physics simulator, subject to a preset action behavior constraint feature, simulates a simulated behavior data sequence of the target object; determining an action behavior difference degree from the demonstration behavior data and the simulated behavior data; and optimizing the neural network model based on the action behavior difference degree until an optimization goal is reached. The scheme helps the object of demonstration learning produce extended action behaviors based on the demonstrated actions.
Description
Technical field
This application relates to the field of computer technology, and in particular to a learning method, apparatus, device, and storage medium for a behavior control policy.
Background
Demonstration learning is an autonomous learning technique that takes demonstrated behavior as its target. In demonstration learning, the object that is to learn a skill imitates the demonstrated behavior so that it acquires the motor skill corresponding to that behavior. The object that is to learn the skill varies with the application field. In gaming, for example, it may be a character or an animal in a game; in robot control, it may be a robot.
At present, a demonstration learning process can use a variety of machine learning algorithms to learn a behavior control policy from several groups of demonstration examples. The policy is then used to control the behavior of the object in the actual application environment, so that the object performs the action behavior corresponding to the demonstration examples.
However, in existing demonstration learning, giving the object a particular motor skill requires demonstration data for that skill to be obtained in advance. Without the corresponding demonstration data, the object cannot acquire the skill, which makes it more complex for the object to gain a given skill. For example, for a game character to learn to walk while carrying a box, a demonstration of a real person walking while carrying a box must first be captured.
Summary of the invention
In view of this, this application provides a learning method, apparatus, device, and storage medium for a behavior control policy, which help the object of demonstration learning learn action behaviors that differ from the demonstrated actions and reduce the complexity of learning behavior skills.
To achieve the above object, in one aspect, this application provides a learning method for a behavior control policy, comprising:
sampling, from a demonstration behavior data sequence, a demonstration behavior data segment to serve as a training sample, where the segment contains at least two pieces of demonstration behavior data in chronological order, and each piece of demonstration behavior data contains first state information of each joint of a demonstration object;
setting, according to the demonstration behavior data segment, initial state information of each joint of a target object simulated in a physics simulator, and using a neural network model to be trained to determine force data acting on each joint of the target object, where the target object and the demonstration object have the same joints;
controlling, based on the force data determined by the neural network model for each joint of the target object, the motion of each joint of the target object simulated in the physics simulator, so that the physics simulator, subject to a preset action behavior constraint feature, simulates a simulated behavior data sequence of the target object, where the simulated behavior data sequence contains at least one piece of simulated behavior data in chronological order, each piece of simulated behavior data contains second state information of each joint of the target object, and the action behavior constraint feature specifies the features that the action behavior of the simulated target object must satisfy;
determining, according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data, an action behavior difference degree between the simulated target object and the demonstration object;
optimizing, based on the action behavior difference degree, the behavior control policy expressed by the neural network model until an optimization goal is reached, and taking the behavior control policy expressed by the neural network model as the control policy on which demonstration learning is based.
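As an illustration only, the action behavior difference degree in the steps above could be computed as sketched below. This passage does not fix a formula, so the mean squared difference, the `(angle, velocity)` layout of a joint state, and the name `behavior_difference` are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

# Assumed layout: each joint state is (angle, velocity); a frame maps joint name -> state.
JointState = Tuple[float, float]
Frame = Dict[str, JointState]


def behavior_difference(demo: List[Frame], sim: List[Frame]) -> float:
    """Mean squared difference between paired demonstration and simulated frames.

    Assumption: frames are paired by position, and both objects have the same joints.
    """
    total = 0.0
    count = 0
    for demo_frame, sim_frame in zip(demo, sim):
        for joint, (demo_angle, demo_vel) in demo_frame.items():
            sim_angle, sim_vel = sim_frame[joint]
            total += (demo_angle - sim_angle) ** 2 + (demo_vel - sim_vel) ** 2
            count += 1
    if count == 0:
        raise ValueError("need at least one paired joint state")
    return total / count
```

A value of zero means the simulated target object reproduced the paired demonstration frames exactly; larger values mean a larger deviation.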
In another aspect, this application further provides a learning apparatus for a behavior control policy, comprising:
a data sampling unit, configured to sample, from a demonstration behavior data sequence, a demonstration behavior data segment to serve as a training sample, where the segment contains at least two pieces of demonstration behavior data in chronological order, and each piece of demonstration behavior data contains first state information of each joint of a demonstration object;
a model control unit, configured to set, according to the demonstration behavior data segment, initial state information of each joint of a target object simulated in a physics simulator, and to use a neural network model to be trained to determine force data acting on each joint of the target object, where the target object and the demonstration object have the same joints;
a data simulation unit, configured to control, based on the force data determined by the neural network model for each joint of the target object, the motion of each joint of the target object simulated in the physics simulator, so that the physics simulator, subject to a preset action behavior constraint feature, simulates a simulated behavior data sequence of the target object, where the simulated behavior data sequence contains at least one piece of simulated behavior data in chronological order, each piece of simulated behavior data contains second state information of each joint of the target object, and the action behavior constraint feature specifies the features that the action behavior of the simulated target object must satisfy;
a difference comparison unit, configured to determine, according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data, an action behavior difference degree between the simulated target object and the demonstration object;
a training optimization unit, configured to optimize, based on the action behavior difference degree, the behavior control policy expressed by the neural network model until an optimization goal is reached, and to take the behavior control policy expressed by the neural network model as the control policy on which demonstration learning is based.
In another aspect, this application further provides a computer device, comprising:
a processor and a memory;
the processor is configured to call and execute the program stored in the memory;
the memory is configured to store the program, and the program is at least used for:
sampling, from a demonstration behavior data sequence, a demonstration behavior data segment to serve as a training sample, where the segment contains at least two pieces of demonstration behavior data in chronological order, and each piece of demonstration behavior data contains first state information of each joint of a demonstration object;
setting, according to the demonstration behavior data segment, initial state information of each joint of a target object simulated in a physics simulator, and using a neural network model to be trained to determine force data acting on each joint of the target object, where the target object and the demonstration object have the same joints;
controlling, based on the force data determined by the neural network model for each joint of the target object, the motion of each joint of the target object simulated in the physics simulator, so that the physics simulator, subject to a preset action behavior constraint feature, simulates a simulated behavior data sequence of the target object, where the simulated behavior data sequence contains at least one piece of simulated behavior data in chronological order, each piece of simulated behavior data contains second state information of each joint of the target object, and the action behavior constraint feature specifies the features that the action behavior of the simulated target object must satisfy;
determining, according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data, an action behavior difference degree between the simulated target object and the demonstration object;
optimizing, based on the action behavior difference degree, the behavior control policy expressed by the neural network model until an optimization goal is reached, and taking the behavior control policy expressed by the neural network model as the control policy on which demonstration learning is based.
In another aspect, this application further provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the learning method for a behavior control policy described in any of the above.
As can be seen from the above technical solution, the behavior control policy required for demonstration learning is expressed by a neural network model in this application, and the policy is trained through the cooperation of the neural network model and a physics simulator. During training, in addition to the demonstration behavior data, an action behavior constraint feature corresponding to the object that is to learn the behavior skill is set in the physics simulator. The constraint feature specifies the requirements that the behavior of the target object simulated in the physics simulator must satisfy, so that the behavior control policy expressed by the trained neural network model makes the target object produce action behaviors that are as similar as possible to the demonstration behavior data while also satisfying the preset constraint feature. Consequently, when the trained neural network model controls the action learning of the target object, the target object can learn action behaviors that are similar to, but not identical with, the behaviors in the demonstration behavior data, that is, other similar behaviors extended from the demonstration. A behavior control policy for an action behavior can therefore be obtained even without demonstration behavior data for that exact behavior, and the target object can be controlled, based on that policy, to learn action behaviors that are similar to but different from the demonstrated behavior, which helps reduce the complexity of demonstration learning.
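The interplay between the policy, the simulator, and the difference measure can be sketched with a deliberately tiny example. Everything here is an assumption made for illustration: the "policy" is a single constant-force parameter rather than a neural network, the simulator is a one-joint unit-inertia integrator, the difference is a mean squared error over joint angles, no constraint feature is applied, and a simple pattern search stands in for whatever optimizer an actual implementation would use.

```python
from typing import List


def simulate(force: float, steps: int, dt: float = 0.1) -> List[float]:
    """Toy one-joint simulator: constant force, unit inertia; returns joint angles."""
    angle, velocity = 0.0, 0.0
    angles = []
    for _ in range(steps):
        velocity += force * dt
        angle += velocity * dt
        angles.append(angle)
    return angles


def difference(demo: List[float], sim: List[float]) -> float:
    """Mean squared difference between demonstrated and simulated joint angles."""
    return sum((d - s) ** 2 for d, s in zip(demo, sim)) / len(demo)


def train(demo: List[float], theta: float = 0.0,
          step: float = 1.0, tol: float = 1e-6) -> float:
    """Pattern search over the single policy parameter (a stand-in optimizer)."""
    best = difference(demo, simulate(theta, len(demo)))
    while step > tol:
        improved = False
        for candidate in (theta + step, theta - step):
            loss = difference(demo, simulate(candidate, len(demo)))
            if loss < best:  # keep the parameter only if the rollout got closer
                theta, best, improved = candidate, loss, True
                break
        if not improved:
            step /= 2.0  # refine the search once neither direction helps
    return theta
```

Calling `train(simulate(2.0, 5))` treats a rollout under a force of 2.0 as the demonstration and searches for a parameter whose simulated rollout matches it.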
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are merely embodiments of this application; a person of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.
Fig. 1a is a schematic diagram of the joints of a demonstration object and their states in demonstration learning;
Fig. 1b is a schematic structural diagram of the joints of the object that is to learn a skill in demonstration learning;
Fig. 2 is a schematic diagram of one structure of a computer device to which the learning method for a behavior control policy of this application is applicable;
Fig. 3 is a flow diagram of one embodiment of the learning method for a behavior control policy of this application;
Fig. 4 is a flow diagram of another embodiment of the learning method for a behavior control policy of this application;
Fig. 5 is an architecture diagram of one implementation principle of the learning method for a behavior control policy of this application;
Fig. 6 is a flow diagram of the learning method for a behavior control policy of this application as applied to one application scenario;
Fig. 7 is a schematic diagram of the structure of one embodiment of the learning apparatus for a behavior control policy of this application.
Detailed description
The scheme of this application applies to demonstration learning, which involves a demonstration object and an object that is to learn a behavior skill. The demonstration object demonstrates the behavior, which produces the demonstration behavior data on which demonstration learning is based. The object that is to learn the behavior skill is the object that ultimately learns the corresponding skill from the demonstration behavior data; for example, it may be a robot or a game object in a game.
Taking gaming as an example, the object that is to learn the skill may be a game character. In that case, demonstration behavior data can be obtained from actions demonstrated by a real user (such as walking or jumping), and reinforcement learning is performed on the game character according to the demonstration behavior data, so that the game character acquires the demonstrated skill (such as walking or jumping).
At present, in demonstration learning, demonstration behavior data is generally expressed as the state of each joint of the demonstration object. The state may include the angle and velocity of each joint (including the velocity component along each reference direction). The object that is to learn the skill has the same joints as the demonstration object, and corresponding joints have the same degrees of freedom.
Fig. 1a shows the joints of a demonstration object and the state of each joint. Fig. 1a takes a human as the demonstration object and shows the joints of the human body, such as the knee, elbow, and wrist joints.
The demonstration behavior data reflects the state of each joint of the demonstration object in Fig. 1a, such as the angle and velocity of each joint in three-dimensional space. For example, in a three-dimensional space with mutually orthogonal X, Y, and Z axes, the demonstration behavior data of the demonstration object yields the angle of each joint relative to each of the three axes.
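One way to represent such per-joint state is sketched below. The field names, the use of radians, and the helper `same_joints` are assumptions for illustration; the source only states that a joint's state may include its angle and velocity relative to the reference axes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class JointState:
    angle: Vec3     # angle relative to the X, Y, and Z axes (radians assumed)
    velocity: Vec3  # velocity component along each reference direction


# One piece of behavior data: joint name -> state of that joint.
Frame = Dict[str, JointState]


def same_joints(demo: Frame, target: Frame) -> bool:
    """Check the requirement that both objects have the same set of joints."""
    return set(demo) == set(target)
```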
Correspondingly, for the target object that is to learn the skill to learn it from the demonstration behavior data of the demonstration object, the target object should have the same joints as the demonstration object, and each joint should have the same degrees of freedom. Fig. 1b is a schematic structural diagram of a target object that performs demonstration learning based on the demonstration behavior data of the demonstration object shown in Fig. 1a. The target object is likewise a human body, and as Fig. 1b shows, its joints and degrees of freedom are the same as those of the demonstration object.
Fig. 1a and Fig. 1b take a human as both the demonstration object and the target object only as an example. In practice, if the target object takes another form, the demonstration object must have the same joints as the target object. For example, if the target object is an animal-shaped robot (such as Doraemon), the demonstration object may be an animal (such as a cat).
In demonstration learning, a behavior control policy for the target object is determined from the demonstration behavior data, and the actions of the target object are then controlled according to that policy, so that the target object learns a behavior skill similar to the demonstrated behavior.
However, the inventors found that the behavior control policy determined in existing demonstration learning only lets the target object learn behavior that is almost identical to the demonstrated behavior; the target object cannot learn other behaviors that are similar to the demonstrated behavior but extended from it. This limits the behavior skills that can be learned through demonstration learning: a behavior can be learned only when demonstration data for that behavior exists, which makes demonstration learning complex and inflexible.
Based on this finding, the scheme of this application trains, from demonstration behavior data, a behavior control policy suited to extending the demonstrated behavior.
The scheme of this application applies to a computer device, which may be a personal computer, a server, or another electronic device with data processing capability.
For example, Fig. 2 shows one structure of a computer device to which the learning method for a behavior control policy of the embodiments of this application is applicable. In Fig. 2, the computer device 200 may include a processor 201, a memory 202, a communication interface 203, an input unit 204, a display 205, and a communication bus 206.
The processor 201, the memory 202, the communication interface 203, the input unit 204, and the display 205 communicate with one another through the communication bus 206.
In the embodiments of this application, the processor 201 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a field-programmable gate array (FPGA), or another programmable logic device.
The processor can call the program stored in the memory 202; specifically, the processor can perform the operations performed by the computer device in Fig. 3 to Fig. 6 below.
The memory 202 stores one or more programs. A program may include program code, and the program code includes computer operation instructions. In the embodiments of this application, the memory stores at least a program for implementing the following functions:
sampling, from a demonstration behavior data sequence, a demonstration behavior data segment to serve as a training sample, where the segment contains at least two pieces of demonstration behavior data in chronological order, and each piece of demonstration behavior data contains first state information of each joint of a demonstration object;
setting, according to the demonstration behavior data segment, initial state information of each joint of a target object simulated in a physics simulator, and using a neural network model to be trained to determine force data acting on each joint of the target object, where the target object and the demonstration object have the same joints;
controlling, based on the force data determined by the neural network model for each joint of the target object, the motion of each joint of the target object simulated in the physics simulator, so that the physics simulator, subject to a preset action behavior constraint feature, simulates a simulated behavior data sequence of the target object, where the simulated behavior data sequence contains at least one piece of simulated behavior data in chronological order, each piece of simulated behavior data contains second state information of each joint of the target object, and the action behavior constraint feature specifies the features that the action behavior of the simulated target object must satisfy;
determining, according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data, an action behavior difference degree between the simulated target object and the demonstration object;
optimizing, based on the action behavior difference degree, the behavior control policy expressed by the neural network model until an optimization goal is reached, and taking the behavior control policy expressed by the neural network model as the control policy on which demonstration learning is based.
In one possible implementation, the memory 202 may include a program storage area and a data storage area. The program storage area can store an operating system, the program mentioned above, and application programs required for at least one function (such as sound playback, image playback, and positioning); the data storage area can store data created during use of the computer device, such as audio data and a phone book.
In addition, the memory 202 may include high-speed random access memory and may also include non-volatile memory.
The communication interface 203 may be an interface of a communication module, such as an interface of a GSM module.
The computer device may also include the input unit 204, which may include a touch sensing unit, a keyboard, and the like.
The display 205 includes a display panel, such as a touch display panel.
Of course, the structure shown in Fig. 2 does not limit the computer device in the embodiments of this application; in practice, the computer device may include more or fewer components than shown in Fig. 2, or combine certain components.
The learning method for a behavior control policy of this application is introduced below with reference to flow diagrams.
Fig. 3 shows a flow diagram of one embodiment of the learning method for a behavior control policy of this application. The scheme of this embodiment can be applied to the computer device described above, and the method includes:
S301: Sample, from a demonstration behavior data sequence, a demonstration behavior data segment to serve as a training sample.
The demonstration behavior data sequence contains demonstration behavior data for multiple consecutive moments. The demonstration behavior data segment is a contiguous portion of that sequence; accordingly, the segment contains at least two pieces of demonstration behavior data in chronological order, that is, demonstration behavior data for two adjacent moments. Each piece of demonstration behavior data contains the state information of each joint of the demonstration object.
The state information of a joint characterizes the specific state of that joint. It reflects the motion state of the joint, and the state information of all joints together reflects the action behavior of the demonstration object. For example, the state information of a joint includes one or more state values such as the angle and velocity of the joint. To distinguish it from the state information of joints in the subsequent simulation, the state information of the joints of the demonstration object is called first state information.
The demonstration behavior data sequence can be obtained in many ways. In one possible implementation, after the demonstration object performs the action behavior, the demonstration data captured by a motion capture device can be used as the demonstration behavior data sequence, or the demonstration data can be processed to obtain the sequence. Obtaining the demonstration behavior data sequence in other ways is equally applicable to this embodiment and is not restricted here.
Sampling the demonstration behavior data sequence yields the samples needed to train the behavior control policy. There are many specific ways to sample a demonstration behavior data segment from the sequence as a training sample. For example, one segment can be sampled at random from the sequence each time. Alternatively, multiple segments can be sampled from the sequence at once, with each training period using only one segment to train the neural network.
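The random sampling of one contiguous segment can be sketched as follows. The function name `sample_segment` and its parameters are hypothetical; the only properties taken from the text are that the segment is contiguous and holds at least two pieces of demonstration behavior data.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_segment(sequence: Sequence[T], length: int,
                   rng: random.Random) -> List[T]:
    """Return a random contiguous run of `length` frames from `sequence`."""
    if length < 2:
        raise ValueError("a segment needs at least two frames")
    if length > len(sequence):
        raise ValueError("segment is longer than the sequence")
    # Choose the start so that the whole segment fits inside the sequence.
    start = rng.randrange(len(sequence) - length + 1)
    return list(sequence[start:start + length])
```

Passing an explicit `random.Random` instance keeps the sampling reproducible when a seed is fixed.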
S302: According to the demonstration behavior data segment, set the initial state information of each joint of the target object simulated in the physics simulator, and use the neural network model to be trained to determine the force data acting on each joint of the target object.
The physics simulator, also called a physics engine, is a simulation program for simulating the motion of an agent. In the embodiments of this application, the agent simulated in the physics simulator is the target object, and the physics simulator can simulate the forces on, and the motion of, the target object in real physical space.
The target object is the object that is to learn the behavior skill; in a game application, for example, it may be a game object such as a game character. As described above, the target object has the same joints as the demonstration object.
In the embodiments of this application, the behavior control policy is expressed by the neural network model, so training the neural network model yields the behavior control policy used to control each joint of the target object. The policy can be characterized by the forces on each joint of the target object that the neural network model outputs.
For the target object simulated in the physics simulator to learn the demonstrated behavior corresponding to the demonstration behavior data, the initial state information of each joint of the target object must first be set in the physics simulator based on demonstration behavior data in the segment, so that the initial action behavior of the simulated target object matches the first, or an intermediate, action behavior of the demonstration object in the segment.
Optionally, so that the physics simulator can simulate the target object learning every demonstrated behavior in the segment, the initial state information of each joint of the simulated target object can be set according to the first state information of each joint of the demonstration object in the first piece of demonstration behavior data in the segment. In that case, the state information of each joint of the target object in the physics simulator matches the first state information of the corresponding joint of the demonstration object in the first piece of demonstration behavior data in the segment.
Correspondingly, the first state information of each joint of the demonstration object in the first piece of demonstration behavior data can be input to the neural network model to be trained, to obtain the force data that the model outputs for controlling each joint of the target object. In this application, the neural network model is trained through interaction between the neural network model and the physics simulator; the model therefore needs to predict, from the input demonstration behavior data, the forces on each joint that the target object needs in order to learn the corresponding demonstrated behavior. Because the target object has the same joints as the demonstration object, the force data output by the neural network model can be regarded as the force data for each joint of the target object (that is, the target object simulated in the physics simulator), or equally as the force data corresponding to each joint of the demonstration object.
The force data of a joint can be data describing the force applied to that joint, for example one or more of the magnitude, direction, and duration of the control force applied to the joint.
The neural network model can be chosen as needed; optionally, it can be a deep neural network model.
S303, based on the force data of each joint determined by the neural network model, control the movement of each joint of the target object simulated in the physics simulator, so that the physics simulator simulates a simulated behavior data sequence based on the set action behavior limiting feature.
The simulated behavior data sequence includes at least one piece of simulated behavior data, and each piece of simulated behavior data includes second state information of each joint of the target object.
The state information of each joint of the target object simulated by the physics simulator likewise reflects the action behavior of the simulated target object, for example values such as the angle and velocity of each joint of the simulated target object. For ease of distinction, the state information of the joints of the simulated target object is called the second state information.
It can be understood that, once the initial state information of each joint of the target object in the physics simulator has been determined, inputting the force data of each joint output by the neural network model into the physics simulator lets the simulator apply the corresponding force to each joint of the target object. The simulator thereby simulates how the movement of each joint changes under that force, and produces the resulting state information of each joint of the simulated target object.
It can be understood that each time a force is applied to each joint of the target object simulated in the physics simulator, the state information of each joint of the target object changes once, which yields one piece of simulated behavior data of the target object.
The physics simulator can also interact with the neural network model continuously to simulate multiple pieces of simulated behavior data. For example, depending on the number of pieces of demonstration behavior data in the demonstration behavior data segment, or on actual needs, the physics simulator and the neural network model can be set to interact multiple times: the force data output by the neural network model is updated from the simulated behavior data produced by the physics simulator, and the updated force data is applied again to the target object simulated by the physics simulator. Repeating this process yields a series of simulated behavior data, and thus a simulated behavior data sequence containing at least one piece of simulated behavior data.
In particular, the physics simulator of this application is also provided with an action behavior limiting feature, which defines the characteristics that the action behavior of the simulated target object must satisfy. That is, the physics simulator is configured with additional action behavior requirements that the target object must meet when learning an action behavior.
For example, the action behavior limiting feature can configure the simulated target object to carry a set article while performing the action behavior, for example, the target object needs to carry a box.
As another example, the action behavior limiting feature can define the manner of the action behavior of the simulated target object, for example, the target object needs to change its movement continuously.
As another example, the action behavior limiting feature can require the target object to learn the action behavior while controlling the movement of a specific article.
It can be understood that, when an action behavior limiting feature is set in the physics simulator, the action behavior of the target object finally simulated through the interaction between the neural network model and the physics simulator must satisfy the following principle: on the premise that the simulated action behavior of the target object is as similar as possible to the action behavior corresponding to the demonstration behavior data of the demonstration object, the action behavior of the target object satisfies the action behavior limiting feature.
For example:
Suppose the action behavior demonstrated by the demonstration object is walking, and the purpose of demonstration learning is for the target object to learn the action behavior of walking while carrying an article. In that case, the action behavior limiting feature configured in the physics simulator can be that the target object carries an article.
S304, according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data, determine the action behavior difference degree between the simulated target object and the demonstration object.
The action behavior difference degree reflects the overall difference between the first state information of each joint of the demonstration object and the second state information of each joint of the simulated target object. This overall difference is, in effect, the difference between the action behavior of the simulated target object and the action behavior of the demonstration object.
The specific way of determining the action behavior difference degree can be set as needed. For example, each piece of simulated behavior data produced by the physics simulator is learned from the first state information of each joint in the corresponding piece of demonstration behavior data in the demonstration behavior data segment. Corresponding pairs of simulated behavior data and demonstration behavior data can therefore be determined from the order of the simulated behavior data in the simulated behavior data sequence and the order of the demonstration behavior data in the demonstration behavior data segment. For each such pair, the state difference value between each joint of the target object and the corresponding joint of the demonstration object can be calculated from the second state value of each joint of the target object in the simulated behavior data and the first state value of each joint of the demonstration object in the demonstration behavior data, for example as the Euclidean distance between the state information of the corresponding joints. The action behavior difference degree can then be determined from the average of all the state difference values.
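The averaging of per-joint Euclidean distances described above can be sketched as follows. This is a minimal illustration only: the names `joint_distance` and `action_behavior_difference`, and the representation of a joint state as a tuple such as (angle, velocity), are assumptions made for the example and are not taken from the application.

```python
import math
from typing import Sequence

JointState = Sequence[float]   # assumed layout, e.g. (angle, velocity) of one joint
Frame = Sequence[JointState]   # one piece of behavior data: one state per joint


def joint_distance(demo_joint: JointState, sim_joint: JointState) -> float:
    """Euclidean distance between the states of two corresponding joints."""
    return math.sqrt(sum((d - s) ** 2 for d, s in zip(demo_joint, sim_joint)))


def action_behavior_difference(demo_frames: Sequence[Frame],
                               sim_frames: Sequence[Frame]) -> float:
    """Average state difference over all paired frames and all joints.

    Frames are paired by their position in the two sequences.
    """
    distances = []
    for demo, sim in zip(demo_frames, sim_frames):
        if len(demo) != len(sim):
            raise ValueError("paired frames must describe the same joints")
        for demo_joint, sim_joint in zip(demo, sim):
            distances.append(joint_distance(demo_joint, sim_joint))
    if not distances:
        raise ValueError("at least one pair of frames is required")
    return sum(distances) / len(distances)
```

A larger return value means the simulated action behavior is further from the demonstrated one; a value of zero means every paired joint state is identical.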
S305, based on the action behavior difference degree, optimize the behavior control strategy expressed by the neural network model until the optimization target is reached, and determine the behavior control strategy expressed by the neural network model as the control strategy on which demonstration learning is based.
It can be understood that the action behavior difference degree reflects how much the action behavior of the simulated target object differs from the action behavior demonstrated by the demonstration object, so it can be used as the basis for optimizing the neural network model.
Optimizing the behavior control strategy expressed by the neural network model essentially means adjusting the internal parameters of the neural network model, so as to change the behavior control strategy that it expresses.
Optionally, demonstration learning can be combined with a reinforcement learning algorithm. Correspondingly, a reward signal can be determined according to the action behavior difference degree in combination with the reinforcement learning algorithm, and the internal parameters of the neural network model can be adjusted according to the reward signal.
It can be understood that the optimization target can be set as needed. Reaching the optimization target indicates that the demonstration behavior data has been learned, and that the similarity between the demonstration behavior of the demonstration object and the action behavior of the target object simulated in the physics simulator under the behavior control strategy output by the neural network model meets the requirement. For example, in an optional implementation, the optimization target can be that the action behavior difference degree reaches a minimum, that is, the current action behavior difference degree is smaller than every action behavior difference degree determined before the current time. The optimization target can also be that the change in the determined action behavior difference degree is smaller than a set value.
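The two optional optimization targets can be written as simple checks on the history of difference degrees. The sketch below is illustrative only: the function names and the idea of keeping the difference degrees in a list named `history` are assumptions, not part of the application.

```python
from typing import Sequence


def is_new_minimum(history: Sequence[float]) -> bool:
    """True when the latest difference degree is below every earlier one."""
    if len(history) < 2:
        return False
    return history[-1] < min(history[:-1])


def change_below_threshold(history: Sequence[float], threshold: float) -> bool:
    """True when the latest change in the difference degree is smaller than `threshold`."""
    if len(history) < 2:
        return False
    return abs(history[-1] - history[-2]) < threshold
```

Either check could serve as the stopping test of the training loop; which one is appropriate depends on how the optimization target is set.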
If it is determined from the currently determined action behavior difference degree that the optimization target has not yet been reached, the behavior control strategy expressed by the neural network model needs to be optimized based on the action behavior difference degree, and the neural network model needs to continue to be trained with sampled training samples. For example, if multiple demonstration behavior data segments were sampled in step S301, a segment that has not yet been used for training can be selected and steps S302 to S305 executed again. Alternatively, if only one demonstration behavior data segment is sampled from the demonstration behavior data sequence at a time as the training sample, the method can return to step S301, sample another demonstration behavior data segment from the demonstration behavior data sequence as the training sample, and continue with steps S302 to S305 until the optimization target is reached.
Correspondingly, if it is determined that the optimization target has been reached, learning (in other words, training) can end, and the trained neural network model can be used as the behavior control strategy of the target object in a real scenario.
Optionally, after the neural network model is obtained through training, it can also be loaded into a target application, so that the action behavior of the target object controlled by the target application is governed by the behavior control strategy expressed by the neural network model. The target application is used to control the operation of the target object; that is, it is the control program of the target object in an actual application scenario, not the control program of the target object simulated in the simulation environment.
For example, taking demonstration learning in the game field, after the neural network model is obtained through training, it can be loaded into a game application so that the action behavior of a game object in the game application is controlled based on the neural network model. For example, the current action behavior of the game object is input into the neural network model, and the movement of each joint of the game object is controlled based on the force data of each joint output by the neural network model, so that the game object performs an action behavior that is similar to that of the demonstration object and satisfies the action behavior limiting feature.
As can be seen from the above technical solution, in this application the behavior control strategy required for demonstration learning is expressed by a neural network model, and the behavior control strategy expressed by the neural network model is trained through the cooperation of the neural network model and the physics simulator. Moreover, during training, in addition to the demonstration behavior data, an action behavior limiting feature corresponding to the object that is to learn the behavior skill is set in the physics simulator. The action behavior limiting feature defines the requirements that the behavior of the target object simulated in the physics simulator must satisfy, so that the behavior control strategy expressed by the trained neural network model enables the target object to produce other action behaviors that are as similar as possible to the demonstration behavior data while also satisfying the set action behavior limiting feature.
It follows that, when the action learning of the target object is controlled with the trained neural network model, the target object can learn action behaviors that are similar to, but not exactly the same as, the action behavior corresponding to the demonstration behavior data, that is, other similar action behaviors extended from it. The target object can thus learn, from the demonstration behavior data, action behaviors that differ from the demonstrated behavior. Even when no demonstration behavior data exists for a certain action behavior, a behavior control strategy for that action behavior can still be obtained, and the target object can be controlled with it to learn an action behavior that is similar to, but different from, the demonstration behavior, which helps reduce the complexity of demonstration learning.
For ease of understanding, the solution of this application is described below taking as an example the process of training the neural network model by deep reinforcement learning. In this case, deep reinforcement learning is combined with demonstration learning, and the action behavior limiting feature is set according to the specific task requirements of the target object that is to learn the behavior skill, so that training yields an action behavior that lets the target object learn the action form of the demonstration object while meeting the particular requirements.
Fig. 4 shows a schematic flowchart of another embodiment of the learning method of a behavior control strategy of this application. This embodiment likewise applies to the above computer equipment, and the method of this embodiment may include:
S401, randomly sample one demonstration behavior data segment from the obtained demonstration behavior data sequence.
For example, the demonstration behavior data within one continuous time period is randomly selected as the demonstration behavior data segment, which includes the demonstration behavior data of the demonstration object at at least two consecutive moments. The demonstration behavior data likewise includes the first state value of each joint of the demonstration object.
It can be understood that step S401 is described taking the sampling of one demonstration behavior data segment as the training sample as an example; other sampling situations are equally applicable to this embodiment.
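The random sampling in step S401 can be sketched as drawing a contiguous slice of the demonstration sequence. The function name `sample_segment` and the `min_len`/`max_len` bounds are hypothetical choices for this illustration; the application only requires that the segment contain at least two consecutive pieces of demonstration behavior data.

```python
import random
from typing import List, Optional, Sequence


def sample_segment(sequence: Sequence, min_len: int = 2, max_len: int = 5,
                   rng: Optional[random.Random] = None) -> List:
    """Return a random contiguous slice holding at least `min_len` frames."""
    if min_len < 2:
        raise ValueError("a segment needs at least two consecutive frames")
    if len(sequence) < min_len:
        raise ValueError("sequence is shorter than the minimum segment length")
    rng = rng or random.Random()
    # Pick the segment length first, then a start index that keeps it in range.
    length = rng.randint(min_len, min(max_len, len(sequence)))
    start = rng.randint(0, len(sequence) - length)
    return list(sequence[start:start + length])
```

Passing a seeded `random.Random` makes the sampling reproducible, which can help when comparing training runs.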
S402, according to the first state information of each joint of the demonstration object in the first demonstration behavior data of the demonstration behavior data segment, set the initial state information of each joint of the target object simulated in the physics simulator.
For example, the initial state information of each joint of the target object in the physics simulator is made consistent with the first state information of the corresponding joint of the demonstration object in the first demonstration behavior data. This sets the initial state of the target object in the physics simulator, so that the physics simulator can subsequently simulate the target object learning the demonstration actions corresponding to the second and subsequent demonstration behavior data in the segment.
S403, input the first state information of each joint of the demonstration object in the first demonstration behavior data to the neural network model to be trained, to obtain the force data, output by the neural network model, for controlling each joint of the target object.
S404, based on the initial state information of each joint of the target object simulated in the physics simulator, and according to the force data of each joint of the target object determined by the neural network model, apply a force to each joint of the target object simulated in the physics simulator, so that the physics simulator simulates one piece of simulated behavior data of the target object based on the set action behavior limiting feature.
It can be understood that, once the initial state information of each joint of the target object in the physics simulator has been determined, applying a force to each joint of the target object changes the state of each joint once and yields one piece of simulated behavior data, which includes the second state information of each joint of the target object.
It can be understood that the simulated behavior data produced in step S404 is the state information of each joint of the target object simulated, from the initial state of each joint, under the force output by the neural network model. This simulated behavior data therefore characterizes the action behavior that the target object in the physics simulator learns from the second demonstration behavior data in the demonstration behavior segment.
S405, detect whether the total number of pieces of simulated behavior data satisfies a set condition. If so, confirm that a simulated behavior data sequence containing at least one piece of simulated behavior data has been obtained, and execute step S408; if not, execute step S406.
The set condition can be chosen as needed. For example, if demonstration learning is to be performed for a set number of pieces of demonstration behavior data in the demonstration behavior data segment, the set condition can be that the total number reaches that set number.
Optionally, the physics simulator can be set to simulate the target object learning the demonstration actions corresponding to all demonstration behavior data in the demonstration behavior data segment. The set condition can then be that the total number of pieces of simulated behavior data is consistent with the number of pieces of demonstration behavior data in the segment, or exceeds that number. It should be noted that if the initial state information of each joint of the target object in the physics simulator is also counted as one piece of simulated behavior data produced by the physics simulator, the set condition can be that the total number of pieces of simulated behavior data is consistent with the number of pieces of demonstration behavior data in the segment. If the initial state information is not counted as a piece of simulated behavior data, the total number of pieces of simulated behavior data only needs to equal the number of pieces of demonstration behavior data in the segment minus 1.
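The counting rule for the set condition can be made concrete with a short sketch. The function names and the boolean flag `count_initial_state` are hypothetical and serve only to illustrate the two cases described above.

```python
def required_simulated_count(segment_length: int, count_initial_state: bool) -> int:
    """Number of simulated frames needed to cover a demonstration segment.

    If the initial state is itself counted as a simulated frame, the count
    equals the segment length; otherwise it is one less.
    """
    if segment_length < 2:
        raise ValueError("a segment holds at least two pieces of demonstration data")
    return segment_length if count_initial_state else segment_length - 1


def set_condition_met(simulated_count: int, segment_length: int,
                      count_initial_state: bool) -> bool:
    """The check of step S405: have enough frames been simulated?"""
    return simulated_count >= required_simulated_count(segment_length,
                                                       count_initial_state)
```

For a segment of five pieces of demonstration behavior data, the required count is five when the initial state is counted and four when it is not.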
It can be understood that if the total number of pieces of simulated behavior data satisfies the set condition, the at least one piece of simulated behavior data that has been simulated is determined as the simulated behavior data sequence. If the initial state information of each joint of the target object in the physics simulator is also counted as one piece of simulated behavior data, the simulated behavior data sequence includes at least two pieces of simulated behavior data.
S406, input the simulated behavior data of the target object most recently simulated by the physics simulator to the neural network model, to obtain updated force data of each joint of the target object.
S407, according to the updated force data of each joint of the target object, apply a force to each joint of the target object simulated in the physics simulator, so that the physics simulator simulates simulated behavior data of the target object based on the set action behavior limiting feature, and return to step S405 until the total number of pieces of simulated behavior data satisfies the set condition.
In steps S406 and S407, the neural network model updates the force data to be applied to each joint of the target object based on the simulated behavior data produced by the physics simulator, and controls the physics simulator to continue simulating the movement of each joint of the target object until multiple pieces of simulated behavior data are obtained.
For example, suppose the demonstration behavior data segment includes 5 consecutive pieces of demonstration behavior data. The initial state information of each joint of the target object in the physics simulator is set based on the first demonstration behavior data in the segment, which gives the physics simulator the first piece of simulated behavior data. The physics simulator then simulates, through step S404, the second piece of simulated behavior data, corresponding to the second demonstration behavior data. Repeating steps S406 and S407 three times then yields the third to fifth pieces of simulated behavior data, corresponding to the third to fifth demonstration behavior data in the segment, so that a simulated behavior data sequence containing five pieces of simulated behavior data is obtained.
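The interaction of steps S402 to S407 can be sketched as a rollout loop. Everything named here is a hypothetical stand-in: `ToySimulator` models each joint as a unit-inertia body with an (angle, velocity) state, `constant_policy` replaces the neural network model, and the action behavior limiting feature is not modeled. The initial state is counted as the first simulated frame, as in the five-frame example above.

```python
from typing import Callable, List, Sequence, Tuple

State = List[Tuple[float, float]]  # assumed layout: (angle, velocity) per joint


class ToySimulator:
    """Hypothetical stand-in for the physics simulator (unit-inertia joints)."""

    def __init__(self, dt: float = 0.1) -> None:
        self.dt = dt
        self.state: State = []

    def reset(self, initial_state: Sequence[Tuple[float, float]]) -> State:
        # S402: copy the first demonstration frame into the simulator.
        self.state = list(initial_state)
        return self.state

    def step(self, forces: Sequence[float]) -> State:
        # S404 / S407: apply one force per joint and advance one time step.
        next_state = []
        for (angle, velocity), force in zip(self.state, forces):
            velocity = velocity + force * self.dt
            angle = angle + velocity * self.dt
            next_state.append((angle, velocity))
        self.state = next_state
        return self.state


def rollout(sim: ToySimulator, policy: Callable[[State], List[float]],
            demo_segment: Sequence[State]) -> List[State]:
    """Simulate as many frames as the demonstration segment contains."""
    frames = [sim.reset(demo_segment[0])]   # initial state is frame 1
    forces = policy(list(demo_segment[0]))  # S403: first force data
    while True:
        frames.append(sim.step(forces))
        if len(frames) >= len(demo_segment):  # S405: set condition
            return frames
        forces = policy(frames[-1])           # S406: updated force data


def constant_policy(state: State) -> List[float]:
    """Placeholder for the neural network model: the same force on every joint."""
    return [1.0 for _ in state]


demo_segment = [[(0.0, 0.0)] for _ in range(5)]
frames = rollout(ToySimulator(dt=0.1), constant_policy, demo_segment)
```

With a five-frame segment the loop runs step S404 once and steps S406 and S407 three times, matching the example above.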
S408, according to the at least two pieces of demonstration behavior data in the demonstration behavior data segment and the at least one piece of simulated behavior data in the simulated behavior data sequence, determine the action behavior difference degree between the simulated target object and the demonstration object.
It can be understood that, because the first demonstration behavior data in the demonstration behavior data segment is consistent with the initial state information of each joint of the target object in the physics simulator, it is sufficient to compare the demonstration behavior data after the first demonstration behavior data in the segment with the simulated behavior data simulated after the initial state information of each joint was set in the physics simulator.
Of course, if the initial state information of each joint of the target object in the physics simulator is also counted as one piece of simulated behavior data, the physics simulator outputs at least two pieces of simulated behavior data, and the demonstration behavior data and simulated behavior data at corresponding positions in their respective sequences can be compared in turn.
S409, according to the action behavior difference degree and the action behavior difference degrees determined before the current time, detect whether the action behavior difference degree has reached a convergence state. If not, execute step S410; if so, end training.
The convergence state can be understood as a convergence state conventionally set in reinforcement learning, such as the optimization targets described above, which are not repeated here.
S410, determine a reward signal according to the action behavior difference degree.
It can be understood that reinforcement learning trains an agent using a high-fidelity physics engine and reward signals. During training, the agent interacts continuously with the physics engine using its existing strategy and generates a series of reward signals, which are used to update the strategy. In this embodiment, the strategy is expressed by the neural network model and the agent is the target object simulated in the physics engine. A reward signal for updating the strategy in the neural network model can therefore be determined according to the action behavior difference degree.
The larger the action behavior difference degree, the smaller the reward signal; conversely, the smaller the action behavior difference degree, the larger the reward signal.
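One simple mapping with this inverse relationship is a decaying exponential. The application does not specify the form of the mapping; the exponential and the `scale` parameter below are assumptions chosen only because they decrease monotonically and stay within (0, 1].

```python
import math


def reward_from_difference(difference: float, scale: float = 1.0) -> float:
    """Map a non-negative difference degree to a reward in (0, 1].

    The reward is 1 when the behaviors match exactly and shrinks toward 0
    as the difference degree grows.
    """
    if difference < 0:
        raise ValueError("the difference degree cannot be negative")
    return math.exp(-scale * difference)
```

A larger `scale` makes the reward fall off faster, which penalizes small deviations from the demonstration more strongly.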
S411, adjust the internal parameters of the neural network model according to the reward signal, so as to change the behavior control strategy expressed by the neural network model, and return to step S401 to resample a demonstration behavior data segment.
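The loop formed by steps S401 to S411 can be illustrated end to end with a deliberately tiny example. It is not the training procedure of the application: the "policy" is a single constant-force parameter for one joint, the simulator is a toy unit-inertia model, and a random-search hill climb stands in for the reinforcement learning update. All names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Frame = Tuple[float, float]  # assumed layout: (angle, velocity) of a single joint


def step(frame: Frame, force: float, dt: float = 0.1) -> Frame:
    """Toy physics: advance a unit-inertia joint by one time step."""
    angle, velocity = frame
    velocity += force * dt
    angle += velocity * dt
    return (angle, velocity)


def rollout(first: Frame, force: float, length: int) -> List[Frame]:
    """Simulate `length` frames, counting the initial state as the first."""
    frames = [first]
    while len(frames) < length:
        frames.append(step(frames[-1], force))
    return frames


def difference(demo: List[Frame], sim: List[Frame]) -> float:
    """Mean Euclidean distance between paired frames (S408)."""
    return sum(math.hypot(d[0] - s[0], d[1] - s[1])
               for d, s in zip(demo, sim)) / len(demo)


def reward(diff: float) -> float:
    """Larger difference degree, smaller reward (S410)."""
    return math.exp(-diff)


def train(demo_sequence: List[Frame], iterations: int = 200,
          segment_length: int = 5, seed: int = 0) -> float:
    rng = random.Random(seed)
    weight = 0.0  # the only "internal parameter" of the toy policy
    for _ in range(iterations):
        start = rng.randint(0, len(demo_sequence) - segment_length)  # S401
        segment = demo_sequence[start:start + segment_length]
        candidate = weight + rng.gauss(0.0, 0.5)
        current = reward(difference(
            segment, rollout(segment[0], weight, segment_length)))
        proposed = reward(difference(
            segment, rollout(segment[0], candidate, segment_length)))
        if proposed > current:  # S411: keep the parameter with the higher reward
            weight = candidate
    return weight


# The demonstration is generated by the same toy physics with force 2.0.
demo_sequence = rollout((0.0, 0.0), 2.0, 30)
learned_force = train(demo_sequence)
```

Because a candidate parameter is kept only when it earns a higher reward on the sampled segment, the learned force moves toward the value that produced the demonstration.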
It can be understood that the goal of continuously optimizing the neural network model is for the simulated target object to produce an action behavior that is as similar as possible to the demonstration data. This optimization problem can be expressed as follows:
$$\min_{\tau}\ |\tau - \tau_E| \quad \text{subject to} \quad h(\tau) \le 0,\ g(\tau) = 0$$
Here, τ_E is the demonstration behavior data, and τ is the simulated behavior data of the simulated target object obtained by the final optimization; this simulated behavior data includes the second state information of each joint of the simulated target object. h(τ) ≤ 0 and g(τ) = 0 represent two ways of setting different action behavior limiting features. For example, h(τ) ≤ 0 can express an action feature that is available only while a certain condition is not exceeded, and g(τ) = 0 can express an action behavior feature that can be performed only when a certain condition holds with equality.
It follows that the essence of the optimization problem is to generate optimized data, namely τ, that satisfies the action behavior limiting feature and is as similar as possible to the demonstration behavior data.
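One common way to fold constraints of the form h(τ) ≤ 0 and g(τ) = 0 into a single score is a penalty term. This is an illustrative alternative, not the mechanism of the application, in which the limiting feature is set inside the physics simulator itself. The function name and `penalty_weight` are assumptions.

```python
def penalized_difference(difference: float, h_value: float, g_value: float,
                         penalty_weight: float = 10.0) -> float:
    """Difference degree plus a penalty for violating the two constraints.

    h_value is the value of h(tau); only positive values violate h(tau) <= 0.
    g_value is the value of g(tau); any non-zero value violates g(tau) = 0.
    """
    inequality_violation = max(0.0, h_value)
    equality_violation = abs(g_value)
    return difference + penalty_weight * (inequality_violation + equality_violation)
```

When both constraints hold, the score equals the plain difference degree, so minimizing it reduces to the original objective on the feasible set.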
Correspondingly, a reward function is defined with the generated optimized τ as the learning target. After a large number of simulations in the physics simulator, the determined reward signals can be used to update the neural network model that expresses the behavior control strategy.
For a more intuitive understanding of the learning method of a behavior control strategy of this application, refer to Fig. 5, which shows a schematic block diagram of the implementation principle of the method of this application.
As can be seen from Fig. 5, after demonstration behavior data is sampled from the demonstration behavior data sequence, it can be input into the neural network model. Based on the demonstration behavior data, the neural network model outputs the force data for each joint of the target object simulated in the physics simulator, so that the physics simulator can simulate the behavior of the target object based on the action behavior limiting feature and output the simulated behavior data of the simulated target object. The simulated behavior data includes the state information of each joint of the simulated target object. By comparing the simulated behavior data with the sampled demonstration behavior data, the behavior difference degree between the demonstration object and the target object can be determined. The neural network model can then be optimized based on the behavior difference degree until convergence is reached, so that the simulated behavior data output by the physics simulator approaches the corresponding demonstration behavior data and the action behavior that it characterizes satisfies the action behavior limiting feature.
To make the benefits of the solution of this application easier to understand, it is described below with reference to an application scenario.
Take demonstration learning of a game character in a game application as an example, and suppose the game character needs to produce walking while carrying an article based on the walking motion demonstrated by a real user. In that case, the learning method of the behavior control strategy of this embodiment is shown in Fig. 6. The method can be applied to computer equipment and may include:
S601, obtain the demonstration data sequence of the walking motion demonstrated by a real user.
This embodiment takes as an example demonstration-based learning in which the game character in the game application learns the behavior of a real user. The demonstration data sequence is therefore the data of the walking motion demonstrated by the real user. Specifically, the demonstration data sequence includes the first state values of each joint of the real user at multiple different moments.
It can be understood that this embodiment takes as an example a game character that needs to learn the walking motion of a real user. If the movement to be learned by the game character is some other movement, it is only necessary to obtain the demonstration data sequence of the corresponding action demonstrated by a real user or by a demonstration object that has the same joints as the game character. For example, if the game character needs to learn a somersault, the demonstration data sequence only needs to be replaced with the demonstration data sequence of a somersault demonstrated by a demonstration object such as a real user.
S602, randomly sample one demonstration data segment from the demonstration data sequence.
S603, according to the first demonstration data in the demonstration data segment, set the initial state information of each joint of the game character in the physics simulator, to obtain the first piece of simulated behavior data of the game character in the physics simulator.
Steps S602 and S603 are again described taking one way of sampling training samples as an example; other sampling methods are equally applicable to this embodiment.
S604, input the first demonstration data to the neural network model to be trained, to obtain the force data, output by the neural network model, of each joint of the game character to be simulated.
S605, according to the force data of each joint of the game character output by the neural network model, control the movement of each joint of the game character simulated in the physics simulator, so that the physics simulator, based on the set feature of walking while carrying an article, simulates the second state information of each joint of the game character while the game character walks carrying the article, and obtains the second piece of simulated behavior data of the simulated game character.
It can be understood that, because the game character needs to learn walking while carrying an article as an extension of the walking motion demonstrated by the real user, the action behavior limiting feature configured in the physics simulator is the feature that the target object walks while carrying an article (such as a box). Correspondingly, the physics simulator can simulate the process of the game character walking while carrying the article according to the force data of each joint input from the neural network model, and output the resulting simulated behavior data of the game character walking while carrying the article. The simulated behavior data includes the second state information of each joint of the game character.
It can be understood that, if other walking-related action behaviors need to be extended from the real user's walking motion, the action behavior constraint feature configured in the physics simulator will differ accordingly. For example, if the game character needs to learn, from the real user's normal walking motion, the skill of walking while continually changing posture, the action behavior constraint feature configured in the physics simulator can be: the game character's walking postures at adjacent moments are different. Of course, this embodiment takes learning a walking motion as the example scenario; if the action behavior to be learned is something else, the action behavior constraint feature in the physics simulator can be set according to the action behavior demonstrated by the demonstration object and the specific action behavior that the game character needs to extend.
S606, input the simulated behavior data of the game character most recently produced by the physics simulator into the neural network model to obtain updated force data for each joint of the game character, and apply forces to each joint of the game character simulated in the physics simulator according to the updated force data, so that the physics simulator, based on the configured action behavior constraint feature, simulates further simulated behavior data of the game character. Repeat step S606 until the total number of simulated behavior data equals the total number of demonstration behavior data in the demonstration behavior data segment.
For step S606, refer to the related description in the preceding embodiments; details are not repeated here.
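The closed loop of steps S604 to S606 can be sketched as a rollout that feeds each newly simulated frame back into the policy until the number of simulated frames matches the number of demonstration frames. Everything below is a hypothetical stand-in: `simulate_step` is a toy unit-inertia joint model rather than a real physics simulator, the constant `load` is one assumed way to represent a configured constraint feature such as carrying an item, and `pd_policy` is a hand-written function used in place of a neural network.

```python
from typing import Callable, List

State = List[float]  # hypothetical layout: [angle_0, vel_0, angle_1, vel_1, ...]
Policy = Callable[[State], List[float]]


def simulate_step(state: State, forces: List[float], load: float, dt: float = 0.1) -> State:
    """Toy stand-in for one physics step: unit-inertia joints with a constant load."""
    nxt: State = []
    for j, force in enumerate(forces):
        angle, vel = state[2 * j], state[2 * j + 1]
        vel = vel + (force - load) * dt   # the load models the configured feature
        angle = angle + vel * dt
        nxt += [angle, vel]
    return nxt


def rollout(policy: Policy, initial_state: State, num_frames: int, load: float) -> List[State]:
    """Produce num_frames simulated frames, feeding each new frame back to the policy."""
    frames = [list(initial_state)]        # first simulated frame (S603)
    while len(frames) < num_frames:       # stop once the counts match (S606)
        forces = policy(frames[-1])       # S604 on the first pass, S606 afterwards
        frames.append(simulate_step(frames[-1], forces, load))
    return frames


def pd_policy(state: State) -> List[float]:
    """Hypothetical hand-written policy: drive every joint angle toward 1.0."""
    return [4.0 * (1.0 - state[2 * j]) - 2.0 * state[2 * j + 1]
            for j in range(len(state) // 2)]


frames = rollout(pd_policy, initial_state=[0.0, 0.0, 0.5, 0.0], num_frames=5, load=0.2)
```

In a real setup the policy would be the neural network model under training and `simulate_step` would be a call into the physics simulator.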
S607, according to each demonstration behavior data in the demonstration behavior data segment and each simulated behavior data in the simulated behavior data sequence, determine the action behavior difference degree between the simulated game character and the real user.
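One simple way to compute such a difference degree is a mean squared error over all joints and frames. The choice of metric here is an assumption made for illustration; step S607 as written does not fix a particular formula, and the function name `difference_degree` is hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]  # hypothetical layout: one state value per joint


def difference_degree(demo: Sequence[Frame], sim: Sequence[Frame]) -> float:
    """Mean squared difference between demonstrated and simulated joint states."""
    if len(demo) != len(sim) or not demo:
        raise ValueError("both sequences need the same, non-zero number of frames")
    total, count = 0.0, 0
    for demo_frame, sim_frame in zip(demo, sim):
        if len(demo_frame) != len(sim_frame):
            raise ValueError("frames must describe the same joints")
        for d, s in zip(demo_frame, sim_frame):
            total += (d - s) ** 2
            count += 1
    return total / count


demo_frames = [[0.0, 1.0], [0.5, 1.5]]
sim_frames = [[0.0, 1.0], [1.0, 1.0]]
diff = difference_degree(demo_frames, sim_frames)  # (0 + 0 + 0.25 + 0.25) / 4 = 0.125
```

The length check mirrors the stopping condition of step S606, which ensures the two sequences contain the same number of frames before they are compared.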
S608, according to the action behavior difference degree and the action behavior difference degrees determined before the current moment, detect whether the currently determined action behavior difference degree has reached its minimum; if not, execute step S609; if so, end training.
This embodiment takes as an example the case where the optimization target is that the action behavior difference degree reaches its minimum; other optimization targets are equally applicable to this embodiment.
S609, determine a reward signal according to the action behavior difference degree.
S610, according to the reward signal, adjust the internal parameters of the neural network model so as to change the behavior control strategy expressed by the neural network model, and return to step S602 to resample a demonstration data segment and, from it, the first demonstration data used as the training sample.
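Steps S607 to S610 together form an optimization loop, which the following minimal sketch illustrates. It rests on several assumptions that are not taken from the application: the reward is derived as exp(-difference), the "model" is a single gain rather than a neural network, and the parameter update is a random-perturbation hill climb used as a simple stand-in for a reinforcement learning algorithm. For brevity the sketch also compares against the whole toy demonstration on every iteration instead of resampling a segment.

```python
import math
import random
from typing import List, Tuple


def reward_from_difference(diff: float) -> float:
    """Hypothetical mapping: reward in (0, 1], equal to 1 when the motions match."""
    return math.exp(-diff)


def rollout(gain: float, start: float, steps: int) -> List[float]:
    """Toy one-joint simulation: the policy is one gain pulling the angle toward 1.0."""
    angles = [start]
    while len(angles) < steps:
        force = gain * (1.0 - angles[-1])
        angles.append(angles[-1] + 0.1 * force)
    return angles


def difference(demo: List[float], sim: List[float]) -> float:
    """Mean squared difference between two equally long angle sequences."""
    return sum((d - s) ** 2 for d, s in zip(demo, sim)) / len(demo)


def train(demo: List[float], iterations: int, rng: random.Random) -> Tuple[float, float]:
    """Keep a perturbed parameter only when it raises the reward (stand-in for RL)."""
    gain = 0.0
    best = reward_from_difference(difference(demo, rollout(gain, demo[0], len(demo))))
    for _ in range(iterations):
        candidate = gain + rng.gauss(0.0, 0.5)
        sim = rollout(candidate, demo[0], len(demo))
        reward = reward_from_difference(difference(demo, sim))
        if reward > best:               # adjust the parameter (S610)
            gain, best = candidate, reward
    return gain, best


demo = rollout(3.0, 0.0, 8)             # pretend this came from a real demonstrator
gain, reward = train(demo, iterations=200, rng=random.Random(0))
```

Because a candidate is accepted only when the reward improves, the final reward can never be lower than the reward of the untrained parameter; an actual reinforcement learning algorithm would not give that guarantee on every step.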
It can be understood that, once the optimization target is confirmed to have been reached, training of the neural network model is complete. On this basis, the action behavior of the game character in the game application can be controlled according to the control strategy expressed by the neural network model, so that the game character learns the action behavior of walking while carrying an item.
Specifically, the trained neural network model can be loaded into the game application, in which the game character carries an item. In that case, the game application can obtain the state information of each joint of the game character and input it into the neural network model; the game application can then control the movement of each joint of the game character based on the force for each joint output by the neural network model, so that the game character produces the motion of walking while carrying an item.
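The deployment step described above amounts to a per-frame control loop: read the joint states, query the model, apply the returned forces. The sketch below is illustrative only; `GameCharacter`, `control_step`, and `trained_model` are hypothetical names, and the placeholder function stands in for a loaded neural network.

```python
from typing import Callable, List, Sequence


class GameCharacter:
    """Hypothetical stand-in for a character inside a game application."""

    def __init__(self, joint_angles: Sequence[float]) -> None:
        self.joint_angles = list(joint_angles)

    def joint_state(self) -> List[float]:
        """Return the current state information of every joint."""
        return list(self.joint_angles)

    def apply_forces(self, forces: Sequence[float], dt: float = 0.1) -> None:
        """Toy dynamics: each force directly changes the matching joint angle."""
        self.joint_angles = [a + f * dt for a, f in zip(self.joint_angles, forces)]


def control_step(character: GameCharacter,
                 model: Callable[[List[float]], List[float]]) -> None:
    """One control tick: joint states in, per-joint forces out, forces applied."""
    forces = model(character.joint_state())
    character.apply_forces(forces)


def trained_model(state: List[float]) -> List[float]:
    """Placeholder for the loaded network: pull every joint toward angle 1.0."""
    return [2.0 * (1.0 - angle) for angle in state]


character = GameCharacter([0.0, 0.5])
for _ in range(50):
    control_step(character, trained_model)
```

Keeping the model behind a plain callable means the game loop does not need to know whether the forces come from a neural network or from any other controller.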
As this embodiment shows, the scheme of this application can use the walking motion of a real user to train a neural network model corresponding to the behavior control strategy needed to control the game character's item-carrying walking. The game character in the game application can then be controlled based on the neural network model, so that the game character learns a motion skill that is similar to the walking motion demonstrated by the real user and that extends it to walking while carrying an item.
Tests show that the scheme of this application enables the game character to acquire optimal item-carrying walking behavior and to walk stably over long periods, an effect that existing schemes cannot achieve.
Corresponding to the learning method for a behavior control strategy of this application, this application further provides a learning device for a behavior control strategy.
FIG. 7 shows a schematic structural diagram of one embodiment of a learning device for a behavior control strategy of this application. The device of this embodiment may include:
Data sampling unit 701, configured to sample, from a demonstration behavior data sequence, a demonstration behavior data segment serving as a training sample, the demonstration behavior data segment including at least two demonstration behavior data in sequential order, and the demonstration behavior data including first state information of each joint of a demonstration object;
Model control unit 702, configured to set, according to the demonstration behavior data segment, initial state information of each joint of a target object simulated in a physics simulator, and to determine, using a neural network model to be trained, force data acting on each joint of the target object, the target object and the demonstration object having the same joints;
Data simulation unit 703, configured to control, based on the force data of each joint of the target object determined by the neural network model, the movement of each joint of the target object simulated in the physics simulator, so that the physics simulator, based on a configured action behavior constraint feature, simulates a simulated behavior data sequence of the target object, the simulated behavior data sequence including at least one simulated behavior data in sequential order, the simulated behavior data including second state information of each joint of the target object, and the action behavior constraint feature being used to define the features that the action behavior of the simulated target object must satisfy;
Difference comparing unit 704, configured to determine an action behavior difference degree between the simulated target object and the demonstration object according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data;
Training optimization unit 705, configured to optimize, based on the action behavior difference degree, the behavior control strategy expressed by the neural network model until an optimization target is reached, and to determine the behavior control strategy expressed by the neural network model as the control strategy on which the demonstration learning is based.
In one possible implementation, the training optimization unit includes:
Detection subunit, configured to detect whether the action behavior difference degree reaches the set optimization target;
Loop training subunit, configured to, if the action behavior difference degree does not reach the set optimization target, optimize the behavior control strategy expressed by the neural network model based on the action behavior difference degree, and return to the operation of the data sampling unit;
End control subunit, configured to, if the action behavior difference degree reaches the set optimization target, confirm that learning is complete, and determine the behavior control strategy expressed by the neural network model as the control strategy on which the demonstration learning is based.
Optionally, when optimizing the behavior control strategy expressed by the neural network model based on the action behavior difference degree, the training optimization unit or the loop training subunit is specifically configured to: determine a reward signal according to the action behavior difference degree and based on a reinforcement learning algorithm; and adjust the internal parameters of the neural network model according to the reward signal, so as to change the behavior control strategy expressed by the neural network model.
In one possible implementation, the model control unit includes:
Simulation initialization unit, configured to set the initial state information of each joint of the target object simulated in the physics simulator according to the first state information of each joint of the demonstration object in the first demonstration behavior data of the demonstration behavior data segment;
Initial force determination unit, configured to input the first state information of each joint of the demonstration object in the first demonstration behavior data into the neural network model to be trained, and obtain the force data, output by the neural network model, for controlling each joint of the target object.
In another possible implementation, the data simulation unit includes:
Simulation control unit, configured to apply forces to each joint of the target object simulated in the physics simulator, starting from the initial state information of each joint of the target object simulated in the physics simulator and according to the force data of each joint of the target object determined by the neural network model, so that the physics simulator, based on the configured action behavior constraint feature, simulates one simulated behavior data of the target object;
Simulation end control unit, configured to, if the total number of simulated behavior data meets a set condition, confirm that a simulated behavior data sequence including at least one simulated behavior data has been obtained;
Simulation loop unit, configured to, if the total number of simulated behavior data does not meet the set condition, input the simulated behavior data of the target object most recently produced by the physics simulator into the neural network model to obtain updated force data for each joint of the target object, and apply forces to each joint of the target object simulated in the physics simulator according to the updated force data, so that the physics simulator, based on the configured action behavior constraint feature, simulates further simulated behavior data of the target object, until the total number of simulated behavior data meets the set condition.
Optionally, the device may further include:
Model application unit, configured to, after the training optimization unit obtains the behavior control strategy expressed by the neural network model, load the neural network model into a target application, so as to control, through the behavior control strategy expressed by the neural network model, the action behavior of the target object controlled by the target application, the target application being used to control the operation of the target object.
In another aspect, this application further provides a storage medium storing computer-executable instructions which, when loaded and executed by a processor, implement the learning method for a behavior control strategy according to any one of the above embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments can be cross-referenced. Because the device embodiments are basically similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A learning method for a behavior control strategy, characterized by comprising:
sampling, from a demonstration behavior data sequence, a demonstration behavior data segment serving as a training sample, the demonstration behavior data segment including at least two demonstration behavior data in sequential order, and the demonstration behavior data including first state information of each joint of a demonstration object;
according to the demonstration behavior data segment, setting initial state information of each joint of a target object simulated in a physics simulator, and determining, using a neural network model to be trained, force data acting on each joint of the target object, the target object and the demonstration object having the same joints;
based on the force data of each joint of the target object determined by the neural network model, controlling the movement of each joint of the target object simulated in the physics simulator, so that the physics simulator, based on a configured action behavior constraint feature, simulates a simulated behavior data sequence of the target object, the simulated behavior data sequence including at least one simulated behavior data in sequential order, the simulated behavior data including second state information of each joint of the target object, and the action behavior constraint feature being used to define the features that the action behavior of the simulated target object must satisfy;
according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data, determining an action behavior difference degree between the simulated target object and the demonstration object;
based on the action behavior difference degree, optimizing the behavior control strategy expressed by the neural network model until an optimization target is reached, and determining the behavior control strategy expressed by the neural network model as the control strategy on which the demonstration learning is based.
2. The learning method for a behavior control strategy according to claim 1, characterized in that optimizing, based on the action behavior difference degree, the behavior control strategy expressed by the neural network model until the optimization target is reached comprises:
detecting whether the action behavior difference degree reaches the set optimization target;
if the action behavior difference degree does not reach the set optimization target, optimizing the behavior control strategy expressed by the neural network model based on the action behavior difference degree, and returning to the operation of sampling, from the demonstration behavior data sequence, a demonstration behavior data segment serving as a training sample;
if the action behavior difference degree reaches the set optimization target, confirming that learning is complete.
3. The learning method for a behavior control strategy according to claim 1 or 2, characterized in that optimizing the behavior control strategy expressed by the neural network model based on the action behavior difference degree comprises:
determining a reward signal according to the action behavior difference degree and based on a reinforcement learning algorithm;
adjusting the internal parameters of the neural network model according to the reward signal, so as to change the behavior control strategy expressed by the neural network model.
4. The learning method for a behavior control strategy according to claim 1, characterized in that setting, according to the demonstration behavior data segment, the initial state information of each joint of the target object simulated in the physics simulator, and determining, using the neural network model to be trained, the force data acting on each joint of the target object comprises:
setting the initial state information of each joint of the target object simulated in the physics simulator according to the first state information of each joint of the demonstration object in the first demonstration behavior data of the demonstration behavior data segment;
inputting the first state information of each joint of the demonstration object in the first demonstration behavior data into the neural network model to be trained, and obtaining the force data, output by the neural network model, for controlling each joint of the target object.
5. The learning method for a behavior control strategy according to claim 1 or 4, characterized in that controlling, based on the force data of each joint of the target object determined by the neural network model, the movement of each joint of the target object simulated in the physics simulator, so that the physics simulator, based on the configured action behavior constraint feature, simulates the simulated behavior data sequence of the target object comprises:
starting from the initial state information of each joint of the target object simulated in the physics simulator, and according to the force data of each joint of the target object determined by the neural network model, applying forces to each joint of the target object simulated in the physics simulator, so that the physics simulator, based on the configured action behavior constraint feature, simulates one simulated behavior data of the target object;
if the total number of simulated behavior data meets a set condition, confirming that a simulated behavior data sequence including at least one simulated behavior data has been obtained;
if the total number of simulated behavior data does not meet the set condition, inputting the simulated behavior data of the target object most recently produced by the physics simulator into the neural network model to obtain updated force data for each joint of the target object, and applying forces to each joint of the target object simulated in the physics simulator according to the updated force data, so that the physics simulator, based on the configured action behavior constraint feature, simulates further simulated behavior data of the target object, until the total number of simulated behavior data meets the set condition.
6. The demonstration learning method for action behavior according to claim 1, characterized in that, after the behavior control strategy expressed by the neural network model is obtained, the method further comprises:
loading the neural network model into a target application, so as to control, through the behavior control strategy expressed by the neural network model, the action behavior of the target object controlled by the target application, the target application being used to control the operation of the target object.
7. A learning device for a behavior control strategy, characterized by comprising:
a data sampling unit, configured to sample, from a demonstration behavior data sequence, a demonstration behavior data segment serving as a training sample, the demonstration behavior data segment including at least two demonstration behavior data in sequential order, and the demonstration behavior data including first state information of each joint of a demonstration object;
a model control unit, configured to set, according to the demonstration behavior data segment, initial state information of each joint of a target object simulated in a physics simulator, and to determine, using a neural network model to be trained, force data acting on each joint of the target object, the target object and the demonstration object having the same joints;
a data simulation unit, configured to control, based on the force data of each joint of the target object determined by the neural network model, the movement of each joint of the target object simulated in the physics simulator, so that the physics simulator, based on a configured action behavior constraint feature, simulates a simulated behavior data sequence of the target object, the simulated behavior data sequence including at least one simulated behavior data in sequential order, the simulated behavior data including second state information of each joint of the target object, and the action behavior constraint feature being used to define the features that the action behavior of the simulated target object must satisfy;
a difference comparing unit, configured to determine an action behavior difference degree between the simulated target object and the demonstration object according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data;
a training optimization unit, configured to optimize, based on the action behavior difference degree, the behavior control strategy expressed by the neural network model until an optimization target is reached, and to determine the behavior control strategy expressed by the neural network model as the control strategy on which the demonstration learning is based.
8. The learning device for a behavior control strategy according to claim 7, characterized in that the training optimization unit comprises:
a detection subunit, configured to detect whether the action behavior difference degree reaches the set optimization target;
a loop training subunit, configured to, if the action behavior difference degree does not reach the set optimization target, optimize the behavior control strategy expressed by the neural network model based on the action behavior difference degree, and return to the operation of the data sampling unit;
an end control subunit, configured to, if the action behavior difference degree reaches the set optimization target, confirm that learning is complete, and determine the behavior control strategy expressed by the neural network model as the control strategy on which the demonstration learning is based.
9. A computer device, characterized by comprising:
a processor and a memory;
the processor being configured to call and execute a program stored in the memory;
the memory being configured to store the program, the program being used at least for:
sampling, from a demonstration behavior data sequence, a demonstration behavior data segment serving as a training sample, the demonstration behavior data segment including at least two demonstration behavior data in sequential order, and the demonstration behavior data including first state information of each joint of a demonstration object;
according to the demonstration behavior data segment, setting initial state information of each joint of a target object simulated in a physics simulator, and determining, using a neural network model to be trained, force data acting on each joint of the target object, the target object and the demonstration object having the same joints;
based on the force data of each joint of the target object determined by the neural network model, controlling the movement of each joint of the target object simulated in the physics simulator, so that the physics simulator, based on a configured action behavior constraint feature, simulates a simulated behavior data sequence of the target object, the simulated behavior data sequence including at least one simulated behavior data in sequential order, the simulated behavior data including second state information of each joint of the target object, and the action behavior constraint feature being used to define the features that the action behavior of the simulated target object must satisfy;
according to the first state information of each joint of the demonstration object in the demonstration behavior data and the second state information of each joint of the target object in the simulated behavior data, determining an action behavior difference degree between the simulated target object and the demonstration object;
based on the action behavior difference degree, optimizing the behavior control strategy expressed by the neural network model until an optimization target is reached, and determining the behavior control strategy expressed by the neural network model as the control strategy on which the demonstration learning is based.
10. A storage medium, characterized in that the storage medium stores computer-executable instructions which, when loaded and executed by a processor, implement the learning method for a behavior control strategy according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820695.0A CN110516389B (en) | 2019-08-29 | 2019-08-29 | Behavior control strategy learning method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820695.0A CN110516389B (en) | 2019-08-29 | 2019-08-29 | Behavior control strategy learning method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110516389A true CN110516389A (en) | 2019-11-29 |
CN110516389B CN110516389B (en) | 2021-04-13 |
Family
ID=68630130
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910820695.0A Active CN110516389B (en) | 2019-08-29 | 2019-08-29 | Behavior control strategy learning method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110516389B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111223170A (en) * | 2020-01-07 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Animation generation method and device, electronic equipment and storage medium |
CN111260762A (en) * | 2020-01-19 | 2020-06-09 | 腾讯科技(深圳)有限公司 | Animation implementation method and device, electronic equipment and storage medium |
CN111292401A (en) * | 2020-01-15 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Animation processing method and device, computer storage medium and electronic equipment |
CN111340211A (en) * | 2020-02-19 | 2020-06-26 | 腾讯科技(深圳)有限公司 | Training method of action control model, related device and storage medium |
CN111832187A (en) * | 2020-07-24 | 2020-10-27 | 宁夏政安信息科技有限公司 | Realization method for simulating and demonstrating secret stealing means |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101441776A (en) * | 2008-12-04 | 2009-05-27 | 浙江大学 | Three-dimensional human body motion editing method driven by demonstration show based on speedup sensor |
CN105637540A (en) * | 2013-10-08 | 2016-06-01 | 谷歌公司 | Methods and apparatus for reinforcement learning |
CN106056213A (en) * | 2015-04-06 | 2016-10-26 | 谷歌公司 | Selecting reinforcement learning actions using goals and observations |
CN109291052A (en) * | 2018-10-26 | 2019-02-01 | 山东师范大学 | A kind of massaging manipulator training method based on deeply study |
CN109345614A (en) * | 2018-09-20 | 2019-02-15 | 山东师范大学 | The animation simulation method of AR augmented reality large-size screen monitors interaction based on deeply study |
-
2019
- 2019-08-29 CN CN201910820695.0A patent/CN110516389B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101441776A (en) * | 2008-12-04 | 2009-05-27 | 浙江大学 | Three-dimensional human body motion editing method driven by demonstration show based on speedup sensor |
CN105637540A (en) * | 2013-10-08 | 2016-06-01 | 谷歌公司 | Methods and apparatus for reinforcement learning |
CN106056213A (en) * | 2015-04-06 | 2016-10-26 | 谷歌公司 | Selecting reinforcement learning actions using goals and observations |
CN109345614A (en) * | 2018-09-20 | 2019-02-15 | 山东师范大学 | The animation simulation method of AR augmented reality large-size screen monitors interaction based on deeply study |
CN109291052A (en) * | 2018-10-26 | 2019-02-01 | 山东师范大学 | A kind of massaging manipulator training method based on deeply study |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111223170A (en) * | 2020-01-07 | 2020-06-02 | Tencent Technology (Shenzhen) Company Limited | Animation generation method and device, electronic equipment and storage medium |
CN111223170B (en) * | 2020-01-07 | 2022-06-10 | Tencent Technology (Shenzhen) Company Limited | Animation generation method and device, electronic equipment and storage medium |
CN111292401A (en) * | 2020-01-15 | 2020-06-16 | Tencent Technology (Shenzhen) Company Limited | Animation processing method and device, computer storage medium and electronic equipment |
CN111292401B (en) * | 2020-01-15 | 2022-05-03 | Tencent Technology (Shenzhen) Company Limited | Animation processing method and device, computer storage medium and electronic equipment |
US11790587B2 (en) | 2020-01-15 | 2023-10-17 | Tencent Technology (Shenzhen) Company Limited | Animation processing method and apparatus, computer storage medium, and electronic device |
CN111260762A (en) * | 2020-01-19 | 2020-06-09 | Tencent Technology (Shenzhen) Company Limited | Animation implementation method and device, electronic equipment and storage medium |
WO2021143261A1 (en) * | 2020-01-19 | 2021-07-22 | Tencent Technology (Shenzhen) Company Limited | Animation implementation method and apparatus, electronic device, and storage medium |
CN111260762B (en) * | 2020-01-19 | 2023-03-28 | Tencent Technology (Shenzhen) Company Limited | Animation implementation method and device, electronic equipment and storage medium |
US11928765B2 (en) | 2020-01-19 | 2024-03-12 | Tencent Technology (Shenzhen) Company Limited | Animation implementation method and apparatus, electronic device, and storage medium |
CN111340211A (en) * | 2020-02-19 | 2020-06-26 | Tencent Technology (Shenzhen) Company Limited | Training method of action control model, related device and storage medium |
CN111340211B (en) * | 2020-02-19 | 2020-11-24 | Tencent Technology (Shenzhen) Company Limited | Training method of action control model, related device and storage medium |
CN111832187A (en) * | 2020-07-24 | 2020-10-27 | Ningxia Zheng'an Information Technology Co., Ltd. | Realization method for simulating and demonstrating secret stealing means |
Also Published As
Publication number | Publication date |
---|---|
CN110516389B (en) | 2021-04-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110516389A (en) | | Learning method, device, equipment and the storage medium of behaviour control strategy |
CN111260762B (en) | | Animation implementation method and device, electronic equipment and storage medium |
Vollenweider et al. | | Advanced skills through multiple adversarial motion priors in reinforcement learning |
Kim et al. | | Soccer robotics |
Rajendran et al. | | Attend, adapt and transfer: Attentive deep architecture for adaptive transfer from multiple sources in the same domain |
CN108284436B (en) | | Remote mechanical double-arm system with simulation learning mechanism and method |
CN109702744A (en) | | Robot imitation learning method based on dynamic system model |
CN112476424A (en) | | Robot control method, device, equipment and computer storage medium |
JP2010527086A (en) | | Character simulation method and system |
CN107457780B (en) | | Method and device for controlling mechanical arm movement, storage medium and terminal equipment |
CN110328668A (en) | | Robotic arm path planning method based on rate smoothing deterministic policy gradient |
Chen et al. | | Sequential dexterity: Chaining dexterous policies for long-horizon manipulation |
Naderi et al. | | Learning physically based humanoid climbing movements |
Floyd et al. | | Creation of DEVS models using imitation learning |
Naderi et al. | | A reinforcement learning approach to synthesizing climbing movements |
Floyd et al. | | Building learning by observation agents using jLOAF |
Zhou et al. | | Efficient and robust learning on elaborated gaits with curriculum learning |
CN104766359A (en) | | Method for determining noise items of winged insect motion models |
Say et al. | | A model for cognitively valid lifelong learning |
Bauer | | Automated design of tendon-driven soft foam hands using Markov-Chain-Monte-Carlo optimization methods |
Fischer et al. | | SIM2VR: Towards Automated Biomechanical Testing in VR |
Bowman et al. | | Dynamic pre-grasp planning when tracing a moving object through a multi-agent perspective |
Wang et al. | | Reinforcement Learning based End-to-End Control of Bimanual Robotic Coordination |
CN114841362A (en) | | Method for collecting imitation learning data by using virtual reality technology |
Ritter et al. | | Trying to grasp a sketch of a brain for grasping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||