CN102708377B - Method for planning combined tasks for virtual human - Google Patents


Info

Publication number: CN102708377B
Application number: CN201210125122.4A
Authority: CN (China)
Prior art keywords: state, virtual human, subtask, motion
Other languages: Chinese (zh)
Other versions: CN102708377A
Inventors: 李淳芃, 宗丹, 夏时洪, 王兆其
Assignee (current and original): Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS; priority to CN201210125122.4A
Publication of CN102708377A (application), CN102708377B (grant)
Legal status: Active


Abstract

The invention provides a method for planning combined tasks for a virtual human. The method includes: step 1, building a behavior graph of the virtual human from motion capture data; step 2, finding key states and decomposing the combined task into subtasks based on the key states; step 3, learning the optimal control strategy of each subtask; and step 4, computing the optimal action sequence of the combined task from the initial state of the virtual human in the environment. With this method, the computing time and storage space required by the motion planning process are reduced, and the planning algorithm converges to the optimal control strategy with probability one without making any assumption about the shape of the controller's value function.

Description

Combined task planning method for a virtual human
Technical field
The present invention relates to the field of virtual human motion planning, and in particular to a combined task planning method for a virtual human.
Background art
In recent years, virtual human motion synthesis has been a research hotspot in character animation and computer games, with wide applications in entertainment, film and television animation, computer-aided decision making, virtual assembly and other fields. However, how to plan the virtual human's motion so that it has a certain degree of autonomy remains a challenging problem.
Researchers have explored many methods for planning virtual human motion, for example graph-based motion planning and motion planning based on reinforcement learning. (1) In graph-based motion planning, the captured motion clips are organized as a motion graph, in which nodes represent motion clips and directed edges represent transitions between clips. By searching the motion graph, a virtual human motion sequence satisfying the user's requirements is synthesized. Graph-based methods preserve a large amount of real human motion detail and provide an effective means for creating lifelike character animation, so they are widely used in computer animation. (2) Motion planning based on reinforcement learning (RL) requires no external supervisory signal and is therefore widely applicable. The virtual human interacts with the environment directly through trial-and-error search, trains an optimal control strategy from the reinforcement signal fed back by the environment, and obtains the optimal action sequence that meets the user's requirements, thereby converting the motion synthesis problem into a control strategy learning problem.
However, graph-based motion planning needs an external supervisory signal and therefore cannot generate virtual human motion with autonomy, while existing reinforcement-learning-based motion planning requires the sample motions to share the same constraint frames in order to prevent the foot-sliding problem that may occur during motion blending. A compound motion or task of a virtual human contains multiple motion types and therefore multiple kinds of constraint frames; the state space grows greatly, and the planning process suffers from the curse of dimensionality.
In summary, with existing motion planning methods either virtual human motion with autonomy cannot be generated, or, when planning complex tasks, the number of states is too large and the computing time too long (the curse of dimensionality).
Summary of the invention
The object of the present invention is to provide a combined task planning method for a virtual human that reduces the computing time and storage space required by the motion planning process.
According to one aspect of the present invention, a combined task planning method for a virtual human is provided, comprising:
Step 1, building the virtual human's behavior graph from motion capture data;
Step 2, finding key states, and decomposing the combined task into subtasks based on the key states;
Step 3, learning the optimal control strategy of each subtask; and
Step 4, computing the optimal action sequence of the combined task from the initial state of the virtual human in the environment.
Optionally, the motion capture data is represented as:
C = {c_1, ..., c_M}
where M is the total number of motion clips, and each motion clip c_i (i = 1, ..., M) consists of a group of poses:
c_i = {p_1, ..., p_T}
where T is the number of frames of the clip, and each pose is represented as:
p_t = {R, q_0, ..., q_N} (t = 1, ..., T)
where R ∈ R^3 denotes the position of the virtual human's root joint in the current pose; q_0 denotes the orientation of the root joint, represented by a unit quaternion (w, x, y, z); q_n (n = 1, ..., N) denotes the orientation of each non-root joint relative to its parent joint; and N is the number of joints of the human model.
Optionally, step 1 further comprises:
Step 1.1, dividing the motion capture data into motion units;
Step 1.2, clustering the motion units and defining each class of motion units as a behavior;
Step 1.3, calibrating the constraint relations among the behaviors;
Step 1.4, building the virtual human's behavior graph according to the calibrated constraint relations.
Optionally, step 2 further comprises:
Step 2.1, sparsely sampling the state space and randomly drawing n_st two-tuples (s_init, s_goal);
Step 2.2, for each two-tuple, taking s_init as the initial state and s_goal as the final state:
training N_train times with trial-and-error search to find successful paths from s_init to s_goal;
counting, for each state s, the accumulated number of times n(s) it is visited on these paths;
Step 2.3, repeating the following steps until the specified number of subtasks is obtained:
finding the key state s_max satisfying s_max = arg max_s n(s), which serves as the final state of the subtask;
counting, for each state s, the number of times n(s, s_max) it is visited via the key state s_max;
computing n̄(s_max) = avg_s n(s, s_max);
selecting the states s satisfying n(s, s_max) ≥ n̄(s_max) and adding them to the state set of this subtask.
Optionally, step 2.1 further comprises:
defining a state as:
s = (B_s, x_i, y_i, z_i, θ)
where B_s denotes a node of the behavior graph; (x_i, y_i, z_i) denotes the relative position between the virtual human and the other objects in Euclidean space, i = 1, ..., n, with n the number of objects in the environment; and θ denotes the angle between the positive x direction and the projection onto the x-z plane of the orientation vector of the virtual human's root joint;
defining an action as:
a = (B_a, x_mid, z_mid)
where B_a denotes the current action; and (x_mid, z_mid) denotes the mid-clip touchdown displacement, i.e. the displacement of the middle frame of the motion clip relative to its initial frame.
Optionally, step 3 further comprises:
Step 3.1, defining the learning rate and discount factor of the learning model;
Step 3.2, defining the one-step return function and initializing the cumulative return function;
Step 3.3, choosing an arbitrary initial state, choosing an optimal action for this state according to the current value function, updating the state to the next state, and revising the expected cumulative return function;
Step 3.4, judging whether the expected cumulative return has converged, and if not, repeating step 3.3.
Optionally, step 3.2 comprises:
defining the one-step return matrix R:
R(s, a) = min R, if s_1 = null; max R, if s_1 = s_goal; −ω_T · T(s, a) + ω_P · P(a), otherwise. (formula 4)
where state s_1 denotes the next state after the virtual human selects action a in state s; T(s, a) describes the physical difference from state s to s_1, expressed by the change in the virtual human's position and orientation, and the smaller its value the smoother the transition; P(a) describes the virtual human's preference for action a, and the larger its value the more this action tends to be selected; ω_T and ω_P are weighting coefficients; and max R and min R denote the upper and lower bounds of R.
Optionally, step 4 further comprises:
Step 4.1, taking the given initial state as the input of the first subtask's control strategy and obtaining the optimal action sequence of this subtask;
Step 4.2, taking the final state of the first subtask controller as the initial state of the subsequent subtask controller, and obtaining the optimal action sequences of the subsequent subtasks in turn;
Step 4.3, splicing the optimal action sequences of all subtasks in order to obtain the optimal action sequence of the original combined task.
Optionally, step 4.3 comprises:
denoting by M_1 = {p^1_1, ..., p^1_i} and M_2 = {p^2_1, ..., p^2_j} the two motion clips to be spliced, where i and j are the total numbers of frames of the two clips; applying linear interpolation to the position of the virtual human's root joint and quaternion spherical linear interpolation to the joint orientations, the synthesized motion clip is:
M̃ = M_1 ⊕ M_2 = {p^1_1, ..., p^1_(i−k), p_1, ..., p_k, p^2_(k+1), ..., p^2_j} (formula 7)
where
R(p_t) = α(t) · R(p^1_(i−k+t)) + [1 − α(t)] · R(p^2_t) (formula 8)
q(p_t) = slerp(q(p^1_(i−k+t)), q(p^2_t), α(t)) (formula 9)
where R(p_t) denotes the root joint position of pose p_t and q(p_t) denotes the orientation of each joint of pose p_t; in addition,
α(t) = 2((t − 1)/(k − 1))^3 − 3((t − 1)/(k − 1))^2 + 1, 1 ≤ t ≤ k (formula 10)
where the fusion coefficient α(t) satisfies: α(t) = 1 when t ≤ 1; α(t) = 0 when t ≥ k; and α(t) is C^1 continuous everywhere.
Compared with the prior art, the key-state-based combined task planning method for a virtual human provided by the invention has the following advantages: (1) the combined task is decomposed into multiple subtasks and solved in small-scale sub-state spaces, which greatly reduces the required computing time and storage space; (2) for a virtual human's combined task, because each subtask is planned by divide and conquer, a more accurate controller can be obtained, guiding the virtual human to reach the given goal more quickly; (3) the algorithm makes no assumption about the shape of the controller's value function; it only needs to ensure that each state-action pair can be visited repeatedly and frequently to guarantee convergence to the optimal control strategy with probability one.
Brief description of the drawings
Fig. 1 shows a flow chart of a combined task planning method for a virtual human according to an embodiment of the invention;
Fig. 2 shows the data flow diagram corresponding to Fig. 1;
Fig. 3 shows a behavior graph according to an embodiment of the invention;
Fig. 4 shows schematic diagrams of the principles of different motion planning methods;
Fig. 5 shows a schematic diagram of finding key states according to an embodiment of the invention;
Fig. 6 shows a schematic diagram of the subtask selection process according to an embodiment of the invention;
Fig. 7 shows a schematic diagram of the motion clip synthesis process according to an embodiment of the invention.
Embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to explain the invention and are not intended to limit it.
A virtual human's tasks are complex and varied, and to complete a given task the virtual human usually needs to execute several steps in order. For example, in virtual assembly, to complete the task of fixing a prototype, the virtual human needs to first go to one place to fetch a screw, then go to another place to fetch a screwdriver, and finally go to the target location to complete the assembly operation. The present invention defines an atomic task as a motion with complete semantics that cannot be subdivided further. For example, "the virtual human picks up the screwdriver from the workbench" and "the virtual human installs the screw on the prototype" can be regarded as two different atomic tasks. A combined task is defined as a task composed of several subtasks, where each subtask may be an atomic task or another combined task. The above "the virtual human fixes the prototype" is a combined task.
In the present invention, task planning refers to the process of decomposing a given task of the virtual human into a sequence of motions to be executed. The given input task is usually represented by a goal state, and the output motion sequence is required to satisfy certain constraints.
The inventors found through research that planning combined tasks with the existing reinforcement-learning-based virtual human motion planning methods is difficult. The reason is that a combined task contains multiple subtasks, each with different motion features. When building the motion model, existing methods treat each motion feature as one dimension of the state, so the computing time and storage space grow exponentially with the number of states, i.e. the curse of dimensionality. The inventors also found that if the combined task is decomposed into several subtasks and each subtask is planned by divide and conquer, the scale of the problem to be solved can be reduced, which effectively alleviates the curse of dimensionality.
The present invention proposes a key-state-based hierarchical planning method for virtual human combined tasks. The method has two layers: an upper-layer hierarchical reinforcement learning model and a lower-layer reinforcement learning model. The upper layer samples the state space sparsely, searches for successful paths of some local tasks, takes the most frequently visited states as key states, and decomposes the combined task into several subtasks. The lower layer abstracts motion clips as behaviors and environment information as states, and plans each subtask by divide and conquer using trial-and-error search. When synthesizing motion, one only needs to follow the control strategy of each subtask and splice the selected motion clips in order. Details are given below.
According to an embodiment of the present invention, a key-state-based combined task planning method for a virtual human is provided, as shown in Fig. 1; the corresponding data flow diagram is shown in Fig. 2. In Fig. 2, the operation of the combined task decomposition part corresponds to step S30 in Fig. 1, the operation of the subtask planning part corresponds to step S40 in Fig. 1, and the operation of the motion splicing and synthesis part corresponds to step S50 in Fig. 1.
With reference to Fig. 1, the combined task planning method for a virtual human provided by this embodiment comprises:
S10, capturing motion and obtaining the virtual character's motion capture data;
S20, building the virtual human's behavior graph;
S30, finding key states and decomposing the original combined task into several subtasks;
S40, learning the optimal control strategy of each subtask;
S50, given the initial state of the virtual human in the environment, computing the optimal action sequence of the combined task.
The data preprocessing stage (i.e. the controller model building stage) comprises steps S10 and S20; the controller training stage comprises steps S30 and S40; the motion synthesis stage comprises S50. In the controller model building stage, the user only needs to define a one-step return function with intuitive meaning to control the virtual human's motion at a high level. In the controller training stage, the complex combined task is decomposed into multiple subtasks by finding key states and is solved in small-scale sub-state spaces, which greatly reduces the required computing time and storage space. In the motion synthesis stage, the user only needs to select the optimal action (the action with the greatest expected cumulative return in the current state) step by step according to the control strategy to obtain the optimal action sequence of the combined task. This process involves no time-consuming computation and can therefore meet the demands of real-time applications. Each step is introduced in detail below.
S10: Acquiring character motion data
Virtual character motion data samples are collected with commercially available optical or electromagnetic motion capture equipment, e.g. the VICON8 capture system produced by VICON.
The collected motion data sequence is represented as:
C = {c_1, ..., c_M}
where M is the total number of motion clips, and each motion clip c_i (i = 1, ..., M) consists of a group of poses:
c_i = {p_1, ..., p_T}
where T is the number of frames of the clip.
Each pose can be represented as:
p_t = {R, q_0, ..., q_N} (t = 1, ..., T)
where R denotes the position of the virtual human's root joint in the current pose; q_0 denotes the orientation of the root joint, represented by a unit quaternion (w, x, y, z) whose four components satisfy the constraint that their squared sum is 1; q_n (n = 1, ..., N) denotes the orientation of each non-root joint relative to its parent joint; and N is the number of joints of the human model.
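As a concrete illustration, the following minimal Python sketch mirrors this data layout; the class and field names are illustrative assumptions, not terms from the patent:

from dataclasses import dataclass
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]   # (w, x, y, z), unit length

@dataclass
class Pose:
    root_position: Tuple[float, float, float]     # R: position of the root joint
    joint_orientations: List[Quaternion]           # q_0 ... q_N: root orientation plus N joints

@dataclass
class MotionClip:
    poses: List[Pose]                              # p_1 ... p_T, one pose per captured frame

# The motion database C = {c_1, ..., c_M} is then simply a list of clips.
motion_database: List[MotionClip] = []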
The collected motion data samples contain multiple motion types. Clips of the same motion class are required to have similar starting and ending poses. If the motion data does not satisfy this condition, an offset mapping can be applied to it. The offset mapping of a motion modifies part of a motion clip while preserving as far as possible the original motion detail and the continuity of the motion.
For example, suppose motion clip c_i = {p_1, ..., p_T} is to be processed so that the first frame of the processed clip is the initial target pose p_start.
First, record the root joint position R_t and the joint orientations q^t_n of each pose in clip c, where R_t denotes the root joint position of frame t and q^t_n denotes the orientation of joint n in frame t.
Suppose R_start is the root joint position of p_start, and denote by q^start_n the orientation of joint n in p_start.
Because the first frame of the processed clip is required to be p_start, the differences between the first-frame pose of the clip and p_start in root joint position (ΔR) and joint orientation (Δq_n) are computed as:
ΔR = R_start − R_1
Δq_n = q^start_n − q^1_n (formula 1)
If the first-frame pose of the clip were simply replaced by p_start, the synthesized clip would contain a jump and the motion would not be smooth. To make the synthesized motion transition naturally, the pose difference caused by the adjustment is distributed evenly over the first H frames of the clip. Taking the h-th frame of clip c as an example (h = 1, ..., H), the root joint position and joint orientations of the modified h-th frame are:
R_h = R_h + α·ΔR
q^h_n = q^h_n + α·Δq_n (formula 2)
where the weight α decreases from 1 at the first frame to 0 at frame H, so that the first frame of the processed clip equals p_start while later frames are disturbed less and less. If the last frame of the clip is required to be the ending target pose p_end, the procedure is analogous and is not repeated here.
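A minimal sketch of this offset mapping follows. It assumes each frame is stored as a flat array of the root position followed by stacked joint quaternions, and that the weight α fades linearly over the first H frames (the patent states only that the difference is distributed evenly); the function name is illustrative:

import numpy as np

def offset_map_to_start(clip, p_start, H):
    """Adjust the first H frames of `clip` so its first frame equals p_start (formulas 1-2).

    clip:    array of shape (T, D); each row is the root position (3 values) followed by
             the joint quaternions, flattened.
    p_start: array of shape (D,), the initial target pose.
    """
    clip = clip.copy()
    delta = p_start - clip[0]                          # formula 1: offset to the target first frame
    H = min(H, len(clip))
    for h in range(H):
        alpha = 1.0 if H == 1 else 1.0 - h / (H - 1)   # assumed linear fade from 1 to 0
        clip[h] = clip[h] + alpha * delta              # formula 2: add the faded offset
        quats = clip[h, 3:].reshape(-1, 4)             # keep orientations at unit length
        quats /= np.linalg.norm(quats, axis=1, keepdims=True)
        clip[h, 3:] = quats.reshape(-1)
    return clip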
S20: Building the virtual human's behavior graph
A behavior usually carries independent and complete semantics and can be regarded as a particular skill of the virtual human. To define the virtual human's capabilities at a higher level, we abstract the motion capture data into behaviors.
A behavior is composed of a group of motion clips that contain the same constraint frames. A constraint frame is a frame containing a particular constraint; constraints can be specified by the user, e.g. the virtual human's foot touching the ground is one kind of constraint, and the virtual human's hand being raised is another. For example, the constraint frames of a full-cycle running clip comprise the following 3 frames: (1) both feet on the ground with the left foot in front; (2) both feet on the ground with the right foot in front; (3) both feet on the ground with the left foot in front. As another example, the constraint frames of a grasping clip comprise the following 2 frames: (1) the hand is raised; (2) the hand is lowered.
Different behaviors contain different constraint frames. For example, walking and running are two behaviors. The constraint frames of walking comprise the following 3 frames: (1) both feet on the ground with the left foot in front, both arms swinging freely with the right hand in front; (2) both feet on the ground with the right foot in front, both arms swinging freely with the left hand in front; (3) both feet on the ground with the left foot in front, both arms swinging freely with the right hand in front. The constraint frames of running comprise 3 frames: (1) both feet on the ground with the left foot in front, both elbows raised with fists clenched and the right hand in front; (2) both feet on the ground with the right foot in front, both elbows raised with fists clenched and the left hand in front; (3) both feet on the ground with the left foot in front, both elbows raised with fists clenched and the right hand in front.
To define the constraint relations among the virtual human's behaviors, we build a behavior graph, with reference to Fig. 3. The behavior graph is a way of organizing behaviors based on a directed graph: each node represents a class of behaviors, and each directed edge represents a transition clip connecting two behavior classes. Solid edges represent transition clips between different behaviors, dashed edges represent transition clips within the same behavior, and if there is no edge between two nodes, no transition is possible between those two behavior classes. For any two motion clips that need to be spliced, because the ending pose of the first clip and the starting pose of the second clip cannot be identical, splicing them directly would produce a jump and visible jitter, so the two clips must be blended to some extent.
Continuing with reference to Fig. 3, which defines a behavior graph of the virtual human: the node "walking" means the virtual human walks normally; "walking && left hand" means the virtual human walks while holding a screw in the left hand; "walking && right hand" means the virtual human walks while holding a screwdriver in the right hand; "walking && both hands" means the virtual human walks while holding the screw in the left hand and the screwdriver in the right hand; and "installation" means the virtual human installs the screw.
The behavior graph describes the virtual human's capabilities in the environment and the transition constraints among behaviors. A "transition constraint" specifies whether two behavior classes can be executed in sequence without a jump. The virtual human's motion planning can then be regarded as a traversal of the behavior graph.
It should be noted that the behavior graph in this embodiment treats motion clips with the same motion constraint frames as the same behavior. For example, striding and trotting are the same behavior class with different motion parameters, and walking clips with different stride lengths belong to the same node in the behavior graph. The advantage of this is that the behavior graph is small, so the time required to train the optimal control strategy is shorter.
In summary, building the virtual human's behavior graph can comprise: step 1, inputting the motion capture clips; step 2, dividing the capture clips into motion units; step 3, clustering the motion units and defining each class of motion units as a behavior; step 4, calibrating the constraint relations among the behaviors; step 5, building the virtual human's behavior graph according to the calibrated constraint relations.
The constraint relation between behaviors can be defined in the form of a two-tuple: for example, [B1, B2] means that after executing behavior B1, the virtual human may then execute behavior B2.
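A minimal sketch of such a behavior graph, stored as an adjacency map of calibrated [B1, B2] pairs; the behavior names follow Fig. 3, and the class and method names are assumptions:

from collections import defaultdict

class BehaviorGraph:
    """Directed graph: nodes are behaviors, edges are calibrated transitions [B1, B2]."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_transition(self, b1: str, b2: str) -> None:
        # Calibrated constraint pair [B1, B2]: behavior B2 may follow behavior B1.
        self.edges[b1].add(b2)

    def can_follow(self, b1: str, b2: str) -> bool:
        return b2 in self.edges[b1]

graph = BehaviorGraph()
graph.add_transition("walking", "walking && left hand")                  # pick up the screw
graph.add_transition("walking && left hand", "walking && both hands")    # pick up the screwdriver
graph.add_transition("walking && both hands", "installation")            # install the screw
print(graph.can_follow("walking", "installation"))                       # False: no direct transition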
S30: Decomposing the original combined task into several subtasks
The essence of hierarchical reinforcement learning (HRL) is to add an "abstraction" mechanism on top of reinforcement learning: the overall task is decomposed into subtasks at different levels, and each subtask is solved in a smaller subproblem space, which greatly reduces the scale of the problem to be solved.
Fig. 4 shows schematic diagrams of the principles of different motion planning methods. Fig. 4(a) represents motion planning based on reinforcement learning, which plans in units of motion clips; Fig. 4(b) represents motion planning based on hierarchical reinforcement learning, which plans in units of subtasks; Fig. 4(c) represents the key-state-based hierarchical planning method for combined tasks, in which the upper layer uses hierarchical reinforcement learning to find key states and decompose the combined task into subtasks, and the lower layer plans each subtask by divide and conquer using reinforcement learning. The horizontal axis represents time, the vertical axis represents the state, and each point on a curve (solid or hollow) represents the state at the corresponding moment.
The inventors found through research that hierarchical reinforcement learning, as a semi-Markov process, can act over a continuous time step (i.e. a period of time). If the motion sequence completed within such a continuous time step is defined as a subtask, then hierarchical reinforcement learning plans the virtual human's motion in units of subtasks. Each subtask can be either an atomic task or another subtask; the calls from upper-layer subtasks to lower-layer subtasks or atomic tasks (i.e. elementary actions) form a hierarchical architecture, as shown in Fig. 4(b).
To decompose the combined task (i.e. the goal task) into several subtasks, the present invention defines the state as the situation of the virtual human in the virtual environment, characterized by a group of physical quantities. For example, a state can represent the virtual human's position and orientation; it can also represent the interaction features between the virtual human and objects in the virtual environment. The state here is different from a pose: the notion of a state is broader, e.g. it can also include the interaction features between the virtual human and objects in the virtual environment.
Each physical quantity characterizing the state can be one dimension: for example, position information is the first dimension, orientation information is the second dimension, the interaction feature between the virtual human and the screwdriver is the third dimension, and the interaction feature between the virtual human and the prototype is the fourth dimension.
The division into subtasks is not unique, but there are always states that frequently appear on all motion sequences that successfully complete the goal task, and these states can divide the original combined task into several subtasks spliced in order. The state space of each subtask involves only the dimensions relevant to that subtask and is a subset of the original state space. For example, if a subtask is "the virtual human picks up the screwdriver", the dimension "interaction feature between the virtual human and the screwdriver" is relevant to this subtask, while the dimension "interaction feature between the virtual human and the prototype" is irrelevant to it.
The original task is decomposed into several subtasks, each level of subtask having its own state space, action set, value function and control strategy; by learning from the bottom up, the optimal control strategy of each subtask is obtained, and finally the optimal control strategy of the whole combined task is obtained. Based on the above analysis, the inventors propose a two-layer planning model based on key states for planning the virtual human's combined tasks, as shown in Fig. 4(c) and described in detail below.
S301: Sparse sampling in the state space
A hierarchical reinforcement learning model usually consists of the following components: a state set S, an action set A, a one-step return function R and a control strategy π.
A state is defined as:
s = (B_s, x_i, y_i, z_i, θ)
where B_s denotes a node of the behavior graph; (x_i, y_i, z_i) denotes the relative position between the virtual human and the other objects in Euclidean space, i = 1, ..., n, with n the number of objects in the environment; and θ denotes the angle between the positive x direction and the projection onto the x-z plane of the orientation vector of the virtual human's root joint.
An action is defined as:
a = (B_a, x_mid, z_mid)
where B_a denotes the current motion clip; (x_mid, z_mid) denotes the mid-clip touchdown displacement, i.e. the displacement of the middle frame of the motion clip relative to the initial frame, for example the change of the current root joint position relative to the root joint position at the initial frame, used to prevent the virtual human from colliding with the environment during motion synthesis. By recording the mid-clip touchdown displacement and checking whether the virtual human collides with the environment while executing an action in a given state, the synthesized motion can be made to satisfy the environmental constraints.
When sparsely sampling the state space formed by the virtual environment, the sampling interval is specified by the user. In this embodiment, the environment is 20 × 20 × 5 (m^3), the spatial sampling interval is Δx = Δy = Δz = 1 m, the angular range is 2π (radians), and the angular sampling interval is Δθ = π/6.
The sampled state set is S (s ∈ S) with M states, and the sampled action set is A (a ∈ A) with N actions.
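As an illustration, the state and action tuples and the sparse sampling grid described above could be written as follows; the dimensions reflect the example parameters, only one object's relative position is kept for brevity, and all names are assumptions:

import itertools
import math
from typing import NamedTuple, Tuple

class State(NamedTuple):
    behavior: str                              # B_s: node of the behavior graph
    rel_position: Tuple[float, float, float]   # (x_i, y_i, z_i): position relative to an object
    theta: float                               # orientation angle in the x-z plane

class Action(NamedTuple):
    behavior: str                              # B_a: current motion clip / behavior
    mid_touchdown: Tuple[float, float]         # (x_mid, z_mid): mid-clip touchdown displacement

def sample_states(behaviors, extent=(20.0, 20.0, 5.0), step=1.0, dtheta=math.pi / 6):
    """Enumerate the sparse grid (example: 20 x 20 x 5 m at 1 m steps, angle step pi/6)."""
    xs = [i * step for i in range(int(extent[0] / step))]
    ys = [i * step for i in range(int(extent[1] / step))]
    zs = [i * step for i in range(int(extent[2] / step))]
    thetas = [k * dtheta for k in range(int(2 * math.pi / dtheta))]
    for b, x, y, z, t in itertools.product(behaviors, xs, ys, zs, thetas):
        yield State(b, (x, y, z), t)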
S302: Finding key states and dividing the combined task and the state space based on them
Each subtask is defined as a triple o = <I, μ, β>, where:
I ⊆ S is the set of initial states from which the virtual human may execute this subtask;
μ: S × ∪A_s → [0, 1] is the internal policy of the subtask, where ∪A_s denotes the set of all optional actions in state s; it gives the probability P ∈ [0, 1] with which the virtual human selects an action from ∪A_s in state s;
β: S → [0, 1] gives, within the subtask, the probability P ∈ [0, 1] that state s is a terminal state.
A subtask can be selected if and only if the virtual human's current state belongs to the initial state set I; while executing the subtask, the virtual human selects actions according to the internal policy μ; when the current state is one of the subtask's terminal states, the execution of the whole subtask ends.
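A minimal sketch of the subtask triple o = <I, μ, β> as a data structure (an option in reinforcement-learning terms); the field names are assumptions:

import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Subtask:
    """Option-style subtask o = <I, mu, beta>."""
    init_states: Set                          # I: states from which this subtask may be started
    policy: Callable[[object], object]        # mu: maps a state to the action to execute
    termination: Callable[[object], float]    # beta: probability that a state ends the subtask

    def can_start(self, state) -> bool:
        return state in self.init_states

    def is_done(self, state) -> bool:
        return random.random() < self.termination(state)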
Therefore, the virtual human's combined task planning problem can be regarded as a process of sequentially selecting subtasks. The quality of a subtask policy can be measured with a long-term return function, e.g. the expected cumulative return. Unlike traditional reinforcement learning, which selects an optimal action at each step, this method selects an optimal subtask each time; here an optimal action is a single motion clip, whereas an optimal subtask is a series of motion clips.
V^π(s, o) is defined as the expected cumulative return the virtual human obtains by selecting and executing subtask o in the current state s:
V^π(s, o) = E{r_t + γ·r_(t+1) + ... | ε(oπ, s, t)} (formula 3)
where r_t denotes the one-step return obtained by executing o in state s at time t; γ is the discount factor, 0 ≤ γ ≤ 1, expressing the influence of future returns on the present: the smaller the discount factor, the more the virtual human cares about the effect of recent actions, and the larger the discount factor, the more it cares about actions over a longer period; oπ means that the virtual human follows the internal policy of o until a terminal state is reached and then selects the next action according to policy π; and ε(oπ, s, t) denotes the event that the virtual human executes o in state s at time t.
The existing motion planning methods based on reinforcement learning treat the motion features of the task as dimensions of the state, so when planning a combined task the state space is very large and suffers from the curse of dimensionality. Decomposing the combined task into several subtasks and planning each subtask separately therefore greatly reduces the computing time and storage space.
The inventors found through research that, during the repeated trials of completing some local tasks, the virtual human visits certain states frequently, and these states can be regarded as the key states of the original combined task. For example, as shown in Fig. 5, nodes represent states and edges represent a successful path from a given initial state to a final state; the states through which many successful paths pass are the key states of the problem. The virtual human's combined task can be decomposed by extracting such key states.
A feasible way to decompose the original combined task with key states is to randomly specify initial and final states in the state space, find successful paths from the given initial states to the final states, count for each state the number of times it appears on these successful paths, and take the state with the maximum accumulated visit count as a key state; the original combined task is then divided into several subtasks, and the original state space is simultaneously divided into the state spaces of the subtasks.
The pseudocode corresponding to the above step S30 is as follows:
Step 1, sparsely sample the state space and randomly draw n_st two-tuples (s_init, s_goal);
Step 2, for each two-tuple, take s_init as the initial state and s_goal as the final state:
Step 2.1, train N_train times with trial-and-error search to find successful paths from s_init to s_goal;
Step 2.2, count, for each state s, the accumulated number of times n(s) it is visited on these paths;
Step 3, repeat the following steps until the number of subtasks meets the requirement (for example, if the user specifies that the original combined task is to be decomposed into n subtasks, the number of subtasks is n):
Step 3.1, find the key state s_max satisfying s_max = arg max_s n(s), which serves as the final state of the subtask;
Step 3.2, compute n(s, s_max), the number of times each state s is visited via the key state s_max;
Step 3.3, compute n̄(s_max) = avg_s n(s, s_max);
Step 3.4, select the states s satisfying n(s, s_max) ≥ n̄(s_max) and add them to the state set (i.e. state space) of this subtask.
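A compact sketch of the key-state search in the pseudocode above; it assumes a find_successful_path(s_init, s_goal) routine implementing the trial-and-error search is available and returns the list of visited states (or None on failure), and the way later key states are kept distinct is a simplifying assumption of this sketch:

import random
from collections import Counter

def decompose_by_key_states(states, find_successful_path, n_st=50, n_train=100, n_subtasks=3):
    """Return up to n_subtasks (key_state, state_subset) pairs, following steps 1-3 of S30."""
    paths = []
    for _ in range(n_st):                                     # step 1: sparse pairs (s_init, s_goal)
        s_init, s_goal = random.sample(states, 2)
        for _ in range(n_train):                              # step 2.1: trial-and-error searches
            path = find_successful_path(s_init, s_goal)
            if path:
                paths.append(path)

    subtasks = []
    remaining = paths
    for _ in range(n_subtasks):                               # step 3: peel off one subtask at a time
        visits = Counter(s for path in remaining for s in path)          # step 2.2: n(s)
        if not visits:
            break
        s_max, _ = visits.most_common(1)[0]                   # step 3.1: key state = most visited
        via = Counter(s for path in remaining if s_max in path for s in path)   # step 3.2: n(s, s_max)
        avg = sum(via.values()) / len(via)                    # step 3.3: average of n(s, s_max)
        subset = {s for s, count in via.items() if count >= avg}         # step 3.4: above-average states
        subtasks.append((s_max, subset))
        remaining = [p for p in remaining if s_max not in p]  # simplification so later key states differ
    return subtasks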
S40: Learning the optimal control strategy of each subtask
After the subtasks are obtained, reinforcement learning is used to train the optimal control strategy of each subtask (the optimal control strategy corresponds to the Q matrix below).
Reinforcement learning requires no supervisory signal from the user; the virtual human learns the optimal control strategy from the feedback of its interaction with the environment. Its basic idea is: if an action obtains a positive return from the environment, the tendency of the system to select that action later is strengthened; otherwise, the tendency to select that action is weakened. The goal of reinforcement learning is to maximize the expected return (or minimize the expected cost).
S401: Defining the learning rate, discount factor and other parameters of the learning model
Reinforcement learning is an incremental kind of machine learning. The learning rate α controls the speed of learning, 0 ≤ α ≤ 1; the larger the learning rate, the faster the convergence but the more easily oscillation occurs, and the smaller the learning rate, the slower the convergence.
The meaning of the discount factor γ is the same as in step S302 and is not repeated.
Besides the learning rate and discount factor, the maximum number of learning episodes K, the maximum number of iteration steps E, and the maximum number of non-update rounds φ also need to be defined. Iterating from a given initial state until a sequence reaching the goal state is found is called one learning episode.
S402: Defining the one-step return function (the one-step return obtained by the virtual human for taking each action in each state) and initializing the cumulative return function Q
The one-step return matrix R is defined below. Element R(s, a) of the matrix defines the one-step return of the virtual human when it executes action a in state s (the state set here is a subset of the original state space). The larger this value, the larger the immediate return the virtual human obtains, and vice versa; if the value is negative, it is in fact a cost. The upper and lower bounds of the one-step return are specified by the user.
R is defined as follows:
R(s, a) = min R, if s_1 = null; max R, if s_1 = s_goal; −ω_T · T(s, a) + ω_P · P(a), otherwise. (formula 4)
where state s_1 denotes the next state after the virtual human selects action a in state s; T(s, a) describes the physical difference from state s to s_1, expressed by the change in the virtual human's position and orientation, and the smaller its value the smoother the transition; P(a) describes the virtual human's preference for action a, and the larger its value the more this action tends to be selected; ω_T and ω_P are weighting coefficients; and max R and min R denote the upper and lower bounds of R. Formula 4 means:
If the virtual human cannot execute action a in state s, the one-step return is min R. The cases in which a cannot be executed in state s include: state s or state s_1 is unreasonable, or the behavior constraint relations in the behavior graph are violated (the behavior node of state s cannot transition to action a).
If the virtual human can execute action a in state s and s_1 is the goal state, the one-step return is max R; max R is usually set large enough to guide the virtual human toward the goal state more quickly.
If the virtual human can execute action a in state s but s_1 is not the goal state, the one-step return is the weighted sum of the state transition return T(s, a) and the preference return P(a). Since the state transition return T(s, a) describes the smoothness of the motion transition and is better when smaller, ω_T carries a negative sign.
In hierarchical reinforcement learning, the return function of each level is generally chosen according to the task objective of that level. For example, for the subtask of fetching the screwdriver, the goal is to grasp the distant screwdriver. However, since the high-level goal is achieved only rarely during the learning of the task strategy, if achieving the goal were the only way to obtain a return, the controller would learn very poorly. Therefore, the one-step return function defined in this embodiment rewards not only the achievement of the goal but also the absence of collisions with the virtual environment and the smoothness of the motion clip splicing: if the virtual human collides with the environment or the clip splicing is not smooth, the return is smaller; if the virtual human grasps the screwdriver, the return is larger.
The cumulative return function Q is initialized as a zero matrix with the same numbers of rows and columns as matrix R.
In addition, when planning the virtual human's combined task, the user only needs to define the state space and the action space and build a one-step return function to obtain the optimal control strategy of the combined task, and can thus control the virtual human's motion synthesis at a high level.
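A sketch of the one-step return of formula 4; the weights, bounds and helper names are illustrative parameters, not values from the patent:

def one_step_return(s, a, next_state, s_goal, transition_cost, action_preference,
                    w_T=1.0, w_P=0.5, min_R=-100.0, max_R=100.0):
    """Formula 4: return for executing action a in state s and landing in next_state.

    transition_cost(s, a)  -> T(s, a): change of position/orientation (smaller = smoother)
    action_preference(a)   -> P(a):    preference for action a (larger = more preferred)
    """
    if next_state is None:       # a cannot be executed in s (collision, or the behavior
        return min_R             # graph forbids the transition)
    if next_state == s_goal:     # goal state reached: large positive return
        return max_R
    return -w_T * transition_cost(s, a) + w_P * action_preference(a)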
S403: Choosing an arbitrary initial state, choosing an optimal action for it according to the current value function, updating the state to the next state, and revising the expected cumulative return function
Let Q(s, a) denote the expected cumulative return the virtual human can obtain by taking action a in state s, and let s_1 denote the next state. The expected cumulative return matrix Q is updated iteratively:
Q(s, a) = (1 − α)·Q(s, a) + α·{R(s, a) + γ·max Q(s_1, ∪A_(s_1))} (formula 5)
where the discount factor γ and the learning rate α are defined as in step S401, and max Q(s_1, ∪A_(s_1)) denotes the expected cumulative return of the optimal action in state s_1.
However, if the optimal action were chosen every time, the strategy would easily get stuck in a local optimum. Therefore an ε-greedy search strategy is introduced: at each selection, the action with the greatest expected cumulative return is chosen with probability ε, and another action is chosen with probability (1 − ε).
S404: Judging whether the expected cumulative return has converged, or whether the number of iterations exceeds the given maximum
The pseudocode corresponding to the above step S40 is as follows:
Step 0, define the learning rate, discount factor and other parameters of the learning model (corresponding to step S401);
Step 1, define the R matrix and initialize Q as a zero matrix (corresponding to step S402);
Step 2, repeat the following process (the k-th learning episode):
Step 2.1, select an arbitrary initial state s (corresponding to step S403);
Step 2.2, repeat (the e-th iteration):
Step 2.2.1, choose action a according to the current Q matrix, obtain the one-step return R(s, a), and obtain the next state s_1 (corresponding to step S403);
Step 2.2.2, update Q(s, a) according to formula 5 (corresponding to step S403);
Step 2.2.3, update the current state: s = s_1 (corresponding to step S403);
Step 2.2.4, if s = s_goal or the iteration step e ≥ E, stop; otherwise go to step 2.2.1 (corresponding to step S404);
Step 3, if k ≥ K, or the Q matrix has not been updated for more than φ rounds, stop; otherwise go to step 2 (corresponding to step S404).
In theory, as long as each state-action pair (s, a) can be visited repeatedly and frequently, the algorithm is guaranteed to converge to the optimal expected cumulative value function Q* with probability one.
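The pseudocode above is essentially tabular Q-learning; a sketch under that reading follows. The transition function, array shapes and default parameters are assumptions, and the greedy action is selected with probability eps as described in step S403 (the φ non-update stopping criterion is omitted for brevity):

import numpy as np

def train_subtask_q(R, valid, step_fn, goal_state,
                    alpha=0.1, gamma=0.9, eps=0.8, K=500, E=200, rng=None):
    """Tabular Q-learning for one subtask (steps 0-3 of the S40 pseudocode).

    R:        (M, N) one-step return matrix, formula 4
    valid:    (M, N) boolean mask of actions executable in each state
    step_fn:  step_fn(s, a) -> index of the next state s_1
    """
    rng = rng or np.random.default_rng()
    M, N = R.shape
    Q = np.zeros((M, N))                                   # step 1: Q initialized to zero
    for _ in range(K):                                     # step 2: the k-th learning episode
        s = int(rng.integers(M))                           # step 2.1: arbitrary initial state
        for _ in range(E):                                 # step 2.2: the e-th iteration
            actions = np.flatnonzero(valid[s])
            if actions.size == 0:
                break
            if rng.random() < eps:                         # greedy action with probability eps
                a = int(actions[np.argmax(Q[s, actions])])
            else:                                          # otherwise explore another action
                a = int(rng.choice(actions))
            s1 = step_fn(s, a)                             # step 2.2.1: next state
            target = R[s, a] + gamma * Q[s1].max()         # formula 5 (bracketed target)
            Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
            s = s1                                         # step 2.2.3: advance the state
            if s == goal_state:                            # step 2.2.4: goal reached
                break
    return Q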
S50: Given the initial state of the virtual human in the environment, computing the optimal action sequence
S501: Taking the given initial state as the input of the first subtask's control strategy and obtaining the optimal action sequence of this subtask
First, given an arbitrary initial state s_0 of the virtual human, the subtask with the greatest expected cumulative return in this state is found according to formula 3: o_1 = arg max_o V^π(s_0, o).
For subtask o_1, its optimal action sequence is obtained from the corresponding expected cumulative return matrix Q. Since Q is initialized as a zero matrix, Q(s, a) ≥ 0 at convergence; the larger Q(s, a), the more reasonable it is to execute action a in state s and the more quickly the goal state can be reached.
For some states s, Q(s, ·) has two or more maxima, which means that the expected cumulative returns obtained by taking these actions are equal. In this case the system selects any one of these actions at random and updates the current state to the next state s_1. If the goal state has been reached, it stops; otherwise it reselects the optimal action and updates the state again, until the goal state s_goal of this subtask is reached.
The resulting action sequence is the optimal action sequence under the initial state, μ_1 = μ(s_0) = {a_0, a_1, ...}. The obtained optimal actions are spliced in order, with motion blending applied at the junctions of the clips, to generate the optimal action sequence μ_1 of the first subtask control strategy starting from the given initial state s_0.
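A sketch of this greedy rollout: from the initial state, repeatedly take an action with maximal Q (ties broken at random) until the subtask's goal state is reached; helper names are assumptions, and the final state is returned so it can seed the next subtask's controller, as in step S502 below:

import numpy as np

def rollout_subtask(Q, valid, step_fn, s0, s_goal, max_steps=500, rng=None):
    """Extract the optimal action sequence mu = {a_0, a_1, ...} of one subtask from its Q matrix."""
    rng = rng or np.random.default_rng()
    s, actions_taken = s0, []
    for _ in range(max_steps):
        if s == s_goal:                                   # goal state of this subtask reached
            break
        candidates = np.flatnonzero(valid[s])
        q = Q[s, candidates]
        best = candidates[np.flatnonzero(q == q.max())]   # several actions may tie on Q(s, a)
        a = int(rng.choice(best))                         # pick any maximizer at random
        actions_taken.append(a)
        s = step_fn(s, a)                                 # advance to the next state
    return actions_taken, s                               # s becomes the next subtask's initial state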
S502: Taking the final state of the first subtask controller as the initial state of the subsequent subtask controller and obtaining the optimal action sequences of the subsequent subtasks in turn
Taking the final state of the first subtask controller as the initial state of the subsequent subtask controller, the optimal action subsequences μ_2, μ_3, ... of the subsequent subtasks are obtained, as shown in Fig. 6.
S503: Splicing the optimal action sequences of all subtasks in order to obtain the optimal action sequence of the original combined task
Denote by M_1 = {p^1_1, ..., p^1_i} and M_2 = {p^2_1, ..., p^2_j} the two motion clips to be spliced, where i and j are the total numbers of frames of the two clips. To achieve a smooth transition, linear interpolation is applied to the position of the virtual human's root joint and quaternion spherical linear interpolation to the joint orientations, and the synthesized motion clip is:
M̃ = M_1 ⊕ M_2 = {p^1_1, ..., p^1_(i−k), p_1, ..., p_k, p^2_(k+1), ..., p^2_j} (formula 7)
where
R(p_t) = α(t) · R(p^1_(i−k+t)) + [1 − α(t)] · R(p^2_t) (formula 8)
q(p_t) = slerp(q(p^1_(i−k+t)), q(p^2_t), α(t)) (formula 9)
where R(p_t) denotes the root joint position of pose p_t and q(p_t) denotes the orientation of each joint of pose p_t; in addition,
α(t) = 2((t − 1)/(k − 1))^3 − 3((t − 1)/(k − 1))^2 + 1, 1 ≤ t ≤ k (formula 10)
where the fusion coefficient α(t) satisfies: α(t) = 1 when t ≤ 1; α(t) = 0 when t ≥ k; and α(t) is C^1 continuous everywhere.
In this embodiment, the above process is shown in Fig. 7. Before synthesis, motion clip M_1 has i frames and M_2 has j frames. If the selected blending window is k frames, the final synthesized clip has i + j − k frames. The first i − k frames of the synthesized clip are identical to the first i − k frames of M_1, the last j − k frames are identical to the last j − k frames of M_2, and the middle k frames are obtained by the interpolation above.
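A sketch of formulas 7-10: the last k frames of M1 are blended with the first k frames of M2, with root positions interpolated linearly and joint orientations interpolated by quaternion slerp under the cubic weight α(t); the frame layout (root position plus an array of joint quaternions) is an assumption:

import numpy as np

def fusion_weight(t, k):
    """Formula 10: cubic ease from alpha(1) = 1 down to alpha(k) = 0 (t is 1-based)."""
    if k <= 1:
        return 1.0                                   # degenerate one-frame window
    u = (t - 1) / (k - 1)
    return 2 * u**3 - 3 * u**2 + 1

def slerp(qa, qb, w):
    """Spherical linear interpolation: returns qa when w = 1 and qb when w = 0 (formula 9)."""
    dot = float(np.clip(np.dot(qa, qb), -1.0, 1.0))
    if dot < 0.0:                                    # take the shorter arc
        qb, dot = -qb, -dot
    if dot > 0.9995:                                 # nearly parallel: fall back to lerp
        q = w * qa + (1 - w) * qb
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin(w * theta) * qa + np.sin((1 - w) * theta) * qb) / np.sin(theta)

def splice(M1, M2, k):
    """Formula 7: splice M1 (i frames) and M2 (j frames) over a k-frame blend window.

    Each frame is a pair (root_pos: (3,) array, quats: (N+1, 4) array); the result
    has i + j - k frames.
    """
    i = len(M1)
    out = list(M1[: i - k])                          # first i-k frames of M1 unchanged
    for t in range(1, k + 1):                        # k blended frames in the middle
        a = fusion_weight(t, k)
        r1, q1s = M1[i - k + t - 1]
        r2, q2s = M2[t - 1]
        root = a * np.asarray(r1) + (1 - a) * np.asarray(r2)          # formula 8
        quats = np.array([slerp(np.asarray(qa), np.asarray(qb), a)    # formula 9
                          for qa, qb in zip(q1s, q2s)])
        out.append((root, quats))
    out.extend(M2[k:])                               # last j-k frames of M2 unchanged
    return out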
The invention provides a general and efficient method for planning the virtual human's combined tasks, so that the motion of a virtual character can be synthesized with a high-level means of control. The method abstracts the goal task into subtasks at different levels and trains the control strategy of each subtask in a smaller subproblem space, thereby reducing the scale of the problem and accelerating its solution. A "strategy" refers to a mapping from the virtual human's states to actions (for example, one-to-one or one-to-many mapping relations); it is the basis on which the virtual human selects actions so as to obtain the greatest expected cumulative return from the environment.
The key-state-based combined task planning method for a virtual human proposed by the invention has the following advantages: (1) the combined task is decomposed into multiple subtasks and solved in small-scale sub-state spaces, which greatly reduces the required computing time and storage space; (2) for a virtual human's combined task, because each subtask is planned by divide and conquer, a more accurate controller can be obtained, guiding the virtual human to reach the given goal more quickly; (3) the algorithm makes no assumption about the shape of the controller's value function; it only needs to ensure that each state-action pair can be visited repeatedly and frequently to guarantee convergence to the optimal control strategy with probability one.
It should be noted and understood that various modifications and improvements can be made to the invention described in detail above without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the scope of the claimed technical solution is not limited by any particular exemplary teaching given here.

Claims (6)

1. A combined task planning method for a virtual human, comprising:
Step 1, building the virtual human's behavior graph from motion capture data;
Step 2, finding key states, and decomposing the combined task into subtasks based on the key states;
Step 3, learning the optimal control strategy of each subtask; and
Step 4, computing the optimal action sequence of the combined task from the initial state of the virtual human in the environment;
wherein said step 1 further comprises:
Step 1.1, dividing said motion capture data into motion units;
Step 1.2, clustering the motion units and defining each class of motion units as a behavior;
Step 1.3, calibrating the constraint relations among the behaviors; and
Step 1.4, building the virtual human's behavior graph according to the calibrated constraint relations;
said step 2 further comprises: searching for successful paths of tasks by sparse sampling in the state space, and finding said key states according to the visit frequency of each state;
said step 3 further comprises:
Step 3.1, defining the learning rate and discount factor of the learning model;
Step 3.2, defining the one-step return function and initializing the cumulative return function;
Step 3.3, choosing an arbitrary initial state, choosing an optimal action for this state according to the current value function, updating the state to the next state, and revising the expected cumulative return function; and
Step 3.4, judging whether the expected cumulative return has converged, and if not, repeating step 3.3;
said step 4 further comprises:
Step 4.1, taking the given initial state as the input of the first subtask's control strategy and obtaining the optimal action sequence of this subtask;
Step 4.2, taking the final state of the first subtask controller as the initial state of the subsequent subtask controller, and obtaining the optimal action sequences of the subsequent subtasks in turn;
Step 4.3, splicing the optimal action sequences of all subtasks in order to obtain the optimal action sequence of the original combined task.
2. The combined task planning method for a virtual human according to claim 1, wherein said motion capture data is represented as:
C = {c_1, ..., c_M}
where M is the total number of motion clips, and each motion clip c_i (i = 1, ..., M) consists of a group of poses:
c_i = {p_1, ..., p_T}
where T is the number of frames of the clip, and each pose is represented as:
p_t = {R, q_0, ..., q_N} (t = 1, ..., T)
where R ∈ R^3 denotes the position of the virtual human's root joint in the current pose; q_0 denotes the orientation of the root joint, represented by a unit quaternion (w, x, y, z); q_n (n = 1, ..., N) denotes the orientation of each non-root joint relative to its parent joint; and N is the number of joints of the human model.
3. The combined task planning method for a virtual human according to claim 1, wherein step 2 further comprises:
Step 2.1, sparsely sampling the state space and randomly drawing n_st two-tuples (s_init, s_goal);
Step 2.2, for each two-tuple, taking s_init as the initial state and s_goal as the final state:
training N_train times with trial-and-error search to find successful paths from s_init to s_goal;
counting, for each state s, the accumulated number of times n(s) it is visited on these paths; and
Step 2.3, repeating the following steps until the specified number of subtasks is obtained:
finding the key state s_max satisfying s_max = arg max_s n(s), which serves as the final state of the subtask;
counting, for each state s, the number of times n(s, s_max) it is visited via the key state s_max;
computing n̄(s_max) = avg_s n(s, s_max);
selecting the states s satisfying n(s, s_max) ≥ n̄(s_max) and adding them to the state set of this subtask.
4. The combined task planning method for a virtual human according to claim 3, wherein step 2.1 further comprises:
defining a state as:
s = (B_s, x_i, y_i, z_i, θ)
where B_s denotes a node of the behavior graph; (x_i, y_i, z_i) denotes the relative position between the virtual human and the other objects in Euclidean space, i = 1, ..., n, with n the number of objects in the environment; and θ denotes the angle between the positive x direction and the projection onto the x-z plane of the orientation vector of the virtual human's root joint;
defining an action as:
a = (B_a, x_mid, z_mid)
where B_a denotes the current action; and (x_mid, z_mid) denotes the mid-clip touchdown displacement, i.e. the displacement of the middle frame of the motion clip relative to its initial frame.
5. The combined task planning method for a virtual human according to claim 1, wherein step 3.2 comprises:
defining the one-step return matrix R:
R(s, a) = min R, if s_1 = null; max R, if s_1 = s_goal; −ω_T · T(s, a) + ω_P · P(a), otherwise. (formula 4)
where state s_1 denotes the next state after the virtual human selects action a in state s; T(s, a) describes the physical difference from state s to s_1, expressed by the change in the virtual human's position and orientation, and the smaller its value the smoother the transition; P(a) describes the virtual human's preference for action a, and the larger its value the more this action tends to be selected; ω_T and ω_P are weighting coefficients; max R and min R denote the upper and lower bounds of R; and s_goal denotes the goal state.
6. The combined task planning method for a virtual human according to claim 1, wherein step 4.3 comprises:
denoting by M_1 = {p^1_1, ..., p^1_i} and M_2 = {p^2_1, ..., p^2_j} the two motion clips to be spliced, where i and j are the total numbers of frames of the two clips; applying linear interpolation to the position of the virtual human's root joint and quaternion spherical linear interpolation to the joint orientations, the synthesized motion clip is:
M̃ = M_1 ⊕ M_2 = {p^1_1, ..., p^1_(i−k), p_1, ..., p_k, p^2_(k+1), ..., p^2_j} (formula 7)
where
R(p_t) = α(t) · R(p^1_(i−k+t)) + [1 − α(t)] · R(p^2_t) (formula 8)
q(p_t) = slerp(q(p^1_(i−k+t)), q(p^2_t), α(t)) (formula 9)
where R(p_t) denotes the root joint position of pose p_t and q(p_t) denotes the orientation of each joint of pose p_t; in addition,
α(t) = 2((t − 1)/(k − 1))^3 − 3((t − 1)/(k − 1))^2 + 1, 1 ≤ t ≤ k (formula 10)
where the fusion coefficient α(t) satisfies: α(t) = 1 when t ≤ 1; α(t) = 0 when t ≥ k; and α(t) is C^1 continuous everywhere.
CN201210125122.4A 2012-04-25 2012-04-25 Method for planning combined tasks for virtual human Active CN102708377B (en)

Priority Applications (1)

Application Number: CN201210125122.4A; Priority date: 2012-04-25; Filing date: 2012-04-25; Title: Method for planning combined tasks for virtual human (granted as CN102708377B)


Publications (2)

Publication Number Publication Date
CN102708377A CN102708377A (en) 2012-10-03
CN102708377B true CN102708377B (en) 2014-06-25

Family

ID=46901120






Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant