CN106094516A - A robot adaptive grasping method based on deep reinforcement learning - Google Patents

A robot adaptive grasping method based on deep reinforcement learning

Info

Publication number
CN106094516A
CN106094516A (application CN201610402319.6A)
Authority
CN
China
Prior art keywords
robot
target
network
photo
deep reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610402319.6A
Other languages
Chinese (zh)
Inventor
陈春林
侯跃南
刘力锋
魏青
徐旭东
朱张青
辛博
马海兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201610402319.6A
Publication of CN106094516A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 — Adaptive control systems ... electric
    • G05B13/04 — Adaptive control systems ... electric, involving the use of models or simulators
    • G05B13/042 — Adaptive control systems ... in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

The invention provides a robot adaptive grasping method based on deep reinforcement learning. The steps include: while still at some distance from the target to be grasped, the robot acquires photos of the target with its front-mounted cameras, computes the target's position from the photos by binocular ranging, and uses the computed position for robot navigation; when the target enters the grasping range of the robot arm, the front camera photographs the target, and a pre-trained DDPG-based deep reinforcement learning network performs dimensionality-reducing feature extraction on the photo; a control strategy for the robot is derived from the feature-extraction result, and the robot uses this strategy to control its motion path and the pose of the arm, thereby grasping the target adaptively. The method can adaptively grasp objects of varying size and shape at non-fixed positions, and has good prospects for market application.

Description

A robot adaptive grasping method based on deep reinforcement learning
Technical field
The present invention relates to a method by which a robot grasps objects, and in particular to a robot adaptive grasping method based on deep reinforcement learning.
Background technology
An autonomous robot is a highly intelligent service robot with the ability to learn from its external environment. To realize basic activities such as localization, movement and grasping, the robot must be equipped with a robot arm and gripper, fuse information from multiple sensors, apply machine learning (e.g. deep learning and reinforcement learning), and interact with the external environment, thereby achieving perception, decision-making, action and other functions. Most current grasping robots work only when the size, shape and position of the object to be grasped are relatively fixed, and their grasping technology relies mainly on ultrasonic, infrared and laser-ranging sensors; their working range is therefore restricted, and they cannot adapt to more complex grasping environments in which the object's size, shape and position are not fixed. At present, existing vision-based grasping technology also struggles to solve the "curse of dimensionality": the visual input is high-dimensional and the data volume is large. Moreover, the neural networks trained by conventional machine learning are difficult to make converge and cannot directly process the input images. Overall, the control technology of current vision-based grasping service robots has not yet reached satisfactory results, and in particular still needs further optimization in practice.
Summary of the invention
The technical problem to be solved by the present invention is that existing methods cannot adapt to more complex grasping environments in which the size, shape and position of the object to be grasped are not fixed.
To solve the above technical problem, the invention provides a robot adaptive grasping method based on deep reinforcement learning, comprising the following steps:
Step 1: while still at some distance from the target to be grasped, the robot acquires photos of the target with its front-mounted cameras, computes the target's position from the photos by binocular ranging, and uses the computed position for robot navigation;
Step 2: the robot moves according to the navigation; when the target enters the grasping range of the robot arm, the front camera photographs the target, and a pre-trained DDPG-based deep reinforcement learning network performs dimensionality-reducing feature extraction on the photo;
Step 3: a control strategy for the robot is derived from the feature-extraction result, and the robot uses this strategy to control its motion path and the pose of the arm, thereby grasping the target adaptively.
As a further refinement of the invention, computing the target's position from the photos by binocular ranging in step 1 comprises the following steps:
Step 1.1: obtain the focal length f of the cameras, the center distance T_x of the left and right cameras, and the physical distances x_l and x_r from the target point's projections in the image planes of the two cameras to the left edge of the respective image plane. The left and right image planes are rectangular, lie in the same imaging plane, and the projections of the optical centers of the two cameras lie at the centers of the corresponding image planes. The disparity d is then:
d = x_l - x_r    (1)
Step 1.2: using the principle of similar triangles, build the matrix Q:
Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T_x & (c_x - c_x')/T_x \end{bmatrix}    (2)
Q [x, y, d, 1]^T = [x - c_x, y - c_y, f, (-d + c_x - c_x')/T_x]^T = [X, Y, Z, W]^T    (3)
In formulas (2) and (3), (X, Y, Z) are the coordinates of the target point in the three-dimensional coordinate system with the optical center of the left camera as origin, W is the scale coefficient of the rotation-translation transform, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are the offsets between the origins of the image-plane coordinate systems and the origin of the three-dimensional coordinate system, and c_x' is the corrected value of c_x.
Step 1.3: the spatial distance from the target point to the imaging plane is computed as:
Z = -T_x f / (d - (c_x - c_x'))    (4)
Taking the position of the optical center of the left camera as the robot's position, the coordinates (X, Y, Z) of the target point are used as the navigation goal for robot navigation.
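As a hedged illustration, the triangulation of steps 1.1-1.3 can be sketched in a few lines of Python. The function name and all numeric camera parameters below are made-up example values, not calibration data from the patent; the sign of Z follows the convention of the Q matrix, so the distance is |Z|:

```python
def stereo_depth(f, Tx, xl, xr, cx, cx_prime):
    """Distance of a target point from binocular disparity, formulas (1) and (4).

    f            : focal length in pixels
    Tx           : center distance (baseline) of the two cameras
    xl, xr       : horizontal coordinates of the target point's projections
                   in the left and right image planes
    cx, cx_prime : principal-point offset c_x and its corrected value c_x'
    """
    d = xl - xr                          # disparity, formula (1)
    Z = -Tx * f / (d - (cx - cx_prime))  # formula (4); |Z| is the distance
    return d, Z

# Illustrative values only: f = 700 px, 60 mm baseline, c_x assumed equal to c_x'
d, Z = stereo_depth(f=700.0, Tx=60.0, xl=420.0, xr=350.0, cx=320.0, cx_prime=320.0)
```

With c_x = c_x' the pixel units of f and d cancel, so Z carries the unit of the baseline T_x (here millimeters), as the description notes later.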
As a further refinement of the invention, the dimensionality-reducing feature extraction performed on the photo in step 2 by the pre-trained DDPG-based deep reinforcement learning network comprises the following steps:
Step 2.1: using the fact that the grasping process satisfies the conditions of reinforcement learning and has the Markov property, the set of observations and actions up to time t is:
s_t = (x_1, a_1, ..., a_{t-1}, x_t) = x_t    (5)
In formula (5), x_t and a_t are respectively the observation at time t and the action taken.
Step 2.2: the action-value function describes the expected return of the grasping process:
Q^π(s_t, a_t) = E[R_t | s_t, a_t]    (6)
In formula (6), R_t = \sum_{i=t}^{T} γ^{i-t} r(s_i, a_i) is the discounted sum of future rewards obtained from time t, γ ∈ [0,1] is the discount factor, r(s_t, a_t) is the reward function at time t, T is the time at which grasping ends, and π is the grasping policy.
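The discounted return R_t in formula (6) can be sketched directly; the function name and the reward sequence below are illustrative assumptions, not data from the patent:

```python
def discounted_return(rewards, gamma, t=0):
    """R_t = sum over i = t..T of gamma**(i - t) * r_i, the return in formula (6)."""
    return sum(gamma ** (i - t) * r for i, r in enumerate(rewards[t:], start=t))

# gamma = 0.9 over three illustrative step rewards: 1 + 0 + 0.81 * 10
R0 = discounted_return([1.0, 0.0, 10.0], gamma=0.9)
```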
Since the grasping target policy π is predetermined and deterministic, it can be written as a function μ: S → A, where S is the state space and A is the N-dimensional action space. Processing formula (6) with the Bellman equation then gives:
Q^μ(s_t, a_t) = E_{s_{t+1} ~ E}[ r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1})) ]    (7)
In formula (7), s_{t+1} ~ E indicates that the observation at time t+1 is obtained from the environment E, and μ(s_{t+1}) is the action to which the function μ maps the observation at time t+1.
Step 2.3: using the principle of maximum likelihood estimation, the policy evaluation (critic) network Q(s, a | θ^Q) with weight parameters θ^Q is updated by minimizing the loss function:
L(θ^Q) = E_{μ'}[ (Q(s_t, a_t | θ^Q) - y_t)^2 ]    (8)
In formula (8), y_t = r(s_t, a_t) + γ Q(s_{t+1}, μ(s_{t+1}) | θ^Q) is the target policy evaluation value, and μ' is the target policy.
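A minimal sketch of the critic update of step 2.3, under the assumption that Q-values for the minibatch have already been computed by the networks (all names and numbers below are illustrative):

```python
def td_target(r, q_next, gamma):
    """y_t = r(s_t, a_t) + gamma * Q(s_{t+1}, mu(s_{t+1}) | theta_Q), the target in formula (8)."""
    return r + gamma * q_next

def critic_loss(q_values, targets):
    """Minibatch estimate of L(theta_Q) = E[(Q(s_t, a_t | theta_Q) - y_t)^2]."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(q_values)

y = td_target(r=1.0, q_next=2.0, gamma=0.5)   # illustrative reward and next-state Q-value
loss = critic_loss([1.5, 2.5], [y, y])        # two-sample minibatch
```

In a full implementation this loss would be minimized over θ^Q by gradient descent; here only the loss value itself is shown.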
Step 2.4: for the policy (actor) function μ(s | θ^μ) with actual parameters θ^μ, the gradient obtained by the chain rule is:
∇_{θ^μ} μ ≈ E_{μ'}[ ∇_{θ^μ} Q(s, a | θ^Q) |_{s = s_t, a = μ(s_t | θ^μ)} ] = E_{μ'}[ ∇_a Q(s, a | θ^Q) |_{s = s_t, a = μ(s_t)} ∇_{θ^μ} μ(s | θ^μ) |_{s = s_t} ]    (9)
The gradient computed by formula (9) is the policy gradient; the policy function μ(s | θ^μ) is then updated using this policy gradient.
Step 2.5: the network is trained with an off-policy algorithm. The sample data used in training are drawn from the same sample buffer, to minimize the correlation between samples, and a target Q-value network is trained simultaneously with the neural network; that is, an experience replay mechanism and a target Q-value network are used for updating the target networks. The slow update rule used is:
θ^{Q'} ← τ θ^Q + (1 - τ) θ^{Q'}    (10)
θ^{μ'} ← τ θ^μ + (1 - τ) θ^{μ'}    (11)
In formulas (10) and (11), τ is the update rate, with τ << 1. In this way a DDPG-based deep reinforcement learning network is constructed, and it is a convergent neural network.
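The slow target-network update of formulas (10) and (11) is a simple per-parameter blend; a minimal sketch (parameter vectors here are plain lists, and τ = 0.5 is used only to make the arithmetic visible, whereas the patent requires τ << 1):

```python
def soft_update(target_params, source_params, tau):
    """theta' <- tau * theta + (1 - tau) * theta', formulas (10) and (11)."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_params, source_params)]

updated = soft_update(target_params=[0.0, 1.0], source_params=[1.0, 1.0], tau=0.5)
```

The same function serves both the critic (10) and the actor (11); with a small τ the target networks trail the trained networks slowly, which is what stabilizes training.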
Step 2.6: the constructed deep reinforcement learning network performs dimensionality-reducing feature extraction on the photo, and the control strategy of the robot is obtained.
As a further refinement of the invention, the deep reinforcement learning network in step 2.6 consists of an image input layer, two convolutional layers, two fully connected layers and an output layer. The image input layer takes as input the image containing the object to be grasped; the convolutional layers extract features, i.e. a deep representation of the image; the fully connected layers and the output layer form a deep network which, after training, outputs control instructions from the input feature information, namely the servo angles controlling the robot arm and the DC-motor speed controlling the carrier cart. Two convolutional layers and two fully connected layers are chosen so that image features can be extracted effectively while the neural network still converges easily during training.
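The patent does not specify kernel sizes, strides or input resolution. Purely as a hypothetical sanity check of a two-convolutional-layer stack, the feature-map sizes such a design would produce can be computed (every dimension below is an assumption, not a value from the patent):

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a valid (unpadded) convolution layer."""
    return (size - kernel) // stride + 1

# Assumed 84x84 gray input, an 8x8 kernel with stride 4, then a 4x4 kernel with stride 2
s1 = conv_out(84, kernel=8, stride=4)
s2 = conv_out(s1, kernel=4, stride=2)
```

The resulting s2 x s2 feature maps would then be flattened and fed to the two fully connected layers.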
The beneficial effects of the present invention are: (1) during pre-training of the neural network, using an experience replay mechanism and random sampling to determine the input images effectively solves the problem that consecutive photos are too strongly correlated to satisfy the neural network's requirement that input data be independent of each other; (2) dimensionality reduction is achieved through deep learning, and the target Q-value network technique continuously adjusts the weight matrices of the neural network, ensuring as far as possible that the trained neural network converges; (3) the trained DDPG-based deep reinforcement learning network realizes dimensionality reduction and object feature extraction and directly yields the robot's motion control strategy, effectively solving the "curse of dimensionality".
Brief description of the drawings
Fig. 1 is a schematic diagram of the system structure of the present invention;
Fig. 2 is a flow chart of the method of the present invention;
Fig. 3 is a plan view of the binocular ranging method of the present invention;
Fig. 4 is a perspective view of the binocular ranging technique of the present invention;
Fig. 5 is a schematic diagram of the composition of the DDPG-based deep reinforcement learning network of the present invention.
Detailed description of the invention
As shown in Fig. 1, the system for robot adaptive grasping based on the deep reinforcement learning method of the present invention includes: an image processing system, a wireless communication system and a robot motion system.
The image processing system consists mainly of the cameras mounted on the front of the robot together with MATLAB software; the wireless communication system consists mainly of a WIFI module; the robot motion system consists mainly of a base cart and a robot arm. First, the DDPG (deep deterministic policy gradient) based deep reinforcement learning network must be pre-trained on a dynamics simulation platform; during this process an experience replay mechanism and a target Q-value network are generally used to ensure that the DDPG-based deep reinforcement learning network converges during pre-training. The image processing system then acquires images of the target object and passes the image information to the computer through the wireless communication system. While the robot is still far from the object to be grasped, binocular ranging is used to obtain the position of the object, which is then used for robot navigation.
When the robot has moved to where the arm can reach the object, the object is photographed again, and the pre-trained DDPG-based deep reinforcement learning network performs dimensionality reduction, extracts features and yields the robot's control strategy; finally the control strategy is sent to the robot motion system through the wireless communication system to control the robot's state of motion and achieve accurate grasping of the target object.
During pre-training, the RGB images of the target object are first converted to gray-scale images in MATLAB; the experience replay mechanism then makes the correlation between successive photos as small as possible, satisfying the neural network's requirement that input data be independent of each other, and the images fed to the neural network are finally obtained by random sampling. Dimensionality reduction is achieved through deep learning, and the target Q-value network technique continuously adjusts the weight matrices of the neural network, finally yielding a convergent neural network.
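The experience replay with random sampling described above can be sketched minimally in Python (the class and variable names are illustrative, not from the patent):

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience replay: store (s, a, r, s_next) transitions and draw random
    minibatches, so that consecutive, highly correlated frames are not fed to
    the network together."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, k):
        return random.sample(list(self.buf), k)

rb = ReplayBuffer(capacity=100)
for i in range(10):
    rb.add((i, 0, 0.0, i + 1))  # dummy transitions for illustration
batch = rb.sample(4)
```

The bounded deque plus uniform random sampling is the whole mechanism: new experience overwrites the oldest, and each minibatch mixes transitions from different episodes.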
The robot is controlled by an Arduino board with an on-board WIFI module; the arm consists of 4 servos, giving 4 degrees of freedom in total, and the base cart is driven by DC motors. The image processing system consists mainly of the cameras, their image-transfer software and MATLAB. Photos of the target object taken by the cameras are transferred to the computer through the WIFI module on the Arduino board and processed in MATLAB.
In operation, the system proceeds as follows:
Step 1: first, pre-train the DDPG (deep deterministic policy gradient) based deep reinforcement learning network on a dynamics simulation platform; during this process an experience replay mechanism and a target Q-value network are generally used to ensure that the DDPG-based network converges during pre-training;
Step 2: acquire images of the target object with the cameras mounted on the front of the robot, and pass the image information to the computer through the WIFI module;
Step 3: while the robot is still far from the object to be grasped, use binocular ranging to obtain the position of the target object and use it for robot navigation;
Step 4: when the robot has moved to where the arm can reach the object, photograph the object again and use the pre-trained DDPG-based deep reinforcement learning network to perform dimensionality reduction, extract features and yield the robot's control strategy;
Step 5: send the control information to the robot motion system through the WIFI module to achieve accurate grasping of the target object.
As shown in Fig. 3 and Fig. 4, binocular ranging mainly exploits the fact that the horizontal image coordinates of a target point differ between the left and right views (the disparity), and that this difference is inversely proportional to the distance from the target point to the imaging plane. Normally, focal length is measured in pixels; the unit of the camera center distance is determined by the actual size of the checkerboard squares on the calibration plate and the value we input, usually millimeters (to improve precision, we set it to the order of 0.1 mm); and disparity is also measured in pixels. The pixel units in numerator and denominator therefore cancel, and the distance from the target point to the imaging plane carries the same unit as the camera center distance.
As shown in Fig. 5, the DDPG-based deep reinforcement learning network consists mainly of an image input layer, two convolutional layers, two fully connected layers and an output layer. The deep network architecture realizes dimensionality reduction: the convolutional layers extract features, and the output layer outputs the control information.
As shown in Fig. 2, the invention provides a robot adaptive grasping method based on deep reinforcement learning, comprising the following steps:
Step 1: while still at some distance from the target to be grasped, the robot acquires photos of the target with its front-mounted cameras, computes the target's position from the photos by binocular ranging, and uses the computed position for robot navigation;
Step 2: the robot moves according to the navigation; when the target enters the grasping range of the robot arm, the front camera photographs the target, and a pre-trained DDPG-based deep reinforcement learning network performs dimensionality-reducing feature extraction on the photo;
Step 3: a control strategy for the robot is derived from the feature-extraction result, and the robot uses this strategy to control its motion path and the pose of the arm, thereby grasping the target adaptively.
In step 1, computing the target's position from the photos by binocular ranging comprises the following steps:
Step 1.1: obtain the focal length f of the cameras, the center distance T_x of the left and right cameras, and the physical distances x_l and x_r from the target point's projections in the image planes of the two cameras to the left edge of the respective image plane. The left and right image planes are rectangular, lie in the same imaging plane, and the projections of the optical centers of the two cameras, i.e. the projection points of O_l and O_r on the imaging plane, lie at the centers of the corresponding image planes. The disparity d is then:
d = x_l - x_r    (1)
Step 1.2: using the principle of similar triangles, build the matrix Q:
Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T_x & (c_x - c_x')/T_x \end{bmatrix}    (2)
Q [x, y, d, 1]^T = [x - c_x, y - c_y, f, (-d + c_x - c_x')/T_x]^T = [X, Y, Z, W]^T    (3)
In formulas (2) and (3), (X, Y, Z) are the coordinates of the target point in the three-dimensional coordinate system with the optical center of the left camera as origin, W is the scale coefficient of the rotation-translation transform, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are the offsets between the origins of the image-plane coordinate systems and the origin of the three-dimensional coordinate system, and c_x' is the corrected value of c_x (the two values are generally close; in the present invention they are taken to be approximately equal).
Step 1.3: the spatial distance from the target point to the imaging plane is computed as:
Z = -T_x f / (d - (c_x - c_x'))    (4)
Taking the position of the optical center of the left camera as the robot's position, the coordinates (X, Y, Z) of the target point are used as the navigation goal for robot navigation.
In step 2, the dimensionality-reducing feature extraction performed on the photo by the pre-trained DDPG-based deep reinforcement learning network comprises the following steps:
Step 2.1: using the fact that the grasping process satisfies the conditions of reinforcement learning and has the Markov property, the set of observations and actions up to time t is:
s_t = (x_1, a_1, ..., a_{t-1}, x_t) = x_t    (5)
In formula (5), x_t and a_t are respectively the observation at time t and the action taken.
Step 2.2: the action-value function describes the expected return of the grasping process:
Q^π(s_t, a_t) = E[R_t | s_t, a_t]    (6)
In formula (6), R_t = \sum_{i=t}^{T} γ^{i-t} r(s_i, a_i) is the discounted sum of future rewards obtained from time t, γ ∈ [0,1] is the discount factor, r(s_t, a_t) is the reward function at time t, T is the time at which grasping ends, and π is the grasping policy.
Since the grasping target policy π is predetermined and deterministic, it can be written as a function μ: S → A, where S is the state space and A is the N-dimensional action space. Processing formula (6) with the Bellman equation then gives:
Q^μ(s_t, a_t) = E_{s_{t+1} ~ E}[ r(s_t, a_t) + γ Q^μ(s_{t+1}, μ(s_{t+1})) ]    (7)
In formula (7), s_{t+1} ~ E indicates that the observation at time t+1 is obtained from the environment E, and μ(s_{t+1}) is the action to which the function μ maps the observation at time t+1.
Step 2.3: using the principle of maximum likelihood estimation, the policy evaluation (critic) network Q(s, a | θ^Q) with weight parameters θ^Q is updated by minimizing the loss function:
L(θ^Q) = E_{μ'}[ (Q(s_t, a_t | θ^Q) - y_t)^2 ]    (8)
In formula (8), y_t = r(s_t, a_t) + γ Q(s_{t+1}, μ(s_{t+1}) | θ^Q) is the target policy evaluation value, and μ' is the target policy.
Step 2.4: for the policy (actor) function μ(s | θ^μ) with actual parameters θ^μ, the gradient obtained by the chain rule is:
∇_{θ^μ} μ ≈ E_{μ'}[ ∇_{θ^μ} Q(s, a | θ^Q) |_{s = s_t, a = μ(s_t | θ^μ)} ] = E_{μ'}[ ∇_a Q(s, a | θ^Q) |_{s = s_t, a = μ(s_t)} ∇_{θ^μ} μ(s | θ^μ) |_{s = s_t} ]    (9)
The gradient computed by formula (9) is the policy gradient; the policy function μ(s | θ^μ) is then updated using this policy gradient.
Step 2.5: the network is trained with an off-policy algorithm. The sample data used in training are drawn from the same sample buffer, to minimize the correlation between samples, and a target Q-value network is trained simultaneously with the neural network; that is, an experience replay mechanism and a target Q-value network are used for updating the target networks. The slow update rule used is:
θ^{Q'} ← τ θ^Q + (1 - τ) θ^{Q'}    (10)
θ^{μ'} ← τ θ^μ + (1 - τ) θ^{μ'}    (11)
In formulas (10) and (11), τ is the update rate, with τ << 1. In this way a DDPG-based deep reinforcement learning network is constructed, and it is a convergent neural network.
Step 2.6: the constructed deep reinforcement learning network performs dimensionality-reducing feature extraction on the photo, and the control strategy of the robot is obtained. The deep reinforcement learning network consists of an image input layer, two convolutional layers, two fully connected layers and an output layer; two convolutional layers and two fully connected layers are chosen so that image features can be extracted effectively while the neural network still converges easily during training. The image input layer takes as input the image containing the object to be grasped; the convolutional layers extract features, i.e. a deep representation of the image, such as lines, edges and arcs; the fully connected layers and the output layer form a deep network which, after training, outputs control instructions from the input feature information, namely the servo angles controlling the robot arm and the DC-motor speed controlling the carrier cart.
During pre-training of the neural network of the present invention, using an experience replay mechanism and random sampling to determine the input images effectively solves the problem that consecutive photos are too strongly correlated to satisfy the neural network's requirement that input data be independent of each other; dimensionality reduction is achieved through deep learning, and the target Q-value network technique continuously adjusts the weight matrices of the neural network, ensuring as far as possible that the trained network converges; the trained DDPG-based deep reinforcement learning neural network realizes dimensionality reduction and object feature extraction and directly yields the robot's motion control strategy, effectively solving the "curse of dimensionality".

Claims (4)

1. A robot adaptive grasping method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: while still at some distance from the target to be grasped, the robot acquires photos of the target with its front-mounted cameras, computes the target's position from the photos by binocular ranging, and uses the computed position for robot navigation;
Step 2: the robot moves according to the navigation; when the target enters the grasping range of the robot arm, the front camera photographs the target, and a pre-trained DDPG-based deep reinforcement learning network performs dimensionality-reducing feature extraction on the photo;
Step 3: a control strategy for the robot is derived from the feature-extraction result, and the robot uses this strategy to control its motion path and the pose of the arm, thereby grasping the target adaptively.
2. The robot adaptive grasping method based on deep reinforcement learning according to claim 1, characterized in that computing the target's position from the photos by binocular ranging in step 1 comprises the following steps:
Step 1.1: obtain the focal length f of the cameras, the center distance T_x of the left and right cameras, and the physical distances x_l and x_r from the target point's projections in the image planes of the two cameras to the left edge of the respective image plane. The left and right image planes are rectangular, lie in the same imaging plane, and the projections of the optical centers of the two cameras lie at the centers of the corresponding image planes. The disparity d is then:
d = x_l - x_r    (1)
Step 1.2: using the principle of similar triangles, build the matrix Q:
Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T_x & (c_x - c_x')/T_x \end{bmatrix}    (2)
Q [x, y, d, 1]^T = [x - c_x, y - c_y, f, (-d + c_x - c_x')/T_x]^T = [X, Y, Z, W]^T    (3)
In formulas (2) and (3), (X, Y, Z) are the coordinates of the target point in the three-dimensional coordinate system with the optical center of the left camera as origin, W is the scale coefficient of the rotation-translation transform, (x, y) are the coordinates of the target point in the left image plane, c_x and c_y are the offsets between the origins of the image-plane coordinate systems and the origin of the three-dimensional coordinate system, and c_x' is the corrected value of c_x.
Step 1.3: the spatial distance from the target point to the imaging plane is computed as:
Z = -T_x f / (d - (c_x - c_x'))    (4)
Taking the position of the optical center of the left camera as the robot's position, the coordinates (X, Y, Z) of the target point are used as the navigation goal for robot navigation.
Robot self-adapting grasping method based on deeply study the most according to claim 1 and 2, its feature exists In, the deeply learning network based on DDPG utilizing training in advance to cross in step 2 carries out Data Dimensionality Reduction feature to photo and carries Take concretely comprises the following steps:
Step 2.1, utilize target capture process meet intensified learning and meet the condition of Markov character, calculate t it Front observed quantity and the collection of action are combined into:
st=(x1,a1,...,at-1,xt)=xt (5)
In formula (5), xtAnd atThe observed quantity being respectively t and the action taked;
Step 2.2, Utilization strategies value function describes the prospective earnings of crawl process and is:
Qπ(st,at)=E [Rt|st,at] (6)
In formula (6),The future profits summation that discount is later, γ ∈ was beaten for what moment t obtained [0,1] is discount factor, r (st,at) it is the revenue function of moment t, T is to capture the moment terminated, and π is for capturing strategy;
Target strategy π owing to capturing presets and determines, being designated as function mu: S ← A, S are state space, A is the action of N-dimensional degree Space, utilize Bellman equation to process formula (6) has simultaneously:
Q μ ( s t , a t ) = E s t + 1 ~ E [ r ( s t , a t ) + γQ μ ( s t + 1 , μ ( s t + 1 ) ) ] - - - ( 7 )
In formula (7), st+1~E represents that the observed quantity in t+1 moment obtains from environment E, μ (st+1) represent that the t+1 moment is from sight The action that the amount of examining is be mapped to by function mu;
Step 2.3: following the principle of maximum-likelihood estimation, the critic (policy evaluation) network Q(s, a | θ^Q) with weight parameters θ^Q is updated by minimizing the loss function

L(θ^Q) = E_{μ′}[ (Q(s_t, a_t | θ^Q) − y_t)^2 ]    (8)

where in formula (8), y_t = r(s_t, a_t) + γ Q(s_{t+1}, μ(s_{t+1}) | θ^Q) is the target value of the critic and μ′ is the target policy;
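The loss in formula (8) is in practice estimated as a mean over a sampled minibatch; a minimal sketch follows, with hypothetical predicted and target values.

```python
import numpy as np

def critic_loss(q_pred, y):
    # L(theta_Q) = E[(Q(s_t, a_t | theta_Q) - y_t)^2], estimated by the batch mean
    q_pred, y = np.asarray(q_pred, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((q_pred - y) ** 2))

print(critic_loss([1.0, 2.0], [0.0, 2.0]))  # mean of (1.0, 0.0) = 0.5
```

Minimizing this quantity with gradient descent on θ^Q is the critic update step.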
Step 2.4: for the policy (actor) function μ(s | θ^μ) with parameters θ^μ, the chain rule gives the gradient

∇_{θ^μ} μ ≈ E_{μ′}[ ∇_{θ^μ} Q(s, a | θ^Q) |_{s=s_t, a=μ(s_t|θ^μ)} ]
         = E_{μ′}[ ∇_a Q(s, a | θ^Q) |_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s | θ^μ) |_{s=s_t} ]    (9)

The gradient computed by formula (9) is the policy gradient, which is then used to update the policy function μ(s | θ^μ);
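The chain rule of formula (9) can be illustrated on a toy one-dimensional problem; the quadratic critic Q(s, a) = -(a - s)^2 and linear policy mu(s) = theta * s below are assumptions for illustration, not the patent's networks.

```python
def dpg_gradient(theta, s):
    a = theta * s                 # a = mu(s_t | theta_mu)
    dq_da = -2.0 * (a - s)        # grad_a Q(s, a) evaluated at a = mu(s_t)
    dmu_dtheta = s                # grad_theta mu(s | theta_mu)
    return dq_da * dmu_dtheta     # product = policy gradient, formula (9)

theta = 0.0
for _ in range(100):              # gradient ascent on the policy parameter
    theta += 0.05 * dpg_gradient(theta, s=2.0)
print(round(theta, 3))            # converges toward 1.0, i.e. mu(s) = s
```

Ascending along this gradient drives the policy toward the action that maximizes the critic, which is exactly the actor update of DDPG.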
Step 2.5: the network is trained with an off-policy algorithm; the sample data used in training are drawn from a single replay buffer so as to minimize the correlation between samples, and the neural network is trained together with a target Q-value network, i.e. experience replay and the target-network method are used to update the target networks, with the slow (soft) update strategy

θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}    (10)
θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}    (11)

where in formulas (10) and (11), τ is the update rate, τ ≪ 1; this completes the construction of a DDPG-based deep reinforcement learning network, and the resulting neural network is convergent;
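The soft update of formulas (10)-(11) blends a small fraction of the online weights into the target weights on every step; the parameter values and τ below are illustrative.

```python
def soft_update(target_params, online_params, tau):
    # theta' <- tau * theta + (1 - tau) * theta', applied element-wise
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]

target, online, tau = [0.0, 0.0], [1.0, 2.0], 0.01
target = soft_update(target, online, tau)
print(target)  # each target weight moves 1% of the way toward the online weight
```

Because τ ≪ 1, the target networks change slowly, which is what stabilizes the bootstrapped targets y_t during training.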
Step 2.6: the constructed deep reinforcement learning network performs data dimensionality reduction and feature extraction on the photo to obtain the robot's control strategy.
4. The deep-reinforcement-learning-based robot adaptive grasping method according to claim 3, characterized in that the deep reinforcement learning network in step 2.6 consists of an image input layer, two convolutional layers, two fully connected layers and an output layer; the image input layer takes as input images of the object to be grasped; the convolutional layers extract features, i.e. a deep representation of the image; the fully connected layers and the output layer form a deep network which, once trained, outputs a control instruction directly from the input feature information, namely the steering-gear angles of the robot's manipulator arm and the speed of the DC motor driving the carrying cart.
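The layer stack of claim 4 (input, two conv layers, two fully connected layers, output) fixes the shape arithmetic of the network; the concrete input size, kernel sizes, strides and channel count below are assumptions for illustration, since the patent does not specify them.

```python
def conv_out(size, kernel, stride, pad=0):
    # Standard output-size formula for a convolutional layer
    return (size + 2 * pad - kernel) // stride + 1

h = w = 84                                 # assumed square input image
h = w = conv_out(h, kernel=8, stride=4)    # first convolutional layer
h = w = conv_out(h, kernel=4, stride=2)    # second convolutional layer
channels = 32                              # assumed output channels of conv 2
flat = h * w * channels                    # flattened features fed to the
                                           # two fully connected layers
print(h, w, flat)
```

Whatever the real sizes are, the flattened feature count computed this way determines the width of the first fully connected layer.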
CN201610402319.6A 2016-06-08 2016-06-08 A kind of robot self-adapting grasping method based on deeply study Pending CN106094516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610402319.6A CN106094516A (en) 2016-06-08 2016-06-08 A kind of robot self-adapting grasping method based on deeply study


Publications (1)

Publication Number Publication Date
CN106094516A true CN106094516A (en) 2016-11-09

Family

ID=57228280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610402319.6A Pending CN106094516A (en) 2016-06-08 2016-06-08 A kind of robot self-adapting grasping method based on deeply study

Country Status (1)

Country Link
CN (1) CN106094516A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080133053A1 (en) * 2006-11-29 2008-06-05 Honda Motor Co., Ltd. Determination of Foot Placement for Humanoid Push Recovery
CN102521205A (en) * 2011-11-23 2012-06-27 河海大学常州校区 Multi-Agent based robot combined search system by reinforcement learning
CN102902271A (en) * 2012-10-23 2013-01-30 上海大学 Binocular vision-based robot target identifying and gripping system and method
CN203390936U (en) * 2013-04-26 2014-01-15 上海锡明光电科技有限公司 Self-adaption automatic robotic system realizing dynamic and real-time capture function
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image
CN105115497A (en) * 2015-09-17 2015-12-02 南京大学 Reliable indoor mobile robot precise navigation positioning system and method
CN105137967A (en) * 2015-07-16 2015-12-09 北京工业大学 Mobile robot path planning method with combination of depth automatic encoder and Q-learning algorithm
CN105425828A (en) * 2015-11-11 2016-03-23 山东建筑大学 Robot anti-impact double-arm coordination control system based on sensor fusion technology
CN105459136A (en) * 2015-12-29 2016-04-06 上海帆声图像科技有限公司 Robot vision grasping method
CN105637540A (en) * 2013-10-08 2016-06-01 谷歌公司 Methods and apparatus for reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TIMOTHY P. LILLICRAP 等: "Continuous Control with Deep Reinforcement Learning", 《GOOGLE DEEPMIND,ICLR 2016》 *
陈强: "基于双目立体视觉的三维重建", 《图形图像》 *

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600650A (en) * 2016-12-12 2017-04-26 杭州蓝芯科技有限公司 Binocular visual sense depth information obtaining method based on deep learning
CN106780605A (en) * 2016-12-20 2017-05-31 芜湖哈特机器人产业技术研究院有限公司 A kind of detection method of the object crawl position based on deep learning robot
CN106737673A (en) * 2016-12-23 2017-05-31 浙江大学 A kind of method of the control of mechanical arm end to end based on deep learning
CN106737673B (en) * 2016-12-23 2019-06-18 浙江大学 A method of the control of mechanical arm end to end based on deep learning
CN106873585A (en) * 2017-01-18 2017-06-20 无锡辰星机器人科技有限公司 One kind navigation method for searching, robot and system
CN107186708B (en) * 2017-04-25 2020-05-12 珠海智卓投资管理有限公司 Hand-eye servo robot grabbing system and method based on deep learning image segmentation technology
CN107186708A (en) * 2017-04-25 2017-09-22 江苏安格尔机器人有限公司 Trick servo robot grasping system and method based on deep learning image Segmentation Technology
CN107092254A (en) * 2017-04-27 2017-08-25 北京航空航天大学 A kind of design method for the Household floor-sweeping machine device people for strengthening study based on depth
CN107092254B (en) * 2017-04-27 2019-11-29 北京航空航天大学 A kind of design method of the Household floor-sweeping machine device people based on depth enhancing study
CN106970594A (en) * 2017-05-09 2017-07-21 京东方科技集团股份有限公司 A kind of method for planning track of flexible mechanical arm
CN106970594B (en) * 2017-05-09 2019-02-12 京东方科技集团股份有限公司 A kind of method for planning track of flexible mechanical arm
CN107139179A (en) * 2017-05-26 2017-09-08 西安电子科技大学 A kind of intellect service robot and method of work
CN107139179B (en) * 2017-05-26 2020-05-29 西安电子科技大学 Intelligent service robot and working method
CN107479368A (en) * 2017-06-30 2017-12-15 北京百度网讯科技有限公司 A kind of method and system of the training unmanned aerial vehicle (UAV) control model based on artificial intelligence
CN107367929A (en) * 2017-07-19 2017-11-21 北京上格云技术有限公司 Update method, storage medium and the terminal device of Q value matrixs
CN109407603B (en) * 2017-08-16 2020-03-06 北京猎户星空科技有限公司 Method and device for controlling mechanical arm to grab object
CN109407603A (en) * 2017-08-16 2019-03-01 北京猎户星空科技有限公司 A kind of method and device of control mechanical arm crawl object
CN108305275A (en) * 2017-08-25 2018-07-20 深圳市腾讯计算机系统有限公司 Active tracking method, apparatus and system
CN107450555A (en) * 2017-08-30 2017-12-08 唐开强 A kind of Hexapod Robot real-time gait planing method based on deeply study
CN107450593A (en) * 2017-08-30 2017-12-08 清华大学 A kind of unmanned plane autonomous navigation method and system
CN107562052A (en) * 2017-08-30 2018-01-09 唐开强 A kind of Hexapod Robot gait planning method based on deeply study
CN107450593B (en) * 2017-08-30 2020-06-12 清华大学 Unmanned aerial vehicle autonomous navigation method and system
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
CN107748566B (en) * 2017-09-20 2020-04-24 清华大学 Underwater autonomous robot fixed depth control method based on reinforcement learning
CN107479501A (en) * 2017-09-28 2017-12-15 广州智能装备研究院有限公司 3D parts suction methods based on deep learning
CN108051999A (en) * 2017-10-31 2018-05-18 中国科学技术大学 Accelerator beam path control method and system based on deeply study
CN108052004B (en) * 2017-12-06 2020-11-10 湖北工业大学 Industrial mechanical arm automatic control method based on deep reinforcement learning
CN108052004A (en) * 2017-12-06 2018-05-18 湖北工业大学 Industrial machinery arm autocontrol method based on depth enhancing study
CN109909998B (en) * 2017-12-12 2020-10-02 北京猎户星空科技有限公司 Method and device for controlling movement of mechanical arm
CN109909998A (en) * 2017-12-12 2019-06-21 北京猎户星空科技有限公司 A kind of method and device controlling manipulator motion
CN108321795A (en) * 2018-01-19 2018-07-24 上海交通大学 Start-stop of generator set configuration method based on depth deterministic policy algorithm and system
CN108321795B (en) * 2018-01-19 2021-01-22 上海交通大学 Generator set start-stop configuration method and system based on deep certainty strategy algorithm
WO2019155061A1 (en) * 2018-02-09 2019-08-15 Deepmind Technologies Limited Distributional reinforcement learning using quantile function neural networks
CN108594804A (en) * 2018-03-12 2018-09-28 苏州大学 Automatic ride control method based on depth Q distribution via internet trolleies
CN108594804B (en) * 2018-03-12 2021-06-18 苏州大学 Automatic driving control method for distribution trolley based on deep Q network
CN108415254B (en) * 2018-03-12 2020-12-11 苏州大学 Waste recycling robot control method based on deep Q network
CN108415254A (en) * 2018-03-12 2018-08-17 苏州大学 Waste recovery robot control method based on depth Q networks and its device
CN108536011A (en) * 2018-03-19 2018-09-14 中山大学 A kind of Hexapod Robot complicated landform adaptive motion control method based on deeply study
CN110293549A (en) * 2018-03-21 2019-10-01 北京猎户星空科技有限公司 Mechanical arm control method, device and neural network model training method, device
CN110293549B (en) * 2018-03-21 2021-06-22 北京猎户星空科技有限公司 Mechanical arm control method and device and neural network model training method and device
CN108873687A (en) * 2018-07-11 2018-11-23 哈尔滨工程大学 A kind of Intelligent Underwater Robot behavior system knot planing method based on depth Q study
CN109344877B (en) * 2018-08-31 2020-12-11 深圳先进技术研究院 Sample data processing method, sample data processing device and electronic equipment
CN109344877A (en) * 2018-08-31 2019-02-15 深圳先进技术研究院 A kind of sample data processing method, sample data processing unit and electronic equipment
CN109116854A (en) * 2018-09-16 2019-01-01 南京大学 A kind of robot cooperated control method of multiple groups based on intensified learning and control system
CN109523029B (en) * 2018-09-28 2020-11-03 清华大学深圳研究生院 Self-adaptive double-self-driven depth certainty strategy gradient reinforcement learning method
CN109523029A (en) * 2018-09-28 2019-03-26 清华大学深圳研究生院 For the adaptive double from driving depth deterministic policy Gradient Reinforcement Learning method of training smart body
CN109358628A (en) * 2018-11-06 2019-02-19 江苏木盟智能科技有限公司 A kind of container alignment method and robot
US10926416B2 (en) 2018-11-21 2021-02-23 Ford Global Technologies, Llc Robotic manipulation using an independently actuated vision system, an adversarial control scheme, and a multi-tasking deep learning architecture
WO2020134254A1 (en) * 2018-12-27 2020-07-02 南京芊玥机器人科技有限公司 Method employing reinforcement learning to optimize trajectory of spray painting robot
CN110323981A (en) * 2019-05-14 2019-10-11 广东省智能制造研究所 A kind of method and system controlling permanent magnetic linear synchronous motor
CN110202583A (en) * 2019-07-09 2019-09-06 华南理工大学 A kind of Apery manipulator control system and its control method based on deep learning
CN110400345B (en) * 2019-07-24 2021-06-15 西南科技大学 Deep reinforcement learning-based radioactive waste push-grab cooperative sorting method
CN110400345A (en) * 2019-07-24 2019-11-01 西南科技大学 Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting
CN110394804A (en) * 2019-08-26 2019-11-01 山东大学 A kind of robot control method, controller and system based on layering thread frame
CN110722556A (en) * 2019-10-17 2020-01-24 苏州恒辉科技有限公司 Movable mechanical arm control system and method based on reinforcement learning
CN112734759A (en) * 2021-03-30 2021-04-30 常州微亿智造科技有限公司 Method and device for determining trigger point of flying shooting

Similar Documents

Publication Publication Date Title
CN106094516A (en) A kind of robot self-adapting grasping method based on deeply study
CN104699247B (en) A kind of virtual reality interactive system and method based on machine vision
CN105787439B (en) A kind of depth image human synovial localization method based on convolutional neural networks
US20200074683A1 (en) Camera calibration
CN106444780B (en) A kind of autonomous navigation method and system of the robot of view-based access control model location algorithm
CN110312912A (en) Vehicle automatic parking system and method
CN103231708B (en) A kind of intelligent vehicle barrier-avoiding method based on binocular vision
CN107909061B (en) Head posture tracking device and method based on incomplete features
CN105786016B (en) The processing method of unmanned plane and RGBD image
CN104180818B (en) A kind of monocular vision mileage calculation device
CN105847684A (en) Unmanned aerial vehicle
CN107450555A (en) A kind of Hexapod Robot real-time gait planing method based on deeply study
CN105631861B (en) Restore the method for 3 D human body posture from unmarked monocular image in conjunction with height map
CN108227735A (en) Method, computer-readable medium and the system of view-based access control model flight self-stabilization
CN108986136A (en) A kind of binocular scene flows based on semantic segmentation determine method and system
CN105760894A (en) Robot navigation method based on machine vision and machine learning
CN104794737B (en) A kind of depth information Auxiliary Particle Filter tracking
CN103150728A (en) Vision positioning method in dynamic environment
CN104777839B (en) Robot autonomous barrier-avoiding method based on BP neural network and range information
CN110175566A (en) A kind of hand gestures estimating system and method based on RGBD converged network
CN108021241A (en) A kind of method for realizing AR glasses virtual reality fusions
CN106056633A (en) Motion control method, device and system
CN107397658B (en) Multi-scale full-convolution network and visual blind guiding method and device
CN108898628A (en) Three-dimensional vehicle object's pose estimation method, system, terminal and storage medium based on monocular
CN106569225A (en) Range-finding sensor based real-time obstacle avoidance method of driveless car

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20161109