CN111331607A - Automatic grabbing and stacking method and system based on mechanical arm - Google Patents

Automatic grabbing and stacking method and system based on mechanical arm

Info

Publication number
CN111331607A
Authority
CN
China
Prior art keywords
grabbing, stacking, network, palletizing, area
Prior art date
Legal status
Granted
Application number
CN202010260136.1A
Other languages
Chinese (zh)
Other versions
CN111331607B (en)
Inventor
张伟
张钧皓
宋然
马林
李贻斌
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010260136.1A priority Critical patent/CN111331607B/en
Publication of CN111331607A publication Critical patent/CN111331607A/en
Application granted granted Critical
Publication of CN111331607B publication Critical patent/CN111331607B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1679 Programme controls characterised by the tasks executed
    • B25J9/1687 Assembly, peg and hole, palletising, straight line, weaving pattern movement
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • B25J19/00 Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02 Sensing devices
    • B25J19/04 Viewing devices
    • B65 CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65G TRANSPORT OR STORAGE DEVICES, e.g. CONVEYORS FOR LOADING OR TIPPING, SHOP CONVEYOR SYSTEMS OR PNEUMATIC TUBE CONVEYORS
    • B65G61/00 Use of pick-up or transfer devices or of manipulators for stacking or de-stacking articles not otherwise provided for

Abstract

The invention discloses an automatic grabbing and stacking method and system based on a mechanical arm. Images of the grabbing area and the stacking area of the objects to be stacked are acquired and input into an automatic grabbing and stacking network; the network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy. Combined with deep reinforcement learning, the automatic grabbing and stacking network adopts an optimal strategy that maximizes the expected sum of future rewards; the mechanical arm selects and grabs the required objects in the grabbing area according to the prediction result and places them at positions appropriate to the current and future states of the stacking area. With this technical scheme, the grabbing and stacking network (GSN) learns the grabbing strategy and the stacking strategy at the same time, so that the mechanical gripper can pick up objects to be stacked from the table and correctly stack them at a proper position.

Description

Automatic grabbing and stacking method and system based on mechanical arm
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to an automatic grabbing and stacking method and system based on a mechanical arm.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In the last decades, the gripping action of robot arms has reached a high level of precision in highly ordered environments such as automobile assembly and welding. In many task scenarios, however, a robotic arm system must handle unpredictable objects: a cluttered desktop can cause the gripping system to fail completely, and even when grabbing succeeds, a preset fixed stacking position can cause objects of different shapes to collide. Accordingly, the expanding retail industry urgently needs intelligent palletizing systems that can be used in warehouses.
The objective of reinforcement learning is to train an agent, through interaction with the environment, to maximize the expected value of the cumulative future reward; this corresponds to policy optimization in a Markov decision process (MDP). The Markov decision process can be represented by the tuple M = (S, G, A, r, γ), where s ∈ S is the defined state space, g ∈ G is the list of possible targets, a ∈ A is the action space, r is the state reward function, and γ ∈ (0, 1) is a discount factor.
Traditional reinforcement learning methods such as tabular reinforcement learning suffer from the "curse of dimensionality" when high-dimensional state and action spaces are encountered, which was previously difficult to resolve. With the rise of deep learning in recent years, combining deep neural networks with reinforcement learning has become an important means of overcoming this curse of dimensionality. Using deep neural networks, states can also be represented as images, which makes it more convenient to solve visual problems with reinforcement learning techniques.
At present, deploying deep reinforcement learning on a real robot, particularly a mechanical arm, remains difficult. This is mainly because reinforcement learning is essentially a continuous trial-and-error method that requires a large number of experiments; a real mechanical arm is easily damaged by extensive experimentation and needs a long time to collect samples. In addition, the action dimension of a mechanical arm is very high; for example, the UR5 has 6 joints, i.e., 6 degrees of freedom, which makes the robot difficult to control during learning.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an automatic grabbing and stacking method and system based on a mechanical arm, so that an intelligent agent can autonomously choose to grab objects and neatly stack them in another area relying only on visual state input.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
an automatic grabbing and stacking method based on a mechanical arm, comprising the following steps:
acquiring images of a stacking area and a grabbing area of an object to be stacked, and inputting the images into an automatic grabbing and stacking network;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
when the automatic grabbing and stacking network is combined with deep reinforcement learning, an optimal strategy that maximizes the expected sum of future rewards is adopted;
and the mechanical arm selects and grabs the required objects in the grabbing area according to the prediction result and places them at positions appropriate to the current and future states.
According to a further technical scheme, the automatic grabbing and stacking network comprises a grabbing network and a stacking network, the grabbing position and the stacking position are respectively predicted, and the features of the stacking-area image and the features of the grabbing-area image are fused so that information about the stacking area is transmitted to the grabbing network.
According to a further technical scheme, when the automatic grabbing and stacking network learns the grabbing strategy and the stacking strategy, auxiliary training is carried out based on task-related information, comprising:
predicting the number of objects left in the grabbing area by using the features extracted from the grabbing network sensing layer;
predicting a height of the pile at a pixel level using information obtained from a stacking network aware layer;
item-centered feature learning, to ensure that items disappearing from the grabbing area are similar, at the feature level, to items added to the stack.
According to a further technical scheme, the automatic grabbing and stacking network learns, using distributed prioritized experience replay, to grab articles of different sizes and stack them tightly in the stacking area.
In a further technical scheme, when the automatic grabbing and stacking network is combined with deep reinforcement learning, an optimal strategy that maximizes the expected sum of future rewards is adopted, selecting and grabbing the required objects in the grabbing area and stacking them at positions appropriate to the current and future states.
According to a further technical scheme, before the image of the grabbing area of the objects to be stacked is input into the automatic grabbing and stacking network, the image is processed: the 3-channel color data are combined with the depth data, orthographically projected to the top-down view, and rotated counterclockwise by different angles to generate new views.
According to a further technical scheme, for the representation of the stacking state of the stacking area, RGB images taken by a camera facing the stacking area are used.
According to a further technical scheme, two Q functions are modeled by the grabbing network and the stacking network: at each time step, the grabbing network evaluates the grabbing Q function for each pixel in the grabbing state, and the stacking network evaluates the stacking Q function for each position unit in the stacking state of the object.
According to a further technical scheme, the grabbing network and the stacking network extract features from raw image data; for the convolutional layers in the grabbing network and the stacking network, the high-level features of the object stacking state generated by the convolutional layers in the stacking network are fused with the high-level features of the grabbing state generated by the convolutional layers in the grabbing network;
for the grabbing network, the fused low-level features are processed by two convolutional layers and then fed into a bilinear upsampling layer; the same features are also used to predict the number of objects on the table through a global average pooling layer followed by an activation function and a linear layer.
A mechanical-arm grabbing and stacking system based on deep reinforcement learning comprises:
the mechanical arm, the camera and the control system;
the camera collects images of a grabbing area and a stacking area where objects to be stacked are placed, and inputs the images to an automatic grabbing and stacking network of the control system;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacks the objects to be stacked.
The above one or more technical solutions have the following beneficial effects:
according to the technical scheme, the Grabbing and Stacking Network (GSN) learns the grabbing strategy and the stacking strategy at the same time, so that the mechanical clamp can pick up objects to be stacked from the table and correctly stack the objects at a proper position.
The present disclosure uses information obtained from a stacking network (SNet) aware layer to predict the height of a pile at the pixel level. This task helps the network to extract the stack's profile features, which contain useful information to evaluate the current state. Another is an item-centric feature learning task to ensure that items missing from the gripping area are similar to items added to the stack at a feature level. That is, ensuring that the item features captured from different perspectives (desktop image and image of pile) are close.
The present disclosure formulates the entire grab-and-place process as a Q-learning problem. The technical scheme of the application uses distributed prioritized experience replay to learn a strategy by which boxes of different sizes can be grabbed and closely stacked on a platform. Experiments were performed both in a simulation environment and in the real world to verify the effectiveness of the proposed method.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention without limiting it.
FIG. 1 is a schematic diagram of a system according to an embodiment of the present invention;
fig. 2 is a schematic diagram of network learning according to an embodiment of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The embodiment discloses an automatic grabbing and stacking method based on mechanical arms, which comprises the following steps:
acquiring images of a stacking area and a grabbing area of an object to be stacked, and inputting the images into an automatic grabbing and stacking network;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacking the objects to be stacked.
The mechanical-arm grabbing and stacking method based on deep reinforcement learning is a model-embedded method based on the DQN algorithm, enabling an intelligent agent to autonomously choose to grab an object and neatly stack it in another area relying only on visual state input. The multi-object grabbing and stacking task can thus be handled in an end-to-end manner.
The automatic grabbing and stacking network (GSN) proposed in the above embodiment consists of two parts: the grabbing network (GNet) and the stacking network (SNet), which predict the grabbing position and the stacking position separately. To convey information about the stacking area to the GNet, the features of the stacking-area photo and the features of the desktop photo are fused. Thus the GNet can consider not only which item is easy to pick up, but also which item is needed in the stacking area.
By learning the grabbing and stacking strategies simultaneously through the grabbing and stacking network (GSN), the mechanical gripper can pick a box from the table and stack it correctly on the platform.
To speed up the learning process, let the network focus more on task-related information, and provide additional training signals, three auxiliary tasks are introduced. The first is a desktop object-count prediction task, which uses features extracted from the GNet perception layer to predict how many items remain on the table. The second is the pile-height prediction task, which uses information obtained from the SNet perception layer to predict the height of the pile at the pixel level; this task helps the network extract the contour features of the stack, which contain useful information for evaluating the current state. The last is an item-centered feature learning task, which ensures that items disappearing from the desktop are similar, at the feature level, to items added to the stack; that is, the item features captured from different perspectives (the desktop image and the image of the pile) should be close.
The entire grabbing and stacking process is formulated as a Q-learning problem. Distributed prioritized experience replay is used to learn strategies that can grab boxes of different sizes and place them tightly on the platform. Experiments were performed both in a simulation environment and in the real world to verify the effectiveness of the proposed method.
In a specific implementation, the operation is performed using a robotic arm system equipped with a two-finger gripper. The manipulation process can be represented as a Markov decision process (MDP). At time step t with state s_t, the system photographs the stacking area and the grabbing area with two cameras, respectively. The robot then selects and executes an action a_t (grabbing and stacking an object) following a policy π_θ(s_t) parameterized by θ, which can be learned by training the deep network. The state is then updated to s_{t+1} with an immediate reward r(s_t, a_t). After training, an optimal policy can be found:
π* = argmax_θ E[ Σ_{t=1}^{T} γ^t r(s_t, a_t) ],

that is, to solve the reinforcement learning problem, the policy maximizes, by adjusting θ, the expected sum of future rewards over t = 1, 2, …, T, with discount factor γ.
The above framework provides a solution to such decision-making problems, but training is difficult because data collection is hard, and collecting a large amount of experience is critical to the performance of a reinforcement learning network. Compared with on-policy learning, off-policy learning can reuse the collected data multiple times, which helps training when data collection is difficult. To train the network efficiently, an off-policy Q-learning algorithm is employed, estimating the Q function by minimizing the Bellman error:
L = E[ ( r(s_t, a_t) + γ max_{a′} Q_θ(s_{t+1}, a′) − Q_θ(s_t, a_t) )² ]
after training, this strategy will act on the value function by maximizing the optimal state
Figure BDA0002438975820000072
(st,at) To select operations to form an optimal strategy
Figure BDA0002438975820000071
In other words, it will choose to be in state stAction a of generating the maximum jackpottI.e. to select the required stacking areaAnd place their grab bar in the appropriate position in the current and future states.
The grabbing state s_gt is modeled by a 4-channel RGB-D image of the table: the 3-channel color data are combined with the depth data, orthographically projected to the top-down view, and rotated counterclockwise by angles n × 22.5°, n ∈ {0, 1, 2, …, 7}; this strategy generates eight new views. For the stacking state representation s_st, an RGB image taken by the camera facing the stacking area is used; the stacking state can be fully represented by a 2-dimensional RGB picture.
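A minimal sketch of this eight-view construction, assuming numpy/scipy and pre-computed top-down projections `rgb_topdown` (H×W×3) and `depth_topdown` (H×W); the array names and bilinear interpolation are illustrative choices, not taken from the patent:

```python
import numpy as np
from scipy.ndimage import rotate

def build_grasp_states(rgb_topdown: np.ndarray, depth_topdown: np.ndarray):
    """Stack color and depth into a 4-channel top-down image and
    produce eight views rotated counterclockwise by n * 22.5 degrees."""
    rgbd = np.concatenate([rgb_topdown, depth_topdown[..., None]], axis=-1)  # H x W x 4
    views = []
    for n in range(8):
        # reshape=False keeps the original H x W frame; order=1 is bilinear
        views.append(rotate(rgbd, angle=n * 22.5, reshape=False, order=1))
    return views  # one view per discrete wrist rotation
```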
The representation of a grabbing action is defined as a_gt and that of a stacking action as a_st; one action is generated at each time step. The grabbing action a_gt contains a Cartesian motion command [x_g, y_g, z_g, θ_g], where [x_g, y_g, z_g] corresponds to the center of the gripper during grabbing and θ_g is the rotation angle of the wrist around the z-axis; the technical scheme divides 180° into 8 discrete θ_g rotations. For the stacking action a_st, the stacking area is divided into 14 positions along the x-axis, denoted s_i ∈ [0, 13]. These cells also represent the center of the palletized object, since after training the network learns to grab objects at their centers. Since most objects in the task are 3 cells wide, neither the leftmost nor the rightmost cell is contained in the operating space (an object placed at the edge would be partly out of view). The mechanical arm stacks the object not only at the x-coordinate f_x(s_i) (f_x being the discrete function mapping s_i to an x coordinate) but also at a z-coordinate inferred from s_st. The other commands (e.g., the y-coordinate and the gripper orientation) are fixed during the stacking operation, which simplifies the palletizing problem and facilitates stacking boxes densely.
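This discretization can be illustrated as follows; the cell width and workspace origin are hypothetical values, since the text only fixes the counts (8 rotations, 14 cells) and the exclusion of the edge cells:

```python
NUM_ROTATIONS = 8      # 180 degrees split into 8 discrete wrist angles theta_g
NUM_CELLS = 14         # stacking positions s_i in [0, 13] along the x axis
CELL_WIDTH = 0.03      # hypothetical cell width in meters (not given in the text)

def theta_from_index(rotation_index: int) -> float:
    """Discrete wrist rotation about the z axis for a grasp action."""
    return rotation_index * (180.0 / NUM_ROTATIONS)

def f_x(s_i: int, x_origin: float = 0.0) -> float:
    """Discrete mapping from stacking cell s_i to an x coordinate. The leftmost
    and rightmost cells are excluded because most boxes span 3 cells."""
    if not 1 <= s_i <= NUM_CELLS - 2:
        raise ValueError("edge cells are outside the operating space")
    return x_origin + s_i * CELL_WIDTH
```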
The reward in reinforcement learning comprises two parts: one evaluates the effect of stacking and the other evaluates the effect of grabbing. Because the stacked boxes should be packed tightly and level at the top, the technical scheme defines the stacking reward r_s as follows:

r_s = B⁻ − H⁺ − O⁺ − L,

where B⁻ denotes the reduction in bumpiness (evaluated by computing the variance of the column heights, with thresholds set to 0.3, 0 and 1), H⁺ denotes the increase of the maximum height (threshold set to 0 or 0.7), and O⁺ denotes the number of newly created holes; once a gap is covered, it becomes a hole that cannot be refilled. L is a binary value indicating whether the top of the stack is entirely level. These four values can be calculated by comparing s_{pt+1} and s_{pt}; the first three are different piecewise functions whose inputs are the images s_{pt+1} and s_{pt}, guiding the policy toward highly adaptable stacking. Inspired by early-termination strategies, if the learned policy fails to achieve a level stack, the metric L takes effect and a restart signal is sent, since status rewards at later times may be inaccurate.
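A hedged sketch of this stacking reward, computed from occupancy grids of the stacking area before and after a placement; the piecewise thresholds mentioned above are only partially legible in the source, so the raw quantities are combined directly here:

```python
import numpy as np

def column_heights(occ: np.ndarray) -> np.ndarray:
    """Per-column stack heights from a z-by-x occupancy grid (row 0 = bottom)."""
    heights = np.zeros(occ.shape[1])
    for j, col in enumerate(occ.T):
        filled = np.nonzero(col)[0]
        if filled.size:
            heights[j] = filled.max() + 1
    return heights

def count_holes(occ: np.ndarray) -> int:
    """Empty cells lying below the top filled cell of their column."""
    holes = 0
    for col in occ.T:
        filled = np.nonzero(col)[0]
        if filled.size:
            holes += int(np.sum(col[: filled.max() + 1] == 0))
    return holes

def stacking_reward(occ_before: np.ndarray, occ_after: np.ndarray) -> float:
    """Sketch of r_s = B- - H+ - O+ - L, without the piecewise thresholds."""
    h0, h1 = column_heights(occ_before), column_heights(occ_after)
    b_minus = float(np.var(h0) - np.var(h1))         # bumpiness reduction
    h_plus = max(0.0, float(h1.max() - h0.max()))    # max-height increase
    o_plus = float(count_holes(occ_after) - count_holes(occ_before))  # new holes
    l_flag = 0.0 if np.all(h1 == h1.max()) else 1.0  # 1 if the top is not level
    return b_minus - h_plus - o_plus - l_flag
```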
The grabbing reward r_g is defined by an equation (rendered as an image in the original) in terms of the grab result G and the distance D, where G = 0 denotes a failed grab and G = 1 a successful one, and D denotes the distance between the center of the object and the grabbing position; D is crucial for achieving high stacking accuracy.
The deep Q network is extended by incorporating different functions into the fully convolutional network (GNet). The two Q functions are modeled by two convolutional networks (GNet and SNet). At each time step, GNet evaluates the grabbing Q function for each pixel in s_gt, while SNet evaluates the stacking Q function for each position unit in s_st.
Both networks extract features from the raw image data using the first three blocks of ResNet-50. Since depth information is critical for accurate grabbing, the input layer of GNet is adjusted from 3 channels to 6 channels (i.e., RGB is changed to RGBDDD by concatenating the RGB channels with the depth channel replicated per channel). The input channels of the ResNet component in SNet remain unchanged. The picture representing s_gt (resolution 224×224) and the picture representing s_st (resolution 256×128) are both resized to 320×320, which generates feature maps of an appropriate size for the subsequent upsampling and auxiliary tasks.
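A sketch of the input construction and truncated backbone, assuming PyTorch/torchvision; interpreting the first three "cells" of ResNet-50 as its first three residual stages is an assumption:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def make_rgbddd(rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
    """Concatenate RGB (B x 3 x H x W) with the depth channel replicated
    three times (B x 1 x H x W) to form the 6-channel RGBDDD input."""
    return torch.cat([rgb, depth.repeat(1, 3, 1, 1)], dim=1)

def make_gnet_trunk() -> nn.Module:
    """ResNet-50 feature extractor truncated after the third residual stage,
    with the stem widened from 3 to 6 input channels for RGBDDD."""
    backbone = resnet50(weights=None)
    backbone.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
    return nn.Sequential(
        backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
        backbone.layer1, backbone.layer2, backbone.layer3,
    )
```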
The subsequent convolutional layers in GNet and SNet share the same architecture, as shown in fig. 2. The convolution kernel size of each layer is 1×1, which helps reduce dimensionality and mitigate possible overfitting. To incorporate information about which objects are required by the palletization to form an ordered layout and are easy to grasp in a particular grabbing direction, the high-level features Φ_s of s_st generated by the convolutional layers in SNet are fused with the high-level features Φ_g of s_gt generated by the convolutional layers in GNet. Since GNet is a fully convolutional network that encodes position information in its feature maps, Φ_s and Φ_g cannot simply be concatenated along the channel dimension; instead, two linear layers convert Φ_s into channel-wise weights ω_g (between 0 and 1), which are then multiplied with Φ_g as follows:
Φ_m = λ·ω_g·Φ_g + (1 − λ)·Φ_g,
where Φ_m represents the fused features and λ is a scale factor balancing the original features (containing a preliminary prediction of the locations of easily grasped objects) against the weighted features (emphasizing those features of s_gt that are useful for selecting objects suited to the state of the stacking area). In this embodiment, λ is set to 0.25.
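A possible PyTorch realization of this fusion step; the global average pooling before the two linear layers and the sigmoid keeping ω_g in (0, 1) are assumptions consistent with, but not spelled out in, the text:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Convert SNet features into channel-wise weights and blend them into
    GNet features: phi_m = lambda * (w_g * phi_g) + (1 - lambda) * phi_g."""
    def __init__(self, s_channels: int, g_channels: int, lam: float = 0.25):
        super().__init__()
        self.lam = lam
        self.to_weights = nn.Sequential(      # two linear layers -> omega_g
            nn.Linear(s_channels, g_channels),
            nn.ReLU(inplace=True),
            nn.Linear(g_channels, g_channels),
            nn.Sigmoid(),                     # keep the weights in (0, 1)
        )

    def forward(self, phi_s: torch.Tensor, phi_g: torch.Tensor) -> torch.Tensor:
        pooled = phi_s.mean(dim=(2, 3))       # global average pool (assumption)
        w_g = self.to_weights(pooled)[:, :, None, None]
        return self.lam * w_g * phi_g + (1.0 - self.lam) * phi_g
```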
The aforementioned features are sent to different layers for different tasks. For GNet, the fused low-level features Φ_m are processed by two convolutional layers and then fed into a bilinear upsampling layer. Φ_g is also used to predict the number of objects on the table via a global average pooling layer followed by a ReLU activation and a linear layer, which helps perceive objects across regions of different scale and maintains sensitivity to smaller objects. In SNet, the high-level feature Φ_s is used not only to give GNet a broader perception but also to predict the Q value and the column-wise height values of each stacking cell. Using two separate linear-layer modules, the Q value Q_s(s_s, a_s) is estimated jointly from a value function V_s(s_s) and an advantage function A_s(s_s, a_s). For the stacking-area height prediction task, the column-by-column heights predicted when objects are stacked represent the upper boundary of the pile, which contains auxiliary information for predicting the stacking Q value.
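The joint estimation of Q_s from V_s and A_s can be sketched as a standard dueling head; the mean-centered combination below is the usual dueling formulation and is assumed here:

```python
import torch
import torch.nn as nn

class DuelingStackHead(nn.Module):
    """Estimate Q_s(s, a) from a state embedding via separate value and
    advantage branches, combined in the standard dueling form."""
    def __init__(self, feat_dim: int, num_cells: int = 14):
        super().__init__()
        self.value = nn.Linear(feat_dim, 1)              # V_s(s)
        self.advantage = nn.Linear(feat_dim, num_cells)  # A_s(s, a) per cell

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        v = self.value(feat)
        a = self.advantage(feat)
        # mean-centering the advantage is the usual identifiability fix
        return v + a - a.mean(dim=1, keepdim=True)
```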
Another auxiliary task, not shown in fig. 2, is the object-centric feature learning task. When the robot arm picks an object from the desktop and stacks it (desktop image and stack image in fig. 1), the features of the two scenes extracted by the network should change according to the features of the object that is removed from the desktop and appears on the stack. The task compares the perception-layer features Φ_g before and after a successful grab with the perception-layer features Φ_p before and after stacking. To compute these features, global average pooling and a ReLU nonlinearity are applied to the feature maps Φ_g and Φ_p. In this way, the perception modules in GNet and SNet gain the ability to identify the same object by similar features, which facilitates the feature fusion described above.
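A sketch of this object-centric comparison; the cosine objective below is a simplified stand-in for the N-pair loss named in the training details, and the tensor names are illustrative:

```python
import torch
import torch.nn.functional as F

def object_feature_loss(phi_g_before, phi_g_after, phi_p_before, phi_p_after):
    """Encourage the feature change caused by removing an object from the table
    to match the feature change caused by its appearance on the stack."""
    def pool(x):
        return F.relu(x.mean(dim=(2, 3)))  # global average pool + ReLU, per text
    removed = pool(phi_g_before) - pool(phi_g_after)    # object that left the table
    appeared = pool(phi_p_after) - pool(phi_p_before)   # object that joined the stack
    return 1.0 - F.cosine_similarity(removed, appeared, dim=1).mean()
```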
GNet and SNet are co-trained using a deep Q network as the Q-function approximator. Specifically, the grabbing Q function Q_g(s_p+g, a_g) is modeled as a fully convolutional network (GNet), and the stacking Q function Q_s(s_s, a_s) is modeled as a deep network (SNet). Double Q-learning is used to train GNet and SNet: compared with plain Q-learning, it adopts a target network and an improved maximum operator, making training more reliable. The target network shares the same architecture as the network in fig. 2 (without the auxiliary task modules), and its parameters are copied from the online learning model every 300 steps. For the maximum operator, double Q-learning uses the current Q_θ to select the maximizing action and the target Q_θ⁻ to evaluate it; that is, the loss function for both the grabbing Q function and the stacking Q function is:
L_i = E[ ( r + γ·Q_θ⁻(s_{t+1}, argmax_{a′} Q_θ(s_{t+1}, a′)) − Q_θ(s_t, a_t) )² ], i ∈ {g, s},
In each training round, the expected value E is calculated from a mini-batch of samples; the remaining variables are as described above. The parameterization of Q_g(s_gt, a_gt) allows the convolutional features to be shared across positions and orientations. In the technical scheme of the application, the Q value Q_g predicted by GNet represents grabbing, at a position where success is likely, the object required by the stacking policy. The reward for the Q function Q_g(s_p+g, a_g) should therefore include both the grab reward and the stacking reward. In this way, the feature-fusion layer from Φ_s to ω_g can self-adjust during training.
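A sketch of this double Q-learning loss with an illustrative batch container; the squared-error form follows the loss L_i above, while the γ value and field names are assumptions:

```python
from typing import NamedTuple
import torch
import torch.nn.functional as F

class Batch(NamedTuple):
    state: torch.Tensor
    action: torch.Tensor      # long tensor of chosen action indices
    reward: torch.Tensor
    next_state: torch.Tensor
    done: torch.Tensor        # 1.0 where the episode terminated

def double_q_loss(q_online, q_target, batch: Batch, gamma: float = 0.99) -> torch.Tensor:
    """Double Q-learning: the online network selects the next action,
    the target network evaluates it."""
    q_sa = q_online(batch.state).gather(1, batch.action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_a = q_online(batch.next_state).argmax(dim=1, keepdim=True)
        next_q = q_target(batch.next_state).gather(1, next_a).squeeze(1)
        target = batch.reward + gamma * (1.0 - batch.done) * next_q
    return F.mse_loss(q_sa, target)
```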
To speed up learning, a distributed learning framework is implemented. Sixteen samplers collect experience asynchronously. After collecting 200 samples, each sampler transfers the experience, with per-sample priorities, to the learner and copies the latest parameters from it. Meanwhile, the learner trains from a dual-priority experience replay buffer that stores separately prioritized experience indices for grabbing and stacking, alternating between high-priority samples of each kind; samples with larger prediction errors receive higher priority. The pseudo code of the learner training routine is listed in Algorithm 1.
[Algorithm 1: the learner training routine; presented as an image in the original.]
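Since Algorithm 1 appears only as an image, the following is a hedged reconstruction of the learner routine from the surrounding description (alternating dual-priority buffers, priority refresh from TD errors, target sync every 300 steps); the buffer API and the `double_q_loss_with_td` helper are hypothetical:

```python
def learner_loop(gnet, snet, gnet_target, snet_target,
                 grasp_buffer, stack_buffer, optimizer,
                 sync_every=300, total_steps=100_000):
    """Hedged reconstruction of Algorithm 1 (the original is an image)."""
    for step in range(total_steps):
        # alternate between grabbing and stacking experiences, as described above
        if step % 2 == 0:
            buffer, online, target = grasp_buffer, gnet, gnet_target
        else:
            buffer, online, target = stack_buffer, snet, snet_target
        batch, indices, weights = buffer.sample()           # prioritized sampling
        loss, td_errors = double_q_loss_with_td(online, target, batch, weights)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # larger prediction error -> higher replay priority
        buffer.update_priorities(indices, td_errors.abs())
        if step % sync_every == 0:
            gnet_target.load_state_dict(gnet.state_dict())  # refresh targets
            snet_target.load_state_dict(snet.state_dict())
```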
The system is trained in a simulated environment to improve efficiency. A UR5 robot equipped with a Robotiq 85 gripper was used in V-REP. Four types of boxes were designed, with sizes of 3×3×3, 3×9×3, 6×9×3 and 9×9×3 (in centimeters).
In addition to the Q-function estimation loss, each auxiliary task has its own loss during learning: the tasks of predicting the stacking-area height and the number of objects are trained with the smooth L1 loss, while the object-centric feature learning task uses the N-pair loss. GNet and SNet were trained simultaneously using stochastic gradient descent with a learning rate of 0.0001. Both Q-learning heads employ an ε-greedy exploration strategy, with ε initialized to 0.9 for SNet and 0.5 for GNet, then annealed to 0.05 during training.
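The exploration schedule can be sketched as follows; the linear shape and step count are assumptions, only the endpoints (0.9 and 0.5 annealed to 0.05) come from the text:

```python
def epsilon(step: int, start: float, end: float = 0.05,
            anneal_steps: int = 50_000) -> float:
    """Linearly anneal the exploration rate from `start` to `end`."""
    frac = min(1.0, step / anneal_steps)
    return start + frac * (end - start)

eps_snet = epsilon(step=10_000, start=0.9)  # SNet: epsilon starts at 0.9
eps_gnet = epsilon(step=10_000, start=0.5)  # GNet: epsilon starts at 0.5
```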
The system (GSN) of the present technical scheme is evaluated in a simulation environment and in real scenarios. Boxes of different sizes and colors are randomly scattered on a table, and the robot needs to grab and stack them one by one to form a stable pile. Three experiments were performed:
1) a comparative study between the reinforcement learning framework of the embodiment of the application and a supervised learning method;
2) an ablation study assessing the contribution of each component of the system of the application to overall performance;
3) a demonstration that the system of the application can be applied to real robots to perform pick-and-place tasks.
The present application uses a UR5 robotic arm and a Robotiq 85 gripper (with an attached RealSense camera) to perform the same test task on a real robot. In the real-world tests, stacking performance is evaluated by the height difference Hd between the highest and lowest surfaces of the pile; the stacking task is regarded as successful if Hd ≤ 2. The method achieves a 75% (15/20) success rate in the box-stacking task, while the supervised learning method achieves only a 15% (3/20) success rate.
In one embodiment, an autonomous grabbing and palletizing system based on a mechanical arm includes:
the mechanical arm, the camera and the control system;
the camera collects images of a grabbing area and a stacking area of an object to be stacked, and inputs the images to an automatic grabbing and stacking network of the control system;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacks the objects to be stacked.
When the system works, reference is made to the specific steps of the automatic grabbing and stacking method based on the mechanical arm in the above embodiment.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this is not intended to limit the scope of the present invention; it should be understood by those skilled in the art that various modifications and variations made on the basis of the technical solution of the present invention without inventive effort remain within its protection scope.

Claims (10)

1. An automatic grabbing and stacking method based on mechanical arms is characterized by comprising the following steps:
acquiring images of a stacking area and a grabbing area of an object to be stacked, and inputting the images into an automatic grabbing and stacking network;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
when the automatic grabbing and stacking network is combined with deep reinforcement learning, an optimal strategy that maximizes the expected sum of future rewards is adopted;
and the mechanical arm selects and grabs the required objects in the grabbing area according to the prediction result and places them at positions appropriate to the current and future states.
2. The method for automatic grabbing and palletizing based on the mechanical arm as claimed in claim 1, wherein the automatic grabbing and palletizing network comprises a grabbing network and a palletizing network, the grabbing position and the palletizing position are respectively predicted, and the features of images of the palletizing area and the features of images of the grabbing area are fused to transmit information about the palletizing area to the grabbing network.
3. The method for automatically grabbing and palletizing based on the mechanical arm as claimed in claim 1, wherein when an automatic grabbing and palletizing network learns a grabbing strategy and a palletizing strategy, training is performed based on task-related information, and the method comprises the following steps:
predicting the number of objects left in the grabbing area by using the features extracted from the grabbing network sensing layer;
predicting a height of the pile at a pixel level using information obtained from a stacking network aware layer;
item-centered feature learning, to ensure that items disappearing from the grabbing area are similar, at the feature level, to items added to the stack.
4. The mechanical-arm-based automatic grabbing and palletizing method according to claim 1, wherein the automatic grabbing and palletizing network learns, using distributed prioritized experience replay, to grab and closely stack articles of different sizes in the palletizing area.
5. The mechanical-arm-based automatic grabbing and palletizing method as in claim 1, wherein the image of the grabbing area of the objects to be palletized is processed before being input into the automatic grabbing and stacking network: the 3-channel color data are combined with the depth data, orthographically projected to the top-down view, and rotated counterclockwise by different angles to generate new views.
6. The robot-based autonomous grasping and palletizing method as set forth in claim 1, wherein for the representation of the palletizing state of the palletizing region, RGB images taken by a camera facing the palletizing region are used.
7. The robotic-arm-based autonomous grasping and palletizing method according to claim 1, characterized in that two Q-functions are modeled by a grasping network and a stacking network, the grasping network evaluating, at each time step, the grasping Q-function for each pixel in the grasping state, and the stacking network evaluating the stacking Q-function for each position unit in the stacking state of the object.
8. The mechanical-arm-based automatic grabbing and palletizing method according to claim 1, wherein the grabbing network and the palletizing network extract features from raw image data; for the convolutional layers in the grabbing network and the stacking network, the high-level features of the object stacking state generated by the convolutional layers in the stacking network are fused with the high-level features of the grabbing state generated by the convolutional layers in the grabbing network;
for the grabbing network, the fused low-level features are processed by two convolutional layers and then fed into a bilinear upsampling layer; the same features are also used to predict the number of objects on the table through a global average pooling layer followed by an activation function and a linear layer.
9. The robotic-arm-based autonomous grasping and palletizing method according to claim 1, wherein a deep Q-network is used as a Q-function approximator to train the grasping network and the palletizing network together.
10. An autonomous grabbing and palletizing system based on a mechanical arm, characterized by comprising:
the mechanical arm, the camera and the control system;
the camera collects images of a grabbing area and a stacking area where objects to be stacked are placed, and inputs the images to an automatic grabbing and stacking network of the control system;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacks the objects to be stacked.
CN202010260136.1A 2020-04-03 2020-04-03 Automatic grabbing and stacking method and system based on mechanical arm Active CN111331607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010260136.1A CN111331607B (en) 2020-04-03 2020-04-03 Automatic grabbing and stacking method and system based on mechanical arm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010260136.1A CN111331607B (en) 2020-04-03 2020-04-03 Automatic grabbing and stacking method and system based on mechanical arm

Publications (2)

Publication Number Publication Date
CN111331607A true CN111331607A (en) 2020-06-26
CN111331607B CN111331607B (en) 2021-04-23

Family

ID=71176895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010260136.1A Active CN111331607B (en) 2020-04-03 2020-04-03 Automatic grabbing and stacking method and system based on mechanical arm

Country Status (1)

Country Link
CN (1) CN111331607B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018236753A1 (en) * 2017-06-19 2018-12-27 Google Llc Robotic grasping prediction using neural networks and geometry aware object representation
CN110539299A (en) * 2018-05-29 2019-12-06 北京京东尚科信息技术有限公司 Robot working method, controller and robot system
US20190385022A1 (en) * 2018-06-15 2019-12-19 Google Llc Self-supervised robotic object interaction
CN109344882A (en) * 2018-09-12 2019-02-15 浙江科技学院 Robot based on convolutional neural networks controls object pose recognition methods
CN109397285A (en) * 2018-09-17 2019-03-01 鲁班嫡系机器人(深圳)有限公司 A kind of assembly method, assembly device and assembly equipment
CN109514553A (en) * 2018-11-21 2019-03-26 苏州大学 A kind of method, system and the equipment of the mobile control of robot
CN110400345A (en) * 2019-07-24 2019-11-01 西南科技大学 Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANDY ZENG ET AL: "Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning", 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) *
E. JANG ET AL: "Grasp2Vec: Learning object representations from self-supervised grasping", arXiv:1811.06964 *
Y. JIANG ET AL: "Learning to place new objects", 2012 IEEE International Conference on Robotics and Automation (ICRA) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112643668A (en) * 2020-12-01 2021-04-13 浙江工业大学 Mechanical arm pushing and grabbing cooperation method suitable for intensive environment
CN112643668B (en) * 2020-12-01 2022-05-24 浙江工业大学 Mechanical arm pushing and grabbing cooperation method suitable for intensive environment
CN113592855A (en) * 2021-08-19 2021-11-02 山东大学 Heuristic deep reinforcement learning-based autonomous grabbing and boxing method and system
CN113592855B (en) * 2021-08-19 2024-02-13 山东大学 Autonomous grabbing and boxing method and system based on heuristic deep reinforcement learning
WO2023050589A1 (en) * 2021-09-30 2023-04-06 北京工业大学 Intelligent cargo box loading method and system based on rgbd camera
CN114454160A (en) * 2021-12-31 2022-05-10 中国人民解放军国防科技大学 Mechanical arm grabbing control method and system based on kernel least square soft Bellman residual reinforcement learning
CN114454160B (en) * 2021-12-31 2024-04-16 中国人民解放军国防科技大学 Mechanical arm grabbing control method and system based on kernel least square soft Belman residual error reinforcement learning
WO2024031831A1 (en) * 2022-08-09 2024-02-15 山东大学 Mechanical arm packing and unpacking collaboration method and system based on deep reinforcement learning

Also Published As

Publication number Publication date
CN111331607B (en) 2021-04-23

Similar Documents

Publication Publication Date Title
CN111331607B (en) Automatic grabbing and stacking method and system based on mechanical arm
JP6921151B2 (en) Deep machine learning methods and equipment for robot grip
DE102019130048B4 (en) A robotic system with a sack loss management mechanism
CN110785268B (en) Machine learning method and device for semantic robot grabbing
CN110238840B (en) Mechanical arm autonomous grabbing method based on vision
Zhang et al. Grasp for stacking via deep reinforcement learning
CN112297013B (en) Robot intelligent grabbing method based on digital twin and deep neural network
CN110298886B (en) Dexterous hand grabbing planning method based on four-stage convolutional neural network
CN111203878A (en) Robot sequence task learning method based on visual simulation
CN114641378A (en) System and method for robotic picking
JP2020082322A (en) Machine learning device, machine learning system, data processing system and machine learning method
CN110969660A (en) Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning
CN115213896A (en) Object grabbing method, system and equipment based on mechanical arm and storage medium
CN113715016A (en) Robot grabbing method, system and device based on 3D vision and medium
JP2022187983A (en) Network modularization to learn high dimensional robot tasks
CN114789454A (en) Robot digital twin track completion method based on LSTM and inverse kinematics
Xue et al. Gesture-and vision-based automatic grasping and flexible placement in teleoperation
CN112288809B (en) Robot grabbing detection method for multi-object complex scene
CN116460843A (en) Multi-robot collaborative grabbing method and system based on meta heuristic algorithm
CN113762159B (en) Target grabbing detection method and system based on directional arrow model
JP2022187984A (en) Grasping device using modularized neural network
CN115631401A (en) Robot autonomous grabbing skill learning system and method based on visual perception
CN114998573A (en) Grabbing pose detection method based on RGB-D feature depth fusion
Khargonkar et al. SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Reproducible Scenes
CN117086862A (en) Six-degree-of-freedom flexible grabbing method for mechanical arm based on double-agent reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant