CN111331607A - Automatic grabbing and stacking method and system based on mechanical arm - Google Patents
- Publication number
- CN111331607A (application number CN202010260136.1A)
- Authority
- CN
- China
- Prior art keywords
- grabbing
- stacking
- network
- palletizing
- area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1679—Programme controls characterised by the tasks executed
- B25J9/1687—Assembly, peg and hole, palletising, straight line, weaving pattern movement
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
- B25J19/02—Sensing devices
- B25J19/04—Viewing devices
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1694—Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
- B25J9/1697—Vision controlled systems
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B65—CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
- B65G—TRANSPORT OR STORAGE DEVICES, e.g. CONVEYORS FOR LOADING OR TIPPING, SHOP CONVEYOR SYSTEMS OR PNEUMATIC TUBE CONVEYORS
- B65G61/00—Use of pick-up or transfer devices or of manipulators for stacking or de-stacking articles not otherwise provided for
Abstract
The invention discloses an automatic grabbing and stacking method and system based on a mechanical arm. Images of the grabbing area and the stacking area of the objects to be stacked are acquired and input into an automatic grabbing and stacking network, which predicts a grabbing position and a stacking position according to its learned grabbing strategy and stacking strategy. When the automatic grabbing and stacking network is combined with deep reinforcement learning, an optimal strategy that maximizes the expected sum of future rewards is adopted. According to the prediction result, the mechanical arm selects the required object in the grabbing area and places it at a position appropriate to the current and future states. In this technical scheme, the Grabbing and Stacking Network (GSN) learns the grabbing strategy and the stacking strategy simultaneously, so that the mechanical gripper can pick up an object to be stacked from the table and correctly stack it at a suitable position.
Description
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to an automatic grabbing and stacking method and system based on a mechanical arm.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In the last decades, the gripping action of the robot arm has reached a high level of precision in highly ordered environments such as automobile assembly and welding. In many task scenarios, however, the robotic arm system must handle unpredictable objects. A cluttered desktop can cause the gripper system to fail completely. Even if the grabbing is successful, the preset fixed stacking position can cause objects with different shapes to collide. Accordingly, there is an urgent need in the expanding retail industry for intelligent palletizing systems that can be used in warehouses.
The objective of reinforcement learning is to train an agent interacting with the environment to maximize the expected value of the cumulative future reward, which corresponds to policy optimization in a Markov Decision Process (MDP). The MDP can be represented by a tuple M = (S, G, A, r, γ), where s ∈ S is a state in the defined state space, g ∈ G is one of the possible goals, a ∈ A is an action in the action space, r is the state reward function, and γ ∈ (0, 1) is a discount factor.
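As a toy illustration of this objective, the quantity the agent is trained to maximize is the expected discounted return. A minimal sketch (the reward sequence below is made up purely for illustration):

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted reward sum_t gamma^t * r_t that the agent
    maximizes in the MDP M = (S, G, A, r, gamma)."""
    return float(sum(gamma ** t * r for t, r in enumerate(rewards)))

# Example episode: reward 1 at steps 0 and 2, nothing at step 1.
print(discounted_return([1.0, 0.0, 1.0], gamma=0.5))  # 1 + 0 + 0.25 = 1.25
```

The discount factor γ ∈ (0, 1) makes early rewards count more than distant ones, which keeps the sum finite over long horizons.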
Traditional reinforcement learning methods such as tabular reinforcement learning suffer from the "curse of dimensionality" when high-dimensional state and action spaces are encountered, a problem that was long difficult to solve. With the rise of deep learning in recent years, combining deep neural networks with reinforcement learning has become an important means of addressing this curse of dimensionality. With deep neural networks, states can also be represented in the form of images, which makes it more convenient to solve visual problems with reinforcement learning techniques.
At present, deploying deep reinforcement learning on a real robot, particularly a mechanical arm, remains difficult. Reinforcement learning is essentially a continuous trial-and-error method that requires a large number of experiments, yet a real mechanical arm is easily damaged by extensive experimentation and needs a long time to collect samples. In addition, the action space of a mechanical arm is very high-dimensional: a UR5, for example, has 6 joints, i.e. 6 degrees of freedom, which makes the robot difficult to control during learning.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an automatic grabbing and stacking method and system based on a mechanical arm, so that an intelligent agent can independently choose to grab objects and stack them neatly in another area relying only on visual state input.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
an automatic grabbing and stacking method and system based on mechanical arms comprises the following steps:
acquiring images of a stacking area and a grabbing area of an object to be stacked, and inputting the images into an automatic grabbing and stacking network;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
when the automatic grabbing and stacking network is combined with deep reinforcement learning, an optimal strategy that maximizes the expected sum of future rewards is adopted;
and the mechanical arm selects the required object in the grabbing area according to the prediction result and places it at a position appropriate to the current and future states.
According to the further technical scheme, the automatic grabbing and stacking network comprises a grabbing network and a stacking network, which respectively predict the grabbing position and the stacking position; the features of the stacking-area images and the grabbing-area images are fused, so that information about the stacking area is transmitted to the grabbing network.
According to the further technical scheme, when the grabbing strategy and the stacking strategy are automatically grabbed and stacked through network learning, auxiliary training is carried out based on task related information, and the method comprises the following steps:
predicting the number of objects left in the grabbing area using features extracted from the grabbing network's perception layer;
predicting the height of the pile at the pixel level using information obtained from the stacking network's perception layer;
performing item-centric feature learning to ensure that, at the feature level, an item that disappears from the grabbing area is similar to the item added to the stack.
According to the further technical scheme, the automatic grabbing and stacking network learns, using distributed prioritized experience replay, to grab articles of different sizes and stack them tightly in the stacking area.
In a further technical scheme, when the automatic grabbing and stacking network is combined with deep reinforcement learning, an optimal strategy of maximizing the expected sum of future rewards is adopted, and objects required in a grabbing area are selected and grabbed and stacked at proper positions in the current and future states.
According to the further technical scheme, before the image of the grabbing area of the objects to be stacked is input into the automatic grabbing and stacking network, the image is processed: the 3-channel color data is combined with the depth data, projected orthographically to a top-down view, and rotated counterclockwise by different angles to generate new views.
According to the further technical scheme, for the stacking state representation of the stacking area, the RGB images shot by the camera facing the stacking area are used.
According to the further technical scheme, two Q functions are modeled by the grabbing network and the stacking network: at each time step, the grabbing network evaluates the grabbing Q function of each pixel in the grabbing state, and the stacking network evaluates the stacking Q function of each position unit in the stacking state of the object.
According to a further technical scheme, a capture network and a stacking network extract features from original image data; for the convolution layers in the capture network and the stacking network, fusing the high-level characteristics of the stacking state of the object generated by the convolution layers in the stacking network with the high-level characteristics of the capture state generated by the convolution layers in the capture network;
For the grabbing network, the fused low-level features are processed by two convolutional layers and then fed into a bilinear upsampling layer; the same features are also used to predict the number of objects on the table through a global average pooling layer followed by an activation function and a linear layer.
A mechanical-arm grabbing and stacking system based on deep reinforcement learning includes:
the mechanical arm, the camera and the control system;
the camera collects images of a grabbing area and a stacking area where objects to be stacked are placed, and inputs the images to an automatic grabbing and stacking network of the control system;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacks the objects to be stacked.
The above one or more technical solutions have the following beneficial effects:
according to the technical scheme, the Grabbing and Stacking Network (GSN) learns the grabbing strategy and the stacking strategy at the same time, so that the mechanical clamp can pick up objects to be stacked from the table and correctly stack the objects at a proper position.
The present disclosure uses information obtained from a stacking network (SNet) aware layer to predict the height of a pile at the pixel level. This task helps the network to extract the stack's profile features, which contain useful information to evaluate the current state. Another is an item-centric feature learning task to ensure that items missing from the gripping area are similar to items added to the stack at a feature level. That is, ensuring that the item features captured from different perspectives (desktop image and image of pile) are close.
The present disclosure formulates the entire grab-and-place process as a Q-learning problem. The technical scheme uses distributed prioritized experience replay to learn a strategy by which boxes of different sizes can be grabbed and stacked closely on a platform. Experiments were performed both in a simulation environment and in the real world to verify the effectiveness of the proposed method.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of the invention, illustrate exemplary embodiments of the invention, and together with the description serve to explain the invention without limiting it.
FIG. 1 is a schematic diagram of a system according to an embodiment of the present invention;
fig. 2 is a schematic diagram of network learning according to an embodiment of the present invention.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The embodiment discloses an automatic grabbing and stacking method based on mechanical arms, which comprises the following steps:
acquiring images of a stacking area and a grabbing area of an object to be stacked, and inputting the images into an automatic grabbing and stacking network;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacks the objects to be stacked.
The mechanical arm grabbing and stacking method based on deep reinforcement learning is a model-embedded method based on the DQN algorithm, enabling an intelligent agent to independently choose to grab an object and stack it neatly in another area relying only on visual state input. The multi-object grabbing and stacking task can thus be handled in an end-to-end manner.
The automatic grabbing and stacking network (GSN) proposed in the above embodiment is composed of two parts: a grabbing network (GNet) and a stacking network (SNet), which predict the grabbing position and the stacking position separately. To communicate information about the stacking area to the GNet, the features of the stacking-area photo and the features of the desktop photo are fused. Thus the GNet can consider not only which item is easy to pick up, but also which item is needed in the stacking area.
By learning the grabbing and stacking strategies simultaneously through the Grabbing and Stacking Network (GSN), the mechanical gripper can pick a box from the table and stack it correctly on the platform.
To speed up the learning process, make the network focus on task-related information, and provide additional training signals, three auxiliary tasks are introduced. The first is a desktop object-count prediction task, which uses features extracted from the GNet perception layer to predict how many items remain on the table. The second is the pile height prediction task, which uses information obtained from the SNet perception layer to predict the height of the pile at the pixel level; this task helps the network extract the pile's profile features, which contain useful information for evaluating the current state. The last is an item-centric feature learning task, which ensures that, at the feature level, items that disappear from the desktop are similar to the items added to the stack. That is, the item features captured from different perspectives (desktop image and image of the pile) should be close.
The entire grabbing and stacking process is formulated as a Q-learning problem. Distributed prioritized experience replay is used to learn strategies that can grab boxes of different sizes and place them tightly on the platform. Experiments were performed both in the simulation environment and in the real world to verify the effectiveness of the proposed method.
In a specific implementation, a robotic arm system equipped with a two-finger gripper is used. The manipulation process is represented as a Markov Decision Process (MDP). In state s_t at time step t, the system photographs the stacking area and the grabbing area with two cameras. The robot then selects and executes an action a_t (grabbing or stacking an object) according to a strategy π_θ(s_t) parameterized by θ, which can be learned by training the deep network. The state is then updated to s_{t+1}, with an immediate reward r(s_t, a_t). After training, the reinforcement learning problem is solved by finding an optimal strategy π* that, by adjusting θ, maximizes the expected sum of rewards E[Σ r(s_t, a_t)] over t = 1, 2, …, T with discount factor γ.
The above framework provides a solution to such decision problems, but training is difficult because data collection is hard, and collecting a large amount of experience is critical to the performance of a reinforcement learning network. Compared with on-policy learning, off-policy learning can reuse collected data multiple times, which helps when data collection is difficult. To train the network efficiently, an off-policy Q-learning algorithm is therefore employed, which learns the strategy by estimating the Q function while minimizing the Bellman error.
after training, this strategy will act on the value function by maximizing the optimal state(st,at) To select operations to form an optimal strategyIn other words, it will choose to be in state stAction a of generating the maximum jackpottI.e. to select the required stacking areaAnd place their grab bar in the appropriate position in the current and future states.
The grabbing state s_gt is modeled by a 4-channel RGB-D image taken in front of the table: the 3-channel color data is combined with the depth data, projected orthographically to a top-down view, and rotated counterclockwise by angles n × 22.5°, n ∈ {0, 1, 2, …, 7}, generating eight new views. For the stacking-state representation s_st, an RGB image taken by the camera facing the stacking area is used; the stacking state can be fully represented by a 2-dimensional RGB picture.
The grabbing action is denoted a_gt and the stacking action a_st; one action is generated at each time step. The grabbing action a_gt contains a Cartesian motion command [x_g, y_g, z_g, θ_g], where [x_g, y_g, z_g] corresponds to the center of the gripper during grabbing and θ_g is the rotation angle of the wrist around the z-axis; the technical scheme divides 180° into 8 discrete θ_g rotations. For the stacking action a_st, the stacking area is divided into 14 positions along the x-axis, denoted s_i ∈ [0, 13]. These positions also represent the center of the palletized object, since after training the network learns to grab the center of an object. Because most objects in the task are 3 cells wide, neither the leftmost nor the rightmost cell is included in the action space (an object placed at the edge would be partly out of view). The mechanical arm stacks the object at the x-coordinate f_x(s_i), where f_x is a discrete function mapping s_i to an x-coordinate, and at a z-coordinate inferred from s_st. The other commands (e.g. the y-coordinate and gripper orientation) are fixed during the stacking operation, which simplifies the palletizing problem and facilitates stacking the boxes densely.
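The action discretization above can be sketched in a few lines. The 3 cm cell width and the x-origin below are illustrative assumptions, not values given in the patent:

```python
# Wrist rotation: 180° split into 8 discrete angles.
N_ANGLES = 8
wrist_angles = [n * 180.0 / N_ANGLES for n in range(N_ANGLES)]  # 0.0, 22.5, ..., 157.5

CELL_WIDTH = 0.03   # hypothetical cell width in meters
X_ORIGIN = 0.0      # hypothetical x-coordinate of cell 0

def f_x(s_i):
    """Map a discrete stacking cell s_i in [0, 13] to an x-coordinate."""
    assert 0 <= s_i <= 13
    return X_ORIGIN + s_i * CELL_WIDTH

# Edge cells 0 and 13 are excluded from the usable action space,
# since a 3-cell-wide object placed there would stick out.
stack_cells = list(range(1, 13))
```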
The reward in reinforcement learning comprises two parts: one evaluates the effect of stacking and the other the effect of grabbing. Because the stacked boxes should be placed tightly with a level top, the technical scheme defines the stacking reward r_s as:
r_s = B⁻ − H⁺ − O⁺ − L,
where B⁻ denotes the bumpiness-reduction value (evaluated by computing the variance of the column heights, with thresholds set to 0.3, 0, and 1), H⁺ the maximum-height increase (thresholded to 0 or 0.7), and O⁺ the increase in the number of holes; once a gap is covered it becomes a hole that cannot be refilled. L is a binary value indicating whether the top of the stack is level. These four values are computed by comparing s_{pt+1} with s_pt; the first three are piecewise functions whose inputs are the images s_{pt+1} and s_pt, guiding the strategy toward highly adaptable stacking. Inspired by early-termination strategies, if the learned strategy fails to achieve a level stack, the metric L takes effect and a restart signal is sent, since later state rewards may be inaccurate.
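The stacking reward terms can be computed from before/after occupancy grids of the stacking area. The sketch below omits the patent's piecewise thresholds (0.3/0/1 for B⁻, 0/0.7 for H⁺) and uses the raw quantities; treating L = 0 when the top is level is also an assumption:

```python
import numpy as np

def column_heights(grid):
    """grid[z, x] is a boolean occupancy map of the stack, with z = 0 at the floor."""
    return np.array([col.nonzero()[0].max() + 1 if col.any() else 0
                     for col in grid.T])

def count_holes(grid):
    """Covered empty cells: gaps below the topmost block of each column."""
    return int(sum(int(h) - int(np.count_nonzero(col))
                   for h, col in zip(column_heights(grid), grid.T)))

def stack_reward(grid_prev, grid_next):
    """Unthresholded sketch of r_s = B- - H+ - O+ - L."""
    hp, hn = column_heights(grid_prev), column_heights(grid_next)
    b_minus = np.var(hp) - np.var(hn)                         # bumpiness reduction
    h_plus = max(int(hn.max() - hp.max()), 0)                 # max-height increase
    o_plus = count_holes(grid_next) - count_holes(grid_prev)  # newly created holes
    level = 0 if (hn == hn.max()).all() else 1                # 0 when the top is level
    return float(b_minus - h_plus - o_plus - level)
```

Filling a flat first layer, for example, reduces the column-height variance without raising the maximum height or creating holes, so the reward is positive.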
The grabbing reward r_g is defined in terms of a grabbing result G and a distance D, where G is 0 for a failed grab and 1 for a successful one, and D, the distance between the center of the object and the grabbing position, is crucial for achieving high stacking accuracy.
The deep Q network is extended by incorporating different functions into fully convolutional networks. The two Q functions are modeled by two convolutional networks (GNet and SNet): at each time step, GNet evaluates the grabbing Q function of each pixel in s_gt, while SNet evaluates the stacking Q function of each position unit in s_st.
Both networks in the architecture extract features from the raw image data using the first 3 blocks of ResNet-50. Since depth information is critical for accurate grabbing, the input layer of GNet is expanded from 3 channels to 6 (RGB becomes RGBDDD by concatenating the RGB channels with the depth channel replicated per channel). The input channels of the ResNet component in SNet remain unchanged. The pictures representing s_gt (resolution 224×224) and s_st (resolution 256×128) are both resized to 320×320, producing feature maps of a suitable size for the subsequent upsampling and auxiliary tasks.
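The RGB-to-RGBDDD channel expansion is a simple array operation; a minimal sketch:

```python
import numpy as np

def to_rgbddd(rgb, depth):
    """Build the 6-channel RGBDDD input for GNet: the single depth channel
    is replicated three times and concatenated after the RGB channels."""
    d3 = np.repeat(depth[..., np.newaxis], 3, axis=-1)
    return np.concatenate([rgb, d3], axis=-1)

rgb = np.zeros((224, 224, 3), dtype=np.float32)
depth = np.ones((224, 224), dtype=np.float32)
print(to_rgbddd(rgb, depth).shape)  # (224, 224, 6)
```

Replicating depth three times lets the depth branch reuse pretrained 3-channel convolution weights, which is a common motivation for this layout.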
The subsequent convolutional layers in GNet and SNet share the same architecture, as shown in fig. 2. The convolution kernel of each layer is 1×1, which helps reduce the feature size and mitigate possible overfitting. To incorporate information about which objects are needed for an ordered palletized layout and are easy to grasp in a particular direction, the high-level features of s_st generated by the convolutional layer Φ_s in SNet are fused with the high-level features of s_gt generated by the convolutional layer Φ_g in GNet. Because GNet is a fully convolutional network that encodes position information in its feature maps, Φ_s and Φ_g cannot simply be concatenated at the channel level; instead, two linear layers convert Φ_s into channel-wise weights ω_g (between 0 and 1), which then multiply Φ_g as follows:
Φ_m = λ · ω_g · Φ_g + (1 − λ) · Φ_g,
where Φ_m denotes the fused features and λ is a scale factor that balances the original features (containing a preliminary prediction of which objects are easy to hold) against the weighted features (emphasizing which features from s_gt are useful for selecting objects suited to the state of the stacking area). In this embodiment, λ is set to 0.25.
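The fusion step can be sketched numerically. The two linear layers that squeeze pooled SNet features into channel weights are represented here by plain weight matrices with a ReLU between them and a sigmoid to keep ω_g in (0, 1); the layer shapes and nonlinearities are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(phi_g, phi_s, w1, w2, lam=0.25):
    """Sketch of Phi_m = lam * w_g * Phi_g + (1 - lam) * Phi_g.
    phi_g: (C_g, H, W) GNet features; phi_s: (C_s, H', W') SNet features."""
    pooled = phi_s.mean(axis=(1, 2))                  # global average pool over H', W'
    w_g = sigmoid(np.maximum(pooled @ w1, 0.0) @ w2)  # (C_g,) channel weights in (0, 1)
    return lam * w_g[:, None, None] * phi_g + (1 - lam) * phi_g
```

Because (1 − λ) of the original GNet features passes through unchanged, the stacking-side modulation can only reweight, never erase, the grabbing features.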
The aforementioned features are sent to different layers for different tasks. For GNet, the fused low-level features Φ_m are processed by two convolutional layers and then fed into a bilinear upsampling layer. Φ_g is also used to predict the number of objects on the table via a global average pooling layer followed by a ReLU activation and a linear layer, which helps the network perceive objects in different regions and remain sensitive to smaller objects. In SNet, the high-level feature Φ_s is used not only to give GNet a broader perception but also to predict the Q value and the column-wise height of each stacking position. Two separate linear-layer modules jointly estimate the Q value Q_s(s_s, a_s) by computing a value function V_s(s_s) and an advantage function A_s(s_s, a_s). For the height-prediction auxiliary task, the column-by-column height predicted when an object is stacked represents the upper boundary of the pile, which contains auxiliary information for predicting the stacking Q value.
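Combining V and A into Q is the dueling-network construction. Subtracting the mean advantage is the standard identifiability fix; the patent only states that the two heads jointly estimate Q, so that detail is an assumption here:

```python
import numpy as np

def dueling_q(v, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a)
    into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    a = np.asarray(advantages, dtype=float)
    return v + a - a.mean()
```

The mean subtraction pins down the split between V and A, since adding a constant to A and subtracting it from V would otherwise leave Q unchanged.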
Another auxiliary task, not shown in fig. 2, is object-centric feature learning. When the robot arm picks an object up from the desktop and stacks it (desktop image and stack image in fig. 1), the features of both scenes extracted by the network should change according to the object that is removed from the desktop and appears on the stack. The task compares the grabbing perception-layer features Φ_g before and after a successful grab with the stacking perception-layer features Φ_p before and after stacking. To compute these features, global average pooling and a ReLU nonlinearity are applied to the feature maps Φ_g and Φ_p. The perception modules in GNet and SNet thus gain the ability to identify the same object by similar features, which facilitates the feature fusion described above.
GNet and SNet are co-trained using a deep Q network as the Q-function approximator. Specifically, the grabbing Q function Q_g(s_{p+g}, a_g) is modeled as a fully convolutional network (GNet) and the stacking Q function Q_s(s_s, a_s) as a deep network (SNet). Double Q-learning is used to train GNet and SNet; compared with plain Q-learning, it adopts a target network and an improved maximum operator, making training more reliable. The target network shares the same architecture as the network in fig. 2 (without the auxiliary-task modules), and its parameters are copied from the online model every 300 steps. For the maximum operator, double Q-learning selects the maximizing action with the current network Q_θ but evaluates it with the target network Q_θ⁻; the loss for both the grabbing and the stacking Q function is the squared Bellman error of this target.
Here i ∈ {g, s} indexes the grabbing and stacking losses. In each training round, the expected value E is computed over a mini-batch of samples; the remaining variables are as described above. The parameterization of a_gt allows convolutional features to be shared across positions and orientations. In this technical scheme, the Q value Q_g predicted by GNet represents grabbing, at a position where success is likely, the object required by the stacking strategy. Thus the reward of the grabbing Q function Q_g(s_{p+g}, a_g) includes both the grabbing and the stacking reward. In this way, the feature-fusion layer from Φ_s to ω_g can adjust itself during training.
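The double Q-learning target used in such training can be sketched as follows: the online network chooses the next action, the target network evaluates it. The γ value below is illustrative:

```python
import numpy as np

def double_q_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """One-step double Q-learning target:
    y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return float(reward)
    a_star = int(np.argmax(q_online_next))            # action picked by current Q_theta
    return float(reward + gamma * np.asarray(q_target_next)[a_star])
```

The squared difference between this target y and Q_θ(s, a) is the Bellman error minimized for each of the two Q functions.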
To speed up learning, a distributed learning framework is implemented with 16 samplers that sample asynchronously. After collecting 200 samples, each sampler transfers the experiences, with their priorities, to the learner and copies the latest parameters from it. Meanwhile, the learner trains from a prioritized experience replay buffer that stores separately prioritized experience indexes for grabbing and stacking, alternately drawing high-priority samples; samples with larger prediction errors receive higher priority. The pseudocode of the learner's training routine is listed in Algorithm 1.
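The priority-based draw can be sketched with proportional prioritized sampling, where the replay probability grows with the stored prediction (TD) error. The exponent α and the proportional scheme are standard prioritized-replay choices assumed here, not details from the patent:

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6, seed=0):
    """Draw batch indices with probability P(i) proportional to p_i ** alpha,
    so high-error transitions are replayed more often."""
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(len(probs), size=batch_size, p=probs)
```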
The system is trained in a simulated environment to improve efficiency. A UR5 robot equipped with a Robotiq 85 gripper is used in V-REP. Four types of boxes were designed, with sizes of 3×3×3, 3×9×3, 6×9×3 and 9×9×3 (in centimeters).
In addition to the Q-function loss estimated during learning, each auxiliary task has its own loss. The tasks of predicting the stacking-area height and the number of objects are trained with the smooth L1 loss, while the object-centric feature learning task uses the n-pair loss. GNet and SNet are trained simultaneously using stochastic gradient descent with a learning rate of 0.0001. Both networks employ an ε-greedy exploration strategy, with ε initialized to 0.9 for SNet and 0.5 for GNet and then annealed to 0.05 during training.
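The annealing schedule can be sketched as a linear ramp. The patent gives the start values (0.9 for SNet, 0.5 for GNet) and the end value (0.05); the linear shape and the number of annealing steps below are assumptions:

```python
def epsilon_at(step, eps_start, eps_end=0.05, anneal_steps=10_000):
    """Linearly anneal the epsilon-greedy exploration rate from eps_start
    down to eps_end over anneal_steps environment steps."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

With probability ε the agent takes a random action instead of the greedy one, so exploration is heavy early in training and tapers off as the Q estimates improve.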
The system (GSN) of the present solution is evaluated in a simulation environment and in a real scenario. Boxes of different sizes and colors are randomly stacked on a table, and the robot needs to grab and stack the boxes one by one to form a stable stack. Three experiments were performed:
1) comparative study between the reinforcement learning framework and the supervised learning method of the embodiment of the application;
2) ablation studies to assess the contribution of each component of the system of the present application in terms of overall performance;
3) it is demonstrated that the system of the present application can be applied to real robots to perform pick and place tasks.
The present application uses a UR5 robotic arm and a Robotiq 85 gripper (with an attached RealSense camera) to perform the same test task on a robot in a real environment. In these tests, stacking performance is evaluated by the height difference Hd between the highest and lowest surfaces of the stack; the stacking task counts as successful if Hd ≤ 2. The method achieves a 75% (15/20) success rate in the box-stacking task, while the supervised learning method achieves only 15% (3/20).
In one embodiment, an automatic grabbing and stacking system based on a mechanical arm includes:
the mechanical arm, the camera and the control system;
the camera collects images of a grabbing area and a stacking area of an object to be stacked, and inputs the images to an automatic grabbing and stacking network of the control system;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacks the objects to be stacked.
When the system works, refer to the specific steps of the automatic grabbing and stacking method based on the mechanical arm in the foregoing embodiment.
The above description is only a preferred embodiment of the present invention and is not intended to limit it; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, this does not limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.
Claims (10)
1. An automatic grabbing and stacking method based on mechanical arms is characterized by comprising the following steps:
acquiring images of a stacking area and a grabbing area of an object to be stacked, and inputting the images into an automatic grabbing and stacking network;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
when the automatic grabbing and stacking network is combined with deep reinforcement learning, an optimal policy that maximizes the expected sum of future rewards is adopted;
and the mechanical arm selects the required object in the grabbing area according to the prediction result and places it at a position appropriate for both the current and future stacking states.
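The "expected sum of future rewards" in claim 1 is the standard objective of Q-learning; under that reading, the optimal action-value function and policy satisfy:

```latex
Q^{*}(s,a) = \mathbb{E}\!\left[\, r_{t} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \;\middle|\; s_{t}=s,\ a_{t}=a \,\right],
\qquad
\pi^{*}(s) = \operatorname*{arg\,max}_{a} Q^{*}(s,a),
```

where \(\gamma \in [0,1)\) is the discount factor weighting future rewards.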
2. The method for automatically grabbing and palletizing based on the mechanical arm as claimed in claim 1, wherein the automatic grabbing and palletizing network comprises a grabbing network and a palletizing network, which predict the grabbing position and the palletizing position respectively, and wherein features of the palletizing area image are fused with features of the grabbing area image so that palletizing area information is transmitted to the grabbing network.
3. The method for automatically grabbing and palletizing based on the mechanical arm as claimed in claim 1, wherein when an automatic grabbing and palletizing network learns a grabbing strategy and a palletizing strategy, training is performed based on task-related information, and the method comprises the following steps:
predicting the number of objects left in the grabbing area by using the features extracted from the grabbing network sensing layer;
predicting a height of the pile at a pixel level using information obtained from a stacking network aware layer;
performing item-centered feature learning to ensure that the item that disappears from the grabbing area and the item that is added to the stack are similar at the feature level.
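The three task-related signals of claim 3 can be sketched as auxiliary loss terms trained alongside the main objective (a hedged sketch; the function, its inputs, and the equal weighting of the terms are all hypothetical):

```python
import numpy as np

def auxiliary_losses(pred_count, true_count,
                     pred_height_map, true_height_map,
                     removed_item_feat, added_item_feat):
    """Task-related auxiliary losses sketched from the three steps above."""
    # 1) Predict the number of objects left in the grabbing area.
    count_loss = float(np.mean((np.asarray(pred_count) - np.asarray(true_count)) ** 2))
    # 2) Predict the height of the pile at pixel level.
    height_loss = float(np.mean((pred_height_map - true_height_map) ** 2))
    # 3) Item-centered features: the item that disappeared from the grabbing
    #    area should lie close, in feature space, to the item added to the stack.
    cos = float(np.dot(removed_item_feat, added_item_feat) /
                (np.linalg.norm(removed_item_feat) * np.linalg.norm(added_item_feat)))
    match_loss = 1.0 - cos
    return count_loss + height_loss + match_loss
```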
4. The robotic-arm-based autonomous grasping and palletizing method according to claim 1, wherein the automatic grasping and palletizing network learns to grasp and closely stack articles of different sizes over the palletizing area using distributed prioritized experience replay.
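Prioritized experience replay samples transitions in proportion to their learning value, typically the magnitude of the TD error. A minimal single-process, proportional-variant sketch (class and parameter names hypothetical; a full system would add a sum-tree, importance-sampling weights, and distributed actor processes as in claim 4):

```python
import numpy as np

class PrioritizedReplay:
    """Minimal proportional prioritized-experience-replay sketch."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        if len(self.buffer) >= self.capacity:  # evict the oldest transition
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        # Larger TD error -> higher sampling priority (proportional variant).
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx
```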
5. The robotic-arm-based autonomous grasping and palletizing method as in claim 1, wherein the image of the grasping area of the objects to be palletized is preprocessed before being input into the automatic grasping and stacking network: the 3-channel color data is combined with the depth data, projected orthographically into a top-down view, and rotated counterclockwise by different angles to generate new elevation maps.
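The claim-5 preprocessing can be sketched as follows (a hedged illustration: `np.rot90` restricts the sketch to 90-degree counterclockwise steps for simplicity, whereas the embodiment rotates by finer, unspecified angle increments):

```python
import numpy as np

def rotated_elevations(color_hw3: np.ndarray, depth_hw: np.ndarray,
                       num_rotations: int = 4) -> list:
    """Fuse 3-channel color with depth into a 4-channel top-down elevation
    image, then generate one counterclockwise rotation per candidate angle."""
    elevation = np.dstack([color_hw3, depth_hw])  # H x W x 4 elevation image
    return [np.rot90(elevation, k=k, axes=(0, 1)) for k in range(num_rotations)]
```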
6. The robot-based autonomous grasping and palletizing method as set forth in claim 1, wherein for the representation of the palletizing state of the palletizing region, RGB images taken by a camera facing the palletizing region are used.
7. The robotic-arm-based autonomous grasping and palletizing method according to claim 1, characterized in that two Q-functions are modeled by a grasping network and a stacking network, the grasping network evaluating, at each time step, the grasping Q-function for each pixel in the grasping state, and the stacking network evaluating the stacking Q-function for each position unit in the stacking state of the object.
8. The robotic-arm-based autonomous grasping and palletizing method according to claim 1, wherein the grabbing network and the palletizing network extract features from raw image data; the high-level features of the object stacking state generated by the convolutional layers in the palletizing network are fused with the high-level features of the grabbing state generated by the convolutional layers in the grabbing network;
for the grabbing network, the fused features are processed by two convolutional layers and then fed into a bilinear upsampling layer; the output of the convolutional layers is also used to predict the number of objects on the table, by passing it through a global average pooling layer followed by an activation function and a linear layer.
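A hedged PyTorch sketch of the grabbing-network head described above (all layer sizes, module names, and the upsampling factor are hypothetical; only the structure — feature fusion, two convolutions, bilinear upsampling, and a pooled counting head — follows the claim):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraspBranch(nn.Module):
    """Hypothetical sketch: stacking-state features are fused into the
    grabbing stream, two convolutional layers refine them, bilinear
    upsampling yields the per-pixel Q-map, and a pooled head counts objects."""
    def __init__(self, grasp_ch=64, stack_ch=64):
        super().__init__()
        fused = grasp_ch + stack_ch
        self.conv1 = nn.Conv2d(fused, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 1, kernel_size=3, padding=1)  # one Q-value per pixel
        self.count = nn.Linear(fused, 1)  # predicts the number of remaining objects

    def forward(self, grasp_feat, stack_feat):
        # Fuse high-level stacking features into the grabbing stream.
        stack_feat = F.interpolate(stack_feat, size=grasp_feat.shape[-2:],
                                   mode="bilinear", align_corners=False)
        x = torch.cat([grasp_feat, stack_feat], dim=1)
        q = F.relu(self.conv1(x))
        q = self.conv2(q)
        # Bilinear upsampling restores the Q-map to action resolution.
        q_map = F.interpolate(q, scale_factor=4, mode="bilinear", align_corners=False)
        # Global average pooling -> activation -> linear layer for the count.
        pooled = F.adaptive_avg_pool2d(x, 1).flatten(1)
        n_objects = self.count(F.relu(pooled))
        return q_map, n_objects
```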
9. The robotic-arm-based autonomous grasping and palletizing method according to claim 1, wherein a deep Q-network is used as a Q-function approximator to train the grasping network and the palletizing network together.
10. An autonomous grabbing and palletizing system based on a mechanical arm, characterized by comprising:
the mechanical arm, the camera and the control system;
the camera collects images of a grabbing area and a stacking area where objects to be stacked are placed, and inputs the images to an automatic grabbing and stacking network of the control system;
the automatic grabbing and stacking network predicts a grabbing position and a stacking position according to the learned grabbing strategy and stacking strategy;
and the mechanical arm selects and carries out grabbing according to the prediction result and then stacks the objects to be stacked.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010260136.1A CN111331607B (en) | 2020-04-03 | 2020-04-03 | Automatic grabbing and stacking method and system based on mechanical arm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111331607A true CN111331607A (en) | 2020-06-26 |
CN111331607B CN111331607B (en) | 2021-04-23 |
Family
ID=71176895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010260136.1A Active CN111331607B (en) | 2020-04-03 | 2020-04-03 | Automatic grabbing and stacking method and system based on mechanical arm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111331607B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112643668A (en) * | 2020-12-01 | 2021-04-13 | 浙江工业大学 | Mechanical arm pushing and grabbing cooperation method suitable for intensive environment |
CN113592855A (en) * | 2021-08-19 | 2021-11-02 | 山东大学 | Heuristic deep reinforcement learning-based autonomous grabbing and boxing method and system |
CN114454160A (en) * | 2021-12-31 | 2022-05-10 | 中国人民解放军国防科技大学 | Mechanical arm grabbing control method and system based on kernel least square soft Bellman residual reinforcement learning |
WO2023050589A1 (en) * | 2021-09-30 | 2023-04-06 | 北京工业大学 | Intelligent cargo box loading method and system based on rgbd camera |
WO2024031831A1 (en) * | 2022-08-09 | 2024-02-15 | 山东大学 | Mechanical arm packing and unpacking collaboration method and system based on deep reinforcement learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018236753A1 (en) * | 2017-06-19 | 2018-12-27 | Google Llc | Robotic grasping prediction using neural networks and geometry aware object representation |
CN109344882A (en) * | 2018-09-12 | 2019-02-15 | 浙江科技学院 | Robot based on convolutional neural networks controls object pose recognition methods |
CN109397285A (en) * | 2018-09-17 | 2019-03-01 | 鲁班嫡系机器人(深圳)有限公司 | A kind of assembly method, assembly device and assembly equipment |
CN109514553A (en) * | 2018-11-21 | 2019-03-26 | 苏州大学 | A kind of method, system and the equipment of the mobile control of robot |
CN110400345A (en) * | 2019-07-24 | 2019-11-01 | 西南科技大学 | Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting |
CN110539299A (en) * | 2018-05-29 | 2019-12-06 | 北京京东尚科信息技术有限公司 | Robot working method, controller and robot system |
US20190385022A1 (en) * | 2018-06-15 | 2019-12-19 | Google Llc | Self-supervised robotic object interaction |
Non-Patent Citations (3)
Title |
---|
ANDY ZENG ET AL: "Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning", 《2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS(IROS)》 * |
E.JANG ET AL: "Grasp2vec:Learning object representations from self-supervised grasping", 《ARXIV:1811.06964》 * |
Y. JIANG ET AL: "Learning to place new objects", 《2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN111331607B (en) | 2021-04-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111331607B (en) | Automatic grabbing and stacking method and system based on mechanical arm | |
JP6921151B2 (en) | Deep machine learning methods and equipment for robot grip | |
DE102019130048B4 (en) | A robotic system with a sack loss management mechanism | |
CN110785268B (en) | Machine learning method and device for semantic robot grabbing | |
CN110238840B (en) | Mechanical arm autonomous grabbing method based on vision | |
Zhang et al. | Grasp for stacking via deep reinforcement learning | |
CN112297013B (en) | Robot intelligent grabbing method based on digital twin and deep neural network | |
CN110298886B (en) | Dexterous hand grabbing planning method based on four-stage convolutional neural network | |
CN111203878A (en) | Robot sequence task learning method based on visual simulation | |
CN114641378A (en) | System and method for robotic picking | |
JP2020082322A (en) | Machine learning device, machine learning system, data processing system and machine learning method | |
CN110969660A (en) | Robot feeding system based on three-dimensional stereoscopic vision and point cloud depth learning | |
CN115213896A (en) | Object grabbing method, system and equipment based on mechanical arm and storage medium | |
CN113715016A (en) | Robot grabbing method, system and device based on 3D vision and medium | |
JP2022187983A (en) | Network modularization to learn high dimensional robot tasks | |
CN114789454A (en) | Robot digital twin track completion method based on LSTM and inverse kinematics | |
Xue et al. | Gesture-and vision-based automatic grasping and flexible placement in teleoperation | |
CN112288809B (en) | Robot grabbing detection method for multi-object complex scene | |
CN116460843A (en) | Multi-robot collaborative grabbing method and system based on meta heuristic algorithm | |
CN113762159B (en) | Target grabbing detection method and system based on directional arrow model | |
JP2022187984A (en) | Grasping device using modularized neural network | |
CN115631401A (en) | Robot autonomous grabbing skill learning system and method based on visual perception | |
CN114998573A (en) | Grabbing pose detection method based on RGB-D feature depth fusion | |
Khargonkar et al. | SCENEREPLICA: Benchmarking Real-World Robot Manipulation by Creating Reproducible Scenes | |
CN117086862A (en) | Six-degree-of-freedom flexible grabbing method for mechanical arm based on double-agent reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||