CN111099363B - Stacking method, stacking system and storage medium - Google Patents

Stacking method, stacking system and storage medium

Info

Publication number
CN111099363B
CN111099363B (application CN202010020711.0A)
Authority
CN
China
Prior art keywords
training
data
tray
model
probability
Prior art date
Legal status
Active
Application number
CN202010020711.0A
Other languages
Chinese (zh)
Other versions
CN111099363A (en)
Inventor
Zhao Hang
Peng Fei
Current Assignee
Hunan Shibite Robot Co Ltd
Original Assignee
Hunan Shibite Robot Co Ltd
Priority date
Filing date
Publication date
Application filed by Hunan Shibite Robot Co Ltd filed Critical Hunan Shibite Robot Co Ltd
Priority to CN202010020711.0A priority Critical patent/CN111099363B/en
Publication of CN111099363A publication Critical patent/CN111099363A/en
Application granted granted Critical
Publication of CN111099363B publication Critical patent/CN111099363B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B65: CONVEYING; PACKING; STORING; HANDLING THIN OR FILAMENTARY MATERIAL
    • B65G: TRANSPORT OR STORAGE DEVICES, e.g. CONVEYORS FOR LOADING OR TIPPING, SHOP CONVEYOR SYSTEMS OR PNEUMATIC TUBE CONVEYORS
    • B65G57/00: Stacking of articles
    • B65G43/00: Control devices, e.g. for safety, warning or fault-correcting

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Stacking Of Articles And Auxiliary Devices (AREA)

Abstract

The application discloses a stacking method, a stacking system and a storage medium. A palletizing method for palletizing a target object to a current pallet, the palletizing method comprising: acquiring training data; and training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray. Therefore, the target position can be simply and quickly determined, and the stacking effect is good.

Description

Stacking method, stacking system and storage medium
Technical Field
The application relates to the technical field of electronics, in particular to a stacking method, a stacking system and a storage medium.
Background
Palletizing methods in the related art study the three-dimensional bin packing problem in pursuit of maximum space utilization. However, these methods are complex and cumbersome, and their application effect in actual scenarios is poor.
Disclosure of Invention
The application provides a stacking method, a stacking system and a storage medium.
The embodiment of the application provides a stacking method. The palletizing method is used for stacking target objects to a current tray, and comprises the following steps:
acquiring training data;
and training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray.
The embodiment of the application provides a stacking system. The stacking system is used for stacking the target object on the current tray and comprises a memory and a processor, the processor is connected with the memory, and the processor is used for acquiring training data; and training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray.
The embodiment of the application provides a computer readable storage medium. The computer-readable storage medium has stored thereon a control program which, when executed by a processor, implements the palletizing method as described above.
According to the stacking method, the stacking system and the storage medium, the preset processing model is trained according to the training data, and the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray, so that the target position can be determined simply and quickly, and the stacking effect is good.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of embodiments of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow diagram of a palletizing method according to an embodiment of the present application;
FIG. 2 is a block schematic diagram of a palletizing system according to an embodiment of the present application;
FIG. 3 is a data flow diagram of a palletizing system according to an embodiment of the present application;
FIG. 4 is a schematic flow diagram of a palletizing method according to another embodiment of the present application;
FIG. 5 is a scene schematic diagram of a palletizing method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another scenario of a palletizing method according to an embodiment of the present application;
FIG. 7 is a schematic flow diagram of a palletizing method according to yet another embodiment of the present application;
FIG. 8 is a schematic view of another scenario of a palletizing method according to an embodiment of the present application;
FIG. 9 is a schematic flow diagram of a palletizing method according to yet another embodiment of the present application;
FIG. 10 is a schematic flow diagram of a palletizing method according to another embodiment of the present application;
fig. 11 is a schematic view of an effect of a palletizing method of the related art;
fig. 12 is a schematic diagram of the effect of the palletizing method according to the embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and are only for the purpose of explaining the present application and are not to be construed as limiting the present application.
In the description of the present application, the terms "first", "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, features defined as "first", "second", may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
In the description of the present application, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; may be mechanically connected, may be electrically connected or may be in communication with each other; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
The following disclosure provides many different embodiments or examples for implementing different features of the application. In order to simplify the disclosure of the present application, specific example components and arrangements are described below. Of course, they are merely examples and are not intended to limit the present application. Moreover, the present application may repeat reference numerals and/or letters in the various examples, such repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. In addition, examples of various specific processes and materials are provided herein, but one of ordinary skill in the art may recognize applications of other processes and/or use of other materials.
Referring to fig. 1,2 and 3, embodiments of the present application provide a palletizing method and system 100. The stacking method comprises the following steps:
step S11: acquiring training data;
step S12: and training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray.
The present embodiment provides a palletizing system 100. The palletizing system 100 is used for palletizing the target objects to the current tray, the palletizing system 100 comprises a memory 102 and a processor 101, the processor 101 is connected with the memory 102, and the processor 101 is used for acquiring training data; and training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray.
According to the stacking method and the stacking system 100, the preset processing model is trained according to the training data, and the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray, so that the target position can be determined simply and quickly, and the stacking effect is good.
In particular, the palletizing method and the palletizing system 100 according to the embodiment of the present application may be based on a reinforcement learning algorithm. In other words, the present application provides a palletizing method and a palletizing system 100 based on a reinforcement learning algorithm.
As can be appreciated, palletizing consumes a significant amount of labor in the modern logistics industry, and the various existing stacking methods remain heavily limited in actual application scenarios. Most existing stacking algorithms are developed from the classic three-dimensional bin packing problem (3D-BPP), in which the number and size information of the boxes to be stacked is fully known, the stacking order can be adjusted at will, and no rigid constraints are placed on box support or on the stability of the whole stack.
However, in an actual palletizing scenario, the boxes to be palletized arrive continuously and are stacked onto the tray one by one, and during palletizing it must be ensured that each box is supported and that the entire stack remains stable.
Because the stacking problem is NP-hard, heuristic algorithms that simulate box placement can only be designed from human knowledge and experience. A reinforcement learning algorithm, by contrast, can guide its strategy for interacting with the environment through manually set rewards while it interacts with the real environment. This unsupervised character makes reinforcement learning well suited to optimizing palletizing, a problem for which it is difficult for a human to set optimal rules. In this way, an optimal box combination and placement strategy can be learned even when the boxes arrive in arbitrary order.
In this embodiment, the target object is a box. In other embodiments, the target object may be a table, chair, or other object. The specific form of the target object is not limited herein.
In step S12, the size information of the target object includes box size, and the status information of the current tray includes height distribution and space utilization of the current tray.
Further, the current tray may be equally divided into a plurality of positions in advance. The height distribution of the current tray may include the height of the laid boxes at each position of the current tray. Space utilization may refer to the quotient of the number of positions occupied by boxes and the total number of positions.
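As an illustrative sketch (a minimal example with an assumed 10 × 10 grid, not the patented implementation), this tray state can be represented as follows:

```python
import numpy as np

# Assumed example: the current tray is pre-divided into a 10 x 10 grid.
L, W = 10, 10

# Height distribution: height of the laid boxes at each position (0 = empty).
height = np.zeros((L, W))
height[0:3, 0:2] = 0.5  # a 3 x 2 box of height 0.5 placed at position (0, 0)

# Space utilization: positions occupied by boxes divided by all positions.
space_utilization = np.count_nonzero(height) / height.size
print(space_utilization)  # 0.06
```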
Referring to fig. 4, in some embodiments, step S12 includes:
step S121: determining filtering data (mask) from the training data;
step S122: and training a preset processing model according to the training data and the filtering data.
In some embodiments, the processor 101 is configured to determine filter data from the training data; and a processing module for training the predetermined processing model based on the training data and the filtering data.
Therefore, the preset processing model is trained according to the training data. Moreover, since the filtering data is determined according to the training data, the processing of the preset processing model can be more accurate. In this embodiment, the filtering data may be a mask.
Specifically, the training data includes size data of the training object and status data of the training tray, the filtering data includes a filtering value for each position of the training tray, and the step S121 includes:
determining all first positions and second positions of the training tray according to the state data and the size data of the current training object, wherein the first positions are positions where the current training object can be put down on the training tray, and the second positions are positions where the current training object cannot be put down on the training tray;
setting the filtering value of the first position as a first preset value;
and setting the filtering value of the second position as a second preset value.
In this manner, the filtering data is determined from the training data. Moreover, since a filtering value is set for each position of the training tray, the probability corresponding to each position can be filtered, making the filtering more comprehensive.
Specifically, in the present embodiment, the first preset value is 1, and the range of the second preset value is greater than 0 and less than 1. In other words, the second preset value is a value in the range of (0, 1). For example, 0.1, 0.22, 0.31, 0.47, 0.52, 0.66, 0.73, 0.87, 0.91. The specific value of the second preset value is not limited herein.
Therefore, punishment can be carried out on the probability corresponding to the second position, and the probability corresponding to the second position is reduced, so that the second position is prevented from being subsequently selected, and the second position is filtered. In other words, the filtered data can be used as a supervisory signal to filter out invalid locations.
Further, each location of the training tray may be traversed to determine a filter value for each location.
It will be appreciated that, if the filtering data is not used to limit the actions of the processing model agent, training or testing of the agent with the reinforcement learning algorithm may terminate in two cases. First, there is no place on the current tray to put the current box, and the agent randomly outputs an action, terminating training or testing. Second, the current box can be placed on the current tray, but the agent places it directly at a wrong position that does not comply with physical rules, and the current round (episode) is terminated.
In this embodiment, the filtering data is added as a supervision signal on top of the size data and state data received by the processing model agent, which can improve the accuracy and reasonableness of the training position.
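By way of example, the traversal described above could be sketched as follows; the flat-support feasibility test and the helper names are assumptions, not the patented implementation:

```python
import numpy as np

EPS = 1e-3  # second preset value (penalty factor), 0 < EPS < 1

def can_place(height, x, y, box_l, box_w):
    """Assumed feasibility test: the box footprint must stay inside the
    tray and rest on a level region of the height distribution."""
    L, W = height.shape
    if x + box_l > L or y + box_w > W:
        return False
    region = height[x:x + box_l, y:y + box_w]
    return bool((region == region[0, 0]).all())

def build_filter_data(height, box_l, box_w):
    """Traverse every tray position: first positions (placeable) get
    filtering value 1, second positions get the penalty factor EPS."""
    L, W = height.shape
    mask = np.full(L * W, EPS)
    for x in range(L):
        for y in range(W):
            if can_place(height, x, y, box_l, box_w):
                mask[L * x + y] = 1.0  # index matches action = L * x + y
    return mask
```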
In addition, step S121 may include:
when the current first position is a corner position, determining a first sub-position and a second sub-position in the first position, the distance between which and the current first position is within a preset range, wherein the first sub-position is the corner position, and the second sub-position is a non-corner position;
setting the filtering value of the first sub-position as a third preset value;
and setting the filtering value of the second sub-position as a fourth preset value.
Therefore, the probability of selecting the corner position by the current training object is improved, and more placing space is saved for the subsequent training objects. In addition, the current training object is preferentially placed at a corner position, and the placing logic is better met.
In the present embodiment, the third preset value is 1, and the range of the fourth preset value is greater than 0 and less than 1. In other words, the fourth preset value is a value in the range of (0, 1). For example, 0.1, 0.22, 0.31, 0.47, 0.52, 0.66, 0.73, 0.87, 0.91. The specific value of the fourth preset value is not limited herein. In addition, the fourth preset value may be the same as or different from the second preset value.
Specifically, the corner positions refer to the two-dimensional vertices of the placed training objects and of the training tray, as seen in a top plan view of the training tray. In other words, the corner positions include the corner positions of the tray and the corner positions of the boxes already placed on the tray, as shown in fig. 5.
In addition, the preset range may be a circle centered on the current first position with a preset distance as its radius, or the set of positions whose Manhattan distance from the current first position is smaller than the preset distance. The specific form of the preset range is not limited herein.
In the example of fig. 5, three training objects 80 have been placed, and the vertices 81 of the three placed training objects 80 together with the four vertices 91 of the training tray 90 are corner positions. When the current training object is placed, it preferably has one or more base vertices at corner positions. The method of this embodiment reduces the probability corresponding to part of the non-corner positions and thereby raises the probability that a corner position is selected.
Referring to fig. 6, the current first position 82 is a corner position. Within the preset range of the current first position 82, i.e., the range defined by the dashed box 821 in fig. 6, a first sub-position 83 and a second sub-position 84 can be determined, where the first sub-position 83 is a corner position and the second sub-position 84 is a non-corner position.
In other words, within the preset range of the current first position, only the positions that are also corner positions are defined as the first sub-positions, no penalty is imposed on the first sub-positions, and the second sub-positions within the preset range are all masked due to the penalty, so that the hit probability is low. Note that positions outside the preset range are not affected. For example, the position of the lower left corner of the training tray 90 in fig. 6 is not masked by the current first position 82 because it is outside the preset range.
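One possible sketch of this corner-preference rule, applied to the filtering data, is given below; the Manhattan-distance variant of the preset range is used, and the radius and helper names are assumptions:

```python
def refine_for_corners(mask, corners, L, W, radius=2, eps=1e-3):
    """Around each corner position, penalize placeable non-corner
    positions inside the preset Manhattan range (assumed radius)."""
    corner_set = set(corners)
    for (cx, cy) in corner_set:
        for x in range(L):
            for y in range(W):
                if abs(x - cx) + abs(y - cy) >= radius:
                    continue  # outside the preset range: unaffected
                if (x, y) not in corner_set and mask[L * x + y] == 1.0:
                    mask[L * x + y] = eps  # second sub-position: masked
                # first sub-positions (corners) keep filtering value 1
    return mask
```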
Referring to fig. 7, in some embodiments, the training data includes size data of the training object and state data of the training tray, the processing model includes a first sub-model (actor) and a second sub-model (critic), and step S122 includes:
step S1221: determining probability data according to the training data by using the first submodel, wherein the probability data is the probability corresponding to each position of the current training object stacked on the training tray;
step S1222: processing the probability data to determine a training position based on the filtered data;
step S1226: determining evaluation data (Q value) from the training position and the training data using the second submodel;
step S1227: the process model is updated with the evaluation data.
In some embodiments, the training data includes size data of the training object and state data of the training tray, the processing model includes a first sub-model and a second sub-model, the processor 101 is configured to determine probability data according to the training data by using the first sub-model, and the probability data is a probability corresponding to each position where the current training object is stacked on the training tray; and for processing the probability data to determine a training position based on the filtered data; and determining evaluation data according to the training position and the training data by using the second submodel; and for updating the process model with the evaluation data.
Therefore, the preset processing model is trained according to the training data and the filtering data. Because the probability data is processed using the filtering data, the accuracy of the training position can be improved, giving the processing model better processing capability.
In the present embodiment, the size information of the training object includes the box size (l_n, w_n, h_n), and the state information of the training tray includes the height distribution and space utilization rate r_n of the training tray as well as the tray size (L, W, H). The box dimensions are constrained by the tray dimensions, namely: l_n < 0.5L, w_n < 0.5W, h_n < 0.5H. In this way, premature stopping of the palletization due to a box being too large can be avoided.
In this embodiment, the processing model agent is based on a reinforcement learning algorithm: environment data (observation) is input into the processing model, and the processing model outputs the training position and the evaluation data.
In particular, the environmental data may be collected by a camera of the palletization system 100. The environmental data includes dimensional data of the training objects, status data of the training tray, and filtering data.
The dimensional data of the training object includes the box dimensions (l_n, w_n, h_n), and the status data of the training tray includes the height distribution H and the space utilization rate r_n of the training tray.
Note that the training tray may be equally divided into a plurality of positions in advance. The height distribution H of the training tray may comprise the height of the laid boxes at each position of the training tray. The space utilization rate r_n may refer to the quotient of the number of positions occupied by boxes and the total number of positions. In other words, the spatial distribution of the training objects already placed on the training tray may be mapped to a height matrix H, as shown in fig. 8.
On the basis of the environment data, the processing model agent needs to make an action according to its policy function π, i.e., determine the training position. The training position is the coordinate (x, y) on the training tray at which the front-left-bottom (FLB) vertex of the current training object B_n is placed. The space of all actions of the processing model agent is defined as A.
Further, the coordinates (x, y) on the training tray can be expressed as a single integer, which facilitates the agent's action selection, by the following formula:
action=L*x+y。
meanwhile, the policy function that may define an agent is:
π(a|s)=p[an=a|Sn=So]。
please note that for an action, if the current training object B isnIs completely landed on a training tray or other training object, or BnIs in a supportable position, the action can be determined to be valid, otherwise the full round (epamode) is terminated.
In addition, the filtering data may be a position encoding M_n of where the current training object B_n can be placed on the training tray. M_n is a vector of length L*W, and M_n[L*x+y] identifies whether the position (x, y) on the tray is available. As previously described, for each position (x, y) on the training tray, if that position can be used to place the current training object, the corresponding filtering value in M_n is the first preset value, namely 1; otherwise, the filtering value is the penalty factor ε, with 0 < ε < 1.
The environment data may be normalized and used as the input of the processing model agent. The environment data received by the first sub-model (actor) of the processing model agent is a vector of 2·L·W + 4 dimensions.
In the reinforcement-learning-based processing model agent of this embodiment, both the first sub-model (actor) and the second sub-model (critic) receive the environment data as input. The output of the first sub-model (actor) is the probability of each action in the current state, that is, the probability data. The output of the second sub-model (critic) is an evaluation of the future palletizing starting from the current state, i.e., the aforementioned evaluation data. During training, both sub-models are continuously updated.
In this embodiment, the ACKTR (Actor Critic using Kronecker-Factored Trust Region) reinforcement learning algorithm, an improvement on the advantage actor-critic algorithm (A2C), is adopted to train the agent's placement process. ACKTR differs from A2C in how the network is updated after the agent receives the reward value. A2C relies on first-order methods such as stochastic gradient descent (SGD), performing a first-order gradient update on the network after the loss function is computed. ACKTR instead uses Kronecker-factored approximate curvature (K-FAC) to estimate the second-order natural gradient of the network for the update, so the agent uses samples more efficiently. The actor part of ACKTR outputs the probability P_actions corresponding to each coordinate position on the tray; during exploration the agent randomly selects an action according to each action's probability weight, while during exploitation it selects only the action with the highest probability as the next action to execute.
Referring to fig. 9, in some embodiments, step S1222 includes:
step S1223: processing the probability data according to the filtering data to obtain filtered probability data;
step S1224: determining a maximum value in the filtered probability data;
step S1225: and taking the corresponding position of the maximum value on the training tray as a training position.
In some embodiments, the processor 101 is configured to process the probability data according to the filtered data to obtain filtered probability data; determining a maximum value in the filtered probability data; and taking the corresponding position of the maximum value on the training tray as a training position.
In this manner, processing the probability data to determine the training position according to the filtering data is achieved. In step S1223, the filtering data and the probability data may be multiplied element-wise, thereby processing the probability data according to the filtering data. It will be appreciated that if the probability of a position is multiplied by ε, with 0 < ε < 1, the filtered probability of that position is smaller than before, i.e., the probability of the position is attenuated or penalized. In other words, the filtering data encodes the illegal actions, so illegal actions by the processing model agent during exploration and exploitation can be avoided.
In particular, the illegal actions calculated from the environment data may be encoded as M_n, and the dot product of M_n with the probabilities P_actions of all actions output by the agent is calculated. P_actions is a vector of length L*W, and P_actions[L*x+y] represents the probability that the agent selects the position (x, y) on the tray. After P_actions is multiplied element-wise by M_n, the probability of an illegal position is multiplied by the penalty factor ε and becomes smaller. This enables illegal actions to be masked, as well as actions that are undesirable for the agent in the current situation.
An undesirable action is, for example, the agent placing one box at a position near another box rather than directly against it, which leaves many unusable gaps between boxes. The value range of ε can be set to 0 < ε < 1. If M_n contained 0 and its dot product with P_actions were passed into the network, the network would be prone to problems during gradient propagation and the training process would be interrupted; therefore, the lower limit of ε is greater than 0.
In this embodiment, ε = 1e-3. A penalty factor this small means that the filtering of illegal actions by the filtering data M_n is sufficient to force the processing model agent to forgo selecting the current position.
In addition, when it is determined that there is no position on the entire tray where the box can be placed, a label M_n whose values are all 1 can be returned from the environment, meaning that the agent's actions are no longer intervened upon at that point.
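A minimal sketch of this masking and of the exploration/exploitation selection follows; the renormalization step and the function names are assumptions:

```python
import numpy as np

def select_action(p_actions, m_n, explore, rng=np.random.default_rng()):
    """Combine P_actions with M_n element-wise: illegal positions shrink
    by the factor epsilon. If no position is placeable, the environment
    returns an all-ones M_n and the product changes nothing."""
    p = p_actions * m_n
    p = p / p.sum()           # renormalize so p is a distribution (assumption)
    if explore:               # exploration: sample by probability weight
        return int(rng.choice(len(p), p=p))
    return int(np.argmax(p))  # exploitation: highest-probability action
```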
In order to make the probability of illegal actions as small as possible when the first sub-model (actor) generates its policy, the probabilities P_invalid of the illegal actions among the action probabilities output by the actor can be extracted, and minimizing them can be taken as an optimization goal. Meanwhile, in order to ensure that the processing model agent retains good exploration during training, the entropy of the probability data output by the first sub-model (actor) can be calculated by the following formula:
entropy = -∑_a p(a)·log p(a)
the larger the entropy, the more distributed the probability of the first sub-model operator on different actions, the better exploratory the processing model agent will have, therefore, the optimization goal also includes maximizing entropy.
Finally, the sum of the loss functions (loss) of the network parts is:
L = α·l_action + β·l_value + γ·∑ p_invalid² - δ·entropy.
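As an illustrative sketch, the entropy term and this weighted sum could be computed as below; the weight values and function names are assumptions, not taken from the patent:

```python
import numpy as np

def entropy_of(p_actions):
    """Entropy of the actor's output distribution (exploration bonus)."""
    p = np.clip(p_actions, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def total_loss(l_action, l_value, p_invalid, entropy,
               alpha=1.0, beta=0.5, gamma=1.0, delta=0.01):
    """Weighted sum matching the formula above; the weights here are
    illustrative assumptions."""
    return (alpha * l_action + beta * l_value
            + gamma * float((np.asarray(p_invalid) ** 2).sum())
            - delta * entropy)
```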
after an agent outputs an action, the action can be judged and rewarded, and the agent updates an operator network and a critic network after receiving the rewarded. Currently, the reward is designed to be 0 when an agent successfully places a box before the turn is finished; at the end of the last box, i.e. at the end of the round, the environment clears the space utilization in the training tray and is rewarded accordingly.
Of course, the agent could instead be rewarded every time it places a box. Either way, both reward designs ultimately guide the agent to place as many boxes as possible.
In this embodiment, the reward is set only at the last step of the processing model agent. Note that when updating the network with the reward, the reward is also normalized.
Referring to fig. 10, in some embodiments, the number of the training objects is multiple, the processing model includes a third sub-model, and step S122 includes:
step S1228: and updating the training position according to the evaluation data and the size data of the plurality of training objects by using the third sub-model.
In some embodiments, the processing model comprises a third submodel, and the processor 101 is configured to update the training positions based on the evaluation data and the size data of the plurality of training objects using the third submodel.
Therefore, the preset processing model is trained according to the training data and the filtering data. Moreover, by using the data of a plurality of training objects, the training position is more reasonable and the stacking effect is better.
It can be understood that, assuming that the order of arrival of the training objects is training object a, training object b, and training object c, the layout of fig. 11 is reasonable when the data of these three training objects cannot be obtained. When the data of the three training objects are acquired, the placing mode of fig. 12 is more reasonable. The data for the plurality of training objects includes a sequence of the plurality of training objects.
With the stacking method described so far, a better single-step strategy is learned through the reinforcement learning algorithm, but the look-ahead information of multiple training objects, i.e., the list of upcoming training objects, is not yet well utilized.
To exploit the look-ahead information of multiple training objects, a search-based approach can be used, with Monte Carlo tree search (MCTS) as the third sub-model. In other words, in the present embodiment, the third sub-model is a Monte Carlo tree.
Therefore, under the condition of acquiring the first sub-model, the second sub-model and the look-ahead information, the best action in the current state after considering the look-ahead information can be obtained in a simulation search mode, and the training position is updated.
Specifically, the Monte Carlo tree is a search tree obtained through a number of simulations; each node on the tree represents a possible subsequent state and records information that assists the search (such as the average value of subsequent nodes and the number of visits). At the beginning, the Monte Carlo tree has only a root node, corresponding to the current state. In each simulation, starting from the root node, a child of the current node is selected as the next node to visit according to a "tree policy" until a leaf node is reached; this stage is called selection (Select). After the leaf node is reached, the successor states of the state corresponding to the leaf node may be added to the tree as new nodes, and each new node is evaluated with an evaluation function; these two stages are called expansion (Expand) and evaluation (Evaluation). Then, backtracking is performed to update the information stored at the nodes along this path; this stage is called backup (Backup). After several simulations, a search tree storing the simulation information is obtained and can be used to select the real action; this stage is called playout (Playout).
In the present embodiment, MCTS is combined with the neural networks: the action probability output by the first sub-model (actor) is incorporated into the tree policy, and the output value of the second sub-model (critic) is used as the evaluation function in the evaluation stage.
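A compact sketch of these four stages is given below; the Node class, the tree-policy constant, and the actor/critic/step callables are all assumptions:

```python
import math

class Node:
    """One possible successor state; stores search-assisting statistics."""
    def __init__(self, prior):
        self.prior = prior        # actor probability (tree-policy prior)
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def tree_policy_score(parent, child, c=1.5):
    # Prefer high-value children, but explore high-prior, rarely visited
    # ones (a PUCT-style rule; the constant c is an assumption).
    return child.value() + c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)

def mcts_decide(root_state, actor, critic, step, n_simulations=100):
    """actor(state) -> iterable of (action, prior); critic(state) -> value;
    step(state, action) -> next state. All three are assumed callables."""
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        node, state, path = root, root_state, [root]
        # Select: descend by the tree policy until a leaf is reached.
        while node.children:
            parent = node
            action, node = max(node.children.items(),
                               key=lambda kv: tree_policy_score(parent, kv[1]))
            state = step(state, action)
            path.append(node)
        # Expand: add successor states with the actor's priors.
        for action, prior in actor(state):
            node.children[action] = Node(prior)
        # Evaluate: the critic's value replaces a random rollout.
        value = critic(state)
        # Backup: update the statistics along the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Playout: choose the real action from the accumulated statistics.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```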
In summary, palletizing can be divided into two cases: the processing model agent can observe only the current training object B_n, or the processing model agent can look ahead at a plurality of training objects B_{n+t} (t = 0, 1, 2, ..., T-1), that is:
(1) Single-step placement task with unknown sequence distribution: the processing model agent can observe only the current training object B_n and needs to make a single-step decision that maximizes the length of the placed sequence.
(2) Look-ahead placement task over the next T training objects in the sequence: when a plurality of training objects B_{n+t} (t = 0, 1, 2, ..., T-1) can be looked ahead, a local combination strategy is made by also taking into account the sizes of the following T-1 training objects.
For the agent's single-step placement task, i.e., when the agent can observe only the current box B_n, the reinforcement learning algorithm can be adopted so that the agent learns the situations it may face during stacking and estimates the expected return of each action. The processing model agent thus learns to make optimal action decisions from the available environment data.
Given a state-action sequence τ = (s_0, a_0, s_1, a_1, ..., s_l, a_l), the optimization objective of reinforcement learning can be defined as the expected cumulative reward over the sequence:
J(π) = E_{τ~π} [ ∑_t r(s_t, a_t) ]
on the basis, a processing model trained by an operator-critic algorithm in a single-step process can be combined, and MCTS is used for searching a subsequent observation state so as to obtain a stacking strategy under the condition that T training objects can be looked ahead.
In addition, in order to adapt to actual stacking scenarios, the stacking method of this embodiment adds three necessary constraints on top of the traditional three-dimensional bin packing problem. First, the box sizes come from a finite set but the quantity is unknown: in the traditional three-dimensional bin packing problem, the number and sizes of all boxes are known; in an actual logistics scenario, only the limited number of boxes already on the conveyor belt have known count and sizes, so the number and sizes of all boxes to be stacked on the tray cannot be fully known in advance. Second, a stacking-order constraint: in the traditional problem, the order in which boxes are stacked can be adjusted arbitrarily; in a real scenario, the robot can only stack boxes in the order in which they arrive on the conveyor belt, and the stacking order cannot be adjusted. Third, a stability constraint: each box placed on the pallet must be supported by the surface of the pallet or of another box. This includes full support, meaning the bottom of the box is in full contact with the top of the pallet or another box, and partial support, which allows part of the bottom of the box to be suspended while still ensuring the box is physically supported and the stack remains stable without collapsing.
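For illustration, the stability constraint could be checked against the height distribution as in the following sketch; the support-ratio threshold is an assumption, as the patent does not specify one:

```python
import numpy as np

def is_supported(height, x, y, box_l, box_w, min_ratio=0.6):
    """The box rests at the maximum height under its footprint; cells at
    exactly that height are the ones supporting it."""
    footprint = height[x:x + box_l, y:y + box_w]
    resting_height = footprint.max()
    support_ratio = np.count_nonzero(footprint == resting_height) / footprint.size
    # Full support: ratio == 1.0 (bottom fully in contact).
    # Partial support: ratio >= min_ratio (assumed threshold).
    return support_ratio >= min_ratio
```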
The stacking method of the embodiment can ensure the three constraints and can ensure good space utilization rate of the tray.
The embodiment of the present application further provides a computer-readable storage medium, where a control program is stored on the computer-readable storage medium, and when the control program is executed by the processor 101, the palletizing method according to any one of the above embodiments is implemented.
For example, performing: step S11: acquiring training data; step S12: and training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray.
According to the computer-readable storage medium of the embodiment of the application, the preset processing model is trained according to the training data, and the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray, so that the target position can be determined simply and quickly, and the stacking effect is good.
In the description herein, references to the description of the terms "certain embodiments," "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples" or the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiments or examples is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations of the above embodiments may be made by those of ordinary skill in the art within the scope of the present application, which is defined by the claims and their equivalents.

Claims (6)

1. A palletizing method for palletizing target objects to a current tray, the palletizing method comprising:
acquiring training data;
training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray;
wherein, training a preset processing model according to the training data comprises:
determining filtering data according to the training data;
training a preset processing model according to the training data and the filtering data;
the training data includes size data of a training object and state data of a training tray, the processing model includes a first sub-model and a second sub-model, a preset processing model is trained according to the training data and the filtering data, and the method includes:
determining probability data according to the training data by using the first submodel, wherein the probability data is the probability corresponding to each position of the current training object stacked on the training tray;
processing the probability data to determine a training position according to the filtering data;
determining evaluation data from the training location and the training data using the second submodel;
updating the process model with the assessment data.
2. The palletizing method according to claim 1, wherein processing the probability data to determine a training position according to the filtering data comprises:
processing the probability data according to the filtering data to obtain filtered probability data;
determining a maximum value in the filtered probability data;
and taking the corresponding position of the maximum value on the training tray as the training position.
3. The palletizing method according to claim 1, wherein the number of the training objects is plural, the process model comprises a third sub-model, and the training of the preset process model based on the training data and the filtering data comprises:
updating the training position according to the evaluation data and the size data of the plurality of training objects using the third submodel.
4. A palletizing system for palletizing target objects onto a current tray, comprising a memory and a processor, the processor being connected to the memory and the processor being configured to obtain training data; and
training a preset processing model according to the training data, wherein the trained processing model is used for processing the size information of the target object and the state information of the current tray so as to determine the target position of the target object on the current tray;
wherein the processor is configured to determine filtering data from the training data; the processing model is used for training a preset processing model according to the training data and the filtering data;
the training data comprises size data of a training object and state data of a training tray, the processing model comprises a first sub-model and a second sub-model, the processor is used for determining probability data according to the training data by using the first sub-model, and the probability data is the probability corresponding to each position of the training object which is put on the training tray at present; and for processing the probability data in accordance with the filtered data to determine a training position; and for determining evaluation data from the training position and the training data using the second submodel; and for updating the process model with the assessment data.
5. The palletizing system according to claim 4, wherein the number of training objects is plural, the processing model comprises a third submodel, and the processor is configured to update the training positions based on the evaluation data and size data of the plural training objects using the third submodel.
6. A computer-readable storage medium, having stored thereon a control program which, when executed by a processor, implements a palletizing method as claimed in any one of claims 1 to 3.
CN202010020711.0A 2020-01-09 2020-01-09 Stacking method, stacking system and storage medium Active CN111099363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010020711.0A CN111099363B (en) 2020-01-09 2020-01-09 Stacking method, stacking system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010020711.0A CN111099363B (en) 2020-01-09 2020-01-09 Stacking method, stacking system and storage medium

Publications (2)

Publication Number Publication Date
CN111099363A CN111099363A (en) 2020-05-05
CN111099363B true CN111099363B (en) 2021-10-22

Family

Family ID: 70426383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010020711.0A Active CN111099363B (en) 2020-01-09 2020-01-09 Stacking method, stacking system and storage medium

Country Status (1)

Country Link
CN (1) CN111099363B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111598316B (en) * 2020-05-06 2023-03-24 深圳大学 Object transfer boxing process strategy generation method and device and computer equipment
CN112085385A (en) * 2020-09-09 2020-12-15 广东力生智能有限公司 Generation system and method of stable mixed box stack type box supply sequence based on order
CN113651118B (en) * 2020-11-03 2023-02-10 梅卡曼德(北京)机器人科技有限公司 Method, device and apparatus for hybrid palletizing of boxes of various sizes and computer-readable storage medium
CN113427307B (en) * 2021-06-26 2022-08-09 山东省智能机器人应用技术研究院 Industrial robot end effector for gripping caterpillar links and palletizing method
CN114529155A (en) * 2022-01-17 2022-05-24 湖南视比特机器人有限公司 Method and system for dynamically stacking and framing workpieces
CN114933176A (en) * 2022-05-14 2022-08-23 江苏经贸职业技术学院 3D vision stacking system adopting artificial intelligence

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108776879A (en) * 2018-06-04 2018-11-09 江苏楚门机器人科技有限公司 A kind of pile shape planing method based on weight study
CN109359186A (en) * 2018-10-25 2019-02-19 杭州时趣信息技术有限公司 A kind of method, apparatus and computer readable storage medium of determining address information
CN109829947A (en) * 2019-02-25 2019-05-31 北京旷视科技有限公司 Pose determines method, tray loading method, apparatus, medium and electronic equipment
CN109870983A (en) * 2017-12-04 2019-06-11 北京京东尚科信息技术有限公司 Handle the method, apparatus of pallet stacking image and the system for picking of storing in a warehouse

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8538963B2 (en) * 2010-11-16 2013-09-17 International Business Machines Corporation Optimal persistence of a business process

Also Published As

Publication number Publication date
CN111099363A (en) 2020-05-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhao Hang

Inventor after: Peng Fei

Inventor before: Peng Fei

Inventor before: Zhao Hang

GR01 Patent grant