CN114218624A - Two-dimensional special-shaped part layout method based on deep reinforcement learning - Google Patents

Two-dimensional special-shaped part layout method based on deep reinforcement learning

Info

Publication number: CN114218624A
Application number: CN202111399771.9A
Authority: CN (China)
Prior art keywords: sub-blocks, reinforcement learning, reward, deep reinforcement
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张校志, 付鑫, 陈良煜
Current Assignee: Ji Hua Laboratory
Original Assignee: Ji Hua Laboratory
Application filed by Ji Hua Laboratory
Priority to CN202111399771.9A
Publication of CN114218624A


Classifications

    • G06F30/10 Geometric CAD (G Physics > G06 Computing; calculating or counting > G06F Electric digital data processing > G06F30/00 Computer-aided design [CAD])
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model (under G06F30/20 Design optimisation, verification or simulation)
    • G06Q10/043 Optimisation of two dimensional placement, e.g. cutting of clothes or wood (under G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem")


Abstract

The invention provides a two-dimensional special-shaped part layout method based on deep reinforcement learning, relating to the technical field of two-dimensional special-shaped part layout. The layout (nesting) problem is modeled as a deep reinforcement learning scene: an observation space is constructed from the sheet being cut and the shapes already cut from it, an action space is constructed according to the sizes of the shapes to be cut, a reward and punishment mechanism is set, and a deep reinforcement learning training environment is configured. The layout policy is then trained by deep reinforcement learning: automatic exploration and sampling generate training data, and back-propagation continuously optimizes the policy until the conditions specified by the layout task are met. The beneficial effects of the invention are that search efficiency is improved, the layout problem of complex two-dimensional special-shaped parts can be solved, and under different demand scenarios a new solution can be obtained at low cost by changing the mother-block shape, the sub-block shapes, the minimum cutting quantities and the like.

Description

Two-dimensional special-shaped part layout method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of two-dimensional special-shaped part layout, and particularly relates to a two-dimensional special-shaped part layout method based on deep reinforcement learning.
Background
The stock layout problem (nesting problem), also known as the blanking, cutting-stock or packing problem, aims to achieve higher material utilization during the cutting of material. It belongs to the class of NP-hard problems: its time complexity rises rapidly as the problem scale grows, and large-scale instances are difficult to solve exactly in a reasonable time. An example is shown in FIG. 1.
A number of sub-blocks of different styles must be cut from a whole sheet of planar material (simple task: several rectangular wood blocks of different sizes; complex task: irregular special shapes in addition to the basic shapes), subject to the following constraints: the number of sub-blocks cut for each model is not less than a prescribed count, and the smaller the remaining unallocated cutting area (the waste ratio), the better.
Traditional methods, such as rectangular layout algorithms and genetic algorithms, can only cope with simple scenarios when designing a cutting scheme, for example cutting out only rectangular sub-blocks of fairly uniform shape. Compared with the rectangular layout problem, the distinguishing feature of the special-shaped layout problem is that the boundary contours of the cut pieces are complex, so the computation requires complex geometric operations, which further increases algorithmic complexity; the problem is recognized by both academia and industry as difficult to solve.
Disclosure of Invention
The invention provides a two-dimensional special-shaped part layout method based on deep reinforcement learning, which solves the problems of the prior art in which the layout of two-dimensional special-shaped parts is complex and difficult to automate.
The invention provides a two-dimensional special-shaped part layout method based on deep reinforcement learning, in which the planar material to be cut is referred to as the mother block and the special-shaped and non-special-shaped pieces to be cut are referred to as sub-blocks. The method comprises the following steps:
step 1, obtaining stock layout task information, and determining a state space and an action space of a task;
step 2, introducing an auxiliary decision to reduce the action space: each time an action is taken, morphological erosion is applied to the remaining mother-block area not yet laid out, where the erosion radius is the minimum radius of the candidate sub-blocks and the minimum radius is defined as the minimum distance from the geometric center of a candidate sub-block's shape to its contour points;
step 3, setting a reward and punishment mechanism;
step 4, configuring a deep reinforcement learning training environment, training and storing an optimal model;
and step 5, applying the optimal model to the task scene for inference calculation to obtain the final layout scheme.
The specific method for determining the state space and the action space of the task in step 1 is as follows:
(1) determining the state space S: the regions of the mother block already divided into sub-blocks and the remaining undivided regions are marked according to whether they have been divided, and together form the state space S, which consists of regions in two-dimensional space; the initial state is the entire mother block, undivided;
(2) determining the action space A: the action space A is determined by the properties of the sub-blocks, including their size and placement pose.
The specific method for determining the action space in step (2) is as follows: each candidate sub-block is assigned a standard initial pose, and the shape actually placed for cutting may only be rotated relative to the initial pose by one angle from {30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360} degrees.
The reward and punishment mechanism of the invention is set as follows:
(1) no overlap: if the placed sub-block is not completely enclosed by the remaining placeable area of the mother block, the round ends, is marked as task over, and a penalty of -1 is given;
(2) each time a sub-block is successfully cut out in the simulation environment, a reward of 0.01 is given;
(3) time cost: each additional step of exploration is given a penalty of -0.001;
(4) utilization rate: when each round ends, the area ratio S_left/S_total of the remaining un-nested material is calculated and the round is given the reward γ(1 - S_left/S_total), where γ is a hyperparameter;
(5) minimum sub-block count requirement: a reward related to the minimum number of sub-blocks is defined as the negative of the proportion of required sub-blocks not yet completed among all sub-blocks to be laid out: -N_unfinished/N_total.
In item (4), γ takes a value between 0.5 and 2.0 according to how strict the task's material utilization requirement is.
The invention configures a deep reinforcement learning training environment, and the training content comprises the following steps:
(1) the next layout action within the cuttable area of the mother block is obtained according to the current policy π: the input of the deep-learning-based policy π is the currently observed state of the task scene, namely the regions of the mother block already divided into sub-blocks and the undivided regions after erosion according to the auxiliary strategy; the output is a two-dimensional coordinate point P, a candidate sub-block n and a rotation angle α;
(2) in the simulation environment, the candidate sub-block n is placed into the mother block with the two-dimensional coordinate point P as its center, rotated clockwise by angle α relative to its default pose; the sub-block cutting operation is executed, the corresponding area is marked as divided, and the reward r given by the environment is obtained according to the reward and punishment mechanism;
(3) steps (1) to (2) are repeated until the round ends, which is marked by one of the following conditions:
① the number of steps executed in the current round exceeds the set limit, and the round ends immediately;
② the area of the remaining region in the mother block is smaller than a set threshold, indicating that no sub-block can be cut in the current state, and the round ends;
when the round ends, the cumulative reward R of the round, i.e., the sum of the per-step rewards r of the round, is recorded and compared with a preset best-model cumulative reward threshold R_best; whether the current model is the optimal model is judged by whether the round's cumulative reward R exceeds R_best, and if so, the round's model is saved as the optimal model and R_best is updated to the current round's R;
(4) the state transitions of each step in the current round are saved to a memory pool and used to train the value function in the deep reinforcement learning algorithm and to update and improve the policy π by gradient descent;
(5) if no stop-training signal has been received, the next round of exploration is carried out, repeating steps (1) to (4), and training stops when either of the following two conditions occurs:
① the cumulative number of executed rounds exceeds the set limit, and training stops;
② the cumulative reward R exceeds a preset threshold, and training stops.
The initialization value of R _ best in the step (3) is-9999.
In step (3), if the number of steps executed in the current round exceeds 3000 times, the current round is immediately ended.
The threshold value in step (3) of the present invention is set to be 3 times of the minimum subblock area in the candidate subblocks.
In the step (5), the number of accumulated executed rounds exceeds the set number of steps by 5000 times, and the training is stopped.
It should be understood that the statements herein reciting aspects are not intended to limit the critical or essential features of any embodiment of the invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
The invention has the following beneficial effects: it converts the two-dimensional special-shaped part layout problem into a scene suitable for reinforcement learning, sets up a state space, an action space and a reward mechanism, and adds an auxiliary function that helps improve search efficiency; it can cope with the layout of complex two-dimensional special-shaped parts, and under different demand scenarios a new solution can be obtained at low cost by reuse, for example by changing the mother-block shape, the sub-block shapes, the minimum cutting quantities and the like. The advantages are: first, no complex preprocessing and no hand-designed exploration strategies are needed; second, the layout problem of complex two-dimensional special-shaped parts, including concave-convex, annular and hollowed shapes, can be solved, because the action space can cover all remaining areas where pieces can reasonably be placed; third, the model migrates at low cost across demand scenarios by changing the mother-block shape, the sub-block shapes, the minimum cutting quantities and the like.
Drawings
FIG. 1 is a schematic diagram of a layout problem case;
FIG. 2 is a schematic overall flow chart of the two-dimensional special-shaped part layout method based on deep reinforcement learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In addition, the term "and/or" herein describes only an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
The present scheme provides a widely applicable two-dimensional special-shaped part layout method that can solve the layout problem of two-dimensional special-shaped parts more flexibly. Exploiting the characteristics of deep reinforcement learning, training data are generated through automatic exploration and sampling, and the policy is continuously optimized through back-propagation until the conditions specified by the layout task are met; the whole process requires little preprocessing work.
To cope with the layout problem of more complicated two-dimensional special-shaped parts, the scheme models the layout scenario with deep reinforcement learning: an observation space is constructed from the sheet being cut and the shapes already cut from it, an action space is built according to the sizes of the shapes to be cut, a reward and punishment mechanism is set, and a deep reinforcement learning training environment is configured. The layout policy is trained by deep reinforcement learning: automatic exploration and sampling generate training data, and the policy is continuously optimized through back-propagation until the conditions specified by the layout task are met.
The specific method comprises the following steps:
First, the layout task information is obtained, and the state space of the task and the action space of the agent are determined.
The planar material to be cut is called the mother block, and the special-shaped and non-special-shaped pieces to be cut are called sub-blocks. The mother block to be divided can be described by a polygonal area on a plane; a common rectangular sheet, for example, can be described by the coordinates of two diagonal corner points. The modeling process is as follows:
determining a state space: the region divided into sub-blocks and the remaining undivided regions on the parent block are marked according to whether the division is performed or not, so that a state space S is formed, the state space S is composed of a region in a two-dimensional space, and the state space is initially in an undivided state of the whole parent block.
Determining the action space: the action space A is determined by the properties (size and placement pose) of the sub-blocks.
Without regard to computational cost, the sub-blocks could be cut from the mother block at any angle, but this makes the action space excessively large and hard to sample. To reduce the amount of computation, the problem can be somewhat simplified: each candidate sub-block is assigned a standard initial pose (in two-dimensional space), and the shape actually placed for cutting may only be rotated relative to the initial pose by one angle from {30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360} degrees. Thus, each time a sub-block shape is selected for cutting, the action taken is described as follows: select a two-dimensional point in the mother-block area not yet laid out as the center point of the candidate cutting shape, select one of the 12 candidate angles, select one of the candidate sub-blocks, and cut according to that sub-block's shape. The action therefore contains two parts of information: the two-dimensional coordinates of the center point, a continuous value, and the choice of sub-block shape together with the rotation angle, a discrete value among 12 x N options.
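As an illustration, an action can be encoded as a tuple. The following sketch (function name and random-sampling scheme are assumptions added for illustration) shows one way to represent the continuous and discrete parts described above:

```python
import numpy as np

# The 12 candidate rotation angles from the text (in degrees).
ANGLES = (30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360)

def sample_random_action(sheet_w: float, sheet_h: float, n_shapes: int,
                         rng: np.random.Generator) -> tuple:
    """Sample one action: a continuous 2D center point plus a discrete choice
    among the 12 * N (shape, angle) pairs described in the text."""
    center = (rng.uniform(0.0, sheet_w), rng.uniform(0.0, sheet_h))
    discrete = int(rng.integers(12 * n_shapes))   # flattened discrete index
    shape_idx, angle_idx = divmod(discrete, 12)   # which shape, which angle
    return center, shape_idx, ANGLES[angle_idx]

action = sample_random_action(640.0, 480.0, n_shapes=5,
                              rng=np.random.default_rng(0))
```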
Second, an auxiliary decision is introduced to reduce the action space and accelerate the agent's exploration.
To increase sampling efficiency, an auxiliary decision is introduced: each time an action is taken, morphological erosion (an image-processing operation that shrinks a region) is applied to the remaining mother-block area not yet laid out. The erosion radius is the minimum radius of the candidate sub-blocks, where a sub-block's minimum radius is defined as the minimum distance from the geometric center of its shape to its contour points. A sketch of this step follows.
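A minimal sketch of this auxiliary decision, assuming OpenCV and a rasterized free-area mask (the patent describes the operation, not a specific implementation; function names are illustrative):

```python
import cv2
import numpy as np

def min_radius(contour: np.ndarray) -> float:
    """Minimum distance from the shape's geometric center to its contour points."""
    pts = contour.reshape(-1, 2).astype(np.float64)
    center = pts.mean(axis=0)
    return float(np.linalg.norm(pts - center, axis=1).min())

def erode_free_area(free_mask: np.ndarray, candidate_contours) -> np.ndarray:
    """Erode the not-yet-laid-out area by the smallest minimum radius among
    all candidate sub-blocks, so sampled center points keep some clearance."""
    r = max(1, int(min(min_radius(c) for c in candidate_contours)))
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (2 * r + 1, 2 * r + 1))
    return cv2.erode(free_mask, kernel)
```

Center points are then sampled only from the eroded region, which discards placements that would certainly collide with the boundary.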
Third, a reward and punishment mechanism is set, specifically as follows (a sketch of the resulting reward computation follows this list):
No overlap: if the placed sub-block is not completely enclosed by the remaining placeable area (in practical terms, the shape about to be placed goes out of bounds or overlaps sub-blocks already laid out), the round ends, is marked as task over, and a penalty of -1 is given;
Each time a sub-block is successfully cut in the simulation environment, a reward of 0.01 is given;
Time cost: each additional step of exploration is given a penalty of -0.001. In the two-dimensional special-shaped layout scenario there is generally no real-time requirement, so the time-cost requirement is not strict; retraining is generally needed only when the requirements change (such as the mother-block shape, the sub-block shapes or the minimum cutting quantities), in order to generate a new scheme;
Utilization rate: when each round ends, the area ratio S_left/S_total of the remaining un-nested material is calculated and the round is given the reward γ(1 - S_left/S_total), where γ is a hyperparameter whose value can be chosen between 0.5 and 2.0 according to how strict the task's material utilization requirement is;
Minimum sub-block count requirement: a reward related to the minimum number of sub-blocks is defined as the negative of the proportion of required sub-blocks not yet completed among all sub-blocks to be laid out: -N_unfinished/N_total; a more complex version may take a weighted average according to the importance of each sub-block.
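The following hedged sketch collects the terms above into two functions; the names (GAMMA, n_unfinished, n_total) and the split into per-step and end-of-round rewards are illustrative assumptions:

```python
GAMMA = 1.0  # hyperparameter γ, chosen in [0.5, 2.0] per the text

def step_reward(placed_ok: bool) -> float:
    """Per-step reward: -1 (and the round ends) on overlap/out-of-bounds;
    otherwise +0.01 for the successfully cut sub-block minus 0.001 time cost."""
    return -1.0 if not placed_ok else 0.01 - 0.001

def end_of_round_reward(s_left: float, s_total: float,
                        n_unfinished: int, n_total: int) -> float:
    """Round-end reward: utilization term γ(1 - S_left/S_total) plus the
    (negative) share of required sub-blocks still missing."""
    return GAMMA * (1.0 - s_left / s_total) - n_unfinished / n_total
```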
Fourth, a deep reinforcement learning training environment is configured to train the agent.
Once the state space, action space and reward and punishment mechanism are in place, a suitable deep reinforcement learning algorithm is selected; the adopted algorithms include the deep Q-learning algorithm (DQN), the double-value-network DDQN, and the like. Training of the agent then begins (a compressed sketch of the loop follows step (5) below):
(1) the agent's next layout action within the cuttable area of the mother block is obtained according to the current policy π: the input of the deep-learning-based policy π is the currently observed state of the task scene, namely the regions of the mother block already divided into sub-blocks and the undivided regions after erosion according to the auxiliary strategy; the output is a two-dimensional coordinate point P, a candidate sub-block n and a rotation angle α.
(2) at position P on the mother block, the candidate sub-block is rotated clockwise by angle α relative to its default pose and the sub-block cutting operation is executed; the corresponding area is marked as divided, and the reward r given by the environment is obtained.
(3) steps (1) and (2) are repeated until the round ends, which is marked by one of the following:
① the number of steps executed in the current round exceeds 3000, and the round ends immediately;
② the area of the remaining region in the mother block is smaller than a set threshold, which can be set, for example, to 3 times the area of the smallest candidate sub-block. A remaining area below this threshold indicates that no sub-block can be cut in the current state, and the round ends.
when the round is finished, recording the cumulative reward R of the round, namely the cumulative sum of the rewards R of each step of the round, comparing the cumulative reward R with a preset optimal model cumulative reward threshold value R _ best (which can be initialized to a minimum value of-9999), judging whether the current model is in the optimal model by comparing whether the cumulative reward R of the round is greater than the R _ best, if so, saving the model of the round as the optimal model, and updating the value of the R _ best as the R of the current round;
(4) the state transitions of each step in the current round are saved to a memory pool and used to train the value function in the deep reinforcement learning algorithm and to update and improve the policy π.
(5) if no stop-training signal has been received, the next round of exploration is carried out, repeating steps (1) to (4), and training stops when either of the following two conditions occurs:
① the cumulative number of executed rounds exceeds 5000, and training stops;
② the cumulative reward R exceeds a preset threshold, and training stops.
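A compressed sketch of this training loop under stated assumptions: the env/agent interfaces, method names and default thresholds are illustrative, since the patent names DQN/DDQN but fixes no implementation:

```python
R_BEST_INIT = -9999.0  # initialization value given in the text

def train(env, agent, max_rounds=5000, max_steps=3000, r_stop=None):
    """Explore, record transitions, update the policy, and keep the best model."""
    r_best = R_BEST_INIT
    for _ in range(max_rounds):                    # stop condition ①: round budget
        state, R = env.reset(), 0.0
        for _ in range(max_steps):                 # per-round step cap
            action = agent.act(state)              # policy π outputs (P, n, α)
            next_state, r, done = env.step(action) # cut, mark area, get reward r
            agent.memory.append((state, action, r, next_state, done))
            agent.learn()                          # gradient-descent update of the value function
            state, R = next_state, R + r
            if done:                               # overlap, or remaining area below threshold
                break
        if R > r_best:                             # save the best model so far
            r_best = R
            agent.save("best_model")
        if r_stop is not None and R > r_stop:      # stop condition ②: reward threshold
            return
```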
Fifth, the optimal model is applied to the task scene for inference calculation to obtain the final layout scheme.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A two-dimensional special-shaped part layout method based on deep reinforcement learning, characterized in that the planar material to be cut is called the mother block and the special-shaped and non-special-shaped pieces to be cut are called sub-blocks, the method comprising the following steps:
step 1, obtaining stock layout task information, and determining a state space and an action space of a task;
step 2, introducing an auxiliary decision to reduce the action space: each time an action is taken, morphological erosion is applied to the remaining mother-block area not yet laid out, where the erosion radius is the minimum radius of the candidate sub-blocks and the minimum radius is defined as the minimum distance from the geometric center of a candidate sub-block's shape to its contour points;
step 3, setting a reward and punishment mechanism;
step 4, configuring a deep reinforcement learning training environment, training and storing an optimal model;
and step 5, applying the optimal model to the task scene for inference calculation to obtain the final layout scheme.
2. The two-dimensional profile layout method based on deep reinforcement learning according to claim 1, wherein the specific method for determining the state space and the action space of the task in step 1 is as follows:
(1) determining the state space S: the regions of the mother block already divided into sub-blocks and the remaining undivided regions are marked according to whether they have been divided, and together form the state space S, which consists of regions in two-dimensional space; the initial state is the entire mother block, undivided;
(2) determining the action space A: the action space A is determined by the properties of the sub-blocks, including their size and placement pose.
3. The two-dimensional special-shaped part layout method based on deep reinforcement learning according to claim 1, wherein the specific method for determining the action space in step (2) is as follows: each candidate sub-block is assigned a standard initial pose, and the shape actually placed for cutting may only be rotated relative to the initial pose by one angle from {30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360} degrees.
4. The two-dimensional special-shaped part layout method based on deep reinforcement learning according to claim 1, wherein the reward and punishment mechanism is set as:
(1) no overlap: if the placed sub-block is not completely enclosed by the remaining placeable area of the mother block, the round ends, is marked as task over, and a penalty of -1 is given;
(2) each time a sub-block is successfully cut out in the simulation environment, a reward of 0.01 is given;
(3) time cost: each additional step of exploration is given a penalty of -0.001;
(4) utilization rate: when each round ends, the area ratio S_left/S_total of the remaining un-nested material is calculated and the round is given the reward γ(1 - S_left/S_total), where γ is a hyperparameter;
(5) minimum sub-block count requirement: a reward related to the minimum number of sub-blocks is defined as the negative of the proportion of required sub-blocks not yet completed among all sub-blocks to be laid out: -N_unfinished/N_total.
5. The two-dimensional special-shaped part layout method based on deep reinforcement learning according to claim 4, wherein γ in item (4) takes a value between 0.5 and 2.0 according to how strict the task's material utilization requirement is.
6. The two-dimensional profile layout method based on deep reinforcement learning according to claim 1, wherein a deep reinforcement learning training environment is configured, and the training comprises:
(1) the next layout action within the cuttable area of the mother block is obtained according to the current policy π: the input of the deep-learning-based policy π is the currently observed state S of the task scene, namely the regions of the mother block already divided into sub-blocks and the undivided regions after erosion according to the auxiliary strategy; the output is a two-dimensional coordinate point P, a candidate sub-block n and a rotation angle α;
(2) in the simulation environment, the candidate sub-block n is placed into the mother block with the two-dimensional coordinate point P as its center, rotated clockwise by angle α relative to its default pose; the sub-block cutting operation is executed, the corresponding area is marked as divided, and the reward r given by the environment is obtained according to the reward and punishment mechanism;
(3) steps (1) to (2) are repeated until the round ends, which is marked by one of the following conditions:
① the number of steps executed in the current round exceeds the set limit, and the round ends immediately;
② the area of the remaining region in the mother block is smaller than a set threshold, indicating that no sub-block can be cut in the current state, and the round ends;
when the round ends, the cumulative reward R of the round, i.e., the sum of the per-step rewards r of the round, is recorded and compared with a preset best-model cumulative reward threshold R_best; whether the current model is the optimal model is judged by whether the round's cumulative reward R exceeds R_best, and if so, the round's model is saved as the optimal model and R_best is updated to the current round's R;
(4) the state transitions of each step in the current round are saved to a memory pool and used to train the value function in the deep reinforcement learning algorithm and to update and improve the policy π by gradient descent;
(5) if no stop-training signal has been received, the next round of exploration is carried out, repeating steps (1) to (4), and training stops when either of the following two conditions occurs:
① the cumulative number of executed rounds exceeds the set limit, and training stops;
② the cumulative reward R exceeds a preset threshold, and training stops.
7. The two-dimensional special-shaped part layout method based on deep reinforcement learning according to claim 6, wherein the initialization value of R_best in step (3) is -9999.
8. The two-dimensional special-shaped part layout method based on deep reinforcement learning according to claim 6, wherein if the number of steps executed in the current round in step (3) exceeds 3000, the current round is immediately ended.
9. The two-dimensional special-shaped part layout method based on deep reinforcement learning according to claim 6, wherein the threshold in step (3) is set to 3 times the area of the smallest candidate sub-block.
10. The two-dimensional special-shaped part layout method based on deep reinforcement learning according to claim 6, wherein in step (5), training stops when the cumulative number of executed rounds exceeds 5000.
CN202111399771.9A 2021-11-24 2021-11-24 Two-dimensional special-shaped part layout method based on deep reinforcement learning Pending CN114218624A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111399771.9A CN114218624A (en) 2021-11-24 2021-11-24 Two-dimensional special-shaped part layout method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111399771.9A CN114218624A (en) 2021-11-24 2021-11-24 Two-dimensional special-shaped part layout method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN114218624A true CN114218624A (en) 2022-03-22

Family

ID=80698043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111399771.9A Pending CN114218624A (en) 2021-11-24 2021-11-24 Two-dimensional special-shaped part layout method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN114218624A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116198771A (en) * 2023-03-17 2023-06-02 华南理工大学 Two-dimensional rectangular strip packaging method based on deep reinforcement learning
CN116198771B (en) * 2023-03-17 2024-04-09 华南理工大学 Two-dimensional rectangular strip packaging method based on deep reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination