CN115237119A - AGV collaborative transfer target distribution and decision algorithm - Google Patents
- Publication number
- CN115237119A (application number CN202210627881.4A)
- Authority
- CN
- China
- Prior art keywords
- agv
- action
- decision
- target
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
Abstract
The invention discloses an AGV (automatic guided vehicle) cooperative transport target allocation and decision algorithm, which comprises: establishing a dominance matrix and a target decision matrix based on the number of AGVs and the number of targets to be carried, and establishing a target allocation optimization function; analyzing the AGV transport environment, establishing a probability matrix and a penetration action matrix, and establishing an action decision objective function and action decision constraints; integrating the functions by linear weighting to establish a unified objective function; performing reinforcement learning and designing a reward function to form a multi-stage decision problem model; designing a joint reward function for action rewards; and setting a selection strategy and a backtracking update formula, improving both, and solving the decision problem model. The invention has the advantages of high calculation speed, short training time and good convergence, thereby realizing intelligent calculation, accurate allocation and action decision.
Description
Technical Field
The invention relates to the technical field of intelligent handling equipment, and in particular to an AGV (automatic guided vehicle) cooperative transport target allocation and decision algorithm.
Background
An AGV is an unmanned transport vehicle that performs positioning, orientation, obstacle avoidance and path planning by using a rotatable laser scanner to detect the surrounding environment. As key equipment in goods transportation, the AGVs together form a transport network and complete transport activities between a starting point and a target unit under the instruction of a control system. How the control system performs target allocation and action decision for the AGVs is therefore an important problem to be solved urgently.
The AGV target allocation and action decision problem is an extension of the Vehicle Routing Problem (VRP) and is NP-hard, so large-scale instances are difficult to solve with conventional exact algorithms. Because the scenario is complex and involves multiple constraints, multiple objectives and uncertainty, the problem is far more complex than the traditional VRP and scheduling problems. At present, scheduling methods based on experience and rules are mainly used, which leads to problems such as a low on-time delivery rate, low AGV utilization and long delivery times. It is therefore urgent to combine optimal scheduling theory with the characteristics of AGV delivery and to design an effective optimization theory and method for solving the problem.
In conclusion, researching the AGV scheduling problem has very important practical significance. For this problem, the invention integrates target allocation and action decision under multiple constraints and maneuvers, and designs an effective intelligent algorithm to solve it.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an AGV cooperative transport target allocation and decision algorithm which has the advantages of high calculation speed, short training time and good convergence effect, thereby realizing the effects of intelligent calculation, accurate allocation and action decision.
In order to achieve the purpose, the invention provides the following technical scheme:
an AGV cooperative transport target distribution and decision algorithm comprises the following steps:
s1: respectively establishing a dominance matrix and a target decision matrix based on the number of the AGVs and the number of the targets to be carried, and establishing a target distribution optimization function based on the dominance matrix and the target decision matrix;
s2: analyzing an AGV carrying environment, establishing a probability matrix, optimizing according to the probability matrix to obtain a penetration action matrix, and establishing an action decision objective function based on the probability matrix and the penetration action matrix;
s3: establishing action decision constraint according to the action decision objective function;
s4: performing linear weighted integration on the target distribution optimization function and the action decision objective function to establish a uniform objective function;
s5: performing reinforcement learning, performing Markov decision process modeling on target allocation and decision in stages to construct a state space and an action space, and designing a reward function based on a unified target function to form a multi-stage decision problem model;
s6, designing a joint reward function based on the unified objective function;
s7: establishing a corresponding Monte Carlo tree for each AGV, searching each Monte Carlo tree, forming a combined action set for the search results, substituting a combined reward function for evaluation and obtaining a reward value;
s8: setting a selection strategy and a backtracking updating formula, returning the obtained reward value to each Monte Carlo tree, improving the selection strategy and the backtracking updating formula of the algorithm, and solving a decision problem model.
As a further improvement of the present invention, the step S1 specifically includes:
in the AGV cooperative transport process, there are N_U AGVs and N_Tar targets to be carried;
the established dominance matrix is A_ij, in which the value in row i (i = 1, 2, ..., N_U) and column j (j = 1, 2, ..., N_Tar) represents the comprehensive dominance of the ith AGV over target j;
the established target decision matrix is X_ij; a matrix element value of 1 means that the ith AGV is allocated to target j, otherwise the matrix element value is 0;
the established target allocation optimization function is as follows:
wherein: J_dis,i is the target allocation optimization function of the ith AGV.
As a further improvement of the present invention, step S1 further includes:
in the process of allocating targets, each AGV can be allocated to only one target, and each target is allocated at least one AGV; a constraint model is established for the AGVs:
as a further improvement of the present invention, in the step S2, due to the AGV transport process, different targets make different maneuvers based on different environments to pass through the obstacle, and different targets are set to deploy different air defense areas in different numbers, that is, the target j has N in total d,j Individual obstacle region, AGV has N A Selecting one action for defense through each barrier area, and establishing probability matrix P mk Matrix represents performing action m (m =1, 2..., N) A ) N in the barrier region k (k =1,2.. Times.n) d,j ) The probability percentage can be reduced, and the probability matrix is optimized to obtain a penetration matrix M km The matrix represents that if the AGV selects the penetration action m in the obstacle area k, the matrix element is 1, otherwise, the matrix element is 0;
establishing an action decision objective function of the AGV:
wherein: j is a unit of pene,i Representing the action decision objective function of the ith AGV.
As a further improvement of the invention, the step S2 further comprises constraining the number of action selections during AGV transport according to the physical characteristics of the AGV, defining that each action is selected no more than b_1 times, and establishing the action decision constraint:
and optimally selecting decision variables in a discrete decision space based on action decision constraints.
As a further improvement of the present invention, the step S4 specifically includes:
establishing a unified objective function for the ith AGV based on the target allocation optimization function and the action decision objective function:
converting the multi-target optimization of the AGV cluster into single-target optimization by using a linear weighting method:
as a further improvement of the present invention, said step S4 further includes establishing an objective function constraint:
wherein the objective function constraint is a combination of a constraint model and an action decision constraint.
As a further improvement of the present invention, the step S5 specifically includes:
the state space is established as:
wherein: s dis Assign status to the target, S r Is the area of the obstacle 1 and is,in order to pass through the area of the obstacle 2 in sequence,in order to pass through the area of the obstacle 3 in sequence,are areas of obstacles 4 that pass in sequence. By integrating the state spaces of all targets, the total state space can be expressed as:
the AGV acts differently according to the states of different stages, the action is selected as a target in the target distribution stage, the AGV can only select one target, and the result is represented as a discrete vectorFor the presence of N Tar When there is one target, there is N target distribution stage Tar An action is selectable;
when the obstacle passing stage action is taken as 5 types of defense actions, the AGV selects one action in a certain obstacle area state, the 5 types of defense actions are expressed by vectors, elements in the vectors sequentially represent action 1, action 2, action 3, action 4 and action 5, and a reward function is established:
as a further improvement of the present invention, the joint reward function is:
and a penalty of -1 is given when the target decision matrix or the penetration matrix does not satisfy the constraints; otherwise the reward value is the unified objective function value.
As a further improvement of the present invention, the selection decision is:
wherein: v_father is the parent node of the node v being computed, C_p is a constant used to balance exploration and exploitation, and Q(v) is the result based on all AGV allocations and penetrations;
the backtracking update formula is as follows:
wherein: n is a radical of new (v) As a new node, N old (v) Is an old node, Q new (v) For new penetration results, Q old (v) For old blast results, Δ Q is the blast result difference.
The invention has the beneficial effects that: considering the dominance, target value and action probability of the AGVs, a target allocation optimization function and an action decision objective function are established respectively and integrated into a unified objective function for AGV collaborative task planning; a state space and an action space are constructed in stages within a reinforcement learning framework, a reward function is designed from the unified objective function, and finally an improved Monte Carlo tree search reinforcement learning algorithm is provided.
Drawings
FIGS. 1-5 are tree-search depth diagrams of some of the AGVs;
FIGS. 6-10 show the J_i values of some of the AGVs.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. In which like parts are designated by like reference numerals. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "bottom" and "top," "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
Referring to fig. 1, a specific embodiment of an AGV cooperative transport target allocation and decision algorithm according to the present invention includes the following steps:
s1: respectively establishing a dominance matrix and a target decision matrix based on the number of the AGVs and the number of the targets to be carried, and establishing a target distribution optimization function based on the dominance matrix and the target decision matrix;
s2: analyzing an AGV carrying environment, establishing a probability matrix, optimizing according to the probability matrix to obtain a penetration action matrix, and establishing an action decision objective function based on the probability matrix and the penetration action matrix;
s3: establishing action decision constraint according to the action decision objective function;
s4: performing linear weighted integration on the target distribution optimization function and the action decision objective function to establish a uniform objective function;
s5: performing reinforcement learning, performing Markov decision process modeling on target allocation and decision in stages to construct a state space and an action space, and designing a reward function based on a unified target function to form a multi-stage decision problem model;
s6, designing a combined reward function based on the unified objective function;
s7: establishing a corresponding Monte Carlo tree for each AGV, searching each Monte Carlo tree, forming a combined action set for the search results, substituting a combined reward function for evaluation and obtaining a reward value;
s8: setting a selection strategy and a backtracking updating formula, returning the obtained reward value to each Monte Carlo tree, improving the selection strategy and the backtracking updating formula of the algorithm, and solving the decision problem model.
The step S1 specifically comprises the following steps:
in the AGV cooperative transport process, there are N_U AGVs and N_Tar targets to be carried;
the established dominance matrix is A_ij, in which the value in row i (i = 1, 2, ..., N_U) and column j (j = 1, 2, ..., N_Tar) represents the comprehensive dominance of the ith AGV over target j;
the established target decision matrix is X_ij; if a matrix element value is 1, the ith AGV is allocated to target j, otherwise the matrix element value is 0;
the established target allocation optimization function is as follows:
wherein: J_dis,i is the target allocation optimization function of the ith AGV. In the target allocation process, each AGV can be allocated to only one target and each target is allocated at least one AGV, so a constraint model is established:
and carrying out constraint control on each AGV through a constraint model.
During AGV transport, the passing environments of different targets differ; the environments mainly comprise different obstacles, and the AGV can make different maneuvers to pass through them. The blocking probability of each obstacle and the reduction achieved by each maneuver are shown in the following tables:
table 1 original blocking probability table for each obstacle
TABLE 2 AGV action to obstacle block reduction ratio table
During AGV transport, different targets require different maneuvers in different environments to pass through obstacles, and different targets deploy different numbers of obstacle areas; that is, target j has N_d,j obstacle regions in total, and the AGV has N_A penetration actions from which one is selected to pass through each obstacle region. A probability matrix P_mk is established, whose elements represent the percentage by which performing action m (m = 1, 2, ..., N_A) in obstacle region k (k = 1, 2, ..., N_d,j) reduces the blocking probability; the probability matrix is optimized to obtain a penetration matrix M_km, whose element is 1 if the AGV selects penetration action m in obstacle region k, and 0 otherwise;
establishing an action decision objective function of the AGV:
wherein: j is a unit of pene,i And representing the action decision objective function of the ith station of the AGV.
According to the physical characteristics of the AGV, the number of action selections during transport is constrained: each action may be selected no more than b_1 times, and the action decision constraint is established:
Decision variables are then selected optimally in a discrete decision space subject to the action decision constraints; this solution form matches the reinforcement learning solution process, and the multi-stage decision problem is solved uniformly with a Monte Carlo tree search algorithm.
A unified objective function is established for the ith AGV based on the target allocation optimization function and the action decision objective function:
converting the multi-target optimization of the AGV cluster into single-target optimization by using a linear weighting method:
Establishing an objective function constraint:
wherein the objective function constraint is a combination of a constraint model and an action decision constraint.
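The linear weighting step above can be sketched as a simple scalarization; the exact weighted form is an assumption since the formula is an image in the filing, but the embodiment does state that all weight factors are equal (ρ_i = 1/50). The values below are hypothetical.

```python
def unified_objective(J_list, rho):
    """Assumed linear-weighted scalarization: J = sum_i rho_i * J_i, turning
    the AGV cluster's multi-objective optimization into a single objective."""
    assert len(J_list) == len(rho)
    return sum(r * j for r, j in zip(rho, J_list))

# Illustrative per-AGV objective values and equal weights:
J_vals = [0.8, 0.6, 0.9, 0.7]
rho = [1 / len(J_vals)] * len(J_vals)
J_unified = unified_objective(J_vals, rho)
```

Equal weights make the unified objective the mean of the per-AGV objectives, so no single AGV's allocation or penetration result dominates the search.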
The problem to be solved is modeled as a Markov decision process to suit the algorithm's solution procedure. The state is the information with which the agent represents its own characteristics, and the agent selects actions according to the state during iteration, so the choice of state strongly influences the quality of the training result. The problem solved by the invention consists of two stages, target allocation and action decision, where target allocation is the precondition stage of action decision: only after the AGV has selected a target can it select actions in sequence for that target's obstacle regions. The target allocation is therefore taken as a precondition state and combined with the subsequent obstacle-region states for unified state-space modeling. For each target j, the state space is established as:
wherein: s dis Assign status to the target, S r Is the area of the obstacle 1 and is,in order to pass through the area of the obstacle 2 in sequence,in order to pass through the area of the obstacle 3 in sequence,are areas of obstacles 4 that pass in sequence. By integrating the state spaces of all targets, the total state space can be expressed as:
the AGV acts according to different states of different stages, the action is taken as a target selection in the target distribution stage, the AGV can only select one target, and the result is represented as the target selection by a discretization vectorFor the presence of N Tar When there is one target, there is N target distribution stage Tar The actions can be selected;
when the obstacle passing stage action is taken as 5 types of defense actions, the AGV selects one action in a certain obstacle area state, the 5 types of defense actions are expressed by vectors, elements in the vectors sequentially represent action 1, action 2, action 3, action 4 and action 5, and a reward function is established:
the reward function is the most central part of reinforcement learning and guides the intelligent agent to learn. For the problem, the decision result can be evaluated according to the uniform objective function only when the agent reaches the final state, namely, the complete objective distribution matrix X and the action matrix M are decided.
Based on the unified objective function and the objective function constraint, the joint reward function is designed as follows:
A penalty of -1 is given when the target decision matrix or the penetration matrix does not satisfy the constraints; otherwise the reward value is the unified objective function value.
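The joint reward rule just stated can be sketched directly; the constraint checks reuse the matrix forms described earlier, and reading the b_1 bound as a per-action column limit on the penetration matrix is an assumed interpretation.

```python
import numpy as np

def allocation_ok(X):
    """Each AGV exactly one target; each target at least one AGV."""
    return bool((X.sum(axis=1) == 1).all() and (X.sum(axis=0) >= 1).all())

def penetration_ok(M, b1=2):
    """One action per obstacle region; no action chosen more than b1 times
    (assumed reading of the b_1 action-selection constraint)."""
    return bool((M.sum(axis=1) == 1).all() and (M.sum(axis=0) <= b1).all())

def joint_reward(X, M, unified_value, b1=2):
    """-1 penalty on any constraint violation, else the unified objective value."""
    return unified_value if (allocation_ok(X) and penetration_ok(M, b1)) else -1.0

# Illustrative matrices: 3 AGVs over 2 targets, 2 regions over 5 actions.
X = np.array([[1, 0], [0, 1], [1, 0]])
M = np.array([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]])
```

Because infeasible terminal states all map to the same -1 value, the tree search learns to avoid constraint violations without needing any intermediate shaping reward.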
And establishing a Monte Carlo tree for each AGV, performing independent search on each tree, forming a combined action set by search results, substituting the combined action set into a combined reward function for evaluation, finally returning the obtained reward value to each tree, and performing backtracking update on the result path node of each tree.
To avoid the influence of the average simulation result on the problem solution result, the selection decision is established as follows:
wherein: v_father is the parent node of the node v being computed, C_p is a constant used to balance exploration and exploitation, and Q(v) is the result based on all AGV allocations and penetrations;
the backtracking update formula is as follows:
wherein: n is a radical of hydrogen new (v) As a new node, N old (v) Is an old node, Q new (v) For new penetration results, Q old (v) Is oldThe defense result, Δ Q, is the poor defense result.
Taking 20 AGVs and 5 targets as an example, the effectiveness of the proposed unified objective function and improved algorithm is verified. The differences between target values and obstacle regions are considered, and a table of target value dominance and situation information is given; the initial heading angle of each AGV and the carrying time of each target differ. Each action may be selected at most b_1 = 2 times, and the weight factors ρ_1, ρ_2, ..., ρ_50 are all set to 1/50. The algorithm runs on an i5-9400F processor at 2.90 GHz, the simulation environment is Python 3.6, and the total number of iteration steps is set to 20000.
After 20000 steps of learning and training, the algorithm converges; the total training time is 162.1854 seconds. The target allocation matrix obtained by training and the action matrix of each AGV are shown in the following table. For ease of presentation, actions 1 to 5 are denoted by the numbers 1 to 5, and the action sequence an AGV performs while passing through its obstacle regions in order is represented as a one-dimensional vector.
Results of the algorithm presented in Table 2
Each AGV searches independently according to its own heuristic factors during training. As space is limited, the search depth during training and the single-AGV objective function value J_i are given taking AGV U1, AGV U5, AGV U10, AGV U15 and AGV U20 as examples.
FIGS. 1-5 show the tree-search depth data of some AGVs during training. It can be seen that after about 10000 training steps the tree search reaches its maximum depth, which shows that, guided by the reward function and rollout-based random simulation, the AGV strategy finds the range of the optimal solution in the earlier training, and the search strategy mainly searches within that range in the later training.
FIGS. 6-10 show the J_i data of some AGVs during training. Early in training the search depth is shallow and the search tree obtains return values mainly from random Monte Carlo simulation; as the search depth increases, the tree stores the better solutions it finds in node information and continuously searches for the optimal solution by means of heuristic factors. The oscillation observed late in training arises because the exploration term of the selection strategy steers away from the optimal solution recorded at the current node and selects other, less-visited nodes, thereby avoiding local optima.
The working principle and the effect are as follows:
the method comprises the steps of respectively establishing a target distribution optimization function and an action decision objective function by considering the advantages, the target value and the action probability of the AGVs, integrating the two functions to form a unified objective function for collaborative task planning of the AGVs, constructing a state space and an action space in stages in a reinforcement learning frame, designing a reward function according to the unified objective function, and finally providing an improved Monte Carlo tree search reinforcement learning algorithm.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiment; all technical solutions within the idea of the present invention belong to its scope. It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the present invention should also be considered within the scope of the present invention.
Claims (10)
1. An AGV cooperative transport target distribution and decision algorithm is characterized by comprising the following steps of:
s1: respectively establishing a dominance matrix and a target decision matrix based on the number of the AGVs and the number of the targets to be carried, and establishing a target distribution optimization function based on the dominance matrix and the target decision matrix;
s2: analyzing an AGV carrying environment, establishing a probability matrix, optimizing according to the probability matrix to obtain a penetration action matrix, and establishing an action decision objective function based on the probability matrix and the penetration action matrix;
s3: establishing action decision constraint according to the action decision objective function;
s4: performing linear weighted integration on the target distribution optimization function and the action decision objective function to establish a uniform objective function;
s5: performing reinforcement learning, performing Markov decision process modeling on target distribution and decision in stages to construct a state space and an action space, and designing a reward function based on a uniform target function to form a multi-stage decision problem model;
s6, designing a combined reward function based on the unified objective function;
s7: establishing a corresponding Monte Carlo tree for each AGV, searching each Monte Carlo tree, forming a combined action set for the search results, substituting the combined action set into a combined reward function for evaluation, and obtaining a reward value;
s8: setting a selection strategy and a backtracking updating formula, returning the obtained reward value to each Monte Carlo tree, improving the selection strategy and the backtracking updating formula of the algorithm, and solving a decision problem model.
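Steps S7-S8 can be sketched as a minimal loop in which each AGV owns its own search tree, the trees' proposals form a joint action set, and the shared reward is backed up into every tree. The `Node` class, the random proposal policy, and the reward callback below are illustrative assumptions, not the patent's actual implementation:

```python
import random

class Node:
    """Minimal search-tree node: an action plus visit/value statistics."""
    def __init__(self, action=None):
        self.action = action
        self.visits = 0
        self.value = 0.0
        self.children = []

def joint_step(trees, joint_reward):
    # Step S7: each AGV's tree proposes one action; together they form
    # the joint action set, which is scored by the shared reward function.
    chosen = [random.choice(root.children) for root in trees]
    joint_action = [c.action for c in chosen]
    r = joint_reward(joint_action)
    # Step S8 (simplified): the same reward value is returned to every tree.
    for node in chosen:
        node.visits += 1
        node.value += r
    return joint_action, r
```

In the full algorithm the proposal step would be an in-tree selection followed by expansion and simulation; the skeleton only shows how one reward evaluation feeds all per-AGV trees.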
2. The AGV cooperative transport target allocation and decision algorithm of claim 1, wherein: the step S1 specifically includes:
in the AGV cooperation process, there are N_U AGVs and N_Tar targets to be carried;
the established dominance matrix is A_ij, where the value in row i (i = 1, 2, ..., N_U) and column j (j = 1, 2, ..., N_Tar) represents the comprehensive dominance of the ith AGV toward target j;
the established target decision matrix is X_ij; a matrix element value of 1 means the ith AGV is allocated to target j, otherwise the element value is 0;
the established target allocation optimization function is as follows:
wherein: j is a unit of dis,i And allocating an optimization function for the target of the ith AGV.
3. The AGV cooperative transport target allocation and decision algorithm of claim 2, wherein: the step S1 further comprises the following steps:
in the process of allocating targets, each AGV can only be allocated to one target, and each target is allocated with at least one AGV, and a constraint model is established for the target allocation process:
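The two constraints stated above translate directly into row and column sums of the decision matrix; a minimal feasibility check (names are illustrative) could look like:

```python
import numpy as np

def allocation_feasible(X):
    # Row sums: each AGV is allocated to exactly one target.
    # Column sums: each target is allocated at least one AGV.
    X = np.asarray(X)
    return bool(np.all(X.sum(axis=1) == 1) and np.all(X.sum(axis=0) >= 1))
```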
4. the AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: in the step S2, different targets make different maneuvers based on different environments to pass through the barrier in the AGV transporting process, and different targets are set to deploy different air defense areas with different numbers, namely, the target j has N in total d,j Individual obstacle region, AGV has N A Selecting one action for surreptitious defense through each barrier area, and establishing probability matrix P mk The matrix represents the execution of action m (m =1, 2...., N) A ) In the barrier region k (k =1,2.. N.) d,j ) The probability percentage can be reduced, and the probability matrix is optimized to obtain a penetration matrix M km The matrix indicates if the AGV selects the penetration action m in the obstacle area kThe matrix element is 1, otherwise 0;
establishing an action decision objective function of the AGV:
wherein: j is a unit of pene,i And representing the action decision objective function of the ith station of the AGV.
5. The AGV cooperative transport target allocation and decision algorithm according to claim 3, wherein: step S2, the times of action selection in the AGV transporting process are restrained according to the physical characteristics of the AGV, and each action selection is defined not to exceed b 1 Secondly, establishing action decision constraint:
and optimally selecting decision variables in a discrete decision space based on action decision constraints.
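The per-action usage limit b_1 in claim 5 is again a column-sum check on the penetration matrix; a minimal predicate (illustrative naming) could be:

```python
import numpy as np

def action_count_ok(M, b1):
    # Claim 5: each penetration action may be selected at most b1 times
    # over all obstacle areas (M[k][m] == 1 means action m in area k).
    return bool(np.all(np.asarray(M).sum(axis=0) <= b1))
```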
6. The AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: the step S4 specifically comprises the following steps:
establishing a unified objective function for the ith AGV based on the target distribution function and the action decision objective function:
converting the multi-target optimization of the AGV cluster into single-target optimization by using a linear weighting method:
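Step S4's linear weighting can be sketched as follows; the weights w1 and w2 are illustrative placeholders, since their values are not given in the text:

```python
def unified_objective(J_dis, J_pene, w1=0.5, w2=0.5):
    """Linear weighting: each AGV's unified objective is a weighted sum of
    its allocation and penetration objectives; summing over the cluster
    turns the multi-objective problem into a single-objective one."""
    per_agv = [w1 * d + w2 * p for d, p in zip(J_dis, J_pene)]
    return sum(per_agv), per_agv
```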
8. The AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: the step S5 specifically comprises the following steps:
the state space is established as:
wherein: s dis Assign status to the target, S r Is the area of the obstacle 1 and is,in order to pass through the area of the obstacle 2 in sequence,in order to pass through the area of the obstacle 3 in sequence,are areas of obstacles 4 that pass in sequence. By integrating the state spaces of all targets, the total state space can be expressed as:
the AGV acts differently according to the state of each stage. In the target allocation stage the action is the choice of a target, the AGV may select only one target, and the result is represented as a discrete vector; when there are N_Tar targets, N_Tar actions are selectable in the target allocation stage;
in the obstacle-passing stage the actions are 5 types of defense actions, and the AGV selects one of them in a given obstacle area state. The 5 defense actions are expressed as a vector whose elements represent action 1 through action 5 in turn, and the reward function is established:
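The discrete action vectors described above (one of N_Tar targets in the allocation stage, one of 5 defense actions in the obstacle stage) amount to one-hot encodings, for example:

```python
def one_hot(index, size):
    # Discrete action vector: a single 1 marks the chosen target or
    # defense action; all other elements are 0.
    v = [0] * size
    v[index] = 1
    return v
```

For instance, choosing target 3 among N_Tar = 4 targets gives `one_hot(2, 4)`, and choosing defense action 5 gives `one_hot(4, 5)`.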
9. the AGV cooperative transport target allocation and decision algorithm of claim 8, wherein: the joint reward function is:
a penalty of -1 is given when the target decision matrix or the penetration matrix does not satisfy its constraints; otherwise the reward value is the unified objective function value.
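Claim 9's rule can be written as a single function; the constraint predicate is left abstract since the constraints are defined in claims 3 and 5:

```python
def joint_reward(X, M, unified_value, feasible):
    """-1 when the target decision matrix X or the penetration matrix M
    violates its constraints, otherwise the unified objective value.
    `feasible` is any constraint-checking predicate over (X, M)."""
    return unified_value if feasible(X, M) else -1.0
```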
10. The AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: the selection strategy is:
wherein: v_father is the parent node of the node v being evaluated; C_p is a constant used to balance exploration and exploitation; Q(v) is the result based on the allocation and penetration of all AGVs;
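The selection strategy's formula image is not reproduced, but the description of v_father, C_p and Q(v) matches the standard UCT rule, which can be sketched as follows (an assumed reconstruction, not the patent's exact formula):

```python
import math
from types import SimpleNamespace

def uct_score(q, n, n_parent, c_p=1 / math.sqrt(2)):
    # Standard UCT: exploitation term q/n plus a C_p-weighted exploration
    # term that grows for rarely visited children of a well-visited parent.
    if n == 0:
        return float('inf')  # unvisited children are explored first
    return q / n + c_p * math.sqrt(2 * math.log(n_parent) / n)

def select_child(children, n_parent):
    return max(children, key=lambda c: uct_score(c.q, c.n, n_parent))
```

The exploration term is what produces the late-training oscillation described for FIGS. 6-10: a less-visited sibling can temporarily outscore the recorded best child.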
the backtracking update formula is as follows:
wherein: n is a radical of hydrogen new (v) As a new node, N old (v) Is an old node, Q new (v) For new penetration results, Q old (v) For old blast results, Δ Q is poor blast results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210627881.4A CN115237119A (en) | 2022-06-06 | 2022-06-06 | AGV collaborative transfer target distribution and decision algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115237119A true CN115237119A (en) | 2022-10-25 |
Family
ID=83670192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210627881.4A Pending CN115237119A (en) | 2022-06-06 | 2022-06-06 | AGV collaborative transfer target distribution and decision algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115237119A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116149375A (en) * | 2023-04-21 | 2023-05-23 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle search planning method and device for online decision, electronic equipment and medium |
CN116149375B (en) * | 2023-04-21 | 2023-07-07 | 中国人民解放军国防科技大学 | Unmanned aerial vehicle search planning method and device for online decision, electronic equipment and medium |
CN116673968A (en) * | 2023-08-03 | 2023-09-01 | 南京云创大数据科技股份有限公司 | Mechanical arm track planning element selection method and system based on reinforcement learning |
CN116673968B (en) * | 2023-08-03 | 2023-10-10 | 南京云创大数据科技股份有限公司 | Mechanical arm track planning element selection method and system based on reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115237119A (en) | AGV collaborative transfer target distribution and decision algorithm | |
CN113159432B (en) | Multi-agent path planning method based on deep reinforcement learning | |
CN112733421B (en) | Task planning method for cooperation of unmanned aerial vehicle with ground fight | |
CN107886201B (en) | Multi-objective optimization method and device for multi-unmanned aerial vehicle task allocation | |
CN101136081B (en) | Unmanned aircraft multiple planes synergic tasks distributing method based on ant colony intelligence | |
CN107807665B (en) | Unmanned aerial vehicle formation detection task cooperative allocation method and device | |
CN107677273A (en) | A kind of cluster unmanned plane Multiple routes planning method based on two-dimensional grid division | |
CN114422056A (en) | Air-ground non-orthogonal multiple access uplink transmission method based on intelligent reflecting surface | |
CN102778229A (en) | Mobile Agent path planning method based on improved ant colony algorithm under unknown environment | |
CN114020031B (en) | Unmanned aerial vehicle cluster collaborative dynamic target searching method based on improved pigeon colony optimization | |
CN114460959A (en) | Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game | |
CN110109653B (en) | Intelligent engine for land fighter chess and operation method thereof | |
CN116736883B (en) | Unmanned aerial vehicle cluster intelligent cooperative motion planning method | |
CN114167898B (en) | Global path planning method and system for collecting data of unmanned aerial vehicle | |
Park et al. | APE: A data-driven, behavioral model-based anti-poaching engine | |
CN116702633B (en) | Heterogeneous warhead task reliability planning method based on multi-objective dynamic optimization | |
CN115016537B (en) | Heterogeneous unmanned aerial vehicle configuration and task planning combined optimization method in SEAD scene | |
Yasear et al. | Fine-Tuning the Ant Colony System Algorithm Through Harris’s Hawk Optimizer for Travelling Salesman Problem. | |
CN114326822B (en) | Unmanned aerial vehicle cluster information sharing method based on evolutionary game | |
CN115454067A (en) | Path planning method based on fusion algorithm | |
CN113283827B (en) | Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning | |
CN112486185A (en) | Path planning method based on ant colony and VO algorithm in unknown environment | |
Gaowei et al. | Using multi-layer coding genetic algorithm to solve time-critical task assignment of heterogeneous UAV teaming | |
CN114578845B (en) | Unmanned aerial vehicle track planning method based on improved ant colony algorithm | |
Li et al. | Improved genetic algorithm for multi-agent task allocation with time windows |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||