CN115237119A - AGV collaborative transfer target distribution and decision algorithm - Google Patents

AGV collaborative transfer target distribution and decision algorithm

Info

Publication number
CN115237119A
CN115237119A CN202210627881.4A CN202210627881A CN115237119A CN 115237119 A CN115237119 A CN 115237119A CN 202210627881 A CN202210627881 A CN 202210627881A CN 115237119 A CN115237119 A CN 115237119A
Authority
CN
China
Prior art keywords
agv
action
decision
target
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210627881.4A
Other languages
Chinese (zh)
Inventor
魏才盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Ruizhixingyuan Intelligent Technology Co ltd
Original Assignee
Suzhou Ruizhixingyuan Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Ruizhixingyuan Intelligent Technology Co ltd filed Critical Suzhou Ruizhixingyuan Intelligent Technology Co ltd
Priority to CN202210627881.4A priority Critical patent/CN115237119A/en
Publication of CN115237119A publication Critical patent/CN115237119A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an AGV (automated guided vehicle) cooperative transport target allocation and decision algorithm. The method establishes a dominance matrix and a target decision matrix based on the number of AGVs and the number of targets to be carried, and builds a target allocation optimization function from them; it analyses the AGV transport environment, establishes a probability matrix and a penetration action matrix, and builds an action decision objective function and an action decision constraint; the two objective functions are integrated by linear weighting into a unified objective function; reinforcement learning is applied with a reward function designed from the unified objective function, forming a multi-stage decision problem model; a joint reward function is designed to reward actions; and a selection strategy and a backtracking update formula are set and improved to solve the decision problem model. The invention has the advantages of high calculation speed, short training time and good convergence, thereby achieving intelligent calculation, accurate allocation and sound action decisions.

Description

AGV collaborative transfer target distribution and decision algorithm
Technical Field
The invention relates to the technical field of intelligent handling equipment, and in particular to an AGV (automated guided vehicle) cooperative transport target allocation and decision algorithm.
Background
An AGV is an unmanned transport vehicle that achieves positioning, orientation, obstacle avoidance and path planning by means of a rotatable laser scanner that senses the surrounding environment. As key equipment in goods transportation, the AGVs together form a transportation network and complete transport activities between a starting point and target units under the instruction of a control system. How to perform target allocation and action decision for the AGVs through the control system is therefore an important issue to be solved urgently.
The AGV target allocation and action decision problem is an extension of the vehicle routing problem (VRP) and is NP-hard, so large-scale instances are difficult to solve with conventional exact algorithms. Because the scenario is complex and involves multiple constraints, multiple objectives and uncertainty, the problem is far more complex than traditional VRP and scheduling problems. At present, scheduling methods based on experience and rules are mainly used, which leads to a low on-time delivery rate, low AGV utilization and long delivery times. It is therefore urgent to combine optimal scheduling theory with the characteristics of AGV delivery and to design an effective optimization theory and method for solving the problem.
In conclusion, studying the AGV scheduling problem has very important practical significance; the invention integrates target allocation and action decision under multiple constraints and maneuvers and designs an effective intelligent algorithm to solve the problem.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an AGV cooperative transport target allocation and decision algorithm which has the advantages of high calculation speed, short training time and good convergence effect, thereby realizing the effects of intelligent calculation, accurate allocation and action decision.
In order to achieve the purpose, the invention provides the following technical scheme:
an AGV cooperative transport target distribution and decision algorithm comprises the following steps:
s1: respectively establishing a dominance matrix and a target decision matrix based on the number of AGVs and the number of targets to be carried, and establishing a target allocation optimization function based on the dominance matrix and the target decision matrix;
s2: analyzing an AGV carrying environment, establishing a probability matrix, optimizing according to the probability matrix to obtain a penetration action matrix, and establishing an action decision objective function based on the probability matrix and the penetration action matrix;
s3: establishing action decision constraint according to the action decision objective function;
s4: performing linear weighted integration on the target allocation optimization function and the action decision objective function to establish a unified objective function;
s5: performing reinforcement learning, performing Markov decision process modeling on target allocation and decision in stages to construct a state space and an action space, and designing a reward function based on the unified objective function to form a multi-stage decision problem model;
s6, designing a joint reward function based on the unified objective function;
s7: establishing a corresponding Monte Carlo tree for each AGV, searching each Monte Carlo tree, forming a combined action set from the search results, substituting it into the joint reward function for evaluation and obtaining a reward value;
s8: setting a selection strategy and a backtracking updating formula, returning the obtained reward value to each Monte Carlo tree, improving the selection strategy and the backtracking updating formula of the algorithm, and solving a decision problem model.
As a further improvement of the present invention, the step S1 specifically includes:
in the AGV cooperative transport process, there are N_U AGVs and N_Tar targets to be carried;
the established dominance matrix is A_ij, where the value in row i (i = 1, 2, ..., N_U) and column j (j = 1, 2, ..., N_Tar) represents the comprehensive dominance of the ith AGV with respect to target j;
the established target decision matrix is X_ij, where an element value of 1 means that the ith AGV is allocated to target j, and the element value is 0 otherwise;
the established target allocation optimization function is:
[formula image]
wherein: J_dis,i is the target allocation optimization function of the ith AGV.
As a further improvement of the present invention, step S1 further includes:
in the process of allocating targets, each AGV can only be allocated to one target and each target is allocated at least one AGV, so the following constraint model is established:
[formula image]
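As an illustration only (the formulas above are given as images in the original), the following Python sketch shows one plausible reading of step S1, assuming the common assignment-style form J_dis,i = sum_j A_ij * X_ij together with the stated allocation constraints; the matrix sizes and values are hypothetical.

    import numpy as np

    N_U, N_TAR = 4, 2                      # hypothetical numbers of AGVs and targets

    # Dominance matrix A: A[i, j] is the comprehensive dominance of AGV i for target j.
    A = np.random.rand(N_U, N_TAR)

    # Target decision matrix X: X[i, j] = 1 if AGV i is allocated to target j, else 0.
    X = np.zeros((N_U, N_TAR), dtype=int)

    def allocation_feasible(X):
        """Constraint model: each AGV gets exactly one target, each target at least one AGV."""
        return bool((X.sum(axis=1) == 1).all() and (X.sum(axis=0) >= 1).all())

    def J_dis(A, X, i):
        """Assumed form of the target allocation optimization function for AGV i."""
        return float(A[i] @ X[i])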
as a further improvement of the present invention, in the step S2, due to the AGV transport process, different targets make different maneuvers based on different environments to pass through the obstacle, and different targets are set to deploy different air defense areas in different numbers, that is, the target j has N in total d,j Individual obstacle region, AGV has N A Selecting one action for defense through each barrier area, and establishing probability matrix P mk Matrix represents performing action m (m =1, 2..., N) A ) N in the barrier region k (k =1,2.. Times.n) d,j ) The probability percentage can be reduced, and the probability matrix is optimized to obtain a penetration matrix M km The matrix represents that if the AGV selects the penetration action m in the obstacle area k, the matrix element is 1, otherwise, the matrix element is 0;
establishing an action decision objective function of the AGV:
Figure BDA0003678551000000032
wherein: j is a unit of pene,i Representing the action decision objective function of the ith AGV.
As a further improvement of the invention, the step S2 further comprises constraining the number of times each action may be selected during AGV transport according to the physical characteristics of the AGV, specifying that each action is selected no more than b_1 times, and establishing the action decision constraint:
[formula image]
The decision variables are then selected optimally in a discrete decision space based on the action decision constraint.
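A companion sketch for steps S2-S3 follows, under the assumption that the action decision objective accumulates the blocking-probability reduction P[m, k] over the obstacle areas in which the penetration matrix selects action m; the exact objective is shown only as an image, so this form, the sizes and the values are assumptions.

    import numpy as np

    N_A, N_D = 5, 4          # hypothetical: 5 candidate actions, 4 obstacle areas of target j
    b1 = 2                   # each action may be selected at most b1 times

    # P[m, k]: percentage by which action m reduces the blocking probability in obstacle area k.
    P = np.random.rand(N_A, N_D)

    # M[k, m] = 1 if the AGV selects penetration action m in obstacle area k, else 0.
    M = np.zeros((N_D, N_A), dtype=int)
    for k in range(N_D):
        M[k, np.random.randint(N_A)] = 1   # one penetration action per obstacle area

    def action_constraint_ok(M, b1=b1):
        """Action decision constraint: no action is selected more than b1 times."""
        return bool((M.sum(axis=0) <= b1).all())

    def J_pene(P, M):
        """Assumed form: total blocking-probability reduction achieved by the chosen actions."""
        return float(sum(P[m, k] for k in range(M.shape[0])
                         for m in range(M.shape[1]) if M[k, m]))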
As a further improvement of the present invention, the step S4 specifically includes:
establishing a unified objective function for the ith AGV based on the target allocation optimization function and the action decision objective function:
[formula image]
converting the multi-objective optimization of the AGV cluster into a single-objective optimization by the linear weighting method:
[formula image]
wherein: [formula image] is a weight factor, and [formula image]
as a further improvement of the present invention, said step S4 further includes establishing an objective function constraint:
[formula image]
wherein the objective function constraint is a combination of a constraint model and an action decision constraint.
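The linear weighting of step S4 can be sketched as below; the grouping of the weighted terms and the assumption that each per-AGV unified objective is J_i = J_dis,i + J_pene,i are not spelled out in the text (the formulas are images), so both are assumptions, kept consistent with the later example where all weight factors equal 1/50.

    def unified_objective(J_dis_list, J_pene_list, rho=None):
        """Assumed linear-weighting integration of the per-AGV objectives.

        J_dis_list, J_pene_list: per-AGV target allocation and action decision objective values.
        rho: weight factors; equal weights are used when none are given.
        """
        J_i = [jd + jp for jd, jp in zip(J_dis_list, J_pene_list)]
        if rho is None:
            rho = [1.0 / len(J_i)] * len(J_i)
        return sum(r * j for r, j in zip(rho, J_i))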
As a further improvement of the present invention, the step S5 specifically includes:
the state space is established as:
[formula image]
wherein: S_dis is the target allocation state, S_r is the state of obstacle area 1, and the symbols given by [formula image], [formula image] and [formula image] are the states of obstacle areas 2, 3 and 4 passed through in sequence. By integrating the state spaces of all targets, the total state space can be expressed as:
[formula image]
The AGV takes different actions according to the state of each stage. In the target allocation stage the action is the choice of a target; the AGV can select only one target, and the result is represented as a discrete vector [formula image]. When there are N_Tar targets, N_Tar actions are selectable in the target allocation stage;
in the obstacle passing stage the actions are 5 types of defense actions, and the AGV selects one of them in a given obstacle area state. The 5 defense actions are expressed as a vector whose elements represent action 1, action 2, action 3, action 4 and action 5 in sequence, and the reward function is established:
[formula image]
as a further improvement of the present invention, the joint reward function is:
[formula image]
A penalty of -1 is given when the target decision matrix and the penetration matrix do not satisfy the constraints; otherwise the reward value is the unified objective function value.
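Because the joint reward rule is stated explicitly (a penalty of -1 on constraint violation, otherwise the unified objective value), it can be sketched directly; the helpers used here are the hypothetical ones from the earlier sketches, and the use of a single P matrix for all AGVs is a simplification.

    def joint_reward(X, M_list, A, P, rho=None):
        """Joint reward of steps S5-S6: -1 if the target decision matrix X or any
        penetration matrix in M_list violates its constraint, otherwise the
        unified objective value (built from the earlier sketch functions)."""
        if not allocation_feasible(X) or not all(action_constraint_ok(Mi) for Mi in M_list):
            return -1.0
        J_dis_list = [J_dis(A, X, i) for i in range(X.shape[0])]
        J_pene_list = [J_pene(P, Mi) for Mi in M_list]
        return unified_objective(J_dis_list, J_pene_list, rho)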
As a further improvement of the present invention, the selection decision is:
[formula image]
wherein: v_father is the parent node of the node v being evaluated, C_p is a constant used to balance exploration and exploitation, and Q(v) is the result based on the allocations and penetration actions of all AGVs;
the backtracking update formula is as follows:
[formula image]
wherein: N_new(v) is the new node count, N_old(v) the old node count, Q_new(v) the new penetration result, Q_old(v) the old penetration result, and ΔQ the penetration result difference.
The invention has the beneficial effects that: a target allocation optimization function and an action decision objective function are respectively established by considering the dominance, the target values and the action probabilities of the AGVs; the two functions are integrated into a unified objective function for collaborative task planning of the AGVs; a state space and an action space are constructed in stages within a reinforcement learning framework and a reward function is designed according to the unified objective function; finally, an improved Monte Carlo tree search reinforcement learning algorithm is provided.
Drawings
FIGS. 1-5 are search depth curves of the search trees of some of the AGVs;
FIGS. 6-10 show the J_i values of some of the AGVs.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. In which like parts are designated by like reference numerals. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "bottom" and "top," "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
Referring to fig. 1, a specific embodiment of an AGV cooperative transport target allocation and decision algorithm according to the present invention includes the following steps:
s1: respectively establishing a dominance matrix and a target decision matrix based on the number of AGVs and the number of targets to be carried, and establishing a target allocation optimization function based on the dominance matrix and the target decision matrix;
s2: analyzing an AGV carrying environment, establishing a probability matrix, optimizing according to the probability matrix to obtain a penetration action matrix, and establishing an action decision objective function based on the probability matrix and the penetration action matrix;
s3: establishing action decision constraint according to the action decision objective function;
s4: performing linear weighted integration on the target allocation optimization function and the action decision objective function to establish a unified objective function;
s5: performing reinforcement learning, performing Markov decision process modeling on target allocation and decision in stages to construct a state space and an action space, and designing a reward function based on the unified objective function to form a multi-stage decision problem model;
s6, designing a joint reward function based on the unified objective function;
s7: establishing a corresponding Monte Carlo tree for each AGV, searching each Monte Carlo tree, forming a combined action set from the search results, substituting it into the joint reward function for evaluation and obtaining a reward value;
s8: setting a selection strategy and a backtracking updating formula, returning the obtained reward value to each Monte Carlo tree, improving the selection strategy and the backtracking updating formula of the algorithm, and solving the decision problem model.
The step S1 specifically comprises the following steps:
in the AGV cooperative transport process, there are N_U AGVs and N_Tar targets to be carried;
the established dominance matrix is A_ij, where the value in row i (i = 1, 2, ..., N_U) and column j (j = 1, 2, ..., N_Tar) represents the comprehensive dominance of the ith AGV with respect to target j;
the established target decision matrix is X_ij, where an element value of 1 means that the ith AGV is allocated to target j, and the element value is 0 otherwise;
the established target allocation optimization function is:
[formula image]
wherein: J_dis,i is the target allocation optimization function of the ith AGV. In the process of allocating targets, each AGV can only be allocated to one target and each target is allocated at least one AGV, so the following constraint model is established:
[formula image]
Each AGV is then subject to constraint control through this constraint model.
During AGV transport the passing environments of different targets differ; the environments mainly consist of different obstacles, and the AGV can make different maneuvers to pass through them. The original blocking probability of each obstacle and the blocking reduction achieved by each maneuver are shown in the following tables:
Table 1: original blocking probability of each obstacle
[table image]
Table 2: blocking-reduction ratio of each AGV action against each obstacle
[table image]
During AGV transport, different targets require different maneuvers in different environments in order to pass through the obstacles, and different numbers of defense (obstacle) areas are deployed for different targets; that is, target j has N_d,j obstacle areas in total and the AGV has N_A candidate actions, one of which is selected for penetration through each obstacle area. A probability matrix P_mk is established, in which the entry for action m (m = 1, 2, ..., N_A) and obstacle area k (k = 1, 2, ..., N_d,j) represents the percentage by which performing that action in that area reduces the blocking probability. The probability matrix is optimized to obtain a penetration matrix M_km, whose element is 1 if the AGV selects penetration action m in obstacle area k, and 0 otherwise;
the action decision objective function of the AGV is established:
[formula image]
wherein: J_pene,i denotes the action decision objective function of the ith AGV.
According to the physical characteristics of the AGV, the number of times each action may be selected during transport is constrained: each action is selected no more than b_1 times, and the action decision constraint is established:
[formula image]
The decision variables are selected optimally in a discrete decision space based on the action decision constraint; this solution form matches the reinforcement learning solution process, and the multi-stage decision problem is solved in a unified way with a Monte Carlo tree search algorithm.
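For orientation, one iteration of a plain Monte Carlo tree search is sketched below using the Node, uct_select and backup helpers from the earlier sketch; the expansion and rollout details are generic assumptions, since the patent's contribution lies in the modified selection and backtracking steps described later.

    import random

    def mcts_iteration(root, legal_actions, rollout_reward):
        """One illustrative MCTS iteration: selection, expansion, random rollout, backup.

        legal_actions(node) -> list of actions available in the node's state (assumed helper).
        rollout_reward(node) -> reward of a random simulation from the node, e.g. the joint
        reward once a complete allocation and action sequence has been sampled.
        """
        node = root
        # 1. Selection: descend with the selection strategy while the node is fully expanded.
        while node.children and len(node.children) == len(legal_actions(node)):
            node = uct_select(node)
        # 2. Expansion: add one untried action as a child.
        untried = [a for a in legal_actions(node) if a not in node.children]
        if untried:
            action = random.choice(untried)
            child = Node(parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation (rollout): estimate the value of the new node.
        delta_q = rollout_reward(node)
        # 4. Backtracking update along the path to the root.
        backup(node, delta_q)
        return delta_q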
A unified objective function is established for the ith AGV based on the target allocation optimization function and the action decision objective function:
[formula image]
The multi-objective optimization of the AGV cluster is converted into a single-objective optimization by the linear weighting method:
[formula image]
wherein: [formula image] is a weight factor, and [formula image]
Establishing an objective function constraint:
[formula image]
wherein the objective function constraint is a combination of a constraint model and an action decision constraint.
The problem to be solved is modeled as a Markov decision process to facilitate the algorithm's solution process. The state is the information by which the agent represents its own characteristics, and during iteration the agent selects actions according to the state, so the choice of state has a very important influence on the quality of the training result. The problem solved by the invention consists of two stages, target allocation and action decision, where target allocation is the precondition stage of action decision: only after the AGV has selected a target can actions be selected in sequence according to the obstacle areas of that target. Therefore the target allocation is taken as a precondition state and combined with the subsequent obstacle-area states for unified state space modeling, and for each target j the state space is established as follows:
[formula image]
wherein: S_dis is the target allocation state, S_r is the state of obstacle area 1, and the symbols given by [formula image], [formula image] and [formula image] are the states of obstacle areas 2, 3 and 4 passed through in sequence. By integrating the state spaces of all targets, the total state space can be expressed as:
[formula image]
The AGV acts according to the state of each stage. In the target allocation stage the action is the choice of a target; the AGV can select only one target, and the result is represented as a discrete vector [formula image]. When there are N_Tar targets, N_Tar actions are selectable in the target allocation stage;
in the obstacle passing stage the actions are 5 types of defense actions, and the AGV selects one of them in a given obstacle area state. The 5 defense actions are expressed as a vector whose elements represent action 1, action 2, action 3, action 4 and action 5 in sequence, and the reward function is established:
[formula image]
the reward function is the most central part of reinforcement learning and guides the intelligent agent to learn. For the problem, the decision result can be evaluated according to the uniform objective function only when the agent reaches the final state, namely, the complete objective distribution matrix X and the action matrix M are decided.
Based on the unified objective function and the objective function constraint, the joint reward function is designed as follows:
[formula image]
A penalty of -1 is given when the target decision matrix and the penetration matrix do not satisfy the constraints; otherwise the reward value is the unified objective function value.
A Monte Carlo tree is established for each AGV and each tree is searched independently; the search results form a joint action set, which is substituted into the joint reward function for evaluation. Finally, the obtained reward value is returned to each tree and the nodes along each tree's result path are updated by backtracking.
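The coordination loop described in this paragraph can be sketched as follows; how each tree proposes its action (propose_action) and how the joint action set is evaluated (evaluate_joint, e.g. the joint reward function) are left as assumed callables.

    def joint_step(trees, propose_action, evaluate_joint):
        """One coordination step over the per-AGV Monte Carlo trees (illustrative).

        trees: list of root Nodes, one per AGV.
        propose_action(tree) -> (leaf_node, action) chosen by that tree's independent search.
        evaluate_joint(actions) -> reward of the combined action set.
        """
        proposals = [propose_action(tree) for tree in trees]       # independent searches
        joint_actions = [action for _, action in proposals]        # combined action set
        reward = evaluate_joint(joint_actions)                     # joint reward function
        for leaf, _ in proposals:                                  # return the reward value to
            backup(leaf, reward)                                   # every tree's result path
        return reward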
To avoid the influence of the average simulation result on the problem solution result, the selection decision is established as follows:
[formula image]
wherein: v_father is the parent node of the node v being evaluated, C_p is a constant used to balance exploration and exploitation, and Q(v) is the result based on the allocations and penetration actions of all AGVs;
the backtracking update formula is as follows:
[formula image]
wherein: N_new(v) is the new node count, N_old(v) the old node count, Q_new(v) the new penetration result, Q_old(v) the old penetration result, and ΔQ the penetration result difference.
Taking 20 AGVs and 5 targets as an example, the effectiveness of the proposed unified objective function and improved algorithm is verified. Differences in target value and obstacle areas are taken into account, and a table of target value dominance and situation information is given; the initial heading angle of each AGV and the carrying time of each target also differ. Each action may be selected no more than b_1 = 2 times, the weight factors ρ_1, ρ_2, ..., ρ_50 are all taken as 1/50, and the algorithm parameter is set as
[formula image]
The algorithm runs on an i5-9400F processor at 2.90 GHz, the simulation environment is Python 3.6, and the total number of iteration steps is set to 20000.
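The reported settings (20 AGVs, 5 targets, b_1 = 2, weight factors all 1/50, 20000 iteration steps) can be wired into a minimal training loop as below, reusing the Node, joint_step and related helpers sketched earlier; the propose/evaluate callables are assumed, and the algorithm parameter shown only as an image above is not reproduced.

    # Hypothetical training loop using the embodiment's reported settings.
    N_AGV, N_TARGET = 20, 5
    B1 = 2                        # each action selected at most twice
    RHO = [1.0 / 50] * 50         # equal weight factors from the example
    TOTAL_STEPS = 20000           # total number of iteration steps

    def train(propose_action, evaluate_joint):
        trees = [Node() for _ in range(N_AGV)]      # one Monte Carlo tree per AGV
        for _ in range(TOTAL_STEPS):
            joint_step(trees, propose_action, evaluate_joint)
        return trees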
After 20000 steps of learning and training the algorithm converges, with a total training time of 162.1854 seconds. The target allocation matrix obtained by training and the action matrix of each AGV are shown in the following table; to simplify presentation, actions 1 to 5 are denoted by the numbers 1 to 5, and the action sequence an AGV performs while passing through the obstacle areas in order is written as a one-dimensional vector.
Results of the algorithm:
[table image]
Each AGV searches independently according to its own heuristic factors during training. Owing to space limitations, AGV U1, AGV U5, AGV U10, AGV U15 and AGV U20 are taken as examples to show the search depth and the single-AGV objective function value J_i during training.
Figs. 1 to 5 show the tree search depth data of some of the AGVs during training. It can be seen that after 10000 training steps the tree search reaches its maximum depth, which shows that, guided by the reward function and rollout random simulation, the AGV strategy locates the range of the optimal solution in the earlier training, and the search strategy mainly searches within that range in the later training.
FIGS. 6-10 show the J_i data of some of the AGVs during training. In the early stage of training the search depth is shallow and Monte Carlo sampling dominates: the search tree obtains return values by random simulation. As the search depth increases, the tree stores the better solutions it has found in the node information and keeps searching for the optimal solution with the help of the heuristic factors. Oscillation appears in the search process in the later stage of training because the exploration term of the selection strategy takes effect: the optimal solution recorded at the current node is set aside and other nodes with fewer visits are selected, which avoids local optima.
The working principle and the effect are as follows:
the method comprises the steps of respectively establishing a target distribution optimization function and an action decision objective function by considering the advantages, the target value and the action probability of the AGVs, integrating the two functions to form a unified objective function for collaborative task planning of the AGVs, constructing a state space and an action space in stages in a reinforcement learning frame, designing a reward function according to the unified objective function, and finally providing an improved Monte Carlo tree search reinforcement learning algorithm.
The above description is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to the above embodiments, and all technical solutions that belong to the idea of the present invention belong to the scope of the present invention. It should be noted that modifications and adaptations to those skilled in the art without departing from the principles of the present invention should also be considered as within the scope of the present invention.

Claims (10)

1. An AGV cooperative transport target distribution and decision algorithm is characterized by comprising the following steps of:
s1: respectively establishing a dominance matrix and a target decision matrix based on the number of AGVs and the number of targets to be carried, and establishing a target allocation optimization function based on the dominance matrix and the target decision matrix;
s2: analyzing an AGV carrying environment, establishing a probability matrix, optimizing according to the probability matrix to obtain a penetration action matrix, and establishing an action decision objective function based on the probability matrix and the penetration action matrix;
s3: establishing action decision constraint according to the action decision objective function;
s4: performing linear weighted integration on the target allocation optimization function and the action decision objective function to establish a unified objective function;
s5: performing reinforcement learning, performing Markov decision process modeling on target allocation and decision in stages to construct a state space and an action space, and designing a reward function based on the unified objective function to form a multi-stage decision problem model;
s6, designing a joint reward function based on the unified objective function;
s7: establishing a corresponding Monte Carlo tree for each AGV, searching each Monte Carlo tree, forming a combined action set from the search results, substituting it into the joint reward function for evaluation, and obtaining a reward value;
s8: setting a selection strategy and a backtracking updating formula, returning the obtained reward value to each Monte Carlo tree, improving the selection strategy and the backtracking updating formula of the algorithm, and solving a decision problem model.
2. The AGV cooperative transport target allocation and decision algorithm of claim 1, wherein: the step S1 specifically includes:
in the AGV cooperative transport process, there are N_U AGVs and N_Tar targets to be carried;
the established dominance matrix is A_ij, where the value in row i (i = 1, 2, ..., N_U) and column j (j = 1, 2, ..., N_Tar) represents the comprehensive dominance of the ith AGV with respect to target j;
the established target decision matrix is X_ij, where an element value of 1 means that the ith AGV is allocated to target j, and the element value is 0 otherwise;
the established target allocation optimization function is:
[formula image]
wherein: J_dis,i is the target allocation optimization function of the ith AGV.
3. The AGV cooperative transport target allocation and decision algorithm of claim 2, wherein: the step S1 further comprises the following steps:
in the process of allocating targets, each AGV can only be allocated to one target and each target is allocated at least one AGV, so the following constraint model is established for the target allocation process:
[formula image]
4. The AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: in the step S2, during AGV transport different targets require different maneuvers in different environments in order to pass through the obstacles, and different numbers of defense (obstacle) areas are deployed for different targets; that is, target j has N_d,j obstacle areas in total and the AGV has N_A candidate actions, one of which is selected for penetration through each obstacle area. A probability matrix P_mk is established, in which the entry for action m (m = 1, 2, ..., N_A) and obstacle area k (k = 1, 2, ..., N_d,j) represents the percentage by which performing that action in that area reduces the blocking probability. The probability matrix is optimized to obtain a penetration matrix M_km, whose element is 1 if the AGV selects penetration action m in obstacle area k, and 0 otherwise;
the action decision objective function of the AGV is established:
[formula image]
wherein: J_pene,i denotes the action decision objective function of the ith AGV.
5. The AGV cooperative transport target allocation and decision algorithm according to claim 3, wherein: in step S2, according to the physical characteristics of the AGV, the number of times each action may be selected during transport is constrained, each action being selected no more than b_1 times, and the action decision constraint is established:
[formula image]
and optimally selecting decision variables in a discrete decision space based on action decision constraints.
6. The AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: the step S4 specifically comprises the following steps:
establishing a unified objective function for the ith AGV based on the target allocation optimization function and the action decision objective function:
[formula image]
converting the multi-objective optimization of the AGV cluster into a single-objective optimization by the linear weighting method:
[formula image]
wherein: [formula image] is a weight factor, and [formula image]
7. the AGV cooperative transport target allocation and decision algorithm according to claim 6, wherein: the step S4 further includes establishing an objective function constraint:
[formula image]
wherein the objective function constraint is a combination of a constraint model and an action decision constraint.
8. The AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: the step S5 specifically comprises the following steps:
the state space is established as:
[formula image]
wherein: S_dis is the target allocation state, S_r is the state of obstacle area 1, and the symbols given by [formula image], [formula image] and [formula image] are the states of obstacle areas 2, 3 and 4 passed through in sequence. By integrating the state spaces of all targets, the total state space can be expressed as:
[formula image]
The AGV takes different actions according to the state of each stage. In the target allocation stage the action is the choice of a target; the AGV can select only one target, and the result is represented as a discrete vector [formula image]. When there are N_Tar targets, N_Tar actions are selectable in the target allocation stage;
in the obstacle passing stage the actions are 5 types of defense actions, and the AGV selects one of them in a given obstacle area state. The 5 defense actions are expressed as a vector whose elements represent action 1, action 2, action 3, action 4 and action 5 in sequence, and the reward function is established:
[formula image]
9. the AGV cooperative transport target allocation and decision algorithm of claim 8, wherein: the joint reward function is:
[formula image]
A penalty of -1 is given when the target decision matrix and the penetration matrix do not satisfy the constraints; otherwise the reward value is the unified objective function value.
10. The AGV cooperative transport target allocation and decision algorithm according to claim 1, wherein: the selection decision is:
[formula image]
wherein: v_father is the parent node of the node v being evaluated, C_p is a constant used to balance exploration and exploitation, and Q(v) is the result based on the allocations and penetration actions of all AGVs;
the backtracking update formula is as follows:
[formula image]
wherein: N_new(v) is the new node count, N_old(v) the old node count, Q_new(v) the new penetration result, Q_old(v) the old penetration result, and ΔQ the penetration result difference.
CN202210627881.4A 2022-06-06 2022-06-06 AGV collaborative transfer target distribution and decision algorithm Pending CN115237119A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210627881.4A CN115237119A (en) 2022-06-06 2022-06-06 AGV collaborative transfer target distribution and decision algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210627881.4A CN115237119A (en) 2022-06-06 2022-06-06 AGV collaborative transfer target distribution and decision algorithm

Publications (1)

Publication Number Publication Date
CN115237119A true CN115237119A (en) 2022-10-25

Family

ID=83670192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210627881.4A Pending CN115237119A (en) 2022-06-06 2022-06-06 AGV collaborative transfer target distribution and decision algorithm

Country Status (1)

Country Link
CN (1) CN115237119A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149375A (en) * 2023-04-21 2023-05-23 中国人民解放军国防科技大学 Unmanned aerial vehicle search planning method and device for online decision, electronic equipment and medium
CN116149375B (en) * 2023-04-21 2023-07-07 中国人民解放军国防科技大学 Unmanned aerial vehicle search planning method and device for online decision, electronic equipment and medium
CN116673968A (en) * 2023-08-03 2023-09-01 南京云创大数据科技股份有限公司 Mechanical arm track planning element selection method and system based on reinforcement learning
CN116673968B (en) * 2023-08-03 2023-10-10 南京云创大数据科技股份有限公司 Mechanical arm track planning element selection method and system based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN115237119A (en) AGV collaborative transfer target distribution and decision algorithm
CN113159432B (en) Multi-agent path planning method based on deep reinforcement learning
CN112733421B (en) Task planning method for cooperation of unmanned aerial vehicle with ground fight
CN107886201B (en) Multi-objective optimization method and device for multi-unmanned aerial vehicle task allocation
CN101136081B (en) Unmanned aircraft multiple planes synergic tasks distributing method based on ant colony intelligence
CN107807665B (en) Unmanned aerial vehicle formation detection task cooperative allocation method and device
CN107677273A (en) A kind of cluster unmanned plane Multiple routes planning method based on two-dimensional grid division
CN114422056A (en) Air-ground non-orthogonal multiple access uplink transmission method based on intelligent reflecting surface
CN102778229A (en) Mobile Agent path planning method based on improved ant colony algorithm under unknown environment
CN114020031B (en) Unmanned aerial vehicle cluster collaborative dynamic target searching method based on improved pigeon colony optimization
CN114460959A (en) Unmanned aerial vehicle group cooperative autonomous decision-making method and device based on multi-body game
CN110109653B (en) Intelligent engine for land fighter chess and operation method thereof
CN116736883B (en) Unmanned aerial vehicle cluster intelligent cooperative motion planning method
CN114167898B (en) Global path planning method and system for collecting data of unmanned aerial vehicle
Park et al. APE: A data-driven, behavioral model-based anti-poaching engine
CN116702633B (en) Heterogeneous warhead task reliability planning method based on multi-objective dynamic optimization
CN115016537B (en) Heterogeneous unmanned aerial vehicle configuration and task planning combined optimization method in SEAD scene
Yasear et al. Fine-Tuning the Ant Colony System Algorithm Through Harris’s Hawk Optimizer for Travelling Salesman Problem.
CN114326822B (en) Unmanned aerial vehicle cluster information sharing method based on evolutionary game
CN115454067A (en) Path planning method based on fusion algorithm
CN113283827B (en) Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning
CN112486185A (en) Path planning method based on ant colony and VO algorithm in unknown environment
Gaowei et al. Using multi-layer coding genetic algorithm to solve time-critical task assignment of heterogeneous UAV teaming
CN114578845B (en) Unmanned aerial vehicle track planning method based on improved ant colony algorithm
Li et al. Improved genetic algorithm for multi-agent task allocation with time windows

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination