CN116720703A - AGV multi-target task scheduling method and system based on deep reinforcement learning

AGV multi-target task scheduling method and system based on deep reinforcement learning

Info

Publication number
CN116720703A
Authority
CN
China
Prior art keywords
module
agv
task
task scheduling
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310726554.9A
Other languages
Chinese (zh)
Inventor
吴小倩
郑益民
吴庆耀
秦卓睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Bangqi Technology Intelligent Development Co ltd
Original Assignee
Shenzhen Bangqi Technology Intelligent Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Bangqi Technology Intelligent Development Co ltd filed Critical Shenzhen Bangqi Technology Intelligent Development Co ltd
Priority to CN202310726554.9A
Publication of CN116720703A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063: Operations research, analysis or management
    • G06Q 10/0631: Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/0455: Auto-encoder networks; Encoder-decoder networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/092: Reinforcement learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The invention discloses an AGV multi-target task scheduling method based on deep reinforcement learning, which comprises the following steps: acquiring real-time environment state information and making decisions through a deep Q network to obtain an optimal scheduling sequence; constructing a deep reinforcement learning task scheduling model; training the task scheduling model with samples consisting of the processed real-time environment state information, actions and rewards; and deploying the trained task scheduling model for prediction. The invention dynamically schedules AGV tasks in a specific scene in real time using deep reinforcement learning, and introduces a Transformer module so that the model attends to global task information, learns the characteristics and importance of different tasks, and learns general knowledge shared among tasks. The reinforcement learning reward function is improved by introducing AGV collision waiting time, so that the method considers not only the goal of reducing wasted environmental resources but also the goal of reducing traffic collisions and blocking while AGVs are driving, which better fits AGV task scheduling in a specific scene.

Description

AGV multi-target task scheduling method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of AGV scheduling, in particular to an AGV multi-target task scheduling method and system based on deep reinforcement learning.
Background
In recent years, artificial intelligence and its related industries have developed rapidly and become a focus of attention for academia, industry and governments worldwide; China's State Council has issued the "New Generation Artificial Intelligence Development Plan", underscoring the national strategic status of artificial intelligence research and industry.
The prior publication CN114912809A discloses a DDQN-based AGV scheduling method for automated container terminals. Aiming at data conversion of the actual conditions of an automated container terminal, it builds an AGV task-allocation scheduling model for the terminal's horizontal transport link by mathematical modeling; converts that AGV scheduling model into a reinforcement learning DDQN model using a Markov decision process (MDP); constructs the DDQN deep reinforcement learning neural network Q network; programs and simulates external environment changes in the DDQN model and trains the Q network model; and packages the trained Q network model into a real-time online dispatching system for the terminal's horizontal transport AGVs, which schedules the terminal AGVs in real time and efficiently. By introducing the deep reinforcement learning DDQN method into the AGV task-allocation scheduling problem of the automated container terminal's horizontal transport link, the method achieves real-time, efficient task dispatching for AGVs facing a dynamic terminal environment, improving AGV utilization and transport efficiency.
However, although that method improves transport efficiency, it emphasizes only the dispatching rules of the automated container terminal's horizontal transport link, ignores the real-time dynamic task information of the whole system, applies only to that AGV scheduling scene, and cannot account for collision waiting time during actual AGV operation. How to combine an attention mechanism and general task knowledge in the deep reinforcement learning process, while also pursuing the goal of reducing traffic collisions and blocking while AGVs are driving, therefore remains a problem to be solved in task scheduling.
Disclosure of Invention
In order to overcome the defects in the prior art, the embodiments of the invention provide an AGV multi-target task scheduling method and system based on deep reinforcement learning, which introduce a Transformer with an attention mechanism so that the model attends to global task information, learns the characteristics and importance of different tasks, and learns general knowledge shared among tasks; at the same time, the goal of reducing traffic collisions and blocking while AGVs are driving is taken into account, which better fits AGV task scheduling in a specific scene, and such multi-target scheduling greatly improves task execution efficiency.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the invention provides an AGV multi-target task scheduling method based on deep reinforcement learning, comprising the following steps:
S1: collecting and preprocessing real-time environment state information in the system;
S2: constructing a task scheduling model, the task scheduling model comprising an AGV module, a task module, a map module and a prediction module;
S3: using the preprocessed real-time environment state information together with the corresponding actions and rewards as samples to train the task scheduling model;
the process of training the task scheduling model in step S3 is as follows:
S31: dividing each sample into an AGV state, a map state and a task state, inputting these into the AGV module, the map module and the task module respectively, and concatenating the feature vectors output by the AGV module and the map module with the feature vector output by the task module, into which the Transformer is introduced, to obtain the feature vector of the current input training sample;
S32: inputting the feature vector of the current input training sample into the prediction module to obtain the predicted rewards of the corresponding actions, computing the mean square error between the actual prediction and the target model's prediction as the objective function of the training model, and training the task scheduling model by stochastic gradient descent to finally obtain a trained task scheduling model;
S4: performing deployment prediction with the trained task scheduling model, wherein real-time environment state information is input into the trained task scheduling model and the next scheduling option is predicted from the model's output.
Preferably, the real-time environment state information collected in S1 comprises an AGV state, a map state and a task state, wherein the AGV state comprises a departure point, a target point and the running time of the current task; the map state comprises the warehouse entry point, the warehouse entry buffer zone, the high-level buffer zone, the warehouse exit buffer zone, the AGV waiting queue at the warehouse exit point, the running time of tasks in progress and the AGV queue in the waiting area; and the task state comprises the corresponding target point of the task for each departure point.
Preferably, the training samples used in S3 to train the task scheduling model are expressed as:
x_i = (s_t, a_t, R_t, s_{t+1})
a_t = (agv_k, task_l)
where x_i is the i-th training sample, s_t is the environment state at time t, a_t is the action taken at time t, meaning that a specific AGV is assigned to a specific task, and R_t is the reward obtained after taking the action at time t, comprising the idle-waiting loss of the resources in the map module at time t and the AGV collision-waiting loss at time t, so that the method reduces AGV collision waiting time while also attending to wasted map resources.
Preferably, the feature vector of the current input training sample in S31 is calculated as follows:
A1, computing the embedding matrix of the current input training sample, with the specific formula:
E = [x_{i1}, x_{i2}, \ldots, x_{in}] M
where E denotes the embedding matrix of the samples input to the current module, x_{i1}, x_{i2}, ..., x_{in} denote the input information required by the current module, and M denotes a learnable embedding transformation matrix;
A2, obtaining the query matrix Q, the key matrix K and the value matrix V from the embedding matrix through different linear transformations, with the specific formulas:
Q = E W_Q
K = E W_K
V = E W_V
where W_Q, W_K and W_V denote linear transformation matrices;
A3, attention-weighting Q, K and V, then performing normalization and a fully connected linear operation to finally obtain the feature vector, with the specific formulas:
S = softmax(Q K^T / \sqrt{d_k}) V
\bar{S} = LN(S + E)
z = W_1 \bar{S} + b_1
h = LN(z + \bar{S})
where S is the attention-weighted result, \bar{S} is the normalization result, z is the result of the fully connected linear operation, h is the feature vector, \sqrt{d_k} is the scaling factor, W_1 and b_1 are adaptively learnable parameters, LN(·) denotes the layer normalization operation, and softmax(·) denotes the activation function used to compute the attention scores of the real-time state in the task module; in addition, jump (residual) connections are employed after the attention computation and after the feed-forward computation in the self-attention module of the Transformer, as the formulas for \bar{S} and h show.
Preferably, the next scheduling option predicted in S4 is obtained from the feature vector h: in the prediction module, h is passed through a single fully connected layer and a Softmax operation to obtain a score for each scheduling option, and the scheduling option with the highest score is taken as the prediction of the next scheduling option in the current state.
The invention further provides an AGV multi-target task scheduling system based on deep reinforcement learning, to which the above AGV multi-target task scheduling method based on deep reinforcement learning is applied, comprising a preprocessing module, a model construction module, a model training module, a model prediction module and a data storage module;
the preprocessing module is used for collecting and preprocessing real-time environment state information in the system;
the model construction module is used for constructing the task scheduling model, which comprises an AGV module, a task module, a map module and a prediction module;
the model training module is used for training the task scheduling model with samples consisting of the preprocessed real-time environment state information and the corresponding actions and rewards;
the model prediction module is used for deployment prediction with the trained task scheduling model, wherein real-time environment state information is input into the trained task scheduling model and the next scheduling option is predicted from the model's output;
the data storage module is used for storing the data information of the whole task scheduling process.
The technical effects and advantages of the invention are:
1. The AGV multi-target task scheduling method based on deep reinforcement learning provided by the invention uses a self-attention mechanism to process task states; this gives strong modeling capability over the data, enables the model to attend to global task information, learn the characteristics of different tasks and learn general task knowledge, and greatly improves task execution efficiency.
2. The task scheduling method provided by the invention dynamically feeds back the real-time environment state while using the Transformer to focus on the more important tasks; by introducing AGV collision waiting time, it accounts for traffic collisions and blocking while AGVs are driving, and thus better fits AGV task scheduling in a specific scene.
Drawings
Fig. 1 is a schematic diagram of the training process of the task scheduling model according to an embodiment of the present invention.
Fig. 2 is a schematic overall flow chart of an AGV multi-objective task scheduling method based on deep reinforcement learning according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of an AGV multi-objective task scheduling system based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
The following is a clear and complete description of the embodiments of the present invention with reference to the accompanying drawings; it is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
As shown in fig. 1 and 2, in an embodiment of the present invention, there is provided an AGV multi-target task scheduling method based on deep reinforcement learning, comprising the following steps:
S1: collecting and preprocessing real-time environment state information in the system;
further, the real-time environment state information collected in S1 comprises an AGV state, a map state and a task state, wherein the AGV state comprises a departure point, a target point and the running time of the current task; the map state comprises the warehouse entry point, the warehouse entry buffer zone, the high-level buffer zone, the warehouse exit buffer zone, the AGV waiting queue at the warehouse exit point, the running time of tasks in progress and the AGV queue in the waiting area; and the task state comprises the corresponding target point of the task for each departure point.
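To make the three state groups concrete, the following is a minimal illustrative sketch, not part of the patent disclosure, of how they might be packed into tensors; PyTorch and all field names and dimensions are assumptions introduced here for exposition only.

```python
import torch

def encode_states(agvs, map_info, tasks):
    """Pack the three state groups described above into tensors (hypothetical layout)."""
    # AGV state: departure point id, target point id, current task running time
    agv_state = torch.tensor(
        [[a["departure"], a["target"], a["runtime"]] for a in agvs],
        dtype=torch.float32)
    # Map state: queue lengths / runtimes of the zones and queues listed above
    map_state = torch.tensor(
        [map_info["entry_point_queue"], map_info["entry_buffer"],
         map_info["high_level_buffer"], map_info["exit_buffer"],
         map_info["exit_point_queue"], map_info["running_task_time"],
         map_info["waiting_area_queue"]], dtype=torch.float32)
    # Task state: (departure point id, target point id) per pending task
    task_state = torch.tensor(
        [[t["departure"], t["target"]] for t in tasks], dtype=torch.float32)
    return agv_state, map_state, task_state
```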
In this embodiment, it should be specifically noted that preprocessing the real-time environment state information means making a decision on the real-time environment information through the deep Q network to obtain an optimal scheduling sequence.
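As an illustration of such a decision step, the sketch below shows a conventional epsilon-greedy choice over a deep Q network's outputs; the patent states only that the deep Q network makes the decision, so the exploration scheme and every name here are assumptions.

```python
import random
import torch

def select_action(q_network, state, n_actions, epsilon=0.1):
    """Epsilon-greedy decision over (AGV, task) pairings (assumed scheme)."""
    if random.random() < epsilon:
        # Explore: pick a random scheduling action
        return random.randrange(n_actions)
    with torch.no_grad():
        # Exploit: pick the action with the highest predicted Q-value
        return int(q_network(state).argmax().item())
```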
S2: the method comprises the steps of constructing a task scheduling model, wherein the task scheduling model comprises an AGV module, a task module, a map module and a prediction module;
in this embodiment, it is specifically required to explain that there are a plurality of material transportation tasks and a plurality of AGV trolleys in the manufacturing system, under a certain constraint condition, the material transportation tasks need to be reasonably assigned to each AGV trolley, and the execution sequence is specified, and meanwhile, the material transportation tasks are completed by real-time scheduling under the dynamic condition, so that the whole system meets a certain performance optimization index, and the process is task scheduling.
S3: taking the preprocessed environment real-time state information, corresponding actions and rewards as a sample training task scheduling model;
further, the training sample expression of the sample training task scheduling model in S3 is:
xi=(s t ,a t ,R t ,s t+1 )
a t =(agv k ,task l )
wherein xi is an ith training sample, st is an environmental state at t, at is an action taken at t, which means that a specific AGV is allocated to a specific task, and Rt is a reward obtained after the action is taken at t, including idle waiting loss of resources in a map module at t and collision waiting loss at t AGV, so that the method can reduce time consumption of collision waiting of the AGV while focusing on map resource waste.
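The sketch below renders the sample tuple x_i = (s_t, a_t, R_t, s_{t+1}) and the two-part reward in code; the weighting of the two loss terms and the use of a replay buffer are assumptions, since the patent does not disclose how samples are stored or how the terms are balanced.

```python
import random
from collections import deque, namedtuple

# One sample x_i = (s_t, a_t, R_t, s_{t+1}); the action is an (AGV, task) pair.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

def compute_reward(idle_wait_loss, collision_wait_loss, w_idle=1.0, w_coll=1.0):
    """R_t penalizes both idle waiting of map resources and AGV collision
    waiting; the weights w_idle and w_coll are assumed, not from the patent."""
    return -(w_idle * idle_wait_loss + w_coll * collision_wait_loss)

class ReplayBuffer:
    """Standard DQN experience store (assumed; the patent describes the samples
    but not how they are buffered)."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)
```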
The process of training the task scheduling model in the step S3 is as follows:
S31: dividing each sample into an AGV state, a map state and a task state, inputting these into the AGV module, the map module and the task module respectively, and concatenating the feature vectors output by the AGV module and the map module with the feature vector output by the task module, into which the Transformer is introduced, to obtain the feature vector of the current input training sample;
further, the specific calculation step of the feature vector of the currently input training sample in S31 is as follows:
A1, computing the embedding matrix of the current input training sample, with the specific formula:
E = [x_{i1}, x_{i2}, \ldots, x_{in}] M
where E denotes the embedding matrix of the samples input to the current module, x_{i1}, x_{i2}, ..., x_{in} denote the input information required by the current module, and M denotes a learnable embedding transformation matrix;
A2, obtaining the query matrix Q, the key matrix K and the value matrix V from the embedding matrix through different linear transformations, with the specific formulas:
Q = E W_Q
K = E W_K
V = E W_V
where W_Q, W_K and W_V denote linear transformation matrices;
A3, attention-weighting Q, K and V, then performing normalization and a fully connected linear operation to finally obtain the feature vector, with the specific formulas:
S = softmax(Q K^T / \sqrt{d_k}) V
\bar{S} = LN(S + E)
z = W_1 \bar{S} + b_1
h = LN(z + \bar{S})
where S is the attention-weighted result, \bar{S} is the normalization result, z is the result of the fully connected linear operation, h is the feature vector, \sqrt{d_k} is the scaling factor, W_1 and b_1 are adaptively learnable parameters, LN(·) denotes the layer normalization operation, and softmax(·) denotes the activation function used to compute the attention scores of the real-time state in the task module; in addition, jump (residual) connections are employed after the attention computation and after the feed-forward computation in the self-attention module of the Transformer, as the formulas for \bar{S} and h show.
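The following sketch assembles the reconstructed A1-A3 computation into a single module: embedding, scaled dot-product self-attention, and a fully connected layer, each followed by a jump (residual) connection and layer normalization; the hidden sizes and the single-head, single-layer form are assumptions, not disclosed by the patent.

```python
import math
import torch
import torch.nn as nn

class TaskSelfAttention(nn.Module):
    """Illustrative task-module encoder following steps A1-A3 above."""
    def __init__(self, in_dim, d_model=64):
        super().__init__()
        self.embed = nn.Linear(in_dim, d_model, bias=False)  # learnable M
        self.w_q = nn.Linear(d_model, d_model, bias=False)   # W_Q
        self.w_k = nn.Linear(d_model, d_model, bias=False)   # W_K
        self.w_v = nn.Linear(d_model, d_model, bias=False)   # W_V
        self.ln1 = nn.LayerNorm(d_model)
        self.ffn = nn.Linear(d_model, d_model)               # W_1, b_1
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, x):                  # x: (n_tasks, in_dim)
        e = self.embed(x)                  # A1: embedding matrix E
        q, k, v = self.w_q(e), self.w_k(e), self.w_v(e)      # A2: Q, K, V
        attn = torch.softmax(
            q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
        s = attn @ v                       # A3: attention-weighted result S
        s_bar = self.ln1(s + e)            # jump connection + LayerNorm
        z = self.ffn(s_bar)                # fully connected linear operation
        return self.ln2(z + s_bar)         # feature vectors h
```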
S32: inputting the feature vector of the current input training sample into the prediction module to obtain the predicted rewards of the corresponding actions, computing the mean square error between the actual prediction and the target model's prediction as the objective function of the training model, and training the task scheduling model by stochastic gradient descent to finally obtain a trained task scheduling model.
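A minimal sketch of one such update follows, reusing the Transition tuples from the earlier sketch; the discount factor and optimizer settings are assumptions, as the patent specifies only the mean-square-error objective against a target model and stochastic gradient descent.

```python
import torch
import torch.nn.functional as F

def train_step(q_net, target_net, optimizer, batch, gamma=0.99):
    """One update: MSE between the online network's prediction for the taken
    actions and the target network's bootstrapped estimate (gamma assumed)."""
    states = torch.stack([t.state for t in batch])
    actions = torch.tensor([t.action for t in batch])
    rewards = torch.tensor([t.reward for t in batch], dtype=torch.float32)
    next_states = torch.stack([t.next_state for t in batch])

    # Q-value the online network assigns to each taken action
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Target model's prediction: reward plus discounted best next value
        q_target = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = F.mse_loss(q_pred, q_target)   # mean square error objective

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                      # stochastic gradient descent update
    return loss.item()
```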
S4: performing deployment prediction with the trained task scheduling model, wherein real-time environment state information is input into the trained task scheduling model and the next scheduling option is predicted from the model's output.
Further, the next scheduling option predicted in S4 is a feature vectorAnd obtaining the score of each scheduling option through single-layer full-connection layer and Softmax operation in the prediction module, and taking the scheduling option with the highest score as the prediction result of the next scheduling option in the current state.
Based on the same idea as the deep-reinforcement-learning-based AGV multi-target task scheduling method in the above embodiment, the invention also provides an AGV multi-target task scheduling system based on deep reinforcement learning, which can be used to execute the method. For ease of illustration, the structural schematic of the system embodiment shows only the portions relevant to the embodiment of the present invention; those skilled in the art will appreciate that the illustrated structure does not limit the apparatus, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in fig. 3, in an embodiment of the present invention, an AGV multi-objective task scheduling system based on deep reinforcement learning is provided, which includes a preprocessing module, a model building module, a model training module, a model prediction module, and a data storage module;
the preprocessing module is used for collecting and preprocessing environment real-time state information in the system;
the model construction module is used for constructing the task scheduling model, which comprises an AGV module, a task module, a map module and a prediction module;
the model training module is used for training the task scheduling model with samples consisting of the preprocessed real-time environment state information and the corresponding actions and rewards;
further, training the task scheduling model specifically comprises:
dividing each sample into an AGV state, a map state and a task state, inputting these into the AGV module, the map module and the task module respectively, and concatenating the feature vectors output by the AGV module and the map module with the feature vector output by the task module, into which the Transformer is introduced, to obtain the feature vector of the current input training sample;
inputting the feature vector of the sample into the prediction module to obtain the predicted rewards of the corresponding actions, computing the mean square error between the prediction and the target model's prediction as the objective function of the training model, and training the task scheduling model by stochastic gradient descent to finally obtain a trained task scheduling model.
the model prediction module is used for deployment prediction with the trained task scheduling model, wherein real-time environment state information is input into the trained task scheduling model and the next scheduling option is predicted from the model's output;
the data storage module is used for storing data information in the whole task scheduling process.
In this embodiment, it should be specifically noted that the deep-reinforcement-learning-based AGV multi-target task scheduling system of the invention corresponds one-to-one with the deep-reinforcement-learning-based AGV multi-target task scheduling method of the invention, so the technical features and beneficial effects described in the method embodiment above apply equally to this system embodiment; for details, refer to the description of the method embodiment, which is not repeated here.
In addition, in the deep-reinforcement-learning-based AGV multi-target task scheduling system of the above embodiment, the logical division into program modules is merely illustrative; in practical applications, the above functions may be allocated to different program modules as needed, for example to meet the configuration requirements of the corresponding hardware or to ease software implementation. That is, the internal structure of the system may be divided into different program modules to perform all or part of the functions described above.
Finally, the foregoing description covers only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalents and alternatives falling within the spirit and principles of the invention are intended to be included within its scope.

Claims (6)

1. An AGV multi-target task scheduling method based on deep reinforcement learning, characterized by comprising the following steps:
S1: collecting and preprocessing real-time environment state information in the system;
S2: constructing a task scheduling model, the task scheduling model comprising an AGV module, a task module, a map module and a prediction module;
S3: using the preprocessed real-time environment state information together with the corresponding actions and rewards as samples to train the task scheduling model;
the process of training the task scheduling model in step S3 is as follows:
S31: dividing each sample into an AGV state, a map state and a task state, inputting these into the AGV module, the map module and the task module respectively, and concatenating the feature vectors output by the AGV module and the map module with the feature vector output by the task module, into which the Transformer is introduced, to obtain the feature vector of the current input training sample;
S32: inputting the feature vector of the current input training sample into the prediction module to obtain the predicted rewards of the corresponding actions, computing the mean square error between the actual prediction and the target model's prediction as the objective function of the training model, and training the task scheduling model by stochastic gradient descent to finally obtain a trained task scheduling model;
S4: performing deployment prediction with the trained task scheduling model, wherein real-time environment state information is input into the trained task scheduling model and the next scheduling option is predicted from the model's output.
2. The AGV multi-target task scheduling method based on deep reinforcement learning according to claim 1, characterized in that the real-time environment state information collected in S1 comprises an AGV state, a map state and a task state, wherein the AGV state comprises a departure point, a target point and the running time of the current task; the map state comprises the warehouse entry point, the warehouse entry buffer zone, the high-level buffer zone, the warehouse exit buffer zone, the AGV waiting queue at the warehouse exit point, the running time of tasks in progress and the AGV queue in the waiting area; and the task state comprises the corresponding target point of the task for each departure point.
3. The AGV multi-target task scheduling method based on deep reinforcement learning according to claim 1, characterized in that the training samples used in S3 to train the task scheduling model are expressed as:
x_i = (s_t, a_t, R_t, s_{t+1})
a_t = (agv_k, task_l)
where x_i is the i-th training sample, s_t is the environment state at time t, a_t is the action taken at time t, meaning that a specific AGV is assigned to a specific task, and R_t is the reward obtained after taking the action at time t, comprising the idle-waiting loss of the resources in the map module at time t and the AGV collision-waiting loss at time t, so that the method reduces AGV collision waiting time while also attending to wasted map resources.
4. The AGV multi-target task scheduling method based on deep reinforcement learning according to claim 1, characterized in that the feature vector of the current input training sample in S31 is calculated as follows:
A1, computing the embedding matrix of the current input training sample, with the specific formula:
E = [x_{i1}, x_{i2}, \ldots, x_{in}] M
where E denotes the embedding matrix of the samples input to the current module, x_{i1}, x_{i2}, ..., x_{in} denote the input information required by the current module, and M denotes a learnable embedding transformation matrix;
A2, obtaining the query matrix Q, the key matrix K and the value matrix V from the embedding matrix through different linear transformations, with the specific formulas:
Q = E W_Q
K = E W_K
V = E W_V
where W_Q, W_K and W_V denote linear transformation matrices;
A3, attention-weighting Q, K and V, then performing normalization and a fully connected linear operation to finally obtain the feature vector, with the specific formulas:
S = softmax(Q K^T / \sqrt{d_k}) V
\bar{S} = LN(S + E)
z = W_1 \bar{S} + b_1
h = LN(z + \bar{S})
where S is the attention-weighted result, \bar{S} is the normalization result, z is the result of the fully connected linear operation, h is the feature vector, \sqrt{d_k} is the scaling factor, W_1 and b_1 are adaptively learnable parameters, LN(·) denotes the layer normalization operation, and softmax(·) denotes the activation function used to compute the attention scores of the real-time state in the task module; in addition, jump (residual) connections are employed after the attention computation and after the feed-forward computation in the self-attention module of the Transformer, as the formulas for \bar{S} and h show.
5. The AGV multi-target task scheduling method based on deep reinforcement learning according to claim 1, characterized in that the next scheduling option predicted in S4 is obtained from the feature vector h: in the prediction module, h is passed through a single fully connected layer and a Softmax operation to obtain a score for each scheduling option, and the scheduling option with the highest score is taken as the prediction of the next scheduling option in the current state.
6. An AGV multi-target task scheduling system based on deep reinforcement learning, characterized in that the AGV multi-target task scheduling method based on deep reinforcement learning according to any one of claims 1-5 is applied thereto, the system comprising a preprocessing module, a model construction module, a model training module, a model prediction module and a data storage module;
the preprocessing module is used for collecting and preprocessing environment real-time state information in the system;
the model construction module is used for constructing the task scheduling model, which comprises an AGV module, a task module, a map module and a prediction module;
the model training module is used for training the task scheduling model with samples consisting of the preprocessed real-time environment state information and the corresponding actions and rewards;
the model prediction module is used for deployment prediction with the trained task scheduling model, wherein real-time environment state information is input into the trained task scheduling model and the next scheduling option is predicted from the model's output;
the data storage module is used for storing data information in the whole task scheduling process.
CN202310726554.9A 2023-06-19 2023-06-19 AGV multi-target task scheduling method and system based on deep reinforcement learning Pending CN116720703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310726554.9A CN116720703A (en) 2023-06-19 2023-06-19 AGV multi-target task scheduling method and system based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310726554.9A CN116720703A (en) 2023-06-19 2023-06-19 AGV multi-target task scheduling method and system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116720703A 2023-09-08

Family

ID=87865799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310726554.9A Pending CN116720703A (en) 2023-06-19 2023-06-19 AGV multi-target task scheduling method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116720703A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117236821A (en) * 2023-11-10 2023-12-15 淄博纽氏达特机器人系统技术有限公司 Online three-dimensional boxing method based on hierarchical reinforcement learning
CN117236821B (en) * 2023-11-10 2024-02-06 淄博纽氏达特机器人系统技术有限公司 Online three-dimensional boxing method based on hierarchical reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination