CN116629479A

CN116629479A - Single-way collision-free AGV multi-target traffic control method and system

Info

Publication number: CN116629479A
Application number: CN202310634753.7A
Authority: CN
Inventors: 吴小倩; 郑益民; 吴庆耀; 秦卓睿
Original assignee: Shenzhen Bangqi Technology Intelligent Development Co ltd
Current assignee: Shenzhen Bangqi Technology Intelligent Development Co ltd
Priority date: 2023-05-31
Filing date: 2023-05-31
Publication date: 2023-08-22

Abstract

The application discloses a single-way collision-free AGV multi-target traffic control method and system, relating to the technical field of AGV collision avoidance. The method comprises the following steps: S10, digitally marking the single-way road path resources and AGV path tasks of a target real-world map and storing them in a database; S20, acquiring real-time environment state information, calculating the current occupation of the single-way road path resources, and obtaining an AGV collision-free evaluation coefficient; S30, constructing a traffic control deep reinforcement learning model based on the real-time environment state information. The application uses the traffic control deep reinforcement learning model to perform normalization and fully connected linear operations, finally obtaining a sample training traffic control model, and predicts the next scheduling option from the output of that model. This helps the AGV find the optimal scheduling value in each scheduling task and complete collision-free travel on single-way roads automatically, improving collision-free running efficiency and reducing AGV running cost.

Description

Single-way collision-free AGV multi-target traffic control method and system

Technical Field

The application relates to the technical field of AGV collision avoidance, and in particular to a single-way collision-free AGV multi-target traffic control method and system.

Background

AGV is short for automated guided vehicle: a vehicle that is loaded with goods automatically or manually and then travels automatically along a set route, or tows a loaded trolley, to a set location. With the continuous development of science and technology, AGVs use computer technology and artificial intelligence to understand tasks and plan reasonable paths, which improves AGV performance and reduces manufacturing costs; AGVs are therefore widely used in the automobile assembly line industry.

In traffic control for the automobile assembly line industry, rule-driven and heuristic methods have made great progress, but most of them emphasize one specific scheduling rule, neglect the real-time dynamic environment information of the whole system, and fit only one particular AGV scheduling scenario; nor can they account for the time consumed by collision waiting during actual AGV operation. Therefore, how to incorporate real-time environment state information under the collision-free constraint, so that fewer environmental resources are wasted while traffic collision blocking during AGV travel is also reduced, is the problem to be solved in traffic control.

Disclosure of Invention

In order to overcome the above-mentioned drawbacks of the prior art, embodiments of the present application provide a single-way collision-free AGV multi-target traffic control method and system, so as to solve the problems set forth in the above-mentioned background art.

In order to achieve the above purpose, the present application provides the following technical solutions: a single-way collision-free AGV multi-target traffic control method comprises the following steps:

S10, digitally marking the single-way road path resources and the AGV path tasks of a target real-world map and storing them in a database;

S20, acquiring real-time environment state information, calculating the current occupation of the single-way road path resources, and obtaining an AGV collision-free evaluation coefficient;

S30, constructing a traffic control deep reinforcement learning model based on the real-time environment state information; the traffic control deep reinforcement learning model is realized by the following formulas:

$x_i = (S_t, A_t, R_t, S_{t+1})$

$A_t = (a_k)$

where $x_i$ is the $i$-th training sample, $S_t$ is the environment state at time $t$, $A_t$ is the action taken at time $t$, $S_{t+1}$ represents an AGV assigned to the next departure, and $R_t$ is the scheduling value obtained after taking the action at time $t$;

$C_{path,t}$ indicates the idle loss of the single-way road path resources in the map module at time $t$;

[…] indicates the collision waiting loss of the AGV at time $t$;

[…] indicates the total time lost for all AGVs to complete their tasks;

$Q(S_t, A_t)$ represents the traffic control deep reinforcement learning model, which is obtained by iteratively accumulating the scheduling values from time $t$ and taking the expectation; the Q value is used for reinforcement learning;

S40, based on the traffic control deep reinforcement learning model, normalizing the feature vectors of the training samples, obtaining a sample training traffic control model through a prediction submodule, and completing iterative optimization by a gradient descent method during prediction;

S50, performing deployment prediction with the sample training traffic control model, wherein deployment prediction means inputting real-time environment state information into the sample training traffic control model and predicting the next scheduling option from the model's output.

Preferably, the step S10 includes the following:

S101, marking the abscissa and ordinate of the workshop range and the drivable single-way road paths as $[x_i, y_i]$;

S102, marking the abscissa and ordinate of the starting point and the end point of the AGV driving path as $[x_0, y_0]$ and $[x_n, y_n]$;

S103, marking the path resource number to be traveled by the AGV as $p_j$, the travel speed of the AGV as $v_j$, the type number of the AGV as $a_j$, and the safety distance as $d$.
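As an illustration of the digital marking in S101 to S103, the following is a minimal sketch, assuming the "database" can be modeled as an in-memory object. The class names `AgvTask` and `MapDatabase` and all field names are hypothetical and do not appear in the application; only the symbols in the comments come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[int, int]  # one marked map coordinate [x_i, y_i]


@dataclass
class AgvTask:
    """One AGV path task as marked in S102-S103 (field names are illustrative)."""
    start: Coord               # [x_0, y_0]
    goal: Coord                # [x_n, y_n]
    path_resources: List[int]  # p_j: numbers of the path resources to travel
    speed: float               # v_j
    agv_type: int              # a_j
    safety_distance: float     # d


@dataclass
class MapDatabase:
    """In-memory stand-in for the database of S10 (an assumption of this sketch)."""
    paths: Dict[int, List[Coord]] = field(default_factory=dict)
    tasks: List[AgvTask] = field(default_factory=list)

    def add_path(self, number: int, coords: List[Coord]) -> None:
        # S101: store the coordinates of one drivable single-way road path.
        self.paths[number] = list(coords)

    def add_task(self, task: AgvTask) -> None:
        # Reject tasks that reference path resources that were never marked.
        missing = [p for p in task.path_resources if p not in self.paths]
        if missing:
            raise ValueError(f"unmarked path resources: {missing}")
        self.tasks.append(task)


db = MapDatabase()
db.add_path(1, [(0, 0), (1, 0), (2, 0)])
db.add_path(2, [(2, 0), (2, 1)])
db.add_task(AgvTask(start=(0, 0), goal=(2, 1), path_resources=[1, 2],
                    speed=1.0, agv_type=0, safety_distance=0.5))
```

Validating path resource numbers at insertion time is a design choice of this sketch, not a requirement stated in the application.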

Preferably, the step S20 includes the following:

S201, the real-time environment state information comprises an AGV state and a map state;

the AGV state comprises a departure point, a target point, a running path resource number $p_j$, a running speed $v_j$, a type number $a_j$, and a waiting time $w$;

the map state comprises the abscissa and ordinate $[x_i, y_i]$ of the workshop range and the drivable single-way road paths, the time $t$ at which each path resource is occupied, and the AGV driving safety distance $d$.

S202, integrating the AGV state and the map state to form an AGV collision-free evaluation coefficient;

the AGV collision-free evaluation coefficient is based on the occupation of the single-way road path resources and yields the collision-free departure time of the AGV; it is calculated as follows:

$p = p_k \cap p_j$

$[x, y] = p[0][0]$

$s_j = t[x][y] + d \times v_j - (p_j - p) \times v_j$

$w[x][y] = p \times (v_k - v_j) + C$

where $p$ is the collision path set of the two AGVs; $p_k$ and $p_j$ denote the path resource numbers of the two AGVs respectively; $p[0][0]$ is the first coordinate of the first collision path segment, whether the AGVs travel in the same direction or in opposite directions; $s_j$ is the earliest possible departure time of $a_j$; $w[x][y]$ is the waiting time consumption of $a_j$; and $C$ is a correction coefficient for the waiting time consumption, set by the user according to actual conditions.

S203, comparing the AGV collision-free evaluation coefficient with a preset AGV collision-free evaluation coefficient threshold; if the coefficient exceeds the preset range, the AGV is prone to collision, and the running speed or waiting time of the AGV is adaptively adjusted when a collision risk is imminent.
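The formulas of S202 can be sketched in code as follows. The text uses $p$ and $p_j$ both as path sets and as quantities in arithmetic, so this sketch assumes they enter the arithmetic as segment counts; that reading, the function name `evaluate_collision`, the data layout, and the threshold value are all assumptions for illustration rather than details given in the application.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]
Segment = Tuple[Coord, ...]  # one path resource as an ordered tuple of coordinates


def evaluate_collision(
    p_k: List[Segment], p_j: List[Segment],
    v_k: float, v_j: float, d: float,
    t: Dict[Coord, float], C: float,
) -> Optional[Tuple[Coord, float, float]]:
    """Return ([x, y], s_j, w[x][y]) for AGV a_j against a_k, or None if no shared path."""
    shared = set(p_k)
    # p = p_k ∩ p_j, kept in the order in which a_j travels the segments.
    p = [seg for seg in p_j if seg in shared]
    if not p:
        return None
    xy = p[0][0]  # [x, y] = p[0][0]: first coordinate of the first shared segment
    # ASSUMPTION: p_j and p enter the arithmetic as segment counts.
    n_j, n_p = len(p_j), len(p)
    s_j = t[xy] + d * v_j - (n_j - n_p) * v_j  # earliest possible departure time
    w = n_p * (v_k - v_j) + C                  # waiting time consumption
    return xy, s_j, w


seg_a = ((0, 0), (1, 0))
seg_b = ((1, 0), (2, 0))
seg_c = ((2, 0), (3, 0))
result = evaluate_collision(
    p_k=[seg_b, seg_c], p_j=[seg_a, seg_b],
    v_k=3.0, v_j=2.0, d=0.5, t={(1, 0): 10.0}, C=1.0,
)
xy, s_j, w = result
THRESHOLD = 1.5                   # hypothetical preset threshold for S203
needs_adjustment = w > THRESHOLD  # S203: adjust speed or waiting time if exceeded
```

With these illustrative inputs the shared path is the single segment `seg_b`, so the first collision coordinate is `(1, 0)`.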

Preferably, the step S40 includes the following:

S401, normalizing the feature vector of the training sample to obtain the feature vector of the current input training sample, calculated as follows:

where […] represents the feature vector of the current input sample, $x_{i1}, x_{i2}, \ldots, x_{in}$ represent the currently required input information, and $M$ represents a learnable embedding transformation matrix.

Preferably, the step S40 further includes the following:

S402, inputting the feature vector of the training sample into the prediction submodule to obtain the sample training traffic control model; the prediction submodule process comprises:

performing normalization and a fully connected linear operation on the feature vector of the training sample to finally obtain the sample training traffic control model, by the following formula:

where […] is the normalization result; $z_j$ is the sample training traffic control model after the fully connected linear operation; $W_1$ and $b_1$ are adaptively learnable parameters; $\mathrm{LN}(\alpha_1)$ denotes the layer normalization operation; and […] denotes the layer feedforward neural network operation.
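To make the two operations named in S402 concrete, here is a minimal sketch of layer normalization followed by a fully connected linear layer, in plain Python. It uses the standard textbook definitions of both operations; the exact formula of the application is not reproduced in the text above, so the absence of a learnable gain and bias in the normalization, the value of `eps`, and the example weights are assumptions.

```python
import math
from typing import List


def layer_norm(x: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and unit variance (no gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(x: List[float], W1: List[List[float]], b1: List[float]) -> List[float]:
    """Fully connected layer z = W1 x + b1, with W1 given row by row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W1, b1)]


features = [2.0, 4.0, 6.0]              # illustrative feature vector
normalized = layer_norm(features)       # normalization result
W1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]  # hypothetical learned weights
b1 = [0.0, 1.0]                           # hypothetical learned biases
z = linear(normalized, W1, b1)          # output of the linear operation
```

In a trained model `W1` and `b1` would be fitted by the gradient descent step of S40 rather than fixed by hand.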

Preferably, the step S40 further includes the following:

S403, performing iterative optimization by a gradient descent method during training sample prediction, wherein the iterative optimization conforms to the following expression:

where […] represents the negative direction of the gradient and $C_j$ represents the search step in that direction.

Preferably, the deployment prediction refers to inputting real-time environment state information into the trained traffic control model and, according to the model's output, predicting the multi-objective optimized AGV departure sequence with minimum traffic time and minimum waiting delay, together with the corresponding departure times; to predict the next scheduling option, the feature vector is passed through a single fully connected layer and a Softmax operation in the prediction submodule to obtain a score for each scheduling option, and the scheduling option with the highest score is taken as the prediction of the next scheduling option in the current state.
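The scoring step described above (single fully connected layer, Softmax, highest score wins) can be sketched as follows. The function name `predict_next_option`, the feature values, and the weights are hypothetical; only the sequence of operations comes from the text.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next_option(
    feature: List[float], W: List[List[float]], b: List[float]
) -> Tuple[int, List[float]]:
    """Score every scheduling option and return (index of best option, all scores)."""
    # Single fully connected layer: one logit per scheduling option.
    logits = [sum(w * f for w, f in zip(row, feature)) + bias
              for row, bias in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


feature = [1.0, -1.0]                        # illustrative feature vector
W = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5]]     # hypothetical weights, 3 options
b = [0.0, 0.0, 0.0]
best, probs = predict_next_option(feature, W, b)
```

Here the three logits are 1.0, -1.0, and 1.5, so the third option (index 2) receives the highest score.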

A single-way collision-free AGV multi-target traffic control system comprising:

the traffic control digitizing module is used for digitally marking the single-way path resource and the AGV path task of the target real map and storing the single-way path resource and the AGV path task into a database;

the collision-free evaluation coefficient module is used for acquiring real-time environment state information, calculating the current single-way road path resource occupation condition and acquiring an AGV collision-free evaluation coefficient;

the traffic control depth reinforcement module is used for constructing a traffic control depth reinforcement learning model based on the real-time environmental state information;

the traffic control iteration optimization module is used for carrying out normalization processing on the feature vector of the training sample based on the traffic control deep reinforcement learning model, obtaining a sample training traffic control model through the prediction submodule, and completing iteration optimization by using a gradient descent method in the prediction process;

the traffic control deployment prediction sub-module inputs the real-time environment state information into the sample training traffic control model to conduct deployment prediction, and predicts the next scheduling option according to the output result of the sample training traffic control model.

The application has the technical effects and advantages that:

(1) The application digitally marks the AGV path tasks and the real-time map data to form real-time environment state information, calculates the current occupation of the single-way road path resources, and obtains the AGV collision-free evaluation coefficient, according to which the departure speed and departure time of the AGV are adjusted automatically. The traffic control deep reinforcement learning model performs normalization and fully connected linear operations to finally obtain the sample training traffic control model, and the next scheduling option is predicted from the output of that model. This helps the AGV find the optimal scheduling value in each scheduling task and complete collision-free travel on single-way roads automatically, improving collision-free running efficiency and reducing AGV running cost;

(2) The single-way collision-free AGV multi-target traffic control method uses a resource occupation rule, so that real-time road condition information along the travel path is considered while the AGV runs, reducing the waste of single-way road path resources;

(3) The traffic control method also introduces a deep reinforcement learning traffic control model that attends to global environment information and dynamically feeds back the real-time environment state. It also introduces the AGV collision waiting time, so that traffic collision blocking during AGV travel is taken into account. This better fits AGV traffic control in specific scenarios, and the multi-objective scheduling markedly improves traffic control efficiency.

Drawings

FIG. 1 is a flow chart of the method of the present application.

Fig. 2 is a block diagram of the system architecture of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Traffic control: a manufacturing system contains multiple material transport tasks and multiple AGVs. Under given constraints, the execution sequence and departure times of the AGVs must be specified, and real-time scheduling under dynamic conditions must be performed to complete the material transport tasks, so that the whole system meets given performance optimization targets while remaining collision-free on the single-way road map.

Example 1

Referring to fig. 1-2, the embodiment provides a single-way collision-free AGV multi-target traffic control method, which specifically includes the following steps:

Referring to fig. 1-2, in the method of this embodiment, step S10 includes the following:

In this embodiment, the target real-world map and the path states of the AGVs are digitally marked to form a dynamic path planning map, which facilitates collision-free calculation among multiple AGVs.

Referring to fig. 1-2, in the method of this embodiment, step S20 includes the following:

$p = p_k \cap p_j$

$[x, y] = p[0][0]$

$s_j = t[x][y] + d \times v_j - (p_j - p) \times v_j$

$w[x][y] = p \times (v_k - v_j) + C$

In this embodiment, the path state of the AGV and the map state information are acquired and associated to form the AGV collision-free evaluation coefficient. A suitable waiting-time correction coefficient is selected according to actual conditions, and the coefficient is judged comprehensively. This helps the AGV assess collision risk from the calculation result and trace the cause of that risk, which facilitates further handling by the AGV.
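One possible way to associate the AGV state and the map state into a single numeric vector is sketched below. The function name `encode_state`, the dictionary keys, the vector layout, and the choice to summarize $p_j$ by its length are all assumptions of this sketch; the application only lists which quantities belong to each state.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def encode_state(
    agv: Dict[str, object],
    occupied_until: Dict[Coord, float],
    safety_distance: float,
    map_coords: List[Coord],
) -> List[float]:
    """Flatten one AGV state plus the map state into a numeric vector."""
    sx, sy = agv["start"]  # departure point
    gx, gy = agv["goal"]   # target point
    vec = [
        float(sx), float(sy), float(gx), float(gy),
        float(len(agv["path_resources"])),  # ASSUMPTION: p_j summarized by its length
        float(agv["speed"]),                # v_j
        float(agv["agv_type"]),             # a_j
        float(agv["waiting_time"]),         # w
    ]
    # Map state: occupation time t for every marked coordinate (0.0 when free).
    vec.extend(occupied_until.get(c, 0.0) for c in map_coords)
    vec.append(float(safety_distance))      # d
    return vec


agv = {"start": (0, 0), "goal": (2, 1), "path_resources": [1, 2],
       "speed": 1.5, "agv_type": 0, "waiting_time": 3.0}
map_coords = [(0, 0), (1, 0), (2, 0)]
state = encode_state(agv, {(1, 0): 12.0}, 0.5, map_coords)
```

A fixed coordinate order in `map_coords` keeps the vector layout stable between calls, which matters if the vector is later fed to a learned model.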

S30, constructing a traffic control deep reinforcement learning model based on real-time environment state information;

Referring to fig. 1-2, in the method of this embodiment, step S30 includes the following:

the traffic control deep reinforcement learning model is realized by the following formulas:

$x_i = (S_t, A_t, R_t, S_{t+1})$

$A_t = (a_k)$

[…] indicates the collision waiting loss of the AGV at time $t$;

[…] indicates the total time lost for all AGVs to complete their tasks;

$Q(S_t, A_t)$ represents the traffic control deep reinforcement learning model, which is obtained by iteratively accumulating the scheduling values from time $t$ and taking the expectation; the Q value is used for reinforcement learning.

In this embodiment, a convolutional neural network extracts training sample features from the AGV state and map state data and reduces their dimensionality, forming the traffic control deep reinforcement learning model. This allows different values to be assigned at initialization according to different AGV states, further improves the accuracy and stability of the AGV collision-free data, and reduces AGV collision waiting time and task completion time while keeping path resource waste in view.
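The training sample $x_i = (S_t, A_t, R_t, S_{t+1})$ and the idea of accumulating scheduling values and taking their expectation can be sketched as below. Several points are assumptions made only for illustration: the scheduling value is taken as the negated, unweighted sum of the three losses, $S_{t+1}$ is treated as the next environment state, the action is the index $k$ of the AGV chosen to depart next, and a discount factor `gamma` is introduced. None of these choices is stated in the text above, and the convolutional feature extractor is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One training sample x_i = (S_t, A_t, R_t, S_{t+1})."""
    state: State
    action: int        # ASSUMPTION: index k of the AGV a_k chosen to depart next
    reward: float      # scheduling value R_t
    next_state: State  # ASSUMPTION: environment state after the action


def scheduling_value(path_idle_loss: float, wait_loss: float,
                     total_time_loss: float) -> float:
    # ASSUMPTION: negated, unweighted sum of the three losses named in the text.
    return -(path_idle_loss + wait_loss + total_time_loss)


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Accumulate scheduling values from time t onward: sum of gamma^k * R_{t+k}."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_estimate(episodes: List[List[float]], gamma: float = 0.9) -> float:
    """Monte Carlo estimate of Q(S_t, A_t): mean discounted return over episodes."""
    returns = [discounted_return(r, gamma) for r in episodes]
    return sum(returns) / len(returns)


r0 = scheduling_value(1.0, 2.0, 3.0)
sample = Transition(state=(0.0, 1.0), action=2, reward=r0, next_state=(1.0, 1.0))
g = discounted_return([r0, scheduling_value(0.0, 1.0, 1.0)], gamma=0.5)
q = q_estimate([[-6.0, -2.0], [-4.0, -4.0]], gamma=0.5)
```

A deep reinforcement learning model would replace the Monte Carlo average with a neural network that approximates the same expectation.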

Referring to fig. 1-2, in the method of this embodiment, step S40 includes the following:

where […] represents the feature vector of the current input sample, $x_{i1}, x_{i2}, \ldots, x_{in}$ represent the currently required input information, and $M$ represents a learnable embedding transformation matrix;

S402, inputting the feature vector of the training sample into the prediction submodule to obtain the sample training traffic control model;

the prediction submodule process comprises the following steps:

S403, performing iterative optimization by a gradient descent method during training sample prediction.

The iterative optimization of the scheduling options conforms to the following expression:

where […] represents the negative direction of the gradient and $C_j$ represents the search step in that direction.

In this implementation, the prediction submodule obtains the sample training traffic control model by the gradient descent method, iteratively optimizing the model along the negative direction of the gradient. The gradient direction is obtained through normalization processing, and the search step is determined by a line search algorithm: the score coordinate of the next point is regarded as $z_{k+1}$, and the $z_{k+1}$ that minimizes $f(z_{k+1})$ is then found. The sample training traffic control model is thus obtained through iterative optimization by gradient descent, so that the optimal solution of the objective function is found and the optimal AGV scheduling value is obtained.
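The combination of gradient descent with a line search for the step can be sketched as follows. This sketch substitutes a backtracking (Armijo) line search for the exact minimization of $f(z_{k+1})$ described above, and it does not normalize the gradient; the function name, the parameters `shrink` and `armijo`, and the quadratic test objective are all assumptions for illustration.

```python
from typing import Callable, List

Vector = List[float]


def gradient_descent(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    z0: Vector,
    iters: int = 100,
    shrink: float = 0.5,
    armijo: float = 1e-4,
) -> Vector:
    """Minimize f from z0 by moving along the negative gradient direction."""
    z = list(z0)
    for _ in range(iters):
        g = grad(z)
        g_sq = sum(gi * gi for gi in g)
        if g_sq < 1e-18:  # gradient vanished: stationary point reached
            break
        step = 1.0        # candidate search step along the negative gradient
        while True:
            z_next = [zi - step * gi for zi, gi in zip(z, g)]
            # Accept the step once f has decreased sufficiently (Armijo rule).
            if f(z_next) <= f(z) - armijo * step * g_sq or step < 1e-12:
                break
            step *= shrink
        z = z_next
    return z


def objective(z: Vector) -> float:
    return (z[0] - 3.0) ** 2 + 2.0 * (z[1] + 1.0) ** 2


def objective_grad(z: Vector) -> Vector:
    return [2.0 * (z[0] - 3.0), 4.0 * (z[1] + 1.0)]


z_star = gradient_descent(objective, objective_grad, [0.0, 0.0])
```

Backtracking is used here because it needs only function evaluations; an exact line search would instead solve a one-dimensional minimization at every iteration.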

In this embodiment, it should be noted that the deployment prediction refers to inputting real-time environment state information into the trained traffic control model and, according to the model's output, predicting the multi-objective optimized AGV departure sequence with minimum traffic time and minimum waiting delay, together with the corresponding departure times; to predict the next scheduling option, the feature vector is passed through a single fully connected layer and a Softmax operation in the prediction submodule to obtain a score for each scheduling option, and the scheduling option with the highest score is taken as the prediction of the next scheduling option in the current state.

Example 2

Referring to fig. 1-2, the present embodiment provides an AGV multi-target traffic control system with single-way collision free, which specifically includes the following contents:

In summary, the AGV path tasks and the real-time map data are digitally marked to form real-time environment state information, the current occupation of the single-way road path resources is calculated, and the AGV collision-free evaluation coefficient is obtained, according to which the departure speed and departure time of the AGV are adjusted automatically. The traffic control deep reinforcement learning model performs normalization and fully connected linear operations to finally obtain the sample training traffic control model, and the next scheduling option is predicted from the output of that model. The AGV can thus find the optimal scheduling value in each scheduling task and complete collision-free travel on single-way roads automatically, improving collision-free running efficiency and reducing AGV running cost.

It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present application, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.

Other embodiments or specific implementations of a single-way collision-free AGV multi-target traffic control method and system according to the present application may refer to the above method embodiments, and are not described herein.

Finally, the foregoing describes only preferred embodiments of the application and is not intended to limit the application to the precise form disclosed; any modifications, equivalents, and alternatives falling within the spirit and principles of the application are intended to be included within its scope.

Claims

1. A single-way collision-free AGV multi-target traffic control method is characterized in that: the method comprises the following steps:

$x_i = (S_t, A_t, R_t, S_{t+1})$

$A_t = (a_k)$

[…] indicates the collision waiting loss of the AGV at time $t$;

[…] indicates the total time lost for all AGVs to complete their tasks;

2. The single-way collision-free AGV multi-target traffic control method according to claim 1, wherein: the step S10 includes the following:

3. The single-way collision-free AGV multi-target traffic control method according to claim 1, wherein: the step S20 includes the following:

$p = p_k \cap p_j$

$[x, y] = p[0][0]$

$s_j = t[x][y] + d \times v_j - (p_j - p) \times v_j$

$w[x][y] = p \times (v_k - v_j) + C$

4. The single-way collision-free AGV multi-target traffic control method according to claim 1, wherein: the step S40 includes the following:

5. The single-way collision-free AGV multi-target traffic control method according to claim 1, wherein: the step S40 further includes the following:

S402, inputting the feature vector of the training sample into the prediction submodule to obtain the sample training traffic control model; the prediction submodule process comprises the following steps:

6. A method of single-way collision-free AGV multi-target traffic control as in claim 5, wherein: the step S40 further includes the following:

7. The single-way collision-free AGV multi-target traffic control method according to claim 1, wherein: the deployment prediction refers to inputting real-time environment state information into the trained traffic control model and, according to the model's output, predicting the multi-objective optimized AGV departure sequence with minimum traffic time and minimum waiting delay, together with the corresponding departure times; to predict the next scheduling option, the feature vector is passed through a single fully connected layer and a Softmax operation in the prediction submodule to obtain a score for each scheduling option, and the scheduling option with the highest score is taken as the prediction of the next scheduling option in the current state.

8. A single-way collision-free AGV multi-target traffic control system comprising: