CN117035207A - Method for planning path of securicar, method and device for training path prediction model

Info

Publication number: CN117035207A
Application number: CN202310789095.9A
Authority: CN
Inventors: 皮文倩; 李岩; 姚一泽; 傅亚敏; 张靖羚; 张思秦; 郝佳; 舒欣
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2023-11-10

Abstract

The application relates to a path planning method for banknote transport vehicles (securicars), a training method for a path prediction model, and corresponding devices, which can be used in the financial field or in other fields, such as the technical field of machine learning. The method comprises: acquiring outlet information of the outlets (network points) in an area to be planned, a total banknote transport amount, and the vehicle capacity of each banknote transport vehicle, wherein the outlet information of each outlet at least comprises transportation demand information, surrounding traffic condition information and security level information of the outlet, and distance information between the outlet and the other outlets in the area to be planned; extracting outlet features of the outlets from the outlet information through a pre-trained path prediction model; determining, according to the outlet features, the target outlets of each banknote transport vehicle under the constraints of the total banknote transport amount and the vehicle capacity; and combining the target outlets of each banknote transport vehicle in the order in which they were determined to obtain the planned path of that vehicle. With this method, the planned path of each banknote transport vehicle can be determined accurately.

Description

Method for planning path of securicar, method and device for training path prediction model

Technical Field

The present application relates to the technical field of machine learning, and in particular to a path planning method for banknote transport vehicles (securicars), a training method for a path prediction model, corresponding devices, a computer device, a storage medium, and a computer program product.

Background

Every day, bank outlets rely on banknote transport vehicles to deliver and collect the cash required for the day, so that customers' demand for funds can be met. The capacity, number and driving paths of the banknote transport vehicles directly determine the time and cost of banknote transport.

In the conventional technology, a general-purpose path planning algorithm is usually used to plan vehicle paths. However, such algorithms consider only a limited set of factors, so in complex scenarios the accuracy of the planned path is low.

Disclosure of Invention

In view of the above technical problem of low path planning accuracy, it is necessary to provide a path planning method for banknote transport vehicles, a training method for a path prediction model, corresponding devices, a computer-readable storage medium and a computer program product.

In a first aspect, the present application provides a path planning method for banknote transport vehicles. The method comprises the following steps:

acquiring outlet information of the outlets (network points) in an area to be planned, a total banknote transport amount, and the vehicle capacity of each banknote transport vehicle; the outlet information of each outlet at least comprises transportation demand information, surrounding traffic condition information and security level information of the outlet, and distance information between the outlet and the other outlets in the area to be planned;

extracting outlet features of the outlets from the outlet information through a pre-trained path prediction model;

determining, according to the outlet features, the target outlets of each banknote transport vehicle under the constraints of the total banknote transport amount and the vehicle capacity;

and combining the target outlets of each banknote transport vehicle in the order in which they were determined to obtain the planned path of that vehicle.

In one embodiment, determining, according to the outlet features, the target outlets of each banknote transport vehicle under the constraints of the total banknote transport amount and the vehicle capacity comprises:

for each banknote transport vehicle, taking the starting outlet in the area to be planned as the current outlet of the vehicle, and taking the outlets other than the current outlet as the other outlets;

determining, according to the outlet features, the probability that each of the other outlets is the next outlet after the current outlet under the constraints of the total banknote transport amount and the vehicle capacity of the vehicle;

selecting the outlet with the highest probability from the other outlets as the target outlet, updating the other outlets according to the target outlet, and updating the total banknote transport amount and the vehicle capacity of the vehicle according to the transportation demand information of the target outlet;

and taking the target outlet as the new current outlet, and returning to the step of determining the probability that each of the other outlets is the next outlet after the current outlet, until the vehicle capacity of the vehicle meets a preset capacity condition.

In one embodiment, determining, according to the outlet features, the probability that each of the other outlets is the next outlet after the current outlet under the constraints of the total banknote transport amount and the vehicle capacity of each banknote transport vehicle comprises:

updating, through a multi-head attention mechanism layer in the pre-trained path prediction model, the outlet feature of the current outlet to obtain an updated feature of the current outlet, and updating the outlet feature of the previous target outlet of the current outlet to obtain an updated feature of the previous target outlet;

and determining, based on the updated feature of the current outlet and the updated feature of the previous target outlet, the probability that each of the other outlets is the next outlet after the current outlet under the constraints of the total banknote transport amount and the vehicle capacity of each banknote transport vehicle.

In one embodiment, determining, based on the updated feature of the current outlet and the updated feature of the previous target outlet, the probability that each of the other outlets is the next outlet after the current outlet under the constraints of the total banknote transport amount and the vehicle capacity of each banknote transport vehicle comprises:

determining an attention mask according to the total banknote transport amount, the vehicle capacity of each banknote transport vehicle and all target outlets; the attention mask represents the constraint information associated with the total banknote transport amount, the vehicle capacity of each banknote transport vehicle and all target outlets;

and determining, based on the updated feature of the current outlet, the updated feature of the previous target outlet and the attention mask, the probability that each of the other outlets is the next outlet after the current outlet.
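The role of the attention mask can be illustrated with a minimal sketch. The source does not specify how the mask is encoded, so the rule used here (an outlet is masked if it has already been selected, or if its demand exceeds the remaining vehicle capacity or the remaining total transport amount) and all names such as `build_mask` and `masked_softmax` are assumptions made for illustration only.

```python
import math

def build_mask(demands, visited, remaining_capacity, remaining_total):
    """Return True for every outlet that must NOT be selected next (assumed rule)."""
    mask = []
    for i, demand in enumerate(demands):
        infeasible = (
            i in visited                      # already a target outlet
            or demand > remaining_capacity    # does not fit in this vehicle
            or demand > remaining_total       # exceeds what is left to transport
        )
        mask.append(infeasible)
    return mask

def masked_softmax(scores, mask):
    """Softmax over the unmasked scores; masked outlets get probability 0."""
    kept = [s for s, m in zip(scores, mask) if not m]
    if not kept:
        return [0.0] * len(scores)
    top = max(kept)  # subtract the maximum for numerical stability
    exps = [0.0 if m else math.exp(s - top) for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: index 0 is the starting outlet.
demands = [0.0, 5.0, 2.0, 3.0]
mask = build_mask(demands, visited={0}, remaining_capacity=3.0, remaining_total=10.0)
probs = masked_softmax([0.1, 2.0, 0.5, 0.5], mask)
```

In this example the outlet with a demand of 5 units is masked even though it has the highest raw score, so the probability mass is shared between the two outlets that still fit in the vehicle.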

In one embodiment, extracting the outlet features of the outlets from the outlet information through the pre-trained path prediction model comprises:

extracting, through the pre-trained path prediction model, a transportation demand feature, a surrounding traffic condition feature and a security level feature of each outlet, and a distance feature between the outlet and the other outlets in the area to be planned, from the outlet information of the outlet;

and fusing the transportation demand feature, the surrounding traffic condition feature, the security level feature and the distance feature of the outlet to obtain the outlet feature of the outlet.
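One possible reading of the fusion step is sketched below. The source only states that the features are fused; concatenation followed by a linear projection is an assumption, and the function name `fuse_features`, the weights and the example values are hypothetical.

```python
def fuse_features(demand, traffic, security, distances, weights, bias):
    """Fuse per-outlet features by concatenation plus a linear projection.

    The fusion operator is an assumption; the source only says "fusion".
    """
    raw = [demand, traffic, security] + list(distances)
    if any(len(row) != len(raw) for row in weights):
        raise ValueError("each weight row must match the concatenated feature length")
    return [sum(w * x for w, x in zip(row, raw)) + b
            for row, b in zip(weights, bias)]

# Hypothetical example: one outlet, two other outlets, a 2-dimensional fused feature.
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],   # first output copies the demand
    [0.0, 1.0, 1.0, 0.5, 0.5],   # second output mixes the remaining features
]
bias = [0.0, 0.0]
outlet_feature = fuse_features(5.0, 0.25, 0.75, [2.0, 4.0], weights, bias)
```

In a trained model the projection weights would be learned; they are fixed by hand here only to keep the sketch deterministic.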

In a second aspect, the present application further provides a training method for a path prediction model. The method comprises the following steps:

acquiring sample outlet information of the sample outlets in a sample area; the sample outlet information comprises transportation demand information, surrounding traffic condition information and security level information of each sample outlet, and distance information between the sample outlet and the other sample outlets in the sample area;

inputting the sample outlet information into a path prediction model to be trained to obtain planned paths of the banknote transport vehicles;

determining gradient update information for the path prediction model to be trained by using a reinforcement learning model;

adjusting the model parameters of the path prediction model to be trained according to the gradient update information to obtain an adjusted path prediction model;

taking the adjusted path prediction model as the new path prediction model to be trained, and returning to the step of inputting the sample outlet information into the path prediction model to be trained to obtain planned paths of the banknote transport vehicles, until a training end condition is met;

and taking the path prediction model that meets the training end condition as the trained path prediction model.
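The training loop above can be sketched on a toy problem. The source does not name the reinforcement learning algorithm, so the sketch assumes REINFORCE with a moving-average baseline, a reward equal to the negative path length, a fixed number of steps as the training end condition, and a deliberately tiny policy with a single parameter `w` that scores the next outlet as `-w * distance`. None of these choices comes from the source, and capacity constraints are left out for brevity.

```python
import math
import random

def sample_route(w, dist, rng):
    """Sample a route starting at outlet 0; return it with d(log-prob)/dw."""
    current, unvisited = 0, list(range(1, len(dist)))
    route, grad_logp = [0], 0.0
    while unvisited:
        d = [dist[current][j] for j in unvisited]
        scores = [-w * x for x in d]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        k = rng.choices(range(len(unvisited)), weights=probs)[0]
        # Gradient of log-softmax w.r.t. w for scores s_j = -w * d_j.
        grad_logp += -d[k] + sum(p * x for p, x in zip(probs, d))
        current = unvisited.pop(k)
        route.append(current)
    return route, grad_logp

def route_length(route, dist):
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def train(dist, steps=200, lr=0.01, seed=0):
    """Repeat: plan a path, compute the gradient update, adjust the parameter."""
    rng = random.Random(seed)
    w, baseline = 0.0, None
    for _ in range(steps):                      # assumed training end condition
        route, grad_logp = sample_route(w, dist, rng)
        reward = -route_length(route, dist)     # assumed reward: shorter is better
        advantage = 0.0 if baseline is None else reward - baseline
        w += lr * advantage * grad_logp         # gradient ascent on expected reward
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    return w

# Hypothetical symmetric distance matrix; outlet 0 is the starting outlet.
dist = [
    [0, 2, 5, 4],
    [2, 0, 4, 3],
    [5, 4, 0, 1],
    [4, 3, 1, 0],
]
w = train(dist)
route, _ = sample_route(w, dist, random.Random(1))
```

The structure mirrors the listed steps: each iteration produces a planned path, derives gradient update information from the reward, and adjusts the model parameter before the next iteration.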

In a third aspect, the present application further provides a path planning device for banknote transport vehicles. The device comprises:

an information acquisition module, used for acquiring outlet information of the outlets in an area to be planned, a total banknote transport amount, and the vehicle capacity of each banknote transport vehicle; the outlet information of each outlet at least comprises transportation demand information, surrounding traffic condition information and security level information of the outlet, and distance information between the outlet and the other outlets in the area to be planned;

a feature extraction module, used for extracting outlet features of the outlets from the outlet information through a pre-trained path prediction model;

an outlet determining module, used for determining, according to the outlet features, the target outlets of each banknote transport vehicle under the constraints of the total banknote transport amount and the vehicle capacity;

and a path determining module, used for combining the target outlets of each banknote transport vehicle in the order in which they were determined to obtain the planned path of that vehicle.

In a fourth aspect, the present application further provides a training device for a path prediction model. The device comprises:

a sample acquisition module, used for acquiring sample outlet information of the sample outlets in a sample area; the sample outlet information comprises transportation demand information, surrounding traffic condition information and security level information of each sample outlet, and distance information between the sample outlet and the other sample outlets in the sample area;

a sample input module, used for inputting the sample outlet information into a path prediction model to be trained to obtain planned paths of the banknote transport vehicles;

a reinforcement learning module, used for determining gradient update information for the path prediction model to be trained by using a reinforcement learning model;

a parameter adjustment module, used for adjusting the model parameters of the path prediction model to be trained according to the gradient update information to obtain an adjusted path prediction model;

a training iteration module, used for taking the adjusted path prediction model as the new path prediction model to be trained, and returning to the step of inputting the sample outlet information into the path prediction model to be trained to obtain planned paths of the banknote transport vehicles, until a training end condition is met;

and a training completion module, used for taking the path prediction model that meets the training end condition as the trained path prediction model.

In a fifth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:

In a sixth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:

and taking the path prediction model reaching the training ending condition as a pre-trained path prediction model.

In a seventh aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:

In an eighth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:

In a ninth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

In a tenth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

In the above path planning method for banknote transport vehicles, outlet information of the outlets in the area to be planned, the total banknote transport amount and the vehicle capacity of each banknote transport vehicle are first acquired, the outlet information of each outlet at least comprising transportation demand information, surrounding traffic condition information and security level information of the outlet, and distance information between the outlet and the other outlets in the area to be planned. Compared with path planning for general goods transport, path planning for banknote transport vehicles must take more security factors into account; using the security level information of each outlet as part of the outlet information makes it possible to obtain a planned path that meets these requirements. Next, outlet features are extracted from the outlet information through the pre-trained path prediction model, and the target outlets of each banknote transport vehicle are determined according to the outlet features under the constraints of the total banknote transport amount and the vehicle capacity. Because the target outlets are determined from outlet features derived from the outlet information, they satisfy the transport efficiency and scenario requirements of banknote transport path planning. Finally, the target outlets of each banknote transport vehicle are combined in the order in which they were determined to obtain the planned path of that vehicle. By using the outlet information together with the pre-trained path prediction model, the planned path of each banknote transport vehicle can be determined accurately, achieving transport that is efficient, secure and compliant with requirements.
In addition, the path prediction model is obtained through the following training steps. First, sample outlet information of the sample outlets in a sample area is acquired, the sample outlet information comprising transportation demand information, surrounding traffic condition information and security level information of each sample outlet, and distance information between the sample outlet and the other sample outlets in the sample area. Next, the sample outlet information is input into the path prediction model to be trained to obtain planned paths of the banknote transport vehicles. Gradient update information for the path prediction model to be trained is then determined by using a reinforcement learning model, and the model parameters are adjusted according to the gradient update information to obtain an adjusted path prediction model; through this reinforcement learning optimization process the model parameters are adjusted continuously, so that the model predicts paths more accurately and produces results that better meet requirements. The adjusted path prediction model is taken as the new path prediction model to be trained, and the step of inputting the sample outlet information into the path prediction model to be trained is repeated until a training end condition is met. Finally, the path prediction model that meets the training end condition is taken as the trained path prediction model.
In this method, path planning comprehensively considers multiple factors, including the transportation demand information, surrounding traffic condition information and security level information of each outlet, the distance information between the outlet and the other outlets in the area to be planned, the total banknote transport amount, and the vehicle capacity of each banknote transport vehicle. This helps determine the planned path accurately and thereby improves path planning accuracy. In addition, because the planned path of each banknote transport vehicle is determined automatically by the pre-trained path prediction model, path planning is carried out intelligently, which can improve transport efficiency, reduce cost and transport risk, satisfy specific transport requirements and constraints, and help optimize path planning decisions for banknote transport vehicles.

Drawings

FIG. 1 is a flow chart of a method for path planning for a securicar in one embodiment;

FIG. 2 is a flow chart of the steps for determining the target outlets of each banknote transport vehicle in one embodiment;

FIG. 3 is a flow diagram of a method of training a path prediction model in one embodiment;

FIG. 4 is a block diagram of an embodiment of a securicar path planning apparatus;

FIG. 5 is a block diagram of a training apparatus of a path prediction model in one embodiment;

FIG. 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

It should be noted that the path planning method for banknote transport vehicles and the training method and device for the path prediction model provided by the present application can be used in the financial field, for example to plan the paths of banknote transport vehicles automatically in a banknote transport scenario; they can also be used in fields other than the financial field, such as the technical field of machine learning. The present application does not limit the fields in which the path planning method and the training method and device are applied.

It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.

In one embodiment, as shown in FIG. 1, a path planning method for banknote transport vehicles is provided. This embodiment is described by taking the application of the method to a terminal as an example. It can be understood that the method may also be applied to a server, or to a system comprising a terminal and a server, in which case it is implemented through interaction between the terminal and the server. The terminal may be, but is not limited to, a personal computer, notebook computer, smartphone or tablet computer. The server may be implemented as a stand-alone server or as a server cluster composed of multiple servers. In this embodiment, the method includes the following steps:

Step S101, acquiring outlet information of the outlets (network points) in an area to be planned, a total banknote transport amount, and the vehicle capacity of each banknote transport vehicle.

The outlet information of each outlet at least comprises transportation demand information, surrounding traffic condition information and security level information of the outlet, and distance information between the outlet and the other outlets in the area to be planned.

Illustratively, the terminal may obtain the outlet information of the outlets in the area to be planned from user input, or may query pre-stored outlet information from a database. The transportation demand information is the quantity of banknotes that the banknote transport vehicle needs to transport for the outlet; the surrounding traffic condition information may be historical average data for the outlet or real-time data; the security level information may be determined from the types of roads around the outlet, pedestrian flow information, and the like. The outlets include a starting outlet, namely the departure point of the banknote transport vehicles; the outlet information of the starting outlet may include only the distance information between the starting outlet and each of the other outlets.
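As an illustration of the inputs acquired in step S101, the outlet information could be held in a small record type. The class name `OutletInfo`, its field names, and the use of numeric scores for traffic and security level are assumptions; the source does not prescribe a data layout or units.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class OutletInfo:
    """Per-outlet planning inputs (all field names are hypothetical)."""
    outlet_id: str
    demand: float = 0.0           # quantity of banknotes to transport, arbitrary units
    traffic: float = 0.0          # surrounding traffic condition score (assumed numeric)
    security_level: float = 0.0   # security level score (assumed numeric)
    distances: Dict[str, float] = field(default_factory=dict)  # distance to other outlets

def make_starting_outlet(outlet_id, distances):
    """The starting outlet may carry distance information only."""
    return OutletInfo(outlet_id=outlet_id, distances=dict(distances))

# Hypothetical example values.
start = make_starting_outlet("start", {"A": 2.0, "B": 5.0})
outlet_a = OutletInfo("A", demand=5.0, traffic=0.3, security_level=0.8,
                      distances={"start": 2.0, "B": 4.0})
```

Keeping the starting outlet in the same record type, with the demand-related fields left at their defaults, lets later steps treat all outlets uniformly.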

Step S102, extracting outlet features of the outlets from the outlet information through a pre-trained path prediction model.

The path prediction model may be a neural network model that includes a multi-head attention mechanism layer.

Illustratively, the terminal processes the outlet information with the pre-trained path prediction model, in which a Transformer model (a neural network architecture based on the attention mechanism) may be used to extract features from the outlet information, yielding outlet features suitable for the multi-head attention mechanism layer.

Step S103, determining, according to the outlet features, the target outlets of each banknote transport vehicle under the constraints of the total banknote transport amount and the vehicle capacity.

Illustratively, within the path prediction model, the terminal first masks (removes) the outlets that do not satisfy the constraints of the total banknote transport amount and the vehicle capacity, and then determines, one after another according to the outlet features, the target outlets of the banknote transport vehicle that meet the path planning requirements, until the vehicle capacity of the banknote transport vehicle meets a preset capacity condition.

Step S104, combining the target outlets of each banknote transport vehicle in the order in which they were determined to obtain the planned path of that vehicle.

Illustratively, by connecting the target outlets of each banknote transport vehicle in the order in which they were determined, the planned path of each vehicle starting from the starting outlet is obtained.

It should be noted that delivering banknotes to the outlets and collecting banknotes from the outlets can be treated as the same planning problem, so the above method can be used for path planning in either case; for ease of understanding or application, the user may convert one case into the other.

In the above path planning method for banknote transport vehicles, outlet information of the outlets in the area to be planned, the total banknote transport amount and the vehicle capacity of each banknote transport vehicle are first acquired, the outlet information of each outlet at least comprising transportation demand information, surrounding traffic condition information and security level information of the outlet, and distance information between the outlet and the other outlets in the area to be planned. Compared with path planning for general goods transport, path planning for banknote transport vehicles must take more security factors into account; using the security level information of each outlet as part of the outlet information makes it possible to obtain a planned path that meets these requirements. Next, outlet features are extracted from the outlet information through the pre-trained path prediction model, and the target outlets of each banknote transport vehicle are determined according to the outlet features under the constraints of the total banknote transport amount and the vehicle capacity. Because the target outlets are determined from outlet features derived from the outlet information, they satisfy the transport efficiency and scenario requirements of banknote transport path planning. Finally, the target outlets of each banknote transport vehicle are combined in the order in which they were determined to obtain the planned path of that vehicle.
In this method, path planning comprehensively considers multiple factors, including the transportation demand information, surrounding traffic condition information and security level information of each outlet, the distance information between the outlet and the other outlets in the area to be planned, the total banknote transport amount, and the vehicle capacity of each banknote transport vehicle. This helps determine the planned path accurately and thereby improves path planning accuracy. In addition, because the planned path of each banknote transport vehicle is determined automatically by the pre-trained path prediction model, path planning is carried out intelligently, which can improve transport efficiency, reduce cost and transport risk, satisfy specific transport requirements and constraints, and help optimize path planning decisions for banknote transport vehicles.

In one embodiment, as shown in FIG. 2, step S103 of determining, according to the outlet features, the target outlets of each banknote transport vehicle under the constraints of the total banknote transport amount and the vehicle capacity may be implemented through the following steps:

Step S201, for each banknote transport vehicle, taking the starting outlet in the area to be planned as the current outlet of the vehicle, and taking the outlets other than the current outlet as the other outlets;

Step S202, determining, according to the outlet features, the probability that each of the other outlets is the next outlet after the current outlet under the constraints of the total banknote transport amount and the vehicle capacity of the vehicle;

Step S203, selecting the outlet with the highest probability from the other outlets as the target outlet, updating the other outlets according to the target outlet, and updating the total banknote transport amount and the vehicle capacity of the vehicle according to the transportation demand information of the target outlet;

Step S204, taking the target outlet as the new current outlet, and returning to step S202 of determining the probability that each of the other outlets is the next outlet after the current outlet, until the vehicle capacity of the vehicle meets a preset capacity condition.

Illustratively, the number of banknote transport vehicles may be fixed: before path planning, the user inputs the number of vehicles to be used, and the terminal then plans the paths of the vehicles one by one until the paths of all vehicles have been planned. Alternatively, the number may be determined dynamically by the path prediction model: the model plans paths one vehicle at a time, and when the path of the current vehicle is complete but the transportation demands of some outlets remain unmet, another vehicle is added and its path is planned, until the transportation demands of all outlets are met. For each vehicle, path planning begins with the starting outlet (the departure point) as the current outlet. In the softmax (normalized exponential) layer of the path prediction model, the probability that each of the other outlets is the next outlet after the current outlet is determined from the outlet features of all outlets; the outlet with the highest probability is selected as the next outlet (the target outlet), and the total banknote transport amount and the vehicle capacity of the vehicle are updated according to the outlet information of the target outlet. The target outlet then becomes the new current outlet, and the probabilities are determined again from the outlet features, until the vehicle capacity meets the preset capacity condition (that is, the vehicle capacity is used to the greatest possible extent). Before the probabilities are determined each time, the outlets that do not satisfy the constraints of the total banknote transport amount and the vehicle capacity are masked (removed). For example, if the transportation demand of outlet A is 5 units of banknotes but the banknote transport vehicle currently has a remaining capacity of only 3 units, outlet A must be masked.
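The loop of steps S201 to S204, in the variant where vehicles are added dynamically, can be sketched as follows. The function `next_outlet_probs` is only a stand-in for the softmax layer of the trained model: it assumes that closer outlets score higher, which is not stated in the source. The names `plan_routes`, `demands`, `dist` and `capacity` are likewise hypothetical.

```python
import math

def next_outlet_probs(current, candidates, dist):
    """Stand-in for the model's softmax layer: closer outlets score higher (assumption)."""
    scores = {j: -dist[current][j] for j in candidates}
    top = max(scores.values())
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

def plan_routes(demands, dist, capacity, start=0):
    """Plan one route per vehicle, adding vehicles until all demand is served."""
    unserved = {i for i, d in enumerate(demands) if i != start and d > 0}
    routes = []
    while unserved:
        remaining, current, route = capacity, start, [start]
        while True:
            # Mask outlets whose demand exceeds the remaining vehicle capacity.
            feasible = [j for j in sorted(unserved) if demands[j] <= remaining]
            if not feasible:
                break
            probs = next_outlet_probs(current, feasible, dist)
            target = max(feasible, key=lambda j: probs[j])  # highest probability
            route.append(target)
            unserved.remove(target)
            remaining -= demands[target]
            current = target
        if len(route) == 1:
            raise ValueError("an outlet's demand exceeds the vehicle capacity")
        routes.append(route)
    return routes

# Hypothetical example: outlet 0 is the starting outlet.
demands = [0, 5, 2, 3]
dist = [
    [0, 2, 5, 4],
    [2, 0, 4, 3],
    [5, 4, 0, 1],
    [4, 3, 1, 0],
]
routes = plan_routes(demands, dist, capacity=8)
```

With a capacity of 8 units, the first vehicle visits outlets 1 and 3 and is then full, so outlet 2 is masked for it and served by a second vehicle.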

In this embodiment, the probability of each outlet being the next outlet to visit is computed from the outlet features, and the outlet with the highest probability is selected as the target outlet, which enables effective path planning for the vehicle. Updating the outlet information and the transportation demand information also allows the total banknote transport amount and the vehicle capacity of each banknote transport vehicle to be adjusted dynamically to match actual demand. Planning the paths of the banknote transport vehicles one by one yields the path planning result for the whole area to be planned, so that paths satisfying both the transportation demands and the vehicle capacity limits are planned effectively, improving banknote transport efficiency and resource utilization.

In one embodiment, in step S202, determining, according to the network point features, the probability that each other network point is the next network point of the current network point under the constraints of the total banknote transport amount and the vehicle capacity of each securicar further includes: updating, through a multi-head attention mechanism layer in the pre-trained path prediction model, the network point features of the current network point to obtain the update features of the current network point, and updating the network point features of the last target network point of the current network point to obtain the update features of the last target network point; and determining, based on the update features of the current network point and the update features of the last target network point, the probability that each other network point is the next network point of the current network point under the constraints of the total banknote transport amount and the vehicle capacity of each securicar.

Based on a multi-head attention mechanism layer in the path prediction model, the terminal uses a Transformer model to calculate the query, key and value of the attention mechanism for all network points, and updates the network point features of the current network point according to the attention weights obtained from the queries of the network points other than the current network point, obtaining a sub-update feature of the current network point. In the multi-head attention mechanism, a plurality of Transformer models with different parameter settings are used to determine a plurality of sub-update features of the current network point, and all the sub-update features are fused to obtain the update features of the current network point. Similarly, in another multi-head attention mechanism layer, a plurality of Transformer models with different parameter settings are used to update the network point features of the last target network point. In the third multi-head attention mechanism layer, a plurality of Transformer models with different parameter settings are likewise used; the output of this layer is obtained from the update features of the current network point and the update features of the last target network point, and the probability of each other network point being the next network point of the current network point is determined through a softmax (normalized exponential) layer. It should be noted that the output of each multi-head attention mechanism layer is processed by a batch normalization layer, and the batch-normalized result is used as the input of the next network layer.
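The feature update performed by one multi-head attention layer can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: the function names are hypothetical, each head works on a slice of the feature vector instead of applying its own learned query, key and value projections, fusion is done by concatenation, and the batch normalization step is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def multi_head_update(current: Vector, others: List[Vector], num_heads: int) -> Vector:
    """Update `current` by attending over `others`, one feature slice per head."""
    if len(current) % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    head_dim = len(current) // num_heads
    fused: Vector = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        slices = [o[lo:hi] for o in others]
        # Each head yields one sub-update feature; fuse them by concatenation.
        fused.extend(attend(current[lo:hi], slices, slices))
    return fused
```

In a trained model each head would have its own projection parameters, which is what lets different heads attend to different relations between network points.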

In this embodiment, updating the features through a multi-head attention mechanism allows the probability of each network point being the next network point of the current network point to be calculated more accurately. By using the update features of the current network point and of the last target network point, the attention mechanism can better capture the correlation between network points and their relative importance, thereby improving the accuracy of path planning.

In one embodiment, the determining, by the terminal, the probability that the other mesh point is the next mesh point of the current mesh point under the constraint of the total banknote transporting amount and the vehicle capacity of each banknote transporting vehicle based on the update feature of the current mesh point and the update feature of the last target mesh point includes: determining an attention mask according to the total banknote transport amount, the vehicle capacity of each banknote transport vehicle and all target network points; the attention mask is used for representing constraint information associated with the total banknote quantity, the vehicle capacity of each banknote carrying vehicle and all target network points; based on the updated characteristics of the current mesh point, the updated characteristics of the last target mesh point and the attention mask, the probability that other mesh points are the next mesh point of the current mesh point is determined.

Illustratively, a masked attention mechanism is employed to incorporate the constraints into the path prediction model. That is, an attention mask (a mask matrix for attention) is determined from the total banknote transport amount, the vehicle capacity of each securicar and all target network points, so as to mask (remove) network points that do not satisfy the constraints, such as network points that have already been planned (all target network points) and network points whose transportation demand exceeds the current remaining capacity of the securicar. In a multi-head attention mechanism layer of the path prediction model, the network points are screened according to the attention mask, and the probability of each other network point being the next network point of the current network point is then determined from the update features of the current network point and the update features of the last target network point.
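A minimal sketch of how such a mask might be built and applied before the softmax is given below. The helper names `build_mask` and `masked_softmax` are hypothetical, and the mask is represented as a simple list of booleans for one decoding step rather than as a full mask matrix.

```python
import math
from typing import Dict, List, Sequence, Set


def build_mask(
    points: Sequence[str],
    demands: Dict[str, float],
    visited: Set[str],
    capacity_left: float,
) -> List[bool]:
    """Return True for every network point that must be masked."""
    return [p in visited or demands[p] > capacity_left for p in points]


def masked_softmax(scores: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax that assigns zero probability to masked network points."""
    kept = [s for s, m in zip(scores, mask) if not m]
    if not kept:
        raise ValueError("every network point is masked")
    top = max(kept)  # subtract the maximum for numerical stability
    exps = [0.0 if m else math.exp(s - top) for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, with demands of 5, 3 and 2 units for network points A, B and C, a remaining capacity of 3 units and C already planned, A and C are masked and B receives the whole probability mass.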

In this embodiment, introducing the attention mask integrates the constraint information into the path planning process, ensuring that the generated path satisfies the total banknote transport amount, the vehicle capacity and the other constraints. Under the attention mechanism, the terminal adjusts the selection probabilities of the network points according to the update features and the attention mask, so that the planned path conforms to actual requirements and limitations.

In one embodiment, step S102, extracting the network point features of the network points from the network point information of the network points through a pre-trained path prediction model, includes: extracting, from the network point information of a network point through the pre-trained path prediction model, transportation demand features, surrounding traffic condition features, security level features, and distance features between the network point and other network points in the area to be planned; and fusing the transportation demand features, the surrounding traffic condition features, the security level features and the distance features to obtain the network point features of the network point.

Through the pre-trained path prediction model, the terminal extracts, from the network point information of a network point, transportation demand features, surrounding traffic condition features, security level features, and distance features between the network point and other network points in the area to be planned, and then splices and fuses these features to obtain the network point features required in subsequent path planning. Regarding the security level features: in subsequent path planning, if the security level features of a network point indicate that security around the network point is poor, the network point is placed as late as possible in the path of the securicar that serves it, so that the expected loss caused by incidents is lower. Regarding the surrounding traffic condition features: in subsequent path planning, if the surrounding traffic condition features of a network point indicate a high probability of traffic congestion around the network point, the network point is likewise placed as late as possible in the path of the securicar that serves it, so that the total time the network points spend waiting for the securicar is reduced.
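The splicing step can be pictured as concatenating the four feature groups into one vector. The sketch below is only illustrative: in the application the features are extracted by the pre-trained model, whereas here they are hand-written encodings, and the `PointInfo` fields, the normalization constants and the one-hot encoding of the security level are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PointInfo:
    demand: float            # transportation demand, in banknote units
    congestion_prob: float   # probability of surrounding traffic congestion
    security_level: int      # index of the security level, 0-based
    distances: List[float]   # distances to the network points in the area


def fuse_features(
    info: PointInfo, max_demand: float, max_distance: float, num_levels: int
) -> List[float]:
    """Concatenate the four feature groups into one network point feature."""
    demand_feat = [info.demand / max_demand]
    traffic_feat = [info.congestion_prob]
    # One-hot encoding of the security level (an assumed encoding).
    security_feat = [
        1.0 if lvl == info.security_level else 0.0 for lvl in range(num_levels)
    ]
    distance_feat = [d / max_distance for d in info.distances]
    return demand_feat + traffic_feat + security_feat + distance_feat
```

Concatenation keeps every group recoverable by position, which is convenient when later layers learn how much weight each group deserves.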

In this embodiment, multiple kinds of features are extracted from the network point information through the pre-trained path prediction model and fused into the network point features of each network point, which support the subsequent path planning process.

In another embodiment, the application also provides a method for planning a path of a securicar, which comprises the following steps:

Step one, acquiring network point information of the network points in a region to be planned, the total banknote transport amount, and the vehicle capacity of the securicars.

Step two, extracting transportation demand features, surrounding traffic condition features, security level features, and distance features between the network point and other network points in the area to be planned from the network point information of the network point through a pre-trained path prediction model.

Step three, fusing the transportation demand features, the surrounding traffic condition features, the security level features, and the distance features between the network point and other network points in the area to be planned to obtain the network point features of the network point.

Step four, for each securicar, taking the starting network point in the area to be planned as the current network point of the securicar, and taking the network points other than the current network point as other network points.

Step five, updating the network point features of the current network point through a multi-head attention mechanism layer in the pre-trained path prediction model to obtain the update features of the current network point, and updating the network point features of the last target network point of the current network point to obtain the update features of the last target network point.

Step six, determining an attention mask according to the total banknote transport amount, the vehicle capacity of each securicar and all target network points;

wherein the attention mask is used to characterize constraint information associated with the total banknote transport amount, the vehicle capacity of each securicar, and all target network points.

Step seven, determining the probability that each other network point is the next network point of the current network point based on the update features of the current network point, the update features of the last target network point and the attention mask.

Step eight, selecting the target network point with the highest probability from the other network points, updating the other network points according to the target network point, and updating the total banknote transport amount and the vehicle capacity of each securicar according to the transportation demand information of the target network point.

Step nine, taking the target network point as the new current network point, and returning to the step of determining, according to the network point features, the probability that each other network point is the next network point of the current network point under the constraints of the total banknote transport amount and the vehicle capacity of each securicar, until the vehicle capacity of each securicar meets the preset capacity condition.

Step ten, combining the target network points of each securicar in the order in which they were determined to obtain the planned path of the securicar.

In this embodiment, not only physical factors and road-related factors but also interaction-related factors are considered, so the method is suitable for more complex scenarios. With large data volumes, planning the securicar paths in this way can reduce the number of transport vehicles, shorten the driving distance, and lower the transportation cost as far as possible.

In one embodiment, as shown in fig. 3, the present application further provides a training method of a path prediction model, which includes the following steps:

Step S301, obtaining sample network point information of sample network points in a sample area;

wherein the sample network point information comprises transportation demand information, surrounding traffic condition information and security level information of the sample network point, and distance information between the sample network point and other sample network points in the sample area;

Step S302, inputting the sample network point information into a path prediction model to be trained to obtain a planned path of the securicar;

Step S303, determining gradient update information of the path prediction model to be trained by using a reinforcement learning model;

Step S304, adjusting model parameters of the path prediction model to be trained according to the gradient update information to obtain an adjusted path prediction model;

Step S305, taking the adjusted path prediction model as a new path prediction model to be trained, and returning to the step of inputting the sample network point information into the path prediction model to be trained to obtain a planned path of the securicar, until a training end condition is met;

Step S306, taking the path prediction model that reaches the training end condition as the pre-trained path prediction model.

Illustratively, reinforcement learning is typically described by a Markov decision process (MDP), whose elements include state, action, policy and reward. In this embodiment, the state is the network point where the securicar is currently located; the action is the next network point of the securicar; the policy is the probability of each network point being the next network point; and the reward of the current network point is given by a user-defined scalar function of the network point's transportation demand features, surrounding traffic condition features, security level features, and distance features between the network point and other network points in the area to be planned. Taking the path with the largest cumulative reward as the objective yields a corresponding optimization function, and differentiating the optimization function yields a gradient update function. Then, after a planned path is obtained from the path prediction model to be trained, the expected value of the cumulative reward of the current path at the current target network point, and the decayed (discounted) sum of the rewards of all target network points after the current target network point, are determined from the rewards and the policy. The gradient update information is determined from the expected value, the decayed reward sum and the gradient update function, and the model parameters of the path prediction model to be trained are adjusted with an Adam optimizer according to the gradient update information.
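The decayed reward sum mentioned above can be computed with a single backward pass over the rewards of a planned path. The sketch below is an illustration under assumptions: the name `discounted_returns` and the decay factor `gamma` are hypothetical, and each returned value includes the reward of the step itself in addition to the decayed rewards of the later steps.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return, for each step t, r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the path backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

In a policy-gradient setting these returns, minus an expected-value baseline, would typically weight the gradient of the log-probability of each selected network point; whether the application uses exactly this form is not stated in the text.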

Specifically, a fixed Q-targets idea similar to that of DQN (deep Q-network) can be adopted, in which two neural networks with identical structures but different parameters are set up: the Q-estimate network uses the latest parameters, and the Q-target network uses earlier parameters. At the start of each round of parameter updates, the parameters of the two networks are initialized to the same values. Path prediction is then performed with the Q-estimate network: one network point in the path is randomly selected as the current state, and then, with a preset probability, the network point with the highest probability is output as the next network point, and otherwise a network point is selected at random as the next network point. Next, path prediction is performed with the Q-target network using a greedy strategy, in which the network point with the highest probability is always selected as the next network point. Finally, the gradient update information of the Q-estimate network is calculated and the parameters of the Q-estimate network are updated. When the number of updates of the Q-estimate network reaches a preset number, a one-sided hypothesis test is performed on the distribution of the updated parameters of the Q-estimate network and the distribution of the parameters of the Q-target network; if the two distributions differ significantly, that is, the test statistic falls into the rejection region, the parameters of the Q-target network are updated with the updated parameters of the Q-estimate network. When the number of training iterations reaches a preset number, training of the path prediction model is determined to be complete. The DQN idea enables the model to learn more complex strategies and helps it avoid becoming trapped in a local optimum; in addition, storing previous experience in an experience replay buffer and sampling from it at random for training reduces the correlation between data samples. This can improve training stability and sample efficiency.
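The two selection rules described above, exploratory selection for the Q-estimate network and greedy selection for the Q-target network, can be sketched as follows. The function names and the parameter `epsilon` are hypothetical; `epsilon` is taken here as the probability of a random choice, that is, one minus the preset probability of choosing the most probable network point.

```python
import random
from typing import Sequence


def greedy(probs: Sequence[float]) -> int:
    """Index of the network point with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def epsilon_greedy(probs: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random network point, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(probs))
    return greedy(probs)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing the paths produced by the two networks.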

In this embodiment, the planned path of the securicar is obtained by inputting the sample network point information, and the path prediction model is optimized by reinforcement learning to obtain the trained path prediction model, which can effectively improve the accuracy and efficiency of the path prediction model in path planning. Reinforcement learning takes long-term rewards into account, allowing the model to better evaluate the long-term effect of a path rather than pursuing immediate high rewards at the expense of the long-term optimization goal.

It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the application also provide a securicar path planning device for implementing the securicar path planning method described above, and a training device for a path prediction model for implementing the training method of the path prediction model described above. The solutions provided by these two devices are implemented similarly to the methods described above, so for the specific limitations in the one or more device embodiments provided below, reference may be made to the limitations of the securicar path planning method and of the training method above, which are not repeated here.

In one embodiment, as shown in fig. 4, there is provided a securicar path planning apparatus, comprising: an information acquisition module 401, a feature extraction module 402, a mesh point determination module 403, and a path determination module 404, wherein:

the information acquisition module 401 is used for acquiring network point information, total banknote transport amount and vehicle capacity of a banknote transport vehicle of a network point in a region to be planned; the network point information of each network point at least comprises transportation demand information, surrounding traffic condition information, security level information and distance information between the network point and other network points in the area to be planned;

The feature extraction module 402 is configured to extract dot features of dots from dot information of the dots through a pre-trained path prediction model;

the mesh point determination module 403 is configured to determine, according to the network point features, target network points of each securicar under the constraints of the total banknote transport amount and the vehicle capacity;

the path determining module 404 is configured to combine the target mesh points of the securicar according to the determining order of the target mesh points of the securicar, so as to obtain a planned path of the securicar.

In one embodiment, the above-mentioned mesh point determining module 403 is further configured to, for each securicar, use a starting mesh point in the area to be planned as a current mesh point of each securicar, and use mesh points other than the current mesh point as other mesh points; determining the probability that other network points are the next network point of the current network point under the constraint of the total banknote transporting quantity and the vehicle capacity of each banknote transporting vehicle according to the network point characteristics; selecting a target mesh point with the highest probability from other mesh points, updating the other mesh points according to the target mesh point, and updating the total banknote transporting quantity and the vehicle capacity of each banknote transporting vehicle according to the transportation demand information of the target mesh point; and taking the target mesh point as a new current mesh point, and jumping to a step of determining the probability that other mesh points are the next mesh point of the current mesh point under the constraint of the total banknote transporting amount and the vehicle capacity of each banknote transporting vehicle according to the mesh point characteristics until the vehicle capacity of each banknote transporting vehicle meets the preset capacity condition.

In one embodiment, the above-mentioned mesh point determining module 403 is further configured to update, through a multi-head attention mechanism layer in the pre-trained path prediction model, mesh point features of a current mesh point in mesh point features to obtain updated features of the current mesh point, and update mesh point features of a last target mesh point of the current mesh point to obtain updated features of the last target mesh point; based on the update characteristics of the current network point and the update characteristics of the last target network point, the probability that other network points are the next network point of the current network point under the constraint of the total banknote transporting quantity and the vehicle capacity of each banknote transporting vehicle is determined.

In one embodiment, the mesh point determination module 403 is further configured to determine an attention mask according to the total banknote transport amount, the vehicle capacity of each securicar, and all target network points; the attention mask is used to characterize constraint information associated with the total banknote transport amount, the vehicle capacity of each securicar and all target network points; and to determine, based on the update features of the current network point, the update features of the last target network point and the attention mask, the probability that each other network point is the next network point of the current network point.

In one embodiment, the feature extraction module 402 is further configured to extract, from the website information of the website, transportation demand features, surrounding traffic condition features, security level features, and distance features between the website and other websites in the area to be planned, by using a pre-trained path prediction model; and carrying out fusion processing on the transportation demand characteristics, the peripheral traffic condition characteristics, the security level characteristics and the distance characteristics between the mesh points and other mesh points in the area to be planned to obtain mesh point characteristics of the mesh points.

In one embodiment, as shown in fig. 5, there is provided a training apparatus of a path prediction model, including: a sample acquisition module 501, a sample input module 502, a reinforcement learning module 503, a parameter adjustment module 504, a training iteration module 505, and a training completion module 506, wherein:

a sample acquiring module 501, configured to acquire sample dot information of sample dots in a sample area; the sample network point information comprises transportation demand information, surrounding traffic condition information, security level information and distance information between the sample network point and other sample network points in the sample area;

the sample input module 502 is configured to input the sample network point information into a path prediction model to be trained, so as to obtain a planned path of the securicar;

a reinforcement learning module 503, configured to determine gradient update information of a path prediction model to be trained using the reinforcement learning model;

the parameter adjustment module 504 is configured to adjust model parameters of a path prediction model to be trained according to the gradient update information, so as to obtain an adjusted path prediction model;

the training iteration module 505 is configured to take the adjusted path prediction model as a new path prediction model to be trained, and to return to the step of inputting the sample network point information into the path prediction model to be trained to obtain the planned path of the securicar, until the training end condition is satisfied;

The training completion module 506 is configured to use the path prediction model that reaches the training end condition as a pre-trained path prediction model.

Each module in the securicar path planning device and in the training device of the path prediction model may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an input/output (I/O) interface and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store network point information data. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for path planning for a securicar and a method for training a path prediction model.

It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.

The foregoing examples illustrate only a few embodiments of the application, and their description is specific and detailed, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims

1. A method for path planning for a securicar, the method comprising:

2. The method of claim 1, wherein said determining a destination node for each of said banknote carriers under the constraints of said total banknote amount and said vehicle capacity based on said node characteristics comprises:

3. The method of claim 2, wherein said determining, based on said node characteristics, a probability that said other node is the next node to said current node under the constraint of said total amount of money and said vehicle capacity of each of said securities carriers comprises:

4. A method according to claim 3, wherein said determining the probability that the other mesh point is the next mesh point to the current mesh point under the constraints of the total amount of money transported and the vehicle capacity of each of the money transporting vehicles based on the updated characteristics of the current mesh point and the updated characteristics of the last target mesh point comprises:

5. The method according to claim 1, wherein extracting the dot characteristics of the dots from the dot information of the dots by a pre-trained path prediction model includes:

6. A method of training a path prediction model, the method comprising:

7. A securicar path planning apparatus, comprising:

8. A training apparatus for a path prediction model, the apparatus comprising:

and the training completion module is used for taking the path prediction model reaching the training ending condition as a pre-trained path prediction model.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.

11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.