CN116166030A - Path planning method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN116166030A
CN116166030A (application number CN202310219228.9A)
Authority
CN
China
Prior art keywords
target, money, information, data, added
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310219228.9A
Other languages
Chinese (zh)
Inventor
李磊
李冬
冯玉清
韩丹丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310219228.9A priority Critical patent/CN116166030A/en
Publication of CN116166030A publication Critical patent/CN116166030A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0219 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory ensuring the processing of the whole working surface

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a path planning method, a path planning device, a storage medium and electronic equipment, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring equipment information of the financial machines to be added with money; acquiring, based on the equipment information, position information and target information of each financial machine to be added with money, wherein the target information at least comprises real-time road condition information; and inputting the position information and the target information into a target planning model and outputting a banknote adding path for all the financial machines to be added with money, wherein the banknote adding path at least comprises a driving route along which the target vehicle travels from the current starting point, visits each financial machine to be added with money, and proceeds to the end point. The invention solves the technical problem that planning the banknote transporting path of the banknote transporting vehicle manually or with a traditional algorithm yields a poor planning effect.

Description

Path planning method and device, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a path planning method, a path planning device, a storage medium and electronic equipment.
Background
An ATM (automatic teller machine) is a self-service terminal that financial institutions deploy at business outlets distributed across a city so that users can handle deposit and withdrawal business. With rapid urban development, the city scale keeps growing and the number of outlets keeps increasing; meanwhile, because the financial industry must guarantee security, outlets generally do not keep cash boxes on site, so cash must be collected and delivered effectively every day as required. As outlets multiply and scatter across the city, the banknote adding demand is uncertain and road conditions are changeable, so planning the driving route of the banknote transporting vehicle becomes more and more complex.
Under the influence of the epidemic situation, the business of the banknote transporting vehicle has also changed: epidemic prevention and control must now be considered when providing banknote transporting service for the outlets. Prior-art banknote transporting path planning mainly uses two means. First, manual planning by bank staff: each day, for the ATM points that require banknote adding, staff formulate the transporting route from their own experience, taking road congestion into account. This is acceptable in most cases, but it is time-consuming manual work, and the planning effect deteriorates as more factors have to be considered. Second, planning the route with a traditional algorithm such as the ant colony algorithm, the artificial potential field method or a genetic algorithm; however, because of the limited performance and poor convergence of these algorithms, the path planning effect rarely reaches expectations.
When a traditional algorithm is used to plan the banknote transporting route, banknote transporting vehicle path planning based on the ant colony algorithm is representative and is introduced here:
1. Basic principle of the ant colony algorithm: (1) ants release pheromone on the paths they travel; (2) at an intersection it has not visited, an ant randomly selects a road and releases pheromone related to the path length; (3) the pheromone amount is inversely proportional to the path length, so when ants reach the intersection again, they prefer the path with the higher pheromone concentration; (4) the pheromone concentration on the optimal path therefore grows larger and larger; and (5) finally, the ant colony converges on the optimal path.
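The pheromone-based route construction described above can be sketched in a few lines of Python. This is an illustrative simplification, not the algorithm disclosed in the patent; all function and parameter names (ant_tour, deposit, alpha, beta, evaporation) are hypothetical:

```python
import random

def ant_tour(dist, pheromone, alpha=1.0, beta=2.0, rng=random):
    """One ant builds a tour over all nodes, starting and ending at the
    depot (node 0), favouring short, pheromone-rich edges."""
    n = len(dist)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        cur = tour[-1]
        cands = list(unvisited)
        # selection probability ∝ pheromone^alpha * (1/distance)^beta
        weights = [pheromone[cur][j] ** alpha * (1.0 / dist[cur][j]) ** beta
                   for j in cands]
        nxt = rng.choices(cands, weights=weights)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(0)  # return to the depot
    return tour

def deposit(pheromone, tour, dist, evaporation=0.5, q=1.0):
    """Evaporate pheromone everywhere, then deposit an amount inversely
    proportional to the tour length on the edges the ant used."""
    length = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
    for i in range(len(pheromone)):
        pheromone[i] = [p * (1.0 - evaporation) for p in pheromone[i]]
    for a, b in zip(tour, tour[1:]):
        pheromone[a][b] += q / length
        pheromone[b][a] += q / length
    return length
```

Iterating ant_tour and deposit over many ants makes the good edges accumulate pheromone, which is exactly the positive-feedback mechanism (and the premature-convergence risk) discussed in the text.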
2. Basic steps of banknote transport route planning with the ant colony algorithm: the basic ant colony algorithm is improved and optimized to obtain a Petri-fused ant colony algorithm. The algorithm changes how the basic ant colony tabu list is set up: the ants share one tabu list (a shared tabu transition sequence) that replaces the traditional per-ant tabu list, which speeds up problem solving; the elements of the shared tabu list are transitions. A constraint test is added to the probabilistic selection rule, and transitions that violate the distribution constraints are rejected with probability 0. The path strategy applied in the algorithm is to select the path with the higher pheromone concentration. This Petri-fused ant colony path planning algorithm, built on a TCPN (timed colored Petri net) road network model, has a certain effect on planning the banknote adding path of the banknote transporting vehicle and realizes intelligent path planning to a certain extent.
Disadvantages of the ant colony algorithm: its convergence speed is low and it easily falls into a local optimum. Under high complexity it generally needs a long search time and processes slowly, and stagnation easily occurs: after searching for a while, the solutions found by all individuals become completely identical, exploration stops, and the algorithm converges prematurely to a local rather than a global optimal solution, which is unfavorable for finding the best route. Under the influence of the epidemic situation, the banknote transporting vehicle must additionally consider factors such as medium- and high-risk streets when selecting its path, which increases the complexity of banknote transporting vehicle path planning and hence the difficulty of route planning.
At present, the ATM banknote adding route planning process is as follows: business staff make a banknote adding plan in advance and derive the ATM terminals that need banknote adding. After confirming the ATM points to be replenished, the staff examine all roads related to those points on the current day according to real-time road conditions (whether a road is congested, traffic-restricted, one-way, and so on), then consider factors such as mileage-based charging and banknote transporting timeliness, and finally plan an optimal banknote adding route. Because there are many factors to consider and some are uncontrollable, the efficiency of this manual planning mode is greatly reduced.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a path planning method, a path planning device, a storage medium and electronic equipment, which at least solve the technical problem that planning the banknote transporting path of a banknote transporting vehicle manually or with a traditional algorithm yields a poor planning effect.
According to an aspect of an embodiment of the present invention, there is provided a path planning method including: acquiring equipment information of the financial machines to be added with money; acquiring, based on the equipment information, position information and target information of each financial machine to be added with money, wherein the target information at least comprises: real-time road condition information; and inputting the position information and the target information into a target planning model, and outputting a banknote adding path for all the financial machines to be added with money, wherein the banknote adding path at least comprises: a driving route along which the target vehicle travels from the current starting point, visits each financial machine to be added with money, and proceeds to the end point.
Further, the target planning model is obtained by: obtaining training sample data in a target data pool, wherein the training sample data at least comprises: first state data, action data, second state data and score values of a historical time period, wherein the first state data at least comprises: position information of a first position of the target vehicle; the second state data at least comprises: position information of a second position; and the action data at least comprises: an act of the target vehicle selecting to travel from the first position to the second position; and training an initial neural network model through a target algorithm based on the training sample data to obtain the target planning model, wherein a gradient descent algorithm is adopted to process a loss function associated with the initial neural network model when the initial neural network model is trained.
Further, before acquiring the training sample data in the target data pool, the method further comprises: acquiring the first state data at a first historical moment; selecting the action data through a target strategy so that the state of the target vehicle at the first historical moment switches to the state at a second historical moment, thereby obtaining the second state data; determining the score value based on position information of the target vehicle at the first historical moment and the second historical moment; and storing the first state data, the second state data, the action data and the score value into the target data pool.
Further, determining the score value based on the position information of the target vehicle at the first history time and the second history time includes: calculating a distance value of the first position and the second position based on position information of the target vehicle at the first history time and the second history time; acquiring road condition data between the first historical moment and the second historical moment; and determining the score value based on the road condition data and the distance value.
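The scoring step described above (a score derived from the distance travelled and the road condition data between the two moments) can be sketched as follows. The linear form and all weights are assumptions for illustration only; the patent does not specify the scoring formula:

```python
def score(distance_km, congested_sections, sealed_sections,
          base=100.0, dist_penalty=1.0, congest_penalty=5.0,
          sealed_penalty=50.0):
    """Illustrative score for one leg of the route: shorter, clearer legs
    score higher; sealed (locked-down) sections are penalised most heavily.
    All weight values here are hypothetical."""
    return (base
            - dist_penalty * distance_km
            - congest_penalty * congested_sections
            - sealed_penalty * sealed_sections)
```

Under this sketch, a leg through a sealed street scores far below an equally long leg through congestion, steering the learned policy away from lockdown areas.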
Further, the road condition data at least includes: congested section data and sealed (closed) section data.
Further, the state data of the target vehicle is represented by a first vector, and the first vector at least includes: a position vector representing the position of the vehicle, a vector representing the on-board remainder of the vehicle after banknotes are unloaded at that position, and a vector representing the banknote adding amount at each banknote adding device, wherein the on-board remainder at least comprises: the amount of money remaining on the vehicle after banknotes are unloaded at that position.
Further, the financial machine to be added with money at least comprises: an ATM. After the position information and the target information are input into the target planning model and the banknote adding path of the financial machines to be added with money is output, the method further comprises: adding banknotes to each financial machine to be added with money based on the banknote adding path.
According to another aspect of the embodiment of the present invention, there is also provided a path planning apparatus, including: a first acquisition unit, configured to acquire equipment information of the financial machines to be added with money; a second acquisition unit, configured to acquire, based on the equipment information, position information and target information of each financial machine to be added with money, wherein the target information at least comprises: real-time road condition information; and a processing unit, configured to input the position information and the target information into a target planning model and output a banknote adding path for all the financial machines to be added with money, wherein the banknote adding path at least comprises: a driving route along which the target vehicle travels from the current starting point, visits each financial machine to be added with money, and proceeds to the end point.
Further, the target planning model is obtained by: a third acquisition unit, configured to obtain training sample data in a target data pool, wherein the training sample data at least comprises: first state data, action data, second state data and score values of a historical time period, wherein the first state data at least comprises: position information of a first position of the target vehicle; the second state data at least comprises: position information of a second position; and the action data at least comprises: an act of the target vehicle selecting to travel from the first position to the second position; and a training unit, configured to train an initial neural network model through a target algorithm based on the training sample data to obtain the target planning model, wherein a gradient descent algorithm is adopted to process a loss function associated with the initial neural network model when the initial neural network model is trained.
Further, the path planning apparatus further includes: a fourth obtaining unit, configured to obtain the first state data at a first history time before obtaining training sample data in a target data pool; the selection unit is used for selecting the action data through a target strategy so as to enable the state of the target vehicle at the first historical moment to be switched to the state at the second historical moment, and the second state data is obtained; a determining unit configured to determine the score value based on position information of the target vehicle at the first history time and the second history time; and the storage unit is used for storing the first state data, the second state data, the action data and the score value into the target data pool.
Further, the determining unit includes: a calculating subunit configured to calculate a distance value of the first location and the second location based on location information of the target vehicle at the first history time and the second history time; the acquisition subunit is used for acquiring road condition data between the first historical moment and the second historical moment; and the determining subunit is used for determining the scoring value based on the road condition data and the distance value.
Further, the road condition data at least includes: congested section data and sealed (closed) section data.
Further, the state data of the target vehicle is represented by a first vector, and the first vector at least includes: a position vector representing the position of the vehicle, a vector representing the on-board remainder of the vehicle after banknotes are unloaded at that position, and a vector representing the banknote adding amount at each banknote adding device, wherein the on-board remainder at least comprises: the amount of money remaining on the vehicle after banknotes are unloaded at that position.
Further, the financial machine to be added with money at least comprises: an automatic teller machine, and the path planning apparatus further includes: a banknote adding unit, configured to add banknotes to each financial machine to be added with money based on the banknote adding path after the position information and the target information are input into the target planning model and the banknote adding path of the financial machines to be added with money is output.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the path planning method of any of the above via execution of the executable instructions.
According to another aspect of the embodiment of the present invention, there is also provided a computer readable storage medium storing a computer program, wherein, when executed, the computer program controls the device on which the computer readable storage medium is located to execute the path planning method of any one of the above.
In the invention, the equipment information of the financial machines to be added with money is obtained; based on the equipment information, the position information and the target information of each financial machine to be added with money are acquired, wherein the target information at least comprises: real-time road condition information; the position information and the target information are input into a target planning model, and a banknote adding path for all the financial machines to be added with money is output, wherein the banknote adding path at least comprises: a driving route along which the target vehicle travels from the current starting point, visits each financial machine to be added with money, and proceeds to the end point. This solves the technical problem that planning the banknote transporting path of the banknote transporting vehicle manually or with a traditional algorithm yields a poor planning effect. In the invention, the driving route of the target vehicle is planned by the constructed target planning model, which avoids the low efficiency and poor precision of manual or traditional-algorithm planning in the related art and realizes the technical effect of improving the planning efficiency and planning precision of the banknote transporting vehicle path.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of an alternative path planning method according to an embodiment of the invention;
FIG. 2 is a flow chart of an alternative banknote transportation path planning according to an embodiment of the present invention;
FIG. 3 is a flow chart of an alternative model training according to an embodiment of the present invention;
FIG. 4 is a flow chart of an alternative agent interacting with environmental conditions in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative path planning apparatus according to an embodiment of the invention;
fig. 6 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without making any inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
Example 1
According to an embodiment of the present invention, an alternative embodiment of a path planning method is provided. It should be noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as by executing a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that shown or described herein.
Fig. 1 is a flowchart of an alternative path planning method according to an embodiment of the present invention, as shown in fig. 1, the method comprising the steps of:
step S101, acquiring equipment information of a financial machine to be added with money.
The device information may include, but is not limited to, a device identifier (e.g., device number), the outlet where the device is located, and the like. The financial machine to be added with money may include, but is not limited to, an ATM of a financial institution.
Step S102, based on the equipment information, acquiring the position information and the target information of the financial machine to be added with money, wherein the target information at least comprises: real-time road condition information.
The target information may include real-time road condition information and street lockdown-control information. The real-time road condition information may be the road condition information at each moment and may include, but is not limited to, road congestion conditions, such as which sections are congested, the number and length of congested sections, whether traffic is restricted, whether a road is one-way, the banknote transporting mileage, the banknote transporting timeliness, and the like. In this embodiment, the position information of each financial machine to be added with money may be acquired from the relevant system based on the device information.
Step S103, inputting the position information and the target information into a target planning model, and outputting a banknote adding path for all financial machines to be added with money, wherein the banknote adding path at least comprises: a driving route along which the target vehicle travels from the current starting point, visits each financial machine to be added with money, and proceeds to the end point.
The target planning model may be a trained neural network model, for example one trained with the deep deterministic policy gradient (DDPG) algorithm or with its improvement, the twin delayed DDPG (TD3) algorithm. By inputting the position information and the target information into the neural network model, a banknote adding path covering all the banknote adding devices can be output. It should be noted that the execution subject of the above steps may be an ATM banknote adding path planning system based on deep reinforcement learning.
FIG. 2 is a flow chart of an alternative banknote transporting path planning according to an embodiment of the invention. As shown in FIG. 2, when planning the banknote transporting path, the business staff at an outlet obtain the ATM terminals requiring banknote adding on the current day and input the ATM terminal points into the banknote transporting route planning system; the route planning system automatically obtains the road condition information and epidemic situation dynamics of the day, inputs the ATM points into the trained model, and obtains the optimal banknote transporting route for the day.
Through the above steps, in this embodiment, when the banknote transporting path is planned, the driving route of the target vehicle is planned by the constructed target planning model. This avoids the low efficiency and poor precision of manual or traditional-algorithm planning in the related art, achieves the technical effect of improving the planning efficiency and precision of the banknote transporting vehicle path, and thereby solves the technical problem that planning the banknote transporting path manually or with a traditional algorithm yields a poor planning effect.
Optionally, the target planning model is obtained by: obtaining training sample data in a target data pool, wherein the training sample data at least comprises: first state data, action data, second state data and score values of a historical time period, wherein the first state data at least comprises: position information of a first position of the target vehicle; the second state data at least comprises: position information of a second position; and the action data at least comprises: an act of the target vehicle selecting to travel from the first position to the second position; and training an initial neural network model through a target algorithm based on the training sample data to obtain the target planning model, wherein a gradient descent algorithm is adopted to process a loss function associated with the initial neural network model when the initial neural network model is trained.
In this embodiment, training sample data may be obtained from the target data pool, where the training sample data may include first state data, action data, second state data, and score values of the historical time period.
The first state data and the second state data may be input based on a state space, and the motion data may be input based on a motion space.
The target algorithm may include, but is not limited to, DDPG (deep deterministic policy gradient) and its improvement, the twin delayed DDPG (Twin Delayed Deep Deterministic Policy Gradient, TD3) algorithm, in which three key techniques are applied:
(1) Clipped double Q-learning: two Q-value functions are learned, and the critic network is updated with the smaller of the two target values, in a manner similar to Double Q-Learning.
(2) Delayed policy updates: during training, the policy network is updated less frequently than the Q-value network.
(3) Target policy smoothing: noise is added to the action output by the target policy, which smooths the estimate of the Q-value function and avoids overfitting.
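The three TD3 techniques above can be illustrated with a minimal numeric sketch. The function names and action bounds are hypothetical; the delayed policy update is procedural (the actor is simply updated once every few critic updates), so only techniques (1) and (3) have formulas here:

```python
import random

def smoothed_action(mu_next, noise_std=0.2, noise_clip=0.5,
                    act_low=-1.0, act_high=1.0, rng=random):
    """(3) Target policy smoothing: add clipped Gaussian noise to the
    target policy's action, then clip to the valid action range."""
    eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    return max(act_low, min(act_high, mu_next + eps))

def td3_critic_target(reward, done, q1_next, q2_next, gamma=0.99):
    """(1) Clipped double Q-learning: bootstrap from the smaller of the two
    target critics' values, which curbs Q-value overestimation.
    (2) Delayed policy update has no formula: the actor would be updated
    only once every d calls of this critic step."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)
```

Taking the minimum of the two critics makes the bootstrap target pessimistic, which is what distinguishes TD3's critic update from plain DDPG.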
The model training process is illustrated below with the target algorithm being the DDPG algorithm.
For example, in the model training process, one input of the algorithm is the state space S, so the state space is designed first. According to the established banknote transporting vehicle path planning model, the state s_t of the environment at time t can be expressed by a group of vectors (p, q_t, j_1, j_2, …, j_N), where p represents the current position of the transporting vehicle, q_t represents the on-board remainder after unloading banknotes at position p, and j_1, j_2, …, j_N represent the banknote transport amounts at node 1, node 2, …, node N. The state of the environment changes continuously with time; although time is a continuous variable, by the Markov property of reinforcement learning the state trajectory of the environment can be divided into discrete states in units of a small time period, for example the time taken to drive from one outlet to another. Assuming T states exist, in banknote vehicle distribution T = N + 2m, where m is the number of returns to the distribution center and N is the number of customer points (outlets). All states constitute the state set, i.e. the state space S = {s_t : t = 0, …, T}, and the data corresponding to a state is the state data.
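The state vector (p, q_t, j_1, …, j_N) described above can be sketched with a small hypothetical helper (the function name and the float encoding are illustrative assumptions):

```python
def make_state(position, onboard_cash, demands):
    """Build the state vector s_t = (p, q_t, j_1, ..., j_N): the current
    stop index p, the cash q_t remaining on the vehicle after unloading
    there, and the outstanding replenishment demand at each of the N
    terminals."""
    return [float(position), float(onboard_cash)] + [float(d) for d in demands]
```

The fixed length 2 + N of this vector is what lets a neural network consume it directly as input.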
Another input to the algorithm is the action space A. In each iteration of the cash transport vehicle path planning, the agent's selection of the next network point to visit is the action. The agent can execute only one action at a time, and the time discretization of the action space is consistent with that of the environment state space. The cash transport vehicle must start from the distribution center and eventually return to it, so by default a_0 = a_T = p_0, i.e. both the first and the last action select the distribution center. At any other time, one of the network points that has not yet been visited is selected.
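The constraint a_0 = a_T = p_0, together with the rule of choosing among unvisited points at other steps, can be sketched as a simple action mask (the function name and the small N in the example are assumptions):

```python
def valid_actions(t, T, visited, depot=0, n_points=3):
    # First and last actions must select the distribution center (depot);
    # at any other step, choose among network points not yet visited.
    if t == 0 or t == T:
        return [depot]
    return [n for n in range(1, n_points + 1) if n not in visited]
```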
FIG. 3 is a flow chart of alternative model training in accordance with an embodiment of the present invention. As shown in FIG. 3, when training the target planning model, the experience pool may first be initialized, the network parameters and the target network parameters randomly initialized, and the state s initialized; an action a is then selected and executed, the environment is observed, and the immediate reward r and the new state s' are obtained; the tuple (s, a, r, s') is put into the experience pool D; a minibatch of (s, a, r, s') tuples is sampled from D and used to update the training network; this repeats until s is a terminal state and the training network model is determined to have converged, at which point the network is output and model training ends.
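The training flow just described can be sketched as a generic skeleton, with the environment and the network updates passed in as placeholder callables (all names here are illustrative assumptions, not the embodiment's implementation):

```python
import random
from collections import deque

def train(env_step, select_action, update_networks, init_state,
          episodes=2, capacity=100, batch_size=4, seed=0):
    # Initialize the experience pool D, interact with the environment,
    # store (s, a, r, s') transitions, and update the networks from
    # minibatches sampled from D until each episode terminates.
    random.seed(seed)
    pool = deque(maxlen=capacity)
    for _ in range(episodes):
        s, done = init_state, False
        while not done:
            a = select_action(s)           # actor selects the next point
            s2, r, done = env_step(s, a)   # observe reward and new state
            pool.append((s, a, r, s2))     # store the transition in D
            if len(pool) >= batch_size:
                update_networks(random.sample(pool, batch_size))
            s = s2
    return pool
```

A trivial countdown environment suffices to exercise the loop; real usage would plug in the path-planning environment and the actor/critic updates described below.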
In cash transport vehicle path planning based on DDPG (Deep Deterministic Policy Gradient), the action-value function Q(s, a|θ) can be updated via the Bellman equation, as in the deep Q-network (DQN) algorithm.
In state S_t, an action A_t = μ(S_t) is performed according to the deterministic policy μ, and the next state S_{t+1} and the reward value R_t are obtained. The action-value function then satisfies:

Q^μ(S_t, A_t) = E[r(S_t, A_t) + γQ^μ(S_{t+1}, μ(S_{t+1}))]
where γ is a hyperparameter representing the reward discount factor, θ is a network parameter, E denotes the expectation, and μ denotes the deterministic behavior policy. The target Q value y_i is then calculated:
y_i = r_i + γQ′(S_{i+1}, μ′(S_{i+1}|θ^{μ′})|θ^{Q′})
The loss function is minimized using a gradient descent algorithm:

L(θ^Q) = (1/N) Σ_i (y_i − Q(S_i, A_i|θ^Q))²
The policy network parameters are updated by means of batch samples (batch), using the sampled policy gradient:

∇_{θ^μ}J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=S_i, a=μ(S_i)} · ∇_{θ^μ}μ(s|θ^μ)|_{S_i}
where N is the batch size, a denotes an action, and s denotes a state.
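The target value y_i and the mean-squared critic loss it enters can be sketched with the target actor and critic passed in as plain callables (all names are illustrative assumptions):

```python
import numpy as np

def critic_targets(r, s_next, gamma, target_actor, target_critic):
    # y_i = r_i + γ Q'(S_{i+1}, μ'(S_{i+1})): targets are computed with
    # the *target* networks μ' and Q'.
    return r + gamma * target_critic(s_next, target_actor(s_next))

def critic_loss(y, s, a, critic):
    # L = (1/N) Σ_i (y_i − Q(S_i, A_i | θ^Q))², minimized by gradient
    # descent over a batch of N samples.
    return float(np.mean((y - critic(s, a)) ** 2))
```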
Furthermore, the Deep Deterministic Policy Gradient algorithm employs target networks, like the deep Q-network algorithm, but here the target networks are updated by exponential smoothing instead of directly copying the parameters:
θ^{Q′} ← τθ^Q + (1−τ)θ^{Q′}
θ^{μ′} ← τθ^μ + (1−τ)θ^{μ′}
where τ is a hyperparameter, the soft-update factor; with τ ≪ 1 the update of the target networks is slow and stable, which aids convergence of training.
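The exponential-smoothing (soft) update can be sketched element-wise; representing parameters as a flat list of weights is an assumption for illustration:

```python
def soft_update(target_params, online_params, tau=0.005):
    # θ' ← τθ + (1 − τ)θ', applied element-wise; with τ ≪ 1 the target
    # network tracks the online network slowly and stably.
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_params, online_params)]
```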
In this embodiment, after a reward function is set in the established cash transport vehicle path planning environment, input conditions such as poor road conditions and risky streets are first set at random when training the model; the algorithm is then introduced to train the model, and after extensive training and debugging the trained model is used for cash transport vehicle path planning. In use, the model only needs to be given the necessary factors, such as the coordinates of all network points requiring money adding, the day's road condition information, and streets classed as medium or high epidemic risk, and it outputs an optimal path through the network points. This route should be the shortest route that avoids congested road sections and risky areas while passing through all points requiring money adding.
Optionally, before acquiring the training sample data in the target data pool, the method further includes: acquiring first state data of a first historical moment; selecting action data through a target strategy so as to enable the state of the target vehicle at the first historical moment to be switched to the state at the second historical moment, and obtaining second state data; determining a score value based on position information of the target vehicle at the first history time and the second history time; the first state data, the second state data, the action data and the score value are stored in a target data pool.
In this embodiment, in the actual path planning process, at time t+1 (corresponding to the second historical moment), the feedback (reward) of the environment depends only on the state S_t and the action A_t at the previous time t (corresponding to the first historical moment) and is independent of any earlier time step. FIG. 4 is a flowchart of an alternative agent interacting with the environment state. As shown in FIG. 4, at time step t the agent obtains the current state S_t from the environment and takes action A_t (the data associated with action A_t being the action data) according to the policy μ (the deterministic behavior policy, corresponding to the target strategy described above), so that S_t becomes S_{t+1} and a reward value R_t (corresponding to the score value described above) is obtained, thereby generating the experience tuple (S_t, A_t, R_t, S_{t+1}), which is stored in the experience pool (corresponding to the target data pool described above) as the data set for training the model.
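The experience pool can be sketched as a fixed-capacity buffer of (S_t, A_t, R_t, S_{t+1}) tuples; the class and method names are assumptions for illustration:

```python
import random
from collections import deque

class ExperiencePool:
    # Stores (s, a, r, s') transitions; the oldest experience is
    # discarded once capacity is reached, and minibatches are sampled
    # uniformly at random for training.
    def __init__(self, capacity=10000, seed=0):
        self.data = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        return self.rng.sample(list(self.data), batch_size)
```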
Optionally, determining the score value based on the position information of the target vehicle at the first historical moment and the second historical moment includes: calculating a distance value of the first position and the second position based on the position information of the target vehicle at the first history time and the second history time; acquiring road condition data between a first historical moment and a second historical moment; and determining a score value based on the road condition data and the distance value.
In this embodiment, designing the reward is the core step of reinforcement learning design: if the design is unreasonable, the desired result will not be obtained. Reward design is difficult because rewards are quite sparse, and in general a manually designed reward function is more practical and efficient. The action reward r is key to the algorithm and determines its learning direction and efficiency. In each step of the iterative process, the environment gives a reward value (i.e. a score) based on the current state s and the action a selected for execution, and the evaluation and improvement of the policy are based on the reward r. In the cash transport vehicle path planning problem, the objective is to minimize the total travel distance; the reward of each step can be expressed by the distance between the two nodes (i.e. the distance between the first position and the second position), while congested road conditions in the simulated environment and streets regarded as medium or high risk contribute negative rewards in the reward function. That is, the score value can be determined based on the inter-node distance, the road conditions, and street closures.
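Such a reward can be sketched as the negative inter-node distance plus penalty terms for congestion and medium/high-risk streets; the penalty weights below are illustrative assumptions, not values from the embodiment:

```python
def step_reward(distance, congested=False, high_risk=False,
                congestion_penalty=10.0, risk_penalty=50.0):
    # Negative travel distance between the two nodes, with additional
    # negative reward for congested segments and for streets classed
    # as medium or high risk.
    r = -distance
    if congested:
        r -= congestion_penalty
    if high_risk:
        r -= risk_penalty
    return r
```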
Optionally, the road condition data at least includes: congested road section data and closed road section data.
The road condition data may include, but is not limited to, congested road conditions (corresponding to the congested road section data, for example the number of congested road sections, the length of congested road sections, the location of congested road sections, and the like), traffic control road section data (for example, streets regarded as medium or high risk), whether traffic is restricted, whether the road is one-way, cash transport mileage, cash transport timeliness, and the like.
Optionally, the state data of the target vehicle is represented by a first vector, and the first vector at least includes: a position vector representing the position of the vehicle, a vector representing the vehicle-mounted allowance of the vehicle after money is unloaded at the position, and a vector representing the amount of money to be added at each money adding device, wherein the vehicle-mounted allowance at least includes: the amount of money remaining after the money is unloaded at the position of the vehicle.
The vehicle-mounted allowance can comprise the residual banknote carrying capacity of the vehicle after the banknote is unloaded at the position, the residual carrying capacity of the vehicle and the like.
In the model training process, one input to the algorithm is the state space S, so the state space S is designed first. According to the established model of cash transport vehicle path planning, the state s_t of the environment at time t can be represented by a vector (p, q_t, j_1, j_2, …, j_N) (corresponding to the first vector described above), where p denotes the current vehicle position, q_t denotes the vehicle-mounted allowance after money is unloaded at position p, and j_1, j_2, …, j_N denote the amount of money to be delivered at node 1, node 2, …, node N. The state of the environment changes over time; although time is a continuous variable, according to the Markov property of reinforcement learning, in this embodiment the state trajectory of the environment may be divided into discrete states in units of a small period of time, for example the length of time taken by the delivery vehicle to travel from one network point to another. Assuming T states exist, in cash transport vehicle distribution T = N + 2m, where m is the number of returns to the distribution center and N denotes the number of customer points or network points. All states constitute the state set, i.e. the state space S = {s_t : t = 0, …, T}, and the data corresponding to the states may be the state data.
Optionally, the financial machine to be added with money at least comprises: an automatic teller machine. After inputting the position information and the target information into the target planning model and outputting the money adding paths of the financial machines to be added with money, the method further comprises: adding money to each financial machine to be added with money based on the money adding path.
Through this embodiment, the money adding route can be planned intelligently: combining geographic positions, road condition information and medium/high-risk streets acquired from public platforms, the most reasonable route among the network points is automatically calculated using machine learning and artificial intelligence algorithms. Cash transport path planning based on a deep reinforcement learning algorithm can take a global view, dynamically integrating all network points, route conditions and the current epidemic situation into a whole, and reasonably computing an optimal solution.
In this embodiment, the money adding path can be planned efficiently, and the speed and quality of algorithmic planning far exceed manual planning, so the operational resources of a financial institution are saved: no manpower or large amounts of time need be spent planning cash transport routes. The details of the planning scheme can also be adjusted flexibly; business personnel only need to input the day's necessary factors, such as the ATM (automatic teller machine) sites requiring money adding and the latest epidemic-affected streets, and the trained model automatically plans an optimal route along which cash is transported, thereby achieving the technical effect of improving the transport efficiency of the cash transport vehicle.
Compared with conventional algorithms, the cash transport vehicle path planning method has higher algorithmic efficiency and a better planning effect, and by incorporating factors such as the epidemic situation and road conditions the planned route is more intelligent and humane, thereby achieving the technical effect of improving the efficiency and accuracy of cash transport vehicle path planning.
Example two
A second embodiment of the present application provides an embodiment of an alternative path planning apparatus, where each implementation unit in the path planning apparatus corresponds to each implementation step in the first embodiment.
Fig. 5 is a schematic diagram of an alternative path planning apparatus according to an embodiment of the present invention, as shown in fig. 5, including: a first acquisition unit 51, a second acquisition unit 52, a processing unit 53.
Specifically, the first obtaining unit 51 is configured to obtain device information of a financial machine to be added with money;
a second obtaining unit 52, configured to obtain, based on the device information, location information of the financial machine to be added with money and target information, where the target information at least includes: real-time road condition information;
the processing unit 53 is configured to input the position information and the target information into the target planning model, and output the money adding paths for all the financial machines to be added with money, where the money adding path at least includes: a travel route along which the target vehicle travels from the current starting point, via each financial machine to be added with money, to the ending point.
In the path planning device provided in the second embodiment of the present application, the device information of the financial machines to be added with money may be acquired by the first acquiring unit 51; based on the device information, the position information and the target information of the financial machines to be added with money are acquired by the second acquiring unit 52, where the target information at least includes real-time road condition information; the position information and the target information are input into the target planning model through the processing unit 53, and the money adding paths of all the financial machines to be added with money are output, where the money adding path at least includes a travel route along which the target vehicle travels from the current starting point, via each financial machine to be added with money, to the ending point. This solves the technical problem of poor planning effect when the cash transport path of a cash transport vehicle is planned manually or by a traditional algorithm. In the invention, when the cash transport path is planned, the travel route of the target vehicle is planned through the constructed target planning model, avoiding the low efficiency and poor precision of manual or traditional algorithmic planning in the related art, thereby achieving the technical effects of improving the cash transport path planning efficiency and planning precision of the cash transport vehicle.
Optionally, in the path planning apparatus provided in the second embodiment of the present application, the target planning model is obtained by: a third obtaining unit, configured to obtain training sample data in the target data pool, where the training sample data at least includes: first state data, action data, second state data and score values of a historical time period, the first state data at least including position information of a first position of the target vehicle, the second state data at least including position information of a second position, and the action data at least including an act of the target vehicle selecting to travel from the first position to the second position; and a training unit, configured to train the initial neural network model through a target algorithm based on the training sample data to obtain the target planning model, where a gradient descent algorithm is adopted to process a loss function associated with the initial neural network model when training the initial neural network model.
Optionally, in the path planning device provided in the second embodiment of the present application, the path planning device further includes: a fourth obtaining unit, configured to obtain first state data at a first history time before obtaining training sample data in the target data pool; the selection unit is used for selecting the action data through the target strategy so as to enable the state of the target vehicle at the first historical moment to be switched to the state at the second historical moment, and second state data are obtained; a determining unit configured to determine a score value based on position information of the target vehicle at the first history time and the second history time; and the storage unit is used for storing the first state data, the second state data, the action data and the score value into the target data pool.
Optionally, in the path planning apparatus provided in the second embodiment of the present application, the determining unit includes: a calculation subunit for calculating a distance value of the first position and the second position based on position information of the target vehicle at the first history time and the second history time; the acquisition subunit is used for acquiring road condition data between the first historical moment and the second historical moment; and the determining subunit is used for determining the score value based on the road condition data and the distance value.
Optionally, in the path planning device provided in the second embodiment of the present application, the road condition data at least includes: congested road section data and closed road section data.
Optionally, in the path planning device provided in the second embodiment of the present application, the state data of the target vehicle is represented by a first vector, where the first vector at least includes: a position vector representing the position of the vehicle, a vector representing the vehicle-mounted allowance of the vehicle after money is unloaded at the position, and a vector representing the amount of money to be added at each money adding device, wherein the vehicle-mounted allowance at least includes: the amount of money remaining after the money is unloaded at the position of the vehicle.
Optionally, in the path planning device provided in the second embodiment of the present application, the financial machine to be added with money at least includes an automatic teller machine, and the path planning device further includes: a money adding unit, configured to, after the position information and the target information are input into the target planning model and the money adding paths of the financial machines to be added with money are output, add money to each financial machine to be added with money based on the money adding path.
The path planning apparatus may further include a processor and a memory, wherein the first acquiring unit 51, the second acquiring unit 52, the processing unit 53, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. When the cash transport path is planned, the travel route of the target vehicle is planned through the constructed target planning model, avoiding the low efficiency and poor precision of manual planning or traditional algorithmic planning in the related art, thereby achieving the technical effects of improving the cash transport path planning efficiency and planning precision of the cash transport vehicle.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM), which includes at least one memory chip.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the path planning method of any of the above via execution of the executable instructions.
According to another aspect of the embodiment of the present invention, there is also provided a computer readable storage medium storing a computer program, where the computer program when executed controls a device in which the computer readable storage medium is located to execute the path planning method of any one of the above.
Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 6, an electronic device 60 is provided according to an embodiment of the present invention, where the electronic device includes a processor, a memory, and a program stored on the memory and executable on the processor, and the processor implements a path planning method according to any one of the above when executing the program.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology content may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (10)

1. A method of path planning, comprising:
acquiring equipment information of a financial machine to be added with money;
based on the equipment information, acquiring the position information and target information of the financial machine to be added with money, wherein the target information at least comprises: real-time road condition information;
inputting the position information and the target information into a target planning model, and outputting money adding paths of all the financial machines to be added with money, wherein the money adding path at least comprises: a travel route along which the target vehicle travels from the current starting point, via each financial machine to be added with money, to the ending point.
2. The path planning method of claim 1, wherein the target planning model is obtained by:
obtaining training sample data in a target data pool, wherein the training sample data at least comprises: first state data, action data, second state data and score values of a historical time period, the first state data at least comprising: position information of a first position of the target vehicle; the second state data at least comprising: position information of a second position; and the action data at least comprising: an act of the target vehicle selecting to travel from the first position to the second position;
And training an initial neural network model through a target algorithm based on the training sample data to obtain the target planning model, wherein a gradient descent algorithm is adopted to process a loss function associated with the initial neural network model when the initial neural network model is trained.
3. The path planning method of claim 2, further comprising, prior to acquiring training sample data from the target data pool:
acquiring the first state data of a first historical moment;
selecting the action data through a target strategy to enable the state of the target vehicle at the first historical moment to be switched to the state at the second historical moment, so as to obtain the second state data;
determining the score value based on position information of the target vehicle at the first history time and the second history time;
and storing the first state data, the second state data, the action data and the score value into the target data pool.
4. A path planning method according to claim 3, characterized in that determining the score value based on the position information of the target vehicle at the first history time and the second history time comprises:
Calculating a distance value of the first position and the second position based on position information of the target vehicle at the first history time and the second history time;
acquiring road condition data between the first historical moment and the second historical moment;
and determining the score value based on the road condition data and the distance value.
5. The path planning method according to claim 4, wherein the road condition data at least comprises: congested road section data and closed road section data.
6. The path planning method according to claim 2, wherein the state data of the target vehicle is represented by a first vector, the first vector at least comprising: a position vector representing a position of the vehicle, a vector representing a vehicle-mounted allowance of the vehicle after money is unloaded at the position, and a vector representing the amount of money to be added at each money adding device, wherein the vehicle-mounted allowance at least comprises: the amount of money remaining after the money is unloaded at the position of the vehicle.
7. A path planning method according to any one of claims 1 to 6, characterized in that the financial machine to be added with money at least comprises: an automatic teller machine; and after inputting the position information and the target information into the target planning model and outputting the money adding path of the financial machine to be added with money, the method further comprises: adding money to each financial machine to be added with money based on the money adding path.
8. A path planning apparatus, comprising:
the first acquisition unit is used for acquiring equipment information of the financial machine to be added with money;
the second obtaining unit is configured to obtain, based on the device information, location information and target information of the financial machine to be added, where the target information at least includes: real-time road condition information;
the processing unit is used for inputting the position information and the target information into a target planning model and outputting money adding paths of all the financial machines to be added with money, wherein the money adding path at least comprises: a travel route along which the target vehicle travels from the current starting point, via each financial machine to be added with money, to the ending point.
9. A computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and wherein the computer program when run controls a device in which the computer readable storage medium is located to perform the path planning method according to any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the path planning method of any of claims 1-7.
CN202310219228.9A 2023-03-06 2023-03-06 Path planning method and device, storage medium and electronic equipment Pending CN116166030A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310219228.9A CN116166030A (en) 2023-03-06 2023-03-06 Path planning method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310219228.9A CN116166030A (en) 2023-03-06 2023-03-06 Path planning method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116166030A true CN116166030A (en) 2023-05-26

Family

ID=86420043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310219228.9A Pending CN116166030A (en) 2023-03-06 2023-03-06 Path planning method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116166030A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116499487A (en) * 2023-06-28 2023-07-28 新石器慧通(北京)科技有限公司 Vehicle path planning method, device, equipment and medium
CN116499487B (en) * 2023-06-28 2023-09-05 新石器慧通(北京)科技有限公司 Vehicle path planning method, device, equipment and medium
CN117103282A (en) * 2023-10-20 2023-11-24 南京航空航天大学 Double-arm robot cooperative motion control method based on MATD3 algorithm
CN117103282B (en) * 2023-10-20 2024-02-13 南京航空航天大学 Double-arm robot cooperative motion control method based on MATD3 algorithm

Similar Documents

Publication Publication Date Title
CN116166030A (en) Path planning method and device, storage medium and electronic equipment
Ulmer et al. On modeling stochastic dynamic vehicle routing problems
Liu et al. How machine learning informs ride-hailing services: A survey
Cheung et al. Dynamic routing model and solution methods for fleet management with mobile technologies
Dallmeyer et al. Don't go with the ant flow: Ant-inspired traffic routing in urban environments
Zhen et al. Crowdsourcing mode evaluation for parcel delivery service platforms
Kirci An optimization algorithm for a capacitated vehicle routing problem with time windows
van der Hagen et al. Machine learning–based feasibility checks for dynamic time slot management
CN113780956B (en) Logistics freight generation method, device, equipment and storage medium
Mrazovic et al. Improving mobility in smart cities with intelligent tourist trip planning
Pugliese et al. Combining variable neighborhood search and machine learning to solve the vehicle routing problem with crowd-shipping
Wang et al. Joint optimization of parcel allocation and crowd routing for crowdsourced last-mile delivery
Gu et al. Dynamic truck–drone routing problem for scheduled deliveries and on-demand pickups with time-related constraints
Su et al. Heterogeneous fleet vehicle scheduling problems for dynamic pickup and delivery problem with time windows in shared logistics platform: Formulation, instances and algorithms
Dieter et al. Integrating driver behavior into last-mile delivery routing: Combining machine learning and optimization in a hybrid decision support framework
Zhang et al. Hybrid evolutionary optimization for takeaway order selection and delivery path planning utilizing habit data
Friedrich et al. Urban consolidation centers and city toll schemes–Investigating the impact of city tolls on transshipment decisions
Zhou et al. Reinforcement Learning-based approach for dynamic vehicle routing problem with stochastic demand
Pan et al. A Grey Neural Network Model Optimized by Fruit Fly Optimization Algorithm for Short-term Traffic Forecasting.
Gao et al. The stochastic share-a-ride problem with electric vehicles and customer priorities
Ruiz et al. Prize-collecting traveling salesman problem: a reinforcement learning approach
Akkerman et al. Learning dynamic selection and pricing of out-of-home deliveries
Habib et al. Multi-agent reinforcement learning for multi vehicles one-commodity vehicle routing problem
Hanna et al. Selecting compliant agents for opt-in micro-tolling
Wu et al. A decision support approach for two-stage multi-objective index tracking using improved lagrangian decomposition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination