CN117367435A

CN117367435A - Evacuation path planning method, device, equipment and storage medium

Info

Publication number: CN117367435A
Application number: CN202311658928.4A
Authority: CN
Inventors: 王凯敏; 李文娟; 刘川枫; 牟林; 牛茜如
Original assignee: Shenzhen Marine Development Research And Promotion Center Shenzhen Marine Monitoring And Forecasting Center; Shenzhen University
Current assignee: Shenzhen Marine Development Research And Promotion Center Shenzhen Marine Monitoring And Forecasting Center; Shenzhen University
Priority date: 2023-12-06
Filing date: 2023-12-06
Publication date: 2024-01-09
Anticipated expiration: 2043-12-06
Also published as: CN117367435B

Abstract

The present disclosure relates to an evacuation path planning method, apparatus, device, and storage medium, the method comprising: acquiring current position information; based on the position information, acquiring information of surrounding environment from a database updated in real time, wherein the information of the surrounding environment comprises road network information, disaster avoidance facility information, regional risk level and distance from the nearest disaster avoidance facility; based on the information of the surrounding environment and a pre-trained deep reinforcement learning model, determining an evacuation path in the range of the surrounding environment, wherein the evacuation path is at least a partial path from the current position to the nearest disaster avoidance facility. According to the method and the system, the evacuation path in the range of the surrounding environment is obtained by obtaining the information of the surrounding environment and utilizing the deep reinforcement learning model, the evacuation path can be planned according to the information of the surrounding environment which changes in real time, so that the evacuation path is adaptive to the current situation, and evacuation personnel can complete evacuation by utilizing the information in the range of the visual field under the guidance of the evacuation path.

Description

Evacuation path planning method, device, equipment and storage medium

Technical Field

The disclosure relates to the technical field of path planning, and in particular relates to an evacuation path planning method, device, equipment and storage medium.

Background

Storm surge is one of the main marine disasters facing China. It is an abnormal, temporary rise in water level during a severe storm, which causes extreme flooding in coastal areas. Storm surges occur frequently and are highly destructive, causing heavy casualties and economic losses in China's coastal areas. Storm surge is currently classified as a catastrophic disaster type in China, and emergency evacuation path planning during a storm surge plays a vital role in saving lives and mitigating the disaster.

Existing evacuation path planning methods mainly plan paths from a manager's perspective, based on a static global environment. They are not flexible enough, and they require evacuees to have complete environment information. In a real situation, however, evacuees can rarely obtain global information and can only perceive their limited surroundings, which change dynamically in real time, so an evacuation path planned from a static global environment is of very limited use. How to plan evacuation paths flexibly when a storm surge occurs is therefore a technical problem to be solved.

Disclosure of Invention

In order to solve the technical problems, the present disclosure provides an evacuation path planning method, an evacuation path planning device, evacuation path planning equipment and a storage medium.

A first aspect of an embodiment of the present disclosure provides an evacuation path planning method, including:

acquiring current position information;

based on the position information, acquiring information of surrounding environment from a database updated in real time, wherein the information of the surrounding environment comprises road network information, disaster avoidance facility information, regional risk level and distance from the nearest disaster avoidance facility;

based on the information of the surrounding environment and a pre-trained deep reinforcement learning model, determining an evacuation path in the range of the surrounding environment, wherein the evacuation path is at least a partial path from the current position to the nearest disaster avoidance facility.

A second aspect of an embodiment of the present disclosure provides an evacuation path planning apparatus, including:

the position acquisition module is used for acquiring current position information;

the information acquisition module is used for acquiring information of surrounding environment from a database updated in real time based on the position information, wherein the information of the surrounding environment comprises road network information, disaster avoidance facility information, regional risk level and distance from the nearest disaster avoidance facility;

the path determining module is used for determining an evacuation path in the range of the surrounding environment based on the information of the surrounding environment and a pre-trained deep reinforcement learning model, wherein the evacuation path is at least a part of the path from the current position to the nearest disaster avoidance facility.

A third aspect of the embodiments of the present disclosure provides a computer device, including a memory and a processor, and a computer program, where the memory stores the computer program, and when the computer program is executed by the processor, implements the evacuation path planning method as in the first aspect.

A fourth aspect of the embodiments of the present disclosure provides a computer-readable storage medium, in which a computer program is stored which, when executed by a processor, implements an evacuation path planning method as in the first aspect described above.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:

According to the evacuation path planning method, device, equipment and storage medium, the current position information is acquired, and information of the surrounding environment is acquired from a database updated in real time based on the position information, the information of the surrounding environment comprising road network information, disaster avoidance facility information, regional risk level and distance from the nearest disaster avoidance facility. An evacuation path in the range of the surrounding environment is then determined based on the information of the surrounding environment and a pre-trained deep reinforcement learning model, the evacuation path being at least a part of the path from the current position to the nearest disaster avoidance facility. When a storm surge occurs, the evacuation path can thus be planned according to surrounding-environment information that changes in real time, so that the path suits the current situation and provides better guidance. Guided by the evacuation path, evacuees can complete the evacuation using only the environment information observable within their field of view, without grasping global information, which improves the flexibility of path planning.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a flowchart of an evacuation path planning method provided in an embodiment of the present disclosure;

FIG. 2 is a flow chart of a method of determining a location of a disaster avoidance facility provided by an embodiment of the present disclosure;

FIG. 3 is a flow chart of a method of determining a disaster avoidance facility provided by an embodiment of the present disclosure;

FIG. 4 is a flow chart of a method of training a deep reinforcement learning model provided by an embodiment of the present disclosure;

FIG. 5 is a flow chart of a method of determining a first neighboring grid provided by an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an evacuation path planning apparatus according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, a further description of aspects of the present disclosure will be provided below. It should be noted that, without conflict, the embodiments of the present disclosure and features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it will be apparent that the embodiments in the specification are only some, but not all, embodiments of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

FIG. 1 is a flowchart of an evacuation path planning method according to an embodiment of the present disclosure, which may be performed by an evacuation path planning apparatus. As shown in FIG. 1, the evacuation path planning method provided in this embodiment includes the following steps:

S101, acquiring current position information.

In the embodiment of the disclosure, the evacuation path planning device may acquire the current location information of the personnel to be evacuated, where the location information may be longitude and latitude information with higher precision.

In an exemplary implementation of the embodiment of the present disclosure, the evacuation path planning device may acquire the user's current location information when it receives an evacuation path planning request from the user, or may acquire the user's location information at preset time intervals after receiving the request.

S102, based on the position information, acquiring information of surrounding environment from a database updated in real time, wherein the information of the surrounding environment comprises road network information, disaster avoidance facility information, regional risk level and distance from the nearest disaster avoidance facility.

The database in the embodiment of the disclosure can be updated in real time according to collected information. For example, the disaster avoidance facility information and the distance from the nearest disaster avoidance facility may be updated according to the occupancy of each disaster avoidance facility, and the regional risk level may be updated according to the submerged water depth at each location. The evacuation path planning device can thus obtain the latest real information from the database and plan paths better.

The surrounding environment in the embodiment of the present disclosure may be understood as the environmental area around a person to be evacuated. For example, to match a person's field of view in a real scene, a square area with a side length of 320 m centered on the person to be evacuated may be taken as the surrounding environment.

The disaster avoidance facility in the embodiment of the disclosure may be understood as a building that can serve as a temporary shelter when a storm surge disaster occurs. For example, because hospitals, primary schools and middle schools are structurally sound and have good waterproof performance, the disaster avoidance facilities may include hospitals, primary schools, middle schools and the like. The disaster avoidance facility information may be understood as information indicating whether a disaster avoidance facility exists at the current position.

The regional risk level in the embodiment of the present disclosure may be understood as a risk level determined according to the submerged water depth of the area where the current position is located. Different submerged water depths are classified into different risk levels: the deeper the water, the harder it is to pass and the higher the risk level. For example, the regional risk level may be divided into five levels: level one for a submerged water depth of 0-0.15 m, level two for 0.15-0.5 m, level three for 0.5-1.0 m, level four for 1.0-2.0 m, and level five for a depth greater than 2.0 m. This classification is not limiting.
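The five-level classification above can be sketched as a small lookup. This is a minimal illustration, not the patented implementation: the function name `risk_level` is hypothetical, and assigning a boundary depth (for example exactly 0.15 m) to the lower level is an assumption, since the text does not say which level a boundary value belongs to.

```python
def risk_level(depth_m: float) -> int:
    """Map a submerged water depth in metres to a risk level from 1 to 5."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    # Upper bounds (metres) of levels 1 to 4; anything deeper is level 5.
    # Assumption: a depth exactly on a boundary falls into the lower level.
    upper_bounds = [0.15, 0.5, 1.0, 2.0]
    for level, bound in enumerate(upper_bounds, start=1):
        if depth_m <= bound:
            return level
    return 5
```

For instance, a depth of 0.3 m maps to level two and a depth of 2.5 m maps to level five.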

In the embodiment of the disclosure, a plurality of disaster avoidance facilities may exist in a region, when planning an evacuation path, the disaster avoidance facility closest to a person may be determined according to the position of the person, and the closest disaster avoidance facility is used as a final destination for planning, so that the person can arrive at one disaster avoidance facility as soon as possible.

In the embodiment of the disclosure, after obtaining the current position information, the evacuation path planning device may determine the range of the surrounding environment according to the position information, and further obtain the information of the surrounding environment in the database updated in real time, specifically obtain the road network information, the disaster avoidance facility information, the regional risk level and the distance from the nearest disaster avoidance facility within the range of the surrounding environment.

In an exemplary implementation of the embodiment of the present disclosure, the evacuation path planning apparatus may divide the surrounding environment into a certain number of grids, for example dividing a square environment with a side length of 320 m into 20×20 grids (each grid then covering 16 m × 16 m), and obtain the road network information, disaster avoidance facility information, regional risk level, and distance from the nearest disaster avoidance facility corresponding to each grid.
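As a rough sketch of the grid division, the following maps an offset from the person's position to a grid cell. The names `cell_index`, `SIDE_M`, `GRID_N` and `CELL_M` are hypothetical, and two conventions are assumptions not stated in the text: offsets are measured in metres east and north of the person, and row indices increase northward.

```python
SIDE_M = 320.0             # side length of the surrounding environment (from the text)
GRID_N = 20                # grids per side (from the text)
CELL_M = SIDE_M / GRID_N   # 16 m per grid


def cell_index(east_m: float, north_m: float) -> tuple:
    """Return (row, col) of the grid containing an offset from the person.

    Assumption: the person stands at the centre of the square, row 0 is the
    southernmost row and column 0 is the westernmost column.
    """
    half = SIDE_M / 2
    if not (-half <= east_m < half and -half <= north_m < half):
        raise ValueError("offset lies outside the surrounding environment")
    row = int((north_m + half) // CELL_M)
    col = int((east_m + half) // CELL_M)
    return row, col
```

Each `(row, col)` pair can then be used as a key for the per-grid road network information, facility information, risk level and distance values.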

In another exemplary implementation of the disclosed embodiments, the regional risk level may be determined by modeling storm surge data in combination with a regional topographic elevation map. Specifically, the evacuation path planning apparatus may model the wind field using the Jelesnianski hurricane model, model the water level using a coupled model of the Advanced Circulation Model for Oceanic, Coastal and Estuarine Waters (ADCIRC) and the Simulating Waves Nearshore (SWAN) model, obtain a continuous water depth image by interpolation, subtract the regional topographic elevation map to obtain the actual submerged water depth, and determine the corresponding regional risk level. The Jelesnianski hurricane model requires four parameters to be set: the cyclone track T, the minimum central air pressure Pc, the maximum wind speed Vmax and the radius of maximum wind Rmax. Rmax may be obtained using the following empirical formula:

[formula not reproduced in the source text]

where one parameter of the formula (its symbol is not reproduced in the source text) is the background air pressure. The cyclone track T can be obtained from a tropical cyclone best-track dataset, the minimum central air pressure Pc takes values in the range 880 hPa to 960 hPa, and the maximum wind speed Vmax can be estimated using the following empirical formula:

[formula not reproduced in the source text]

S103, determining an evacuation path in the range of the surrounding environment based on the information of the surrounding environment and the pre-trained deep reinforcement learning model, wherein the evacuation path is at least a part of the path from the current position to the nearest disaster avoidance facility.

In the embodiment of the disclosure, after obtaining the information of the surrounding environment, the evacuation path planning device may input the information of the surrounding environment into the deep reinforcement learning model to obtain a model output result, and determine an evacuation path within a range of the surrounding environment according to the model output result, where the evacuation path is a path from the current location to the nearest disaster avoidance facility or is a part of a path from the current location to the nearest disaster avoidance facility.

The deep reinforcement learning model in the embodiment of the present disclosure may be understood as a model trained using a deep reinforcement learning method. It may be a Deep Q Network (DQN) model or another model, which is not limited herein.

In an exemplary implementation of the embodiment of the present disclosure, when the currently planned evacuation path does not reach the nearest disaster avoidance facility, the evacuation path planning apparatus may, once the user has followed the planned path to its end point or to some point along it, return to the step of acquiring the current location information in S101. It acquires the new location information, acquires information of the new surrounding environment based on that location, and plans a new evacuation path using the deep reinforcement learning model, repeating until the evacuation path reaches the nearest disaster avoidance facility.
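The repeated local planning described above can be sketched as a loop. Everything here is illustrative: `evacuate` and `toy_planner` are hypothetical names, positions are grid cells rather than coordinates, and `toy_planner` merely stands in for steps S101 to S103 (it is not a deep reinforcement learning model).

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int]


def evacuate(start: Cell, shelter: Cell,
             plan_local: Callable[[Cell], List[Cell]],
             max_rounds: int = 100) -> List[Cell]:
    """Chain partial paths until the nearest shelter is reached."""
    position = start
    travelled = [start]
    for _ in range(max_rounds):
        if position == shelter:
            return travelled
        segment = plan_local(position)  # stand-in for S101 to S103
        if not segment:
            raise RuntimeError("planner returned an empty path")
        travelled.extend(segment)
        position = segment[-1]          # the user follows the segment to its end
    if position == shelter:
        return travelled
    raise RuntimeError("shelter not reached within max_rounds")


def toy_planner(shelter: Cell, horizon: int = 3) -> Callable[[Cell], List[Cell]]:
    """Hypothetical planner: step towards the shelter, at most `horizon` cells."""
    def plan(pos: Cell) -> List[Cell]:
        row, col = pos
        path = []
        for _ in range(horizon):
            if (row, col) == shelter:
                break
            row += (shelter[0] > row) - (shelter[0] < row)
            col += (shelter[1] > col) - (shelter[1] < col)
            path.append((row, col))
        return path
    return plan


route = evacuate((0, 0), (5, 7), toy_planner((5, 7)))
```

With a horizon of three cells per round, the route above is assembled from three partial paths, mirroring how each planned path covers only part of the way to the facility.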

In another exemplary implementation of the disclosed embodiments, the path planning problem may be redefined based on a Markov decision process, converting it into a finite sequential decision process in which the next direction of travel is decided according to the current position. In a grid environment, path planning may be described as moving from an initial grid to the grid where the destination is located, where each move goes from the current grid to one of its adjacent grids, and a deep reinforcement learning model is used to implement the decision process. Accordingly, S103 may include: inputting the information of the surrounding environment into the deep reinforcement learning model to obtain the advancing direction output by the model, and determining the evacuation path based on the advancing direction and the road network information.

Specifically, the evacuation path planning device may input the obtained information of the surrounding environment into the deep reinforcement learning model. The model selects the grid to move to from the eight surrounding grids, takes the direction of that grid relative to the current grid as the advancing direction, and outputs it. The eight adjacent grids correspond to eight advancing directions: north, northeast, east, southeast, south, southwest, west and northwest. After obtaining the advancing direction output by the model, the evacuation path planning device may plan the evacuation path corresponding to that direction in combination with the road network information.

According to the embodiment of the disclosure, the current position information is acquired, and information of the surrounding environment is acquired from a database updated in real time based on the position information, the information of the surrounding environment comprising road network information, disaster avoidance facility information, regional risk level and distance from the nearest disaster avoidance facility. An evacuation path in the range of the surrounding environment is determined based on the information of the surrounding environment and the pre-trained deep reinforcement learning model, the evacuation path being at least a part of the path from the current position to the nearest disaster avoidance facility. When a storm surge occurs, the evacuation path can thus be planned according to surrounding-environment information that changes in real time, so that the path suits the current situation and provides better guidance. Guided by the evacuation path, evacuees can complete the evacuation using only the environment information observable within their field of view, without grasping global information, which improves the flexibility of path planning; processing is also faster than with global planning.
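The mapping from the model output to a neighbouring grid can be sketched as follows. This is a sketch under stated assumptions: the model is assumed to output one score per direction (as a DQN typically does), the ordering of `ACTIONS` is arbitrary, row indices are assumed to increase northward, and the names `DIRECTIONS`, `best_action` and `step` are hypothetical.

```python
# Offsets as (row change, column change); assumption: rows increase northward.
DIRECTIONS = {
    "N": (1, 0), "NE": (1, 1), "E": (0, 1), "SE": (-1, 1),
    "S": (-1, 0), "SW": (-1, -1), "W": (0, -1), "NW": (1, -1),
}
ACTIONS = list(DIRECTIONS)  # index order is an illustrative choice


def best_action(scores):
    """Return the index of the highest-scoring direction."""
    if len(scores) != len(ACTIONS):
        raise ValueError("expected one score per direction")
    return max(range(len(scores)), key=scores.__getitem__)


def step(cell, action_index, grid_n=20):
    """Move one grid in the chosen direction; None if it leaves the grid."""
    d_row, d_col = DIRECTIONS[ACTIONS[action_index]]
    row, col = cell[0] + d_row, cell[1] + d_col
    if 0 <= row < grid_n and 0 <= col < grid_n:
        return (row, col)
    return None
```

In practice the chosen direction would still have to be checked against the road network information before it becomes part of the evacuation path.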

FIG. 2 is a flowchart of a method for determining a location of a disaster avoidance facility according to an embodiment of the present disclosure. As shown in FIG. 2, on the basis of the above embodiment, the location of the disaster avoidance facility may be determined by the following method.

S201, acquiring a satellite image of a preset area where the current position is located.

In the embodiment of the disclosure, the evacuation path planning device may determine the position of the disaster avoidance facility existing in the preset area where the current position is located before acquiring the disaster avoidance facility information of the surrounding environment and the distance from the nearest disaster avoidance facility, and specifically may first acquire the satellite image of the preset area where the current position is located, and then determine the position of the disaster avoidance facility according to the satellite image.

S202, performing target detection on the satellite image, and determining an image of a building contained in the satellite image.

In the embodiment of the disclosure, the evacuation path planning device may perform target detection processing on the satellite image after obtaining the satellite image, and determine an image of a building included in the satellite image.

In an exemplary implementation of the embodiment of the disclosure, because the satellite image is large, it is difficult to detect all the buildings it contains in a single target detection pass. After obtaining the satellite image, the evacuation path planning apparatus may therefore slide an overlapping window over the large-scale satellite image to select rectangular regions. For example, for a satellite image of a given size, a sliding window of size l and a sliding step d (d < l) divide the image into a corresponding number of rectangular regions (the image size, the number of regions and the example values of l are not reproduced in the source text). Other window sizes may also be chosen, but l should not exceed 1000 pixels, and the step d should lie within a prescribed range whose bounds are likewise not reproduced in the source text. After the rectangular regions are obtained by segmentation, target detection is performed on each rectangular region to determine the detection boxes of the buildings it contains; the intersection over union of the detection boxes in adjacent rectangular regions is calculated; the detection boxes are merged based on the result; and the images of the buildings contained in the satellite image are determined based on the merged detection boxes.
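The overlapping sliding-window segmentation can be sketched as below. The function names are hypothetical, the window is assumed to be square, and the values 800 and 600 used in the example are illustrative only (the text requires only d < l and l of at most 1000 pixels). Adding a final window flush with the image border is a design choice of this sketch so that no pixels are left uncovered.

```python
from typing import List, Tuple


def window_origins(length: int, window: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the windows cover [0, length)."""
    if not 0 < stride < window:
        raise ValueError("overlap requires 0 < stride < window")
    if length <= window:
        return [0]
    origins = list(range(0, length - window + 1, stride))
    if origins[-1] + window < length:
        origins.append(length - window)  # last window flush with the border
    return origins


def tile(width: int, height: int, window: int,
         stride: int) -> List[Tuple[int, int, int, int]]:
    """Return (x1, y1, x2, y2) rectangles covering the whole image."""
    return [
        (x, y, min(x + window, width), min(y + window, height))
        for y in window_origins(height, window, stride)
        for x in window_origins(width, window, stride)
    ]


regions = tile(2000, 2000, window=800, stride=600)
```

Because neighbouring windows overlap by `window - stride` pixels, a building cut by one window edge is usually seen whole in an adjacent window, which is what makes the later merging step possible.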

Target detection for each rectangular region can be performed by a pre-trained target detection model. For example, the target detection model may be a Faster Region-based Convolutional Neural Network (Faster R-CNN) model. A classical Faster R-CNN model has nine candidate boxes of different sizes for detecting targets at different scales. In the embodiment of the disclosure, the targets are buildings in satellite images, whose scale is small; in addition, given the outline characteristics of buildings in coastal areas subject to storm surge, the aspect ratio of the buildings is close to 1:1. New candidate boxes are therefore used: the aspect ratio may be one of 1:1, 1:1.4 and 1.4:1, and the size may be one of 64 pixels and 128 pixels, giving six kinds of candidate boxes in total. Alternatively, other ratios and sizes may be preset, but the aspect ratio of a candidate box should not exceed 2:1 or 1:2. The target detection model can be trained on manually labeled images, and the loss function used for training is as follows:

$$L = \lambda_1 L_{\text{rpn-cls}} + \lambda_2 L_{\text{rpn-reg}} + \lambda_3 L_{\text{multi-cls}} + \lambda_4 L_{\text{det-reg}}$$

(The original symbols and the normalization of each term are not reproduced in the source text; standard Faster R-CNN notation is used here.)

The Faster R-CNN model uses a Region Proposal Network (RPN) to generate candidate boxes. $L_{\text{rpn-cls}}$ is the binary classification loss of the RPN; $L_{\text{rpn-reg}}$ is the candidate box regression loss of the RPN; $L_{\text{multi-cls}}$ is the multi-classification loss; and $L_{\text{det-reg}}$ is the target detection regression loss. $p_i$ is the predicted probability that the i-th anchor is a building target; $p_i^*$ is the true label of the i-th anchor, equal to 1 when the target is a building and 0 otherwise; $t_i$ denotes the bounding box regression prediction parameters of the i-th anchor; $t_i^*$ denotes the regression parameters of the true bounding box corresponding to the i-th anchor; $p_j$ is the predicted building class probability distribution of the j-th candidate box; $p_j^*$ is the true building class probability distribution of the j-th candidate box; $t_j^u$ denotes the bounding box regression prediction parameters of class u for the j-th candidate box; and $v_j$ denotes the regression parameters of the true bounding box corresponding to the j-th candidate box. The classification terms use the cross-entropy loss function and the regression terms use the smooth L1 loss function. The loss $L_{\text{rpn-cls}}$ measures whether a building target in the satellite image can be detected. The conventional Faster R-CNN model has a multi-classifier for classifying detected objects, but at this stage only the buildings in an image can be detected and the type of a building cannot be accurately inferred, so $L_{\text{multi-cls}}$ is always large. If the original loss function were used, the multi-classification loss $L_{\text{multi-cls}}$ would adversely affect the model, making it difficult to converge or causing it to converge in the wrong direction. The present invention uses four coefficients $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ to weight the losses, optimizing the loss function of the Faster R-CNN model so that the model better completes the building target detection task.

The coefficients of the two bounding box regression losses, $\lambda_2$ and $\lambda_4$, take values that are not reproduced in the source text. The current step should focus on separating building targets from the background, so $\lambda_1$ is set so that the overall loss is biased towards separating the background from building targets. Since buildings cannot be classified accurately from image features alone, the $L_{\text{multi-cls}}$ loss strongly disturbs the whole model; on the other hand, it can to some extent help the detection boxes regress better. The $L_{\text{multi-cls}}$ loss is therefore retained, but $\lambda_3$ is used to balance it.
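The weighting of the four loss terms can be illustrated with a small pure-Python sketch. The coefficient values in the example are hypothetical placeholders, because the actual values are not given in the text; the helper names are also hypothetical, and the smooth L1 form shown is the common one with a transition point of 1.

```python
import math


def binary_cross_entropy(prob: float, label: int) -> float:
    """Cross-entropy for one anchor (label 1 = building, 0 = background)."""
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(prob + eps)
             + (1 - label) * math.log(1.0 - prob + eps))


def smooth_l1(pred, target) -> float:
    """Smooth L1 loss summed over bounding-box regression parameters."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


def weighted_loss(rpn_cls, rpn_reg, multi_cls, det_reg, weights) -> float:
    """Weighted sum of the four loss terms."""
    w1, w2, w3, w4 = weights
    return w1 * rpn_cls + w2 * rpn_reg + w3 * multi_cls + w4 * det_reg


# Hypothetical weights: emphasise building/background separation and
# down-weight the multi-class term. These are NOT the patent's values.
example = weighted_loss(1.0, 1.0, 1.0, 1.0, weights=(2.0, 1.0, 0.5, 1.0))
```

A smaller weight on the multi-class term keeps its gradient contribution from dominating training while still letting it inform box regression, which is the trade-off the passage describes.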

After the detection boxes of the buildings contained in the rectangular regions are determined, the intersection over union of each pair of detection boxes from adjacent rectangular regions is calculated and compared with a preset threshold, such as 0.7. If the intersection over union is greater than the threshold, the two detection boxes are determined to correspond to the same building and are merged, with only one detection box retained. This solves the problem of buildings being split by image segmentation. After the detection boxes are merged, the images of the buildings contained in the satellite image are determined according to the detection boxes contained in the satellite image.
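The merging rule can be sketched as follows, assuming all detection boxes have already been converted to full-image pixel coordinates in `(x1, y1, x2, y2)` form. The function names are hypothetical, and for brevity this sketch compares every box with every retained box rather than only boxes from adjacent regions; keeping the first box of a matching pair is one way of "retaining only one detection box".

```python
def iou(a, b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(boxes, threshold: float = 0.7):
    """Drop any box whose IoU with an already retained box exceeds the threshold."""
    kept = []
    for box in boxes:
        if all(iou(box, other) <= threshold for other in kept):
            kept.append(box)
    return kept


merged = merge_boxes([(0, 0, 10, 10), (1, 0, 10, 10), (50, 50, 60, 60)])
```

Here the first two boxes overlap with an intersection over union of 0.9 and are treated as the same building, while the third box is kept as a separate building.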

S203, identifying the image of the building, determining whether the building is a residential building, and labeling the image of the residential building.

In the embodiment of the disclosure, the evacuation path planning device may perform recognition processing for each building image after determining the building image from the satellite image, determine whether the building included therein is a residential building, and label the image of the residential building.

In an exemplary implementation of the disclosed embodiment, the evacuation path planning device may identify the image of a building using a pre-trained classification model. The classification model may be a support vector machine (SVM) model and may be trained on preset training data, where the training data includes images of buildings and corresponding labels indicating whether each building is a residential building.

S204, determining whether the building is a disaster avoidance facility or not based on the image and the labeling result of the building.

In the embodiment of the disclosure, after determining the images of the buildings contained in the satellite image and whether each of them is an image of a residential building, the evacuation path planning device may further determine whether each building is a disaster avoidance facility, so as to screen out the images of disaster avoidance facilities.

S205, determining the actual position of the disaster avoidance facility based on the position of the disaster avoidance facility in the satellite image.

In the embodiment of the disclosure, the evacuation path planning device may determine the positions of the disaster avoidance facilities in the satellite image after determining the images of the disaster avoidance facilities included in the satellite image, so as to determine the actual positions of the disaster avoidance facilities according to the conversion relationship between the satellite image and the actual positions in space.
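One simple form such a conversion relationship could take is a linear mapping from pixel coordinates to longitude and latitude. This is only a sketch under strong assumptions that the text does not state: the image is north-up, its geographic bounds are known, and the area is small enough for a linear mapping to be adequate. The function name and the bounds in the example are hypothetical.

```python
def pixel_to_lonlat(px: float, py: float, width: int, height: int,
                    west: float, north: float, east: float, south: float):
    """Linearly map a pixel (px, py) to (longitude, latitude).

    Assumptions: pixel (0, 0) is the north-west corner of a north-up image
    and the bounds are the geographic edges of the image.
    """
    lon = west + (px / width) * (east - west)
    lat = north - (py / height) * (north - south)
    return lon, lat


# Hypothetical bounds for a 1000 x 1000 pixel image.
lon, lat = pixel_to_lonlat(500, 250, 1000, 1000,
                           west=114.0, north=22.6, east=114.1, south=22.5)
```

A real pipeline would more likely read the georeferencing transform stored with the satellite image than rely on hand-entered bounds.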

According to the embodiment of the disclosure, the satellite image of the preset area where the current position is located is acquired, target detection is carried out on the satellite image, the image of a building contained in the satellite image is determined, the image of the building is identified, whether the building is a resident building or not is determined, the image of the resident building is marked, whether the building is a disaster avoidance facility is determined based on the image of the building and the marking result, the actual position of the disaster avoidance facility is determined based on the position of the disaster avoidance facility in the satellite image, the disaster avoidance facility can be automatically identified according to the satellite image, the actual position of the disaster avoidance facility is further determined, the follow-up planning of evacuation paths based on the actual position of the disaster avoidance facility is facilitated, meanwhile, because the resident building is densely distributed, a single building is difficult to extract during target detection, a plurality of buildings are often taken as a main body, whether the resident building is judged to be the disaster avoidance facility, the method of judging whether the resident building is the disaster avoidance facility is the primary identification, the disaster avoidance facility judgment is carried out the second time, the disaster avoidance facility judgment accuracy can be improved, and the misjudgment probability is reduced.

Fig. 3 is a flowchart of a method for determining a disaster avoidance facility according to an embodiment of the present disclosure, and as shown in fig. 3, on the basis of the above embodiment, the disaster avoidance facility may be determined by the following method.

S301, extracting features of the image of the building to obtain the features of the building.

In the embodiment of the disclosure, after obtaining the image of the building, the evacuation path planning device may perform feature extraction on the image to obtain the features of the building. Specifically, feature extraction may be implemented through a pre-trained feature extraction model: building images of different sizes are input into the feature extraction model to obtain building features of a fixed size [value missing in source]. The feature extraction model consists of 5 convolution layers, 1 alignment (Region of Interest Align) layer and 1 up-sampling layer, wherein the convolution kernel size of all convolution layers is [value missing in source], the stride is 1 and the padding is 1.

S302, acquiring the position of each building in the satellite image.

In the embodiment of the disclosure, the evacuation path planning device may determine a position of each building in the satellite image based on a position of the image of the building in the satellite image after obtaining the image of the building.

In an exemplary implementation manner of the embodiment of the disclosure, the evacuation path planning device may determine, with reference to the entire preset area, a relative position of each building in the area, where the relative position coordinate of the upper left corner of the preset area is (0, 0) and the relative position coordinate of the lower right corner is (1, 1), and the position of the building is represented by a center point position of the building image.

S303, calculating the attention between the building and other buildings based on the first labeling result of the building and the first position in the satellite image and the second labeling result of other buildings and the second position in the satellite image for each building.

In the embodiment of the disclosure, the evacuation path planning device may acquire, for each building, a first labeling result of whether the building is a residential building and a first position of the building in the satellite image, and a second labeling result of whether other buildings except the building are residential buildings and a second position of other buildings in the satellite image after determining the position of the building in the satellite image, and calculate a spatial attention between the building and the other buildings according to the above information.

In an exemplary implementation manner of the embodiment of the present disclosure, the evacuation path planning apparatus may combine the labeling result of each building with its position in the satellite image to obtain a triplet $(x, y, t)$, where $x, y$ is the position of the building in the satellite image and $t$ is the labeling result: $t = 0$ when the building is a residential building, and $t = 1$ otherwise. For a building $i$ and another building $j$, the attention between $i$ and $j$ can be calculated as follows. The triplets $(x_i, y_i, t_i)$ and $(x_j, y_j, t_j)$ are fed into a neural network to obtain two spatial feature vectors $s_i$ and $s_j$ [dimension missing in source]. $s_i$ is transformed with the Q matrix and $s_j$ with the K matrix to obtain two vectors $q_i$ and $k_j$ [dimension missing in source]. The inner product of $q_i$ and $k_j$ gives the attention $A(i, j)$ of building $i$ to building $j$, i.e. $A(i, j) = q_i \cdot k_j$.
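The attention computation described above can be illustrated with a minimal sketch. It assumes a single linear layer followed by tanh as the neural network, a feature dimension of 8 and randomly initialised Q and K matrices; none of these are specified in the text, and the names `embed`, `W_s`, `W_q` and `W_k` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # assumed feature dimension (not given in the text)

# Hypothetical parameters; in practice these would be learned.
W_s = rng.normal(size=(3, d))   # maps a triplet (x, y, t) to a spatial feature
W_q = rng.normal(size=(d, d))   # Q matrix
W_k = rng.normal(size=(d, d))   # K matrix

def embed(triplet):
    """Stand-in for the neural network: one linear layer followed by tanh."""
    return np.tanh(np.asarray(triplet, dtype=float) @ W_s)

def attention(triplet_i, triplet_j):
    """A(i, j): inner product of q (from building i) and k (from building j)."""
    q = embed(triplet_i) @ W_q
    k = embed(triplet_j) @ W_k
    return float(q @ k)

# Triplets (x, y, t): relative position in [0, 1] and the residential label.
building_i = (0.25, 0.40, 1)
building_j = (0.30, 0.45, 0)
a_ij = attention(building_i, building_j)
```

Because separate Q and K matrices are used, the score is generally not symmetric: `attention(building_j, building_i)` need not equal `a_ij`.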

S304, updating the features of the building based on the attention between the building and other buildings.

In the embodiment of the disclosure, after determining the attention between the building and other buildings, the evacuation path planning device may fuse the features of the building with the features of the other buildings based on the attention, so as to update the features of the building.

In an exemplary implementation of the disclosed embodiment, after obtaining the attention of building $i$ with respect to the other buildings, the evacuation path planning apparatus may normalize the attention using a Softmax function to obtain the weights, i.e. $w_{ij} = \mathrm{Softmax}_j\big(A(i, j)\big)$, and use the weights $w_{ij}$ to perform feature fusion in the following specific fusion mode:

[fusion formula missing in source]

wherein the formula uses two convolutional network layers, whose convolution kernel sizes are [values missing in source]. The updated feature has size [value missing in source], $N$ denotes all buildings and $j$ denotes the other buildings.
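As a rough illustration of the weighting step, the sketch below applies Softmax normalisation to a set of attention scores and then fuses features with a plain weighted sum. The weighted sum is only an assumed stand-in: the text applies convolutional layers in its fusion formula, which is not available here. The score and feature values are made up.

```python
import numpy as np

def softmax(scores):
    """Numerically stable Softmax over a 1-D array of attention scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Attention of building i towards three other buildings (made-up values).
scores = np.array([2.0, 1.0, 0.1])
weights = softmax(scores)

# Assumed fusion: add the weighted sum of the other buildings' features to
# building i's own feature. The text uses convolution layers here instead.
own_feature = np.ones(4)
other_features = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0],
                           [0.0, 0.0, 1.0, 0.0]])
updated_feature = own_feature + weights @ other_features
```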

S305, inputting the updated characteristics into a pre-trained building classification model to obtain a classification result of the building output by the building classification model.

The building classification model in the embodiment of the disclosure may be understood as a multi-classifier, the classification result is a multi-dimensional vector, each dimension corresponds to the probability of one building type, the vector dimension is the same as the number of types, and the building types may include seven types, namely, a hospital, a primary school, a middle school, a disaster avoidance facility, a residential building, a commercial building and other types of buildings.

In the embodiment of the disclosure, the evacuation path planning device may input the updated features into the pre-trained building classification model after obtaining the updated features, so as to obtain a classification result of the building output by the building classification model.

In an exemplary implementation manner of the disclosed embodiment, the building classification model is composed of 7 two-layer fully-connected networks and one activation (Sigmoid) layer. The 7 two-layer fully-connected networks respectively process the 7 channels of the feature, and the probabilities are then obtained through normalization by the Sigmoid layer. The 7 channels correspond to 7 probabilities respectively: the probability that the target is a hospital, a primary school, a middle school, a disaster avoidance facility, a residential building, a commercial building, or another type of building. Further dividing the disaster avoidance facilities into hospitals, primary schools and middle schools in effect forms multi-task learning, which can achieve a better classification learning effect.

Wherein, in training the building classification model, the following cross entropy loss function may be used:

[cross-entropy loss formula missing in source]

Optionally, [parameter values missing in source]; the seven probability terms of the formula correspond respectively to the 7 probabilities above, which is not limited herein.

S306, determining whether the building is a disaster avoidance facility based on the classification result of the building.

In this embodiment of the present disclosure, after obtaining the classification result output by the building classification model, the evacuation path planning device may determine whether the building is a disaster avoidance facility according to the classification result. Specifically, the device may determine, from the value of each dimension of the classification result, the probability that the building belongs to each building type, take the building type with the maximum probability as the building type of the building, and then determine whether that building type is the disaster avoidance facility.
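The decision rule in S306 amounts to taking the type with the largest probability. A minimal sketch follows, assuming the seven output dimensions are ordered as the types are listed in the text; that ordering and the names `BUILDING_TYPES`, `classify` and `is_disaster_avoidance_facility` are hypothetical. The sketch checks only the dedicated disaster avoidance facility type.

```python
# Assumed channel order, following the order in which the types are listed.
BUILDING_TYPES = [
    "hospital", "primary school", "middle school",
    "disaster avoidance facility", "residential building",
    "commercial building", "other",
]

def classify(probabilities):
    """Return the building type with the highest probability."""
    if len(probabilities) != len(BUILDING_TYPES):
        raise ValueError("expected one probability per building type")
    best = max(range(len(probabilities)), key=lambda k: probabilities[k])
    return BUILDING_TYPES[best]

def is_disaster_avoidance_facility(probabilities):
    """True when the most probable type is the disaster avoidance facility."""
    return classify(probabilities) == "disaster avoidance facility"
```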

According to the embodiment of the disclosure, features of the buildings are obtained by performing feature extraction on the building images, and the position of each building in the satellite image is obtained. For each building, the attention between the building and other buildings is calculated based on the first labeling result of the building and its first position in the satellite image, and the second labeling results of the other buildings and their second positions in the satellite image; the features of the building are updated based on this attention; the updated features are input into a pre-trained building classification model to obtain the classification result output by the model; and whether the building is a disaster avoidance facility is determined based on the classification result. Because the spatial distribution of buildings in a region is correlated with building type (for example, hospitals are often located in areas with many residential buildings), updating the features of a building with attention to the surrounding buildings lets the classification make use of this correlation.

FIG. 4 is a flow chart of a method of training a deep reinforcement learning model provided by an embodiment of the present disclosure. As shown in fig. 4, on the basis of the above-described embodiment, the deep reinforcement learning model may be trained as follows.

S4001, acquiring training data, and performing gridding processing on the training data, wherein the training data comprises road network data, disaster avoidance facility data and risk grade data in a preset area.

Gridding in the embodiments of the present disclosure may be understood as representing the spatial distribution of features in the form of a two-dimensional matrix.

In the embodiment of the disclosure, the evacuation path planning device may collect training data in advance, specifically may acquire road network data, disaster avoidance facility data and risk level data in a preset area, and perform gridding processing after obtaining corresponding data. Specifically, disaster avoidance facility data may be obtained by identifying satellite images as described above, and risk level data may be obtained by simulating storm surge inundation data, similar to S102.

In an exemplary implementation of the disclosed embodiments, the cell size used in the gridding process may be [value missing in source]. When the road network data is gridded, a binary two-dimensional matrix can be used to represent the vector roads: a grid that a vector road passes through is 1, and a grid that no road passes through is 0. Specifically, the road network data can be gridded by gridding each vector that forms each road in the road network. When the disaster avoidance facility data is gridded, a binary two-dimensional matrix can be used to represent the disaster avoidance facilities, where 0 indicates that there is no disaster avoidance facility in the current grid and 1 indicates that there is one. Specifically, the matrix can be initialized to 0 and each disaster avoidance facility traversed in turn; taking the grid where the center point of the facility is located as the center, the values of the square area with a side length of 11 in the matrix are set to 1. When the risk level data is gridded, the regional inundation depth is gridded first: the inundation depths of 9 points are sampled in each cell, and their average is used as the inundation depth value of the cell; the risk level data of the cell is then determined according to that value.
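Two of the gridding rules above are easy to sketch: marking an 11-cell square around each facility, and averaging 9 sampled depths per cell. The function names are hypothetical, and clipping the square at the matrix border is an assumption, since the text does not say how facilities near the edge are handled.

```python
import numpy as np

def rasterize_facilities(shape, facility_cells, side=11):
    """Binary matrix with a side x side square of 1s around each facility cell."""
    grid = np.zeros(shape, dtype=np.uint8)
    half = side // 2
    for row, col in facility_cells:
        # Assumed behaviour: squares are clipped at the matrix border.
        r0, r1 = max(row - half, 0), min(row + half + 1, shape[0])
        c0, c1 = max(col - half, 0), min(col + half + 1, shape[1])
        grid[r0:r1, c0:c1] = 1
    return grid

def cell_depth(samples):
    """Mean of the 9 inundation depths sampled inside one cell."""
    samples = np.asarray(samples, dtype=float)
    if samples.size != 9:
        raise ValueError("expected 9 depth samples per cell")
    return float(samples.mean())

facilities = rasterize_facilities((30, 30), [(15, 15), (2, 2)])
depth = cell_depth([0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6])
```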

S4002, for each grid, determining a positional action relation table corresponding to the grid, where the positional action relation table is used to record a next action effective with respect to a previous action when the previous action is an action in an arbitrary forward direction, and the actions include actions in eight forward directions.

In this embodiment of the present disclosure, each grid has at most eight adjacent grids, so the action of moving from one grid to an adjacent grid may take one of eight directions. Suppose the current grid is A, the grid at the previous time is B and the grid at the next time is C, so that the previous action moves from grid B to grid A and the next action moves from grid A to grid C. Then grid C must not be adjacent to grid B; that is, when moving from grid B through grid A to grid C by the previous action and the next action, grid C must not be a grid that can be reached from grid B in a single action. In this case the next action is valid with respect to the previous action. Based on this rule, the evacuation path planning device may determine, for each grid, the corresponding position action relation table, which records, for a previous action in any forward direction, the next actions that are valid with respect to that previous action.
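The validity rule can be sketched as a lookup table keyed by the previous action. The text keeps one table per grid; the sketch below derives a single table purely from the adjacency rule, and it also treats stepping straight back to B as invalid, which is an assumption. The names `ACTIONS`, `build_relation_table` and `RELATION_TABLE` are hypothetical.

```python
# The eight forward directions as (row offset, column offset).
ACTIONS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def build_relation_table():
    """Map each previous action to the list of next actions valid after it.

    With B -> A the previous move and A -> C the next move, the offset from
    B to C is the sum of both moves. C is valid only if it is neither B
    itself nor one of B's eight neighbours, i.e. its Chebyshev distance
    from B is at least 2.
    """
    table = {}
    for prev in ACTIONS:
        valid = []
        for nxt in ACTIONS:
            d_row, d_col = prev[0] + nxt[0], prev[1] + nxt[1]
            if max(abs(d_row), abs(d_col)) >= 2:
                valid.append(nxt)
        table[prev] = valid
    return table

RELATION_TABLE = build_relation_table()
```

Under this rule a straight move leaves three valid next actions and a diagonal move leaves five, so the agent can never double back or cut a corner it could have taken in one step.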

S4003, calculating the Chebyshev distance between the grid and the nearest disaster avoidance facility grid, and determining the Chebyshev distance as a first distance corresponding to the grid.

In the embodiment of the disclosure, because the Chebyshev distance is fast to compute, it can be used to roughly estimate the straight-line distance. The Chebyshev distance between $(x_1, y_1)$ and $(x_2, y_2)$ is $D = \max(|x_1 - x_2|, |y_1 - y_2|)$.

In the embodiment of the disclosure, after obtaining disaster avoidance facility data, the evacuation path planning device may determine, for each grid, a closest disaster avoidance facility grid, calculate a chebyshev distance between the grid and the closest disaster avoidance facility grid, and determine the chebyshev distance as a first distance corresponding to the grid.
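A minimal sketch of the first-distance computation, assuming the facility data is a binary matrix and using a brute-force minimum over all facility cells; the function names are hypothetical, and a larger area would likely call for a more economical method.

```python
import numpy as np

def chebyshev(p, q):
    """Chebyshev distance between two grid cells given as (row, col)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def first_distance_map(facility_grid):
    """For every cell, the Chebyshev distance to the nearest facility cell."""
    facility_cells = np.argwhere(facility_grid == 1)
    if len(facility_cells) == 0:
        raise ValueError("no disaster avoidance facility cell in the grid")
    rows, cols = np.indices(facility_grid.shape)
    # Distance to each facility cell, then the minimum over facilities.
    d_row = np.abs(rows[..., None] - facility_cells[:, 0])
    d_col = np.abs(cols[..., None] - facility_cells[:, 1])
    return np.maximum(d_row, d_col).min(axis=-1)

grid = np.zeros((5, 6), dtype=np.uint8)
grid[0, 0] = 1
grid[4, 5] = 1
distances = first_distance_map(grid)
```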

S4004, randomly determining a position in a preset area as a starting position, and determining a starting grid where the starting position is located.

In the embodiment of the disclosure, the evacuation path planning device may randomly determine a position as a starting position in a preset area when training the deep reinforcement learning model, and determine a grid where the starting position is located as a starting grid.

In an exemplary implementation manner of the embodiment of the present disclosure, when training the deep reinforcement learning model, the evacuation path planning apparatus may randomly determine a plurality of starting positions in the preset area, so as to perform multiple rounds of training or verification on the model, so that the deep reinforcement learning model performs well for any starting position in the preset area.

S4005, determining the start grid as the current grid, and determining neighboring grids neighboring the current grid.

In an embodiment of the present disclosure, the evacuation path planning apparatus may start executing the continuous decision problem after determining the initial mesh, and specifically, may determine the initial mesh as the current mesh and determine at most eight meshes adjacent to the current mesh as adjacent meshes.

S4006, for each time step, a position action relation table corresponding to the current grid, road network data, disaster avoidance facility data, risk level data and a first distance corresponding to the current grid and the adjacent grids are obtained, and are input to a deep reinforcement learning network, and the deep reinforcement learning network determines a first adjacent grid from the adjacent grids.

In the embodiment of the disclosure, after determining the current grid and the adjacent grids, the evacuation path planning device may, in each time step, acquire the position action relation table corresponding to the current grid, together with the road network data, disaster avoidance facility data, risk level data and first distance corresponding to the current grid and the adjacent grids, and input the acquired data into the deep reinforcement learning network. The deep reinforcement learning network outputs a $3 \times 3$ matrix whose 8 peripheral values represent the values of the 8 actions, and the model selects the adjacent grid corresponding to the action with the highest value as the first adjacent grid. When the initial grid is the current grid, there is no previous action, so the position action relation table cannot be looked up to determine the next actions that are valid with respect to a previous action; in this case the action corresponding to the maximum of the 8 values is selected directly. When the current grid is not the initial grid, there is a previous action; the position action relation table is looked up according to the previous action to determine the set of next actions that are valid with respect to it, the action with the highest value in that set is selected as the next action, and the first adjacent grid corresponding to that action is determined.
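The selection of the first adjacent grid can be sketched as a masked arg-max over the network output. The sketch assumes the output is a 3 x 3 matrix (a matrix whose periphery holds exactly 8 values) with the centre entry unused; the names `ACTIONS` and `select_action` and the example values are hypothetical.

```python
import numpy as np

ACTIONS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def select_action(value_matrix, valid_actions=None):
    """Pick the highest-valued action among the valid ones.

    value_matrix: 3 x 3 array; entry [1 + dr, 1 + dc] is the value of
    moving by (dr, dc). The centre entry is ignored.
    valid_actions: actions allowed after the previous action, or None
    when the current grid is the start grid (no previous action).
    """
    candidates = ACTIONS if valid_actions is None else valid_actions
    return max(candidates, key=lambda a: value_matrix[1 + a[0], 1 + a[1]])

values = np.array([[0.1, 0.9, 0.2],
                   [0.3, 0.0, 0.4],
                   [0.5, 0.6, 0.7]])
first_step = select_action(values)
later_step = select_action(values, valid_actions=[(1, -1), (1, 0), (1, 1)])
```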

S4007, randomly determining a second adjacent grid from the adjacent grids based on the position action relation table corresponding to the current grid.

In the embodiment of the disclosure, while the first adjacent grid is determined by the deep reinforcement learning network, the evacuation path planning device may also randomly determine one of the adjacent grids as the second adjacent grid according to the position action relation table corresponding to the current grid. Specifically, when the current grid is the initial grid, the device may randomly determine the second adjacent grid from the adjacent grids; when the current grid is not the initial grid, the device may determine the set of next actions that are valid with respect to the previous action according to the position action relation table corresponding to the current grid and the previous action, and randomly select the adjacent grid corresponding to one action in the set as the second adjacent grid.

S4008, selecting one from the first neighboring mesh and the second neighboring mesh as a target mesh to be reached next based on the preset probability.

In the embodiment of the disclosure, the preset probability may be understood as the probability of selecting the randomly determined second adjacent grid rather than the first adjacent grid output by the model. Setting the preset probability gives the model the ability to explore freely, and the performance of the model is improved in the course of exploration.

In the embodiment of the disclosure, the evacuation path planning device may select, according to a preset probability, one mesh from the first adjacent mesh and the second adjacent mesh as a target mesh to be reached by the next action.
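The choice between the two candidate grids resembles an epsilon-greedy rule. A minimal sketch, in which the function name `choose_target` and the parameter `explore_prob` (standing for the preset probability) are hypothetical:

```python
import random

def choose_target(first_neighbor, second_neighbor, explore_prob, rng=random):
    """Return the random neighbour with probability explore_prob,
    otherwise the neighbour proposed by the network."""
    if not 0.0 <= explore_prob <= 1.0:
        raise ValueError("explore_prob must lie in [0, 1]")
    if rng.random() < explore_prob:
        return second_neighbor
    return first_neighbor
```

With `explore_prob` set to 0 the network's proposal is always followed, and with 1 the random neighbour is always taken; the text does not state which value, or schedule of values, is used.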

S4009, determining the target grid as the current grid, and repeatedly executing the step of determining the next arrived target grid until the next arrived target grid is the nearest disaster avoidance facility grid.

In the embodiment of the present disclosure, after determining the target mesh, the evacuation path planning apparatus may determine the target mesh as a current mesh when making a next decision, execute a next decision process to determine a next arriving target mesh, specifically execute the steps of determining an adjacent mesh adjacent to the current mesh in S4005, and S4006 to S4008 until the determined next arriving target mesh is a nearest disaster avoidance facility mesh, and may determine that the path planning is completed.

In an exemplary implementation manner of the embodiment of the present disclosure, the evacuation path planning apparatus may perform the loop process of S4005-S4009 after determining the starting position a and the corresponding starting grid a, determine a grid path from the starting grid a to the nearest disaster avoidance facility grid B, and re-perform the loop process of S4005-S4009 for the starting grid a after determining the grid path, determine a new grid path until the number of determined grid paths from the starting grid a to the nearest disaster avoidance facility grid B is equal to the preset number, so as to facilitate the subsequent calculation of the cumulative rewards of the multiple grid paths, adjust the network parameters, or evaluate the network performance.

S4010, calculating a cumulative reward from the initial grid to the nearest disaster avoidance facility grid based on a preset reward calculation rule.

The rewards in the embodiments of the present disclosure may be understood as parameters for evaluating the grid path from the starting grid to the nearest disaster avoidance facility grid, the larger the rewards, the better the evaluation result.

In the embodiment of the disclosure, the evacuation path planning device may determine that the mesh path from the initial mesh to the nearest disaster avoidance facility mesh is planned when the next arriving target mesh is the nearest disaster avoidance facility mesh, and may calculate the cumulative rewards from the initial mesh to the nearest disaster avoidance facility mesh according to information of all meshes traversed by the mesh path based on a preset reward calculation rule.

In one exemplary implementation of the disclosed embodiments, the reward calculation rule may include: giving a basic reward for every time step that passes; giving a risk reward whose value corresponds to the risk level data of the next arrived target grid; giving an arrival reward in the case that the next arrived target grid is the nearest disaster avoidance facility grid; and giving a distance reward in the case that the first distance corresponding to the next arrived target grid is smaller than the first distance corresponding to the current grid. The basic reward and the risk reward are negative rewards, and the arrival reward and the distance reward are positive rewards. The basic reward and the arrival reward are set so that the disaster avoidance point is reached in the minimum number of steps (that is, the grid path passes through the minimum number of grids): a large positive arrival reward (such as +2000) is given after the nearest disaster avoidance facility grid is reached, and a negative basic reward (such as -1) is given for each time step, that is, each time a target grid is reached. The risk reward is set so that the advancing direction tends, as far as possible, toward grids that are safer and easier to pass. Different risk level data correspond to different risk reward values: for example, risk level one corresponds to 0, risk level two to -4, risk level three to -8, risk level four to -16 and risk level five to -32. The distance reward is set so that the advancing direction tends toward grids closer to the disaster avoidance facility: a positive distance reward is given when the first distance corresponding to the next arrived target grid is smaller than the first distance corresponding to the current grid.
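The reward rule can be sketched with the example values given in the text. The distance reward of +1 is an assumption: the text describes it as a positive reward, and its example value is not clearly legible. The function names and the sample path are hypothetical.

```python
BASE_REWARD = -1          # example value from the text
ARRIVAL_REWARD = 2000     # example value from the text
DISTANCE_REWARD = 1       # assumed value; the text only says it is positive
RISK_REWARD = {1: 0, 2: -4, 3: -8, 4: -16, 5: -32}  # example values from the text

def step_reward(risk_level, current_distance, target_distance, reached_facility):
    """Reward for one time step that moves onto the target grid."""
    reward = BASE_REWARD + RISK_REWARD[risk_level]
    if reached_facility:
        reward += ARRIVAL_REWARD
    if target_distance < current_distance:
        reward += DISTANCE_REWARD
    return reward

def cumulative_reward(steps):
    """Sum of the step rewards along one grid path."""
    return sum(step_reward(*step) for step in steps)

# (risk level of target, first distance of current, of target, reached?)
path = [
    (1, 3, 2, False),  # -1 + 0 + 1 = 0
    (3, 2, 2, False),  # -1 - 8 = -9
    (2, 2, 1, False),  # -1 - 4 + 1 = -4
    (1, 1, 0, True),   # -1 + 0 + 2000 + 1 = 2000
]
total = cumulative_reward(path)
```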

S4011, adjusting parameters of the deep reinforcement learning network based on the cumulative reward to maximize the cumulative reward.

In the embodiment of the disclosure, the evacuation path planning device may adjust parameters of the deep reinforcement learning network according to the cumulative rewards after calculating the cumulative rewards of all grids passing from the initial grid to the nearest disaster avoidance facility grid, so as to maximize the cumulative rewards and obtain a trained deep reinforcement learning model.

In an exemplary implementation manner of the embodiment of the present disclosure, the evacuation path planning apparatus may continuously adjust the parameters of the deep reinforcement learning network during model training, so that the cumulative reward corresponding to the grid path determined by the network after the adjustment becomes larger. For example, when the cumulative reward corresponding to the grid path determined by the deep reinforcement learning network is significantly larger than the cumulative rewards corresponding to N grid paths determined by randomly choosing target grids, it may be determined that the training of the deep reinforcement learning network is complete, and the deep reinforcement learning model is obtained.

In another exemplary implementation manner of the embodiments of the present disclosure, when the deep reinforcement learning network is a DQN network, the DQN network includes an evaluation network and a target network with the same structure. The parameters of the evaluation network are copied to the target network every preset number of rounds, so that the target network is updated periodically. The evacuation path planning device may calculate the loss of the deep reinforcement learning network and use the loss value to update the network parameters by gradient descent. The loss function is calculated as follows:

$$L(\theta) = \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2}$$

where $\gamma$ is a discount factor between 0 and 1, $\theta^{-}$ denotes the parameters of the target network, $\theta$ denotes the parameters of the evaluation network, $r$ is the reward value of a single time step according to the reward calculation rule, $Q(s', a'; \theta^{-})$ represents the value of action $a'$ at position $s'$ output by the target network, and $Q(s, a; \theta)$ represents the value of action $a$ at position $s$ output by the evaluation network.
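For a single transition, a loss built from the quantities defined above is commonly computed as a squared temporal-difference error. The sketch below follows that standard DQN form; it is an assumed reading, not a confirmed copy of the formula in the text. The `done` flag, which drops the bootstrap term at the terminal grid, and the function name `dqn_loss` are additions not mentioned in the text.

```python
import numpy as np

def dqn_loss(q_eval, q_target_next, action, reward, gamma, done=False):
    """Squared temporal-difference error for one transition.

    q_eval: action values at the current position from the evaluation network.
    q_target_next: action values at the next position from the target network.
    """
    bootstrap = 0.0 if done else gamma * float(np.max(q_target_next))
    td_target = reward + bootstrap
    return (td_target - float(q_eval[action])) ** 2

q_eval = np.array([1.0, 2.0, 0.5])
q_target_next = np.array([0.0, 3.0, 1.0])
loss = dqn_loss(q_eval, q_target_next, action=1, reward=-1.0, gamma=0.9)
```

In training, this value would be averaged over a batch of transitions and differentiated with respect to the evaluation network parameters only, with the target network held fixed between periodic copies.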

According to the embodiment of the disclosure, training data comprising road network data, disaster avoidance facility data and risk level data in a preset area is acquired and gridded. For each grid, the corresponding position action relation table is determined, which records, for a previous action in any forward direction, the next actions that are valid with respect to it, the actions comprising actions in eight forward directions; the Chebyshev distance between the grid and the nearest disaster avoidance facility grid is calculated and taken as the first distance corresponding to the grid. A position is randomly determined in the preset area as the starting position, the starting grid where it is located is determined, the starting grid is taken as the current grid, and the adjacent grids of the current grid are determined. For each time step, the position action relation table corresponding to the current grid, together with the road network data, disaster avoidance facility data, risk level data and first distance corresponding to the current grid and the adjacent grids, is acquired and input into the deep reinforcement learning network, which determines a first adjacent grid from the adjacent grids; a second adjacent grid is randomly determined from the adjacent grids based on the position action relation table corresponding to the current grid; and one of the first and second adjacent grids is selected, based on a preset probability, as the next arrived target grid. The target grid is then taken as the current grid, and the step of determining the next arrived target grid is repeated until the next arrived target grid is the nearest disaster avoidance facility grid. The cumulative reward from the initial grid to the nearest disaster avoidance facility grid is calculated based on a preset reward calculation rule, and the parameters of the deep reinforcement learning network are adjusted based on the cumulative reward so as to maximize it. In this way, the model can comprehensively consider the influence of disaster avoidance facilities and risk levels on path planning, so as to plan a shortest path to the disaster avoidance facility that is highly safe and easy to pass, while the convergence speed of the model is improved and the training process is accelerated.

Fig. 5 is a flowchart of a method for determining a first neighboring mesh according to an embodiment of the present disclosure, and as shown in fig. 5, the first neighboring mesh may be determined as follows based on the above-described embodiment.

S501, downsampling the road network data subjected to the gridding treatment to obtain first road network data.

Downsampling in the embodiments of the present disclosure may be understood as a process of converting a plurality of adjacent grids into one grid, thereby reducing the number of grids contained in a region.

In the embodiment of the disclosure, after determining the starting position and the starting grid, the evacuation path planning device may first downsample the road network data after the gridding processing to obtain first road network data.

In one exemplary implementation of the disclosed embodiments, the downsampling magnification is [value missing in source]; that is, when the original cell size is [value missing in source], the downsampled cell size is [value missing in source]. After the downsampling is completed, a low-resolution image 64 times smaller than the original grid image is obtained, and each grid of the low-resolution image maps to a unique block area in the high-resolution image. Optionally, the downsampling uses maximum value sampling: for each [value missing in source] area of the original grid image, the maximum value is taken as the value of the downsampled grid.
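Maximum value sampling can be sketched as block-wise max pooling. The factor of 8 per axis is an assumption inferred from the statement that the low-resolution image is 64 times smaller; zero-padding the edges and the function name `downsample_max` are likewise assumptions.

```python
import numpy as np

def downsample_max(grid, factor=8):
    """Max-pool a 2-D grid by `factor` along each axis.

    Edges are zero-padded so that both dimensions become multiples of `factor`.
    """
    rows, cols = grid.shape
    pad_r, pad_c = (-rows) % factor, (-cols) % factor
    padded = np.pad(grid, ((0, pad_r), (0, pad_c)), constant_values=0)
    blocks = padded.reshape(padded.shape[0] // factor, factor,
                            padded.shape[1] // factor, factor)
    return blocks.max(axis=(1, 3))

road = np.zeros((16, 20), dtype=np.uint8)
road[3, :] = 1          # a horizontal road in the top block row
low_res = downsample_max(road)
```

Taking the maximum means a low-resolution cell is marked as road whenever any of its original cells contains road, so connectivity of the road network is not lost by the downsampling.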

S502, determining a starting point grid and an ending point grid which correspond to the starting grid and the nearest disaster avoidance facility grid respectively in first road network data.

In the embodiment of the disclosure, after obtaining the low-resolution first road network data, the evacuation path planning device may determine the low-resolution grid corresponding to the high-resolution initial grid as the start point grid, and determine the low-resolution grid corresponding to the high-resolution nearest disaster avoidance facility grid as the end point grid.

S503, determining the shortest path from the starting point grid to the end point grid based on the first path network data, and determining the grid through which the shortest path passes as an auxiliary grid.

In the embodiment of the disclosure, after determining the start point grid and the end point grid, the evacuation path planning device may plan a shortest path from the start point grid to the end point grid according to the first road network data, and determine a grid through which the shortest path passes at a low resolution as the auxiliary grid.

In one exemplary implementation of the disclosed embodiments, the evacuation path planning apparatus may determine the shortest path using a breadth-first search algorithm.
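A breadth-first search over the low-resolution road grid might look like the following sketch. It assumes that cells with value 1 are passable and that moves are allowed in all eight directions, neither of which is stated for this step; the function name and the sample grid are hypothetical.

```python
from collections import deque

def bfs_shortest_path(grid, start, goal):
    """Shortest 8-connected path over cells with value 1, or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                if nxt in parent:
                    continue
                if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                    continue
                if grid[nxt[0]][nxt[1]] == 1:
                    parent[nxt] = cell
                    queue.append(nxt)
    return None

low_res_roads = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]
auxiliary = bfs_shortest_path(low_res_roads, (0, 0), (2, 3))
```

The cells of the returned path play the role of the auxiliary grids.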

S504, determining the area corresponding to the auxiliary grid in the preset area as a high-value area.

In the embodiment of the disclosure, the shortest path is a low-resolution path, so one shortest path may correspond to a plurality of high-resolution grid paths, whereas one optimal high-resolution grid path corresponds to only one low-resolution shortest path. Because the low-resolution shortest path roughly reflects the trend and direction of the optimal high-resolution grid path, the optimal high-resolution grid path passes through only part of the preset area, namely the high-value area corresponding to the auxiliary grids through which the low-resolution shortest path passes. The optimal high-resolution grid path determined in the preset area is therefore the same as that determined in the high-value area, and determining the grid path within the high-value area reduces the exploration space and the data processing amount.

In the embodiment of the present disclosure, after determining the auxiliary grid with low resolution, the evacuation path planning apparatus may determine, in the original preset area, an area corresponding to the auxiliary grid as a high-value area.

In an exemplary implementation manner of the embodiment of the present disclosure, the evacuation path planning device may restore the auxiliary mesh to a high resolution according to a correspondence between the low-resolution mesh and the high-resolution mesh during the downsampling process, so as to obtain a high-value mesh under the high resolution, where an area corresponding to the high-value mesh is a high-value area.
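Restoring the auxiliary grids to high resolution can be sketched as marking the block of original cells covered by each auxiliary cell. The factor of 8 per axis is an assumption inferred from the 64-fold reduction, and the function name `high_value_mask` is hypothetical.

```python
import numpy as np

def high_value_mask(auxiliary_cells, high_res_shape, factor=8):
    """Mark every high-resolution cell covered by an auxiliary cell."""
    mask = np.zeros(high_res_shape, dtype=bool)
    for row, col in auxiliary_cells:
        r0, c0 = row * factor, col * factor
        mask[r0:r0 + factor, c0:c0 + factor] = True  # slices clip at the border
    return mask

mask = high_value_mask([(0, 0), (1, 1)], high_res_shape=(16, 20))
```

During training, candidate grids outside this mask would simply be excluded from the set the network may choose from.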

In another exemplary implementation of the disclosed embodiments, the high-value region may be obtained by mapping the auxiliary grids through which the shortest path passes back to the high-resolution map and expanding them. Each auxiliary grid is mapped back to the high-resolution map and expanded to obtain a corresponding high-value region that satisfies a condition (the formula and its symbol definitions are missing from the source text).

S505, determining, by the deep reinforcement learning network, a first adjacent grid from the adjacent grids in the high-value area.

In embodiments of the present disclosure, the evacuation path planning apparatus may limit the deep reinforcement learning network to select the first neighboring mesh within the high-value area when one first neighboring mesh is determined from among the neighboring meshes by the deep reinforcement learning network.
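One way to picture this restriction is as a mask over the network's candidate choices. The sketch below assumes, hypothetically, that the deep reinforcement learning network produces one score per adjacent grid; the disclosure does not specify the output format, and the function name is illustrative only.

```python
def select_first_neighbor(scores, neighbors, high_value_cells):
    """Return the highest-scoring adjacent grid that lies inside the high-value area.

    scores[i] is the (assumed) network score for neighbors[i].
    Returns None when no adjacent grid is inside the high-value area.
    """
    allowed = [
        (score, cell)
        for score, cell in zip(scores, neighbors)
        if cell in high_value_cells
    ]
    if not allowed:
        return None
    return max(allowed, key=lambda item: item[0])[1]


neighbors = [(0, 1), (1, 0), (1, 1)]
scores = [0.9, 0.2, 0.5]              # hypothetical network outputs
high_value_cells = {(1, 0), (1, 1)}   # (0, 1) is outside the high-value area
first_neighbor = select_first_neighbor(scores, neighbors, high_value_cells)
```

Here the best-scoring grid (0, 1) is excluded because it lies outside the high-value area, so the next best allowed grid is chosen.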

According to the embodiment of the disclosure, the gridded road network data is downsampled to obtain the first road network data; in the first road network data, the start point grid and the end point grid corresponding respectively to the starting grid and the nearest disaster avoidance facility grid are determined; the shortest path from the start point grid to the end point grid is determined based on the first road network data, and the grids through which the shortest path passes are determined as auxiliary grids; the area corresponding to the auxiliary grids in the preset area is determined as a high-value area; and, within the high-value area, the deep reinforcement learning network determines a first adjacent grid from the adjacent grids. In this way, the exploration space and the data processing amount can be reduced, which further improves the model training speed.

Fig. 6 is a schematic structural diagram of an evacuation path planning apparatus according to an embodiment of the present disclosure. As shown in fig. 6, the evacuation path planning apparatus 600 includes a position acquisition module 610, an information acquisition module 620 and a path determination module 630. The position acquisition module 610 is used for acquiring current position information. The information acquisition module 620 is configured to obtain, based on the position information, information of a surrounding environment from a database updated in real time, where the information of the surrounding environment includes road network information, disaster avoidance facility information, a regional risk level, and a distance from a nearest disaster avoidance facility. The path determination module 630 is configured to determine an evacuation path in a range of the surrounding environment based on the information of the surrounding environment and a pre-trained deep reinforcement learning model, where the evacuation path is at least a partial path from the current position to the nearest disaster avoidance facility.

Optionally, the evacuation path planning apparatus 600 further includes: the image acquisition module is used for acquiring satellite images of a preset area where the current position is located; the object detection module is used for carrying out object detection on the satellite image and determining an image of a building contained in the satellite image; the labeling module is used for identifying the image of the building, determining whether the building is a residential building or not, and labeling the image of the residential building; the classification module is used for determining whether the building is a disaster avoidance facility or not based on the image and the labeling result of the building; and the position determining module is used for determining the actual position of the disaster avoidance facility based on the position of the disaster avoidance facility in the satellite image.

Optionally, the classification module includes: the feature extraction unit is used for extracting features of the image of the building to obtain features of the building; a position acquisition unit configured to acquire a position of each building in the satellite image; an attention determining unit configured to calculate, for each building, attention between the building and the other buildings based on a first labeling result of the building and a first position in the satellite image, and a second labeling result of the other buildings and a second position in the satellite image; a feature updating unit configured to update a feature of the building based on attention between the building and the other building; the model classification unit is used for inputting the updated characteristics into a pre-trained building classification model to obtain a classification result of the building output by the building classification model; and the classification determining unit is used for determining whether the building is disaster avoidance facilities or not based on the classification result of the building.

Optionally, the path determining module 630 includes: the model output unit is used for inputting the information of the surrounding environment into the deep reinforcement learning model to obtain the advancing direction of the output of the deep reinforcement learning model; and a path determining unit configured to determine the evacuation path based on the advancing direction and the road network information.
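How an advancing direction and road network information might be combined into a partial path can be sketched as below. This is an assumption-laden illustration: the direction labels, the grid encoding of the road network (1 = road) and the `max_steps` limit standing in for the range of the surrounding environment are all hypothetical.

```python
# Eight advancing directions as (row, column) offsets (assumed labels).
DIRECTIONS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}


def path_along_direction(road, start, direction, max_steps):
    """Follow one advancing direction over road grids for at most max_steps steps."""
    dr, dc = DIRECTIONS[direction]
    path = [start]
    r, c = start
    for _ in range(max_steps):
        r, c = r + dr, c + dc
        inside = 0 <= r < len(road) and 0 <= c < len(road[0])
        if not inside or road[r][c] != 1:
            break  # stop where the road ends in this direction
        path.append((r, c))
    return path


road = [[1, 1, 1, 0]]
partial_path = path_along_direction(road, (0, 0), "E", max_steps=5)
```

The result is a partial path within the local range; a full evacuation route would be obtained by querying the model again from the last grid reached.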

Optionally, the evacuation path planning apparatus 600 further includes a training module, where the training module includes: a training data acquisition unit, used for acquiring training data and performing gridding processing on the training data, wherein the training data comprises road network data, disaster avoidance facility data and risk level data in the preset area; a first determining unit, configured to determine, for each grid, a position action relation table corresponding to the grid, where the position action relation table is used to record, when a previous action is an action in any advancing direction, the next actions that are valid with respect to the previous action, and the actions include actions in eight advancing directions; a distance calculation unit, used for calculating the Chebyshev distance between the grid and the nearest disaster avoidance facility grid and determining the Chebyshev distance as a first distance corresponding to the grid; a second determining unit, configured to randomly determine a position in the preset area as a starting position, and determine a starting grid where the starting position is located; a third determining unit, configured to determine the starting grid as a current grid and determine adjacent grids adjacent to the current grid; a model output unit, used for acquiring, for each time step, the position action relation table corresponding to the current grid, and the road network data, disaster avoidance facility data, risk level data and first distances corresponding to the current grid and the adjacent grids, inputting them to a deep reinforcement learning network, and determining, by the deep reinforcement learning network, a first adjacent grid from the adjacent grids; a random selection unit, configured to randomly determine a second adjacent grid from the adjacent grids based on the position action relation table corresponding to the current grid; a fourth determining unit, configured to select one of the first adjacent grid and the second adjacent grid as a target grid to be reached next based on a preset probability; an execution unit, configured to determine the target grid as the current grid, and repeatedly execute the step of determining the target grid to be reached next until the target grid to be reached next is the nearest disaster avoidance facility grid; a reward calculation unit, used for calculating a cumulative reward from the starting grid to the nearest disaster avoidance facility grid based on a preset reward calculation rule; and a parameter adjustment unit, used for adjusting parameters of the deep reinforcement learning network based on the cumulative reward so as to maximize the cumulative reward.
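Two of the steps above lend themselves to a short sketch: the first distance (Chebyshev distance to the nearest disaster avoidance facility grid) and the probabilistic choice between the two candidate grids. The function names are hypothetical, and the sketch assumes that the preset probability is the chance of taking the randomly determined second adjacent grid; the disclosure only states that the choice is based on a preset probability.

```python
import random


def chebyshev_distance(cell, other):
    """Chebyshev distance between two (row, col) grids."""
    return max(abs(cell[0] - other[0]), abs(cell[1] - other[1]))


def first_distance(cell, facility_cells):
    """Chebyshev distance from a grid to the nearest disaster avoidance facility grid."""
    return min(chebyshev_distance(cell, f) for f in facility_cells)


def choose_target(first_neighbor, second_neighbor, explore_prob, rng=random):
    """Pick the next target grid.

    Assumption: with probability explore_prob the randomly determined second
    adjacent grid is taken; otherwise the network's first adjacent grid is taken.
    """
    if rng.random() < explore_prob:
        return second_neighbor
    return first_neighbor


nearest = first_distance((0, 0), [(3, 1), (2, 5)])
```

With eight advancing directions a diagonal step costs the same as a straight one, which is why the Chebyshev distance equals the minimum number of steps to the facility on an unobstructed grid.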

Optionally, the reward calculation rule includes: every time step passes, giving a basic reward; giving the risk rewards based on the value of the risk rewards corresponding to the risk level data of the next arrived target grid; giving an arrival reward in the case that the next arriving target grid is the nearest disaster avoidance facility grid; and giving a distance reward when the first distance corresponding to the next arrived target grid is smaller than the first distance corresponding to the current grid, wherein the basic reward and the risk reward are negative rewards, and the arrival reward and the distance reward are positive rewards.
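The reward calculation rule can be illustrated with the sketch below. All numeric values (the basic reward, the risk reward per risk level, the arrival reward and the distance reward) are hypothetical placeholders chosen only to respect the stated signs; the disclosure does not give them here.

```python
# Hypothetical risk reward per risk level; all values are negative as the rule requires.
RISK_REWARD = {1: -1.0, 2: -3.0, 3: -5.0}


def step_reward(current_dist, next_dist, next_risk_level, reached_facility,
                base=-1.0, arrival=100.0, distance_bonus=0.5):
    """Reward for moving from the current grid to the target grid reached next."""
    reward = base                            # basic reward for every time step (negative)
    reward += RISK_REWARD[next_risk_level]   # risk reward of the target grid (negative)
    if reached_facility:
        reward += arrival                    # arrival reward (positive)
    if next_dist < current_dist:
        reward += distance_bonus             # distance reward when the first distance shrinks
    return reward


def cumulative_reward(steps):
    """Sum the step rewards of one episode; steps is a list of argument tuples."""
    return sum(step_reward(*step) for step in steps)


episode = [(3, 2, 1, False), (2, 1, 2, False), (1, 0, 1, True)]
total = cumulative_reward(episode)
```

The negative basic reward discourages long routes, the risk reward discourages crossing high-risk grids, and the two positive terms pull the agent toward the nearest disaster avoidance facility.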

Optionally, the training module further includes: the down-sampling unit is used for down-sampling the road network data subjected to the gridding treatment to obtain first road network data; a fifth determining unit, configured to determine, in the first road network data, a start point grid and an end point grid corresponding to the start grid and the nearest disaster avoidance facility grid, respectively; a sixth determining unit configured to determine, based on the first road network data, a shortest path from the start point mesh to the end point mesh, and determine a mesh through which the shortest path passes as an auxiliary mesh; a seventh determining unit, configured to determine an area corresponding to the auxiliary grid in the preset area as a high-value area; the model output unit is specifically configured to determine, by the deep reinforcement learning network, a first neighboring mesh from the neighboring meshes within the high-value region.

The evacuation path planning device provided in this embodiment can execute the method described in any one of the above embodiments, and the execution manner and the beneficial effects of the method are similar, and are not described herein again.

As shown in fig. 7, the computer device may include a processor 710 and a memory 720 storing computer program instructions.

In particular, the processor 710 may include a central processing unit (Central Processing Unit, CPU) or an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.

Memory 720 may include mass storage for information or instructions. By way of example, and not limitation, memory 720 may include a hard disk drive (Hard Disk Drive, HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a universal serial bus (Universal Serial Bus, USB) drive, or a combination of two or more of these. Memory 720 may include removable or non-removable (or fixed) media, where appropriate. Memory 720 may be internal or external to the computer device, where appropriate. In a particular embodiment, the memory 720 is a non-volatile solid state memory. In a particular embodiment, the memory 720 includes read-only memory (Read-Only Memory, ROM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (Erasable Programmable ROM, EPROM), electrically erasable PROM (Electrically Erasable Programmable ROM, EEPROM), electrically alterable ROM (Electrically Alterable ROM, EAROM), or flash memory, or a combination of two or more of these, where appropriate.

The processor 710 reads and executes the computer program instructions stored in the memory 720 to perform the steps of the evacuation path planning method provided by the embodiments of the present disclosure.

In one example, the computer device may also include a transceiver 730 and a bus 740. As shown in fig. 7, the processor 710, the memory 720, and the transceiver 730 are connected and communicate with each other through a bus 740.

Bus 740 includes hardware, software, or both. By way of example, and not limitation, the bus may include an accelerated graphics port (Accelerated Graphics Port, AGP) or other graphics bus, an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, a front side bus (Front Side Bus, FSB), a HyperTransport (HT) interconnect, an industry standard architecture (Industry Standard Architecture, ISA) bus, an InfiniBand interconnect, a low pin count (Low Pin Count, LPC) bus, a memory bus, a micro channel architecture (Micro Channel Architecture, MCA) bus, a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (Serial Advanced Technology Attachment, SATA) bus, a video electronics standards association local (Video Electronics Standards Association Local Bus, VLB) bus, or other suitable bus, or a combination of two or more of these. Bus 740 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.

The embodiments of the present disclosure also provide a computer-readable storage medium, which may store a computer program, which when executed by a processor, causes the processor to implement the evacuation path planning method provided by the embodiments of the present disclosure.

The storage medium may be, for example, the memory 720 storing computer program instructions, which are executable by the processor 710 of the evacuation path planning device to perform the evacuation path planning method provided by the embodiments of the present disclosure. The computer program described above may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

It should be noted that in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The foregoing is merely a specific embodiment of the disclosure to enable one skilled in the art to understand or practice the disclosure. The present disclosure is not intended to be limited to the embodiments shown and described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An evacuation path planning method, characterized in that the method comprises:

acquiring current position information;

2. The method of claim 1, wherein prior to the obtaining information of the surrounding environment from the real-time updated database, the method further comprises:

acquiring a satellite image of a preset area where the current position is located;

performing target detection on the satellite image, and determining an image of a building contained in the satellite image;

identifying the image of the building, determining whether the building is a residential building, and labeling the image of the residential building;

determining whether the building is a disaster avoidance facility or not based on the image and the labeling result of the building;

and determining the actual position of the disaster avoidance facility based on the position of the disaster avoidance facility in the satellite image.

3. The method of claim 2, wherein the determining whether the building is a disaster avoidance facility based on the image and labeling results of the building comprises:

extracting features of the image of the building to obtain features of the building;

acquiring the position of each building in the satellite image;

calculating, for each building, attention between the building and the other buildings based on a first annotation result of the building and a first location in the satellite image, and a second annotation result of the other buildings and a second location in the satellite image;

updating a feature of the building based on attention between the building and the other buildings;

inputting the updated characteristics into a pre-trained building classification model to obtain a classification result of the building output by the building classification model;

based on the classification result of the building, determining whether the building is a disaster avoidance facility.

4. The method of claim 1, wherein the determining an evacuation path within the surrounding environment based on the information of the surrounding environment and a pre-trained deep reinforcement learning model comprises:

inputting the information of the surrounding environment into the deep reinforcement learning model to obtain the advancing direction output by the deep reinforcement learning model;

and determining the evacuation path based on the advancing direction and the road network information.

5. The method of claim 2, wherein the deep reinforcement learning model is trained based on the steps of:

acquiring training data, and performing gridding processing on the training data, wherein the training data comprises road network data, disaster avoidance facility data and risk grade data in the preset area;

determining, for each grid, a position action relation table corresponding to the grid, wherein the position action relation table is used for recording, when the previous action is an action in any advancing direction, the next actions that are valid relative to the previous action, and the actions comprise actions in eight advancing directions;

calculating Chebyshev distance between the grid and the nearest disaster avoidance facility grid, and determining the Chebyshev distance as a first distance corresponding to the grid;

randomly determining a position as a starting position in the preset area, and determining a starting grid where the starting position is located;

determining the starting grid as a current grid, and determining adjacent grids adjacent to the current grid;

for each time step, acquiring a position action relation table corresponding to the current grid, road network data, disaster avoidance facility data, risk level data and a first distance corresponding to the current grid and the adjacent grid, inputting the position action relation table to a deep reinforcement learning network, and determining a first adjacent grid from the adjacent grids by the deep reinforcement learning network;

randomly determining a second adjacent grid from the adjacent grids based on a position action relation table corresponding to the current grid;

selecting one from the first adjacent grid and the second adjacent grid as a target grid to be reached next based on a preset probability;

determining the target grid as a current grid, and repeatedly executing the step of determining the next arrived target grid until the next arrived target grid is the nearest disaster avoidance facility grid;

calculating a cumulative reward from the starting grid to the nearest disaster avoidance facility grid based on a preset reward calculation rule;

adjusting parameters of the deep reinforcement learning network based on the cumulative reward so as to maximize the cumulative reward.

6. The method of claim 5, wherein the reward calculation rule comprises:

every time step passes, giving a basic reward;

giving the risk rewards based on the value of the risk rewards corresponding to the risk level data of the next arrived target grid;

giving an arrival reward in the case that the next arriving target grid is the nearest disaster avoidance facility grid;

and giving a distance reward when the first distance corresponding to the next arrived target grid is smaller than the first distance corresponding to the current grid, wherein the basic reward and the risk reward are negative rewards, and the arrival reward and the distance reward are positive rewards.

7. The method of claim 5, wherein after the randomly determining a position in the preset area as a starting position and determining a starting grid where the starting position is located, the method further comprises:

downsampling the road network data subjected to gridding treatment to obtain first road network data;

determining a starting point grid and an ending point grid which correspond to the starting grid and the nearest disaster avoidance facility grid respectively in the first road network data;

determining a shortest path from the starting point grid to the end point grid based on the first road network data, and determining a grid through which the shortest path passes as an auxiliary grid;

determining an area corresponding to the auxiliary grid in the preset area as a high-value area;

the determining, by the deep reinforcement learning network, a first adjacent mesh from the adjacent meshes, comprising:

a first adjacent mesh is determined from the adjacent meshes by the deep reinforcement learning network within the high value region.

8. An evacuation path planning apparatus, the apparatus comprising:

9. A computer device, comprising: a memory; a processor; a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method of any of claims 1-7.

10. A computer readable storage medium, wherein a computer program is stored in the storage medium, which, when executed by a processor, implements an evacuation path planning method according to any one of claims 1-7.