CN112330043B

CN112330043B - Evacuation path planning method and system combining Q-learning and multi-swarm algorithm

Info

Publication number: CN112330043B
Application number: CN202011284240.0A
Authority: CN
Inventors: 刘弘; 赵缘; 李信金; 孟祥栋
Original assignee: Shandong Normal University
Current assignee: Shandong Normal University
Priority date: 2020-11-17
Filing date: 2020-11-17
Publication date: 2022-10-18
Anticipated expiration: 2040-11-17
Also published as: CN112330043A

Abstract

The invention discloses an evacuation path planning method and system combining Q-learning and a multi-swarm algorithm. The method comprises: initializing an evacuation crowd and evacuation exits for a constructed evacuation scene model; and performing macroscopic path planning with a multi-swarm algorithm and, in combination with microscopic crowd-movement guidance, driving individuals to the evacuation exits until the number of people who have reached the exits equals the total number of people, at which point the evacuation process ends. The original single population is divided into a plurality of sub-populations; several search strategies are introduced to build a search strategy pool; and the strategies are adjusted by feedback through a Q table, giving an adaptive strategy-selection mechanism and a search over the global scope. The neighborhood of the follower bees is continuously narrowed so that they search under the guidance of the top E solutions, which improves the update success rate. The search direction of the scout bees is adaptively steered toward the side more likely to yield a successful update, which avoids blind search. Collision-free crowd movement is generated with the social force model, improving evacuation efficiency.

Description

Evacuation path planning method and system combining Q-learning and multi-swarm algorithm

Technical Field

The invention relates to the technical field of crowd evacuation path planning, in particular to an evacuation path planning method and system combining Q-learning and multi-swarm algorithm.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

With the development of society, people increasingly gather in indoor venues such as large tourist attractions, cinemas and supermarkets, which creates potential safety hazards. If a dangerous incident occurs in such a crowded environment, pedestrians hurry to avoid the source of danger and look for an exit. Disordered evacuation not only reduces evacuation efficiency but may also cause collisions and trampling, inflicting secondary injuries on the evacuees. How to guide people effectively in crowded places, increasing evacuation speed while reducing the incidence of danger, has therefore become a topic of active research. During an evacuation people are often confused and act blindly; they usually know neither the exit best suited for evacuation nor the most suitable path to reach it. Path planning guidance is therefore required for crowd evacuation.

The Artificial Bee Colony (ABC) algorithm is a swarm intelligence optimization algorithm inspired by the foraging behavior of bees. It performs well when applied to path planning in crowd evacuation, both because of its strong ability to solve optimization problems and because the self-organizing and self-learning properties of an artificial bee colony resemble those of a crowd. A bee colony is divided into three kinds of bees: employed bees, follower (onlooker) bees and scout bees. Employed bees are better at exploration, follower bees are better at exploitation, and scout bees are responsible for searching for new positions in the global scope; together they simulate the foraging behavior of bees to search for an optimum. Because its concept is simple and easy to implement, the artificial bee colony algorithm has been applied successfully to problems in fields such as path planning, image classification and image segmentation.

However, the inventors have found that current applications of the artificial bee colony algorithm to crowd evacuation simulation have several disadvantages: the path search in evacuation path planning is blind; individuals in different states cannot be matched with different path searches; and the algorithm converges slowly during path planning. These problems hinder the efficient and orderly evacuation of the crowd as a whole and prevent a realistic reproduction of crowd evacuation as it occurs in real life.

Disclosure of Invention

To solve the above problems, the invention provides an evacuation path planning method and system combining Q-learning and a multi-swarm algorithm. The original single population is divided into a plurality of sub-populations; a search strategy pool is built by introducing several search strategies; the strategies are adjusted by feedback through a Q table, giving an adaptive strategy-selection mechanism and a search over the global scope. The neighborhood of the follower bees is continuously narrowed so that they search under the guidance of the top E solutions, which improves the update success rate. The search direction of the scout bees is adaptively steered toward the side more likely to yield a successful update, which avoids blind search. Microscopic crowd-movement guidance from the social force model is combined with the above to generate collision-free crowd movement, which improves evacuation efficiency and shortens evacuation time.

In order to achieve the purpose, the invention adopts the following technical scheme:

in a first aspect, the present invention provides an evacuation path planning method combining Q-learning and multi-swarm algorithm, including:

initializing an evacuation crowd and an evacuation exit for the constructed evacuation scene model;

adopting a multi-swarm algorithm to carry out macroscopic path planning, combining the microscopic crowd movement guidance to drive the individuals to reach an evacuation exit until the number of people in the evacuation exit is equal to the total number of people, and finishing the evacuation process;

the multi-swarm algorithm comprises: dividing the evacuation crowd into a plurality of groups; calculating fitness from the distance between the position of each individual in a group and the evacuation exit and from the degree of crowding at the evacuation exit; determining a search strategy according to the fitness value and the quality values of the candidate search strategies in a Q table; determining the next position according to the selected search strategy; limiting the leaders that followers in a group may select to the E leaders with the best fitness values in that group; and, after a leader is converted into a scout, obtaining a new position with an improved scout search strategy.

In a second aspect, the present invention provides an evacuation path planning system combining Q-learning and multi-swarm algorithm, comprising:

the model initialization module is used for initializing the evacuation crowd and an evacuation exit for the constructed evacuation scene model;

the evacuation simulation module is used for planning a macro path by adopting a multi-swarm algorithm, guiding and driving the individuals to reach an evacuation outlet by combining the micro crowd movement until the number of people in the evacuation outlet is equal to the total number of people, and finishing the evacuation process;

and the path planning module is used for: dividing the evacuation crowd into a plurality of groups by the multi-swarm algorithm; calculating fitness from the distance between the position of each individual in a group and the evacuation exit and from the degree of congestion at the evacuation exit; determining a search strategy according to the fitness value and the quality values of the candidate search strategies in the Q table; determining the next position according to the search strategy; limiting the leaders that followers in a group may select to the E leaders with the best fitness values in that group; and, after a leader is converted into a scout, obtaining a new position with an improved scout search strategy.

In a third aspect, the present invention provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein when the computer instructions are executed by the processor, the method of the first aspect is performed.

In a fourth aspect, the present invention provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.

Compared with the prior art, the invention has the following beneficial effects:

To address the slow convergence and blind search of the artificial bee colony algorithm, the invention provides an improved multi-swarm algorithm (MABCQ). The original single population is divided into a plurality of sub-populations that search in parallel, which increases search speed. A search strategy pool is built by introducing several search strategies, and a Q table from Q-learning is used to select search strategies adaptively for individuals at different positions, meeting the search requirements of individuals at different stages and achieving a thorough search of the global scope.

The invention continuously narrows the neighborhood of the follower bees so that they search under the guidance of the top E solutions. Improving the neighborhood range of the follower bees and the search equation of the scout bees raises the update success rate, avoids blind search, and effectively improves the performance of the artificial bee colony algorithm.

The invention avoids blind search by adaptively steering the search direction of the scout bees toward the side more likely to yield a successful update.

The improved multi-swarm MABCQ algorithm is applied to crowd evacuation path planning to determine the next position of each pedestrian, and the microscopic crowd-movement guidance of the social force model drives the pedestrians to move. Crowd evacuation in different scenes is simulated and reproduced realistically and visually; crowd evacuation efficiency is improved and evacuation time is shortened. The algorithm therefore offers guidance for formulating crowd evacuation path planning schemes.

Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.

Fig. 1 is a flowchart of an evacuation path planning method combining Q-learning and multi-swarm algorithm according to embodiment 1 of the present invention;

Figs. 2(a)-2(b) are schematic diagrams of two evacuation scenarios provided in embodiment 1 of the present invention;

Fig. 3 is a schematic diagram of a Q table structure provided in embodiment 1 of the present invention;

Figs. 4(a)-4(c) are schematic diagrams of evacuation stages in a three-door scenario according to embodiment 1 of the present invention;

Figs. 5(a)-5(c) are schematic diagrams of evacuation stages in a scene with obstacles according to embodiment 1 of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and embodiments.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The embodiments and features of the embodiments of the invention may be combined with each other without conflict.

Example 1

As shown in fig. 1, the present embodiment provides an evacuation path planning method combining Q-learning and multi-swarm algorithm, including:

S1: initializing an evacuation crowd and evacuation exits for the constructed evacuation scene model;

S2: performing macroscopic path planning with a multi-swarm algorithm and, in combination with microscopic crowd-movement guidance, driving the individuals to the evacuation exits until the number of people who have reached the exits equals the total number of people, at which point the evacuation process ends;

the multi-swarm algorithm comprises: dividing the evacuation crowd into a plurality of groups; calculating fitness from the distance between the position of each individual in a group and the evacuation exit and from the degree of crowding at the evacuation exit; determining a search strategy according to the fitness value and the quality values of the candidate search strategies in a Q table; determining the next position according to the selected search strategy; limiting the leaders that followers in a group may select to the E leaders with the best fitness values in that group; and, after a leader is converted into a scout, obtaining a new position with an improved scout search strategy.

In step S1, the parameters of the real evacuation scene and of the crowd are obtained to construct the evacuation scene model. As shown in Figs. 2(a)-2(b), this embodiment uses an obstacle-free evacuation scene with three doors and an evacuation scene with obstacles and two doors. A counter is arranged at each evacuation exit of the initialized evacuation scene model; it counts the number of individuals evacuated through that exit and is used to determine the degree of congestion at the exit.
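The per-exit counters and the termination condition of step S2 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the class name `EvacuationScene`, its fields and the exit identifiers are hypothetical, and how congestion is derived from the counters is left open because the text does not specify it.

```python
from dataclasses import dataclass, field


@dataclass
class EvacuationScene:
    """Hypothetical container for exit counters; all names are illustrative."""
    total_people: int
    exit_counts: dict = field(default_factory=dict)  # exit id -> people evacuated

    def record_exit(self, exit_id: str) -> None:
        # Increment the counter of the exit an individual has just passed through.
        self.exit_counts[exit_id] = self.exit_counts.get(exit_id, 0) + 1

    def evacuated(self) -> int:
        # Number of people counted at all exits so far.
        return sum(self.exit_counts.values())

    def finished(self) -> bool:
        # Evacuation ends when the counters add up to the total number of people.
        return self.evacuated() == self.total_people
```

A simulation loop would call `record_exit` whenever an individual reaches an exit and stop once `finished()` returns true.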

Path planning is an important part of crowd evacuation: a suitable route must be made for each pedestrian according to the pedestrian's position and surroundings. Evacuation scenes are often complex and influenced by various dynamic and static factors. The artificial bee colony algorithm (ABC) is an effective method for path planning that can take obstacles, congestion and other conditions in the evacuation scene into account, so that pedestrians can choose routes in time.

In view of the disadvantages of ABC described above, this embodiment provides a multi-swarm algorithm, MABCQ, and plans paths for pedestrians with it to determine each pedestrian's next position. Specifically:

S2-1-1: Dividing the evacuation crowd into a plurality of groups according to the positions of the individuals, keeping the number of individuals in each group similar, and selecting the group leaders.

Specifically, the single population of the original ABC algorithm is divided into a plurality of sub-populations, and each sub-population carries out the role conversion of its own individuals, so that the search proceeds in parallel. The sub-populations are formed with the maximum-minimum distance method based on Euclidean distance:

(1) Taking the first individual as the 1st center point, selecting the individual farthest from the 1st center point as the 2nd center point, determining further center points in the same way until no new center point is generated, and finally assigning each remaining individual to its nearest center point according to the minimum-distance principle;

the pseudo code that performs the classification is:
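The pseudo code itself is not reproduced in this text. As a minimal sketch of the maximum-minimum distance method described in (1), the following Python function may help. The stopping threshold `theta` is an assumption (the text only says that center points are added "until no new center point is generated"), the function name is hypothetical, and the sketch does not try to balance group sizes.

```python
import math


def max_min_distance_groups(points, theta=0.5):
    """Group points with the maximum-minimum distance method.

    theta is an ASSUMED stopping threshold in (0, 1): a candidate becomes a
    new centre only while its distance to the nearest existing centre exceeds
    theta times the distance between the first two centres.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta is assumed to lie in (0, 1)")
    n = len(points)
    centres = [0]  # the first individual is the 1st centre point
    # Distance from every point to its nearest centre found so far.
    nearest = [math.dist(p, points[0]) for p in points]
    reference = None  # distance between the 1st and 2nd centre points
    while True:
        candidate = max(range(n), key=lambda i: nearest[i])
        if reference is None:
            reference = nearest[candidate]
            if reference == 0.0:  # all points coincide: a single group
                break
        elif nearest[candidate] <= theta * reference:
            break  # no new centre point is generated
        centres.append(candidate)
        nearest = [min(d, math.dist(p, points[candidate]))
                   for d, p in zip(nearest, points)]
    # Minimum-distance principle: assign each individual to its nearest centre.
    labels = [min(range(len(centres)),
                  key=lambda c: math.dist(p, points[centres[c]]))
              for p in points]
    return centres, labels
```

The function returns the indices of the chosen center points and, for every individual, the index (into that list) of the group it belongs to.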

(2) After the group is divided, calculating the fitness value of each individual in the group, selecting the first half of individuals with higher fitness values as leaders and the rest as followers;

The fitness value is calculated as:

$$\mathrm{fitness} = \frac{1}{\alpha \cdot \mathrm{distance} + \beta \cdot \mathrm{crowd}} \qquad (1)$$

where distance is the distance from the individual's position to the selected evacuation exit, crowd is the degree of crowding at the evacuation exit, and $\alpha$ and $\beta$ are weighting factors.
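Formula (1) and the leader/follower split of step (2) can be illustrated with a short sketch. The default weights of 0.5 and the handling of odd group sizes are assumptions made only for this example; the function names are hypothetical.

```python
def fitness(distance: float, crowd: float,
            alpha: float = 0.5, beta: float = 0.5) -> float:
    """Formula (1): fitness = 1 / (alpha * distance + beta * crowd).

    The default weights are illustrative assumptions, not values from the text.
    """
    denominator = alpha * distance + beta * crowd
    if denominator <= 0.0:
        raise ValueError("alpha * distance + beta * crowd must be positive")
    return 1.0 / denominator


def split_leaders_followers(fitness_values):
    """Return (leader indices, follower indices) for one group.

    The better half by fitness become leaders; for odd sizes the extra
    individual is treated as a follower (an assumption).
    """
    ranked = sorted(range(len(fitness_values)),
                    key=lambda i: fitness_values[i], reverse=True)
    n_leaders = max(1, len(ranked) // 2)
    return ranked[:n_leaders], ranked[n_leaders:]
```

A shorter distance to the exit or a less crowded exit both reduce the denominator and therefore raise the fitness.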

S2-1-2: Establishing a search strategy pool from four different search strategies to increase search diversity; using the Q table of Q-learning to adjust the strategies by feedback on their quality, thereby building an adaptive strategy-selection mechanism; and completing the position update according to the selected search strategy.

Specifically, this embodiment uses a Q table to design the search strategy selection mechanism. As shown in Fig. 3, the rows of the Q table represent the fitness levels of the leaders and the columns represent the selectable search strategies. As in Q-learning, an action is selected according to the state and the result of the action is fed back onto the state, so the search strategies are continuously reinforced; the most suitable search strategy is thus obtained for individuals of different fitness levels, meeting the different search requirements of different individuals. The specific steps are as follows:

(1) Initializing, in each group, a Q table with n rows and t columns, where n is the number of leaders in the group and t is the number of search strategies;

(2) Calculating the fitness value of each leader from its current position and sorting the leaders by fitness value in descending order, which gives the state $S_r$ corresponding to each row of the Q table; the individual in the r-th row has t search strategies to choose from;

(3) The probability that search strategy l is selected is related to its quality function $Q(S_r, a_l)$, as shown in formula (2): the higher the Q value of a search strategy, the greater its probability of being selected;

(4) Updating the position according to the selected search strategy, keeping the better of the new and old positions, and recalculating the Q value from the updated position:

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ R_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \qquad (3)$$

where $Q(s_t, a_t)$ is the Q value, $\alpha$ is the learning rate, $\gamma$ is the discount factor, $R_t$ is the reward value, and $\max_{a} Q(s_{t+1}, a)$ is the largest Q value in the next state $s_{t+1}$. The reward value R is:

$$R = \mathrm{fitness}_{new} - \mathrm{fitness}_{old} \qquad (4)$$

where $\mathrm{fitness}_{new}$ and $\mathrm{fitness}_{old}$ are the fitness of the new position and of the old position, respectively.

(5) The follower selects a search strategy consistent with the leader followed by the follower and sequentially determines the next position;

(6) After each iteration, all the leaders are reordered in the group according to the fitness value of the new position, each leader obtains a new ordering state, and each individual selects a search strategy according to the Q value in the new state row and updates the position in the next iteration.
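Steps (1) to (6) can be sketched in a few lines of Python. The update follows formula (3) with the reward of formula (4). The selection rule is an assumption: formula (2) is not reproduced in this text, so a softmax over the Q row is used here only as one rule in which a higher Q value gives a higher selection probability. The default learning rate and discount factor are likewise illustrative.

```python
import math
import random


def select_strategy(q_row, rng=random):
    """Pick a strategy index with probability increasing in its Q value.

    ASSUMPTION: softmax weighting stands in for formula (2).
    """
    peak = max(q_row)
    weights = [math.exp(q - peak) for q in q_row]  # shifted for stability
    return rng.choices(range(len(q_row)), weights=weights, k=1)[0]


def update_q(q_table, state, action, fitness_new, fitness_old,
             next_state, lr=0.1, gamma=0.9):
    """Apply formula (3) in place, with the reward R of formula (4)."""
    reward = fitness_new - fitness_old
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += lr * (target - q_table[state][action])


def new_q_table(n_leaders, n_strategies):
    """Step (1): an n-by-t table; zero initialisation is an assumption."""
    return [[0.0] * n_strategies for _ in range(n_leaders)]
```

Here `state` is the rank of a leader before the update and `next_state` its rank after the leaders are re-sorted in step (6).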

In this embodiment, three further search methods are added to the original search strategy to construct the search strategy pool, which specifically comprises:

(1) The original ABC search strategy, shown in formula (5), which obtains a new position from the current individual position and the position of a neighbor randomly selected within the current group:

$$v_{i,j} = x_{i,j} + \phi_{i,j}\,(x_{i,j} - x_{k,j}) \qquad (5)$$

where $x_{i,j}$ is the current individual position, $x_{k,j}$ is the position of a randomly selected neighbor within the current group, $v_{i,j}$ is the updated position, and $\phi_{i,j}$ is a random number with $\phi_{i,j} \in [-1, 1]$.

(2) Formula (6) is the update strategy of the leader: with the current position $x_{i,j}$ as the search starting point, the position $v_{i,j}$ is updated under the guidance of two neighbor positions $x_{k1,j}$ and $x_{k2,j}$ randomly selected within the group, using two random numbers, one of which is $\phi \in [-1, 1]$.

The search equation of the follower is shown in formula (7); it improves the neighborhood search equation by adding a term guided by the globally optimal individual position,

where $v_{i,j}$ is the new position found, $x_{i,j}$ is the current position, $x_{best,j}$ is the globally optimal position, $x_{k1,j}$ is the position of a randomly selected neighbor within the group, and the random number satisfies $\phi \in [-1, 1]$.

(3) The leader updates its position using formula (8),

where $v_{i,j}$ is the new position found by the search, $x_{k,j}$ and $x_{k2,j}$ are the positions of randomly selected neighbors within the group, $x_{best,j}$ is the globally optimal individual position, and the equation also uses a random number.

The search strategy for the corresponding follower is:

where $v_{i,j}$ is the new position found by the search, $x_{i,j}$ is the current position, $x_{k2,j}$ is the position of a randomly selected neighbor within the group, $x_{best,j}$ is the globally optimal individual position, and the equation also uses a random number.

(4) The leader randomly selects two neighbors within the group and takes the optimal individual as the search starting point, which improves the neighborhood search formula as shown in formula (10). Because candidate solutions are generated near the current optimal individual, this strategy has strong exploitation capability, and convergence is accelerated under the guidance of the globally optimal individual,

where $v_{i,j}$ is the new position found, $x_{best,j}$ is the globally optimal individual position, $x_{k,j}$ and $x_{k2,j}$ are the positions of randomly selected neighbors within the group, and the equation also uses a random number.

Accordingly, the search strategy for the follower is:

where $v_{i,j}$ is the new position found by the search, $x_{i,j}$ is the current position, $x_{best,j}$ is the globally optimal individual position, $x_{k,j}$ is the position of a randomly selected neighbor within the group, and the equation also uses a random number.
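Of the four strategies, only the original ABC search of item (1) has a widely known standard form, $v_{i,j} = x_{i,j} + \phi\,(x_{i,j} - x_{k,j})$ with $\phi$ drawn uniformly from $[-1, 1]$. The sketch below implements that standard form together with the greedy choice between old and new positions; it does not attempt formulas (6) onward. The function names and the list-of-lists position format are assumptions.

```python
import random


def abc_original_search(positions, i, rng=random):
    """Standard ABC neighbourhood search for individual i.

    One randomly chosen dimension j is perturbed using a randomly chosen
    neighbour k != i from the same group.
    """
    current = positions[i]
    k = rng.choice([idx for idx in range(len(positions)) if idx != i])
    j = rng.randrange(len(current))
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(current)
    candidate[j] = current[j] + phi * (current[j] - positions[k][j])
    return candidate


def greedy_keep(old, new, fitness_fn):
    """Keep the better of the old and new positions (higher fitness wins)."""
    return new if fitness_fn(new) > fitness_fn(old) else old
```

In a strategy pool, each strategy could be a function with the same signature as `abc_original_search`, so that the index chosen from the Q table selects which function to call.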

In this embodiment the selection mechanism of the search strategies is designed around a Q table. It jointly considers how much a search strategy improves the fitness of the current individual position and how the strategy has performed in previous searches, so that the quality of each strategy is fully evaluated; the resulting feedback value objectively reflects the quality of the strategy and helps subsequent individuals select a better one. In addition, the rank state of each leader is updated after every iteration and the individual selects a search strategy according to the feedback values of that state, which meets the search requirements of individuals in different states.

S2-1-3: For the followers in a group, the range of selectable leaders is gradually narrowed to the top E leaders as the evacuation progresses, improving the success rate of position updates.

In the original ABC, a follower searches further in the vicinity of a selected leader and at the same time selects a neighbor from the global scope to guide the update. The quality of these neighbors varies, however, and previous research has shown that searching near a good neighbor may yield a better position.

Therefore, in step S2-1-3 the neighbors that a follower may select are limited to the top E individuals in the fitness ranking. E is calculated as follows:

where NP is the number of leaders in the group and iter is the current iteration number. As the iterations proceed, E continuously decreases, the searched regions concentrate around a few excellent leaders, the positions near those leaders are searched thoroughly, and the update success rate improves.
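The top-E restriction can be sketched as below. The selection from the E best leaders follows the description; the schedule for E does not, because the formula for E is not reproduced in this text. `shrinking_e` is therefore a purely hypothetical linear schedule, included only so that the example runs, and `max_iterations` is an assumed parameter.

```python
import math
import random


def choose_leader_from_top_e(leader_fitness, e, rng=random):
    """Pick, uniformly at random, one of the E leaders with the best fitness."""
    e = max(1, min(e, len(leader_fitness)))
    ranked = sorted(range(len(leader_fitness)),
                    key=lambda i: leader_fitness[i], reverse=True)
    return rng.choice(ranked[:e])


def shrinking_e(num_leaders, iteration, max_iterations):
    """HYPOTHETICAL linear schedule for E, not the formula of the text.

    E starts at the number of leaders NP and shrinks to 1 as iterations proceed.
    """
    remaining = 1.0 - iteration / max_iterations
    return max(1, math.ceil(num_leaders * remaining))
```

Any monotonically decreasing schedule could be substituted for `shrinking_e` without changing `choose_leader_from_top_e`.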

S2-1-4: When a leader in the group can no longer lead the remaining individuals effectively, it is converted into a scout, and an improved search equation is used to find a good position again.

In the original ABC, a leader that abandons a poor position becomes a scout and searches for a new position in the global scope according to formula (5); this search is random and therefore somewhat blind.

Therefore, in step S2-1-4 the search strategy of the scout is improved: the search direction is adaptively adjusted between the upper and lower limits and the scout moves toward the side with better fitness, so that it is more likely to reach a good position, avoiding unnecessary and blind search. The improved scout search strategy is as follows:

where $v_{i,j}$ is the new position found, $l_j$ and $u_j$ are the lower and upper limits of the j-th dimension, and $fitness_l$ and $fitness_j$ denote the corresponding fitness values. The new search strategy ensures that scout updates move in the more promising direction and are therefore more likely to reach a better position.

In step S2, the individuals are driven to the evacuation exit in combination with microscopic crowd-movement guidance: a social force model simulates the movement of the individuals, avoids collisions among them, and drives each individual to the next position determined by the path planning until it reaches the evacuation exit.

Specifically, in the social force model, an individual i of mass $m_i$ changes its velocity according to:

where the direction of the velocity of individual i is given by the vector pointing from its current position to its next position, and the force acting on the individual consists of two parts: the desired force of the individual toward its target and the interaction force.
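Once the total force on an individual is known, its velocity and position can be advanced numerically. The sketch below assumes Newton's second law, $m_i \, dv_i/dt = F_i$, and a semi-implicit Euler step in two dimensions; the integration scheme, the time step `dt` and the function name are assumptions, since the text does not state how the equation of motion is integrated.

```python
def euler_step(position, velocity, force, mass, dt):
    """Advance one individual by one time step (semi-implicit Euler).

    The velocity is updated from the acceleration force / mass first,
    then the position is updated with the new velocity.
    """
    ax, ay = force[0] / mass, force[1] / mass
    new_velocity = (velocity[0] + ax * dt, velocity[1] + ay * dt)
    new_position = (position[0] + new_velocity[0] * dt,
                    position[1] + new_velocity[1] * dt)
    return new_position, new_velocity
```

Here `force` would be the sum of the desired force and the interaction forces described in the following steps.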

Every individual in an evacuation scene has a target position and therefore a corresponding desired direction, given by the vector pointing from the current position of individual i to the target position. Driven by its subjective expectation, the individual tends to walk toward the target position at a desired speed $v_{wi}(t)$; because of interactions within the evacuating crowd, the actual speed of the individual differs from the desired speed. Therefore:

S2-2-1: The desired force of an individual is expressed as:

where the quantities involved are the actual walking velocity of the individual, the reaction time $\tau_i$, and the desired direction of the individual.
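In the widely used Helbing form of the social force model, the desired force is $m_i \,(v^0_i e_i - v_i) / \tau_i$, where $v^0_i$ is the desired speed, $e_i$ the unit vector toward the target and $v_i$ the actual velocity. The sketch below assumes that standard form, which may differ in notation or detail from the expression of this embodiment; the function name and the two-dimensional tuple format are illustrative.

```python
import math


def desired_force(position, target, velocity, desired_speed, mass, tau):
    """Desired (driving) force in the standard Helbing form (an assumption).

    Returns mass * (desired_speed * e - velocity) / tau, where e is the unit
    vector from the current position to the target position.
    """
    dx, dy = target[0] - position[0], target[1] - position[1]
    dist = math.hypot(dx, dy)
    # At the target the direction is undefined; use a zero desired velocity.
    ex, ey = (dx / dist, dy / dist) if dist > 0.0 else (0.0, 0.0)
    fx = mass * (desired_speed * ex - velocity[0]) / tau
    fy = mass * (desired_speed * ey - velocity[1]) / tau
    return fx, fy
```

The force vanishes when the individual already walks toward the target at the desired speed, and grows with the mismatch between actual and desired velocity.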

S2-2-2: The interaction force prevents an individual from colliding with walls or other objects while moving; it comprises the force exerted on the individual by obstacles and the interaction force between individuals.

The interaction force between individuals is either attractive or repulsive. If an individual is too close to other individuals, the force is repulsive, which guarantees the individual's space requirement; when the distance between individuals is larger, the force becomes attractive. Figs. 4(a)-4(c) show the evacuation stages in the obstacle-free scene with three doors.

The interaction force between individuals is formulated as:

where the quantities involved are the interaction force between individual $\alpha$ and individual $\beta$; two parameters giving the strength of the force; two parameters giving the range of influence of the force; $r_{\alpha\beta} - d_{\alpha\beta}$, the distance between the individuals; a unit vector pointing from $\beta$ to $\alpha$; and the state factor $f_{\alpha\beta}$.

S2-2-3: While walking, an individual keeps a certain distance from obstacles for the sake of safety and comfort, so the individual is subject to a force exerted by the obstacle. Figs. 5(a)-5(c) show the evacuation stages in the scene with obstacles.

The force is expressed as:

where the quantities involved are the force of the obstacle on individual $\alpha$; $A_{\alpha B}$, the strength of the force of the obstacle on individual $\alpha$; $B_{\alpha B}$, the range of influence of the force of the obstacle; $r_{\alpha} - d_{\alpha B}$, the distance from the individual to the obstacle; and a unit vector pointing from the boundary to $\alpha$.

In summary, the desired force, the repulsive force exerted by obstacles and the interaction force between individuals together guide the movement of each individual, so that collisions are avoided and the individual moves toward its target point; the model reproduces the "faster-is-slower" effect and arching at the exits.

This embodiment uses a social force model to simulate the movement of individuals; the model describes the dynamics of individual movement in terms of personal motivation and environmental constraints. Helbing et al., inspired by the idea that behavioral changes are guided by social forces, summarized the main factors influencing individual movement as follows: the desired force of an individual to reach a certain destination, which makes the individual tend to choose the shortest possible route; the repulsive force between the individual and strangers and between the individual and walls, which keeps a certain safe distance; and the attraction exerted on the individual by friends and by objects, which depends on where these lie in the individual's field of view. Although the concept of the social force model is simple, it effectively simulates many observed phenomena and reproduces the self-organizing behavior of individuals.

Example 2

The present embodiment provides an evacuation path planning system combining Q-learning and multi-swarm algorithm, including:

a path planning module, used for: dividing the evacuation crowd into a plurality of groups by the multi-swarm algorithm; calculating fitness from the distance between the position of each individual in a group and the evacuation exit and from the degree of congestion at the evacuation exit; determining a search strategy according to the fitness value and the quality values of the candidate search strategies in the Q table; determining the next position according to the search strategy; limiting the leaders that followers in a group may select to the E leaders with the best fitness values in that group; and, after a leader is converted into a scout, obtaining a new position with an improved scout search strategy.

It should be noted that the above modules correspond to the steps described in Embodiment 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the disclosure of Embodiment 1. It should also be noted that the modules, as part of a system, may be implemented in a computer system, for example as a set of computer-executable instructions.
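As a non-limiting illustration of how the path planning module could use the Q table, the following sketch selects a search strategy from one state row and then applies the feedback update Q(s_t, a_t) = Q(s_t, a_t) + α·[R_t + γ·max Q(s_t+1, a) − Q(s_t, a_t)] with the return value R = fitness_new − fitness_old. The softmax selection rule, the function names, and the default values of `alpha` and `gamma` are assumptions made for the sketch; the text only requires that a strategy with a higher Q value be more likely to be selected.

```python
import math
import random

def select_strategy(q_row, rng=random):
    """Pick a strategy index from one state row of the Q table.

    Softmax weighting is an assumed choice: it only guarantees that a
    higher Q value gives a higher selection probability.
    """
    m = max(q_row)  # subtract the maximum for numerical stability
    weights = [math.exp(q - m) for q in q_row]
    return rng.choices(range(len(q_row)), weights=weights, k=1)[0]

def update_q(q_table, state, action, next_state,
             fitness_old, fitness_new, alpha=0.1, gamma=0.9):
    """Apply the Q-learning update after a position change.

    `state` and `next_state` are row indices (for example the rank of a
    leader within its group); `action` is the index of the strategy used.
    """
    reward = fitness_new - fitness_old          # return value R
    best_next = max(q_table[next_state])        # max over a of Q(s_t+1, a)
    q_table[state][action] += alpha * (
        reward + gamma * best_next - q_table[state][action]
    )
    return q_table[state][action]
```

For example, with a table of three states and four strategies initialised to zero, `update_q(q, 0, 2, 1, fitness_old=1.0, fitness_new=3.0, alpha=0.5)` raises `q[0][2]` to 1.0, so the third strategy becomes the most likely choice the next time state 0 is visited.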

In further embodiments, there is also provided:

an electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment 1. For brevity, no further description is provided herein.

It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.

The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.

The method in Embodiment 1 may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in storage media well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Although the embodiments of the present invention have been described with reference to the accompanying drawings, this description is not intended to limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications and variations that can be made without inventive effort on the basis of the technical solution of the present invention still fall within the scope of protection of the present invention.

Claims

1. An evacuation path planning method combining Q-learning and multi-swarm algorithm, comprising:

adopting a multi-swarm algorithm to plan a macroscopic path, combining the microscopic crowd movement guidance to drive the individuals to reach an evacuation exit until the number of people evacuated at the evacuation exit is equal to the total number of people, and ending the evacuation process;

the multi-swarm algorithm comprises the steps of dividing the evacuation crowd into a plurality of groups, calculating the fitness according to the distance between the position of an individual in the group and an evacuation outlet and the crowding degree of the evacuation outlet, determining a search strategy according to the fitness value and the quality value of the search strategy to be selected in a Q table, and determining the next position according to the search strategy, wherein the specific steps are as follows:

(3) The probability of each search strategy $l$ being selected is correlated with the Q value $Q(s_r, a_l)$ of that search strategy, as shown in equation (2): the higher the Q value of a search strategy, the greater its probability of being selected;

$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ R_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \quad (3)$$

wherein $Q(s_t, a_t)$ represents the Q value, $\alpha$ is the learning rate, $\gamma$ is the reward coefficient, $R$ is the return value, and $\max_{a} Q(s_{t+1}, a)$ is the maximum Q value in the next state $s_{t+1}$; the return value $R$ is:

$$R = \mathrm{fitness}_{\mathrm{new}} - \mathrm{fitness}_{\mathrm{old}} \quad (4)$$

wherein $\mathrm{fitness}_{\mathrm{new}}$ and $\mathrm{fitness}_{\mathrm{old}}$ are the fitness of the new position and the fitness of the old position, respectively;

(6) After each iteration, all leaders are reordered in the group according to the fitness value of the new position, each leader obtains a new ordering state, and each individual selects a search strategy and updates the position according to the Q value in the new state row in the next iteration;

the leader range selectable by the followers in the group is E leaders with the best fitness value in the group, and after the leaders are converted into scouts, an improved scout search strategy is adopted to obtain new positions;

matching the search strategies to be selected in the constructed search strategy pool, wherein the search strategy pool comprises:

obtaining a new position according to the current position of the individual and the position of a randomly selected neighbor individual in the current group; taking the current position of the leader as the search starting point, and updating the position under the guidance of the positions of two randomly selected neighbor individuals in the group; taking the position of a randomly selected neighbor individual in the group as the search starting point, and obtaining a new position according to the positions of two randomly selected neighbor individuals and the optimal individual position in the group; and taking the optimal individual position as the search starting point, and obtaining a new position according to the positions of two randomly selected neighbor individuals in the group.
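As a non-limiting illustration, the four entries of the search strategy pool could be sketched as below. The claim only specifies which positions enter each strategy; the concrete update form (a starting point shifted by random multiples of position differences, in the style of artificial bee colony updates), the coefficient range [-1, 1], and the names `_move` and `candidate_positions` are assumptions made for the sketch.

```python
import random

def _move(base, pairs, rng):
    """Shift `base` by a random multiple (in [-1, 1]) of each difference p - q."""
    out = list(base)
    for p, q in pairs:
        phi = rng.uniform(-1.0, 1.0)
        out = [o + phi * (pc - qc) for o, pc, qc in zip(out, p, q)]
    return tuple(out)

def candidate_positions(i, positions, best, rng=random):
    """Return one candidate position per strategy for individual `i`.

    `positions` is the list of positions in the group and `best` is the
    index of the optimal individual. The group needs at least four members.
    """
    others = [k for k in range(len(positions)) if k != i]
    r0, r1, r2 = (positions[k] for k in rng.sample(others, 3))
    x, x_best = positions[i], positions[best]
    return [
        # Strategy 1: current position and one random neighbor.
        _move(x, [(x, r0)], rng),
        # Strategy 2: start at the current position, guided by two neighbors.
        _move(x, [(r1, r2)], rng),
        # Strategy 3: start at a random neighbor, use two neighbors and the best.
        _move(r0, [(x_best, r0), (r1, r2)], rng),
        # Strategy 4: start at the best position, guided by two neighbors.
        _move(x_best, [(r1, r2)], rng),
    ]
```

In use, the index returned by the Q-table selection step would pick one of the four candidates, and the resulting change in fitness would be fed back into the Q table.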

2. An evacuation path planning method combining Q-learning and multi-swarm algorithm according to claim 1, wherein the dividing of the evacuated crowd into a plurality of groups comprises:

taking the first individual as a first central point, selecting the individual farthest from the first central point as a second central point, and sequentially determining other central points by the same method until no new central point exists;

classifying the rest individuals into the nearest central point according to the minimum distance principle;

and calculating the fitness value of each individual in the group, sorting the individuals by fitness value, selecting the leaders, and taking the remaining individuals as followers.
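As a non-limiting illustration, the division into groups could be sketched as follows. Two points are assumptions of the sketch rather than part of the claim: the stopping condition is expressed as a fixed number of groups `n_groups`, and each center after the second is taken to be the individual farthest from its nearest existing center. The name `divide_into_groups` is likewise hypothetical.

```python
import math

def divide_into_groups(positions, n_groups):
    """Group individuals around center points chosen by farthest distance.

    Returns a dict mapping each center index to the list of member indices
    (the center itself is listed first).
    """
    centers = [0]  # the first individual is the first center point
    while len(centers) < min(n_groups, len(positions)):
        candidates = [i for i in range(len(positions)) if i not in centers]
        # Next center: the individual farthest from its nearest center.
        nxt = max(
            candidates,
            key=lambda i: min(math.dist(positions[i], positions[c])
                              for c in centers),
        )
        centers.append(nxt)

    groups = {c: [c] for c in centers}
    for i, p in enumerate(positions):
        if i in groups:
            continue  # centers are already placed in their own group
        # Minimum distance principle: join the nearest center point.
        nearest = min(centers, key=lambda c: math.dist(p, positions[c]))
        groups[nearest].append(i)
    return groups
```

For the positions `[(0, 0), (1, 0), (10, 0), (11, 0), (0, 1)]` and two groups, the centers are individuals 0 and 3, giving the groups `[0, 1, 4]` and `[3, 2]`; fitness values would then be computed within each group to choose its leaders.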

3. An evacuation path planning method combining Q-learning and multi-swarm algorithm as claimed in claim 1, wherein, after the next position is determined, the search strategy is re-determined and the position is updated according to the fitness value of the new position and the quality value of the search strategy to be selected in the Q table.

4. An evacuation path planning method combining Q-learning and multi-swarm algorithm according to claim 1, wherein the followers in the group select the same search strategy as the leader they follow, and, as the evacuation process progresses, the range of leaders selectable by the followers is narrowed to the E leaders with the best fitness in the group; E is calculated as follows:

and the range of E is continuously reduced along with the iteration.

5. An evacuation path planning method combining Q-learning and multi-swarm algorithm as claimed in claim 1, wherein the improved scout search strategy is to adaptively adjust the search direction of the scout, moving to the side with better fitness.

6. An evacuation path planning method combining Q-learning and multi-swarm algorithm as claimed in claim 1, wherein the social force model is used to conduct micro crowd movement guidance, and the individuals are driven to the next position according to the individual expectation force, the repulsion force of the obstacle to the individuals and the interaction force between the individuals until the individuals reach the evacuation exit.

7. An evacuation path planning system combining Q-learning and multi-swarm algorithm, comprising:

a path planning module, configured to divide the evacuation crowd into a plurality of groups by a multi-swarm algorithm, calculate the fitness according to the distance between the position of an individual in the group and an evacuation exit and the congestion degree of the evacuation exit, and determine a search strategy according to the fitness value and the quality value of the search strategy to be selected in the Q table so as to determine the next position, wherein the specific steps are as follows:

wherein $Q(s_t, a_t)$ represents the Q value, $\alpha$ is the learning rate, $\gamma$ is the reward coefficient, $R$ is the return value, and $\max_{a} Q(s_{t+1}, a)$ is the maximum Q value in the next state $s_{t+1}$; the return value $R$ is:

$$R = \mathrm{fitness}_{\mathrm{new}} - \mathrm{fitness}_{\mathrm{old}} \quad (4)$$

the leader range selectable by the followers in the group is E leaders with the best fitness value in the group, and after the leaders are converted into scouts, improved scout search strategies are adopted to obtain new positions;

8. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of any of claims 1-6.

9. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 6.