CN112330043B - Evacuation path planning method and system combining Q-learning and multi-swarm algorithm
- Publication number
- CN112330043B (CN202011284240.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
- G06Q50/265—Personal security, identity or safety
Abstract
The invention discloses an evacuation path planning method and system combining Q-learning and a multi-swarm (multi-bee-colony) algorithm. The method comprises: initializing an evacuation crowd and evacuation exits for the constructed evacuation scene model; and performing macroscopic path planning with the multi-swarm algorithm and, combined with microscopic crowd movement guidance, driving the individuals to the evacuation exits until the number of people who have reached the exits equals the total number of people, at which point the evacuation process ends. The original single population is divided into a plurality of sub-populations; several search strategies are introduced to build a search strategy pool; and a Q table provides feedback adjustment of the search strategies, yielding an adaptive strategy selection mechanism and a search of the global scope. By continuously shrinking the neighborhood range of the follower bees, the follower bees search under the guidance of the top E solutions, which raises the update success rate. The search direction of the scout bees is adaptively adjusted toward the side more likely to yield a successful update, which avoids blind search. Collision-free crowd movement is generated with the social force model, improving evacuation efficiency.
Description
Technical Field
The invention relates to the technical field of crowd evacuation path planning, in particular to an evacuation path planning method and system combining Q-learning and multi-swarm algorithm.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
As society develops, people increasingly gather in large venues such as tourist attractions, cinemas and supermarkets, which creates potential safety hazards. When a dangerous incident occurs in a crowded environment, pedestrians hurry to avoid the source of danger and look for an exit. Disorderly evacuation not only reduces evacuation efficiency but may also cause collisions and trampling, inflicting secondary injuries on the evacuees. How to guide people effectively in crowded places, raising evacuation speed while reducing the rate of dangerous incidents, has therefore become a widely studied problem. During an evacuation, people are often confused and act blindly; they usually know neither the exit best suited for evacuation nor the most suitable path to reach it. Crowd evacuation therefore requires path planning guidance.
The Artificial Bee Colony (ABC) algorithm is a swarm intelligence algorithm. It performs well in route planning for crowd evacuation because it is good at solving optimization problems and because the self-organizing, self-learning behavior of an artificial bee colony resembles that of a crowd. ABC is an optimization algorithm inspired by the foraging behavior of bees. A colony contains three kinds of bees: employed bees, follower (onlooker) bees and scout bees. Employed bees are better at exploration, follower bees are better at exploitation, and scout bees search for new positions over the global range; together they simulate bee foraging to search for an optimum. Because the concept is simple and easy to implement, ABC has been applied successfully to path planning, image classification, image segmentation and similar problems.
However, the inventors found that current applications of the artificial bee colony algorithm to crowd evacuation simulation have several shortcomings: path search in evacuation path planning is blind; individuals in different states cannot be matched with different path search methods; and the algorithm converges slowly during path planning. These shortcomings hinder efficient and orderly evacuation of the crowd as a whole and prevent a realistic reproduction of crowd evacuation as it occurs in real life.
Disclosure of Invention
To solve the above problems, the invention provides an evacuation path planning method and system combining Q-learning and a multi-swarm algorithm. The original single population is divided into a plurality of sub-populations; several search strategies are introduced to build a search strategy pool; a Q table provides feedback adjustment of the search strategies, yielding an adaptive strategy selection mechanism and a search of the global scope. By continuously shrinking the neighborhood range of the follower bees, the follower bees search under the guidance of the top E solutions, which raises the update success rate. The search direction of the scout bees is adaptively adjusted toward the side more likely to yield a successful update, which avoids blind search. Microscopic crowd movement guidance from the social force model generates collision-free crowd movement, which improves evacuation efficiency and shortens evacuation time.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides an evacuation path planning method combining Q-learning and multi-swarm algorithm, including:
initializing an evacuation crowd and an evacuation exit for the constructed evacuation scene model;
performing macroscopic path planning with a multi-swarm algorithm and, combined with microscopic crowd movement guidance, driving the individuals to the evacuation exits until the number of people who have reached the exits equals the total number of people, at which point the evacuation process ends;
wherein the multi-swarm algorithm comprises: dividing the evacuation crowd into a plurality of groups; calculating the fitness from the distance between each individual's position in the group and the evacuation exit and from the congestion degree of the evacuation exit; determining a search strategy according to the fitness value and the quality values of the candidate search strategies in a Q table; determining the next position according to the selected search strategy; restricting the leaders selectable by the followers in a group to the E leaders with the best fitness values in that group; and, after a leader is converted into a scout, obtaining a new position with an improved scout search strategy.
In a second aspect, the present invention provides an evacuation path planning system combining Q-learning and multi-swarm algorithm, comprising:
a model initialization module, used for initializing the evacuation crowd and the evacuation exits for the constructed evacuation scene model;
an evacuation simulation module, used for performing macroscopic path planning with a multi-swarm algorithm and, combined with microscopic crowd movement guidance, driving the individuals to the evacuation exits until the number of people who have reached the exits equals the total number of people, at which point the evacuation process ends;
and a path planning module, in which the multi-swarm algorithm divides the evacuation crowd into a plurality of groups, calculates the fitness from the distance between each individual's position in the group and the evacuation exit and from the congestion degree of the evacuation exit, determines a search strategy according to the fitness value and the quality values of the candidate search strategies in the Q table, and determines the next position according to the selected search strategy, wherein the leaders selectable by the followers in a group are the E leaders with the best fitness values in that group, and, after a leader is converted into a scout, a new position is obtained with an improved scout search strategy.
In a third aspect, the present invention provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein when the computer instructions are executed by the processor, the method of the first aspect is performed.
In a fourth aspect, the present invention provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
To address the slow convergence and blind search of the Artificial Bee Colony algorithm, the invention provides an improved multi-swarm algorithm (MABCQ). The original single population is divided into a plurality of sub-populations that search in parallel, which speeds up the search. A search strategy pool is built from several search strategies, and the Q table of Q-learning is used to select a search scheme adaptively for individuals at different positions, which meets the search needs of individuals at different stages and achieves a thorough search of the global range.
The invention continuously shrinks the neighborhood range of the follower bees so that they search under the guidance of the top E solutions. Improving the neighborhood range of the follower bees and the search equation of the scout bees raises the update success rate, avoids blind search and effectively improves the performance of the artificial bee colony algorithm.
The invention avoids blind search by adaptively adjusting the search direction of the scout bees toward the side more likely to yield a successful update.
The improved MABCQ algorithm is applied to crowd evacuation path planning to determine each pedestrian's next position, and the microscopic crowd movement guidance of the social force model drives the pedestrian's movement. Crowd evacuation in different scenes is simulated and reproduced realistically and visually, crowd evacuation efficiency is improved and evacuation time is shortened, which offers guidance for formulating crowd evacuation path planning schemes.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention, and are included to illustrate an exemplary embodiment of the invention and not to limit the invention.
Fig. 1 is a flowchart of an evacuation path planning method combining Q-learning and multi-swarm algorithm according to embodiment 1 of the present invention;
fig. 2 (a) -2 (b) are schematic diagrams of two evacuation scenarios provided in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of a Q table structure provided in embodiment 1 of the present invention;
fig. 4 (a) -4 (c) are schematic diagrams of evacuation stages in a three-door scenario according to embodiment 1 of the present invention;
fig. 5 (a) -5 (c) are schematic diagrams of evacuation stages in a scene with obstacles according to embodiment 1 of the present invention.
Detailed Description
the invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the invention may be combined with each other without conflict.
Example 1
As shown in fig. 1, the present embodiment provides an evacuation path planning method combining Q-learning and multi-swarm algorithm, including:
S1: initializing an evacuation crowd and evacuation exits for the constructed evacuation scene model;
S2: performing macroscopic path planning with a multi-swarm algorithm and, combined with microscopic crowd movement guidance, driving the individuals to the evacuation exits until the number of people who have reached the exits equals the total number of people, at which point the evacuation process ends;
wherein the multi-swarm algorithm comprises: dividing the evacuation crowd into a plurality of groups; calculating the fitness from the distance between each individual's position in the group and the evacuation exit and from the congestion degree of the evacuation exit; determining a search strategy according to the fitness value and the quality values of the candidate search strategies in a Q table; determining the next position according to the selected search strategy; restricting the leaders selectable by the followers in a group to the E leaders with the best fitness values in that group; and, after a leader is converted into a scout, obtaining a new position with an improved scout search strategy.
In step S1, real evacuation scene parameters and crowd parameters are obtained to construct the evacuation scene model. As shown in fig. 2 (a) -2 (b), this embodiment uses an obstacle-free evacuation scene with three doors and an evacuation scene with obstacles and two doors. A counter is placed at each evacuation exit of the initialized evacuation scene model; it counts the individuals evacuated through that exit and is used to determine the degree of congestion at the exit.
Route planning is an important part of crowd evacuation: a suitable route must be determined for each pedestrian according to the pedestrian's position and surroundings. Evacuation scenes are often complex and are influenced by various dynamic and static factors. The artificial bee colony algorithm ABC is an effective path planning method that can take obstacles, congestion and similar conditions in the evacuation scene into account so that pedestrians can choose routes in time.
In view of the shortcomings of ABC described in the Background, this embodiment provides a multi-swarm algorithm, MABCQ, and plans each pedestrian's path with it to determine the pedestrian's next position. The method specifically comprises the following steps:
S2-1-1: dividing the evacuation crowd into a plurality of groups according to the positions of the individuals, keeping the number of individuals in each group similar, and selecting the group leaders;
specifically: the single population of the original ABC algorithm is divided into a plurality of sub-populations, and each sub-population carries out the role conversion of its own individuals, which realizes parallel search. The sub-populations are divided with the max-min distance method based on Euclidean distance;
(1) The first individual is taken as the 1st center point, and the individual farthest from the 1st center point is selected as the 2nd center point; further center points are determined in the same way until no new center point is generated; finally, the remaining individuals are assigned to the nearest center point according to the minimum distance principle;
the pseudo code that performs the classification is:
(2) After the groups are divided, the fitness value of each individual in a group is calculated; the half of the individuals with the higher fitness values are selected as leaders and the rest as followers;
the fitness value is calculated as:
fitness = 1/(α·distance + β·crowd) (1)
where distance is the distance from the individual's position to the selected evacuation exit, crowd is the congestion degree of the evacuation exit, and α and β are weighting factors.
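The grouping and role assignment of step S2-1-1 can be sketched in Python as follows. This is a minimal illustration, not the patent's own pseudocode. The names `max_min_groups`, `fitness` and `split_roles` are hypothetical, and several details are assumptions: the stopping threshold `theta` (the text only says center selection stops when "no new center point is generated"), the default weights `alpha` and `beta`, the small `eps` guard against division by zero, and rounding up the number of leaders in odd-sized groups. The sketch also does not enforce similar group sizes.

```python
import math

def max_min_groups(points, theta=0.5):
    """Max-min distance grouping; returns (center indices, group label per point)."""
    n = len(points)
    centers = [0]  # the first individual is the 1st center
    # distance from every individual to its nearest center so far
    nearest = [math.dist(p, points[0]) for p in points]
    base = max(nearest)  # distance between the 1st and 2nd centers
    while base > 0:
        cand = max(range(n), key=lambda i: nearest[i])
        # assumed stopping rule: nobody is far enough away to become a new center
        if len(centers) > 1 and nearest[cand] <= theta * base:
            break
        centers.append(cand)
        nearest = [min(d, math.dist(p, points[cand])) for d, p in zip(nearest, points)]
    # minimum distance principle: each individual joins its nearest center
    labels = [min(range(len(centers)), key=lambda c: math.dist(p, points[centers[c]]))
              for p in points]
    return centers, labels

def fitness(distance, crowd, alpha=0.5, beta=0.5, eps=1e-9):
    """Equation (1); eps is an added guard, not part of the source formula."""
    return 1.0 / (alpha * distance + beta * crowd + eps)

def split_roles(fitness_values):
    """Indices of the higher-fitness half (leaders) and of the rest (followers)."""
    order = sorted(range(len(fitness_values)),
                   key=lambda i: fitness_values[i], reverse=True)
    half = (len(order) + 1) // 2  # odd groups round up: an assumption
    return order[:half], order[half:]
```

With `theta` closer to 1 fewer groups are produced; the value would need tuning to the scene size.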
S2-1-2: building a search strategy pool from four different search strategies to increase search diversity; using the Q table of Q-learning to give feedback on how well each search strategy performs, thereby building an adaptive selection mechanism for the search strategies; and completing the position update with the selected search strategy;
specifically: this embodiment uses a Q table to design the search strategy selection mechanism. As shown in fig. 3, the rows of the Q table represent the fitness ranks of the leaders and the columns represent the selectable search strategies. As in Q-learning, an action is selected according to the state and the result of the action is fed back to the state, so the choice of search strategy is reinforced continuously. This yields the most suitable search strategy for individuals of each fitness rank and meets the different search needs of different individuals. The specific steps are as follows:
(1) Initializing a Q table with n rows and t columns in each group, wherein n is the number of leaders in the group, and t is the number of search strategies;
(2) Calculating the fitness value of each leader from its current position and sorting the leaders from largest to smallest fitness value, which gives the state $S_r$ corresponding to each row of the Q table; the individual in the r-th row has t selectable search strategies;
(3) The probability that search strategy l is selected depends on its quality value $Q(S_r, a_l)$, as shown in equation (2): the higher the Q value of a search strategy, the greater its probability of being selected;
(4) Updating the position according to the selected search strategy, keeping the better of the new and old positions, and recalculating the Q value from the updated position:
$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \cdot \left[ R_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{3}$$
where $Q(s_t, a_t)$ is the Q value, $\alpha$ is the learning rate, $\gamma$ is the discount factor, $R_t$ is the reward value, and $\max_{a} Q(s_{t+1}, a)$ is the largest Q value in the next state $s_{t+1}$; the reward value R is:
$$R = fitness_{new} - fitness_{old} \tag{4}$$
where $fitness_{new}$ and $fitness_{old}$ are the fitness of the new position and of the old position, respectively.
(5) Each follower selects the same search strategy as the leader it follows and then determines its next position;
(6) After each iteration, all leaders in the group are re-sorted by the fitness value of their new positions, so each leader obtains a new rank state; in the next iteration, each individual selects a search strategy according to the Q values in the row of its new state and updates its position.
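Steps (3) and (4) above can be sketched in Python as follows. The update in `update_q` follows equation (3) directly. Equation (2) is not reproduced in this text, so `select_strategy` assumes a softmax over the Q values of the current row, which is one common rule satisfying "higher Q value, higher selection probability"; the patent's actual formula may differ. The function names and the default `lr` and `gamma` values are hypothetical.

```python
import math
import random

def select_strategy(q_row, rng=random):
    """Pick a strategy index with probability increasing in its Q value.

    Softmax is an assumption standing in for equation (2).
    """
    m = max(q_row)  # subtract the maximum for numerical stability
    weights = [math.exp(q - m) for q in q_row]
    return rng.choices(range(len(q_row)), weights=weights, k=1)[0]

def update_q(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """Equation (3): Q(s,a) += lr * (R + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state])
    q_table[state][action] += lr * (reward + gamma * best_next - q_table[state][action])
    return q_table[state][action]

# Usage: n = 3 leaders (rows), t = 4 search strategies (columns).
q_table = [[0.0] * 4 for _ in range(3)]
strategy = select_strategy(q_table[0])
# reward = fitness_new - fitness_old, as in equation (4)
update_q(q_table, state=0, action=strategy, reward=0.2, next_state=1)
```

Here `state` is the leader's rank before the move and `next_state` its rank after the group is re-sorted in step (6).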
In this embodiment, in addition to the original search strategy, three types of search methods are added to construct a search strategy pool, which specifically includes:
(1) The original ABC search strategy, shown in formula (5), obtains a new position from the current individual position and the position of a neighbor randomly selected within the current group:
$$v_{i,j} = x_{i,j} + \varphi_{i,j}\,(x_{i,j} - x_{k,j}) \tag{5}$$
where $x_{i,j}$ is the current individual position, $x_{k,j}$ is the position of a randomly selected neighbor within the current group, $v_{i,j}$ is the new position obtained by the update, and $\varphi_{i,j}$ is a random number with $\varphi_{i,j} \in [-1, 1]$.
(2) Formula (6) is the leader's update strategy: with the current position $x_{i,j}$ as the search starting point, the position $v_{i,j}$ is updated under the guidance of the positions $x_{k1,j}$ and $x_{k2,j}$ of two neighbors randomly selected within the group; two random numbers are used, one of which is $\phi \in [-1, 1]$.
The search equation of the follower is shown in formula (7); it improves the neighborhood search equation by adding a term guided by the globally optimal individual position:
where $v_{i,j}$ is the new position found, $x_{i,j}$ is the current position, $x_{best,j}$ is the globally optimal position, $x_{k1,j}$ is the position of a neighbor randomly selected within the group, and the random number $\phi \in [-1, 1]$.
(3) The leader updates its position using formula (8):
where $v_{i,j}$ is the new position found by the search, $x_{k,j}$ and $x_{k2,j}$ are the positions of neighbors randomly selected within the group, $x_{best,j}$ is the globally optimal individual position, and the coefficient is a random number.
The search strategy of the corresponding follower is:
where $v_{i,j}$ is the new position found by the search, $x_{i,j}$ is the current position, $x_{k2,j}$ is the position of a neighbor randomly selected within the group, $x_{best,j}$ is the globally optimal individual position, and the coefficient is a random number.
(4) The leader randomly selects two neighbors in the group and takes the optimal individual as the search starting point, which improves the neighborhood search formula as shown in formula (10). Because candidate solutions are generated near the current optimal individual, this strategy has strong exploitation capability, and the guidance of the globally optimal individual accelerates convergence;
where $v_{i,j}$ is the new position found, $x_{best,j}$ is the globally optimal individual position, $x_{k,j}$ and $x_{k2,j}$ are the positions of neighbors randomly selected within the group, and the coefficient is a random number.
Accordingly, the search strategy of the follower is:
where $v_{i,j}$ is the new position found by the search, $x_{i,j}$ is the current position, $x_{best,j}$ is the globally optimal individual position, $x_{k,j}$ is the position of a neighbor randomly selected within the group, and the coefficient is a random number.
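As an illustration of strategy (1) in the pool, the sketch below implements the neighborhood search of the standard ABC algorithm, $v_{i,j} = x_{i,j} + \varphi (x_{i,j} - x_{k,j})$ with $\varphi \in [-1, 1]$, which formula (5) is understood to denote, together with the greedy "keep the better position" rule. Updating a single randomly chosen dimension j follows the usual ABC convention and is an assumption here; the function names are hypothetical. The other three strategies are not sketched because their formulas are not reproduced in this text.

```python
import random

def abc_search(x_i, x_k, rng=random):
    """Standard ABC neighborhood search on one randomly chosen dimension."""
    j = rng.randrange(len(x_i))      # dimension to perturb (ABC convention)
    phi = rng.uniform(-1.0, 1.0)     # random coefficient in [-1, 1]
    v = list(x_i)
    v[j] = x_i[j] + phi * (x_i[j] - x_k[j])
    return v

def greedy_keep(old, new, fitness_fn):
    """Keep the new position only if its fitness is strictly better."""
    return new if fitness_fn(new) > fitness_fn(old) else old

# Usage: move an individual at (2, 3) using a neighbor at (5, 1);
# fitness here is a toy stand-in that prefers positions near the origin.
toy_fitness = lambda p: 1.0 / (1.0 + abs(p[0]) + abs(p[1]))
candidate = abc_search([2.0, 3.0], [5.0, 1.0])
position = greedy_keep([2.0, 3.0], candidate, toy_fitness)
```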
In this embodiment, the Q table is used to design the selection mechanism for the search strategies. The mechanism considers both how much a search strategy improves the fitness of the current individual's position and how the strategy performed in earlier searches, so the quality of each strategy is evaluated thoroughly. The resulting feedback value evaluates each strategy objectively and helps later individuals choose better. In addition, the rank state of each leader is updated after every iteration, and the individual selects a search strategy according to the feedback values of that state, which meets the search needs of individuals in different states.
S2-1-3: for the followers in a group, the range of selectable leaders is gradually narrowed to the top E leaders as the evacuation progresses, which raises the success rate of position updates.
In the original ABC, a follower searches further near a selected leader and also selects a neighbor from the global scope to guide the update. The quality of these neighbors varies, however, and previous research shows that searching near a good neighbor is more likely to yield a better position;
therefore, in step S2-1-3, the neighbors a follower may select are limited to the top E individuals in the fitness ranking; E is calculated as follows:
where NP is the number of leaders in the group and iter is the current iteration number. As the iterations proceed, E decreases continuously, so the search concentrates around a few excellent leaders, the positions near them are searched thoroughly, and the update success rate improves.
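The top-E restriction can be sketched in Python as follows. The formula for E is not reproduced in this text, so `shrinking_e` uses a purely hypothetical linear schedule that starts at NP and decreases to 1 over `max_iter` iterations; only the qualitative behavior (E shrinks as iter grows) is taken from the description. `pick_guiding_leader` is likewise a hypothetical name.

```python
import random

def shrinking_e(num_leaders, iteration, max_iter, e_min=1):
    """Hypothetical schedule: E falls linearly from NP to e_min (integer ceiling)."""
    remaining = max_iter - iteration
    e = (num_leaders * remaining + max_iter - 1) // max_iter  # ceil division
    return max(e_min, e)

def pick_guiding_leader(leader_fitness, e, rng=random):
    """Choose a guiding leader at random among the E leaders with best fitness."""
    ranked = sorted(range(len(leader_fitness)),
                    key=lambda i: leader_fitness[i], reverse=True)
    return rng.choice(ranked[:e])

# Usage: 10 leaders, iteration 3 of 10 leaves the 7 best leaders selectable.
e = shrinking_e(num_leaders=10, iteration=3, max_iter=10)
guide = pick_guiding_leader([0.2, 0.9, 0.5, 0.1, 0.3, 0.8, 0.7, 0.6, 0.4, 0.05], e)
```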
S2-1-4: when a leader in the group can no longer effectively lead the other individuals, the leader is converted into a scout, and an improved search equation is used to find a good position again.
In the original ABC, a leader that abandons a poor position becomes a scout and searches for a new position in the global scope according to formula (5), but this search is random and therefore somewhat blind;
therefore, in step S2-1-4, the scout's search strategy is improved: the search direction is adaptively adjusted between the upper and lower limits so that the scout moves toward the side with better fitness, which makes it more likely to reach a good position and avoids unnecessary, blind search; the improved scout search strategy is as follows:
where $v_{i,j}$ is the new position found, $l_j$ and $u_j$ are the limits (lower and upper bounds) of the j-th dimension, and $fitness_l$ and $fitness_j$ denote the corresponding fitness values; the new search strategy ensures that a scout updates toward a more promising direction and is therefore more likely to reach a better position.
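The idea of a directed scout move can be sketched as below. The improved search equation itself is not reproduced in this text, so everything concrete here is an assumption made for illustration: the function name `scout_search`, the way each side is judged (by evaluating the fitness with coordinate j replaced by the lower or upper bound), and the step rule (a random fraction of the distance to the better bound). A larger fitness value is taken to be better.

```python
import random

def scout_search(position, lower, upper, fitness_fn, rng=random):
    """Move each coordinate toward the bound whose side has better fitness.

    Assumption: the fitness of each side is probed at the bound itself,
    and the step is a random fraction of the distance to that bound.
    """
    new_position = list(position)
    for j in range(len(position)):
        at_lower = list(position)
        at_lower[j] = lower[j]
        at_upper = list(position)
        at_upper[j] = upper[j]
        if fitness_fn(at_lower) > fitness_fn(at_upper):
            target = lower[j]
        else:
            target = upper[j]
        new_position[j] = position[j] + rng.random() * (target - position[j])
    return new_position

# Hypothetical fitness: best near x = 10 and y = 0.
def toy_fitness(p):
    return -abs(p[0] - 10.0) - abs(p[1])

moved = scout_search([5.0, 5.0], lower=[0.0, 0.0], upper=[10.0, 10.0],
                     fitness_fn=toy_fitness, rng=random.Random(1))
```

In contrast to a uniformly random restart, every coordinate of the new position lies between the old position and the more promising bound.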
In step S2, microscopic crowd movement guidance is combined to drive the individuals to the evacuation exit: a social force model simulates the movement of individuals, avoids collisions among them, and drives each individual to the next position determined by path planning until the individual reaches the evacuation exit;
specifically, in the social force model, an individual i with mass $m_i$ changes its velocity as follows:
where the direction of the individual's velocity is given by a vector pointing from the current position of individual i to the next position, and the force consists of two parts: the desired force of the individual toward the target and the interaction force.
Every individual in an evacuation scene has a target position and therefore a corresponding desired direction, given by a vector pointing from the current position of individual i to the target position; driven by its subjective expectation, the individual tends to walk toward the target position at a desired speed $v_{wi}(t)$, but its actual speed differs from the desired speed because of interactions within the evacuating crowd; therefore,
S2-2-1: the desired force of an individual is expressed as:
where the symbols denote the actual walking velocity of the individual, the reaction time $\tau_i$, and the desired direction of the individual.
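A minimal sketch of the desired force follows. It assumes the form commonly used in Helbing-style social force models, mass times (desired speed along the desired direction minus actual velocity) divided by the reaction time; the patent's own expression is not reproduced in this text, and the function name, the 2-D tuples, and the numeric values are hypothetical.

```python
import math

def desired_force(mass, desired_speed, position, target, velocity, reaction_time):
    """Force that relaxes the actual velocity toward the desired velocity.

    The desired direction is the unit vector from position to target.
    Assumption: f = m * (v_desired * e - v_actual) / tau.
    """
    dx = target[0] - position[0]
    dy = target[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        ex, ey = 0.0, 0.0  # already at the target: no preferred direction
    else:
        ex, ey = dx / dist, dy / dist
    fx = mass * (desired_speed * ex - velocity[0]) / reaction_time
    fy = mass * (desired_speed * ey - velocity[1]) / reaction_time
    return fx, fy

# An individual of 80 kg at rest, target 5 m away, desired speed 1.5 m/s.
force = desired_force(mass=80.0, desired_speed=1.5, position=(0.0, 0.0),
                      target=(3.0, 4.0), velocity=(0.0, 0.0), reaction_time=0.5)
```

When the actual velocity already equals the desired velocity the force vanishes, which is what lets the interaction forces dominate in dense crowds.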
S2-2-2: the interaction force keeps the individual from colliding with walls or other objects while moving; it comprises the force between obstacles and the individual and the interaction force between individuals.
The interaction force between individuals acts as attraction or repulsion: if an individual is too close to others, the force is repulsive, which preserves the individual's personal space; when the distance between individuals is larger, the force becomes attractive; Figs. 4(a)-4(c) show schematic diagrams of the evacuation stages in an obstacle-free scene with three doors;
the interaction force between individuals is formulated as:
where the expression gives the interaction force between individual α and individual β; its parameters are two force intensities and two ranges of influence, $r_{\alpha\beta} - d_{\alpha\beta}$ is the distance between the individuals, the unit vector points from β to α, and $f_{\alpha\beta}$ is a state factor.
S2-2-3: while walking, an individual keeps a certain distance from obstacles for safety and comfort, so it is subject to a force exerted by the obstacle; Figs. 5(a)-5(c) show schematic diagrams of the evacuation stages in a scene with obstacles;
the force is expressed as:
where the expression gives the force of the obstacle on individual α, $A_{\alpha B}$ is the strength of that force, $B_{\alpha B}$ is its range of influence, $r_{\alpha} - d_{\alpha B}$ is the distance from the individual to the obstacle, and the unit vector points from the boundary to α.
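The obstacle force can be sketched as follows, assuming the exponential form A * exp((r - d) / B) * n that is common in Helbing-style social force models. That form is an assumption, since the patent's expression is not reproduced in this text; here `radius` plays the role of r, the distance to the nearest obstacle point plays the role of d, and the function name and numeric values are hypothetical.

```python
import math

def obstacle_force(strength, influence_range, radius, position, nearest_obstacle_point):
    """Repulsion pushing the individual away from the nearest obstacle point.

    Assumption: magnitude = A * exp((r - d) / B), directed along the unit
    vector from the obstacle boundary to the individual.
    """
    nx = position[0] - nearest_obstacle_point[0]
    ny = position[1] - nearest_obstacle_point[1]
    d = math.hypot(nx, ny)
    if d == 0.0:
        return 0.0, 0.0  # direction undefined when the points coincide
    magnitude = strength * math.exp((radius - d) / influence_range)
    return magnitude * nx / d, magnitude * ny / d

# Individual of radius 0.3 m touching a wall point, then standing 1 m away.
touching = obstacle_force(2000.0, 0.08, 0.3, (0.3, 0.0), (0.0, 0.0))
far = obstacle_force(2000.0, 0.08, 0.3, (1.0, 0.0), (0.0, 0.0))
```

The force decays quickly with distance, so it matters only when the individual is within a few multiples of the influence range of the obstacle.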
In summary, the desired force, the repulsive force of obstacles on the individual, and the interaction force between individuals together guide each individual's movement, so that it avoids collisions and moves toward the target point, reproducing crowd phenomena such as "faster-is-slower" and arching at exits.
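How the individual forces combine into motion can be illustrated with a single time step. The sketch below assumes Newton's second law with a semi-implicit Euler update (velocity first, then position using the new velocity); the integration scheme, the function name `step`, and the sample force values are assumptions for illustration and are not stated in the text.

```python
def step(position, velocity, mass, forces, dt):
    """Advance one individual by one semi-implicit Euler step (2-D).

    forces is a list of (fx, fy) tuples, e.g. the desired force, the
    obstacle force, and the forces from other individuals.
    """
    fx = sum(f[0] for f in forces)
    fy = sum(f[1] for f in forces)
    vx = velocity[0] + fx / mass * dt
    vy = velocity[1] + fy / mass * dt
    new_position = (position[0] + vx * dt, position[1] + vy * dt)
    return new_position, (vx, vy)

# Hypothetical forces acting on an 80 kg individual starting at rest.
new_pos, new_vel = step(position=(0.0, 0.0), velocity=(0.0, 0.0), mass=80.0,
                        forces=[(160.0, 0.0), (0.0, 80.0)], dt=0.5)
```

Repeating this step for every individual, with the forces recomputed each time, drives the crowd toward the positions chosen by path planning.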
This embodiment uses a social force model to simulate individual movement; the model describes the dynamics of individual motion in terms of personal motivation and environmental constraints. Inspired by the idea that behavioral changes are guided by social forces, Helbing et al. summarized the main factors that influence individual movement as follows: the desired force of an individual to reach a certain destination, which makes the individual tend to choose the shortest possible route; the repulsive force between the individual and strangers or walls, which keeps a certain safe distance; and the attraction exerted by friends and by objects of interest. Although the social force model is conceptually simple, it effectively simulates many observed phenomena and reproduces the self-organizing behavior of individuals.
Example 2
The present embodiment provides an evacuation path planning system combining Q-learning and multi-swarm algorithm, including:
a model initialization module, configured to initialize the evacuation crowd and the evacuation exits for the constructed evacuation scene model;
an evacuation simulation module, configured to plan a macroscopic path with a multi-swarm algorithm and, combined with microscopic crowd movement guidance, drive the individuals to an evacuation exit until the number of people evacuated at the exits equals the total number of people, at which point the evacuation process ends;
and a path planning module, configured to divide the evacuation crowd into a plurality of groups with the multi-swarm algorithm, calculate the fitness from the distance between an individual's position in the group and the evacuation exit and from the congestion degree of the evacuation exit, determine the search strategy from the fitness value and the quality values of the candidate search strategies in the Q table, and determine the next position according to the search strategy; the leaders selectable by followers in a group are the E leaders with the best fitness values in the group, and after a leader is converted into a scout, an improved scout search strategy is used to obtain a new position.
It should be noted that the above modules correspond to the steps described in embodiment 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should also be noted that the modules, as part of a system, may be executed in a computer system, for example as a set of computer-executable instructions.
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment 1. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method described in embodiment 1.
The method in embodiment 1 may be carried out directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, the details are not described here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.
Claims (9)
1. An evacuation path planning method combining Q-learning and multi-swarm algorithm, comprising:
initializing an evacuation crowd and an evacuation exit for the constructed evacuation scene model;
adopting a multi-swarm algorithm to plan a macroscopic path, combining the microscopic crowd movement guidance to drive the individuals to reach an evacuation exit until the number of people evacuated at the evacuation exit is equal to the total number of people, and ending the evacuation process;
the multi-swarm algorithm comprises the steps of dividing the evacuation crowd into a plurality of groups, calculating the fitness according to the distance between the position of an individual in the group and an evacuation outlet and the crowding degree of the evacuation outlet, determining a search strategy according to the fitness value and the quality value of the search strategy to be selected in a Q table, and determining the next position according to the search strategy, wherein the specific steps are as follows:
(1) Initializing a Q table with n rows and t columns in each group, wherein n is the number of leaders in the group, and t is the number of search strategies;
(2) Calculating the fitness value of each leader according to its current position and sorting the leaders in descending order of fitness value, so that each row of the Q table corresponds to a state $S_r$, i.e., the individual located in the r-th row, for which t search strategies can be selected;
(3) The probability of each search strategy l being selected is related to the quality function value $Q(S_r, a_l)$ of the search strategy, as shown in equation (2); the higher the Q value of a search strategy, the greater its probability of being selected;
(4) Updating the position according to the selected search strategy and keeping the better of the new position and the old position, while recalculating the Q value according to the updated position:
$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \cdot \left[ R_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \quad (3)$$
where $Q(s_t, a_t)$ is the Q value, $\alpha$ is the learning rate, $\gamma$ is the reward coefficient, $R$ is the return value, and $\max_{a} Q(s_{t+1}, a)$ is the maximum Q value in the next state $s_{t+1}$; the return value R is:
$$R = fitness_{new} - fitness_{old} \quad (4)$$
where $fitness_{new}$ and $fitness_{old}$ are the fitness of the new position and the fitness of the old position, respectively;
(5) The follower selects a search strategy consistent with the leader followed by the follower and sequentially determines the next position;
(6) After each iteration, all leaders are reordered in the group according to the fitness value of the new position, each leader obtains a new ordering state, and each individual selects a search strategy and updates the position according to the Q value in the new state row in the next iteration;
the leaders selectable by the followers in a group are the E leaders with the best fitness values in the group, and after a leader is converted into a scout, an improved scout search strategy is used to obtain a new position;
the candidate search strategies are selected from a constructed search strategy pool, wherein the search strategy pool comprises:
obtaining a new position from the current position of the individual and the position of a neighbor individual randomly selected in the current group; taking the current position of the leader as the search starting point and updating the position under the guidance of the positions of two randomly selected neighbor individuals in the group; taking the position of a randomly selected neighbor individual in the group as the search starting point and obtaining a new position from the positions of two randomly selected neighbor individuals and the position of the best individual in the group; and taking the position of the best individual as the search starting point and obtaining a new position from the positions of two randomly selected neighbor individuals in the group.
2. An evacuation path planning method combining Q-learning and multi-swarm algorithm according to claim 1, wherein the dividing of the evacuated crowd into a plurality of groups comprises:
taking the first individual as a first central point, selecting the individual farthest from the first central point as a second central point, and sequentially determining other central points by the same method until no new central point exists;
classifying the rest individuals into the nearest central point according to the minimum distance principle;
and calculating the fitness value of each individual in the group, sorting the individuals by fitness value, selecting the leaders, and taking the remaining individuals as followers.
3. An evacuation route planning method combining Q-learning and multi-swarm algorithm as claimed in claim 1, wherein after determining the next location, re-determining the search strategy and updating the location according to the fitness value of the new location and the quality value of the search strategy to be selected in the Q-table.
4. An evacuation path planning method combining Q-learning and multi-swarm algorithm according to claim 1, wherein the followers in the group select the same search strategy as the leader they follow, and as the evacuation process progresses the followers narrow the range of selectable leaders to the E leaders with the best fitness in the group; E is calculated as follows:
and E keeps decreasing as the iterations proceed.
5. An evacuation path planning method combining Q-learning and multi-swarm algorithm as claimed in claim 1, wherein the improved scout search strategy is to adaptively adjust the search direction of the scout, moving to the side with better fitness.
6. An evacuation path planning method combining Q-learning and multi-swarm algorithm as claimed in claim 1, wherein the social force model is used to conduct micro crowd movement guidance, and the individuals are driven to the next position according to the individual expectation force, the repulsion force of the obstacle to the individuals and the interaction force between the individuals until the individuals reach the evacuation exit.
7. An evacuation path planning system combining Q-learning and multi-swarm algorithm, comprising:
the model initialization module is used for initializing the evacuation crowd and an evacuation exit for the constructed evacuation scene model;
the evacuation simulation module is used for planning a macro path by adopting a multi-swarm algorithm, guiding and driving the individuals to reach an evacuation outlet by combining the micro crowd movement until the number of people in the evacuation outlet is equal to the total number of people, and finishing the evacuation process;
the path planning module is used for dividing the evacuation crowd into a plurality of groups by a multi-swarm algorithm, calculating the fitness according to the distance between the position of an individual in the group and an evacuation exit and the congestion degree of the evacuation exit, and determining a search strategy according to the fitness value and the quality value of the search strategy to be selected in the Q table so as to determine the next position, with the following specific steps:
(1) Initializing a Q table with n rows and t columns in each group, wherein n is the number of leaders in the group, and t is the number of search strategies;
(2) Calculating the fitness value of each leader according to its current position and sorting the leaders in descending order of fitness value, so that each row of the Q table corresponds to a state $S_r$, i.e., the individual located in the r-th row, for which t search strategies can be selected;
(3) The probability of each search strategy l being selected is related to the quality function value $Q(S_r, a_l)$ of the search strategy, as shown in equation (2); the higher the Q value of a search strategy, the greater its probability of being selected;
(4) Updating the position according to the selected search strategy and keeping the better of the new position and the old position, while recalculating the Q value according to the updated position:
$$Q(s_t, a_t) = Q(s_t, a_t) + \alpha \cdot \left[ R_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \quad (3)$$
where $Q(s_t, a_t)$ is the Q value, $\alpha$ is the learning rate, $\gamma$ is the reward coefficient, $R$ is the return value, and $\max_{a} Q(s_{t+1}, a)$ is the maximum Q value in the next state $s_{t+1}$; the return value R is:
$$R = fitness_{new} - fitness_{old} \quad (4)$$
where $fitness_{new}$ and $fitness_{old}$ are the fitness of the new position and the fitness of the old position, respectively;
(5) The follower selects a search strategy consistent with the leader followed by the follower and sequentially determines the next position;
(6) After each iteration, all leaders are reordered in the group according to the fitness value of the new position, each leader obtains a new ordering state, and each individual selects a search strategy and updates the position according to the Q value in the new state row in the next iteration;
the leaders selectable by the followers in a group are the E leaders with the best fitness values in the group, and after a leader is converted into a scout, an improved scout search strategy is used to obtain a new position;
the candidate search strategies are selected from a constructed search strategy pool, wherein the search strategy pool comprises:
obtaining a new position from the current position of the individual and the position of a neighbor individual randomly selected in the current group; taking the current position of the leader as the search starting point and updating the position under the guidance of the positions of two randomly selected neighbor individuals in the group; taking the position of a randomly selected neighbor individual in the group as the search starting point and obtaining a new position from the positions of two randomly selected neighbor individuals and the position of the best individual in the group; and taking the position of the best individual as the search starting point and obtaining a new position from the positions of two randomly selected neighbor individuals in the group.
8. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of any of claims 1-6.
9. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011284240.0A CN112330043B (en) | 2020-11-17 | 2020-11-17 | Evacuation path planning method and system combining Q-learning and multi-swarm algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112330043A | 2021-02-05 |
CN112330043B | 2022-10-18 |
Family
ID=74320818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011284240.0A Active CN112330043B (en) | 2020-11-17 | 2020-11-17 | Evacuation path planning method and system combining Q-learning and multi-swarm algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112330043B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114781228B (en) * | 2022-05-10 | 2024-05-17 | 杭州中奥科技有限公司 | Global evacuation method, equipment and storage medium based on single evacuation target |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106227958A (en) * | 2016-07-27 | 2016-12-14 | 山东师范大学 | Group's evacuation emulation system and method that artificial bee colony is combined with social force model |
CN107292064A (en) * | 2017-08-09 | 2017-10-24 | 山东师范大学 | A kind of crowd evacuation emulation method and system based on many ant colony algorithms |
CN107403049A (en) * | 2017-07-31 | 2017-11-28 | 山东师范大学 | A kind of Q Learning pedestrians evacuation emulation method and system based on artificial neural network |
CN108388734A (en) * | 2018-02-28 | 2018-08-10 | 山东师范大学 | Crowd evacuation emulation method based on TABU search ant colony algorithm and system |
CN108491598A (en) * | 2018-03-09 | 2018-09-04 | 山东师范大学 | A kind of crowd evacuation emulation method and system based on path planning |
CN110795833A (en) * | 2019-10-15 | 2020-02-14 | 山东师范大学 | Crowd evacuation simulation method, system, medium and equipment based on cat swarm algorithm |
CN111400963A (en) * | 2020-03-04 | 2020-07-10 | 山东师范大学 | Crowd evacuation simulation method and system based on chicken swarm algorithm and social force model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104573812B (en) | A kind of unmanned plane air route determining method of path based on particle firefly colony optimization algorithm | |
US11755882B2 (en) | Method, apparatus and system for recommending location of robot charging station | |
CN110737968B (en) | Crowd trajectory prediction method and system based on deep convolutional long and short memory network | |
CN112461247A (en) | Robot path planning method based on self-adaptive sparrow search algorithm | |
CN112362066A (en) | Path planning method based on improved deep reinforcement learning | |
CN111611749A (en) | RNN-based indoor crowd evacuation automatic guiding simulation method and system | |
CN110375761A (en) | Automatic driving vehicle paths planning method based on enhancing ant colony optimization algorithm | |
CN110514206A (en) | A kind of unmanned plane during flying path prediction technique based on deep learning | |
CN110795833B (en) | Crowd evacuation simulation method, system, medium and equipment based on cat swarm algorithm | |
CN115560774B (en) | Dynamic environment-oriented mobile robot path planning method | |
Li et al. | Deep deterministic policy gradient algorithm for crowd-evacuation path planning | |
CN112231967A (en) | Crowd evacuation simulation method and system based on deep reinforcement learning | |
CN112330043B (en) | Evacuation path planning method and system combining Q-learning and multi-swarm algorithm | |
CN114167865A (en) | Robot path planning method based on confrontation generation network and ant colony algorithm | |
CN113204417A (en) | Multi-satellite multi-point target observation task planning method based on improved genetic and firefly combined algorithm | |
Huang et al. | Reinforcement learning for mobile robot obstacle avoidance under dynamic environments | |
CN105678401A (en) | Global optimization method based on strategy adaptability differential evolution | |
Elaziz et al. | Triangular mutation-based manta-ray foraging optimization and orthogonal learning for global optimization and engineering problems | |
CN115129064A (en) | Path planning method based on fusion of improved firefly algorithm and dynamic window method | |
CN109657800A (en) | Intensified learning model optimization method and device based on parametric noise | |
Lu et al. | Robot navigation in crowds via deep reinforcement learning with modeling of obstacle uni-action | |
CN110244757A (en) | A kind of motion control method being easy to group's evolution | |
Kishikawa et al. | Multi-objective inverse reinforcement learning via non-negative matrix factorization | |
Panda et al. | Autonomous mobile robot path planning using hybridization of particle swarm optimization and Tabu search | |
Duc et al. | An approach for UAV indoor obstacle avoidance based on AI technique with ensemble of ResNet8 and Res-DQN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||