WO2022032442A1 - Method, system, and computer-readable storage medium for cooperative object transport by multiple intelligent agents - Google Patents
Method, system, and computer-readable storage medium for cooperative object transport by multiple intelligent agents
- Publication number
- WO2022032442A1 (PCT/CN2020/108242)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- agent
- intelligent
- target
- cost function
- agents
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 42
- 239000003795 chemical substances by application Substances 0.000 claims abstract description 362
- 239000003016 pheromone Substances 0.000 claims abstract description 12
- 230000006870 function Effects 0.000 claims description 102
- 230000003993 interaction Effects 0.000 claims description 23
- 230000006399 behavior Effects 0.000 claims description 22
- 230000009471 action Effects 0.000 claims description 14
- 238000012546 transfer Methods 0.000 claims description 9
- 238000004590 computer program Methods 0.000 claims description 7
- 230000008901 benefit Effects 0.000 claims description 5
- 230000001186 cumulative effect Effects 0.000 claims description 5
- 238000004364 calculation method Methods 0.000 claims description 4
- 230000010287 polarization Effects 0.000 claims description 4
- 230000008569 process Effects 0.000 description 6
- 238000010586 diagram Methods 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 230000002787 reinforcement Effects 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000003542 behavioural effect Effects 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 238000012549 training Methods 0.000 description 2
- 238000009825 accumulation Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 238000011217 control strategy Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000007599 discharging Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000011156 evaluation Methods 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000004936 stimulating effect Effects 0.000 description 1
Images
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
Definitions
- The present application relates to the field of swarm intelligence, and in particular to a method, system, and computer-readable storage medium for multiple intelligent agents to cooperatively carry objects.
- Swarm intelligence systems are usually highly complex and exhibit extremely diverse group behaviors.
- Existing methods for multi-agent cooperative object transport have certain limitations; relying only on local control strategies cannot achieve effective control of large-scale swarm intelligence systems.
- Embodiments of the present application provide a method, system, and computer-readable storage medium for multiple intelligent agents to cooperatively carry objects, so as to overcome the limitations of existing methods.
- the technical solution is as follows:
- a method for cooperatively carrying objects by multiple intelligent agents includes:
- according to a cost function, invoking at least one corresponding strategy for the target agent from a decision set to control the target agent to perform a desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent group;
- a system for cooperatively carrying objects by multiple intelligent agents includes:
- a determination module for determining a target agent from the multiple agents performing the object-carrying task;
- a policy invocation module for invoking, according to a cost function, at least one corresponding policy for the target agent from the decision set to control the target agent to perform the desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent group;
- a construction module for constructing the topology of the multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents;
- a first update module configured to update the cooperative partners of the target agent under the topology of the multi-agent cooperative operation;
- a second update module for updating the moving speed and position of the target agent and returning to the step of constructing the topology until the multi-agent group completes the object-carrying task.
- a system for multi-agent cooperative object transport, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being loaded and executed by the processor to implement the operations of the above method for cooperative object transport by multiple intelligent agents.
- a computer-readable storage medium stores a computer program loaded and executed by a processor to implement operations performed by the method for cooperatively carrying objects by multiple intelligent agents.
- At least one corresponding strategy is invoked from the decision set for the target agent to control it to perform the desired behavior; under the topology of the multi-agent cooperative operation, the cooperative partners and the moving speed and position of the target agent are updated, and the process returns to the topology-construction step until the multi-agent group completes the object-carrying task.
- In this way, each agent explores influential states and action points more frequently, so the agents can learn complex cooperative strategies to effectively complete complex collaborative tasks.
- FIG. 1 is a flowchart of a method for cooperatively transporting objects by multi-intelligence agents provided by an embodiment of the present application
- FIG. 2 is a schematic structural diagram of a system for cooperatively transporting objects by multiple intelligent agents according to an embodiment of the present application
- FIG. 3 is a schematic functional structural diagram of a system for cooperatively carrying objects by multiple intelligent agents according to another embodiment of the present application.
- As shown in FIG. 1, an embodiment of the present application provides a method for cooperative object transport by multiple intelligent agents.
- the method mainly includes the following steps S101 to S105, which are described in detail as follows:
- Step S101 Determine a target intelligent agent from among the multiple intelligent agents performing the task of carrying the object.
- The multi-agent group includes multiple intelligent agents, for example, multiple automated guided vehicles (AGVs); each agent performs its own sub-task to accomplish the target task, for example, carrying objects in unmanned scenarios such as unmanned supermarkets and smart warehouses.
- The target agent is not inherently different from the other agents in the group; it is simply the agent currently executing the method. In other words, any agent in the group can be the target agent.
- Because the object is relatively large, the carrying task exceeds the capability of a single agent. The task is therefore cooperative, that is, a task that can only be completed by the target agent working together with the other agents, for example through coordinated actions such as "move forward", "move backward", "move left", and "move right".
- Step S102: According to the cost function, invoke at least one corresponding strategy for the target agent from the decision set to control the target agent to perform the desired behavior, where the cost function is related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the group.
- the desired behavior includes an action that enables the intelligent agent to directly or indirectly accomplish a certain target task.
- For example, the agent is located somewhere in a smart warehouse or unmanned supermarket at the current moment.
- The actions the agent can perform include "move forward", "move backward", "move left", "move right", and "turn the warehouse door handle", etc.
- The desired behavior may then be an action such as "turn the warehouse door handle".
- the task of moving objects involved in the embodiments of the present application is based on a reinforcement learning (Reinforcement Learning, RL) task, and the application environment of the object moving task is modeled by Markov Decision Processes (MDP).
- Reinforcement learning lets the agent learn from the environment to maximize reward: if a behavioral strategy leads to a positive reward, the agent's tendency to produce that strategy in the future is strengthened. Therefore, in this embodiment, the method further includes the step of determining the implementation environment in which the carrying task is performed; different environments have different extrinsic incentive functions, which affect the cost function of the target agent.
- the goal of a Markov Decision Process is to find an optimal policy that maximizes the expected reward.
- A value-function learning algorithm is used to obtain the optimal value function, so as to find the corresponding optimal policy, which is no worse than (at least equal to) any other policy.
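The value-function learning described above can be sketched with a standard tabular Q-learning update (a generic illustration only; the patent does not specify the exact learning rule, and the state and action names below are hypothetical):

```python
def q_learning_step(Q, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9):
    """One tabular Q-learning update toward the Bellman target
    r + gamma * max_a' Q(s', a')."""
    old = Q.get((state, action), 0.0)
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Action set taken from the "move forward/backward/left/right" example above.
ACTIONS = ["forward", "backward", "left", "right"]
Q = {}
q_learning_step(Q, "s0", "forward", 1.0, "s1", ACTIONS)
```

The greedy policy with respect to the learned Q is the "optimal strategy, no worse than any other" mentioned above.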
- Invoking at least one corresponding strategy from the decision set for the target agent to control it to perform the desired behavior is achieved through the following steps S1021 to S1023:
- Step S1021 Determine the interaction cost function of the target intelligent agent in the multi-agent agent.
- The interaction cost function is related to an expected difference, namely the difference between the action-value function of the other agents (those other than the target agent) after the state transition and the counterfactual action-value function computed under the condition that the target agent's state and action are ignored.
- For example, the counterfactual calculation may compute the probability of an agent performing an action under the assumption that another agent (e.g., agent 2) does not exist.
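The counterfactual difference described above can be sketched as follows; all numbers and names are illustrative assumptions, not values from the patent:

```python
def counterfactual_diff(q_with, q_without, probs_without):
    """Interaction signal: the other agents' action-value observed with the
    target agent present, minus the expected counterfactual action-value
    computed as if the target agent did not exist."""
    baseline = sum(p * q for p, q in zip(probs_without, q_without))
    return q_with - baseline

# Toy numbers (hypothetical): agent 1's three actions, valued and weighted
# under the counterfactual assumption that agent 2 is absent.
diff = counterfactual_diff(q_with=0.9,
                           q_without=[0.2, 0.5, 0.1],
                           probs_without=[0.2, 0.7, 0.1])
```

A positive difference indicates that the target agent's presence improves the other agents' expected value, i.e., a beneficial interaction.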
- The action-value function of the other agents (those other than the target agent) is the sum of their rewards and their expected cumulative return after the state transition.
- Step S1022 Determine the cost function of the target intelligent agent according to the interaction cost function and the incentive cost function.
- the incentive cost function is related to the external incentive cost function and the intrinsic incentive cost function of the target intelligent agent.
- the incentive cost function of the target intelligent agent is the sum of the external incentive cost function and the intrinsic incentive cost function of the target intelligent agent.
- The external incentive cost function is provided by the environment: the incentive that the current action may obtain from the environment influences whether the agent's tendency to produce that action strategy in the future is strengthened or weakened.
- The intrinsic incentive cost function may be, for example, a curiosity term.
- When curiosity is used as the intrinsic incentive cost function, it prompts agents to explore according to the uncertainty of the environment; this helps avoid getting stuck in local optima on the one hand, and allows cost-effective interaction points to be discovered to a greater extent on the other.
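One common way to realize such a curiosity term is a forward-model prediction error; this is an assumed concrete choice for illustration, since the patent only names curiosity as an example of an intrinsic incentive:

```python
def curiosity_bonus(predicted_next_state, actual_next_state, scale=0.5):
    """Intrinsic incentive as forward-model prediction error: the worse the
    agent predicts the next state, the higher its uncertainty-driven bonus."""
    err = sum((p - a) ** 2
              for p, a in zip(predicted_next_state, actual_next_state))
    return scale * err

def total_incentive(extrinsic, predicted, actual):
    """Per the embodiment, the incentive cost function is the sum of the
    extrinsic (environment-provided) and intrinsic (curiosity) terms."""
    return extrinsic + curiosity_bonus(predicted, actual)

r = total_incentive(extrinsic=1.0, predicted=[0.0, 0.0], actual=[1.0, 1.0])
```

States that are already well predicted yield no bonus, so exploration is pushed toward uncertain regions.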
- Step S1023 Obtain a strategy from the decision set according to the cost function of the target intelligent agent, and control the target intelligent agent to perform desired behavior according to the strategy.
- the totality of strategies that can be adopted is called a decision set. That is to say, in the task of multi-agent moving objects cooperatively, the decision set is a set of strategies that can be selected by each agent.
- Policies can be learned by training on reliable data samples of considerable size in mature tasks, or they can be learned by training a deep neural network.
- the deep neural network includes a continuous parameter space, and each group of parameters in the continuous parameter space corresponds to a strategy, thereby forming a continuous decision set.
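The continuous parameter space forming a continuous decision set can be sketched as a softmax policy, where each parameter matrix `theta` yields one policy; the parameterization below is illustrative, not the patent's actual network:

```python
import math

def softmax_policy(theta, features):
    """Each parameter matrix `theta` (one weight row per action) in the
    continuous parameter space defines one policy: a distribution over
    actions given the current state features."""
    scores = [sum(t * f for t, f in zip(row, features)) for row in theta]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Two actions, two state features; varying theta continuously varies the
# policy, so the parameter space forms a continuous decision set.
probs = softmax_policy(theta=[[1.0, 0.0], [0.0, 1.0]], features=[2.0, 0.0])
```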
- Step S103: According to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents, construct a topology for the multi-agent cooperative operation.
- The directional character of the neighbor distribution around the target agent includes anisotropy and isotropy: anisotropy refers to the property that the agents in the group move in different directions, while isotropy refers to the property that the agents move in roughly the same direction.
- When each agent in the group moves according to its own direction, the overall movement direction is disordered, showing anisotropy.
- When each agent adjusts itself according to its nearest 6 or 7 neighbors, the overall movement direction of the group becomes roughly the same, showing isotropy.
- The pheromones released between the agents determine a topological distance, so the interaction follows a topological-distance relationship rather than a metric-distance framework.
- Therefore, according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents, the topology of the multi-agent cooperative operation can be constructed.
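A minimal sketch of topological neighbor selection is given below; the `1/(1 + pheromone)` weighting that converts metric distance into topological distance is an assumption for illustration, since the patent does not give an explicit formula:

```python
def topological_neighbors(agent_id, positions, pheromone, m=7):
    """Pick the m topologically nearest neighbors, where the effective
    (topological) distance shrinks as the pairwise pheromone level grows."""
    def topo_dist(j):
        dx = positions[agent_id][0] - positions[j][0]
        dy = positions[agent_id][1] - positions[j][1]
        metric = (dx * dx + dy * dy) ** 0.5
        # Assumed weighting: more pheromone => topologically closer.
        return metric / (1.0 + pheromone.get((agent_id, j), 0.0))
    others = [j for j in positions if j != agent_id]
    return sorted(others, key=topo_dist)[:m]

positions = {0: (0, 0), 1: (1, 0), 2: (5, 0), 3: (2, 2)}
# Strong pheromone between agents 0 and 2 pulls agent 2 topologically closer
# than the metrically nearer agent 1.
neighbors = topological_neighbors(0, positions, pheromone={(0, 2): 9.0}, m=2)
```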
- Step S104 Under the topology structure of multi-agent cooperative operation, update the cooperative operation partner of the target agent.
- Updating the cooperative partners of the target agent may be performed as follows: according to the principle that the selection probability p_j is inversely proportional to the distance d_ij, select an agent as pre-selected cooperation partner A_j from the m nearest neighbors within the target agent's field-of-view radius r, and compare the fitness of A_j with a preset fitness threshold f_thre.
- If the fitness of A_j is greater than f_thre, A_j is not taken as a cooperative partner of the target agent; otherwise, A_j is taken as a cooperative partner, where d_ij is the distance between the target agent and A_j, and m is 6 or 7.
- The fitness function f(x_j) of A_j is defined as an evaluation of how the current best position of the j-th agent A_j tends toward the target point.
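The partner-selection rule above can be sketched as follows; the helper names are hypothetical, and note that, per the embodiment, a fitness value above f_thre causes rejection:

```python
import random

def select_partner(target, candidates, distances, fitness, f_thre, rng=None):
    """Draw a pre-selected partner A_j with probability p_j proportional to
    1/d_ij among the m nearest neighbors, then accept it only if its fitness
    does not exceed the threshold f_thre (as stated in the embodiment)."""
    rng = rng or random.Random()
    weights = [1.0 / distances[(target, j)] for j in candidates]
    chosen = rng.choices(candidates, weights=weights, k=1)[0]
    return chosen if fitness[chosen] <= f_thre else None

dists = {(0, 1): 1.0, (0, 2): 4.0}  # agent 1 is 4x more likely to be drawn
accepted = select_partner(0, [1, 2], dists, fitness={1: 0.3, 2: 0.3}, f_thre=0.5)
rejected = select_partner(0, [1, 2], dists, fitness={1: 0.9, 2: 0.9}, f_thre=0.5)
```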
- Step S105: Update the moving speed and position of the target agent, and return to the step of constructing the topology of the multi-agent cooperative operation, until the multi-agent group completes the object-carrying task.
- Updating the moving speed and position of the target agent may be performed by introducing a polarization factor phi = (1/N) * || sum_{i=1..N} v_i / ||v_i|| || to control the multi-agent group, where N is the number of agents, v_i is the velocity of the i-th agent, and ||v_i|| is the norm of v_i in its metric space.
- The polarization factor phi measures the overall degree of order of the multi-agent group, reflecting the consistency of the overall movement direction: when phi ≈ 0, the overall movement direction of the group is disordered, and when phi ≈ 1, the group as a whole moves in the same direction.
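The polarization factor can be computed directly from the agents' velocities, as in this sketch (2-D velocities assumed):

```python
def polarization(velocities):
    """phi = (1/N) * || sum_i v_i / ||v_i|| ||: about 0 when headings are
    disordered, about 1 when all agents move in the same direction."""
    n = len(velocities)
    sx = sy = 0.0
    for vx, vy in velocities:
        norm = (vx * vx + vy * vy) ** 0.5
        sx += vx / norm  # accumulate unit heading vectors
        sy += vy / norm
    return ((sx * sx + sy * sy) ** 0.5) / n

aligned = polarization([(1, 0), (2, 0), (0.5, 0)])  # all heading along +x
opposed = polarization([(1, 0), (-1, 0)])           # headings cancel out
```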
- After the moving speed and position of the target agent have been updated, return to step S103, i.e., repeat steps S103 to S105 until the multi-agent group completes the object-carrying task.
- At least one corresponding strategy is invoked from the decision set for the target agent to control it to perform the desired behavior; under the topology of the multi-agent cooperative operation, the cooperative partners and the moving speed and position of the target agent are updated, and the process returns to the topology-construction step until the multi-agent group completes the object-carrying task.
- This enables each agent to explore influential states and action points more frequently, so agents can learn complex cooperative strategies to effectively complete complex collaborative tasks.
- FIG. 2 is a schematic structural diagram of a system for cooperatively handling objects by multiple agents provided by an embodiment of the present application.
- The system may include a determination module 201, a policy invocation module 202, a construction module 203, a first update module 204, and a second update module 205, wherein:
- a determination module 201 configured to determine a target intelligent subject from the multiple intelligent subjects performing the task of carrying the object;
- the policy invocation module 202 is used to invoke, according to the cost function, at least one corresponding policy for the target agent from the decision set to control the target agent to perform the desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the group;
- the construction module 203 is used for constructing the topology of the multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents;
- the first update module 204 is configured to update the cooperative operation partner of the target intelligent agent under the topology structure of multi-agent cooperative operation;
- the second update module 205 is used to update the moving speed and position of the target agent and return to the step of constructing the topology until the multi-agent group completes the object-carrying task.
- the policy invocation module 202 may include a first determination unit, a second determination unit and a control unit, wherein:
- a first determining unit configured to determine the interaction cost function of the target intelligent agent in the multi-agent
- a second determining unit configured to determine the cost function of the target intelligent agent according to the interaction cost function and the incentive cost function
- the control unit is used to obtain a strategy from the decision set according to the cost function of the target intelligent agent, and control the target intelligent agent to execute the desired behavior according to the strategy.
- The first update module 204 may include a selection unit and a third determination unit, wherein:
- the selection unit is used to select an agent A_j as a pre-selected cooperation partner of the target agent from the m nearest neighbors within the target agent's field-of-view radius r, according to the principle that the selection probability p_j is inversely proportional to the distance d_ij, where d_ij is the distance between the target agent and A_j, and m is 6 or 7;
- the third determination unit is configured to compare the fitness of the pre-selected partner A_j with the preset fitness threshold f_thre: if the fitness of A_j is greater than f_thre, A_j is not taken as a cooperative partner of the target agent; otherwise, A_j is taken as a cooperative partner.
- The second update module 205 may include a velocity-position update unit for controlling the multi-agent group through the introduced polarization factor phi = (1/N) * || sum_{i=1..N} v_i / ||v_i|| || to update the moving speed and position of the target agent, where v_i is the velocity of the i-th agent and ||v_i|| is the norm of v_i in its metric space.
- The interaction cost function is related to an expected difference, namely the difference between the action-value function of the other agents after the state transition and the counterfactual action-value function computed while the target agent's state and action are ignored.
- The action-value function of the other agents is related to the sum of their rewards and their expected cumulative return after the transition.
- The counterfactually computed action-value function is related to the sum of the counterfactual rewards of the other agents and their counterfactual expected cumulative return after the transition.
- The embodiment of the present application also provides a system for multi-agent cooperative object transport; a schematic diagram of its structure is shown in FIG. 3. Specifically:
- The system may include a processor 301 with one or more processing cores, a memory 302 with one or more computer-readable storage media, a power supply 303, an input unit 304, and other components.
- The processor 301 is the control center of the system, connecting its various parts through various interfaces and lines; by running or executing the software programs and/or modules stored in the memory 302 and calling the data stored in the memory 302, it executes the various functions of the system and processes data, thereby monitoring the system as a whole.
- The processor 301 may include one or more processing cores; preferably, the processor 301 may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 301.
- the memory 302 can be used to store software programs and modules, and the processor 301 executes various functional applications and data processing by running the software programs and modules stored in the memory 302 .
- The memory 302 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function); the data storage area may store data created by the use of the system.
- The memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 302 may also include a memory controller to provide the processor 301 with access to the memory 302.
- the system for cooperatively transporting objects by multiple intelligent agents further includes a power supply 303 for supplying power to each component.
- The power supply 303 can be logically connected to the processor 301 through a power management system, so that functions such as charging, discharging, and power-consumption management are handled through the power management system.
- the power source 303 may also include one or more DC or AC power sources, recharging systems, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
- The system may further include an input unit 304, which can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
- the system for cooperatively carrying objects by multi-intelligent agents may also include a display unit, etc., which will not be repeated here.
- In this embodiment, the processor 301 in the system loads the executable files corresponding to the processes of one or more application programs into the memory 302 according to the following instructions, and runs the application programs stored in the memory 302 to realize various functions, as follows: determine a target agent from the multiple agents performing the object-carrying task; according to the cost function, invoke at least one corresponding strategy for the target agent from the decision set to control the target agent to perform the desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents; according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents, construct the topology of the multi-agent cooperative operation; under that topology, update the cooperative partners of the target agent; and update the moving speed and position of the target agent, returning to the topology-construction step until the multi-agent group completes the object-carrying task.
- Embodiments of the present application provide a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps of any of the methods for multi-agent cooperative object transport provided in the embodiments of the present application.
- The instructions may perform the following steps: determine a target agent from the multiple agents performing the object-carrying task; according to the cost function, invoke at least one corresponding strategy for the target agent from the decision set to control the target agent to perform the desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents; according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents, construct the topology of the multi-agent cooperative operation; under that topology, update the cooperative partners of the target agent; and update the moving speed and position of the target agent, returning to the topology-construction step until the multi-agent group completes the object-carrying task.
- the computer-readable storage medium may include: a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk, and the like.
- any of the methods for cooperatively transporting objects by multiple intelligent agents provided by the embodiments of the present application can thereby be implemented.
- the beneficial effects achievable by the method for cooperatively transporting objects by multiple intelligent agents are detailed in the previous embodiments and are not repeated here.
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A method, system and computer-readable storage medium for cooperatively transporting objects by multiple intelligent agents. The method comprises: determining a target agent from among the multiple intelligent agents performing the object-transport task (S101); invoking, according to a cost function, at least one corresponding policy from a decision set for the target agent to control the target agent to perform a desired behavior (S102); constructing a topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents (S103); updating, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent (S104); and updating the moving speed and position of the target agent until the multiple agents complete the object-transport task (S105). The present technical solution enables the agents to learn complex cooperation policies, so as to effectively accomplish cooperative operation on complex tasks.
Description
The present application relates to the field of swarm intelligence, and in particular to a method, system and computer-readable storage medium for cooperatively transporting objects by multiple intelligent agents.
In swarm intelligence, the individual capability of an intelligent agent (e.g., a sensor, robot or aircraft) is limited, yet the group can exhibit efficient cooperation and a high level of intelligent coordination. With the continuous development of computer networks, communications and distributed computing, many practical application systems have become very large and complex; how to maximize the effect of teamwork among intelligent agents has therefore long been an important topic and key issue in swarm intelligence. For example, in unmanned scenarios (e.g., unmanned supermarkets and smart warehouses), multiple agents must often cooperate to carry an object that exceeds the capability of a single agent.
However, swarm-intelligence systems are usually highly complex and their group behaviors extremely diverse. Existing methods for cooperatively transporting objects by multiple intelligent agents have certain limitations: local control policies alone cannot effectively control a large-scale swarm-intelligence system.
Summary of the Invention
The embodiments of the present application provide a method, system and computer-readable storage medium for cooperatively transporting objects by multiple intelligent agents, so as to overcome the limitations of existing methods. The technical solution is as follows:
In one aspect, a method for cooperatively transporting objects by multiple intelligent agents is provided, the method comprising:
determining a target agent from among the multiple intelligent agents performing the object-transport task;
invoking, according to a cost function, at least one corresponding policy from a decision set for the target agent to control the target agent to perform a desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system;
constructing a topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents;
updating, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent;
updating the moving speed and position of the target agent, and returning to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
In one aspect, a system for cooperatively transporting objects by multiple intelligent agents is provided, the system comprising:
a determination module, configured to determine a target agent from among the multiple intelligent agents performing the object-transport task;
a policy-invoking module, configured to invoke, according to a cost function, at least one corresponding policy from a decision set for the target agent to control the target agent to perform a desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system;
a construction module, configured to construct a topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents;
a first updating module, configured to update, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent;
a second updating module, configured to update the moving speed and position of the target agent and return to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
In one aspect, a system for cooperatively transporting objects by multiple intelligent agents is provided, the system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program code being loaded and executed by the one or more processors to implement the operations performed by the method for cooperatively transporting objects by multiple intelligent agents.
In one aspect, a computer-readable storage medium is provided, storing a computer program that is loaded and executed by a processor to implement the operations performed by the method for cooperatively transporting objects by multiple intelligent agents.
As can be seen from the above technical solution, at least one corresponding policy is invoked from the decision set for the target agent according to the cost function to control the target agent to perform the desired behavior; under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent and the agent's moving speed and position are updated; the process then returns to the step of constructing the topology until the multiple agents complete the object-transport task. Because the interaction cost function is set as a kind of intrinsic incentive cost function, each agent explores influential states and actions more frequently; by incentivizing interaction among agents, cooperation emerges among them, so that the agents can learn complex cooperation policies and thereby effectively accomplish and complete cooperative operation on complex tasks.
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of the method for cooperatively transporting objects by multiple intelligent agents provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of the system for cooperatively transporting objects by multiple intelligent agents provided by an embodiment of the present application;
Fig. 3 is a schematic functional-structure diagram of the system for cooperatively transporting objects by multiple intelligent agents provided by another embodiment of the present application.
To make the objectives, technical solutions and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the drawings.
Referring to Fig. 1, an embodiment of the present application provides a method for cooperatively transporting objects by multiple intelligent agents, which mainly comprises the following steps S101 to S105, described in detail as follows:
Step S101: determine a target agent from among the multiple intelligent agents performing the object-transport task.
In this embodiment, the multi-agent system comprises a plurality of intelligent agents, for example a plurality of Automated Guided Vehicles (AGVs); each agent executes its own subtask in service of a target task, for example transporting objects in unmanned scenarios such as unmanned supermarkets and smart warehouses. It should be noted that the target agent is not meant to be distinguished from the other agents in the multi-agent system; it merely designates the agent executing the present action of the method. In other words, any agent in the multi-agent system can serve as the target agent.
In this embodiment, because the object is relatively large and exceeds the capability of a single agent, the object-transport task is a cooperative task, i.e., a task that can be completed only when the target agent works together with the other agents in the multi-agent system. For example, it may require the target agent and the other agents to coordinate actions such as "move forward", "move backward", "move left" and "move right".
Step S102: invoke, according to a cost function, at least one corresponding policy from the decision set for the target agent to control the target agent to perform a desired behavior, wherein the cost function is related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system.
In this embodiment, a desired behavior includes an action that enables the agent to complete a target task directly or indirectly. For example, suppose the agent is currently located somewhere in a smart warehouse or an unmanned supermarket. When the target task is to open the door of the warehouse or supermarket and carry an object toward a certain exit, the actions available to the agent include "move forward", "move backward", "move left", "move right" and "turn the door handle"; the desired behavior may then be, for instance, the action "turn the door handle".
The object-transport task in the embodiments of the present application is a reinforcement learning (RL) task, whose application environment is modeled as a Markov Decision Process (MDP). In reinforcement learning, an agent learns from the environment so as to maximize reward; if a behavior policy of the agent yields a positive reward from the environment, the agent's tendency to produce that policy is strengthened. Accordingly, the method further includes a step of determining the environment in which the object-transport task is executed: different environments have different extrinsic incentive functions, which in turn affect the cost function of the target agent. The goal of the MDP is to find an optimal policy that maximizes the expected reward. The cost-function learning algorithm learns the optimal value function and thereby finds the corresponding optimal policy, which is better than (or at least equal to) any other policy.
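As a hedged illustration of the value-function learning described above (not the patent's own algorithm), tabular Q-learning on a toy one-dimensional MDP can be sketched as follows; the corridor environment, learning rate and reward values are assumptions for demonstration only.

```python
import random

def q_learning(n_states=6, goal=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    # Actions: 0 = move left, 1 = move right, on a corridor of states 0..n_states-1.
    q = [[0.0, 0.0] for _ in range(n_states)]
    rng = random.Random(0)
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection over the learned action values.
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: q[s][x])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == goal else -0.01  # a positive reward strengthens the policy
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning()
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(6)]
print(policy)
```

In this toy setting the learned greedy policy moves right from every non-goal state, illustrating how the learned value function yields a policy at least as good as any other.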
As an embodiment of the present application, invoking at least one corresponding policy from the decision set for the target agent according to the cost function, so as to control the target agent to perform the desired behavior, is realized through the following steps S1021 to S1023:
Step S1021: determine the interaction cost function of the target agent within the multi-agent system.
In this embodiment, the interaction cost function is related to an expected difference, namely the expected difference between the action-cost functions, after the transition, of the agents in the multi-agent system other than the target agent, and the action-cost functions obtained by counterfactual calculation under the condition that the state and action of the agent are ignored. Counterfactual calculation is a form of probabilistic inference used to estimate the value of Y under X = x2 when the actual situation is X = x1. For example, in a multi-agent system containing agent 1 and agent 2, the counterfactual calculation may compute the probability that an agent performs a certain action under the assumption that agent 2 does not exist.
In this embodiment, the action-cost function of an agent other than the target agent is related to that agent's reward and to the sum of its expected cumulative return after the transition; for example, it may be the sum of the agent's reward and the agent's expected cumulative return after the transition.
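A minimal numerical sketch of this counterfactual expected difference follows; the two-agent setup, action names and all probability and value numbers are illustrative assumptions, not values from the patent.

```python
# Interaction cost as an expected difference: the other agent's action-cost (Q)
# given the target agent's actual action, minus a counterfactual baseline in
# which the target agent's action is marginalized out (as if it were ignored).

# Q-values of agent j after the transition, indexed by (target's action, j's action).
q_j = {
    ("push", "push"): 2.0,  # both agents cooperating on the object
    ("push", "idle"): 0.5,
    ("idle", "push"): 0.3,
    ("idle", "idle"): 0.0,
}
# Agent j's policy after the transition (probability of each of its actions).
pi_j = {"push": 0.7, "idle": 0.3}
# The target agent's own policy, used to marginalize its action counterfactually.
pi_target = {"push": 0.6, "idle": 0.4}

def expected_q(target_action):
    # Expectation of agent j's action-cost under its own policy.
    return sum(pi_j[a] * q_j[(target_action, a)] for a in pi_j)

actual = expected_q("push")  # the target agent actually pushed
counterfactual = sum(pi_target[t] * expected_q(t) for t in pi_target)
interaction_cost = actual - counterfactual
print(round(actual, 3), round(counterfactual, 3), round(interaction_cost, 3))
```

A positive difference indicates that the target agent's actual action was more influential on its partner than an average (counterfactual) action, which is exactly the kind of state-action point the intrinsic incentive encourages the agent to explore.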
Step S1022: determine the cost function of the target agent according to the interaction cost function and the incentive cost function.
In this embodiment, the incentive cost function is related to the extrinsic incentive cost function and the intrinsic incentive cost function of the target agent; specifically, the incentive cost function of the target agent is the sum of the two. The extrinsic incentive cost function is the incentive cost function provided by the environment: the environmental incentive obtainable for the current action determines whether the agent's tendency to produce that action policy is strengthened or weakened. The intrinsic incentive cost function may be, for example, curiosity. As an intrinsic incentive cost function, curiosity drives the agent to explore according to its uncertainty about the environment, which on the one hand avoids getting stuck in a local optimum and on the other hand discovers cost-bearing interaction points to a greater extent.
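Putting steps S1021 and S1022 together, the target agent's cost function can be sketched as the extrinsic incentive plus an intrinsic (curiosity-style) incentive plus the interaction cost. The curiosity proxy below (squared prediction error of a forward model) and all numeric inputs are assumptions for illustration, not the patent's definitions.

```python
def incentive_cost(extrinsic, intrinsic):
    # Incentive cost function = extrinsic incentive + intrinsic incentive (e.g. curiosity).
    return extrinsic + intrinsic

def curiosity(predicted_next_state, actual_next_state):
    # A common curiosity proxy: squared prediction error of a forward model,
    # large when the agent is uncertain about the environment.
    return sum((p - a) ** 2 for p, a in zip(predicted_next_state, actual_next_state))

def total_cost(extrinsic, predicted, actual, interaction):
    # Cost function related to the incentive cost function and the interaction cost function.
    return incentive_cost(extrinsic, curiosity(predicted, actual)) + interaction

c = total_cost(extrinsic=1.0, predicted=[0.0, 0.5], actual=[0.2, 0.0], interaction=0.536)
print(round(c, 3))
```

The policy finally chosen in step S1023 would then be the element of the decision set that optimizes this combined quantity.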
Step S1023: obtain a policy from the decision set according to the cost function of the target agent, and control the target agent to perform the desired behavior according to that policy.
The set of all policies that may be adopted in executing a target task, such as object transport, is called the decision set. That is, when the multiple agents cooperatively transport an object, the decision set is the set of policies available for each agent to choose from. A policy may be learned by training on a sufficiently large set of reliable data samples from mature tasks, or learned by training a deep neural network. The deep neural network contains a continuous parameter space in which each group of parameters corresponds to one policy, thereby forming a continuous decision set.
Step S103: construct the topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents.
In this embodiment, the neighbor distribution around the target agent may be anisotropic or isotropic: anisotropy refers to the property that the agents in the multi-agent system move in different directions, while isotropy refers to the property that they move in roughly the same direction. Before cooperation begins, each individual agent moves in its own direction; viewed as a whole, the motion is disordered, exhibiting anisotropy. After some time, each agent adjusts itself according to its six or seven nearest neighbors, and eventually the overall motion of the multi-agent system becomes roughly aligned, exhibiting isotropy. In addition, the pheromones released between individual agents determine the topological distance, embodying a topology-distance relationship rather than a metric-distance framework. The topology for multi-agent cooperative operation can therefore be constructed according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents.
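A hedged sketch of topology-distance neighbor selection: each agent links to its m nearest neighbors (m = 6 or 7, as the description states), so the interaction graph depends on neighbor rank rather than a fixed metric radius. Using pheromone intensity as a weight that shortens the effective distance is an assumption for illustration.

```python
import math

def build_topology(positions, pheromone, m=7):
    # Topological (k-nearest) interaction graph: each agent links to its m nearest
    # neighbors, with released pheromone shrinking the effective distance.
    topo = {}
    for i, pi in enumerate(positions):
        ranked = sorted(
            (j for j in range(len(positions)) if j != i),
            key=lambda j: math.dist(pi, positions[j]) / max(pheromone[j], 1e-9),
        )
        topo[i] = ranked[:m]
    return topo

positions = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
pheromone = [1.0, 2.0, 1.0, 1.0, 0.5]
print(build_topology(positions, pheromone, m=2))
```

Because the links are rank-based, distant agents still acquire partners (agents 3 and 4 above link to each other and across the gap), which is the point of a topology-distance rather than metric-distance framework.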
Step S104: update, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent.
As an embodiment of the present application, updating the cooperative-operation partners of the target agent under the topology for multi-agent cooperative operation may be performed as follows: according to the principle that the selection probability p_j is inversely proportional to the distance d_ij, select an agent from the m nearest neighbors within the visual radius r of the target agent as a pre-partner A_j of the target agent; compare the fitness of the pre-partner A_j with a preset fitness-function threshold f_thre; if the fitness of the pre-partner A_j is greater than f_thre, do not take the pre-partner A_j as a cooperative-operation partner of the target agent; otherwise, take the pre-partner A_j as a cooperative-operation partner of the target agent, where d_ij is the distance between the target agent and the pre-partner A_j, and m is 6 or 7.
It should be noted that the fitness function f(x_j) of A_j is defined as an evaluation of the current best position of the j-th agent A_j as it tends toward the target point.
Step S105: update the moving speed and position of the target agent, and return to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
Specifically, as an embodiment of the present application, updating the moving speed and position of the target agent may be performed by introducing a polarization factor to control the multi-agent population, where v_i is the velocity of the i-th agent in the multi-agent system and ||v_i|| is the norm of v_i in its metric space. In this embodiment, the polarization factor measures the overall orderliness of the multi-agent system, reflecting the degree of alignment of its overall direction of motion: a value near its minimum indicates that the overall motion of the multi-agent system is disordered, while a value near its maximum indicates that the multi-agent system is essentially moving in the same direction.
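The patent's exact formula for the polarization factor is not reproduced in this text. A standard swarm-alignment order parameter consistent with the description (the norm of the mean of the unit velocity vectors, 0 for fully disordered headings and 1 for fully aligned motion) is assumed below.

```python
import math

def polarization(velocities):
    # phi = || (1/N) * sum_i v_i / ||v_i|| ||: 0 when headings are disordered,
    # 1 when every agent moves in exactly the same direction.
    n = len(velocities)
    sx = sum(vx / math.hypot(vx, vy) for vx, vy in velocities)
    sy = sum(vy / math.hypot(vx, vy) for vx, vy in velocities)
    return math.hypot(sx, sy) / n

aligned = [(1.0, 0.0), (2.0, 0.0), (0.5, 0.0)]          # same heading, different speeds
disordered = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(polarization(aligned), polarization(disordered))
```

Note that only headings matter: the aligned group scores 1 even though the speeds differ, matching the text's use of the normalized velocity v_i / ||v_i||.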
After the update of the moving speed and position of the target agent is completed, the process returns to step S103, i.e., steps S103 to S105 are repeated until the multiple agents complete the object-transport task.
From the technical solution illustrated in Fig. 1 above, it can be seen that at least one corresponding policy is invoked from the decision set for the target agent according to the cost function to control the target agent to perform the desired behavior; under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent and the agent's moving speed and position are updated; the process then returns to the step of constructing the topology until the multiple agents complete the object-transport task. Because the interaction cost function is set as a kind of intrinsic incentive cost function, each agent explores influential states and actions more frequently; by incentivizing interaction among agents, cooperation emerges among them, so that the agents can learn complex cooperation policies and thereby effectively accomplish and complete cooperative operation on complex tasks.
Referring to Fig. 2, a schematic structural diagram of a system for cooperatively transporting objects by multiple intelligent agents provided by an embodiment of the present application, the system may comprise a determination module 201, a policy-invoking module 202, a construction module 203, a first updating module 204 and a second updating module 205, wherein:
the determination module 201 is configured to determine a target agent from among the multiple intelligent agents performing the object-transport task;
the policy-invoking module 202 is configured to invoke, according to a cost function, at least one corresponding policy from the decision set for the target agent to control the target agent to perform a desired behavior, wherein the cost function is related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system;
the construction module 203 is configured to construct the topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents;
the first updating module 204 is configured to update, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent;
the second updating module 205 is configured to update the moving speed and position of the target agent and return to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
In a possible implementation, the policy-invoking module 202 may comprise a first determination unit, a second determination unit and a control unit, wherein:
the first determination unit is configured to determine the interaction cost function of the target agent within the multi-agent system;
the second determination unit is configured to determine the cost function of the target agent according to the interaction cost function and the incentive cost function;
the control unit is configured to obtain a policy from the decision set according to the cost function of the target agent, and control the target agent to perform the desired behavior according to the policy.
In a possible implementation, the first updating module 204 may comprise a selection unit and a third determination unit, wherein:
the selection unit is configured to select, according to the principle that the selection probability p_j is inversely proportional to the distance d_ij, an agent from the m nearest neighbors within the visual radius r of the target agent as a pre-partner A_j of the target agent, where d_ij is the distance between the target agent and the pre-partner A_j, and m is 6 or 7;
the third determination unit is configured to compare the fitness of the pre-partner A_j with the preset fitness-function threshold f_thre, and, if the fitness of the pre-partner A_j is greater than f_thre, not take the pre-partner A_j as a cooperative-operation partner of the target agent; otherwise, take the pre-partner A_j as a cooperative-operation partner of the target agent.
In a possible implementation, the second updating module 205 may comprise a speed-and-position updating unit configured to control the multi-agent population through the introduced polarization factor so as to update the moving speed and position of the target agent, where v_i is the velocity of the i-th agent in the multi-agent system and ||v_i|| is the norm of v_i in its metric space.
In a possible implementation, the interaction cost function is related to an expected difference, the expected difference being between the action-cost functions, after the transition, of the agents in the multi-agent system other than the target agent, and the action-cost functions obtained by counterfactual calculation under the condition that the state and action of the agent are ignored.
In a possible implementation, the action-cost function of the agents other than the target agent is related to the rewards of the agents other than the target agent and to the sum of those agents' expected cumulative returns after the transition.
In a possible implementation, the action-cost function obtained by counterfactual calculation is related to the counterfactual rewards of the agents other than the target agent and to the sum of those agents' counterfactual expected cumulative returns after the transition.
It should be noted that, when the system provided by the above embodiment cooperatively transports objects by multiple intelligent agents, the division into the above functional modules is only used as an example; in practical applications, the above functions may be assigned to different functional modules as needed, i.e., the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above. In addition, the system embodiment and the method embodiment provided above belong to the same concept; for the specific implementation process and technical effects, refer to the method embodiment, which is not repeated here.
An embodiment of the present application further provides a system for cooperatively transporting objects by multiple intelligent agents, as shown in Fig. 3, which is a schematic structural diagram of the system involved in the embodiments of the present application. Specifically:
The system may comprise a processor 301 with one or more processing cores, a memory 302 with one or more computer-readable storage media, a power supply 303, an input unit 304 and other components. Those skilled in the art will understand that the system structure shown in Fig. 3 does not constitute a limitation on the system; it may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
The processor 301 is the control center of the system, connecting the various parts of the entire system via various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 302 and invoking the data stored in the memory 302, it executes the various functions of the system and processes data, thereby monitoring the system as a whole. Optionally, the processor 301 may include one or more processing cores; preferably, the processor 301 may integrate an application processor, which mainly handles the operating system, the user interface, application programs and the like, and a modem processor, which mainly handles wireless communication. It will be understood that the modem processor may alternatively not be integrated into the processor 301.
The memory 302 may be used to store software programs and modules; the processor 301 executes various functional applications and data processing by running the software programs and modules stored in the memory 302. The memory 302 may mainly include a program storage area, which may store the operating system and the application programs required by at least one function (such as a sound-playing function or an image-playing function), and a data storage area, which may store data created according to the use of the system. In addition, the memory 302 may include a high-speed random-access memory and may also include a non-volatile memory, such as at least one magnetic-disk storage device, a flash-memory device, or another solid-state storage device. Accordingly, the memory 302 may further include a memory controller to provide the processor 301 with access to the memory 302.
The system further includes a power supply 303 that supplies power to the various components. Optionally, the power supply 303 may be logically connected to the processor 301 through a power-management system, thereby implementing functions such as charge management, discharge management and power-consumption management through the power-management system. The power supply 303 may further include one or more DC or AC power sources, a recharging system, a power-failure detection circuit, a power converter or inverter, a power status indicator, and any other such components.
The system may further include an input unit 304, which may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the system may further include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 301 in the system loads the executable files corresponding to the processes of one or more application programs into the memory 302 according to the following instructions, and the processor 301 runs the application programs stored in the memory 302 so as to realize various functions, as follows: determining a target agent from among the multiple intelligent agents performing the object-transport task; invoking, according to a cost function, at least one corresponding policy from the decision set for the target agent to control the target agent to perform a desired behavior, wherein the cost function is related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system; constructing the topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents; updating, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent; and updating the moving speed and position of the target agent, returning to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
The specific implementation of each of the above operations can be found in the foregoing embodiments and is not repeated here.
From the above it can be seen that at least one corresponding policy is invoked from the decision set for the target agent according to the cost function to control the target agent to perform the desired behavior; under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent and the agent's moving speed and position are updated; the process then returns to the step of constructing the topology until the multiple agents complete the object-transport task. Because the interaction cost function is set as a kind of intrinsic incentive cost function, each agent explores influential states and actions more frequently; by incentivizing interaction among agents, cooperation emerges among them, so that the agents can learn complex cooperation policies and thereby effectively accomplish and complete cooperative operation on complex tasks.
Those of ordinary skill in the art will understand that all or part of the steps of the various methods in the above embodiments may be completed by instructions, or by instructions controlling related hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps of any of the methods for cooperatively transporting objects by multiple intelligent agents provided in the embodiments of the present application. For example, the instructions may execute the following steps: determining a target agent from among the multiple intelligent agents performing the object-transport task; invoking, according to a cost function, at least one corresponding policy from the decision set for the target agent to control the target agent to perform a desired behavior, wherein the cost function is related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system; constructing the topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents; updating, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent; and updating the moving speed and position of the target agent, returning to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
The specific implementation of each of the above operations can be found in the foregoing embodiments and is not repeated here.
The computer-readable storage medium may include a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps of any of the methods for cooperatively transporting objects by multiple intelligent agents provided in the embodiments of the present application, they can realize the beneficial effects achievable by any of those methods; see the foregoing embodiments for details, which are not repeated here.
The method, device and computer-readable storage medium for cooperatively transporting objects by multiple intelligent agents provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.
Claims (10)
- A method for cooperatively transporting objects by multiple intelligent agents, characterized in that the method comprises: determining a target agent from among the multiple intelligent agents performing the object-transport task; invoking, according to a cost function, at least one corresponding policy from a decision set for the target agent to control the target agent to perform a desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system; constructing a topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents; updating, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent; and updating the moving speed and position of the target agent, and returning to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
- The method for cooperatively transporting objects by multiple intelligent agents according to claim 1, characterized in that invoking, according to the cost function, at least one corresponding policy from the decision set for the target agent to control the target agent to perform the desired behavior comprises: determining the interaction cost function of the target agent within the multi-agent system; determining the cost function of the target agent according to the interaction cost function and the incentive cost function; and obtaining a policy from the decision set according to the cost function of the target agent, and controlling the target agent to perform the desired behavior according to the policy.
- The method for cooperatively transporting objects by multiple intelligent agents according to claim 1, characterized in that updating, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent comprises: selecting, according to the principle that the selection probability p_j is inversely proportional to the distance d_ij, an agent from the m nearest neighbors within the visual radius r of the target agent as a pre-partner A_j of the target agent, where d_ij is the distance between the target agent and the pre-partner A_j, and m is 6 or 7; and comparing the fitness of the pre-partner A_j with a preset fitness-function threshold f_thre: if the fitness of the pre-partner A_j is greater than f_thre, not taking the pre-partner A_j as a cooperative-operation partner of the target agent; otherwise, taking the pre-partner A_j as a cooperative-operation partner of the target agent.
- The method for cooperatively transporting objects by multiple intelligent agents according to claim 1, characterized in that the interaction cost function is related to an expected difference, the expected difference being between the action-cost functions, after the transition, of the agents in the multi-agent system other than the target agent, and the action-cost functions obtained by counterfactual calculation under the condition that the state and action of the agent are ignored.
- The method for cooperatively transporting objects by multiple intelligent agents according to claim 5, characterized in that the action-cost function of the agents in the multi-agent system other than the target agent is related to the rewards of the agents other than the target agent and to the sum of those agents' expected cumulative returns after the transition.
- The method for cooperatively transporting objects by multiple intelligent agents according to claim 5, characterized in that the action-cost function obtained by counterfactual calculation is related to the counterfactual rewards of the agents other than the target agent and to the sum of those agents' counterfactual expected cumulative returns after the transition.
- A system for cooperatively transporting objects by multiple intelligent agents, characterized in that the system comprises: a determination module, configured to determine a target agent from among the multiple intelligent agents performing the object-transport task; a policy-invoking module, configured to invoke, according to a cost function, at least one corresponding policy from a decision set for the target agent to control the target agent to perform a desired behavior, the cost function being related to the incentive cost function of the target agent and to the interaction cost functions, relative to the target agent, of the other agents in the multi-agent system; a construction module, configured to construct a topology for multi-agent cooperative operation according to the anisotropy or isotropy of the neighbor distribution around the target agent and the pheromones released between the agents; a first updating module, configured to update, under the topology for multi-agent cooperative operation, the cooperative-operation partners of the target agent; and a second updating module, configured to update the moving speed and position of the target agent and return to the step of constructing the topology for multi-agent cooperative operation, until the multiple agents complete the object-transport task.
- A system for cooperatively transporting objects by multiple intelligent agents, the system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program code being loaded and executed by the one or more processors to implement the steps of the method according to any one of claims 1 to 6.
- A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/108242 WO2022032442A1 (zh) | 2020-08-10 | 2020-08-10 | 多智能主体协同搬运物件的方法、系统和计算机可读存储介质 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/108242 WO2022032442A1 (zh) | 2020-08-10 | 2020-08-10 | 多智能主体协同搬运物件的方法、系统和计算机可读存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022032442A1 true WO2022032442A1 (zh) | 2022-02-17 |
Family
ID=80247482
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/108242 WO2022032442A1 (zh) | 2020-08-10 | 2020-08-10 | 多智能主体协同搬运物件的方法、系统和计算机可读存储介质 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022032442A1 (zh) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719931A (zh) * | 2009-11-27 | 2010-06-02 | 南京邮电大学 | 一种基于多智能主体的层次式云端计算模型构建方法 |
WO2013166096A1 (en) * | 2012-05-01 | 2013-11-07 | 5D Robotics, Inc. | Distributed positioning and collaborative behavior determination |
KR20150075639A (ko) * | 2013-12-26 | 2015-07-06 | 주식회사 라스테크 | 협동로봇 제어시스템 |
CN106272411A (zh) * | 2016-08-24 | 2017-01-04 | 上海交通大学 | 基于引力源的多机器人协同搬运船舱货物方法 |
CN108829140A (zh) * | 2018-09-11 | 2018-11-16 | 河南大学 | 一种基于多群体蚁群算法的多无人机协同目标搜索方法 |
CN110162103A (zh) * | 2019-06-13 | 2019-08-23 | 河南宙合网络科技有限公司 | 一种无人机与智能车组自主协同运输系统及方法 |
CN110245809A (zh) * | 2019-06-26 | 2019-09-17 | 北京洛必德科技有限公司 | 一种用于多机器人多任务协作工作的智能优化方法和系统 |
CN110347159A (zh) * | 2019-07-12 | 2019-10-18 | 苏州融萃特种机器人有限公司 | 移动机器人多机协作方法和系统 |
- 2020
  - 2020-08-10 WO PCT/CN2020/108242 patent/WO2022032442A1 active Application Filing
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719931A (zh) * | 2009-11-27 | 2010-06-02 | 南京邮电大学 | 一种基于多智能主体的层次式云端计算模型构建方法 |
WO2013166096A1 (en) * | 2012-05-01 | 2013-11-07 | 5D Robotics, Inc. | Distributed positioning and collaborative behavior determination |
KR20150075639A (ko) * | 2013-12-26 | 2015-07-06 | 주식회사 라스테크 | 협동로봇 제어시스템 |
CN106272411A (zh) * | 2016-08-24 | 2017-01-04 | 上海交通大学 | 基于引力源的多机器人协同搬运船舱货物方法 |
CN108829140A (zh) * | 2018-09-11 | 2018-11-16 | 河南大学 | 一种基于多群体蚁群算法的多无人机协同目标搜索方法 |
CN110162103A (zh) * | 2019-06-13 | 2019-08-23 | 河南宙合网络科技有限公司 | 一种无人机与智能车组自主协同运输系统及方法 |
CN110245809A (zh) * | 2019-06-26 | 2019-09-17 | 北京洛必德科技有限公司 | 一种用于多机器人多任务协作工作的智能优化方法和系统 |
CN110347159A (zh) * | 2019-07-12 | 2019-10-18 | 苏州融萃特种机器人有限公司 | 移动机器人多机协作方法和系统 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Cao et al. | Cooperative mobile robotics: Antecedents and directions | |
Jonsson et al. | Causal Graph Based Decomposition of Factored MDPs. | |
Yogeswaran et al. | Reinforcement learning: Exploration–exploitation dilemma in multi-agent foraging task | |
Luviano et al. | Continuous-time path planning for multi-agents with fuzzy reinforcement learning | |
WO2022032444A1 (zh) | 一种多智能主体避障方法、系统和计算机可读存储介质 | |
WO2022032443A1 (zh) | 多智能主体编队搬运方法、系统和计算机可读存储介质 | |
CN112700099A (zh) | 基于强化学习和运筹学的资源调度规划方法 | |
McMahon et al. | A survey on the integration of machine learning with sampling-based motion planning | |
Malavolta et al. | Mining the ROS ecosystem for green architectural tactics in robotics and an empirical evaluation | |
Gu et al. | An improved Q-Learning algorithm for path planning in maze environments | |
Tkach et al. | Towards addressing dynamic multi-agent task allocation in law enforcement | |
Setyawan et al. | Cooperative multi-robot hierarchical reinforcement learning | |
Chen et al. | Deep reinforcement learning-based robot exploration for constructing map of unknown environment | |
Kafaf et al. | A web service-based approach for developing self-adaptive systems | |
Yuan et al. | Research on flexible job shop scheduling problem with AGV using double DQN | |
Kwa et al. | Adaptivity: a path towards general swarm intelligence? | |
WO2022032442A1 (zh) | 多智能主体协同搬运物件的方法、系统和计算机可读存储介质 | |
Chen et al. | Hybrid MDP based integrated hierarchical Q-learning | |
Schneckenreither | Average reward adjusted discounted reinforcement learning: Near-blackwell-optimal policies for real-world applications | |
Gautier et al. | Deep Q-learning-based dynamic management of a robotic cluster | |
Jin et al. | Methods for blended shared control of hydraulic excavators with learning and prediction | |
Fernandez-Gauna et al. | Undesired state-action prediction in multi-agent reinforcement learning for linked multi-component robotic system control | |
CN112034844A (zh) | 多智能主体编队搬运方法、系统和计算机可读存储介质 | |
Braga et al. | A topological reinforcement learning agent for navigation | |
El Habib Souidi et al. | Multi-agent pursuit-evasion game based on organizational architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20948941 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20948941 Country of ref document: EP Kind code of ref document: A1 |