CN113377655B - Task allocation method based on MAS-Q-learning - Google Patents

Task allocation method based on MAS-Q-learning

Info

Publication number
CN113377655B
CN113377655B (application CN202110664158.9A)
Authority
CN
China
Prior art keywords
agent
intelligent
state
decision
intelligent agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110664158.9A
Other languages
Chinese (zh)
Other versions
CN113377655A (en)
Inventor
王崇骏
张�杰
乔羽
曹亦康
李宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202110664158.9A priority Critical patent/CN113377655B/en
Publication of CN113377655A publication Critical patent/CN113377655A/en
Application granted granted Critical
Publication of CN113377655B publication Critical patent/CN113377655B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/36 Preventing errors by testing or debugging software
    • G06F 11/3668 Software testing
    • G06F 11/3672 Test management
    • G06F 11/3684 Test management for test design, e.g. generating new test cases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing


Abstract

The invention discloses a task allocation method based on MAS-Q-learning. User data are acquired from a real application scenario and modeled as a Markov decision process; each crowdsourcing worker is modeled as an agent quintuple, and the workers' global benefit is computed by a Q-value learning method. The states of neighboring agents and the next state are located, the association relations among agent members are described with a Laplacian matrix, a multi-attribute decision method is applied, and the results are weighted and aggregated. The action-value function is estimated by a temporal-difference method, and an agent state function satisfying the rationality and completeness conditions is provided. The invention has good robustness and adaptability.

Description

Task allocation method based on MAS-Q-learning
Technical Field
The invention relates to the field of task allocation, is mainly applied to crowdsourcing scenarios, and in particular addresses the cost optimization problem of complex task allocation in such scenarios.
Background
The motivation for the invention derives from the emerging application of crowdsourced software testing: in the typical crowdsourcing process, task allocation is unclear and crowdsourcing workers cannot obtain their maximum personal benefit.
Disclosure of Invention
The invention aims to avoid the problems of unclear task allocation in the crowdsourcing process and of crowdsourcing workers failing to obtain their personal benefit; to this end, it provides a task allocation method based on MAS-Q-learning. The robustness of the model is improved by Q-value learning together with a knowledge-sharing mechanism: allowing partial knowledge sharing among agents exploits their interaction and improves the scalability of the solution, since most agents are similar to one another and are influenced by the collective state of the group. Second, the method trains and solves on small-sample data in a semi-supervised manner and models the regions of uncertainty; it can also exploit the symmetry of a large multi-agent system to cast task allocation as a difference-of-convex programming problem, improving the convergence of the algorithm. Finally, to validate the algorithm, a simulator developed for multi-agent systems transfers learning between the task allocation problem and the climbing problem, and multi-agent systems and environments of different scales are tested, showing a better Q-value learning effect than conventional multi-agent approaches.
To achieve the above purpose, the invention adopts the following technical scheme:
a task allocation method based on MAS-Q-learning comprises the following steps:
step 1, data acquisition: user data in a real application scenario is acquired, the user data comprising user generated data having a set of states, an action function, a selection probability and a reward function.
Step 2, data preprocessing: the user data obtained in step 1 are modeled as a Markov decision process; the capability data of crowdsourcing workers for different task types are normalized; each worker is designed as an agent quintuple; and the agents' global benefit is computed by a Q-value learning method.
Step 3, state transition: the states of neighboring agents and the next state are located, so that the target estimated state of a neighboring agent assists its state transition. A neighbor node is localized and computed using distance observations and the information transmitted by its neighbors.
Step 4, multi-agent system modeling: a Laplacian matrix describes the association relations among the agent members; the purpose is to construct an information-interaction mechanism for the member agents of the multi-agent system and a corresponding topology model, thereby reducing the difficulty of solving the complex problem.
In step 4, the multi-agent system is modeled as follows:
Step 4 a), the system comprises two or more agents; the topology of the agent system is represented by a graph G = (V, E), from which the dynamic equation and the edge-state definition of a single agent are computed.
Step 4 b), the dynamic equation of a single agent is updated, and the corresponding incidence and degree matrices are computed; from these the Laplacian matrix is derived, an information feedback model is established, and the information-interaction feedback of the agents is obtained.
Step 4 c), after the information feedback model among the agents in the multi-agent system is obtained, the system model is reduced, and the solving complexity is lowered on the basis of the spanning-tree subgraph structure. A linear transformation of the spanning tree yields the co-tree (the edges outside the spanning tree), which serves as the internal feedback term of the multi-agent system, finally giving the reduced multi-agent system model.
Step 5, multi-attribute decision stage: first a decision matrix is given and it is determined whether the attribute weights are known; the aggregation operator of the attribute matrix is obtained from the attribute values of the decision matrix; a multi-attribute decision method matching the solving target and the form of the decision matrix is selected for computation; the results are weighted and aggregated; and the decision is made according to the final score of each scheme.
Step 6, method optimization stage: the action-value function is estimated by a temporal-difference method, and an agent state function satisfying the rationality and completeness conditions is provided.
Preferably, the data preprocessing in step 2 is as follows:
Step 2 a), each crowdsourcing worker is designed as an agent quintuple ⟨S, A, P, γ, R⟩, where S is the state, A is the action function, P is the selection probability, γ is the discount factor with γ ∈ (0, 1), and R is the reward function.
Step 2 b), at a time t the agent, in state S_t, selects a strategy from the strategy set and generates an action function A_t; it then transitions with probability p_t to the next state S_{t+1}. By analogy, traversing the states yields the agent's global benefit.
Preferably, the state transition in step 3 is as follows:
Step 3 a), the Euclidean distance of the agent relative to its neighboring agents is first derived, giving the estimated relative position of agent j in the local coordinate frame of agent i and thus the distance observation.
Step 3 b), the neighbor node is localized using the distance observation obtained in step 3 a) and the information transmitted by the neighbor nodes.
Preferably, the method optimization stage in step 6 is as follows: the Markov decision process problem is solved when the transition probability model is unknown. Given the state S, the action A, the reward function r and the transition probability p, the Markov property is p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), where s_t denotes the state and a_t the behavior at time t. The optimization goal of the model is max_π E[Σ_{t=0}^{T−1} γ^t r(s_t, a_t)], with a_t ~ π(·|s_t) for t = 0, …, T−1, where γ denotes the constant discount factor and π(·|s_t) denotes the action probability in state s_t. A reinforcement learning method solves the Markov decision process problem when p(s_{t+1} | s_t, a_t) is unknown, and the action-value function is estimated by a temporal-difference method.
preferably: the agent status meeting the integrity condition includes all information needed for agent decision making.
Preferably: discrete or continuous motion values are designed for the motion of the agent according to the numerical characteristics of the applied control quantity.
Compared with the prior art, the invention has the following beneficial effects:
the invention establishes a multi-person model based on a single person decision method. Aiming at the specificity of crowd test environments, the invention designs a multi-attribute decision mechanism in the crowd test process. The invention selects Q value learning as a training algorithm and optimizes the design of an imperfect information sharing mechanism. Through different imperfect information sharing scenes and different gamma values and data sets, the training results are analyzed, and the system designed by the invention has good robustness and adaptability, and the method and the model provided by the invention have certain applicability. Has reference value for future research in the related field. The method has strong practicability and is suitable for all crowdsourcing system systems.
Drawings
FIG. 1 is a flow chart of the overall method of the present invention;
FIG. 2 shows the crowd-testing process used in the present invention;
FIG. 3 shows the research framework of the multi-agent collaborative behavior decision model used in the present invention.
Detailed Description
The present invention is further illustrated by the accompanying drawings and the following detailed description, which are to be understood as merely illustrative of the invention and not limiting of its scope. After reading the invention, various equivalent modifications by those skilled in the art fall within the scope of the appended claims.
A task allocation method based on MAS-Q-learning, as shown in figures 1-3, comprises the following steps:
step 1, data acquisition: user data in a real application scene is acquired, wherein the user data comprises data with a state set, an action function, a selection probability and a reward function, which are generated by a user, and the four types of data cannot have any loss.
Step 2, data preprocessing: the user data obtained in step 1 are modeled as a Markov decision process; the capability data of crowdsourcing workers for different task types are normalized; each worker is designed as an agent quintuple; and the agents' global benefit is computed by a Q-value learning method.
The data preprocessing in step 2 is as follows:
Step 2 a), each crowdsourcing worker is designed as an agent quintuple ⟨S, A, P, γ, R⟩, where S is the state, A is the action function, P is the selection probability, γ is the discount factor with γ ∈ (0, 1), and R is the reward function.
Step 2 b), at a time t the agent, in state S_t, selects a strategy from the strategy set and generates an action function A_t; it then transitions with probability p_t to the next state S_{t+1}. By analogy, traversing the states yields the agent's global benefit.
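The following minimal Python sketch illustrates one way to realize step 2: the agent quintuple ⟨S, A, P, γ, R⟩ with a tabular Q-value update. It is a sketch under assumptions — the class name, the ε-greedy strategy selection, the learning rate, and the toy environment interfaces are illustrative and not prescribed by the patent.

```python
import random
from collections import defaultdict

class QuintupleAgent:
    """Illustrative agent over the quintuple <S, A, P, gamma, R>."""

    def __init__(self, states, actions, transition, reward,
                 gamma=0.9, alpha=0.1, epsilon=0.1):
        self.states = states          # S: state set
        self.actions = actions        # A: available actions
        self.transition = transition  # P: samples s' given (s, a)
        self.reward = reward          # R: (s, a, s') -> scalar reward
        self.gamma = gamma            # discount factor, gamma in (0, 1)
        self.alpha = alpha            # learning rate (assumed, not in patent)
        self.epsilon = epsilon        # exploration rate (assumed)
        self.q = defaultdict(float)   # Q-table over (state, action) pairs

    def act(self, s):
        # Epsilon-greedy strategy selection from the strategy set.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def step(self, s):
        # One transition S_t -> S_{t+1} with the standard Q-value update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        a = self.act(s)
        s_next = self.transition(s, a)
        r = self.reward(s, a, s_next)
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next
                                        - self.q[(s, a)])
        return s_next, r
```

Traversing the states episode by episode with `step` and accumulating the γ-discounted rewards yields an estimate of the global benefit referred to in step 2 b).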
Step 3, state transition: the states of neighboring agents and the next state are located, so that the target estimated state of a neighboring agent assists its state transition. A neighbor node is localized and computed using distance observations and the information transmitted by its neighbors.
Step 3 a), the Euclidean distance of the agent relative to its neighboring agents is first derived, giving the estimated relative position of agent j in the local coordinate frame of agent i and thus the distance observation.
Step 3 b), the neighbor node is localized using the distance observation obtained in step 3 a) and the information transmitted by the neighbor nodes.
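One plausible realization of step 3 is classical multilateration: an agent measures Euclidean distances to at least three neighbors whose estimated positions were transmitted to it, and solves a linearized least-squares system. The Python sketch below assumes a 2-D coordinate frame and noiseless ranges; the patent does not fix the estimator, so this is illustrative only.

```python
import numpy as np

def localize(anchor_positions: np.ndarray, distances: np.ndarray) -> np.ndarray:
    """Estimate a position from neighbor positions (n, 2) and ranges (n,).

    Subtracting the first range equation ||p - x_i||^2 = d_i^2 from the
    others cancels the quadratic term, leaving a linear system A p = b.
    """
    x0, d0 = anchor_positions[0], distances[0]
    A = 2.0 * (anchor_positions[1:] - x0)
    b = (d0 ** 2 - distances[1:] ** 2
         + np.sum(anchor_positions[1:] ** 2, axis=1) - np.sum(x0 ** 2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Toy check with three neighbors (illustrative data only).
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
true_pos = np.array([1.0, 1.0])
ranges = np.linalg.norm(anchors - true_pos, axis=1)
print(localize(anchors, ranges))  # approximately [1.0, 1.0]
```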
Step 4, multi-agent system modeling: the invention adopts a Laplacian matrix to describe the association relations among the agent members, with the aim of constructing an information-interaction mechanism for the member agents of the multi-agent system and a corresponding topology model, thereby reducing the difficulty of solving complex problems.
In step 4, the multi-agent system is modeled as follows:
Step 4 a), the system comprises two or more agents; the topology of the agent system is represented by a graph G = (V, E), from which the dynamic equation and the edge-state definition of a single agent are computed.
Step 4 b), the dynamic equation of a single agent is updated, and the corresponding incidence and degree matrices are computed; from these the Laplacian matrix is derived, an information feedback model is established, and the information-interaction feedback of the agents is obtained.
Step 4 c), after the information feedback model among the agents in the multi-agent system is obtained, the system model is reduced, and the solving complexity is lowered on the basis of the spanning-tree subgraph structure. A linear transformation of the spanning tree yields the co-tree (the edges outside the spanning tree), which serves as the internal feedback term of the multi-agent system, finally giving the reduced multi-agent system model.
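The topology model of step 4 can be made concrete as follows. Assuming an undirected interaction graph over the agents (the edge set below is an illustrative example), this sketch builds the Laplacian L = D − A from the adjacency and degree matrices, and splits the edges into a spanning tree and its co-tree, the latter acting as the internal feedback terms in the reduced model.

```python
import numpy as np

EDGES = [(0, 1), (1, 2), (2, 0), (2, 3)]  # agent pairs sharing information
N_AGENTS = 4

def laplacian(n, edges):
    # L = D - A; for an oriented incidence matrix E this equals E @ E.T.
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spanning_tree_split(n, edges):
    # Union-find split of the edge set into spanning-tree and co-tree edges.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree, cotree = [], []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
        else:
            cotree.append((i, j))
    return tree, cotree

L = laplacian(N_AGENTS, EDGES)
tree, cotree = spanning_tree_split(N_AGENTS, EDGES)
print(L)             # describes the association relations among agents
print(tree, cotree)  # e.g. tree=[(0,1),(1,2),(2,3)], cotree=[(2,0)]
```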
Step 5, multi-attribute decision stage: first a decision matrix is given and it is determined whether the attribute weights are known; the aggregation operator of the attribute matrix is obtained from the attribute values of the decision matrix; a multi-attribute decision method matching the solving target and the form of the decision matrix is selected for computation; the results are weighted and aggregated; and the decision is made according to the final score of each scheme.
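As an illustration of step 5, the sketch below uses simple additive weighting (SAW), one common aggregation operator for multi-attribute decisions: the decision matrix is normalized per attribute (benefit attributes scaled up, cost attributes scaled down), the known weights are applied, and the scheme with the highest aggregate score is chosen. The matrix, weights, and attribute labels are assumed example data, not taken from the patent.

```python
import numpy as np

decision = np.array([        # rows: candidate schemes, columns: attributes
    [0.8, 120.0, 0.60],
    [0.6,  80.0, 0.90],
    [0.9, 150.0, 0.75],
])
weights = np.array([0.5, 0.2, 0.3])   # assumed known; sums to 1
is_benefit = [True, False, True]      # second attribute is a cost

# Normalize each attribute column, then aggregate with the weights.
norm = np.empty_like(decision)
for j in range(decision.shape[1]):
    col = decision[:, j]
    norm[:, j] = col / col.max() if is_benefit[j] else col.min() / col

scores = norm @ weights
print(scores, "-> choose scheme", int(scores.argmax()))
```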
Step 6, method optimization stage: the action-value function is estimated by a temporal-difference method, and an agent state function satisfying the rationality and completeness conditions is provided.
The method optimization stage in step 6 is as follows: the Markov decision process problem is solved when the transition probability model is unknown. Given the state S, the action A, the reward function r and the transition probability p, the Markov property is p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), where s_t denotes the state and a_t the behavior at time t. The optimization goal of the model is max_π E[Σ_{t=0}^{T−1} γ^t r(s_t, a_t)], with a_t ~ π(·|s_t) for t = 0, …, T−1. A reinforcement learning method solves the Markov decision process problem when p(s_{t+1} | s_t, a_t) is unknown, and the action-value function is estimated by a temporal-difference method. Under this research framework, the agent state is designed to meet the rationality and completeness conditions. Completeness requires the state to contain all information needed for the agent's decision; for example, in an agent trajectory-tracking problem, trend information of the target track must be added, and if that information cannot be observed, the extended state contains historical observations. The agent's actions are designed as discrete or continuous action values according to the numerical characteristics of the applied control quantity.
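For the temporal-difference estimation in step 6, a SARSA-style TD(0) update is one standard instance; the patent does not commit to a particular variant, so the function below is a hedged sketch with assumed hyperparameters.

```python
def td0_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """TD(0) update of an action-value table q: (state, action) -> value.

    TD target: r + gamma * Q(s', a'); TD error: target - Q(s, a).
    alpha and gamma are assumed hyperparameters, not fixed by the patent.
    """
    target = r + gamma * q.get((s_next, a_next), 0.0)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (target - q.get((s, a), 0.0))
    return q
```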
In practical deployment, the method is not one-size-fits-all; it must be adjusted according to the user's decision set, action set and other data.
In summary, the invention designs a multi-attribute decision mechanism for the crowd-testing process. Q-value learning is chosen as the training algorithm, and the design of the imperfect-information-sharing mechanism is optimized. Analyzing the training results across different imperfect-information-sharing scenarios, different γ values and different data sets, experiments show that the method converges by about the 50th round and that the algorithm has a certain advantage in convergence speed and stability; the designed system has good robustness and adaptability, and the proposed method and model have a certain applicability, with reference value for future research in the related field.
The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are intended to fall within the scope of the invention.

Claims (4)

1. The MAS-Q-learning-based task allocation method is characterized by comprising the following steps:
step 1, data acquisition: acquiring user data in a real application scenario, wherein the user data comprise user-generated data having a state set, an action function, a selection probability and a reward function;
step 2, data preprocessing: modeling the user data obtained in step 1 as a Markov decision process, normalizing the capability data of crowdsourcing workers for different task types, designing each worker as an agent quintuple, and computing the workers' global benefit by a Q-value learning method;
the data preprocessing method in step 2 is as follows:
step 2 a), designing each crowdsourcing worker as an agent quintuple ⟨S, A, P, γ, R⟩, wherein S is the state, A is the action function, P is the selection probability, γ is the discount factor with γ ∈ (0, 1), and R is the reward function;
step 2 b), when at a time t the agent is in state S_t, selecting a policy from the policy set and generating an action function A_t, and then transitioning with probability p_t to the next state S_{t+1}; by analogy, after traversing the states, obtaining the global benefit of the agent;
step 3, state transition: locating the states of neighboring agents and the next state, so that the target estimated state of a neighboring agent assists its state transition; the neighbor node is localized and computed using distance observations and the information transmitted by the neighbor nodes;
step 4, modeling a multi-agent system: a Laplacian matrix is used for describing the association relations among the agent members, with the purpose of constructing an information-interaction mechanism for the member agents of the multi-agent system and a corresponding topology model, thereby reducing the solving difficulty of the complex problem;
the multi-agent system modeling method comprises the following steps:
step 4 a), the system comprises two or more agents, and the topology of the agent system is represented by a graph G = (V, E), from which the dynamic equation and the edge-state definition of a single agent are computed;
step 4 b), updating the dynamic equation of a single agent, and then computing the corresponding incidence and degree matrices, thereby deriving the Laplacian matrix, establishing an information feedback model, and obtaining the information-interaction feedback of the agents;
step 4 c), after the information feedback model among the agents in the multi-agent system is obtained, performing model reduction on the multi-agent system, and lowering the solving complexity based on the spanning-tree subgraph structure; performing a linear transformation on the spanning tree to obtain the co-tree, which serves as the internal feedback term of the multi-agent system, finally obtaining the reduced multi-agent system model;
step 5, multi-attribute decision stage: firstly giving a decision matrix, judging whether the weights are known and determining the weights, obtaining an aggregation operator of the attribute matrix according to the attribute values of the decision matrix, selecting a corresponding multi-attribute decision method for computation according to the solving target and the form of the decision matrix, weighting and aggregating the computation results, and deciding according to the score of each scheme;
the multi-attribute decision stage method is as follows: solving the Markov decision process problem when the transition probability model is unknown; setting the state S, the action A, the reward function r and the transition probability p, the Markov property being p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), wherein s_t denotes the state at time t and a_t denotes the behavior at time t; the optimization goal of the model being max_π E[Σ_{t=0}^{T−1} γ^t r(s_t, a_t)] with a_t ~ π(·|s_t), wherein γ denotes the constant discount factor and π(·|s_t) denotes the action probability in state s_t; a reinforcement learning method solving the Markov decision process problem when p(s_{t+1} | s_t, a_t) is unknown, and a temporal-difference method estimating the action-value function;
step 6, method optimization stage: estimating the action-value function by a temporal-difference method, while providing an agent state function meeting the rationality and completeness conditions.
2. The MAS-Q-learning based task allocation method according to claim 1, wherein the state transition method in step 3 is as follows:
step 3 a), first deriving the Euclidean distance of the agent relative to its neighboring agents to obtain the estimated relative position of agent j in the local coordinate frame of agent i, yielding the distance observation;
step 3 b), localizing the neighbor node using the distance observation obtained in step 3 a) and the information transmitted by the neighbor nodes.
3. The MAS-Q-learning based task allocation method according to claim 2, wherein: an agent state meeting the completeness condition includes all information needed for the agent's decision.
4. The MAS-Q-learning based task allocation method according to claim 3, wherein: discrete or continuous action values are designed for the agent's actions according to the numerical characteristics of the applied control quantity.
CN202110664158.9A 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning Active CN113377655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110664158.9A CN113377655B (en) 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110664158.9A CN113377655B (en) 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning

Publications (2)

Publication Number Publication Date
CN113377655A CN113377655A (en) 2021-09-10
CN113377655B true CN113377655B (en) 2023-06-20

Family

ID=77574510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110664158.9A Active CN113377655B (en) 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning

Country Status (1)

Country Link
CN (1) CN113377655B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409739A (en) * 2018-10-19 2019-03-01 南京大学 A kind of crowdsourcing platform method for allocating tasks based on part Observable markov decision process
WO2020092437A1 (en) * 2018-10-29 2020-05-07 Google Llc Determining control policies by minimizing the impact of delusion
CN111770454A (en) * 2020-07-03 2020-10-13 南京工业大学 Game method for position privacy protection and platform task allocation in mobile crowd sensing
CN111860649A (en) * 2020-07-21 2020-10-30 赵佳 Action set output method and system based on multi-agent reinforcement learning
CN111695690A (en) * 2020-07-30 2020-09-22 航天欧华信息技术有限公司 Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning
CN112121439A (en) * 2020-08-21 2020-12-25 林瑞杰 Cloud game engine intelligent optimization method and device based on reinforcement learning
CN112598137A (en) * 2020-12-21 2021-04-02 西北工业大学 Optimal decision method based on improved Q-learning
CN112884239A (en) * 2021-03-12 2021-06-01 重庆大学 Aerospace detonator production scheduling method based on deep reinforcement learning
CN112801430A (en) * 2021-04-13 2021-05-14 贝壳找房(北京)科技有限公司 Task issuing method and device, electronic equipment and readable storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Learning Task Allocation for Multiple Flows in Multi-Agent Systems; Zheng Zhao; 2009 International Conference on Communication Software and Networks; 1-9 *
A reputation-based reconnection strategy in distributed task allocation (in Chinese); Zhang Lei; Journal of Guangxi University (Natural Science Edition); 645-648 *
Cloudlet-based load distribution and resource allocation for mobile cloud platforms (in Chinese); Zheng Xiaojie; China Master's Theses Full-text Database, Information Science and Technology; I139-142 *
A decision model for maneuvering agents based on fuzzy Markov theory (in Chinese); Yang Ping, Bi Yiming, Liu Weidong; Systems Engineering and Electronics (No. 03); 1-5 *
Task allocation strategy for spatial crowdsourcing based on deep reinforcement learning (in Chinese); Ni Zhiwei; Pattern Recognition and Artificial Intelligence; 191-205 *

Also Published As

Publication number Publication date
CN113377655A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
Liu et al. Assessing optimal assignment under uncertainty: An interval-based algorithm
CN104408518B (en) Based on the neural network learning optimization method of particle swarm optimization algorithm
Tsiogkas et al. Efficient multi-AUV cooperation using semantic knowledge representation for underwater archaeology missions
CN112633591B (en) Space searching method and device based on deep reinforcement learning
Wei et al. Multi-robot path planning for mobile sensing through deep reinforcement learning
CN116520281B (en) DDPG-based extended target tracking optimization method and device
CN114861368B (en) Construction method of railway longitudinal section design learning model based on near-end strategy
CN115599779A (en) Urban road traffic missing data interpolation method and related equipment
CN116560409A (en) Unmanned aerial vehicle cluster path planning simulation method based on MADDPG-R
Lee et al. Sampling of pareto-optimal trajectories using progressive objective evaluation in multi-objective motion planning
Li et al. Differentiable bootstrap particle filters for regime-switching models
Atashbar et al. AI and macroeconomic modeling: Deep reinforcement learning in an RBC model
CN113379063B (en) Whole-flow task time sequence intelligent decision-making method based on online reinforcement learning model
CN113377655B (en) Task allocation method based on MAS-Q-learning
Zhang et al. Mobile robot localization based on gradient propagation particle filter network
Yashin et al. Assessment of Material and Intangible Motivation of Top Management in Regions Using Multipurpose Genetic Algorithm
Hu et al. An experience aggregative reinforcement learning with multi-attribute decision-making for obstacle avoidance of wheeled mobile robot
Er et al. A novel framework for automatic generation of fuzzy neural networks
Chen et al. A New Decision-Making Process for Selecting Project Leader Based on Social Network and Knowledge Map.
Owda et al. Using artificial neural network techniques for prediction of electric energy consumption
Qiu et al. Evaluation criterion for different methods of multiple-attribute group decision making with interval-valued intuitionistic fuzzy information
Wawrzynczak et al. Feedforward neural networks in forecasting the spatial distribution of the time-dependent multidimensional functions
Ahadi et al. A new hybrid for software cost estimation using particle swarm optimization and differential evolution algorithms
Pedroso et al. The sea exploration problem: Data-driven orienteering on a continuous surface
Du et al. Multi-agent trajectory prediction based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant