CN113377655A - MAS-Q-learning-based task allocation method - Google Patents

MAS-Q-learning-based task allocation method

Info

Publication number
CN113377655A
Authority
CN
China
Prior art keywords
agent
state
intelligent
decision
mas
Prior art date
Legal status
Granted
Application number
CN202110664158.9A
Other languages
Chinese (zh)
Other versions
CN113377655B (en)
Inventor
王崇骏
张杰
乔羽
曹亦康
李宁
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110664158.9A
Publication of CN113377655A
Application granted
Publication of CN113377655B
Active (current legal status)
Anticipated expiration

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3668Software testing
    • G06F11/3672Test management
    • G06F11/3684Test management for test design, e.g. generating new test cases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a task allocation method based on MAS-Q-learning. User data in a real application scenario is obtained and modeled as a Markov decision process; each crowdsourcing worker is represented as an agent five-tuple, and the workers' global return is computed by a Q-value learning method. The states of neighboring agents and their successor states are located, the association relations among agent members are described with a Laplacian matrix, a multi-attribute decision method is applied, and the results are weighted and aggregated. An action-value function is estimated by the temporal-difference method, and an agent state function satisfying rationality and completeness conditions is provided. The invention has good robustness and adaptability.

Description

MAS-Q-learning-based task allocation method
Technical Field
The invention relates to the field of task allocation, is mainly applied to crowdsourcing scenarios, and particularly concerns the cost-optimization problem of complex task allocation in such scenarios.
Background
The motivation of the invention derives from the emerging application of software-testing work in current crowdsourcing and from the general crowdsourcing process: in that process, task allocation is ambiguous, and crowdsourcing workers cannot maximize their personal income.
Disclosure of Invention
The purpose of the invention is as follows: to avoid problems in the crowdsourcing process such as ambiguous task allocation and workers being unable to maximize personal income, the invention provides a task allocation method based on MAS-Q-learning. The Q-value learning method is used together with a designed knowledge-sharing mechanism, which improves the robustness of the model; allowing partial knowledge sharing among agents exploits their interactions to improve the scalability of the solution, since most agents are similar to one another and influence one another through the agents' collective states. Second, the method trains and solves on small-sample data in a semi-supervised manner and models the uncertain regions. Moreover, the model can exploit the symmetry of large-scale multi-agent systems to cast task allocation as a difference-of-convex (DC) programming problem, improving the convergence of the algorithm. Finally, to verify the algorithm, a simulator developed for multi-agent systems performs transfer learning between the task allocation problem and the mountain-climbing problem, and multi-agent systems and environments of different scales are tested, showing that the disclosed algorithm learns better than conventional multi-agent Q-value learning.
The technical scheme is as follows: to achieve the above purpose, the invention adopts the following technical scheme.
a task allocation method based on MAS-Q-learning comprises the following steps:
step 1, data acquisition: user data in a real application scenario is obtained, the user data including user generated data having a state set, an action function, a selection probability, and a reward function.
Step 2, data preprocessing: the user data obtained in step 1 is modeled as a Markov decision process, the crowdsourcing workers' capability data for different task types are normalized, each worker is represented as an agent five-tuple, and the workers' global return is computed by a Q-value learning method.
Step 3, state transition: the states of neighboring agents and their successor states are located so that the neighbors' target estimated states assist the agent's own state transition. Neighbor nodes are located by computing with distance observations and the information the neighbor nodes transmit.
Step 4, modeling the multi-agent system: a Laplacian matrix describes the association relations among agent members; the purpose is to construct a mechanism for information interaction among member agents in the multi-agent system, together with a corresponding topological model, thereby reducing the difficulty of solving complex problems.
In step 4, the multi-agent system is modeled as follows:
Step 4a), the agent system comprises two or more agents; its topological structure is represented by a graph [formula image omitted in the source], and the dynamics equation and edge-state definition of a single agent are obtained by calculation.
Step 4b), the dynamics equation of a single agent is updated, the corresponding incidence matrix is computed, the Laplacian matrix is derived, and an information feedback model is established, thereby obtaining the agents' information-interaction feedback.
Step 4c), after the information feedback models among the agents in the multi-agent system are obtained, the multi-agent system is reduced in order, lowering the solving complexity based on a spanning-tree subgraph structure. A linear transformation of the spanning tree yields the spanning co-tree, which serves as the internal feedback term of the multi-agent system, finally giving the reduced-order multi-agent system model.
Step 5, the multi-attribute decision stage: first a decision matrix is given and it is judged whether the attribute weights are known; the weights are determined, and an aggregation operator for the attribute matrix is obtained from the decision matrix's attribute values. A suitable multi-attribute decision method is selected according to the form of the objective and of the decision matrix; the results are weighted and aggregated, and the decision is made according to the final scores of the candidate schemes.
Step 6, the method-optimization stage: an action-value function is estimated by the temporal-difference method, and an agent state function satisfying the rationality and completeness conditions is provided.
Preferably: the data preprocessing method in step 2 is as follows:
Step 2a), represent each crowdsourcing worker as an agent five-tuple ⟨S, A, P, γ, R⟩, where S is the state, A the action function, P the selection probability, γ ∈ (0, 1) the discount factor, and R the reward function.
Step 2b), at a time t the agent is in state S_t, selects a strategy from the strategy set, and generates action A_t; it then transfers with probability p_t to the next state S_{t+1}. Repeating this process yields the agent's global return.
Preferably: the state transition method in step 3 is as follows:
Step 3a), first derive the Euclidean distances of the agent to its neighboring agents to obtain the estimated position of agent j relative to agent i in i's local coordinate system, yielding the distance observation.
Step 3b), locate the neighbor node using the distance observation obtained in step 3a) and the information transmitted by the neighbor node.
Preferably: the MAS-Q-learning based task allocation method as claimed in claim 4, wherein: the multi-attribute decision phase method in the step 6 is as follows: and solving the Markov decision process problem under the condition that the transition probability model is unknown. Setting a state (S), an action (A), a reward function (r), a transition probability (p) with a Markov property of p (S)t+1|s0,a0,…,st,at)=p(st+1|st,at) Wherein s istIndicating the state at time t, atRepresenting behavior at time t; the optimization goal of the model is
Figure BDA0003116597340000031
at~π(·|st) T is 0, … T-1, pi denotes a constant, pi (· | s)t) Is shown in state stThe probability of the following. Using reinforcement learning method in p(s)t+1|st,at) Solving the Markov decision process problem under the unknown condition, and estimating an action-value function by adopting a time difference method;
preferably: the state of the agent satisfying the integrity condition includes all information required by the agent's decision.
Preferably: discrete or continuous action values are designed for the agent's actions according to the numerical characteristics of the applied control quantity.
Compared with the prior art, the invention has the following beneficial effects:
the invention establishes a multi-person model based on a single-person decision method. Aiming at the particularity of the crowd test environment, the invention designs a multi-attribute decision-making mechanism in the crowd test process. The invention selects Q value learning as a training algorithm and optimizes the design of an imperfect information sharing mechanism. Through different imperfect information sharing scenes and different gamma values and data sets, training results are analyzed, and the system designed by the method has good robustness and adaptability, and the method and the model provided by the invention have certain applicability. Has reference value for future research in the related field. The method has strong practicability and is suitable for all crowdsourcing systems.
Drawings
FIG. 1 is an overall flow chart of the method of the present invention;
FIG. 2 is the crowdsourced-testing process used in the present invention;
FIG. 3 is a multi-agent collaborative behavior decision model research framework used in the present invention.
Detailed Description
The present invention is further illustrated below in conjunction with the accompanying drawings and specific embodiments. It should be understood that these examples are given solely for the purpose of illustration and are not intended to limit the scope of the invention; various equivalent modifications that occur to those skilled in the art after reading the present disclosure fall within the scope defined by the appended claims.
A task allocation method based on MAS-Q-learning, as shown in fig. 1-3, comprising the following steps:
step 1, data acquisition: user data in a real application scene is acquired, wherein the user data comprises data which are generated by a user and comprise a state set, an action function, a selection probability and a reward function, and the four types of data cannot have any deficiency.
Step 2, data preprocessing: the user data obtained in step 1 is modeled as a Markov decision process, the crowdsourcing workers' capability data for different task types are normalized, each worker is represented as an agent five-tuple, and the workers' global return is computed by a Q-value learning method.
The data preprocessing method in step 2 is as follows:
Step 2a), represent each crowdsourcing worker as an agent five-tuple ⟨S, A, P, γ, R⟩, where S is the state, A the action function, P the selection probability, γ ∈ (0, 1) the discount factor, and R the reward function.
Step 2b), at a time t the agent is in state S_t, selects a strategy from the strategy set, and generates action A_t; it then transfers with probability p_t to the next state S_{t+1}. Repeating this process yields the agent's global return.
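By way of illustration, the following is a minimal Python sketch of steps 2a) and 2b), assuming a tabular Q-value learner with an epsilon-greedy selection rule; the class name CrowdAgent and the hyperparameters alpha and epsilon are illustrative assumptions, not values prescribed by the patent.

```python
import numpy as np

class CrowdAgent:
    """Tabular Q-value learner over the agent five-tuple <S, A, P, gamma, R>.

    The learning rate `alpha` and exploration rate `epsilon` are
    illustrative hyperparameters; the patent does not fix their values.
    """

    def __init__(self, n_states, n_actions, gamma=0.9, alpha=0.1, epsilon=0.1):
        self.q = np.zeros((n_states, n_actions))  # action-value table Q(s, a)
        self.gamma = gamma      # discount factor, gamma in (0, 1)
        self.alpha = alpha      # step size of the value update
        self.epsilon = epsilon  # exploration probability

    def act(self, state, rng):
        """Epsilon-greedy selection from the strategy set (step 2b)."""
        if rng.random() < self.epsilon:
            return int(rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def update(self, s, a, r, s_next):
        """Move Q(s, a) toward the target r + gamma * max_a' Q(s', a')."""
        td_target = r + self.gamma * np.max(self.q[s_next])
        self.q[s, a] += self.alpha * (td_target - self.q[s, a])
```

The worker's global return is then the discounted sum of the rewards collected along the visited state sequence, which the Q table approximates state by state.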
Step 3, state transition: the states of neighboring agents and their successor states are located so that the neighbors' target estimated states assist the agent's own state transition. Neighbor nodes are located by computing with distance observations and the information the neighbor nodes transmit.
Step 3a), first derive the Euclidean distances of the agent to its neighboring agents to obtain the estimated position of agent j relative to agent i in i's local coordinate system, yielding the distance observation.
Step 3b), locate the neighbor node using the distance observation obtained in step 3a) and the information transmitted by the neighbor node.
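One way to realize this localization is linearized least-squares multilateration from the distance observations, as in the sketch below; the patent does not prescribe a specific estimator, so the function locate_neighbor and the anchor-based formulation are assumptions made for illustration.

```python
import numpy as np

def locate_neighbor(anchor_positions, distances):
    """Estimate neighbor j's position in agent i's local frame by
    linearized least-squares multilateration from distance observations.

    anchor_positions : (m, 2) coordinates, in i's frame, of agents whose
                       positions are already known (m >= 3)
    distances        : (m,) range observations from those agents to j
    """
    p = np.asarray(anchor_positions, dtype=float)
    d = np.asarray(distances, dtype=float)
    # Subtracting the first range equation removes the quadratic term ||x||^2:
    #   2 (p_k - p_0) . x = ||p_k||^2 - ||p_0||^2 - d_k^2 + d_0^2
    A = 2.0 * (p[1:] - p[0])
    b = np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2) - d[1:] ** 2 + d[0] ** 2
    est, *_ = np.linalg.lstsq(A, b, rcond=None)
    return est  # estimated position of agent j relative to agent i
```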
Step 4, modeling the multi-agent system: the invention adopts a Laplacian matrix to describe the association relations among agent members, aiming to construct a mechanism for information interaction among member agents in the multi-agent system, together with a corresponding topological model, thereby reducing the difficulty of solving complex problems.
In step 4, the multi-agent system is modeled as follows:
Step 4a), the agent system comprises two or more agents; its topological structure is represented by a graph [formula image omitted in the source], and the dynamics equation and edge-state definition of a single agent are obtained by calculation.
Step 4b), the dynamics equation of a single agent is updated, the corresponding incidence matrix is computed, the Laplacian matrix is derived, and an information feedback model is established, thereby obtaining the agents' information-interaction feedback.
Step 4c), after the information feedback models among the agents in the multi-agent system are obtained, the multi-agent system is reduced in order, lowering the solving complexity based on a spanning-tree subgraph structure. A linear transformation of the spanning tree yields the spanning co-tree, which serves as the internal feedback term of the multi-agent system, finally giving the reduced-order multi-agent system model.
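As a concrete illustration of the Laplacian construction in step 4b), the sketch below builds L = D - A from an adjacency matrix; the four-agent ring topology in the example is hypothetical.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Laplacian L = D - A of the agent interaction topology, where entry
    (i, j) of `adjacency` is positive if agent i receives information
    from agent j."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))  # degree matrix
    return D - A

# Example: four agents connected in a ring.
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])
L = graph_laplacian(ring)
# The zero eigenvalue of L has multiplicity 1 exactly when the topology is
# connected, which is the property the spanning-tree reduction of step 4c)
# exploits.
```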
Step 5, the multi-attribute decision stage: first a decision matrix is given and it is judged whether the attribute weights are known; the weights are determined, and an aggregation operator for the attribute matrix is obtained from the decision matrix's attribute values. A suitable multi-attribute decision method is selected according to the form of the objective and of the decision matrix; the results are weighted and aggregated, and the decision is made according to the final scores of the candidate schemes.
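This stage can be instantiated, for example, with simple additive weighting plus entropy-derived weights when the attribute weights are unknown; the sketch below is one such instantiation, and the function rank_alternatives is an illustrative assumption, since the patent leaves the concrete aggregation operator open.

```python
import numpy as np

def rank_alternatives(decision_matrix, weights=None):
    """Score alternatives with a simple additive weighting operator.

    decision_matrix : (n_alternatives, n_attributes); larger values better
    weights         : known attribute weights, or None to derive entropy
                      weights from the matrix when they are unknown
    """
    X = np.asarray(decision_matrix, dtype=float)
    # Normalize every attribute column to [0, 1].
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    if weights is None:
        # Entropy weighting: attributes with more dispersion weigh more.
        P = (X + 1e-12) / (X + 1e-12).sum(axis=0)
        e = -(P * np.log(P)).sum(axis=0) / np.log(X.shape[0])
        weights = (1.0 - e) / (1.0 - e).sum()
    scores = X @ np.asarray(weights, dtype=float)  # weighted aggregation
    return scores, np.argsort(-scores)  # scores and ranking, best first
```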
Step 6, the method-optimization stage: an action-value function is estimated by the temporal-difference method, and an agent state function satisfying the rationality and completeness conditions is provided.
The method-optimization stage in step 6 is as follows: solve the Markov decision process problem when the transition probability model is unknown. Set a state (S), an action (A), a reward function (r), and a transition probability (p) whose Markov property is p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), where s_t denotes the state at time t and a_t the behavior at time t. The optimization objective of the model is
max_π E[ Σ_{t=0}^{T-1} γ^t r(s_t, a_t) ]  s.t.  s_{t+1} ~ p(·|s_t, a_t),  a_t ~ π(·|s_t),  t = 0, …, T-1,
where π denotes the policy and π(·|s_t) is the action distribution in state s_t. A reinforcement learning method is used to solve the Markov decision process problem when p(s_{t+1}|s_t, a_t) is unknown, and the action-value function is estimated by the temporal-difference method. Under this research framework, the agent's state is designed so that conditions such as rationality and completeness are satisfied. The completeness requirement means the state contains all information needed for the agent's decision; for example, in an agent's trajectory-tracking problem, trend information of the target trajectory must be added, and if that information cannot be observed, the state must be extended to include historical observations. The agent's actions are designed as discrete or continuous action values according to the numerical characteristics of the applied control quantity.
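To make the temporal-difference estimation concrete, the following sketch evaluates an action-value function from sampled transitions, never querying p(s_{t+1}|s_t, a_t) directly; the Gym-style env.reset()/env.step() interface is an assumed convention for illustration, not part of the patent.

```python
import numpy as np

def td_action_value(env, policy, n_states, n_actions,
                    gamma=0.9, alpha=0.1, episodes=500, seed=0):
    """TD(0) estimate of the action-value function Q^pi from sampled
    transitions, usable when the transition model is unknown.

    `policy` is an (n_states, n_actions) array of probabilities, i.e.
    a_t ~ pi(.|s_t); `env.reset()` returns a state and `env.step(a)`
    returns (next_state, reward, done).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        a = rng.choice(n_actions, p=policy[s])  # a_t ~ pi(.|s_t)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            a_next = rng.choice(n_actions, p=policy[s_next])
            # Bootstrap on the successor state-action pair.
            target = r + (0.0 if done else gamma * Q[s_next, a_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s, a = s_next, a_next
    return Q
```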
In actual deployment the method is not one-size-fits-all; it must be adjusted according to each user's data, such as the decision set and the action set.
In summary, the invention designs a multi-attribute decision mechanism for the crowdsourced-testing process. Q-learning is chosen as the training algorithm, and the design of the imperfect information-sharing mechanism is optimized. Training results are analyzed across different imperfect information-sharing scenarios and different γ values and data sets. Experiments show that the method converges by about the 50th round, indicating advantages in convergence speed and stability; they also show that the designed system has good robustness and adaptability and that the proposed method and model have a certain applicability, providing reference value for future research in related fields.
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are also intended to fall within the scope of the invention.

Claims (7)

1. A task allocation method based on MAS-Q-learning is characterized by comprising the following steps:
step 1, data acquisition: acquiring user data in a real application scenario, the user data comprising user-generated data with a state set, an action function, a selection probability, and a reward function;
step 2, data preprocessing: modeling the user data obtained in step 1 as a Markov decision process, normalizing the crowdsourcing workers' capability data for different task types, representing each worker as an agent five-tuple, and computing the workers' global return by a Q-value learning method;
step 3, state transition: locating the states of neighboring agents and their successor states so that the neighbors' target estimated states assist the agent's own state transition, the neighbor nodes being located by computing with distance observations and the information the neighbor nodes transmit;
step 4, modeling the multi-agent system: describing the association relations among agent members with a Laplacian matrix, the purpose being to construct a mechanism for information interaction among member agents in the multi-agent system, together with a corresponding topological model, thereby reducing the difficulty of solving complex problems;
step 5, the multi-attribute decision stage: first giving a decision matrix, judging whether the attribute weights are known and determining the weights, obtaining an aggregation operator for the attribute matrix from the decision matrix's attribute values, selecting a suitable multi-attribute decision method according to the form of the objective and of the decision matrix, weighting and aggregating the results, and making the decision according to the final scores of the candidate schemes;
step 6, the method-optimization stage: estimating an action-value function by the temporal-difference method, and providing an agent state function satisfying the rationality and completeness conditions.
2. The MAS-Q-learning based task allocation method as claimed in claim 1, wherein: the data preprocessing method in step 2 is as follows:
step 2a), represent each crowdsourcing worker as an agent five-tuple ⟨S, A, P, γ, R⟩, where S is the state, A the action function, P the selection probability, γ ∈ (0, 1) the discount factor, and R the reward function;
step 2b), at a time t the agent is in state S_t, selects a strategy from the strategy set, and generates action A_t; it then transfers with probability p_t to the next state S_{t+1}, and repeating this process yields the agent's global return.
3. The MAS-Q-learning based task allocation method as claimed in claim 2, wherein: the state transition method in step 3 is as follows:
step 3a), first derive the Euclidean distances of the agent to its neighboring agents to obtain the estimated position of agent j relative to agent i in i's local coordinate system, yielding the distance observation;
step 3b), locate the neighbor node using the distance observation obtained in step 3a) and the information transmitted by the neighbor node.
4. A MAS-Q-learning based task allocation method as claimed in claim 3, wherein in step 4 the multi-agent system is modeled as follows:
step 4a), the agent system comprises two or more agents; its topological structure is represented by a graph [formula image omitted in the source], and the dynamics equation and edge-state definition of a single agent are obtained by calculation;
step 4b), the dynamics equation of a single agent is updated, the corresponding incidence matrix is computed, the Laplacian matrix is derived, and an information feedback model is established, thereby obtaining the agents' information-interaction feedback;
step 4c), after the information feedback models among the agents in the multi-agent system are obtained, the multi-agent system is reduced in order, lowering the solving complexity based on a spanning-tree subgraph structure; a linear transformation of the spanning tree yields the spanning co-tree, which serves as the internal feedback term of the multi-agent system, finally giving the reduced-order multi-agent system model.
5. The MAS-Q-learning based task allocation method as claimed in claim 4, wherein: the method-optimization stage in step 6 is as follows: solve the Markov decision process problem when the transition probability model is unknown; set a state (S), an action (A), a reward function (r), and a transition probability (p) whose Markov property is p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), where s_t denotes the state at time t and a_t the behavior at time t; the optimization objective of the model is
max_π E[ Σ_{t=0}^{T-1} γ^t r(s_t, a_t) ]  s.t.  s_{t+1} ~ p(·|s_t, a_t),  a_t ~ π(·|s_t),  t = 0, …, T-1,
where π denotes the policy and π(·|s_t) is the action distribution in state s_t; a reinforcement learning method is used to solve the Markov decision process problem when p(s_{t+1}|s_t, a_t) is unknown, and the action-value function is estimated by the temporal-difference method.
6. The MAS-Q-learning based task allocation method as claimed in claim 5, wherein: an agent state satisfying the completeness condition contains all information required by the agent's decision.
7. The MAS-Q-learning based task allocation method as claimed in claim 6, wherein: discrete or continuous action values are designed for the agent's actions according to the numerical characteristics of the applied control quantity.
CN202110664158.9A 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning Active CN113377655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110664158.9A CN113377655B (en) 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110664158.9A CN113377655B (en) 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning

Publications (2)

Publication Number Publication Date
CN113377655A true CN113377655A (en) 2021-09-10
CN113377655B CN113377655B (en) 2023-06-20

Family

ID=77574510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110664158.9A Active CN113377655B (en) 2021-06-16 2021-06-16 Task allocation method based on MAS-Q-learning

Country Status (1)

Country Link
CN (1) CN113377655B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409739A (en) * 2018-10-19 2019-03-01 南京大学 A kind of crowdsourcing platform method for allocating tasks based on part Observable markov decision process
WO2020092437A1 (en) * 2018-10-29 2020-05-07 Google Llc Determining control policies by minimizing the impact of delusion
CN111770454A (en) * 2020-07-03 2020-10-13 南京工业大学 Game method for position privacy protection and platform task allocation in mobile crowd sensing
CN111860649A (en) * 2020-07-21 2020-10-30 赵佳 Action set output method and system based on multi-agent reinforcement learning
CN111695690A (en) * 2020-07-30 2020-09-22 航天欧华信息技术有限公司 Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning
CN112121439A (en) * 2020-08-21 2020-12-25 林瑞杰 Cloud game engine intelligent optimization method and device based on reinforcement learning
CN112598137A (en) * 2020-12-21 2021-04-02 西北工业大学 Optimal decision method based on improved Q-learning
CN112884239A (en) * 2021-03-12 2021-06-01 重庆大学 Aerospace detonator production scheduling method based on deep reinforcement learning
CN112801430A (en) * 2021-04-13 2021-05-14 贝壳找房(北京)科技有限公司 Task issuing method and device, electronic equipment and readable storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
YONG SUN: "A trust-aware task allocation method using deep Q-learning for uncertain mobile crowdsourcing", Human-centric Computing and Information Sciences, pages 1-7, retrieved from the Internet *
ZHENG ZHAO: "Learning Task Allocation for Multiple Flows in Multi-Agent Systems", 2009 International Conference on Communication Software and Networks, pages 1-9 *
倪志伟: "Spatial crowdsourcing task allocation strategy based on deep reinforcement learning", Pattern Recognition and Artificial Intelligence, pages 191-205 *
张雷: "A reputation-based reconnection strategy in distributed task allocation", Journal of Guangxi University (Natural Science Edition), pages 645-648 *
杨萍; 毕义明; 刘卫东: "A decision model for maneuvering agents based on fuzzy Markov theory", Systems Engineering and Electronics, no. 03, pages 1-5 *
洋葱YCY: "Q-learning: understanding, implementation, and dynamic-allocation applications (part 1)", pages 1-5, retrieved from the Internet: https://blog.csdn.net/ycy0706/article/details/84655242 *
郑晓杰: "Cloudlet-based load distribution and resource allocation for mobile cloud platforms", China Masters' Theses Full-text Database (Information Science and Technology), pages 139-142 *

Also Published As

Publication number Publication date
CN113377655B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
Liu et al. Assessing optimal assignment under uncertainty: An interval-based algorithm
CN112131786B (en) Target detection and distribution method and device based on multi-agent reinforcement learning
CN104408518B (en) Based on the neural network learning optimization method of particle swarm optimization algorithm
CN109241674A (en) A kind of multi-time Delay method for analyzing stability of intelligent network connection platooning
CN110653824B (en) Method for characterizing and generalizing discrete trajectory of robot based on probability model
CN109657868A (en) A kind of probabilistic programming recognition methods of task sequential logic constraint
CN109940614A (en) A kind of quick motion planning method of the more scenes of mechanical arm merging memory mechanism
CN110826244A (en) Conjugate gradient cellular automata method for simulating influence of rail transit on urban growth
CN116361697A (en) Learner learning state prediction method based on heterogeneous graph neural network model
CN115599779A (en) Urban road traffic missing data interpolation method and related equipment
CN114819068A (en) Hybrid target track prediction method and system
CN109961129A (en) A kind of Ocean stationary targets search scheme generation method based on improvement population
CN111192158A (en) Transformer substation daily load curve similarity matching method based on deep learning
Zhang et al. A method for linguistic multiple attribute decision making based on TODIM
CN108153519B (en) Universal design framework for target intelligent tracking method
Syberfeldt et al. Multi-objective evolutionary simulation-optimisation of a real-world manufacturing problem
Li et al. Differentiable bootstrap particle filters for regime-switching models
Dwivedi et al. A comparison of particle swarm optimization (PSO) and genetic algorithm (GA) in second order design (SOD) of GPS networks
CN115691140B (en) Analysis and prediction method for space-time distribution of automobile charging demand
CN113377655A (en) MAS-Q-learning-based task allocation method
CN113379063B (en) Whole-flow task time sequence intelligent decision-making method based on online reinforcement learning model
Zhan et al. Dueling network architecture for multi-agent deep deterministic policy gradient
CN114372418A (en) Wind power space-time situation description model establishing method
CN112633591B (en) Space searching method and device based on deep reinforcement learning
Hossain et al. Efficient learning of voltage control strategies via model-based deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant