CN113377655B - Task allocation method based on MAS-Q-learning - Google Patents
- Publication number
- CN113377655B (application CN202110664158.9A)
- Authority
- CN
- China
- Prior art keywords
- agent
- intelligent
- state
- decision
- intelligent agent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/3668—Software testing
- G06F11/3672—Test management
- G06F11/3684—Test management for test design, e.g. generating new test cases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a task allocation method based on MAS-Q-learning. User data are acquired in a real application scenario and modeled as a Markov decision process; crowdsourcing workers are modeled as an agent quintuple, and their global benefit is calculated by a Q-value learning method. The states of neighboring agents and the next state are located, the association relations among agent members are described with a Laplacian matrix, a multi-attribute decision method is applied, and the calculation results are weighted and aggregated. The action-value function is estimated by a temporal-difference method, and an agent state function satisfying the rationality and integrity conditions is provided. The invention has good robustness and adaptability.
Description
Technical Field
The invention relates to the field of task allocation and is mainly applied to crowdsourcing scenarios, in particular to the cost optimization problem of complex task allocation in such scenarios.
Background
The motivation for the invention comes from the emerging practice of crowdsourced software testing. In a typical crowdsourcing process, task allocation is unclear and crowdsourcing workers cannot obtain the maximum personal benefit.
Disclosure of Invention
The invention aims to: in order to avoid problems in the crowdsourcing process such as unclear task allocation and crowdsourcing workers failing to obtain their personal benefit, the invention provides a task allocation method based on MAS-Q-learning. The robustness of the model is improved by Q-value learning together with a designed knowledge sharing mechanism; allowing partial knowledge sharing among agents exploits their interaction characteristics and improves the scalability of the solution, since most agents are similar to one another and are influenced by the collective state of the agents. Secondly, training and solving are performed on small-sample data: the data are trained in a semi-supervised manner and the uncertainty region is modeled. The model also exploits the symmetry of a large multi-agent system to convert task allocation into a difference-of-convex programming problem, improving the convergence of the algorithm. Finally, to verify the algorithm, a simulator developed for multi-agent systems applies transfer learning to the task allocation problem and the hill-climbing problem, and multi-agent systems and environments of different scales are tested, showing that the algorithm learns Q values better than conventional multi-agent approaches.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a task allocation method based on MAS-Q-learning comprises the following steps:
step 1, data acquisition: user data in a real application scenario is acquired, the user data comprising user generated data having a set of states, an action function, a selection probability and a reward function.
Step 2, data preprocessing: the user data obtained in step 1 are modeled as a Markov decision process, the capability data of crowdsourcing workers for different types of tasks are normalized, the crowdsourcing workers are modeled as an agent quintuple, and the global benefit of the agents is calculated by a Q-value learning method.
Step 3, state transition: the state of each neighboring agent and the next state are located, so that the target estimated state of the neighboring agent assists the agent's own state transition. The neighbor nodes are located, and the calculation uses distance observations together with the information transmitted by the neighbor nodes.
Step 4, modeling the multi-agent system: the Laplacian matrix is used to describe the association relations among agent members. The purpose is to construct a mechanism for information interaction among the member agents of the multi-agent system, together with a corresponding topology model, so as to reduce the difficulty of solving the complex problem.
In step 4, the multi-agent system is modeled as follows:
Step 4a), the multi-agent system comprises two or more agents; the topology of the agent system is represented by a graph, from which the dynamic equation and the edge-state definition of a single agent are computed.
Step 4b), the dynamic equation of a single agent is updated, and the corresponding incidence and degree matrices are computed; the Laplacian matrix is derived from them, an information feedback model is established, and the information interaction feedback of the agents is obtained.
Step 4c), once the information feedback model among the agents in the multi-agent system is obtained, model reduction is performed on the multi-agent system, and the solution complexity is reduced based on the spanning-tree subgraph structure. A linear transformation is applied with respect to the spanning tree to obtain the co-tree (the remaining, non-tree edges), which serves as an internal feedback term of the multi-agent system, finally yielding the reduced multi-agent system model.
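As an illustration of the modeling in step 4, a minimal Python sketch of building the adjacency, degree and Laplacian matrices of an agent interaction graph is shown below; the undirected-graph assumption, the edge list and the function name are assumptions made for the sketch and are not taken from the disclosure.

```python
import numpy as np

def laplacian_from_edges(n_agents, edges):
    """Build adjacency, degree and Laplacian matrices for an undirected
    agent-association graph given as a list of (i, j) index pairs."""
    A = np.zeros((n_agents, n_agents))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0      # agents i and j exchange information
    D = np.diag(A.sum(axis=1))       # degree matrix
    L = D - A                        # graph Laplacian
    return A, D, L

# Example topology: 4 agents connected in a line (already a spanning tree)
A, D, L = laplacian_from_edges(4, [(0, 1), (1, 2), (2, 3)])
# A single zero eigenvalue indicates the information-feedback graph is connected.
print(np.round(np.linalg.eigvalsh(L), 6))
```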
Step 5, multi-attribute decision stage: first, a decision matrix is given, it is judged whether the attribute weights are known, and the weights are determined. An aggregation operator for the attribute matrix is obtained from the attribute values of the decision matrix, and a suitable multi-attribute decision method is selected for the calculation according to the solution target and the form of the decision matrix. The calculation results are weighted and aggregated, and the decision is made according to the final score of each scheme.
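For illustration only, the sketch below shows one possible realization of step 5: min-max normalization of a decision matrix, weight distribution and weighted-sum aggregation into one score per scheme; the normalization scheme, the example attribute values and the weight vector are assumptions, not the concrete decision method of the invention.

```python
import numpy as np

def weighted_score(decision_matrix, weights, benefit_mask):
    """Normalize a decision matrix (schemes x attributes), apply attribute
    weights and aggregate each scheme into a single score."""
    X = np.asarray(decision_matrix, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    N = (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # min-max normalization
    N = np.where(benefit_mask, N, 1.0 - N)           # invert cost attributes
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # weight distribution
    return N @ w                                     # weighted-sum aggregation

# Three allocation schemes scored on cost, capability match and expected reward
scores = weighted_score(
    [[3.0, 0.8, 12.0],
     [5.0, 0.9, 15.0],
     [2.0, 0.6, 10.0]],
    weights=[0.3, 0.4, 0.3],
    benefit_mask=[False, True, True],  # cost is a "smaller is better" attribute
)
print(scores.argmax())                 # index of the scheme selected by the decision
```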
Step 6, method optimization stage: the action-value function is estimated by a temporal-difference method, and an agent state function satisfying the rationality and integrity conditions is provided.
Preferably, the data preprocessing method in step 2 is as follows:
Step 2a), the crowdsourcing workers are modeled as an agent quintuple <S, A, P, γ, R>, where S is the state, A is the action function, P is the selection probability, γ is the discount factor with γ ∈ (0, 1), and R is the reward function.
Step 2b), at a time t the agent is in state S_t, selects a policy from the policy set and generates an action function A_t, and then transitions with probability p_t to the next state S_{t+1}; iterating in this way and traversing the states yields the global benefit of the agent.
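The following Python sketch illustrates how the discounted return, i.e. one sample of the agent's global benefit over a traversal of states, could be accumulated for the quintuple <S, A, P, γ, R> of steps 2a) and 2b); the callable interfaces and the toy two-state crowdsourcing example are assumptions made for illustration rather than the concrete model.

```python
import random

def episode_return(policy, transition, reward, gamma, s0, horizon):
    """Roll out one episode of the <S, A, P, gamma, R> model and accumulate
    the discounted reward (one sample of the agent's global benefit)."""
    s, g, discount = s0, 0.0, 1.0
    for t in range(horizon):
        a = policy(s)                          # action A_t selected in state S_t
        s_next = transition(s, a)              # next state drawn via the selection probability P
        g += discount * reward(s, a, s_next)   # reward R, discounted by gamma^t
        discount *= gamma
        s = s_next
    return g

# Toy two-state example: a crowdsourcing worker is either "idle" or "busy".
policy = lambda s: random.choice(["accept", "skip"])
transition = lambda s, a: "busy" if a == "accept" else "idle"
reward = lambda s, a, s2: 1.0 if s2 == "busy" else 0.0
print(episode_return(policy, transition, reward, 0.9, "idle", horizon=20))
```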
Preferably, the state transition method in step 3 is as follows:
Step 3a), the Euclidean distance of the agent relative to its neighboring agents is first derived to obtain the relative estimated position of agent j in the local coordinate system of agent i, yielding the distance observation.
Step 3b), the neighbor node is located using the distance observation obtained in step 3a) and the information transmitted by the neighbor node.
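As a sketch of the localization in step 3, the position of an agent can be estimated by least squares from its distance observations and the positions transmitted by its neighbors; the linearized trilateration below is one possible realization, and the neighbor coordinates and distances are illustrative assumptions.

```python
import numpy as np

def localize_from_neighbors(neighbor_positions, distances):
    """Least-squares position estimate of an agent from the positions
    transmitted by its neighbors and the corresponding Euclidean distance
    observations (linearized against the first neighbor)."""
    P = np.asarray(neighbor_positions, dtype=float)   # one neighbor per row
    d = np.asarray(distances, dtype=float)
    A = 2.0 * (P[1:] - P[0])
    b = (np.sum(P[1:] ** 2, axis=1) - np.sum(P[0] ** 2)
         - (d[1:] ** 2 - d[0] ** 2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Agent i hears three neighbors and measures its distance to each of them
est = localize_from_neighbors([[0, 0], [4, 0], [0, 3]], [2.5, 2.8, 2.1])
print(est)   # estimated position in the shared coordinate frame
```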
Preferably, the method of step 6 is as follows: the Markov decision process problem is solved when the transition probability model is unknown. Given the state S, action A, reward function r and transition probability p, the Markov property is p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), where s_t denotes the state at time t and a_t denotes the action at time t. The optimization goal of the model is to maximize the expected cumulative reward with a_t ~ π(·|s_t), t = 0, …, T-1, where π denotes the policy and π(·|s_t) denotes the action probability distribution in state s_t. A reinforcement learning method is used to solve the Markov decision process problem when p(s_{t+1} | s_t, a_t) is unknown, and the action-value function is estimated by a temporal-difference method.
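A minimal sketch of the temporal-difference estimation of the action-value function when p(s_{t+1} | s_t, a_t) is unknown is given below, using a tabular Q-learning update; the tabular representation, the learning rate α and the ε-greedy selection are assumptions made for the sketch.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # tabular action-value function Q(s, a)
actions = ["accept", "skip"]

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One temporal-difference (Q-learning) update, applicable when only
    sampled transitions are available and p(s_{t+1} | s_t, a_t) is unknown."""
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

def choose(s, eps=0.1):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```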
preferably: the agent status meeting the integrity condition includes all information needed for agent decision making.
Preferably, discrete or continuous action values are designed for the agent's actions according to the numerical characteristics of the applied control quantity.
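For illustration, the sketch below shows one way to design discrete or continuous action values from the numeric range of the applied control quantity; the helper name and the example ranges are assumptions.

```python
import numpy as np

def make_action_space(low, high, levels=None):
    """Design action values from the numeric range of the control quantity:
    a discrete grid when `levels` is given, otherwise a clip function that
    keeps continuous actions inside [low, high]."""
    if levels is not None:
        return np.linspace(low, high, levels)       # discrete action values
    return lambda a: float(np.clip(a, low, high))   # continuous action values

discrete_actions = make_action_space(0.0, 1.0, levels=5)  # e.g. workload fractions
clip_action = make_action_space(-1.0, 1.0)                # e.g. a continuous control signal
print(discrete_actions, clip_action(1.7))
```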
Compared with the prior art, the invention has the following beneficial effects:
the invention establishes a multi-person model based on a single person decision method. Aiming at the specificity of crowd test environments, the invention designs a multi-attribute decision mechanism in the crowd test process. The invention selects Q value learning as a training algorithm and optimizes the design of an imperfect information sharing mechanism. Through different imperfect information sharing scenes and different gamma values and data sets, the training results are analyzed, and the system designed by the invention has good robustness and adaptability, and the method and the model provided by the invention have certain applicability. Has reference value for future research in the related field. The method has strong practicability and is suitable for all crowdsourcing system systems.
Drawings
FIG. 1 is a flow chart of the overall method of the present invention;
FIG. 2 shows the crowdsourced testing process used in the present invention.
FIG. 3 shows the research framework of the multi-agent collaborative behavior decision-making model used in the present invention.
Detailed Description
The present invention is further illustrated by the accompanying drawings and the following detailed description, which are to be understood as merely illustrative of the invention and not limiting of its scope; after reading the invention, various equivalent modifications made by those skilled in the art will fall within the scope of the appended claims.
A task allocation method based on MAS-Q-learning, as shown in figures 1-3, comprises the following steps:
Step 1, data acquisition: user data in a real application scenario are acquired; the user data comprise user-generated data with a state set, an action function, a selection probability and a reward function, and none of these four types of data may be missing.
Step 2, data preprocessing: the user data obtained in step 1 are modeled as a Markov decision process, the capability data of crowdsourcing workers for different types of tasks are normalized, the crowdsourcing workers are modeled as an agent quintuple, and the global benefit of the agents is calculated by a Q-value learning method.
The data preprocessing method in step 2 is as follows:
Step 2a), the crowdsourcing workers are modeled as an agent quintuple <S, A, P, γ, R>, where S is the state, A is the action function, P is the selection probability, γ is the discount factor with γ ∈ (0, 1), and R is the reward function.
Step 2b), at a time t the agent is in state S_t, selects a policy from the policy set and generates an action function A_t, and then transitions with probability p_t to the next state S_{t+1}; iterating in this way and traversing the states yields the global benefit of the agent.
Step 3, state transition: the state of each neighboring agent and the next state are located, so that the target estimated state of the neighboring agent assists the agent's own state transition. The neighbor nodes are located, and the calculation uses distance observations together with the information transmitted by the neighbor nodes.
Step 3a), the Euclidean distance of the agent relative to its neighboring agents is first derived to obtain the relative estimated position of agent j in the local coordinate system of agent i, yielding the distance observation.
Step 3b), the neighbor node is located using the distance observation obtained in step 3a) and the information transmitted by the neighbor node.
Step 4, modeling the multi-agent system: the invention adopts the Laplacian matrix to describe the association relations among agent members, aiming to construct a mechanism for information interaction among the member agents of the multi-agent system and a corresponding topology model, thereby reducing the difficulty of solving complex problems.
In step 4, the multi-agent system is modeled as follows:
Step 4a), the multi-agent system comprises two or more agents; the topology of the agent system is represented by a graph, from which the dynamic equation and the edge-state definition of a single agent are computed.
Step 4b), the dynamic equation of a single agent is updated, and the corresponding incidence and degree matrices are computed; the Laplacian matrix is derived from them, an information feedback model is established, and the information interaction feedback of the agents is obtained.
Step 4c), once the information feedback model among the agents in the multi-agent system is obtained, model reduction is performed on the multi-agent system, and the solution complexity is reduced based on the spanning-tree subgraph structure. A linear transformation is applied with respect to the spanning tree to obtain the co-tree (the remaining, non-tree edges), which serves as an internal feedback term of the multi-agent system, finally yielding the reduced multi-agent system model.
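As an illustration of the spanning-tree based reduction in step 4c), the following sketch splits the interaction graph into a spanning tree and the remaining co-tree edges, which would act as internal feedback terms of the reduced model; the union-find construction and the four-agent cycle example are assumptions made for the sketch.

```python
def spanning_tree_and_cotree(n_agents, edges):
    """Split the interaction graph into spanning-tree edges and the remaining
    (co-tree) edges using a union-find structure."""
    parent = list(range(n_agents))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    tree, cotree = [], []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))      # edge kept in the spanning tree
        else:
            cotree.append((i, j))    # redundant edge -> internal feedback term
    return tree, cotree

tree, cotree = spanning_tree_and_cotree(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(tree, cotree)   # three tree edges and one co-tree edge for the 4-cycle
```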
Step 5, multi-attribute decision stage: first, a decision matrix is given, it is judged whether the attribute weights are known, and the weights are determined. An aggregation operator for the attribute matrix is obtained from the attribute values of the decision matrix, and a suitable multi-attribute decision method is selected for the calculation according to the solution target and the form of the decision matrix. The calculation results are weighted and aggregated, and the decision is made according to the final score of each scheme.
Step 6, method optimization stage: the action-value function is estimated by a temporal-difference method, and an agent state function satisfying the rationality and integrity conditions is provided.
The method of step 6 is as follows: the Markov decision process problem is solved when the transition probability model is unknown. Given the state S, action A, reward function r and transition probability p, the Markov property is p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), where s_t denotes the state at time t and a_t denotes the action at time t. The optimization goal of the model is to maximize the expected cumulative reward with a_t ~ π(·|s_t), t = 0, …, T-1. A reinforcement learning method is used to solve the Markov decision process problem when p(s_{t+1} | s_t, a_t) is unknown, and the action-value function is estimated by a temporal-difference method. Under this research framework, the agent state is designed to satisfy conditions such as rationality and integrity. The integrity condition requires the state to contain all information needed for the agent's decision; for example, in the agent's trajectory-tracking problem, trend information of the target track must be added, but if this information cannot be observed, the extended state contains historical observations instead. The agent's actions are designed as discrete or continuous action values according to the numerical characteristics of the applied control quantity.
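To illustrate an extended state that satisfies the integrity condition by carrying historical observations, a small Python sketch is given below; the window length k and the observation format are assumptions.

```python
from collections import deque

class ExtendedState:
    """State that stacks the last k observations so that unobservable trend
    information (e.g. of a tracked target) is recoverable from history."""
    def __init__(self, k):
        self.history = deque(maxlen=k)

    def update(self, observation):
        self.history.append(observation)
        return tuple(self.history)   # the state handed to the decision policy

state = ExtendedState(k=3)
for obs in [(0.0, 1.0), (0.5, 1.1), (1.0, 1.3)]:
    s = state.update(obs)
print(s)   # the last three observations; their differences carry the trend
```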
In practical deployment the method is not one-size-fits-all: it must be adjusted according to the user's data, such as the decision set and the action set.
In summary, the invention designs a multi-attribute decision mechanism for the crowdsourced testing process. Q-value learning is chosen as the training algorithm, and the design of the imperfect information sharing mechanism is optimized. Training results are analyzed under different imperfect information sharing scenarios and different γ values and data sets; experiments show that the method converges by about the 50th round, the algorithm has an advantage in convergence speed and stability, and the designed system shows good robustness and adaptability, while the proposed method and model have a certain applicability and provide a reference for future research in the related field.
The foregoing is only a preferred embodiment of the invention. It should be noted that various modifications and adaptations apparent to those skilled in the art can be made without departing from the principles of the present invention, and such modifications and adaptations fall within the scope of the invention.
Claims (4)
1. A MAS-Q-learning-based task allocation method, characterized by comprising the following steps:
step 1, data acquisition: acquiring user data in a real application scenario, the user data comprising user-generated data with a state set, an action function, a selection probability and a reward function;
step 2, data preprocessing: modeling the user data obtained in step 1 as a Markov decision process, normalizing the capability data of crowdsourcing workers for different types of tasks, modeling the crowdsourcing workers as an agent quintuple, and calculating their global benefit by a Q-value learning method;
the data preprocessing method in step 2 is as follows:
step 2a), modeling the crowdsourcing workers as an agent quintuple <S, A, P, γ, R>, where S is the state, A is the action function, P is the selection probability, γ is the discount factor with γ ∈ (0, 1), and R is the reward function;
step 2b), when at a time t the agent is in state S_t, selecting a policy from the policy set and generating an action function A_t, then transitioning with probability p_t to the next state S_{t+1}, and so on; after traversing the states, the global benefit of the agent is obtained;
step 3, state transition: locating the state of each neighboring agent and the next state, so that the target estimated state of the neighboring agent assists the agent's own state transition; the neighbor nodes are located and the calculation is performed using distance observations and the information transmitted by the neighbor nodes;
step 4, multi-agent system modeling: describing the association relations among agent members with a Laplacian matrix, in order to construct a mechanism for information interaction among the member agents of the multi-agent system and a corresponding topology model, thereby reducing the difficulty of solving the complex problem;
the multi-agent system modeling method comprises the following steps:
step 4a), the multi-agent system comprises two or more agents; the topology of the agent system is represented by a graph, from which the dynamic equation and the edge-state definition of a single agent are computed;
step 4b), updating the dynamic equation of a single agent, computing the corresponding incidence and degree matrices, deriving the Laplacian matrix therefrom, establishing an information feedback model, and thereby obtaining the information interaction feedback of the agents;
step 4c), after the information feedback model among the agents in the multi-agent system is obtained, performing model reduction on the multi-agent system and reducing the solution complexity based on the spanning-tree subgraph structure; applying a linear transformation with respect to the spanning tree to obtain the co-tree, which serves as an internal feedback term of the multi-agent system, finally yielding the reduced multi-agent system model;
step 5, multi-attribute decision stage: first giving a decision matrix, judging whether the attribute weights are known and determining the weights, obtaining an aggregation operator of the attribute matrix from the attribute values of the decision matrix, selecting a suitable multi-attribute decision method for the calculation according to the solution target and the form of the decision matrix, weighting and aggregating the calculation results, and making the decision according to the score of each scheme;
the multi-attribute decision stage method is as follows: solving the Markov decision process problem when the transition probability model is unknown; given the state S, action A, reward function r and transition probability p, the Markov property is p(s_{t+1} | s_0, a_0, …, s_t, a_t) = p(s_{t+1} | s_t, a_t), where s_t denotes the state at time t and a_t denotes the action at time t; the optimization goal of the model is to maximize the expected cumulative reward with a_t ~ π(·|s_t), where π denotes the policy and π(·|s_t) denotes the action probability distribution in state s_t; a reinforcement learning method is used to solve the Markov decision process problem when p(s_{t+1} | s_t, a_t) is unknown, and the action-value function is estimated by a temporal-difference method;
step 6, method optimization stage: estimating the action-value function by a temporal-difference method, and simultaneously providing an agent state function satisfying the rationality and integrity conditions.
2. The MAS-Q-learning based task allocation method according to claim 1, wherein the state transition method in step 3 is as follows:
step 3a), first deriving the Euclidean distance of the agent relative to its neighboring agents to obtain the relative estimated position of agent j in the local coordinate system of agent i, thereby obtaining the distance observation;
step 3b), locating the neighbor node using the distance observation obtained in step 3a) and the information transmitted by the neighbor node.
3. The MAS-Q-learning based task allocation method according to claim 2, wherein the agent state satisfying the integrity condition contains all information needed for the agent's decision.
4. The MAS-Q-learning based task allocation method according to claim 3, wherein discrete or continuous action values are designed for the agent's actions according to the numerical characteristics of the applied control quantity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110664158.9A CN113377655B (en) | 2021-06-16 | 2021-06-16 | Task allocation method based on MAS-Q-learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110664158.9A CN113377655B (en) | 2021-06-16 | 2021-06-16 | Task allocation method based on MAS-Q-learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113377655A CN113377655A (en) | 2021-09-10 |
CN113377655B true CN113377655B (en) | 2023-06-20 |
Family
ID=77574510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110664158.9A Active CN113377655B (en) | 2021-06-16 | 2021-06-16 | Task allocation method based on MAS-Q-learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113377655B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409739A (en) * | 2018-10-19 | 2019-03-01 | 南京大学 | A kind of crowdsourcing platform method for allocating tasks based on part Observable markov decision process |
WO2020092437A1 (en) * | 2018-10-29 | 2020-05-07 | Google Llc | Determining control policies by minimizing the impact of delusion |
CN111695690A (en) * | 2020-07-30 | 2020-09-22 | 航天欧华信息技术有限公司 | Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning |
CN111770454A (en) * | 2020-07-03 | 2020-10-13 | 南京工业大学 | Game method for position privacy protection and platform task allocation in mobile crowd sensing |
CN111860649A (en) * | 2020-07-21 | 2020-10-30 | 赵佳 | Action set output method and system based on multi-agent reinforcement learning |
CN112121439A (en) * | 2020-08-21 | 2020-12-25 | 林瑞杰 | Cloud game engine intelligent optimization method and device based on reinforcement learning |
CN112598137A (en) * | 2020-12-21 | 2021-04-02 | 西北工业大学 | Optimal decision method based on improved Q-learning |
CN112801430A (en) * | 2021-04-13 | 2021-05-14 | 贝壳找房(北京)科技有限公司 | Task issuing method and device, electronic equipment and readable storage medium |
CN112884239A (en) * | 2021-03-12 | 2021-06-01 | 重庆大学 | Aerospace detonator production scheduling method based on deep reinforcement learning |
-
2021
- 2021-06-16 CN CN202110664158.9A patent/CN113377655B/en active Active
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409739A (en) * | 2018-10-19 | 2019-03-01 | 南京大学 | A kind of crowdsourcing platform method for allocating tasks based on part Observable markov decision process |
WO2020092437A1 (en) * | 2018-10-29 | 2020-05-07 | Google Llc | Determining control policies by minimizing the impact of delusion |
CN111770454A (en) * | 2020-07-03 | 2020-10-13 | 南京工业大学 | Game method for position privacy protection and platform task allocation in mobile crowd sensing |
CN111860649A (en) * | 2020-07-21 | 2020-10-30 | 赵佳 | Action set output method and system based on multi-agent reinforcement learning |
CN111695690A (en) * | 2020-07-30 | 2020-09-22 | 航天欧华信息技术有限公司 | Multi-agent confrontation decision-making method based on cooperative reinforcement learning and transfer learning |
CN112121439A (en) * | 2020-08-21 | 2020-12-25 | 林瑞杰 | Cloud game engine intelligent optimization method and device based on reinforcement learning |
CN112598137A (en) * | 2020-12-21 | 2021-04-02 | 西北工业大学 | Optimal decision method based on improved Q-learning |
CN112884239A (en) * | 2021-03-12 | 2021-06-01 | 重庆大学 | Aerospace detonator production scheduling method based on deep reinforcement learning |
CN112801430A (en) * | 2021-04-13 | 2021-05-14 | 贝壳找房(北京)科技有限公司 | Task issuing method and device, electronic equipment and readable storage medium |
Non-Patent Citations (5)
Title |
---|
Learning Task Allocation for Multiple Flows in Multi-Agent Systems;Zheng Zhao;《2009 International Conference on Communication Software and Networks》;1-9 * |
A reputation-based reconnection strategy in distributed task allocation; Zhang Lei; Journal of Guangxi University (Natural Science Edition); 645-648 *
Cloudlet-based load distribution and resource allocation for mobile cloud platforms; Zheng Xiaojie; China Master's Theses Full-text Database, Information Science and Technology; I139-142 *
A decision model for maneuvering agents based on fuzzy Markov theory; Yang Ping; Bi Yiming; Liu Weidong; Systems Engineering and Electronics (No. 03); 1-5 *
Spatial crowdsourcing task allocation strategy based on deep reinforcement learning; Ni Zhiwei; Pattern Recognition and Artificial Intelligence; 191-205 *
Also Published As
Publication number | Publication date |
---|---|
CN113377655A (en) | 2021-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liu et al. | Assessing optimal assignment under uncertainty: An interval-based algorithm | |
CN104408518B (en) | Based on the neural network learning optimization method of particle swarm optimization algorithm | |
Tsiogkas et al. | Efficient multi-AUV cooperation using semantic knowledge representation for underwater archaeology missions | |
CN112633591B (en) | Space searching method and device based on deep reinforcement learning | |
Wei et al. | Multi-robot path planning for mobile sensing through deep reinforcement learning | |
CN116520281B (en) | DDPG-based extended target tracking optimization method and device | |
CN114861368B (en) | Construction method of railway longitudinal section design learning model based on near-end strategy | |
CN115599779A (en) | Urban road traffic missing data interpolation method and related equipment | |
CN116560409A (en) | Unmanned aerial vehicle cluster path planning simulation method based on MADDPG-R | |
Lee et al. | Sampling of pareto-optimal trajectories using progressive objective evaluation in multi-objective motion planning | |
Li et al. | Differentiable bootstrap particle filters for regime-switching models | |
Atashbar et al. | AI and macroeconomic modeling: Deep reinforcement learning in an RBC model | |
CN113379063B (en) | Whole-flow task time sequence intelligent decision-making method based on online reinforcement learning model | |
CN113377655B (en) | Task allocation method based on MAS-Q-learning | |
Zhang et al. | Mobile robot localization based on gradient propagation particle filter network | |
Yashin et al. | Assessment of Material and Intangible Motivation of Top Management in Regions Using Multipurpose Genetic Algorithm | |
Hu et al. | An experience aggregative reinforcement learning with multi-attribute decision-making for obstacle avoidance of wheeled mobile robot | |
Er et al. | A novel framework for automatic generation of fuzzy neural networks | |
Chen et al. | A New Decision-Making Process for Selecting Project Leader Based on Social Network and Knowledge Map. | |
Owda et al. | Using artificial neural network techniques for prediction of electric energy consumption | |
Qiu et al. | Evaluation criterion for different methods of multiple-attribute group decision making with interval-valued intuitionistic fuzzy information | |
Wawrzynczak et al. | Feedforward neural networks in forecasting the spatial distribution of the time-dependent multidimensional functions | |
Ahadi et al. | A new hybrid for software cost estimation using particle swarm optimization and differential evolution algorithms | |
Pedroso et al. | The sea exploration problem: Data-driven orienteering on a continuous surface | |
Du et al. | Multi-agent trajectory prediction based on graph neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||