CN113671827B - Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning

Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning

Info

Publication number
CN113671827B
CN113671827B (application CN202110820657.2A)
Authority
CN
China
Prior art keywords
worker
task
available
allocation
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110820657.2A
Other languages
Chinese (zh)
Other versions
CN113671827A (en)
Inventor
陈荣
刘岳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University
Priority to CN202110820657.2A
Publication of CN113671827A
Application granted
Publication of CN113671827B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04 Adaptive control systems involving the use of models or simulators
    • G05B13/042 Adaptive control systems in which a parameter or coefficient is automatically adjusted to optimise the performance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic bipartite graph allocation length decision method based on a recurrent neural network and reinforcement learning, which comprises the following steps. S1: judging whether a worker or a task arrives at the current moment; S2: if a worker arrives, adding it to the available worker set, and if a task arrives, adding it to the available task set; S3: acquiring the currently available worker set and available task set according to the time information; S4: reading the parameter information of the worker set and the task set; S5: inputting the worker parameters and the task parameters into a reinforcement learning network; S6: if the network decides not to allocate, jumping directly to S8, and if it decides to allocate, performing S7; S7: performing task allocation on the current available worker set and available task set with the Hungarian algorithm, and recording the allocation reward; S8: removing expired workers and tasks from the available worker set W and the available task set T, and recording the expiration penalty; S9: training the reinforcement learning network according to the obtained rewards and penalties, entering the next moment, and returning to S1 to wait for a newly arriving worker or task.

Description

Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning
Technical Field
The invention relates to the technical field of dynamic task allocation, and in particular to a dynamic bipartite graph allocation length decision method based on a recurrent neural network and reinforcement learning.
Background
The reason the allocation length must be decided during dynamic bipartite graph matching is as follows. In the dynamic setting, workers and tasks arrive and leave at random; tasks differ in difficulty and workers differ in ability, and the task completion rate is computed from worker ability and task difficulty. Greedily allocating as soon as a worker or task arrives makes it hard to maximize the overall completion rate, while waiting until workers and tasks reach a fixed number before allocating lets large numbers of workers and tasks expire, which also degrades the result.
The closest prior art is the RQL (Restricted Q-learning) method proposed by Wang Y. et al. [1] in 2019, which uses Q-learning to make dynamic decisions on the allocation length for dynamic bipartite graphs. Specifically, the rewards obtained by selecting different actions in the current worker-task state are used to continually update Q values via the Bellman equation, and the Q values are stored in a Q table; the action with the largest Q value in a given state is the current optimal action. The states and actions are defined as follows. The state would naturally comprise the times, difficulties, and abilities of the workers and tasks, but this makes the state space too large for the Q table to record, so the state space is reduced: the current numbers of workers and tasks form a pair that serves as the current state. The action is the target allocation length, and allocation is performed when the number of current workers and tasks reaches the target length. The allocation reward is the total completion rate obtained by allocating with the Hungarian algorithm. The state transition adds to the current state the numbers of workers and tasks arriving at the next moment.
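For concreteness, the tabular update described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the hyperparameters and the epsilon-greedy policy are assumptions, and the state is the reduced (worker count, task count) pair used by RQL.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed hyperparameters

# Q table keyed by (state, action): the state is the reduced pair
# (number of workers, number of tasks); the action is a target
# allocation length.
Q = defaultdict(float)

def choose_length(state, lengths):
    """Epsilon-greedy choice of the target allocation length."""
    if random.random() < EPSILON:
        return random.choice(lengths)
    return max(lengths, key=lambda a: Q[(state, a)])

def bellman_update(state, action, reward, next_state, lengths):
    """One Bellman update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in lengths)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```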
The allocation-length decision can be made with this prior method, but it has several drawbacks. Workers and tasks carry time constraints and expire once these are exceeded, yet few existing studies can be applied directly and effectively to dynamic bipartite graph matching. The reward calculation in the RQL method does not account for expired worker tasks; expirations should be penalized in the reward to prevent the allocation length from growing too large. Moreover, when only the numbers of workers and tasks are considered and there is no expiration penalty, the method tends to select longer allocation lengths within the range of selectable lengths, making dynamic decisions for different situations difficult.
In that method, the worker-ability and task-difficulty parameters are used only in the allocation stage, where the Hungarian algorithm runs after the decision to allocate has been made; in practice, however, worker ability and task difficulty strongly influence the allocation-length decision itself. Consider a case in which the workers and tasks are still far from expiring, worker ability is generally low, and task difficulty is generally high: the overall completion rate of an immediate allocation would be markedly lower, so the allocation length should be longer.
Disclosure of Invention
In view of the problems in the prior art, the invention discloses a dynamic bipartite graph allocation length decision method based on a recurrent neural network and reinforcement learning, which specifically comprises the following steps:
S1: judging whether a worker or a task arrives at the current moment; if not, ending directly; if so, entering S2;
S2: if the arriving entity is a worker, adding it to the available worker set W; if it is a task, adding it to the available task set T;
S3: acquiring the currently available worker set W = {w_1, w_2, ..., w_n} and the available task set T = {t_1, t_2, ..., t_m} according to the time information;
S4: reading parameter information of a worker set and a task set;
wherein the worker set parameters are P_W = {(a_w1, u_w1), (a_w2, u_w2), ..., (a_wn, u_wn)}, comprising every worker's ability and remaining-time urgency, the urgency being the ratio of the remaining time to the total duration, u_w = (e_w - c) / (e_w - s_w);
the task set parameters are P_T = {(d_t1, u_t1), (d_t2, u_t2), ..., (d_tm, u_tm)}, comprising every task's difficulty and remaining-time urgency, u_t = (e_t - c) / (e_t - s_t);
S5: inputting the worker parameters and the task parameters into a reinforcement learning network, outputting a distributed Q value and a non-distributed Q value by the reinforcement learning network, and selecting actions corresponding to the larger Q value to execute at the current moment;
S6: if the selected action is not to allocate, jumping directly to S8; if it is to allocate, performing S7;
S7: performing task allocation on the current available worker set W = {w_1, w_2, ..., w_n} and the available task set T = {t_1, t_2, ..., t_m} with the Hungarian algorithm, and recording the allocation reward;
S8: removing expired workers and tasks from the available worker set W and the available task set T, and recording the expiration penalty;
S9: training the reinforcement learning network according to the obtained rewards and penalties, entering the next moment, and returning to S1 to wait for a newly arriving worker or task.
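The S1-S9 loop can be summarized in code. The sketch below is illustrative only: `event_stream`, the `agent` object, and the helpers `encode_state` and `hungarian_assign` (both sketched later in this description) are assumed names, not the patent's implementation, and the allocation-cost term is omitted for brevity.

```python
def run_episode(event_stream, agent, horizon):
    """One pass over S1-S9. `event_stream` maps a moment c to the
    workers/tasks arriving then; `agent` is a DQN-style learner."""
    W, T = set(), set()
    for c in range(horizon):
        arrivals = event_stream.get(c, [])
        if not arrivals:                        # S1: nothing arrives, skip this moment
            continue
        for x in arrivals:                      # S2: route by entity kind
            (W if x.kind == "worker" else T).add(x)
        state = encode_state(W, T, c)           # S3-S4: read set parameters
        action = agent.act(state)               # S5: pick the larger of the two Q values
        reward = 0.0
        if action == 1:                         # S6/S7: allocate with the Hungarian algorithm
            pairs, reward = hungarian_assign(W, T, c)
            for w, t in pairs:
                W.discard(w)
                T.discard(t)
        expired = {x for x in W | T if c >= x.e}   # S8: remove and penalize expirations
        W -= expired
        T -= expired
        reward -= len(expired)
        agent.train(state, action, reward, encode_state(W, T, c + 1))  # S9
```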
Further, it is judged whether a new worker or a new crowdsourcing task arrives, and its information is read. A worker is described by a quadruple w = <l_w, s_w, e_w, a_w>, where l_w is the worker's position, s_w the worker's arrival time, e_w the worker's departure time, and a_w the worker's ability. A crowdsourcing task is described by a five-tuple t = <l_t, r_t, s_t, e_t, d_t>, where l_t is the task's position, s_t the task's start time, e_t the task's deadline, r_t the task's allocation range, and d_t the task's difficulty, which is also the reward for completing the task;
Let the current time be c. If s_w ≤ c < e_w, the worker is available at the current moment, and once c ≥ e_w the worker has expired and cannot be assigned. Similarly, if s_t ≤ c < e_t, the task is available at the current moment, and once c ≥ e_t the task has expired and cannot be assigned.
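A minimal Python rendering of the quadruple and five-tuple above, together with this availability test, might look as follows; the field names mirror the symbols, and the `kind` tag is an added convenience (an assumption used by the loop sketch above).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Worker:                # quadruple <l_w, s_w, e_w, a_w>
    l: tuple                 # position
    s: int                   # arrival time
    e: int                   # departure (expiry) time
    a: float                 # ability
    kind: str = "worker"

@dataclass(frozen=True)
class Task:                  # five-tuple <l_t, r_t, s_t, e_t, d_t>
    l: tuple                 # position
    r: float                 # allocation range
    s: int                   # start time
    e: int                   # deadline
    d: float                 # difficulty (also the reward for completion)
    kind: str = "task"

def available(x, c):
    """Available at moment c iff s <= c < e; expired once c >= e."""
    return x.s <= c < x.e
```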
At each moment it is judged whether a worker or a task arrives; if not, the process ends for that moment. If the arriving entity is a worker it is added to the available worker set, otherwise to the available task set. The current available worker set is denoted W = {w_1, w_2, ..., w_n} and the current available task set T = {t_1, t_2, ..., t_m};
The allocation length is decided at each moment: the reinforcement learning network decides, from the data of the available worker set and the available task set, whether to allocate at the current moment, the decision aiming to maximize the total completion rate while minimizing the number of expired worker tasks. If allocation is performed at the current moment, the Hungarian algorithm is used to allocate the current available worker and task sets so as to maximize the total completion rate;
A bipartite graph B = <T, W, E> is constructed from the worker and task sets, where E is the set of edges, each edge representing an assignable worker-task pair whose weight is the completion rate p_ij, computed from the worker's ability and the task's difficulty (the concrete formula appears in the source only as an image);
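The maximum-total-completion-rate matching on B can be computed with SciPy's Hungarian-algorithm implementation. Since the patent gives the p_ij formula only as an image, the completion rate below is a stand-in assumption, min(ability/difficulty, 1), used purely for illustration; the allocation-range constraint r_t could additionally mask infeasible pairs but is omitted here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_assign(W, T, c):
    """Match tasks to workers maximizing the summed completion rate.
    Returns the matched (worker, task) pairs and the total rate."""
    workers, tasks = list(W), list(T)
    if not workers or not tasks:
        return [], 0.0
    p = np.zeros((len(tasks), len(workers)))
    for i, t in enumerate(tasks):
        for j, w in enumerate(workers):
            p[i, j] = min(w.a / t.d, 1.0)      # assumed completion rate (d > 0)
    rows, cols = linear_sum_assignment(-p)      # negate to maximize
    pairs = [(workers[j], tasks[i]) for i, j in zip(rows, cols)]
    return pairs, float(p[rows, cols].sum())
```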
After the allocation is completed, the reward is recorded, and the expired worker tasks and the allocated worker tasks are removed before entering the next moment.
Further, the interactive environment of the reinforcement learning network is an MDP model comprising the environment state s ∈ S, the action a ∈ A selected by the network, the reward r ∈ R returned after an action is executed, and the transfer function T: S × A → S giving the state after an action is executed;
wherein the state s is defined as the currently available task set parameters P_T = {(d_t1, u_t1), ..., (d_tm, u_tm)} together with the available worker set parameters P_W = {(a_w1, u_w1), ..., (a_wn, u_wn)}, in which u_w and u_t denote the time urgency, calculated as the ratio of the remaining time to the total duration:
u_w = (e_w - c) / (e_w - s_w),  u_t = (e_t - c) / (e_t - s_t)
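A sketch of building this state from the two parameter sets, using the urgency formula just given (the helper name and record fields are the illustrative ones introduced earlier):

```python
def encode_state(W, T, c):
    """State = task-set parameters P_T plus worker-set parameters P_W:
    for each task the pair (difficulty, urgency) and for each worker
    the pair (ability, urgency), urgency being remaining time over
    total duration."""
    p_t = [(t.d, (t.e - c) / (t.e - t.s)) for t in T]
    p_w = [(w.a, (w.e - c) / (w.e - w.s)) for w in W]
    return p_t, p_w
```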
The action a is defined over 2 different actions: 1 indicates allocation and 0 indicates no allocation;
The reward r is calculated as
r = Σ_i Σ_j p_ij · x_ij - cost - Σ 1(expired)
i.e. the Hungarian allocation reward minus the allocation cost minus the expiration penalty, where x_ij is 1 if the Hungarian allocation assigns task i to worker j and 0 otherwise, cost is the allocation cost, and 1(expired) is the Boolean indicator of expiration, equal to 1 for each expired worker or task and 0 otherwise; if no allocation is performed, the Hungarian allocation reward and the allocation cost are both 0;
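The reward computation reads directly off this formula; a sketch under the same illustrative names, where `p` is the completion-rate matrix, `x` the 0/1 assignment matrix from the Hungarian step (or None when no allocation was made), and `n_expired` the count of just-expired workers and tasks:

```python
import numpy as np

def step_reward(p, x, cost, n_expired):
    """r = sum_ij p_ij * x_ij - cost - expiration penalty; with no
    allocation, the matching reward and the cost are both zero."""
    if x is None:
        return -float(n_expired)
    return float((np.asarray(p) * np.asarray(x)).sum()) - cost - n_expired
```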
The transfer function T gives the next state as the current worker and task sets minus the expired worker tasks, minus the allocated worker tasks, plus the workers or tasks arriving at the next moment; if no allocation is performed, the allocated worker task set is the empty set.
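As set arithmetic, the transfer function is simply the following (names illustrative; `assigned` is the empty set when the action was 'do not allocate'):

```python
def transition(W, T, assigned, expired, arriving):
    """Next state: current sets minus expired, minus assigned, plus
    the workers or tasks arriving at the next moment."""
    W2 = (W - expired - assigned) | {x for x in arriving if x.kind == "worker"}
    T2 = (T - expired - assigned) | {x for x in arriving if x.kind == "task"}
    return W2, T2
```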
With the above technical scheme, the dynamic bipartite graph allocation length decision method based on a recurrent neural network and reinforcement learning takes into account, when deciding the allocation length, the information related to the completion rate: the remaining time of workers and tasks, worker ability, and task difficulty. The decision therefore weighs all parameters jointly and, compared with prior methods that consider only the respective numbers of workers and tasks, obtains the best results under a variety of conditions. Second, a DQN-based reinforcement learning method is used with an LSTM preprocessing the data, which resolves the original problem that the state space was too large to train and store and therefore had to be reduced. Finally, the reward accounts for the expiration penalty: expired worker tasks are not only removed from the state but also penalized in the reward, so the method minimizes the number of expired tasks while maximizing the total completion rate. Previous methods reflected expired worker tasks only as a reduction in the number of workers and tasks, and therefore tended toward longer allocation lengths that are difficult to adapt to varied environments.
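The network of FIG. 2 is not reproduced here, but a plausible PyTorch sketch of the described architecture is given below: an LSTM preprocesses the variable-length worker and task parameter sequences, feeding a DQN head that outputs the two Q values. All layer sizes are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

class LstmDqn(nn.Module):
    """LSTM encoders summarize the worker and task parameter
    sequences; an MLP head emits Q(allocate) and Q(not allocate)."""
    def __init__(self, feat=2, hidden=64):
        super().__init__()
        self.worker_enc = nn.LSTM(feat, hidden, batch_first=True)
        self.task_enc = nn.LSTM(feat, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, workers, tasks):
        # workers: (batch, n, feat) of (ability, urgency) pairs;
        # tasks:   (batch, m, feat) of (difficulty, urgency) pairs.
        _, (hw, _) = self.worker_enc(workers)
        _, (ht, _) = self.task_enc(tasks)
        return self.head(torch.cat([hw[-1], ht[-1]], dim=-1))

# Example: Q values for 3 workers and 2 tasks (batch of 1).
net = LstmDqn()
q = net(torch.rand(1, 3, 2), torch.rand(1, 2, 2))   # shape (1, 2)
```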
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the network structure of the reinforcement learning module in the method of the present invention.
Detailed Description
In order to make the technical scheme and advantages of the present invention more clear, the technical scheme in the embodiment of the present invention is clearly and completely described below with reference to the accompanying drawings in the embodiment of the present invention:
the dynamic bipartite graph allocation length decision method based on the cyclic neural network and reinforcement learning shown in fig. 1 specifically comprises the following steps:
S1: judging whether a worker or a task arrives at the current moment; if not, ending directly; if so, entering S2;
S2: if the arriving entity is a worker, adding it to the available worker set W; if it is a task, adding it to the available task set T;
S3: acquiring the currently available worker set W = {w_1, w_2, ..., w_n} and the available task set T = {t_1, t_2, ..., t_m} according to the time information;
S4: reading parameter information of a worker set and a task set;
wherein the worker set parameters are P_W = {(a_w1, u_w1), ..., (a_wn, u_wn)}, comprising every worker's ability and remaining-time urgency;
the task set parameters are P_T = {(d_t1, u_t1), ..., (d_tm, u_tm)}, comprising every task's difficulty and remaining-time urgency;
S5: inputting the worker parameters and the task parameters into the reinforcement learning network, which outputs one Q value for allocating and one Q value for not allocating; the action corresponding to the larger Q value is executed at the current moment;
S6: if the selected action is not to allocate, jumping directly to S8; if it is to allocate, performing S7;
S7: performing task allocation on the current available worker set W = {w_1, w_2, ..., w_n} and the available task set T = {t_1, t_2, ..., t_m} with the Hungarian algorithm, and recording the allocation reward;
S8: removing expired workers and tasks from the available worker set W and the available task set T, and recording the expiration penalty;
S9: training the reinforcement learning network according to the obtained rewards and penalties, entering the next moment, and returning to S1 to wait for a newly arriving worker or task.
It is judged whether a new worker or a new crowdsourcing task arrives, and its information is read. A worker is described by a quadruple w = <l_w, s_w, e_w, a_w>, where l_w is the worker's position, s_w the worker's arrival time, e_w the worker's departure time, and a_w the worker's ability. A crowdsourcing task is described by a five-tuple t = <l_t, r_t, s_t, e_t, d_t>, where l_t is the task's position, s_t the task's start time, e_t the task's deadline, r_t the task's allocation range, and d_t the task's difficulty, which is also the reward for completing the task;
Let the current time be c. If s_w ≤ c < e_w, the worker is available at the current moment, and once c ≥ e_w the worker has expired and cannot be assigned; similarly, if s_t ≤ c < e_t, the task is available at the current moment, and once c ≥ e_t the task has expired and cannot be assigned;
At each moment it is judged whether a worker or a task arrives; if not, the process ends for that moment. If the arriving entity is a worker it is added to the available worker set, otherwise to the available task set. The current available worker set is denoted W = {w_1, w_2, ..., w_n} and the current available task set T = {t_1, t_2, ..., t_m};
The allocation length is decided at each moment: the reinforcement learning network decides, from the data of the available worker set and the available task set, whether to allocate at the current moment, the decision aiming to maximize the total completion rate while minimizing the number of expired worker tasks. If allocation is performed at the current moment, the Hungarian algorithm is used to allocate the current available worker and task sets so as to maximize the total completion rate;
A bipartite graph B = <T, W, E> is constructed from the worker and task sets, where E is the set of edges, each edge representing an assignable worker-task pair whose weight is the completion rate p_ij, computed from the worker's ability and the task's difficulty (the concrete formula appears in the source only as an image);
After the allocation is completed, the reward is recorded, and the expired worker tasks and the allocated worker tasks are removed before entering the next moment.
Further, the interactive environment of the reinforcement learning network is an MDP model comprising the environment state s ∈ S, the action a ∈ A selected by the network, the reward r ∈ R returned after an action is executed, and the transfer function T: S × A → S giving the state after an action is executed;
wherein the state s is defined as the currently available task set parameters P_T = {(d_t1, u_t1), ..., (d_tm, u_tm)} together with the available worker set parameters P_W = {(a_w1, u_w1), ..., (a_wn, u_wn)}, in which u_w and u_t denote the time urgency, calculated as the ratio of the remaining time to the total duration:
u_w = (e_w - c) / (e_w - s_w),  u_t = (e_t - c) / (e_t - s_t)
The action a is defined over 2 different actions: 1 indicates allocation and 0 indicates no allocation;
The reward r is calculated as
r = Σ_i Σ_j p_ij · x_ij - cost - Σ 1(expired)
i.e. the Hungarian allocation reward minus the allocation cost minus the expiration penalty, where x_ij is 1 if the Hungarian allocation assigns task i to worker j and 0 otherwise, cost is the allocation cost, and 1(expired) is the Boolean indicator of expiration, equal to 1 for each expired worker or task and 0 otherwise; if no allocation is performed, the Hungarian allocation reward and the allocation cost are both 0;
The transfer function T gives the next state as the current worker and task sets minus the expired worker tasks, minus the allocated worker tasks, plus the workers or tasks arriving at the next moment; if no allocation is performed, the allocated worker task set is the empty set.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention, according to the technical scheme of the invention and its inventive concept, shall be covered by the scope of protection of the present invention.

Claims (3)

1. A dynamic bipartite graph allocation length decision method based on a recurrent neural network and reinforcement learning, characterized by comprising the following steps:
S1: judging whether a worker or a task arrives at the current moment; if not, ending directly; if so, entering S2;
S2: if the arriving entity is a worker, adding it to the available worker set W; if it is a task, adding it to the available task set T;
S3: acquiring the currently available worker set W = {w_1, w_2, ..., w_n} and the available task set T = {t_1, t_2, ..., t_m} according to the time information;
S4: reading parameter information of a worker set and a task set;
wherein the worker set parameters are P_W = {(a_w1, u_w1), ..., (a_wn, u_wn)}, comprising every worker's ability and remaining-time urgency;
the task set parameters are P_T = {(d_t1, u_t1), ..., (d_tm, u_tm)}, comprising every task's difficulty and remaining-time urgency;
S5: inputting the worker parameters and the task parameters into the reinforcement learning network, which outputs one Q value for allocating and one Q value for not allocating; the action corresponding to the larger Q value is executed at the current moment;
S6: if the selected action is not to allocate, jumping directly to S8; if it is to allocate, performing S7;
S7: performing task allocation on the current available worker set W = {w_1, w_2, ..., w_n} and the available task set T = {t_1, t_2, ..., t_m} with the Hungarian algorithm, and recording the allocation reward;
S8: removing expired workers and tasks from the available worker set W and the available task set T, and recording the expiration penalty;
S9: training the reinforcement learning network according to the obtained rewards and penalties, entering the next moment, and returning to S1 to wait for a newly arriving worker or task.
2. The dynamic bipartite graph allocation length decision method according to claim 1, wherein: it is judged whether a new worker or a new crowdsourcing task arrives, and its information is read; a worker is described by a quadruple w = <l_w, s_w, e_w, a_w>, where l_w is the worker's position, s_w the worker's arrival time, e_w the worker's departure time, and a_w the worker's ability; a crowdsourcing task is described by a five-tuple t = <l_t, r_t, s_t, e_t, d_t>, where l_t is the task's position, s_t the task's start time, e_t the task's deadline, r_t the task's allocation range, and d_t the task's difficulty, which is also the reward for completing the task;
Let the current time be c; if s_w ≤ c < e_w, the worker is available at the current moment, and once c ≥ e_w the worker has expired and cannot be assigned; similarly, if s_t ≤ c < e_t, the task is available at the current moment, and once c ≥ e_t the task has expired and cannot be assigned;
At each moment it is judged whether a worker or a task arrives; if not, the process ends for that moment; if the arriving entity is a worker it is added to the available worker set, otherwise to the available task set; the current available worker set is denoted W = {w_1, w_2, ..., w_n} and the current available task set T = {t_1, t_2, ..., t_m};
The allocation length is decided at each moment: the reinforcement learning network decides, from the data of the available worker set and the available task set, whether to allocate at the current moment, the decision aiming to maximize the total completion rate while minimizing the number of expired worker tasks; if allocation is performed at the current moment, the Hungarian algorithm is used to allocate the current available worker and task sets so as to maximize the total completion rate;
A bipartite graph B = <T, W, E> is constructed from the worker and task sets, where E is the set of edges, each edge representing an assignable worker-task pair whose weight is the completion rate p_ij, computed from the worker's ability and the task's difficulty (the concrete formula appears in the source only as an image);
After the allocation is completed, the reward is recorded, and the expired worker tasks and the allocated worker tasks are removed before entering the next moment.
3. The dynamic bipartite graph allocation length decision method according to claim 1, wherein: the interactive environment of the reinforcement learning network is an MDP model comprising the environment state s ∈ S, the action a ∈ A selected by the network, the reward r ∈ R returned after an action is executed, and the transfer function T: S × A → S giving the state after an action is executed;
wherein the state s is defined as the currently available task set parameters P_T = {(d_t1, u_t1), ..., (d_tm, u_tm)} together with the available worker set parameters P_W = {(a_w1, u_w1), ..., (a_wn, u_wn)}, in which u_w and u_t denote the time urgency, calculated as
u_w = (e_w - c) / (e_w - s_w),  u_t = (e_t - c) / (e_t - s_t)
meaning the ratio of the remaining time to the total duration;
action
Figure FDA0003171875150000033
Defined as 2 different actions, 1 indicating allocation and 0 indicating no allocation;
The reward r is calculated as
r = Σ_i Σ_j p_ij · x_ij - cost - Σ 1(expired)
i.e. the Hungarian allocation reward minus the allocation cost minus the expiration penalty, where x_ij is 1 if the Hungarian allocation assigns task i to worker j and 0 otherwise, cost is the allocation cost, and 1(expired) is the Boolean indicator of expiration, equal to 1 for each expired worker or task and 0 otherwise; if no allocation is performed, the Hungarian allocation reward and the allocation cost are both 0;
The transfer function T gives the next state as the current worker and task sets minus the expired worker tasks, minus the allocated worker tasks, plus the workers or tasks arriving at the next moment; if no allocation is performed, the allocated worker task set is the empty set.
CN202110820657.2A 2021-07-20 2021-07-20 Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning Active CN113671827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110820657.2A CN113671827B (en) 2021-07-20 2021-07-20 Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110820657.2A CN113671827B (en) 2021-07-20 2021-07-20 Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning

Publications (2)

Publication Number Publication Date
CN113671827A CN113671827A (en) 2021-11-19
CN113671827B (en) 2023-06-27

Family

ID=78539669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110820657.2A Active CN113671827B (en) Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning

Country Status (1)

Country Link
CN (1) CN113671827B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458429A (en) * 2019-07-29 2019-11-15 暨南大学 Intelligent task allocation and personnel scheduling method and system for geographical sites
EP3806007A1 (en) * 2019-10-09 2021-04-14 Bayerische Motoren Werke Aktiengesellschaft Methods, computer programs and systems for assigning vehicles to vehicular tasks and for providing a machine-learning model
CN113093727A (en) * 2021-03-08 2021-07-09 哈尔滨工业大学(深圳) Robot map-free navigation method based on deep security reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11640528B2 (en) * 2019-10-22 2023-05-02 Baidu Usa Llc Method, electronic device and computer readable medium for information processing for accelerating neural network training

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458429A (en) * 2019-07-29 2019-11-15 暨南大学 Intelligent task allocation and personnel scheduling method and system for geographical sites
EP3806007A1 (en) * 2019-10-09 2021-04-14 Bayerische Motoren Werke Aktiengesellschaft Methods, computer programs and systems for assigning vehicles to vehicular tasks and for providing a machine-learning model
CN113093727A (en) * 2021-03-08 2021-07-09 哈尔滨工业大学(深圳) Robot map-free navigation method based on deep security reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Spatio-temporal crowdsourcing task allocation algorithm for global optimization; 聂茜婵; 张阳; 余敦辉; 张兴盛; Journal of Computer Applications (07); full text *

Also Published As

Publication number Publication date
CN113671827A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN109976909B (en) Learning-based low-delay task scheduling method in edge computing network
Han et al. Appointment scheduling and routing optimization of attended home delivery system with random customer behavior
CN114756358B (en) DAG task scheduling method, device, equipment and storage medium
CN112801430B (en) Task issuing method and device, electronic equipment and readable storage medium
Hildebrandt et al. Supervised learning for arrival time estimations in restaurant meal delivery
Gravin et al. Procrastination with variable present bias
Kolker Healthcare management engineering: what does this fancy term really mean?: The use of operations management methodology for quantitative decision-making in healthcare settings
CN115271130B (en) Dynamic scheduling method and system for maintenance order of ship main power equipment
CN111813524B (en) Task execution method and device, electronic equipment and storage medium
CN114546608A (en) Task scheduling method based on edge calculation
CN113671827B (en) Dynamic bipartite graph allocation length decision method based on recurrent neural network and reinforcement learning
CN110263136B (en) Method and device for pushing object to user based on reinforcement learning model
CN115562832A (en) Multi-resource service function chain scheduling method based on deep reinforcement learning
Jia et al. Dynamic container drayage with uncertain request arrival times and service time windows
CN117789945A (en) Depth reinforcement learning-based clinic service sequential scheduling decision method
WO2024011867A1 (en) Method and apparatus for optimizing scheduling table of a plurality of bus routes, and related device
CN116069473A (en) Deep reinforcement learning-based Yarn cluster workflow scheduling method
Ahn et al. Attitude driven team formation using multi-dimensional trust
CN112559144A (en) Intelligent sharing device scheduling method facing data and information rights and interests exchange
CN115409243A (en) Operation resource allocation and task scheduling method and terminal
CN117909348B (en) Associated data scheduling and calculating method and device
CN115907441B (en) Business process simulation method and system
CN117808386B (en) All-online AGV material distribution network order distribution method, equipment and medium
Xie Decentralized and Dynamic Home Health Care Resource Scheduling Using an Agent-Based Model
Kouvelis et al. Robust discrete optimization: past successes and future challenges

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant