CN108491255B - Self-service MapReduce data optimal distribution method and system - Google Patents

Info

Publication number
CN108491255B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810130531.0A
Other languages
Chinese (zh)
Other versions
CN108491255A (en
Inventor
崔鹏飞
田春华
史巨伟
李闯
刘家扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunlun Intellectual Exchange Data Technology Beijing Co ltd
Original Assignee
Kunlun Intellectual Exchange Data Technology Beijing Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration

Abstract

The invention provides a self-service MapReduce data optimal distribution method and system. In the method, a job analysis module receives a MapReduce job data packet sent by a client and parses it into tasks and job data parameters; a task queue forming module adds the tasks to a task queue according to a task scheduling strategy; a task execution history log recording module records the task execution history logs of a plurality of task execution modules so that a task allocation and scheduling module can read them in real time; the task allocation and scheduling module calculates an optimized task allocation scheme from the job data parameters and the task execution history logs, retrieves the tasks from the task queue according to that scheme, and sends them to the task execution modules; and the task execution modules execute the tasks and report their task execution history logs. The method and system optimize task scheduling according to the data block size of each task, the distribution of the data blocks across physical nodes, and the performance of each available node.

Description

Self-service MapReduce data optimal distribution method and system
Technical Field
The invention relates to the technical field of data optimized distribution, in particular to a self-service MapReduce data optimized distribution method and system.
Background
MapReduce is a programming model for parallel computation over large-scale data sets (larger than 1 TB). A MapReduce system is a distributed parallel system that processes data in a distributed manner through map and reduce phases. Task scheduling is a key process in executing a MapReduce job.
MapReduce systems have three mainstream task scheduling strategies: the Capacity Scheduler, the Fair Scheduler, and FIFO (first in, first out) queue scheduling. All three use a three-level scheduling scheme: each time a slot becomes idle, the scheduler selects one queue, then one job, then one task for it.
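The three-level selection described above can be sketched roughly as follows. This is an illustration only, not the code of any real scheduler: the `Job` and `Queue` classes and the `pick_task_for_slot` function are hypothetical names, and plain FIFO order is assumed at both the queue level and the job level.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Job:
    name: str
    pending_tasks: Deque[str] = field(default_factory=deque)


@dataclass
class Queue:
    name: str
    jobs: List[Job] = field(default_factory=list)


def pick_task_for_slot(queues: List[Queue]) -> Optional[str]:
    """Pick one queue, then one job, then one task for a single idle slot."""
    for queue in queues:                # level 1: choose a queue (first non-empty)
        for job in queue.jobs:          # level 2: choose a job (submission order)
            if job.pending_tasks:       # level 3: choose a task of that job
                return job.pending_tasks.popleft()
    return None                         # nothing left to run


queues = [
    Queue("default", [
        Job("job_a", deque(["a_map_0", "a_map_1"])),
        Job("job_b", deque(["b_map_0"])),
    ]),
]
picked = [pick_task_for_slot(queues) for _ in range(4)]
```

Each call models one idle slot asking for work; a capacity or fair scheduler would differ only in how levels 1 and 2 choose among queues and jobs.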
The schedulers use different policies at the queue and job levels but the same policy, the locality policy, at the task level. The locality policy cannot make full use of the capabilities of every node in the MapReduce system, so resources are wasted.
In the prior art, apart from the locality policy, other data in the MapReduce system are allocated randomly; the execution state of the available nodes is not recorded in real time, and no optimal allocation is computed between the available nodes and the tasks to be executed. As a result, the resources of the available nodes in the MapReduce system cannot be fully used, resources are wasted, and task execution efficiency is low.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a self-service MapReduce data optimized distribution method and system that overcomes or at least partially solves the above-mentioned problems.
One aspect of the invention provides a self-service MapReduce data optimal distribution method, comprising the following steps:
a job analysis module receives a MapReduce job data packet sent by a client, parses the MapReduce job data packet into tasks and job data parameters, and sends the tasks and the job data parameters to a task queue forming module and a task allocation and scheduling module, respectively;
the task queue forming module adds the tasks to a task queue according to a task scheduling strategy;
a task execution history log recording module records the task execution history logs of a plurality of task execution modules so that the task allocation and scheduling module can read them in real time;
the task allocation and scheduling module calculates an optimized task allocation scheme from the job data parameters and the task execution history logs, retrieves the tasks from the task queue according to that scheme, and sends them to the task execution modules;
the plurality of task execution modules execute the tasks and report task execution history logs to the task execution history log recording module.
The tasks in the task queue have priorities and corresponding data blocks, and the priorities are consistent with the priorities of the MapReduce job data packets.
The task execution module is a task execution node in the MapReduce system topology.
The task allocation and scheduling module stores MapReduce system topology information, which comprises the positions of all nodes and the connection relations among the nodes.
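One possible in-memory form of such topology information is an adjacency map, from which the distance between two nodes can be derived. The sketch below is an assumption for illustration: the node and switch names, the `TOPOLOGY` layout, and the `hop_count` helper are all hypothetical and do not come from the patent.

```python
from collections import deque
from typing import Dict, List

# Hypothetical topology: each entry maps a node to its directly connected neighbours.
TOPOLOGY: Dict[str, List[str]] = {
    "switch_1": ["node_1", "node_2", "switch_2"],
    "switch_2": ["node_3", "switch_1"],
    "node_1": ["switch_1"],
    "node_2": ["switch_1"],
    "node_3": ["switch_2"],
}


def hop_count(topology: Dict[str, List[str]], source: str, target: str) -> int:
    """Breadth-first search for the number of links between two nodes (-1 if unreachable)."""
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        current, dist = frontier.popleft()
        for neighbour in topology.get(current, []):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return -1


same_switch = hop_count(TOPOLOGY, "node_1", "node_2")
other_switch = hop_count(TOPOLOGY, "node_1", "node_3")
```

A hop count like this is only one proxy for transfer cost; measured transmission times from the history log would normally take precedence where they exist.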
The job data parameters include the size of each data block in the task and the position of the node where the data block is located.
The task scheduling strategies include capacity scheduling, fair scheduling, and first-in-first-out queue scheduling.
The task execution history log includes, for each task previously executed by a task execution module: the execution time, the data block size of the task, the data block position, the data transmission time of the data block between different nodes, and the data block attributes.
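A log entry with these fields, and one way of turning the log into time estimates, might look like the sketch below. The `TaskLogEntry` record and the helper functions are hypothetical, and the estimate assumes that time scales linearly with data block size, which the patent does not state.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class TaskLogEntry:
    node: str                 # node that executed the task
    execution_seconds: float  # execution time of the task
    block_size_mb: float      # data block size of the task
    block_location: str       # node where the data block was stored
    transfer_seconds: float   # time spent moving the block to the executing node
    block_attribute: str      # data block attribute (free-form in this sketch)


def seconds_per_mb(log: List[TaskLogEntry], node: str) -> float:
    """Average processing cost per MB observed on one node."""
    rates = [e.execution_seconds / e.block_size_mb
             for e in log if e.node == node and e.block_size_mb > 0]
    if not rates:
        raise ValueError(f"no history for {node}")
    return mean(rates)


def estimate_execution_seconds(log: List[TaskLogEntry], node: str,
                               block_size_mb: float) -> float:
    """Assumed linear model: estimated time = observed cost per MB * block size."""
    return seconds_per_mb(log, node) * block_size_mb


def transfer_seconds_per_mb(log: List[TaskLogEntry], source: str, target: str) -> float:
    """Average transfer cost per MB observed from `source` to `target`."""
    rates = [e.transfer_seconds / e.block_size_mb
             for e in log
             if e.block_location == source and e.node == target and e.block_size_mb > 0]
    if not rates:
        raise ValueError(f"no transfer history from {source} to {target}")
    return mean(rates)


history = [
    TaskLogEntry("node_1", 16.0, 128.0, "node_1", 0.0, "text"),
    TaskLogEntry("node_1", 32.0, 128.0, "node_2", 8.0, "text"),
    TaskLogEntry("node_2", 8.0, 128.0, "node_2", 0.0, "text"),
]
estimate_node_1 = estimate_execution_seconds(history, "node_1", 64.0)
estimate_node_2 = estimate_execution_seconds(history, "node_2", 64.0)
```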
The task allocation and scheduling module calculates the optimized task allocation scheme from the job data parameters and the task execution history log through the following steps:
S11, obtain the available task execution nodes node_1, node_2, …, node_m in the MapReduce system and the tasks to be executed task_1, task_2, …, task_n;
S12, let $s_{ij}$ be a decision variable, where $s_{ij} = 0$ or $s_{ij} = 1$, and $s_{ij} = 1$ denotes that task_i is executed on node_j, with $1 \le i \le n$ and $1 \le j \le n$, subject to the constraint $\sum_j s_{ij} = 1$, which means that one execution node can execute only one task at a time;
S13, the execution time of the data block of the i-th task on the j-th available task execution node is $t_{ij}$, and the transmission time of the data block of the i-th task to the j-th available task execution node is
[Figure BDA0001574784600000031: symbol given as an image in the original]
where the execution time and the transmission time are calculated from the task execution history log;
S14, the optimization objective is
[Figure BDA0001574784600000032: formula given as an image in the original]
that is, all tasks are completed on the available task execution nodes in the shortest time.
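To make steps S11 to S14 concrete, the sketch below solves a tiny instance by exhaustive search. Because the objective formula is available here only as an image reference, the code assumes one plausible reading: minimise the sum of execution time plus transmission time over all assigned task/node pairs, with every task placed on exactly one node and no node receiving more than one task. The function name `best_assignment` and the sample matrices are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Tuple


def best_assignment(exec_time: List[List[float]],
                    transfer_time: List[List[float]]
                    ) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Return (nodes, cost), where nodes[i] is the node index chosen for task i.

    Assumed objective: sum over tasks of exec_time[i][j] + transfer_time[i][j].
    Brute force over all placements, so only suitable for very small instances.
    """
    n = len(exec_time)        # number of tasks
    m = len(exec_time[0])     # number of available nodes
    if n > m:
        raise ValueError("this sketch assumes no more tasks than nodes")
    best_nodes, best_cost = None, float("inf")
    for nodes in permutations(range(m), n):   # each node used at most once
        cost = sum(exec_time[i][j] + transfer_time[i][j]
                   for i, j in enumerate(nodes))
        if cost < best_cost:
            best_nodes, best_cost = nodes, cost
    return best_nodes, best_cost


# Rows are tasks, columns are nodes; values are seconds.
exec_time = [[4, 2, 8],
             [3, 6, 5]]
transfer_time = [[0, 3, 1],
                 [2, 0, 0]]
nodes, cost = best_assignment(exec_time, transfer_time)
```

In this instance task 0 stays on node 0, where its data already resides, while task 1 goes to node 2. For realistic sizes an assignment solver such as the Hungarian algorithm would replace the enumeration.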
In another aspect of the present invention, a self-service MapReduce data optimization distribution system is provided, including:
a job analysis module, configured to receive a MapReduce job data packet sent by a client, parse the MapReduce job data packet into tasks and job data parameters, and send the tasks and the job data parameters to the task queue forming module and the task allocation and scheduling module, respectively;
a task queue forming module, configured to add the tasks to a task queue according to a task scheduling strategy;
a task execution history log recording module, configured to record the task execution history logs of a plurality of task execution modules so that the task allocation and scheduling module can read them in real time;
a task allocation and scheduling module, configured to calculate an optimized task allocation scheme from the job data parameters and the task execution history logs, retrieve the tasks from the task queue according to that scheme, and send them to the task execution modules; and
a plurality of task execution modules, configured to execute the tasks and report task execution history logs to the task execution history log recording module.
The self-service MapReduce data optimal distribution method and system optimize task scheduling according to the data block size of each task, the distribution of the data blocks across physical nodes, and the performance of each available node. They estimate each node's performance and the relationship between data block size and data movement from the results of multiple executions on each available node, that is, from the execution history. The method and system therefore consider not only task locality but also the computing performance of the nodes, which improves the success rate and efficiency of task execution.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart of the steps of a self-service MapReduce data optimal distribution method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the framework of a self-service MapReduce data optimal distribution system according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
FIG. 1 is a flowchart of the steps of a self-service MapReduce data optimal distribution method according to an embodiment of the present invention. A MapReduce system is a distributed parallel system that processes data in a distributed manner through map and reduce phases. Referring to FIG. 1, the self-service MapReduce data optimal distribution method provided by the embodiment of the invention comprises the following steps:
Step S1, the job analysis module receives the MapReduce job data packet sent by the client, parses the MapReduce job data packet into tasks and job data parameters, and sends the tasks and the job data parameters to the task queue forming module and the task allocation and scheduling module, respectively.
In practical applications, the job data parameters include the size of each data block in the task and the position of the node where the data block is located.
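Step S1 can be pictured with a small parsing routine. The patent does not define the layout of the job data packet, so the dictionary shape used here, along with the `Task`, `BlockParams`, and `parse_job` names, is purely an assumption for illustration; one map task per data block is also assumed.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    task_id: str
    priority: int   # inherited from the job data packet
    block_id: str   # data block this task processes


@dataclass
class BlockParams:
    size_mb: float  # data block size
    node: str       # node where the data block is located


def parse_job(job: dict) -> Tuple[List[Task], Dict[str, BlockParams]]:
    """Split a (hypothetical) job packet into tasks and job data parameters."""
    tasks: List[Task] = []
    params: Dict[str, BlockParams] = {}
    for index, block in enumerate(job["blocks"]):
        task_id = f'{job["job_id"]}_map_{index}'
        tasks.append(Task(task_id, job["priority"], block["id"]))
        params[block["id"]] = BlockParams(block["size_mb"], block["node"])
    return tasks, params


job_packet = {
    "job_id": "job_7",
    "priority": 1,
    "blocks": [
        {"id": "blk_0", "size_mb": 128.0, "node": "node_1"},
        {"id": "blk_1", "size_mb": 64.0, "node": "node_3"},
    ],
}
tasks, params = parse_job(job_packet)
```

The task list would then go to the task queue forming module, and the parameter map to the task allocation and scheduling module.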
Step S2, the task queue forming module adds the tasks to the task queue according to the task scheduling strategy.
In an embodiment, each task in the task queue has a priority and a corresponding data block, and the priority is the same as that of the MapReduce job data packet it came from. The task scheduling strategies include capacity scheduling, fair scheduling, and first-in-first-out queue scheduling.
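A task queue in which every task carries the priority of its job could be sketched as below. This is an assumption-laden illustration: the `TaskQueue` class is hypothetical, a lower number is taken to mean a more urgent priority, and first-in-first-out order is used among tasks of equal priority; capacity or fair scheduling would need a different ordering rule.

```python
import heapq
from itertools import count
from typing import List, Tuple


class TaskQueue:
    """Tasks leave in priority order; ties keep arrival (FIFO) order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = count()   # monotonically increasing tie-breaker

    def add(self, task_id: str, job_priority: int) -> None:
        # Assumption: the task inherits its job's priority; lower = more urgent.
        heapq.heappush(self._heap, (job_priority, next(self._arrival), task_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = TaskQueue()
queue.add("jobA_map_0", job_priority=2)
queue.add("jobB_map_0", job_priority=1)
queue.add("jobA_map_1", job_priority=2)
order = [queue.pop() for _ in range(len(queue))]
```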
Step S3, the task execution history log recording module records task execution history logs of the plurality of task execution modules for the task allocation and scheduling module to read in real time.
In an embodiment, the task execution history log includes, for each task previously executed by a task execution module: the execution time, the data block size of the task, the data block position, the data transmission time of the data block between different nodes, and the data block attributes.
Step S4, the task allocation and scheduling module calculates an optimized task allocation scheme from the job data parameters and the task execution history log, retrieves the tasks from the task queue according to that scheme, and sends them to the task execution modules.
In this embodiment, the task allocation and scheduling module stores MapReduce system topology information, which comprises the positions of all nodes and the connection relations among the nodes. The task allocation principle of the task allocation and scheduling module is as follows: estimate the time each available node would take to execute a task, and preferentially assign the tasks in the task queue to the available nodes with short execution time and high success rate, where the task execution time is estimated from the data block size, the data transmission time, and the historical performance logs of the available nodes, such as task execution time and success rate. Specifically, the task allocation and scheduling module calculates the optimized task allocation scheme from the job data parameters and the task execution history log through the following steps:
S11, obtain the available task execution nodes node_1, node_2, …, node_m in the MapReduce system and the tasks to be executed task_1, task_2, …, task_n;
S12, let $s_{ij}$ be a decision variable, where $s_{ij} = 0$ or $s_{ij} = 1$, and $s_{ij} = 1$ denotes that task_i is executed on node_j, with $1 \le i \le n$ and $1 \le j \le n$, subject to the constraint $\sum_j s_{ij} = 1$, which means that one execution node can execute only one task at a time;
S13, the execution time of the data block of the i-th task on the j-th available task execution node is $t_{ij}$, and the transmission time of the data block of the i-th task to the j-th available task execution node is
[Figure BDA0001574784600000061: symbol given as an image in the original]
where the execution time and the transmission time are calculated from the task execution history log;
S14, the optimization objective is
[Figure BDA0001574784600000062: formula given as an image in the original]
that is, all tasks are completed on the available task execution nodes in the shortest time.
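The allocation principle of preferring nodes with short estimated time and high success rate can be illustrated for a single task as follows. The scoring rule is an assumption, not taken from the patent: estimated execution plus transfer time is divided by the node's historical success rate, so unreliable nodes look slower. `NodeStats`, `pick_node`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NodeStats:
    seconds_per_mb: float   # processing cost derived from the history log
    success_rate: float     # fraction of past tasks that succeeded, in (0, 1]


def pick_node(block_size_mb: float, block_location: str,
              stats: Dict[str, NodeStats],
              transfer_seconds_per_mb: float) -> str:
    """Choose the node with the lowest assumed score for one task."""

    def score(node: str) -> float:
        s = stats[node]
        execution = s.seconds_per_mb * block_size_mb
        # No transfer cost when the block already sits on the candidate node.
        transfer = 0.0 if node == block_location else transfer_seconds_per_mb * block_size_mb
        return (execution + transfer) / s.success_rate

    return min(stats, key=score)


stats = {
    "node_1": NodeStats(seconds_per_mb=0.20, success_rate=0.9),  # holds the block, but slow
    "node_2": NodeStats(seconds_per_mb=0.10, success_rate=1.0),  # fast and reliable
    "node_3": NodeStats(seconds_per_mb=0.08, success_rate=0.5),  # fastest, but unreliable
}
cheap_network = pick_node(100.0, "node_1", stats, transfer_seconds_per_mb=0.05)
slow_network = pick_node(100.0, "node_1", stats, transfer_seconds_per_mb=0.20)
```

With a cheap transfer the faster remote node wins over the local one; when moving the block is expensive, locality wins again. That trade-off between locality and node performance is what the scheme is meant to weigh.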
Step S5, the plurality of task execution modules execute the task and report the task execution history log to the task execution history log recording module.
In practical applications, the task execution module is a node in the MapReduce system topology.
The self-service MapReduce data optimal distribution method optimizes task scheduling according to the data block size of each task, the distribution of the data blocks across physical nodes, and the performance of each available node. It estimates each node's performance and the relationship between data block size and data movement from the results of multiple executions on each available node, that is, from the execution history. The method therefore considers not only task locality but also the computing performance of the nodes, which improves the success rate and efficiency of task execution.
For simplicity of explanation, the method embodiments are described as a series of acts or combinations, but those skilled in the art will appreciate that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently with other steps in accordance with the embodiments of the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Fig. 2 is a schematic diagram of a framework of a self-service MapReduce data optimization distribution system according to an embodiment of the present invention. Referring to fig. 2, the self-service data optimization distribution system according to the embodiment of the present invention specifically includes:
a job analysis module, configured to receive a MapReduce job data packet sent by a client, parse the MapReduce job data packet into tasks and job data parameters, and send the tasks and the job data parameters to the task queue forming module and the task allocation and scheduling module, respectively;
a task queue forming module, configured to add the tasks to a task queue according to a task scheduling strategy;
a task execution history log recording module, configured to record the task execution history logs of a plurality of task execution modules so that the task allocation and scheduling module can read them in real time;
a task allocation and scheduling module, configured to calculate an optimized task allocation scheme from the job data parameters and the task execution history logs, retrieve the tasks from the task queue according to that scheme, and send them to the task execution modules; and
a plurality of task execution modules, configured to execute the tasks and report task execution history logs to the task execution history log recording module.
Specifically, when the job analysis module receives a single MapReduce job data packet, the self-service MapReduce data optimal distribution system works as follows: a client submits a MapReduce job data packet to the job analysis module; the job analysis module receives the packet, parses it into a plurality of map tasks, reduce tasks, and job data parameters, and sends the map tasks and the job data parameters to the task queue forming module and the task allocation and scheduling module, respectively; the task queue forming module adds the map tasks to the task queue according to the task scheduling strategy; the task execution history log recording module records the task execution history logs of the plurality of task execution modules so that the task allocation and scheduling module can read them in real time; the task allocation and scheduling module calculates an optimized task allocation scheme from the job data parameters and the task execution history logs, retrieves the map tasks from the task queue according to that scheme, and sends them to the task execution modules; and the task execution modules execute the map tasks assigned to them and report task execution history logs to the task execution history log recording module.
Specifically, when the job analysis module receives a plurality of MapReduce job data packets, the self-service MapReduce data optimal distribution system works as follows: a client submits a plurality of MapReduce job data packets to the job analysis module; the job analysis module receives the packets and parses the MapReduce job data packets of the same priority into a plurality of map tasks, reduce tasks, and job data parameters, the map tasks having the same priority as the MapReduce job data packets they came from, and sends the map tasks and the job data parameters of the same priority to the task queue forming module and the task allocation and scheduling module, respectively; the task queue forming module adds the map tasks to the task queue according to the task scheduling strategy; the task execution history log recording module records the task execution history logs of the plurality of task execution modules so that the task allocation and scheduling module can read them in real time; the task allocation and scheduling module calculates an optimized task allocation scheme from the job data parameters and the task execution history logs, retrieves the map tasks from the task queue according to that scheme, and sends them to the task execution modules; and the task execution modules execute the map tasks and report task execution history logs to the task execution history log recording module.
In the embodiment of the present invention, multiple clients and multiple task execution modules may be included.
Because the system embodiment is substantially similar to the method embodiment, it is described only briefly; for the relevant details, refer to the corresponding description of the method embodiment.
The self-service MapReduce data optimal distribution method and system optimize task scheduling according to the data block size of each task, the distribution of the data blocks across physical nodes, and the performance of each available node. They estimate each node's performance and the relationship between data block size and data movement from the results of multiple executions on each available node, that is, from the execution history. The method and system therefore consider not only task locality but also the computing performance of the nodes, which improves the success rate and efficiency of task execution.
The above-described system embodiments are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (3)

1. A self-service MapReduce data optimization distribution method is characterized by comprising the following steps:
a job analysis module receives a MapReduce job data packet sent by a client, parses the MapReduce job data packet into tasks and job data parameters, and sends the tasks and the job data parameters to a task queue forming module and a task allocation and scheduling module, respectively;
the task queue forming module adds the tasks into the task queue according to the task scheduling strategy;
the task execution history log recording module records task execution history logs of the plurality of task execution modules for the task allocation and scheduling module to read in real time;
the task allocation and scheduling module calculates a task optimization allocation scheme according to the job data parameters and the task execution historical log, and calls the tasks in the task queue according to the task optimization allocation scheme and sends the tasks to the task execution module;
the plurality of task execution modules respectively execute the tasks and report task execution history logs to the task execution history log recording module;
the tasks in the task queue have priorities and corresponding data blocks, and the priorities are consistent with the priorities of the MapReduce job data packets;
the task execution module is a task execution node in the MapReduce system topology;
the task allocation and scheduling module stores MapReduce system topology information, wherein the MapReduce system topology information comprises the positions of all nodes and the connection relations among the nodes;
the job data parameters include: the size of each data block in the task and the position of the node where the data block is located;
the task scheduling strategies include: capacity scheduling, fair scheduling, and first-in-first-out queue scheduling;
the task execution history log includes, for each task previously executed by a task execution module: the execution time, the data block size of the task, the data block position, the data transmission time of the data block between different nodes, and the data block attributes;
the task allocation and scheduling module calculates the optimized task allocation scheme from the job data parameters and the task execution history log through the following steps:
s11, obtaining available task execution nodes _1, node _2, … … and node _ m in the Mapreduce system, and tasks task _1, task _2, … … and task _ n to be executed;
s12, memory of SijIs a decision variable, where sij0 or sij=1,sij1 denotes that task _ i is executed on node _ j, 1 ≦ i ≦ n, 1 ≦ j ≦ n, satisfying the constraint ΣjSij1, it means that one executing node can only execute one task at the same time;
S13, the execution time of the data block of the i-th task on the j-th available task execution node is t_ij, and the transmission time of the data block of the i-th task to the j-th available task execution node is
Figure FDA0002550866090000021
wherein the execution time and the transmission time are calculated according to the task execution history log;
s14, the optimization target is
Figure FDA0002550866090000022
that is, all tasks are executed to completion on the available task execution nodes in the shortest time.
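The allocation in steps S11 to S14 can be illustrated with a small sketch. The objective formula survives in this text only as an image reference, so the sketch assumes, purely for illustration, that the target is to minimise the summed execution and transmission time over all chosen task/node pairs. It also assumes there are no more tasks than nodes and that each node receives at most one task. The names `best_assignment`, `exec_time` and `transfer_time` are hypothetical, and the exhaustive search is only practical for very small inputs.

```python
from itertools import permutations


def best_assignment(exec_time, transfer_time):
    """Exhaustively search for the cheapest one-task-per-node assignment.

    exec_time[i][j]     -- estimated execution time of task i on node j
    transfer_time[i][j] -- estimated time to move task i's data block to node j
    Returns (total_cost, assignment), where assignment[i] is the node index
    chosen for task i. The summed-cost objective is an assumption.
    """
    n = len(exec_time)       # number of tasks
    m = len(exec_time[0])    # number of available nodes
    if n > m:
        raise ValueError("this sketch assumes no more tasks than nodes")

    best_cost, best_nodes = float("inf"), None
    # Each permutation picks a distinct node for every task, which encodes
    # s_ij = 1 for exactly one j per task and at most one task per node.
    for nodes in permutations(range(m), n):
        cost = sum(exec_time[i][j] + transfer_time[i][j]
                   for i, j in enumerate(nodes))
        if cost < best_cost:
            best_cost, best_nodes = cost, nodes
    return best_cost, list(best_nodes)


# Two tasks, three nodes; the values are made up for the example.
exec_time = [[4.0, 6.0, 5.0],
             [3.0, 2.0, 7.0]]
transfer_time = [[0.0, 1.0, 2.0],
                 [2.0, 0.0, 1.0]]
cost, assignment = best_assignment(exec_time, transfer_time)
print(cost, assignment)
```

For larger clusters an assignment-problem solver or an integer-programming solver would replace the exhaustive loop; the claim itself does not name a solution method.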
2. A system for implementing the self-service MapReduce data optimized distribution method of claim 1, comprising:
a task analysis module, used for receiving a MapReduce job data packet sent by a client, parsing the MapReduce job data packet into tasks and job data parameters, and sending the tasks to the task queue forming module and the job data parameters to the task allocation and scheduling module;
the task queue forming module is used for adding the tasks into the task queue according to the task scheduling strategy;
the task execution history log recording module is used for recording task execution history logs of the plurality of task execution modules so as to be read by the task allocation and scheduling module in real time;
the task allocation and scheduling module is used for calculating a task optimization allocation scheme according to the job data parameters and the task execution historical logs, and calling the tasks in the task queue according to the task optimization allocation scheme and sending the tasks to the task execution module;
and a plurality of task execution modules, used for respectively executing the tasks and reporting the task execution history logs to the task execution history log recording module.
3. The system of claim 2, wherein the task execution modules are nodes in a MapReduce system topology.
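How the modules of claim 2 might cooperate can be sketched as follows. This is a simplified illustration, not the patented implementation: `HistoryLog`, `TaskQueue` and `schedule` are hypothetical names, the queue uses first-in first-out order within each priority level, and the scheduler estimates execution time from a per-node seconds-per-byte average while ignoring transmission time and node load.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class LogEntry:
    node: str            # node that executed the task
    block_size: int      # size of the task's data block, in bytes
    exec_seconds: float  # measured execution time


class HistoryLog:
    """Stand-in for the task execution history log recording module."""

    def __init__(self):
        self._entries = []

    def report(self, entry):
        self._entries.append(entry)

    def seconds_per_byte(self, node, default=1e-6):
        # Average processing cost on a node; 'default' is an assumed
        # fallback for nodes that have no history yet.
        size = sum(e.block_size for e in self._entries if e.node == node)
        secs = sum(e.exec_seconds for e in self._entries if e.node == node)
        return secs / size if size else default


class TaskQueue:
    """Stand-in for the task queue: lower number means higher priority,
    with first-in first-out order among tasks of equal priority."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def add(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._seq), task))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def schedule(queue, log, nodes):
    """Drain the queue and pair each task with the node whose history
    predicts the shortest execution time for the task's block size."""
    plan = []
    while len(queue):
        name, block_size = queue.pop()
        node = min(nodes, key=lambda n: log.seconds_per_byte(n) * block_size)
        plan.append((name, node))
    return plan


log = HistoryLog()
log.report(LogEntry("node_1", 100, 2.0))  # 0.02 s per byte
log.report(LogEntry("node_2", 100, 1.0))  # 0.01 s per byte

queue = TaskQueue()
queue.add(1, ("task_b", 50))
queue.add(0, ("task_a", 200))
queue.add(1, ("task_c", 10))

plan = schedule(queue, log, ["node_1", "node_2"])
print(plan)
```

Because the sketch ignores load, every task lands on the historically fastest node; combining these estimates with a one-task-per-node constraint is what turns the problem into the optimisation described in claim 1.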
CN201810130531.0A 2018-02-08 2018-02-08 Self-service MapReduce data optimal distribution method and system Active CN108491255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810130531.0A CN108491255B (en) 2018-02-08 2018-02-08 Self-service MapReduce data optimal distribution method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810130531.0A CN108491255B (en) 2018-02-08 2018-02-08 Self-service MapReduce data optimal distribution method and system

Publications (2)

Publication Number Publication Date
CN108491255A CN108491255A (en) 2018-09-04
CN108491255B true CN108491255B (en) 2020-11-03

Family

ID=63340023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810130531.0A Active CN108491255B (en) 2018-02-08 2018-02-08 Self-service MapReduce data optimal distribution method and system

Country Status (1)

Country Link
CN (1) CN108491255B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222018A (en) * 2019-05-14 2019-09-10 联动优势科技有限公司 Data summarization executes method and device
CN110609850A (en) * 2019-08-01 2019-12-24 联想(北京)有限公司 Information determination method, electronic equipment and computer storage medium
CN112422169B (en) * 2020-11-04 2022-07-26 中国空间技术研究院 Method, device and system for coordinating nodes of composite link
CN113296907B (en) * 2021-04-29 2023-12-22 上海淇玥信息技术有限公司 Task scheduling processing method, system and computer equipment based on clusters
CN116723225A (en) * 2023-06-16 2023-09-08 广州银汉科技有限公司 Automatic allocation method and system for game tasks

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101227500A (en) * 2008-02-21 2008-07-23 上海交通大学 Task scheduling method based on optical grid
CN101957863A (en) * 2010-10-14 2011-01-26 广州从兴电子开发有限公司 Data parallel processing method, device and system
CN102710785A (en) * 2012-06-15 2012-10-03 哈尔滨工业大学 Cloud service node architecture in self-service tourism system, and service collaborating and balancing module and method among service nodes in self-service tourism system
CN102760073A (en) * 2011-04-29 2012-10-31 中兴通讯股份有限公司 Method, system and device for scheduling task

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986272A (en) * 2010-11-05 2011-03-16 北京大学 Task scheduling method under cloud computing environment
US9367601B2 (en) * 2012-03-26 2016-06-14 Duke University Cost-based optimization of configuration parameters and cluster sizing for hadoop
WO2014117295A1 (en) * 2013-01-31 2014-08-07 Hewlett-Packard Development Company, L.P. Performing an index operation in a mapreduce environment
CN103399787B (en) * 2013-08-06 2016-09-14 北京华胜天成科技股份有限公司 A kind of MapReduce operation streaming dispatching method and dispatching patcher calculating platform based on Hadoop cloud
CN103631657B (en) * 2013-11-19 2017-08-25 浪潮电子信息产业股份有限公司 A kind of method for scheduling task based on MapReduce
CN104156505B (en) * 2014-07-22 2017-12-15 中国科学院信息工程研究所 A kind of Hadoop cluster job scheduling method and devices based on user behavior analysis
CN106155791B (en) * 2016-06-30 2019-05-07 电子科技大学 A kind of workflow task dispatching method under distributed environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
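Cost-driven scientific workflow scheduling strategy in multi-cloud environments; Lin Bing et al.; Pattern Recognition and Artificial Intelligence; 20151015; Vol. 28 (No. 10); pp. 865-875 *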

Also Published As

Publication number Publication date
CN108491255A (en) 2018-09-04

Similar Documents

Publication Publication Date Title
CN108491255B (en) Self-service MapReduce data optimal distribution method and system
CN109885397B (en) Delay optimization load task migration algorithm in edge computing environment
CN103309738B (en) User job dispatching method and device
WO2021179462A1 (en) Improved quantum ant colony algorithm-based spark platform task scheduling method
Salah A queueing model to achieve proper elasticity for cloud cluster jobs
CN109788315A (en) Video transcoding method, apparatus and system
WO2016148963A1 (en) Intelligent placement within a data center
US10521258B2 (en) Managing test services in a distributed production service environment
CN111444021A (en) Synchronous training method, server and system based on distributed machine learning
Che et al. A deep reinforcement learning approach to the optimization of data center task scheduling
CN113138860B (en) Message queue management method and device
CN110308984B (en) Cross-cluster computing system for processing geographically distributed data
CN103294548A (en) Distributed file system based IO (input output) request dispatching method and system
CN113228574A (en) Computing resource scheduling method, scheduler, internet of things system and computer readable medium
CN104580447A (en) Spatio-temporal data service scheduling method based on access heat
Tang et al. Dependent task offloading for multiple jobs in edge computing
CN110233802A (en) A method of the block chain framework of the building more side chains of one main chain
Banerjee et al. Analysis of finite-buffer bulk-arrival bulk-service queue with variable service capacity and batch-size-dependent service
Choi et al. An enhanced data-locality-aware task scheduling algorithm for hadoop applications
Li et al. Endpoint-flexible coflow scheduling across geo-distributed datacenters
CN105404554B (en) Method and apparatus for Storm stream calculation frame
CN117608840A (en) Task processing method and system for comprehensive management of resources of intelligent monitoring system
CN110247854B (en) Multi-level service scheduling method, scheduling system and scheduling controller
Larrañaga Dynamic control of stochastic and fluid resource-sharing systems
CN116192849A (en) Heterogeneous accelerator card calculation method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant