CN108845884A - Physical resource allocation method, apparatus, computer device and storage medium - Google Patents
- Publication number
- CN108845884A
- Authority
- CN
- China
- Prior art keywords
- task
- spark
- resource
- spark task
- execution
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/508—Monitor
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
This application relates to a physical resource allocation method, apparatus, computer device and storage medium. The method includes: receiving a Spark task and a corresponding configuration file submitted by a terminal; reading the resource allocation parameters of the Spark task from the configuration file and allocating physical resources according to them; executing the Spark task on the allocated physical resources and monitoring its execution efficiency; when the monitored execution efficiency falls below a threshold, adjusting the resource allocation parameters in the configuration file; and rescheduling the Spark task from the allocated physical resources to physical resources that match the adjusted parameters, where it continues to execute. With this method, resources can be reallocated to a Spark task dynamically and in time without depending on a version update of the task, which improves the running efficiency of the Spark task.
Description
Technical field
This application relates to the field of computer technology, and in particular to a physical resource allocation method, apparatus, computer device and storage medium.
Background art
After development is complete, a Spark task (Spark being a computing engine for large-scale data processing) is submitted to a task scheduling platform, which can schedule and execute multiple Spark tasks. The platform must allocate suitable physical resources, such as the CPU (Central Processing Unit) and memory, to each Spark task. Unreasonable resource allocation makes a Spark task run inefficiently or even prevents it from running at all. However, the conventional resource allocation strategy for Spark tasks is static: even when the allocation is unreasonable, physical resources can only be reallocated after the Spark task has gone through a version update, which severely affects the running efficiency of the Spark task.
Summary of the invention
In view of the above technical problems, it is necessary to provide a physical resource allocation method, apparatus, computer device and storage medium that can dynamically reallocate resources to a Spark task in time, without depending on a version update of the task, and thereby improve the running efficiency of the Spark task.
A physical resource allocation method includes: receiving a Spark task and a corresponding configuration file submitted by a terminal; reading the resource allocation parameters of the Spark task from the configuration file, and allocating physical resources according to the resource allocation parameters; executing the Spark task on the allocated physical resources; monitoring the execution efficiency of the Spark task while it executes; when the monitored execution efficiency falls below a threshold, adjusting the resource allocation parameters in the configuration file; and rescheduling the Spark task from the allocated physical resources to physical resources that match the adjusted resource allocation parameters, on which it continues to execute.
In one embodiment, before the Spark task and the corresponding configuration file submitted by the terminal are received, the method further includes: receiving a Spark task development request sent by the terminal, the development request containing an entry function identifier; identifying the function queue corresponding to the entry function identifier, the function queue including multiple business functions; converting the business functions into corresponding background tasks; calling the group decorator corresponding to the entry function identifier to encapsulate the background tasks into multiple task groups; and configuring the scheduling order of the task groups and packaging the task groups according to that order to obtain the Spark task.
In one embodiment, the Spark task includes a Shell script in which a callback function pointing to the configuration file is preset. Reading the resource allocation parameters of the Spark task from the configuration file includes: starting the Spark task by executing the Shell script; generating a callback instruction for the configuration file based on the callback function; pulling the corresponding configuration file according to the callback instruction; and reading the resource allocation parameters from the pulled configuration file.
In one embodiment, executing the Spark task on the allocated physical resources includes: splitting the Spark task into multiple task groups, each having a corresponding task group identifier; splitting each task group into multiple background tasks, each having a corresponding log decorator; executing the background tasks on the allocated physical resources and generating an execution log for each background task; using the log decorator to add, to the execution log of each background task, the identifier of the task group the background task belongs to; and, when the Spark task has finished executing, collecting the execution logs that carry the same task group identifier and generating a task log for each task group identifier.
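A log decorator of this kind maps naturally onto a Python decorator. The sketch below is illustrative only: `log_decorator`, `collect_task_logs` and the in-memory `sink` list are hypothetical names standing in for whatever log store the platform actually uses. Each wrapped background task appends an entry tagged with its task group identifier, and the collector groups entries by that identifier into one task log per group.

```python
import functools
from collections import defaultdict


def log_decorator(stage_id: str, sink: list):
    """Tag every execution-log entry of the wrapped task with its task group ID."""
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception as exc:
                # Record the failure under the same task group, then re-raise.
                sink.append({"stage_id": stage_id, "task": func.__name__,
                             "status": f"failed: {exc}"})
                raise
            sink.append({"stage_id": stage_id, "task": func.__name__, "status": "ok"})
            return result
        return inner
    return wrap


def collect_task_logs(records: list) -> dict:
    """Group execution-log entries by task group ID into one task log per group."""
    logs = defaultdict(list)
    for record in records:
        logs[record["stage_id"]].append(f'{record["task"]}: {record["status"]}')
    return dict(logs)
```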
In one embodiment, monitoring the execution efficiency of the Spark task includes: calculating the total task amount of the Spark task; calculating the task duration of the Spark task from the total task amount; calling a task-run monitoring component at a preset time frequency to collect run information of the Spark task; calculating, from the run information, the executed task amount of the Spark task at multiple time nodes; and calculating the execution efficiency of the Spark task at those time nodes from the executed task amount and the task duration.
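The patent does not give formulas for these quantities. One plausible reading, sketched below as an assumption, is that the task duration is the total task amount divided by a planned throughput (the hypothetical `planned_rate`), and that the execution efficiency at a time node is the executed amount divided by the amount the plan expects by that time. A value of 1.0 then means "on schedule" and 0.5 means "half as far along as planned".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    elapsed_s: float  # seconds since the Spark task started (one time node)
    executed: float   # units of work completed by that time node


def task_duration(total: float, planned_rate: float) -> float:
    """Planned duration, assuming a hypothetical planned throughput in units/s."""
    return total / planned_rate


def efficiency_series(total: float, duration: float, samples: List[Sample]) -> List[float]:
    """Actual progress divided by planned progress at each sampled time node."""
    series = []
    for s in samples:
        planned = total * min(s.elapsed_s / duration, 1.0)
        series.append(s.executed / planned if planned > 0 else 1.0)
    return series
```

For example, with 1000 units planned at 10 units/s (a 100 s duration), 250 units done after 50 s gives an efficiency of 0.5.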
In one embodiment, adjusting the resource allocation parameters in the configuration file when the monitored execution efficiency falls below the threshold includes: comparing the execution efficiency with the threshold; if it is below the threshold, calculating the remaining task amount from the total task amount and the executed task amount, calculating the remaining duration from the task duration and the current time node, and estimating from the remaining task amount and remaining duration the physical resources that need to be added; otherwise, calculating the resource utilization from the resource usage information of two adjacent time nodes recorded in the run information, and estimating from the resource utilization the physical resources that can be released; and adjusting the resource allocation parameters according to the estimate.
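Both branches can be sketched with executor counts as the unit of physical resource. This is an assumption for illustration: the sketch treats throughput as linear in the number of executors, and the function names are hypothetical. The "add" branch computes the rate needed to finish the remaining work in the remaining time; the "release" branch averages the busy-executor counts at two adjacent time nodes.

```python
import math
from typing import Optional, Tuple


def executors_to_add(total: float, executed: float, duration: float,
                     elapsed: float, num_executors: int) -> Optional[int]:
    """Estimate extra executors needed to finish the remaining work on time."""
    if executed <= 0 or elapsed <= 0:
        return None  # no progress observed yet, so the per-executor rate is unknown
    remaining_work = total - executed
    remaining_time = duration - elapsed
    if remaining_time <= 0:
        return None  # already past the planned duration; a new plan is needed
    required_rate = remaining_work / remaining_time
    per_executor_rate = executed / elapsed / num_executors  # linear-scaling assumption
    needed = math.ceil(required_rate / per_executor_rate)
    return max(needed - num_executors, 0)


def executors_to_release(busy_prev: float, busy_curr: float,
                         num_executors: int) -> Tuple[int, float]:
    """Estimate idle executors from busy counts at two adjacent time nodes."""
    utilization = (busy_prev + busy_curr) / 2 / num_executors
    keep = max(math.ceil(utilization * num_executors), 1)  # never drop to zero
    return num_executors - keep, utilization
```

With 1000 units of work, 250 done after 50 s of a 100 s plan on 4 executors, the remaining 750 units need 15 units/s while each executor delivers 1.25 units/s, so 12 executors (8 more) are estimated. Real Spark stages rarely scale perfectly, so such an estimate is an upper-bound style heuristic rather than a guarantee.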
In one embodiment, adjusting the resource allocation parameters in the configuration file when the monitored execution efficiency falls below the threshold includes: comparing the execution efficiency with the threshold; if it is below the threshold, marking the execution of the Spark task as abnormal and obtaining the task log of the Spark task; locating the cause of the abnormality from the task log; and, if the cause includes insufficient physical resources, generating a resource adjustment prompt page from the resource allocation parameters recorded in the configuration file and sending the prompt page to the terminal, so that the terminal adjusts the resource allocation parameters on the prompt page.
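Locating the cause can be as simple as scanning the task log for known resource-exhaustion signatures. The sketch below is a hypothetical illustration: the two patterns are examples of messages that commonly appear in Spark-on-YARN logs (a JVM `OutOfMemoryError` and a YARN memory-limit kill), not a list given in the patent, and the prompt "page" is reduced to a dictionary pre-filled with the current parameters for the terminal to edit.

```python
import re
from typing import Iterable, Optional

# Example signatures of resource exhaustion; a real platform would maintain its own list.
RESOURCE_PATTERNS = [
    re.compile(r"java\.lang\.OutOfMemoryError"),
    re.compile(r"Container killed by YARN for exceeding memory limits"),
]


def locate_cause(task_log_lines: Iterable[str]) -> str:
    """Return a coarse abnormality cause found in the task log."""
    for line in task_log_lines:
        if any(p.search(line) for p in RESOURCE_PATTERNS):
            return "insufficient_physical_resources"
    return "unknown"


def build_adjustment_prompt(task_id: str, params: dict, cause: str) -> Optional[dict]:
    """Payload for a hypothetical resource adjustment prompt page sent to the terminal."""
    if cause != "insufficient_physical_resources":
        return None
    return {
        "task_id": task_id,
        "cause": cause,
        "current_parameters": dict(params),  # pre-filled so the user can edit them
    }
```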
A physical resource allocation apparatus includes: a resource allocation module, configured to receive a Spark task and a corresponding configuration file submitted by a terminal, read the resource allocation parameters of the Spark task from the configuration file, and allocate physical resources according to the resource allocation parameters; an efficiency monitoring module, configured to execute the Spark task on the allocated physical resources and monitor the execution efficiency of the Spark task while it executes; and a resource adjustment module, configured to adjust the resource allocation parameters in the configuration file when the monitored execution efficiency falls below a threshold, and to reschedule the Spark task from the allocated physical resources to physical resources that match the adjusted resource allocation parameters, on which it continues to execute.
A computer device includes a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program: receiving a Spark task and a corresponding configuration file submitted by a terminal; reading the resource allocation parameters of the Spark task from the configuration file, and allocating physical resources according to the resource allocation parameters; executing the Spark task on the allocated physical resources; monitoring the execution efficiency of the Spark task while it executes; when the monitored execution efficiency falls below a threshold, adjusting the resource allocation parameters in the configuration file; and rescheduling the Spark task from the allocated physical resources to physical resources that match the adjusted resource allocation parameters, on which it continues to execute.
A computer-readable storage medium stores a computer program that, when executed by a processor, implements the following steps: receiving a Spark task and a corresponding configuration file submitted by a terminal; reading the resource allocation parameters of the Spark task from the configuration file, and allocating physical resources according to the resource allocation parameters; executing the Spark task on the allocated physical resources; monitoring the execution efficiency of the Spark task while it executes; when the monitored execution efficiency falls below a threshold, adjusting the resource allocation parameters in the configuration file; and rescheduling the Spark task from the allocated physical resources to physical resources that match the adjusted resource allocation parameters, on which it continues to execute.
With the above physical resource allocation method, apparatus, computer device and storage medium, physical resources can be allocated to the Spark task submitted by the terminal according to the resource allocation parameters recorded in the configuration file that the terminal submits, and the Spark task can be executed on the allocated physical resources. By monitoring the execution efficiency of the Spark task, the resource allocation parameters in the configuration file can be adjusted according to the monitoring result, and the Spark task is then scheduled onto physical resources that match the adjusted parameters. Because the resource allocation parameters are stored separately as a configuration file, independent of the Spark task itself, they can be modified freely and flexibly without being constrained by version updates of the Spark task. Monitoring the execution efficiency of the Spark task in real time and dynamically adjusting the allocated physical resources accordingly matches the allocation to the Spark task's actual demand for physical resources, which improves the execution efficiency of the Spark task.
Brief description of the drawings
Fig. 1 is a diagram of an application scenario of the physical resource allocation method in one embodiment;
Fig. 2 is a schematic flowchart of the physical resource allocation method in one embodiment;
Fig. 3 is a structural block diagram of the physical resource allocation apparatus in one embodiment;
Fig. 4 is a diagram of the internal structure of a computer device in one embodiment.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application, not to limit it.
The physical resource allocation method provided by this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server 104 may be implemented as a server cluster composed of multiple servers. The terminal 102 submits a Spark task and a corresponding configuration file to the server 104, on which a task scheduling platform is deployed to schedule and execute the Spark task. Based on the resource allocation parameters recorded in the configuration file, the task scheduling platform allocates corresponding physical resources to the Spark task, schedules the Spark task onto the allocated physical resources for execution, and monitors its execution efficiency at a preset time frequency. The platform then compares the execution efficiency with a threshold. If the execution efficiency is below the threshold, the platform calculates the remaining task amount from the total task amount and the executed task amount, calculates the remaining duration from the task duration and the current time node, and estimates from these the physical resources that need to be added. If the execution efficiency is greater than or equal to the threshold, the platform calculates the resource utilization from the resource usage information of two adjacent time nodes recorded in the run information, and estimates from the utilization the physical resources that can be released. The platform then stops executing the Spark task, adjusts the resource allocation parameters according to the estimate, and schedules the Spark task onto physical resources that match the adjusted parameters. In this process, because the resource allocation parameters are stored separately as a configuration file, independent of the Spark task itself, they can be modified freely and flexibly without being constrained by version updates of the Spark task; and because the execution efficiency of the Spark task is monitored in real time and the allocated physical resources are adjusted dynamically according to it, the allocation matches the Spark task's actual demand for physical resources, which improves its execution efficiency.
In one embodiment, as shown in Fig. 2, a physical resource allocation method is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
Step 202: receive the Spark task and the corresponding configuration file submitted by the terminal.
The business logic script of the Spark task includes a Shell script. Task scheduling personnel record the resource allocation parameters of the Spark task in the configuration file and preset, in the Shell script, a callback function that points to the configuration file. The resource allocation parameters may be estimated in advance by the task scheduling personnel from the task amount of the Spark task.
The server cluster composed of multiple servers includes a master node (Master) and multiple worker nodes (Worker). Task scheduling personnel submit the Spark task and its configuration file from the terminal to the master node with the spark-submit command. A task scheduling platform deployed on the master node schedules and executes the Spark tasks submitted by multiple terminals. The platform stores each configuration file separately, independent of its Spark task, and starts a corresponding Driver process for each Spark task. Depending on the preset deployment mode (deploy-mode), the Driver process starts the Spark task either locally (client mode) or on a worker node in the cluster (cluster mode).
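As an illustration of how the platform could pass resource parameters and the deployment mode to spark-submit, the sketch below assembles an argument list using real spark-submit options (`--master`, `--deploy-mode`, `--class`, `--num-executors`, `--executor-cores`, `--executor-memory`, `--driver-memory`). The function name, the `params` dictionary layout and the JAR and class names are hypothetical; the patent itself reads the parameters through the Shell-script callback rather than prescribing this mapping.

```python
import shlex


def build_spark_submit(app_jar: str, main_class: str, master: str,
                       deploy_mode: str, params: dict) -> list:
    """Assemble a spark-submit argument list from resource parameters."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,  # client: Driver runs locally; cluster: on a worker
        "--class", main_class,
        "--num-executors", str(params["num_executors"]),  # not honored by every cluster manager
        "--executor-cores", str(params["executor_cores"]),
        "--executor-memory", params["executor_memory"],
        "--driver-memory", params["driver_memory"],
        app_jar,
    ]


if __name__ == "__main__":
    argv = build_spark_submit("app.jar", "com.example.Main", "yarn", "cluster",
                              {"num_executors": 4, "executor_cores": 2,
                               "executor_memory": "4g", "driver_memory": "1g"})
    print(shlex.join(argv))  # shell-quoted command line for inspection
```

Building an argument list (rather than a single shell string) avoids quoting bugs if the list is later passed to `subprocess.run`.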
Step 204: read the resource allocation parameters of the Spark task from the configuration file, and allocate physical resources according to them.
The task scheduling platform starts the Spark task through the Driver process and allocates physical resources to it. Specifically, the Driver process calls the Shell script of the Spark task, generates a callback instruction for the configuration file, and reads the resource allocation parameters in the configuration file according to that instruction. Based on the parameters read, the Driver process applies to the cluster manager for the physical resources needed to run the Spark task. The cluster manager may be a Spark Standalone cluster, a YARN resource management cluster, or the like, and the physical resources include memory, CPU and so on. According to the resource allocation parameters, the cluster manager starts a certain number of Executor processes on the worker nodes of the cluster. Note that the Driver process and each Executor process themselves also occupy some physical resources.
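The remark that the Driver and Executor processes themselves occupy resources can be made concrete. Recent Spark versions document that, for JVM workloads, each executor's container also reserves an off-heap memory overhead that defaults to 10% of the executor memory with a minimum of 384 MiB (`spark.executor.memoryOverhead`), and the same rule applies to the driver in cluster mode. The sketch below totals a request under those defaults, assuming cluster mode and ignoring any rounding the cluster manager applies to container sizes; the function names are illustrative.

```python
from typing import Tuple


def container_memory_mb(heap_mb: int, factor: float = 0.10, minimum_mb: int = 384) -> int:
    """Heap plus the default off-heap overhead: max(factor * heap, 384 MiB)."""
    return heap_mb + max(int(heap_mb * factor), minimum_mb)


def total_request(num_executors: int, executor_cores: int, executor_memory_mb: int,
                  driver_memory_mb: int, driver_cores: int = 1) -> Tuple[int, int]:
    """Total memory (MiB) and cores requested for the driver plus all executors."""
    memory = (num_executors * container_memory_mb(executor_memory_mb)
              + container_memory_mb(driver_memory_mb))
    cores = num_executors * executor_cores + driver_cores
    return memory, cores
```

For 4 executors with 2 cores and 4 GiB each, plus a 1 GiB driver, the request is 4 × 4505 + 1408 = 19428 MiB and 9 cores, noticeably more than the 17 GiB of heap that the parameters alone suggest.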
Step 206: execute the Spark task on the allocated physical resources.
Once the physical resources required by the Spark task have been obtained, the task scheduling platform starts scheduling and executing the Spark task through the Driver process. Specifically, the Driver process splits the Spark task into multiple task groups (stages) that execute asynchronously, and each stage contains multiple background tasks (tasks) that execute asynchronously and/or concurrently. The Driver process assigns the tasks of a stage to multiple Executor processes for execution; a task is the smallest unit of execution. The result of each task is stored in the memory of its Executor process or in a disk file on the worker node where that process runs. When all tasks of the current stage have finished, the Driver process writes the intermediate results to disk files local to each worker node and schedules the next stage. This repeats until the whole Spark task has been executed.
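The stage-by-stage pattern described above (tasks of one stage run in parallel, and the next stage starts only after all of them finish and their intermediate results are available) can be modeled with Python's standard `concurrent.futures` module. This is a single-process analogy under stated assumptions, not Spark's scheduler: threads stand in for Executor task slots, and each task receives the previous stage's collected outputs as its input.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def run_job(stages: Sequence[Sequence[Callable[[Any], Any]]],
            initial_input: Any, slots: int) -> List[Any]:
    """Run stages in order; tasks inside a stage run concurrently on `slots` threads."""
    intermediate = initial_input
    with ThreadPoolExecutor(max_workers=slots) as pool:
        for stage in stages:
            futures = [pool.submit(task, intermediate) for task in stage]
            # Barrier: the next stage starts only after every task here has finished.
            intermediate = [f.result() for f in futures]
    return intermediate
```

For instance, `run_job([[sum, max], [lambda parts: parts[0] + parts[1]]], [1, 2, 3], slots=2)` runs `sum` and `max` concurrently in the first stage, then combines their outputs in the second stage.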
Step 208: monitor the execution efficiency of the Spark task while it executes.
While the Spark task executes, the task scheduling platform monitors its execution efficiency through the Driver process and calculates the execution speed of the tasks. The execution speed of a task is directly related to physical resources such as the number of CPU cores of the corresponding Executor process; in general, one CPU core executes one thread at a time. When physical resources are sufficient, the multiple tasks assigned to an Executor process can be executed concurrently by multiple threads, which improves the execution efficiency of the Spark task.
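The dependence of execution speed on CPU cores can be quantified with a simple "waves" estimate. In Spark, each executor runs up to `executor_cores / spark.task.cpus` tasks at once (`spark.task.cpus` defaults to 1), so a stage with more tasks than slots runs in several waves. The sketch below is an idealized estimate under that assumption; it ignores data skew, shuffle and scheduling delays, and the function names are illustrative.

```python
import math


def task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Concurrent task slots: each executor runs executor_cores // task_cpus tasks."""
    return num_executors * (executor_cores // task_cpus)


def estimated_stage_seconds(num_tasks: int, num_executors: int, executor_cores: int,
                            avg_task_seconds: float, task_cpus: int = 1) -> float:
    """Tasks run in waves of at most `slots` tasks; estimate the stage's wall time."""
    slots = task_slots(num_executors, executor_cores, task_cpus)
    waves = math.ceil(num_tasks / slots)
    return waves * avg_task_seconds
```

With 100 tasks of about 30 s each, 4 executors with 2 cores give 8 slots and 13 waves (about 390 s); doubling to 8 executors gives 16 slots and 7 waves (about 210 s). This is the effect the platform relies on when it adds executors for a slow Spark task.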
Step 210: when the monitored execution efficiency falls below the threshold, adjust the resource allocation parameters in the configuration file.
The task scheduling platform uses the Driver process to compare the execution efficiency with a threshold. The threshold can be set freely according to actual needs or can vary dynamically; this is not limited here. If the execution efficiency is below the threshold, the current Spark task is at risk of insufficient physical resources. The platform then generates a stop-execution instruction and sends it to the relevant worker nodes to terminate the corresponding Driver and Executor processes, estimates the physical resources that need to be added, and adjusts the resource allocation parameters recorded for the Spark task in the configuration file according to the estimate. If the execution efficiency is greater than or equal to the threshold, the current Spark task has no risk, or only a low risk, of insufficient physical resources. The platform then determines whether the physical resources already allocated to the Spark task include idle resources, estimates the physical resources that can be released, and adjusts the resource allocation parameters recorded for the Spark task in the configuration file according to the estimate.
Step 212: reschedule the Spark task from the allocated physical resources to physical resources that match the adjusted resource allocation parameters, and continue executing it there.
Based on the adjusted resource allocation parameters, the task scheduling platform starts a new Driver process for the Spark task and calls it to allocate physical resources to the task again in the manner described above, that is, to restart a certain number of Executor processes on the worker nodes of the cluster. The new Driver process schedules the Spark task onto the physical resources that match the adjusted parameters, that is, it sends the tasks split from the Spark task to the reallocated Executor processes for execution. Through the Driver process, the platform continues to monitor the execution efficiency of the Spark task and adjusts the resource allocation parameters accordingly until the Spark task finishes.
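Steps 208 to 212 form a monitor, stop, adjust and relaunch loop, sketched below. The callables are hypothetical stand-ins: `launch(params)` plays the role of starting a new Driver and yields efficiency samples until the task finishes, closing the generator stands in for stopping the Driver and Executor processes, and `adjust` stands in for rewriting the configuration file. The patent does not say how already-completed work is reused after a restart, so that detail is left inside `launch`. The `max_restarts` cap is an added safeguard, not part of the source.

```python
from typing import Callable, Dict, Generator, List

Params = Dict[str, int]


def supervise(params: Params,
              launch: Callable[[Params], Generator[float, None, None]],
              adjust: Callable[[Params, float], Params],
              threshold: float,
              max_restarts: int = 3) -> List[Params]:
    """Run, monitor, and relaunch with adjusted parameters; return the parameter history."""
    history = [dict(params)]
    restarts = 0
    while True:
        run = launch(params)  # hypothetical: starts a Driver, yields efficiency samples
        for efficiency in run:
            if efficiency < threshold and restarts < max_restarts:
                run.close()  # stands in for stopping the Driver and Executor processes
                params = adjust(params, efficiency)  # stands in for rewriting the config file
                history.append(dict(params))
                restarts += 1
                break  # relaunch with the adjusted parameters
        else:
            return history  # the run ended without a restart: the task finished
```

A usage example: if `launch` yields 0.5 while fewer than 4 executors are configured and 0.9 otherwise, and `adjust` doubles the executor count, then `supervise({"num_executors": 2}, launch, adjust, threshold=0.8)` restarts once and returns `[{"num_executors": 2}, {"num_executors": 4}]`.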
Conventionally, the resource allocation parameters are hard-coded in the Shell script of the Spark task, so they can only be changed when the Spark task goes through a version update. This makes the parameters inconvenient to modify and in turn affects the running efficiency and the results of the Spark task.
In this embodiment, physical resources can be allocated to the Spark task submitted by the terminal according to the resource allocation parameters recorded in the configuration file that the terminal submits; the Spark task can be executed on the allocated physical resources; by monitoring its execution efficiency, the resource allocation parameters in the configuration file can be adjusted according to the monitoring result; and the Spark task is then scheduled onto physical resources that match the adjusted parameters. Because the resource allocation parameters are stored separately as a configuration file, independent of the Spark task itself, they can be modified freely and flexibly without being constrained by version updates of the Spark task. Monitoring the execution efficiency of the Spark task in real time and dynamically adjusting the allocated physical resources accordingly matches the allocation to the Spark task's actual demand for physical resources, which improves its execution efficiency.
In one embodiment, before the Spark task and the corresponding configuration file submitted by the terminal are received, the method further includes: receiving a Spark task development request sent by the terminal, the development request containing an entry function identifier; identifying the function queue corresponding to the entry function identifier, the function queue including multiple business functions; converting the business functions into corresponding background tasks; calling the group decorator corresponding to the entry function identifier to encapsulate the background tasks into multiple task groups; and configuring the scheduling order of the task groups and packaging the task groups according to that order to obtain the Spark task.
A Spark task implements certain business features through multiple business functions. For convenience, the business functions that together implement one business feature are called a function queue. A Spark task has a corresponding business logic script, which contains multiple function queues, and different function queues implement different business features. The granularity at which business features are divided can be defined freely by the Spark task developer. When the business logic of the Spark task changes, the function queues corresponding to some or all of the business features change accordingly. The business functions in a function queue are arranged in scheduling order, and the business function first in that order is called the entry function.
The above Spark task can be developed with a distributed parallel computing framework provided in this embodiment. The framework includes a task decorator, a group decorator and a group container. The task decorator converts a business function into a corresponding background task (task). The group decorator encapsulates the multiple background tasks of one task queue into a corresponding task group (Stage). The group container encapsulates multiple task groups into a corresponding job (Job).
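The three framework components can be approximated with ordinary Python decorators and a container class. This is a simplified sketch, not the framework described in the patent: the patent registers decorators through callbacks that fire when the entry function is triggered, whereas here they are applied when functions are defined. The names `Task`, `Stage`, `task_decorator`, `group_decorator` and `GroupContainer`, and the convention that an entry function returns its function queue, are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    name: str
    func: Callable[[], object]


@dataclass
class Stage:
    stage_id: str
    tasks: List[Task]


def task_decorator(func: Callable[[], object]) -> Task:
    """Convert one business function into a background task."""
    return Task(func.__name__, func)


def group_decorator(stage_id: str):
    """Wrap an entry function (assumed to return its function queue) into a Stage."""
    def wrap(entry_func: Callable[[], List[Callable[[], object]]]) -> Stage:
        return Stage(stage_id, [task_decorator(f) for f in entry_func()])
    return wrap


class GroupContainer:
    """Collect task groups and package them into a job in scheduling order."""

    def __init__(self) -> None:
        self._stages: List[Tuple[int, Stage]] = []

    def add(self, stage: Stage, order: int) -> None:
        self._stages.append((order, stage))

    def build_job(self) -> List[Stage]:
        return [stage for _, stage in sorted(self._stages, key=lambda item: item[0])]
```

Keeping the scheduling order in the container, rather than in the stages themselves, lets the same task groups be repackaged in a different order without changing the business functions.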
When a developer develops a Spark task with the distributed parallel computing framework provided by this application, the developer can send a development request to the server through the terminal, and the server provides the distributed parallel computing framework according to the request. Based on a first call request sent by the terminal, the server returns the task decorator to the terminal, and the terminal adds a task decorator at each entry function in the business logic script. Specifically, a callback function corresponding to the task decorator is added to the entry function, the callback object of the callback function is set to the task decorator, and the callback condition is set to the entry function being triggered for execution. When the entry function is triggered, the callback function generates a callback instruction for the corresponding task decorator; the server calls that task decorator according to the callback instruction and uses it to encapsulate each business function in the function queue of the entry function as a corresponding background task.
Upon a second call request from the terminal, the server returns the group decorator. The group decorator includes multiple elements, such as a status checker, a precondition detector and a garbage cleaner. The status checker checks the execution state of the task groups generated by encapsulation; the precondition detector checks whether the current task group satisfies its execution conditions; and the garbage cleaner cleans up when the current task group is cancelled. The group decorator also records several parameters, such as the task group identifier Stage_id. A group decorator is added at each entry function of the business logic script. Specifically, the terminal adds to the entry function a callback function corresponding to the group decorator, sets the callback object to the group decorator, sets the callback condition to the generation of the entry function's function queue, and configures the elements and parameters of each entry function's group decorator. When the entry function is triggered, the callback function generates a callback instruction directed at the group decorator; the server calls the group decorator accordingly, and the group decorator identifies the entry function's function queue and encapsulates the basic tasks in it into a corresponding task group (Stage).
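The interplay of the task decorator and the group decorator can be sketched in Python. This is a minimal, hypothetical model rather than the framework's actual API: the names `Task`, `Stage`, `task` and `stage` are assumptions, the callback mechanism is replaced by ordinary Python decorators, and the status checker, precondition detector and garbage cleaner are reduced to a `status` field, a `precondition` callable and a `cleanup` callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical basic task wrapping one business function."""
    name: str
    func: Callable[[], object]

    def run(self) -> object:
        return self.func()


@dataclass
class Stage:
    """Hypothetical task group built from one entry function's function queue."""
    stage_id: str
    tasks: List[Task]
    precondition: Callable[[], bool] = lambda: True   # precondition detector
    cleanup: Callable[[], None] = lambda: None        # garbage cleaner
    status: str = "PENDING"                           # read by a status checker

    def run(self) -> List[object]:
        # Do not start the group if its execution conditions are not met.
        if not self.precondition():
            self.status = "SKIPPED"
            return []
        self.status = "RUNNING"
        results = [t.run() for t in self.tasks]
        self.status = "FINISHED"
        return results

    def cancel(self) -> None:
        # Release intermediate state when the group is cancelled.
        self.cleanup()
        self.status = "CANCELLED"


def task(func: Callable[[], object]) -> Task:
    """Task decorator: convert a business function into a basic task."""
    return Task(func.__name__, func)


def stage(stage_id: str, precondition=lambda: True, cleanup=lambda: None):
    """Group decorator: encapsulate the entry function's queue into a Stage.

    Assumption: the decorated entry function returns its function queue
    as a list of Task objects, already in scheduling order.
    """
    def wrap(entry: Callable[[], List[Task]]) -> Stage:
        return Stage(stage_id, entry(), precondition, cleanup)
    return wrap


@stage("stage_1")
def load_and_clean():
    @task
    def load():
        return "loaded"

    @task
    def clean():
        return "cleaned"

    return [load, clean]


print(load_and_clean.run())  # ['loaded', 'cleaned']
```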
Upon a third call request from the terminal, the server returns the job container. The developer adds the job container to the business logic script on the terminal and configures the job identifier of the container. The job container encapsulates multiple task groups into a job; in other words, it holds multiple task groups, which are its encapsulated objects. As shown in the implementation script of the job container above, the developer adds the job identifier Job_id on the terminal to the group decorator of each encapsulated object, thereby establishing the encapsulation relationship between task groups. A business logic script may contain multiple job containers, and the Job_id recorded in a group decorator determines which job its task group is encapsulated into. The one or more jobs (Job) obtained in this way constitute the Spark task.
The job container itself provides an asynchronous execution function within the distributed parallel computing framework. After adding a job container to the business logic script, the developer can use this function to predefine in the container the arrangement rules for the scheduling order of its task groups. An arrangement rule includes the identifiers of the task groups and the order in which the identified task groups are asynchronously executed.
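An arrangement rule can be pictured as an ordered list of scheduling levels. In the hypothetical sketch below (the names `run_job` and `make_stage` and the list-of-lists rule format are assumptions, not part of the framework), task groups in one level are started asynchronously with `asyncio.gather`, and the next level starts only after every task group of the previous level has completed.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical arrangement rule: each inner list is one scheduling level.
ArrangementRule = List[List[str]]

completion_log: List[str] = []  # order in which task groups actually finish


def make_stage(stage_id: str, delay: float) -> Callable[[], Awaitable[str]]:
    async def run() -> str:
        await asyncio.sleep(delay)  # stand-in for the task group's real work
        completion_log.append(stage_id)
        return stage_id
    return run


async def run_job(stages: Dict[str, Callable[[], Awaitable[str]]],
                  rule: ArrangementRule) -> List[str]:
    results: List[str] = []
    for level in rule:
        # Task groups in the same level run concurrently; gather() waits for all.
        results.extend(await asyncio.gather(*(stages[sid]() for sid in level)))
    return results


stages = {
    "extract_a": make_stage("extract_a", 0.05),
    "extract_b": make_stage("extract_b", 0.01),
    "merge": make_stage("merge", 0.0),
}
print(asyncio.run(run_job(stages, [["extract_a", "extract_b"], ["merge"]])))
```

`asyncio.gather` returns results in argument order, so the returned list follows the arrangement rule even though `extract_b` finishes before `extract_a`; `merge` always finishes last because its level waits for the first one.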
Most conventional distributed parallel computing frameworks can only schedule individual tasks and lack a concurrency control mechanism at the business level. A developer who wants business-level scheduling across multiple tasks must additionally maintain one or even several data tables recording the scheduling order between tasks, which is a considerable burden.
In this embodiment, because the group decorator is integrated into the Spark task in advance and itself provides task encapsulation and scheduling-order arrangement, scattered basic tasks can be encapsulated according to the business logic into task groups or jobs that each implement a specific business capability. Tasks can therefore be scheduled at the business level without maintaining extra data tables, which greatly simplifies Spark task development compared with the conventional approach of entering the scheduling order of scattered basic tasks into a data table one by one.
In one embodiment, the Spark task includes a Shell script in which a callback function for the configuration file is preset, and reading the resource allocation parameters of the Spark task from the configuration file includes: starting the Spark task by executing the Shell script; generating a callback instruction for the configuration file based on the callback function preset in the Shell script; pulling the corresponding configuration file according to the callback instruction; and reading the resource allocation parameters from the pulled configuration file.
The business logic script of a Spark task includes a Shell script, a Submit script, a Class script and so on. The task scheduling platform starts the Spark task by loading the Shell script. The Shell script records the data parameters needed to execute the Spark task, such as the input parameters for executing the Submit script or the Class script. Conventionally, the resource allocation parameters of the Spark task are also recorded in the Shell script, which is packaged into the Spark task, so these parameters can only be changed together with a new version of the Spark task. In this embodiment, the resource allocation parameters are recorded in a configuration file independent of the Spark task, and a callback function for that configuration file is preset in the Shell script. When the Shell script is scheduled for execution, the callback function generates a callback instruction; the Driver process pulls the corresponding configuration file according to the file identifier carried by the callback instruction and reads the resource allocation parameters from the pulled file.
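As a hedged illustration of externalized parameters, the sketch below parses a properties-style configuration file and turns it into `--conf key=value` arguments for `spark-submit`. The property names are standard Spark settings, but the file layout and the functions `read_resource_params` and `to_submit_args` are assumptions; the callback that pulls the file by its identifier is not modelled, and the pulled content is simply passed in as a string.

```python
from typing import Dict, List

# Hypothetical configuration file kept outside the Spark task package.
SAMPLE_CONFIG = """
# resource allocation parameters for demo_job
spark.executor.instances=4
spark.executor.memory=4g
spark.executor.cores=2
spark.driver.memory=2g
"""


def read_resource_params(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def to_submit_args(params: Dict[str, str]) -> List[str]:
    """Turn the parameters into spark-submit --conf arguments (sorted for stability)."""
    args: List[str] = []
    for key in sorted(params):
        args += ["--conf", f"{key}={params[key]}"]
    return args


params = read_resource_params(SAMPLE_CONFIG)
print(to_submit_args(params))
```

Because the parameters live in their own file, changing `spark.executor.instances` only requires editing that file before the next launch, not rebuilding the Spark task.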
In this embodiment, because the resource allocation parameters are stored separately as a configuration file, independent of the Spark task itself, they can be modified freely without being tied to Spark task version updates.
In one embodiment, executing the Spark task based on the allocated physical resources includes: splitting the Spark task into multiple task groups, each having a corresponding task group identifier; splitting each task group into multiple basic tasks, each having a corresponding log decorator; executing the basic tasks on the allocated physical resources to generate an execution log for each basic task; adding, by means of the log decorator, the task group identifier of each basic task to that task's execution log; and, when execution of the Spark task finishes, collecting the execution log records that share the same task group identifier and generating a task log for each task group identifier.
The server can split the Spark task into multiple task groups, and each task group into multiple basic tasks, according to Spark's wide/narrow dependency logic or by reversing the encapsulation process described above. The server then schedules the task groups of a job according to the scheduling order arranged in advance in that job. In other words, the server first schedules the task group that comes first in the order, distributing its basic tasks to multiple worker nodes for execution, and starts the basic tasks of the next task group only after all basic tasks of the current one have finished. Each basic task generates a corresponding execution log when it is executed.
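The stage-by-stage execution above can be simulated with a thread pool standing in for the worker nodes. This is a hypothetical sketch (the `StagePlan` layout and `execute_job` are assumptions): the basic tasks of one task group are submitted together, and collecting every result acts as the barrier before the next group starts. The plain log lines deliberately omit the task group identifier, which is the gap the log decorator described below addresses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical layout: an ordered list of (stage_id, [(task_name, callable), ...]).
StagePlan = Tuple[str, List[Tuple[str, Callable[[], object]]]]


def execute_job(plan: List[StagePlan], workers: int = 4) -> List[str]:
    """Run task groups in order; the basic tasks of one group run in parallel."""
    logs: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:  # stand-in for worker nodes
        for _stage_id, tasks in plan:  # the stage id is not written to these plain logs
            futures = [(name, pool.submit(fn)) for name, fn in tasks]
            # Barrier: result() blocks until each task of this group has finished,
            # so the next group cannot start early.
            for name, fut in futures:
                logs.append(f"{name} finished with {fut.result()!r}")
    return logs


shared = {}
plan = [
    ("stage_1", [("count_a", lambda: shared.setdefault("a", 2)),
                 ("count_b", lambda: shared.setdefault("b", 3))]),
    ("stage_2", [("sum", lambda: shared["a"] + shared["b"])]),
]
for line in execute_job(plan):
    print(line)
```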
Because the master node distributes the basic tasks to many worker nodes in the cluster, the execution logs of different basic tasks are scattered across multiple servers or virtual machines, and the developer can see only scattered, black-box scheduling information at the task level. Developers, however, are usually concerned with scheduling at the business level, so consulting these logs is highly inconvenient.
To solve this problem, the distributed parallel computing framework provided by this embodiment further includes a log decorator. The log decorator enables the Spark task to generate logs along the business dimension; that is, the execution logs of the basic tasks that implement the same business capability are displayed together.
When a developer uses the framework to develop a Spark task, the server returns the log decorator to the terminal in response to a fourth call request. Through the terminal, the developer adds a log decorator at each entry function of the business logic script; all business functions in the function queue of the same entry function share that log decorator. The developer also specifies a log processing mode in each deployed log decorator. The log decorator supports several modes, for example generating one log per task group or one log per job. According to the specified mode, the corresponding task group identifier or job identifier is added to the log decorator.
When a basic task is executed, it generates an execution log and calls the corresponding log decorator, which inserts the task group identifier, job identifier or the like into the log according to the specified mode. When a business operation finishes, the log decorator collects, from the servers or virtual machines that executed it, the execution logs that share the same task group identifier or job identifier, integrates them into a task log for each task group identifier or job identifier, and returns the task logs to the terminal for display.
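A log decorator of this kind can be sketched as an ordinary Python decorator. Everything here is hypothetical: `NODE_LOGS` stands in for logs stored on separate worker nodes, the bracketed tag format is an assumed convention, and `collect_by_stage` models the per-task-group log processing mode.

```python
import functools
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Stand-in for execution logs scattered over worker nodes: node name -> log lines.
NODE_LOGS: DefaultDict[str, List[str]] = defaultdict(list)


def log_decorator(stage_id: str, job_id: str):
    """Hypothetical log decorator: tag each execution log line with business ids."""
    def wrap(func: Callable[[], object]) -> Callable[[str], object]:
        @functools.wraps(func)
        def inner(node: str) -> object:
            result = func()
            NODE_LOGS[node].append(
                f"[job={job_id}][stage={stage_id}] {node}:{func.__name__} ok")
            return result
        return inner
    return wrap


def collect_by_stage(stage_id: str) -> List[str]:
    """Merge the lines of one task group from every node into one task log."""
    tag = f"[stage={stage_id}]"
    return [line for node in sorted(NODE_LOGS)
            for line in NODE_LOGS[node] if tag in line]


@log_decorator("stage_1", "job_1")
def parse():
    return 1


@log_decorator("stage_2", "job_1")
def report():
    return 2


parse("worker-b")
parse("worker-a")
report("worker-a")
print(collect_by_stage("stage_1"))
```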
In this embodiment, a log decorator is preset in the Spark task for each entry function. The log decorator provides business-level log collection: after basic tasks executed on multiple servers produce scattered execution logs, the logs are automatically collected and integrated at the business level, which removes the inconvenience of consulting scattered logs.
In one embodiment, monitoring the execution efficiency of the Spark task includes: calculating the total task amount of the Spark task; estimating the task duration of the Spark task from the total task amount; calling a preset task-run monitoring component at a preset time frequency to collect running information of the Spark task; calculating, from the running information, the executed task amount of the Spark task at multiple time nodes; and calculating, from the executed task amount and the task duration, the execution efficiency of the Spark task at those time nodes.
While the Spark task runs, the Driver process monitors its execution efficiency in real time. The execution efficiency of a Spark task is the amount of tasks executed per unit time. Specifically, the Driver process calculates the total task amount of the Spark task and estimates the task duration from it. The task scheduling platform calls the preset task-run monitoring component at a preset time frequency to collect running information of the Spark task; the monitoring component may be a REST (Representational State Transfer) interface or the like. The running information includes the execution state of each task group at the current moment, the task amount of the task groups that have finished, and the task amount of the basic tasks already executed within the task groups that are still running.
The Driver process calculates the executed task amount of the Spark task at the current moment from the task amount of the finished task groups and the task amount of the basic tasks already executed in the running task groups. It calculates the elapsed execution time from the recorded start time of the Spark task and the current time, and then calculates the execution efficiency at the current time node from the executed task amount and the elapsed execution time.
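The calculation above can be written out as a short sketch. The `StageSnapshot` fields mirror the running information described in the text, but the names are assumptions; the text does not say how the task duration is estimated from the total task amount, so `estimated_duration` simply divides by a hypothetical baseline rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StageSnapshot:
    """Running information of one task group at a sampling moment (hypothetical)."""
    task_amount: int     # basic tasks in the group
    finished_tasks: int  # basic tasks already executed
    state: str           # "FINISHED", "RUNNING" or "PENDING"


def executed_amount(stages: List[StageSnapshot]) -> int:
    """Finished groups count fully; running groups count their finished tasks."""
    done = 0
    for s in stages:
        if s.state == "FINISHED":
            done += s.task_amount
        elif s.state == "RUNNING":
            done += s.finished_tasks
    return done


def execution_efficiency(stages: List[StageSnapshot], start_ts: float, now_ts: float) -> float:
    """Basic tasks executed per second since the Spark task started."""
    elapsed = now_ts - start_ts
    return executed_amount(stages) / elapsed if elapsed > 0 else 0.0


def estimated_duration(total_tasks: int, baseline_rate: float) -> float:
    """Hypothetical estimate of the task duration from the total task amount."""
    return total_tasks / baseline_rate


snapshot = [StageSnapshot(100, 100, "FINISHED"),
            StageSnapshot(80, 30, "RUNNING"),
            StageSnapshot(120, 0, "PENDING")]
print(executed_amount(snapshot))                  # 130
print(execution_efficiency(snapshot, 0.0, 65.0))  # 2.0
print(estimated_duration(300, 2.5))               # 120.0
```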
In this embodiment, the execution efficiency at each monitoring time node is calculated from the executed task amount obtained by real-time monitoring and the task duration estimated in advance, so that the calculated efficiency better reflects the actual running state of the Spark task and is more accurate.
In one embodiment, adjusting the resource allocation parameters in the configuration file when the execution efficiency is detected to be below the threshold includes: comparing the execution efficiency with the threshold; if it is below the threshold, calculating the remaining task amount from the total task amount and the executed task amount, calculating the remaining time from the task duration and the current time node, and estimating the additional physical resources needed from the remaining task amount and the remaining time; otherwise, calculating the resource utilization from the resource usage information of two adjacent time nodes recorded in the running information, and estimating the physical resources to be released from the resource utilization; and adjusting the resource allocation parameters according to the estimate.
Through the Driver process, the task scheduling platform can adjust the resource allocation parameters automatically according to the monitored execution efficiency of the Spark task. Specifically, the Driver process compares the execution efficiency with the threshold. If it is lower, the Driver process calculates the remaining task amount from the estimated total task amount and the executed task amount, and calculates the remaining time from the estimated task duration and the current time node. From the remaining task amount and the remaining time, it calculates a target execution efficiency for the Spark task. It then reads the resource allocation parameters recorded in the configuration file and, from the actual execution efficiency at the current time and the physical resources allocated to achieve it, determines the target physical resources needed to reach the target efficiency. The difference between the target physical resources and the allocated physical resources is the additional physical resource required. The Driver process then adjusts the resource allocation parameters recorded in the configuration file according to the target physical resources.
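The scale-up estimate can be worked through under one explicit assumption that the text does not state: throughput grows linearly with the number of executors. The function `executors_to_add` and the use of `spark.executor.instances` as the adjusted parameter are illustrative choices, not the patented procedure.

```python
import math
from typing import Dict


def executors_to_add(total: int, executed: int, duration: float, elapsed: float,
                     current_eff: float, current_executors: int) -> int:
    """Estimate how many executors to add so the task finishes within its duration."""
    remaining_tasks = total - executed
    remaining_time = duration - elapsed
    if remaining_time <= 0:
        raise ValueError("estimated task duration already exceeded")
    target_eff = remaining_tasks / remaining_time
    # Assumption: throughput scales linearly with the number of executors.
    per_executor = current_eff / current_executors
    target_executors = math.ceil(target_eff / per_executor)
    return max(0, target_executors - current_executors)


params: Dict[str, str] = {"spark.executor.instances": "4"}
extra = executors_to_add(total=300, executed=130, duration=120.0, elapsed=65.0,
                         current_eff=2.0, current_executors=4)
params["spark.executor.instances"] = str(4 + extra)
print(extra, params)  # 3 {'spark.executor.instances': '7'}
```

Here 170 remaining tasks in 55 remaining seconds need about 3.09 tasks/s; at 0.5 tasks/s per executor that rounds up to 7 executors, so 3 are added.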
The running information collected by the preset task-run monitoring component also includes resource usage information of the Spark task, such as CPU usage and remaining memory capacity. If the execution efficiency is greater than or equal to the threshold, the Driver process calculates the utilization of the physical resources from the resource usage information collected at two adjacent time nodes, and uses it to judge whether the allocated physical resources include idle physical resources. The Driver process reads the resource allocation parameters recorded in the configuration file, determines the idle physical resources to be released from those parameters and the resource utilization, and adjusts the resource allocation parameters in the configuration file accordingly.
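The release branch can be sketched in the same spirit. The cumulative-CPU-seconds sample, the 0.8 target utilization, and the functions `cpu_utilization` and `executors_to_release` are assumptions chosen for illustration; the text only says that utilization is derived from two adjacent time nodes and used to size the idle resources.

```python
import math
from dataclasses import dataclass


@dataclass
class ResourceSample:
    """Resource usage at one time node (hypothetical fields)."""
    ts: float           # sampling time, seconds
    cpu_seconds: float  # cumulative CPU time consumed by all executors


def cpu_utilization(prev: ResourceSample, cur: ResourceSample, total_cores: int) -> float:
    """Share of available core-seconds actually used between two adjacent time nodes."""
    window = cur.ts - prev.ts
    return (cur.cpu_seconds - prev.cpu_seconds) / (total_cores * window)


def executors_to_release(executors: int, utilization: float,
                         target_utilization: float = 0.8) -> int:
    """Keep just enough executors to reach the (assumed) target utilization."""
    needed = max(1, math.ceil(executors * utilization / target_utilization))
    return max(0, executors - needed)


util = cpu_utilization(ResourceSample(0.0, 1000.0), ResourceSample(60.0, 1240.0), total_cores=16)
print(util, executors_to_release(8, util))  # 0.25 5
```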
In this embodiment, the resource allocation parameters are adjusted automatically according to the monitored execution efficiency of the Spark task. Physical resources are added promptly when the efficiency falls below the threshold, safeguarding the execution efficiency and success rate of the Spark task, and physical resources are released when the efficiency is at or above the threshold, which raises resource utilization and reduces waste.
In one embodiment, adjusting the resource allocation parameters in the configuration file when the execution efficiency is detected to be below the threshold includes: comparing the execution efficiency with the threshold; if it is below the threshold, marking the Spark task as abnormal and obtaining its task log; locating the cause of the abnormality from the task log; and, if the cause includes insufficient physical resources, generating a resource adjustment prompt page from the resource allocation parameters recorded in the configuration file and sending the page to the terminal, so that the resource allocation parameters can be adjusted on the terminal through the prompt page.
When the resource allocation parameters need adjusting, the Driver process sends a resource adjustment prompt to the terminal. Specifically, the Driver process compares the execution efficiency with the threshold. If it is lower, the Spark task may have encountered an execution exception, so the Driver process obtains the task log of the Spark task, traverses the execution records in it, filters out the abnormal records, and locates the cause in the business logic script of the Spark task according to those records. If the cause includes insufficient physical resources, the Driver process stops the Spark task, generates a resource adjustment prompt page from the resource allocation parameters recorded in the configuration file, and sends the page to the terminal. Task scheduling personnel can then adjust the resource allocation parameters on the prompt page through the terminal. In another embodiment, task scheduling personnel can stop the Spark task at any time, modify the relevant resource allocation parameters in the configuration file, and restart the Spark task, without being constrained by the Spark task version.
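Abnormal-cause location can be approximated by screening the task log for error records and matching them against resource-related messages. The log format (records starting with `ERROR`), the hint list and the prompt payload built by `build_prompt` are all assumptions for illustration; a real implementation would depend on how the task logs are actually written.

```python
from typing import Dict, Iterable, List, Optional

# Assumption: text fragments treated as signs of insufficient physical resources.
RESOURCE_HINTS = ("OutOfMemoryError", "exceeding memory limits", "No space left on device")


def abnormal_records(task_log: Iterable[str]) -> List[str]:
    """Screen out execution records that report an error (assumed log format)."""
    return [line for line in task_log if line.startswith("ERROR")]


def locate_cause(task_log: Iterable[str]) -> Optional[str]:
    bad = abnormal_records(task_log)
    if not bad:
        return None
    if any(hint in line for line in bad for hint in RESOURCE_HINTS):
        return "insufficient_physical_resources"
    return "other"


def build_prompt(job_name: str, params: Dict[str, str]) -> Dict[str, object]:
    """Payload for a hypothetical resource adjustment prompt page."""
    return {"job": job_name, "cause": "insufficient_physical_resources",
            "current_parameters": dict(params), "editable": sorted(params)}


log = ["INFO stage_1 finished",
       "ERROR stage_2 task 7: java.lang.OutOfMemoryError: Java heap space"]
if locate_cause(log) == "insufficient_physical_resources":
    print(build_prompt("demo_job", {"spark.executor.memory": "4g"}))
```

Showing the current parameters on the prompt page matches the stated goal of letting the user see the original settings while editing them.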
Based on the adjusted resource allocation parameters, the task scheduling platform starts a new Driver process for the Spark task. The newly started Driver process reallocates physical resources for the Spark task according to the adjusted parameters, and the abnormal Spark task is restarted on the reallocated physical resources.
In this embodiment, whether the resource allocation parameters need adjusting is judged automatically from the monitored execution efficiency of the Spark task. If they do, a resource adjustment prompt is sent to the terminal promptly, and the prompt page shows the originally configured physical resource parameters for the user's reference when making changes.
It should be understood that although each step in the flow chart of Fig. 2 is successively shown according to the instruction of arrow, this
A little steps are not that the inevitable sequence according to arrow instruction successively executes.Unless expressly state otherwise herein, these steps
It executes there is no the limitation of stringent sequence, these steps can execute in other order.Moreover, at least part in Fig. 2
Step may include that perhaps these sub-steps of multiple stages or stage are executed in synchronization to multiple sub-steps
It completes, but can execute at different times, the execution sequence in these sub-steps or stage, which is also not necessarily, successively to be carried out,
But it can be executed in turn or alternately at least part of the sub-step or stage of other steps or other steps.
In one embodiment, as shown in Fig. 3, a physical resource allocation apparatus is provided, including a resource allocation module 302, an efficiency monitoring module 304 and a resource adjustment module 306, wherein:
The resource allocation module 302 is configured to receive a Spark task and a corresponding configuration file submitted by a terminal, read the resource allocation parameters of the Spark task from the configuration file, and allocate physical resources according to those parameters.
The efficiency monitoring module 304 is configured to execute the Spark task based on the allocated physical resources and to monitor the execution efficiency of the Spark task during its execution.
The resource adjustment module 306 is configured to adjust the resource allocation parameters in the configuration file when the execution efficiency is detected to be below a threshold, and to move the Spark task from the allocated physical resources to physical resources matching the adjusted parameters, where it continues to execute.
In one embodiment, the apparatus further includes a task encapsulation module 308, configured to: receive a Spark task development request sent by the terminal, the request containing an entry function identifier; identify the function queue corresponding to the entry function identifier, the function queue including multiple business functions; convert the business functions into corresponding basic tasks; call the group decorator corresponding to the entry function identifier to encapsulate the basic tasks into multiple task groups; and configure a scheduling order for the task groups and encapsulate them according to that order to obtain the Spark task.
In one embodiment, the Spark task includes a Shell script in which a callback function for the configuration file is preset, and the resource allocation module 302 is further configured to: start the Spark task by executing the Shell script; generate a callback instruction for the configuration file based on the callback function preset in the Shell script; pull the corresponding configuration file according to the callback instruction; and read the resource allocation parameters from the pulled configuration file.
In one embodiment, the efficiency monitoring module 304 is further configured to: split the Spark task into multiple task groups, each having a corresponding task group identifier; split each task group into multiple basic tasks, each having a corresponding log decorator; execute the basic tasks on the allocated physical resources to generate an execution log for each basic task; add, by means of the log decorator, the task group identifier of each basic task to its execution log; and, when execution of the Spark task finishes, collect the execution log records that share the same task group identifier and generate a task log for each task group identifier.
In one embodiment, the efficiency monitoring module 304 is further configured to: calculate the total task amount of the Spark task; estimate the task duration of the Spark task from the total task amount; call a preset task-run monitoring component at a preset time frequency to collect running information of the Spark task; calculate, from the running information, the executed task amount of the Spark task at multiple time nodes; and calculate, from the executed task amount and the task duration, the execution efficiency of the Spark task at those time nodes.
In one embodiment, the resource adjustment module 306 is further configured to: compare the execution efficiency with the threshold; if it is below the threshold, calculate the remaining task amount from the total task amount and the executed task amount, calculate the remaining time from the task duration and the current time node, and estimate the additional physical resources needed from the remaining task amount and the remaining time; otherwise, calculate the resource utilization from the resource usage information of two adjacent time nodes recorded in the running information, and estimate the physical resources to be released from the resource utilization; and adjust the resource allocation parameters according to the estimate.
In one embodiment, the resource adjustment module 306 is further configured to: compare the execution efficiency with the threshold; if it is below the threshold, mark the Spark task as abnormal and obtain its task log; locate the cause of the abnormality from the task log; and, if the cause includes insufficient physical resources, generate a resource adjustment prompt page from the resource allocation parameters recorded in the configuration file and send the page to the terminal, so that the resource allocation parameters can be adjusted on the terminal through the prompt page.
For the specific limitations of the physical resource allocation apparatus, refer to the limitations of the physical resource allocation method above; details are not repeated here. Each module of the apparatus may be implemented wholly or partly in software, hardware or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call them to perform the corresponding operations.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database stores configuration files. The network interface communicates with external terminals over a network connection. The computer program, when executed by the processor, implements a physical resource allocation method.
Those skilled in the art will understand that the structure shown in Fig. 4 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer devices to which the solution is applied. A specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program: receiving a Spark task and a corresponding configuration file submitted by a terminal; reading the resource allocation parameters of the Spark task from the configuration file and allocating physical resources according to them; executing the Spark task based on the allocated physical resources; monitoring the execution efficiency of the Spark task during its execution; when the execution efficiency is detected to be below a threshold, adjusting the resource allocation parameters in the configuration file; and moving the Spark task from the allocated physical resources to physical resources matching the adjusted parameters, where it continues to execute.
In one embodiment, the processor further implements the following steps when executing the computer program: receiving a Spark task development request sent by the terminal, the request containing an entry function identifier; identifying the function queue corresponding to the entry function identifier, the function queue including multiple business functions; converting the business functions into corresponding basic tasks; calling the group decorator corresponding to the entry function identifier to encapsulate the basic tasks into multiple task groups; and configuring a scheduling order for the task groups and encapsulating them according to that order to obtain the Spark task.
In one embodiment, the Spark task includes a Shell script in which a callback function for the configuration file is preset, and the processor further implements the following steps when executing the computer program: starting the Spark task by executing the Shell script; generating a callback instruction for the configuration file based on the callback function preset in the Shell script; pulling the corresponding configuration file according to the callback instruction; and reading the resource allocation parameters from the pulled configuration file.
In one embodiment, the processor further implements the following steps when executing the computer program: splitting the Spark task into multiple task groups, each having a corresponding task group identifier; splitting each task group into multiple basic tasks, each having a corresponding log decorator; executing the basic tasks on the allocated physical resources to generate an execution log for each basic task; adding, by means of the log decorator, the task group identifier of each basic task to its execution log; and, when execution of the Spark task finishes, collecting the execution log records that share the same task group identifier and generating a task log for each task group identifier.
In one embodiment, the processor further implements the following steps when executing the computer program: calculating the total task amount of the Spark task; calculating the task duration of the Spark task according to the total task amount; invoking the task run monitoring component at a preset time frequency to collect run information of the Spark task; calculating the task execution amount of the Spark task at multiple time nodes according to the run information; and calculating the execution efficiency of the Spark task at the multiple time nodes according to the task execution amounts and the task duration.
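The text does not give formulas for the task duration or the execution efficiency. A minimal Python sketch under two labeled assumptions: the planned duration is the total task amount divided by an expected processing rate, and the efficiency at a time node is the actual execution amount divided by the amount a linear plan would have completed by then (1.0 means exactly on schedule).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    elapsed_s: float  # time node: seconds since the Spark task started
    executed: int     # task execution amount reported by the monitor at that node


def planned_duration(total: int, expected_rate: float) -> float:
    """Assumed model: duration = total task amount / expected processing rate (units per second)."""
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    return total / expected_rate


def execution_efficiency(total: int, duration_s: float, samples: List[Sample]) -> List[float]:
    """Efficiency per time node = executed amount / amount planned by that time (linear plan)."""
    result = []
    for s in samples:
        planned = total * min(s.elapsed_s, duration_s) / duration_s
        # Nothing is planned at t = 0, so treat the task as on schedule there.
        result.append(s.executed / planned if planned > 0 else 1.0)
    return result


if __name__ == "__main__":
    duration = planned_duration(1000, 10.0)
    print(duration, execution_efficiency(1000, duration, [Sample(20, 200), Sample(50, 300)]))
```

With 1000 units at 10 units/s the plan is 100 s; a task that has done 300 units after 50 s is running at 0.6 of plan.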
In one embodiment, the processor further implements the following steps when executing the computer program: comparing whether the execution efficiency is lower than the threshold; if so, calculating the remaining task amount from the total task amount and the task execution amount, calculating the remaining duration from the task duration and the current time node, and estimating the physical resources that need to be added from the remaining task amount and the remaining duration; otherwise, calculating the resource utilization rate from the resource usage information of two adjacent time nodes recorded in the run information, and estimating the physical resources that need to be released from the resource utilization rate; and adjusting the resource allocation parameters according to the estimation result.
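The two branches above can be sketched in Python with the number of executors as the only resource. The formulas are assumptions, not taken from the patent: the add branch extrapolates the observed per-executor rate to the rate needed to finish on time (capped by a hypothetical `max_executors`), and the release branch keeps only as many executors as were busy on average across the two adjacent time nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class TimeNode:
    elapsed_s: float       # seconds since the Spark task started
    executed: int          # task execution amount observed at this time node
    busy_executors: float  # executors observed busy at this node (from the run information)


def adjust_executor_count(total: int, duration_s: float, executors: int, efficiency: float,
                          threshold: float, prev: TimeNode, cur: TimeNode,
                          max_executors: int = 100) -> int:
    """Return the adjusted executor count to write back into the resource allocation parameters."""
    if efficiency < threshold:
        # Too slow: estimate how many executors are needed to finish the remaining amount in time.
        if cur.executed <= 0 or cur.elapsed_s <= 0:
            return executors  # no throughput observed yet, nothing to extrapolate from
        remaining = total - cur.executed
        remaining_time = max(duration_s - cur.elapsed_s, 1.0)  # assume >= 1 s left when overdue
        per_executor_rate = cur.executed / cur.elapsed_s / executors
        needed = math.ceil(remaining / remaining_time / per_executor_rate)
        return min(max_executors, max(executors, needed))
    # Fast enough: release executors that stayed idle between the two adjacent time nodes.
    utilization = (prev.busy_executors + cur.busy_executors) / 2 / executors
    return max(1, math.ceil(executors * utilization))


if __name__ == "__main__":
    print(adjust_executor_count(1000, 100, 4, 0.6, 0.8, TimeNode(40, 260, 4), TimeNode(50, 300, 4)))
```

Here 700 units remain with 50 s left (14 units/s needed) while each executor delivers 1.5 units/s, so the sketch raises the count from 4 to 10.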
In one embodiment, the processor further implements the following steps when executing the computer program: comparing whether the execution efficiency is lower than the threshold; if so, marking execution of the Spark task as abnormal and obtaining the task log of the Spark task; locating the cause of the abnormality according to the task log; and, if the cause includes insufficient physical resources, generating a resource adjustment prompt page from the resource allocation parameters recorded in the configuration file and sending it to the terminal, so that the terminal adjusts the resource allocation parameters on the resource adjustment prompt page.
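A minimal Python sketch of the abnormal-task branch, assuming the cause is located by matching substrings in the task log. The entries in `RESOURCE_SIGNATURES` are illustrative assumptions (exact Spark and YARN wording varies by version), and the prompt page is modeled as a hypothetical dictionary payload rather than a real web page.

```python
from typing import Dict, List, Optional

# Illustrative log substrings treated as "insufficient physical resources" (an assumption).
RESOURCE_SIGNATURES = (
    "OutOfMemoryError",
    "exceeding memory limits",
    "Initial job has not accepted any resources",
)

TASK_STATUS: Dict[str, str] = {}


def locate_abnormal_cause(task_log: List[str]) -> Optional[str]:
    """Scan the task log and return a coarse cause label, or None if nothing matches."""
    for line in task_log:
        if any(sig in line for sig in RESOURCE_SIGNATURES):
            return "insufficient_physical_resources"
    return None


def handle_low_efficiency(task_id: str, task_log: List[str],
                          params: Dict[str, str]) -> Optional[Dict[str, object]]:
    """Mark the task abnormal; if resources are the cause, build a prompt-page payload."""
    TASK_STATUS[task_id] = "abnormal"
    if locate_abnormal_cause(task_log) != "insufficient_physical_resources":
        return None
    return {
        "task_id": task_id,
        "title": "Resource adjustment",
        # Current values from the configuration file, pre-filled so the user can edit them.
        "fields": [{"name": name, "current": value} for name, value in params.items()],
    }


if __name__ == "__main__":
    log = ["INFO stage 3 started", "ERROR java.lang.OutOfMemoryError: Java heap space"]
    print(handle_low_efficiency("job-42", log, {"spark.executor.memory": "4g"}))
```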
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: receiving a Spark task and a corresponding configuration file submitted by a terminal; reading the resource allocation parameters of the Spark task from the configuration file and allocating physical resources according to those parameters; executing the Spark task on the allocated physical resources; monitoring the execution efficiency of the Spark task while it executes; when the monitored execution efficiency is lower than a threshold, adjusting the resource allocation parameters in the configuration file; and rescheduling the Spark task from the allocated physical resources to physical resources matching the adjusted resource allocation parameters, where it continues to execute.
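The overall loop (allocate, execute, monitor, adjust, reschedule) can be sketched as a small Python simulation. Everything here is hypothetical: `ToyCluster` stands in for the resource manager, each loop iteration is one time node, throughput is a fixed rate per executor, and efficiency uses the same linear-plan assumption as the earlier sketch. The point it illustrates is that progress (`executed`) is preserved across rescheduling, so the task continues rather than restarts.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyCluster:
    """Hypothetical stand-in for the resource manager: records every allocation it hands out."""
    allocations: List[int] = field(default_factory=list)

    def allocate(self, config: Dict[str, int]) -> int:
        executors = config["executors"]
        self.allocations.append(executors)
        return executors


def run_spark_task(total: int, duration: int, config: Dict[str, int], rate_per_executor: int,
                   threshold: float, adjust: Callable[[Dict[str, int]], Dict[str, int]]) -> List[int]:
    """Simulate allocation, execution, per-node monitoring, parameter adjustment and rescheduling."""
    cluster = ToyCluster()
    executors = cluster.allocate(config)              # allocation from the submitted configuration
    executed = 0                                      # progress survives rescheduling
    for t in range(1, duration + 1):                  # one time node per tick
        executed = min(total, executed + executors * rate_per_executor)
        if executed >= total:
            break
        efficiency = executed / (total * t / duration)  # actual vs. linearly planned progress
        if efficiency < threshold:
            config = adjust(config)                   # rewrite the resource allocation parameters
            executors = cluster.allocate(config)      # continue on resources matching the new config
    return cluster.allocations


if __name__ == "__main__":
    double = lambda c: {**c, "executors": c["executors"] * 2}
    print(run_spark_task(100, 10, {"executors": 2}, 3, 0.8, double))
```

With 2 executors the task reaches only 0.6 of plan at the first time node, so the sketch doubles the allocation once and then finishes on 4 executors.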
In one embodiment, the computer program, when executed by the processor, further implements the following steps: receiving a Spark task development request sent by the terminal, the development request including an entry function identifier; identifying the function queue corresponding to the entry function identifier, the function queue including multiple business functions; converting the business functions into corresponding basic tasks; calling the group decorator corresponding to the entry function identifier to encapsulate the basic tasks into multiple task groups; and configuring a scheduling order for the task groups and packaging them based on that order to obtain the Spark task.
In one embodiment, the Spark task includes a Shell script, and the Shell script is preset with a callback function for the configuration file. The computer program, when executed by the processor, further implements the following steps: starting the Spark task by executing the Shell script; generating a callback instruction for the configuration file based on the callback function preset in the Shell script; pulling the corresponding configuration file according to the callback instruction; and reading the resource allocation parameters from the pulled configuration file.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: splitting the Spark task into multiple task groups, each task group having a corresponding task group identifier; splitting each task group into multiple basic tasks, each basic task having a corresponding log decorator; executing the basic tasks on the allocated physical resources and generating an execution log for each basic task; using the log decorator to add the task group identifier of each basic task to that basic task's execution log; and, when execution of the Spark task finishes, collecting the execution logs that record the same task group identifier to generate a task log for each task group identifier.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: calculating the total task amount of the Spark task; calculating the task duration of the Spark task according to the total task amount; invoking the task run monitoring component at a preset time frequency to collect run information of the Spark task; calculating the task execution amount of the Spark task at multiple time nodes according to the run information; and calculating the execution efficiency of the Spark task at the multiple time nodes according to the task execution amounts and the task duration.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: comparing whether the execution efficiency is lower than the threshold; if so, calculating the remaining task amount from the total task amount and the task execution amount, calculating the remaining duration from the task duration and the current time node, and estimating the physical resources that need to be added from the remaining task amount and the remaining duration; otherwise, calculating the resource utilization rate from the resource usage information of two adjacent time nodes recorded in the run information, and estimating the physical resources that need to be released from the resource utilization rate; and adjusting the resource allocation parameters according to the estimation result.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: comparing whether the execution efficiency is lower than the threshold; if so, marking execution of the Spark task as abnormal and obtaining the task log of the Spark task; locating the cause of the abnormality according to the task log; and, if the cause includes insufficient physical resources, generating a resource adjustment prompt page from the resource allocation parameters recorded in the configuration file and sending it to the terminal, so that the terminal adjusts the resource allocation parameters on the resource adjustment prompt page.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. Those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within its protection scope. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (10)
1. A physical resource allocation method, the method comprising:
Receiving a Spark task and a corresponding configuration file submitted by a terminal;
Reading resource allocation parameters of the Spark task from the configuration file, and allocating physical resources according to the resource allocation parameters;
Executing the Spark task on the allocated physical resources;
Monitoring the execution efficiency of the Spark task during execution of the Spark task; and
When the monitored execution efficiency is lower than a threshold, adjusting the resource allocation parameters in the configuration file, and rescheduling the Spark task from the allocated physical resources to physical resources corresponding to the adjusted resource allocation parameters, where it continues to execute.
2. The method according to claim 1, wherein before receiving the Spark task and the corresponding configuration file submitted by the terminal, the method further comprises:
Receiving a Spark task development request sent by the terminal, the development request including an entry function identifier;
Identifying a function queue corresponding to the entry function identifier, the function queue including multiple business functions;
Converting the multiple business functions into corresponding basic tasks;
Calling a group decorator corresponding to the entry function identifier to encapsulate the basic tasks into multiple task groups; and
Configuring a scheduling order for the task groups, and packaging the task groups based on the scheduling order to obtain the Spark task.
3. The method according to claim 1, wherein the Spark task includes a Shell script, the Shell script is preset with a callback function for the configuration file, and reading the resource allocation parameters of the Spark task from the configuration file comprises:
Starting the Spark task by executing the Shell script;
Generating a callback instruction for the configuration file based on the callback function;
Pulling the corresponding configuration file according to the callback instruction; and
Reading the resource allocation parameters from the pulled configuration file.
4. The method according to claim 1, wherein executing the Spark task on the allocated physical resources comprises:
Splitting the Spark task into multiple task groups, each task group having a corresponding task group identifier;
Splitting each task group into multiple basic tasks, each basic task having a corresponding log decorator;
Executing the basic tasks on the allocated physical resources and generating an execution log for each basic task;
Using the log decorator to add the task group identifier of each basic task to the execution log of that basic task; and
When execution of the Spark task finishes, collecting the execution logs that record the same task group identifier to generate a task log for each task group identifier.
5. The method according to claim 1, wherein monitoring the execution efficiency of the Spark task comprises:
Calculating a total task amount of the Spark task;
Calculating a task duration of the Spark task according to the total task amount;
Invoking a task run monitoring component at a preset time frequency to collect run information of the Spark task;
Calculating the task execution amount of the Spark task at multiple time nodes according to the run information; and
Calculating the execution efficiency of the Spark task at the multiple time nodes according to the task execution amounts and the task duration.
6. The method according to claim 5, wherein adjusting the resource allocation parameters in the configuration file when the monitored execution efficiency is lower than the threshold comprises:
Comparing whether the execution efficiency is lower than the threshold;
If so, calculating a remaining task amount according to the total task amount and the task execution amount; calculating a remaining duration according to the task duration and the current time node; and estimating the physical resources that need to be added according to the remaining task amount and the remaining duration;
Otherwise, calculating a resource utilization rate according to the resource usage information of two adjacent time nodes recorded in the run information;
Estimating the physical resources that need to be released according to the resource utilization rate; and
Adjusting the resource allocation parameters according to the estimation result.
7. The method according to claim 1, wherein adjusting the resource allocation parameters in the configuration file when the monitored execution efficiency is lower than the threshold comprises:
Comparing whether the execution efficiency is lower than the threshold;
If so, marking execution of the Spark task as abnormal and obtaining a task log of the Spark task;
Locating the cause of the abnormality according to the task log; and
If the cause includes insufficient physical resources, generating a resource adjustment prompt page according to the resource allocation parameters recorded in the configuration file, and sending the resource adjustment prompt page to the terminal, so that the terminal adjusts the resource allocation parameters on the resource adjustment prompt page.
8. A physical resource allocation device, wherein the device comprises:
A resource allocation module, configured to receive a Spark task and a corresponding configuration file submitted by a terminal, read resource allocation parameters of the Spark task from the configuration file, and allocate physical resources according to the resource allocation parameters;
An efficiency monitoring module, configured to execute the Spark task on the allocated physical resources and to monitor the execution efficiency of the Spark task during its execution; and
A resource adjustment module, configured to adjust the resource allocation parameters in the configuration file when the monitored execution efficiency is lower than a threshold, and to reschedule the Spark task from the allocated physical resources to physical resources matching the adjusted resource allocation parameters, where it continues to execute.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810621848.4A CN108845884B (en) | 2018-06-15 | 2018-06-15 | Physical resource allocation method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810621848.4A CN108845884B (en) | 2018-06-15 | 2018-06-15 | Physical resource allocation method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108845884A (en) | 2018-11-20 |
CN108845884B (en) | 2024-04-19 |
Family ID: 64202053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810621848.4A Active CN108845884B (en) | 2018-06-15 | 2018-06-15 | Physical resource allocation method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108845884B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109491841A (en) * | 2018-11-21 | 2019-03-19 | 南京安讯科技有限责任公司 | A method of improving Spark on yarn real-time task reliability |
CN110275777A (en) * | 2019-06-10 | 2019-09-24 | 广州市九重天信息科技有限公司 | Resource scheduling system |
CN110597858A (en) * | 2019-08-30 | 2019-12-20 | 深圳壹账通智能科技有限公司 | Task data processing method and device, computer equipment and storage medium |
CN111078496A (en) * | 2019-11-29 | 2020-04-28 | 联想(北京)有限公司 | Data monitoring method, platform and storage medium |
CN111338779A (en) * | 2020-02-27 | 2020-06-26 | 深圳华锐金融技术股份有限公司 | Resource allocation method, device, computer equipment and storage medium |
CN111767092A (en) * | 2020-06-30 | 2020-10-13 | 深圳前海微众银行股份有限公司 | Job execution method, device, system and computer readable storage medium |
CN112068874A (en) * | 2020-07-30 | 2020-12-11 | 深圳市优必选科技股份有限公司 | Software project continuous integration method and device, terminal equipment and storage medium |
CN112114958A (en) * | 2019-06-21 | 2020-12-22 | 上海哔哩哔哩科技有限公司 | Resource isolation method, distributed platform, computer device, and storage medium |
CN112148469A (en) * | 2019-06-28 | 2020-12-29 | 杭州海康威视数字技术股份有限公司 | Method, apparatus and computer storage medium for managing resources |
WO2021017701A1 (en) * | 2019-07-29 | 2021-02-04 | 中兴通讯股份有限公司 | Spark performance optimization control method and apparatus, and device and storage medium |
CN112527384A (en) * | 2020-12-15 | 2021-03-19 | 青岛海尔科技有限公司 | Resource allocation parameter configuration method and device, storage medium and electronic device |
CN112597121A (en) * | 2020-12-25 | 2021-04-02 | 北京知因智慧科技有限公司 | Logic script processing method and device, electronic equipment and storage medium |
CN113691587A (en) * | 2021-07-20 | 2021-11-23 | 北京达佳互联信息技术有限公司 | Virtual resource processing method and device, electronic equipment and storage medium |
CN114168302A (en) * | 2021-12-28 | 2022-03-11 | 中国建设银行股份有限公司 | Task scheduling method, device, equipment and storage medium |
EP4086764A1 (en) * | 2021-05-06 | 2022-11-09 | Ateme | Method for dynamic resources allocation and apparatus for implementing the same |
CN115794591A (en) * | 2023-02-06 | 2023-03-14 | 南方电网数字电网研究院有限公司 | Scheduling method of power grid IT (information technology) resources |
WO2023115931A1 (en) * | 2021-12-21 | 2023-06-29 | 浪潮通信信息系统有限公司 | Big-data component parameter adjustment method and apparatus, and electronic device and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102594869A (en) * | 2011-12-30 | 2012-07-18 | 深圳市同洲视讯传媒有限公司 | Method and device for dynamically distributing resources under cloud computing environment |
CN104951372A (en) * | 2015-06-16 | 2015-09-30 | 北京工业大学 | Method for dynamic allocation of Map/Reduce data processing platform memory resources based on prediction |
CN106033371A (en) * | 2015-03-13 | 2016-10-19 | 杭州海康威视数字技术股份有限公司 | Method and system for dispatching video analysis task |
CN107291550A (en) * | 2017-06-22 | 2017-10-24 | 华中科技大学 | Spark platform resource dynamic allocation method and system for iterative applications |
CN107454019A (en) * | 2017-09-28 | 2017-12-08 | 北京邮电大学 | Software defined network distribution method of dynamic bandwidth, device, equipment and storage medium |
US20180024863A1 (en) * | 2016-03-31 | 2018-01-25 | Huawei Technologies Co., Ltd. | Task Scheduling and Resource Provisioning System and Method |
CN108023759A (en) * | 2016-10-28 | 2018-05-11 | 腾讯科技(深圳)有限公司 | Adaptive resource regulating method and device |
- 2018-06-15: application CN201810621848.4A filed in CN (CN108845884B, status: Active)
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109491841A (en) * | 2018-11-21 | 2019-03-19 | 南京安讯科技有限责任公司 | A method of improving Spark on yarn real-time task reliability |
CN110275777B (en) * | 2019-06-10 | 2021-10-29 | 广州市九重天信息科技有限公司 | Resource scheduling system |
CN110275777A (en) * | 2019-06-10 | 2019-09-24 | 广州市九重天信息科技有限公司 | Resource scheduling system |
CN112114958A (en) * | 2019-06-21 | 2020-12-22 | 上海哔哩哔哩科技有限公司 | Resource isolation method, distributed platform, computer device, and storage medium |
CN112148469B (en) * | 2019-06-28 | 2024-02-20 | 杭州海康威视数字技术股份有限公司 | Method and device for managing resources and computer storage medium |
CN112148469A (en) * | 2019-06-28 | 2020-12-29 | 杭州海康威视数字技术股份有限公司 | Method, apparatus and computer storage medium for managing resources |
WO2021017701A1 (en) * | 2019-07-29 | 2021-02-04 | 中兴通讯股份有限公司 | Spark performance optimization control method and apparatus, and device and storage medium |
CN112379935A (en) * | 2019-07-29 | 2021-02-19 | 中兴通讯股份有限公司 | Spark performance optimization control method, device, equipment and storage medium |
CN110597858A (en) * | 2019-08-30 | 2019-12-20 | 深圳壹账通智能科技有限公司 | Task data processing method and device, computer equipment and storage medium |
CN111078496A (en) * | 2019-11-29 | 2020-04-28 | 联想(北京)有限公司 | Data monitoring method, platform and storage medium |
CN111338779A (en) * | 2020-02-27 | 2020-06-26 | 深圳华锐金融技术股份有限公司 | Resource allocation method, device, computer equipment and storage medium |
CN111338779B (en) * | 2020-02-27 | 2021-11-02 | 深圳华锐金融技术股份有限公司 | Resource allocation method, device, computer equipment and storage medium |
CN111767092A (en) * | 2020-06-30 | 2020-10-13 | 深圳前海微众银行股份有限公司 | Job execution method, device, system and computer readable storage medium |
CN112068874B (en) * | 2020-07-30 | 2023-12-29 | 深圳市优必选科技股份有限公司 | Continuous integration method and device for software items, terminal equipment and storage medium |
CN112068874A (en) * | 2020-07-30 | 2020-12-11 | 深圳市优必选科技股份有限公司 | Software project continuous integration method and device, terminal equipment and storage medium |
CN112527384A (en) * | 2020-12-15 | 2021-03-19 | 青岛海尔科技有限公司 | Resource allocation parameter configuration method and device, storage medium and electronic device |
CN112527384B (en) * | 2020-12-15 | 2023-06-16 | 青岛海尔科技有限公司 | Method and device for configuring resource allocation parameters, storage medium and electronic device |
CN112597121A (en) * | 2020-12-25 | 2021-04-02 | 北京知因智慧科技有限公司 | Logic script processing method and device, electronic equipment and storage medium |
EP4086764A1 (en) * | 2021-05-06 | 2022-11-09 | Ateme | Method for dynamic resources allocation and apparatus for implementing the same |
CN113691587A (en) * | 2021-07-20 | 2021-11-23 | 北京达佳互联信息技术有限公司 | Virtual resource processing method and device, electronic equipment and storage medium |
WO2023115931A1 (en) * | 2021-12-21 | 2023-06-29 | 浪潮通信信息系统有限公司 | Big-data component parameter adjustment method and apparatus, and electronic device and storage medium |
CN114168302A (en) * | 2021-12-28 | 2022-03-11 | 中国建设银行股份有限公司 | Task scheduling method, device, equipment and storage medium |
CN115794591A (en) * | 2023-02-06 | 2023-03-14 | 南方电网数字电网研究院有限公司 | Scheduling method of power grid IT (information technology) resources |
Also Published As
Publication number | Publication date |
---|---|
CN108845884B (en) | 2024-04-19 |
Similar Documents
Publication | Title |
---|---|
CN108845884A (en) | Physical resource allocation method, apparatus, computer equipment and storage medium |
CN109271447A (en) | Method of data synchronization, device, computer equipment and storage medium |
Gunasekaran et al. | Fifer: Tackling resource underutilization in the serverless era |
CN111708627B (en) | Task scheduling method and device based on distributed scheduling framework |
Struhár et al. | React: Enabling real-time container orchestration |
Tămaş-Selicean et al. | Design optimization of mixed-criticality real-time embedded systems |
Axer et al. | Response-time analysis of parallel fork-join workloads with real-time constraints |
CN100489790C (en) | Processing management device, computer system, distributed processing method |
CN110597858A (en) | Task data processing method and device, computer equipment and storage medium |
CN108920153B (en) | Docker container dynamic scheduling method based on load prediction |
CN108897610A (en) | Task scheduling method, device, computer equipment and storage medium |
CN106406983A (en) | Task scheduling method and device in cluster |
Soualhia et al. | Predicting scheduling failures in the cloud: A case study with google clusters and hadoop on amazon EMR |
CN111625331B (en) | Task scheduling method, device, platform, server and storage medium |
CN107291546A (en) | Resource regulating method and device |
CN103677990B (en) | Scheduling method and device for virtual machine real-time tasks, and virtual machine |
CN112286671B (en) | Containerized batch job scheduling method and device, and computer equipment |
Imai et al. | Accurate resource prediction for hybrid IaaS clouds using workload-tailored elastic compute units |
CN112486642B (en) | Resource scheduling method, device, electronic equipment and computer readable storage medium |
Moulik | RESET: A real-time scheduler for energy and temperature aware heterogeneous multi-core systems |
Caniou et al. | Budget-aware scheduling algorithms for scientific workflows with stochastic task weights on heterogeneous iaas cloud platforms |
CN110196773B (en) | Multi-time-scale security check system and method for unified scheduling computing resources |
CN106845746A (en) | Cloud workflow management system supporting large-scale instance-intensive applications |
CN109656692A (en) | Big data task management method, device, equipment and storage medium |
Werner et al. | HARDLESS: A generalized serverless compute architecture for hardware processing accelerators |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |
GR01 | Patent grant |