CN110287016A - A distributed flowchart heterogeneous computing scheduling method - Google Patents
A distributed flowchart heterogeneous computing scheduling method
- Publication number
- CN110287016A CN110287016A CN201910584305.4A CN201910584305A CN110287016A CN 110287016 A CN110287016 A CN 110287016A CN 201910584305 A CN201910584305 A CN 201910584305A CN 110287016 A CN110287016 A CN 110287016A
- Authority
- CN
- China
- Prior art keywords
- data
- task
- factor
- algorithm
- flow chart
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
Abstract
The present invention relates to the field of computer technology. To solve the problems faced in current distributed-cluster job processing flows — scheduling the execution of heterogeneous algorithms, seamless data circulation between heterogeneous algorithms, fine-grained partitioning of parallel tasks, and dynamic processing of streaming data — a distributed flowchart heterogeneous computing scheduling method is proposed. It comprises configuring user-defined computation flowcharts, centrally managing and scheduling the computation flow, performing mixed scheduling of heterogeneous algorithms, managing the input and output data of the calculation programs in the flowchart to form a streaming-data processing mechanism, providing a multi-language heterogeneous-algorithm access SDK, configuring and executing preprocessing programs to solve the problem of seamless data circulation between heterogeneous algorithms, and configuring and executing preprocessing programs to partition parallel tasks at fine granularity, thereby constituting a complete solution for scheduling and managing heterogeneous algorithm computation flows under a distributed cluster work pattern.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a distributed flowchart heterogeneous computing scheduling and processing method, specifically a scheduling management method for big-data heterogeneous algorithm processing flows.
Background technique
In recent years, with the diversification of data acquisition sources, the demand for big-data processing in every industry has become increasingly apparent. Data production, including in the field of spatial data production, generally requires multiple processes, with procedural data processing carried out by various software algorithms. Data production features large data volumes, complex processing flows, and diverse processing tools, so a flow-based cluster computing mode is needed to improve overall data production efficiency. The practicability and feasibility of cluster computing are mainly reflected in the scheduling of workflow management, and existing computing scheduling frameworks have the following shortcomings:
1. They provide logic-controlled scheduling based only on the flowchart; the scheduling granularity is limited to flowchart unit nodes, and a flowchart unit node cannot be divided into finer-grained executions, whereas real data processing generally requires parallel processing within a single flowchart unit and dynamic control of each parallel task.
2. The scheduling flow executes the next one or more related flow units only after the current flow unit finishes running, and cannot support dynamic streaming task scheduling.
3. The tool algorithms used in real data production include automated programs running in heterogeneous environments, single-machine human-computer-interaction tools, and C/S-mode cooperative tools, but current scheduling frameworks impose strict requirements on algorithms and can only run programs in a single environment such as Linux, PC, or a virtual machine; they cannot perform mixed scheduling of different types of tool algorithms within the whole flow or within a single flow unit.
4. The framework itself provides no support for data circulation between algorithms. A cluster computing system developed on such a framework can only serve a single, specific processing scenario and cannot dynamically extend its processing capability; algorithm developers must not only learn the framework's development technology and adapt their algorithms for access, but also manage data conversion and the entire data flow, raising the technical threshold for developing cluster-computing processing.
In summary, there is currently no complete, universally applicable distributed flowchart scheduling and computing method that satisfies the complex, diverse application scenarios of real data production.
Summary of the invention
To solve the above problems, the present invention provides a distributed flowchart heterogeneous computing scheduling method and framework. Its purpose is to provide a distributed flowchart heterogeneous computing scheduling framework and system that overcome the prior art's coarse flow-scheduling control granularity, inability to process streaming task data, inability to perform mixed scheduling of heterogeneous algorithms, and inability to circulate data seamlessly and automatically between heterogeneous algorithms.
To achieve the above object, a distributed flowchart heterogeneous computing method provided by the invention is realized as follows. As shown in Figure 1, the system architecture according to the invention mainly comprises six parts: a flowchart computation configuration module 101, a task scheduling microservice 102, a PC cluster resource management service 103, a Hadoop task execution daemon 104, a PC task execution daemon 105, and a third-party algorithm access SDK 106. The specific method for planning and scheduling the distributed flowchart heterogeneous computation is as follows:
The flowchart computation configuration module creates projects to satisfy multi-tenant requirements, and flowcharts are created within a project according to the real data processing flow. As shown in Figure 2, the configuration structure of the distributed heterogeneous flowchart according to the invention allows strong dependencies 202 (solid arrows) and weak dependencies 203 (dashed arrows) to be configured between the flow units 201 of the flowchart. A strong dependency 202 means that a target flow unit 204 can be executed only after the flow units associated with it have finished executing; a weak dependency 203 means that a target flow unit 205 can be executed as soon as its associated flow units start executing. Each flow unit can be configured with one processing calculation factor 207, multiple preprocessing calculation factors 206, and post-processing calculation factors 208. A preprocessing or post-processing calculation factor can be configured with only one tool algorithm, while a processing calculation factor can be configured with multiple tool algorithms. Strong 209 and weak 210 dependency relations can also be configured between calculation factors: under a strong dependency, a target calculation factor executes only after its preceding calculation factor completes; under a weak dependency, the target calculation factor can execute as soon as the preceding calculation factor has produced output. A condition control valve 212 can be configured for a flow unit; the valve accepts a user-defined condition threshold, and a calculation program can set the condition value after it finishes running so that only the branch satisfying the condition is executed.
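The flow-unit and calculation-factor configuration rules above (one processing factor per unit, multiple pre/post factors, a pre/post factor holding a single tool algorithm) can be sketched as a small data model. This is an illustrative reconstruction, not the patent's actual configuration format; all class and field names are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Dep(Enum):
    STRONG = "strong"   # target runs only after the source completes
    WEAK = "weak"       # target may run as soon as the source starts / has output

@dataclass
class CalcFactor:
    name: str
    kind: str                                      # "pre" | "process" | "post"
    programs: list = field(default_factory=list)   # "process" may hold several tool algorithms

    def add_program(self, prog: str):
        if self.kind != "process" and self.programs:
            raise ValueError("a pre/post factor may configure only one tool algorithm")
        self.programs.append(prog)

@dataclass
class FlowUnit:
    name: str
    pre: list = field(default_factory=list)
    process: Optional[CalcFactor] = None
    post: list = field(default_factory=list)
    condition_threshold: Optional[float] = None    # the condition control valve
```

A processing factor can thus accumulate several heterogeneous tool algorithms while the configuration model rejects a second algorithm on a pre/post factor.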
The third-party algorithm access SDK is a secondary-development SDK for linking software algorithms into the scheduling framework. It provides .so, .dll, and .jar development libraries to support access by heterogeneous algorithms running under heterogeneous environments such as Windows, Linux, and virtual machines. The SDK is responsible for the initialization work of accessing the cluster, communicates via RPC with the cluster center services it depends on (the task scheduling microservice, unified user login, and so on), and externally provides secondary-development interfaces for algorithm program access, so that platform service matters such as cluster access, task flow management, and data flow management are transparent to the accessing algorithm. An algorithm program needs only minor modification against the simple SDK interfaces to be linked into the platform's computation scheduling. The main methods provided are as follows:
1. A general file description structure is provided as the protocol for identifying files between heterogeneous algorithms. The structure contains five attributes: file type, the code of the algorithm module that generated the file, the platform custom registered data type, the absolute file path, and custom parameters. The file type marks whether the item is a file or a folder; the algorithm module code marks which algorithm generated the file. The platform custom registered data type is the unique code of a data type registered with the platform: the platform provides a data-type registration interface through which a specified data type can be registered, and the platform generates a globally unique identification code for it, so an algorithm program can judge from this attribute whether a file is of the specified data type it should handle. The absolute file path is a path the algorithm program can read directly. The custom parameter attribute carries custom parameters that must be passed during algorithm execution; for example, if the structure describes a TIFF image, information such as the image extent and its seven transformation parameters can be organized in a user-defined data structure and assigned to the custom parameter attribute for business parameter passing.
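The five-attribute structure might look like the following sketch. The field names are assumptions; the patent specifies only the roles of the five attributes:

```python
from dataclasses import dataclass, field

@dataclass
class FileDescription:
    """General file description structure: the cross-algorithm file protocol."""
    file_type: str            # "file" or "folder"
    producer_module: str      # code of the algorithm module that generated it
    registered_type: str      # platform-issued globally unique data-type code
    absolute_path: str        # path the algorithm program can read directly
    custom_params: dict = field(default_factory=dict)  # e.g. TIFF extent, 7 parameters

def is_target_type(desc: FileDescription, wanted_type: str) -> bool:
    # An algorithm decides whether this file is the data type it handles
    return desc.registered_type == wanted_type

# Hypothetical TIFF example, mirroring the patent's illustration
tiff = FileDescription("file", "MOD_ORTHO", "TYPE-TIFF-0001",
                       "/data/proj1/scene.tif",
                       {"extent": [0, 0, 100, 100], "seven_params": [0] * 7})
```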
2. An interface for setting the data source is provided. The data source is described with the general file description structure and follows its usage protocol. As the input data source of the whole scheduling flow, the data source is visible to all calculation factors, and incremental addition of data sources is supported. When the data source is local data, the set-data-source interface first commits the local data to a designated server directory and then submits the data source to the platform. An interface for obtaining the data source is also provided, together with a callback registration interface for monitoring data-source changes, so that a calculation program can monitor data-source changes by registering a callback and thus gain dynamic processing capability over streaming data sources.
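The set/get data-source interfaces with a change-monitoring callback could be sketched as below. The registry class and method names are invented for illustration; a real implementation would run over the SDK's RPC channel, and local data would first be uploaded to the server directory:

```python
class DataSourceRegistry:
    """Illustrative stand-in for the platform side of the data-source interfaces."""

    def __init__(self):
        self._sources = []      # incremental: sources are appended, never replaced
        self._listeners = []

    def set_data_source(self, desc):
        self._sources.append(desc)
        for cb in self._listeners:   # notify streaming consumers of the change
            cb(desc)

    def get_data_sources(self):
        return list(self._sources)

    def register_callback(self, cb):
        self._listeners.append(cb)

reg = DataSourceRegistry()
seen = []
reg.register_callback(seen.append)   # a calculation program monitoring the source
reg.set_data_source({"path": "/srv/in/a.tif"})
reg.set_data_source({"path": "/srv/in/b.tif"})
```

The callback fires once per incremental addition, which is what gives a calculation program its "streaming data source" behaviour.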
3. An interface for setting output data is provided. The output data is a map of one or more "task code → data list" entries, where the task code is transparent to the accessing calculation program and each datum in the list is described with the general file description structure, following its usage protocol. The output data of a calculation factor serves as the input data of the next one or more calculation factors associated with it. An interface for obtaining the input data of a specified calculation factor is also provided; the input data is the output data of the preceding calculation factor, likewise described with the general file description structure. A callback registration interface for monitoring the input data is provided as well, so that a calculation program can monitor input-data changes by registering a callback and thereby gain dynamic processing capability over streaming input data.
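The "task code → data list" output map, and the rule that one factor's output is the next factor's input, can be sketched as follows (the bookkeeping class is hypothetical):

```python
class DataFlow:
    """Illustrative output/input bookkeeping between calculation factors."""

    def __init__(self):
        self._outputs = {}   # factor name -> {task_code: [file descriptions]}

    def set_output(self, factor: str, output_map: dict):
        self._outputs[factor] = output_map

    def get_input(self, upstream_factor: str):
        # A factor's input is the output of the factor it is associated with.
        return self._outputs.get(upstream_factor, {})

flow = DataFlow()
flow.set_output("factor_A", {"task-001": ["/out/a1.tif"],
                             "task-002": ["/out/a2.tif"]})
inp = flow.get_input("factor_A")   # what the downstream factor would receive
```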
4. An interface for setting global data is provided; the global data is described with the general file description structure and follows its usage protocol. An interface for obtaining global data is provided as well. The global data is visible to all calculation factors in the flowchart, i.e., every calculation factor can obtain it through the get-global-data interface, which suits data circulation between calculation factors that have no direct association in certain streaming flowcharts. A registration interface for a global-data monitoring callback is also provided, so that a calculation program can dynamically monitor global-data changes and perform the corresponding business processing, giving calculation factors the ability to dynamically process streaming global data.
5. An interface for submitting Hadoop parallel computation tasks is provided. It is mainly used by a calculation program to divide the derived data it has obtained into fine-grained processing units; a different Hadoop algorithm program can be specified for each processing unit, and the partitioned processing-unit task data structure is committed to the task scheduling microservice to execute the Hadoop parallel tasks.
6. An interface for submitting PC parallel computation tasks is provided. It is mainly used by a calculation program to divide the derived data it has obtained into fine-grained processing units; a different Windows algorithm program can be specified for each processing unit, and the partitioned processing-unit task data structure is committed to the task scheduling microservice to execute the PC parallel tasks.
Besides the above key functional interfaces, the third-party algorithm access SDK also provides interfaces that business algorithms may need, such as setting condition thresholds, reporting progress and task running state to the platform, initialization, and resource release.
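Points 5 and 6 describe the same partitioning pattern with different executors. A combined sketch of dividing derived data into one processing unit per datum, each tagged with an executor type and an algorithm program (all names are hypothetical, not the SDK's actual signatures):

```python
def plan_parallel_tasks(derived_data, assignments):
    """Divide derived data into one task per datum, each with its own program.

    assignments: one (executor, program) pair per processing unit, where
    executor is "hadoop" or "pc" -- mixing both within one unit is allowed.
    """
    if len(derived_data) != len(assignments):
        raise ValueError("one assignment per processing unit expected")
    tasks = []
    for i, (datum, (executor, program)) in enumerate(zip(derived_data, assignments)):
        tasks.append({"task_code": f"task-{i:03d}", "executor": executor,
                      "program": program, "data": [datum]})
    return tasks

tasks = plan_parallel_tasks(
    ["/in/1.tif", "/in/2.tif", "/in/3.tif", "/in/4.tif"],
    [("hadoop", "h_prog1"), ("hadoop", "h_prog2"),
     ("pc", "pc_prog1"), ("pc", "pc_prog2")],
)
```

The resulting task list is what would be committed to the task scheduling microservice through the two submission interfaces.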
The task scheduling microservice serves as the scheduling and management center for the whole computation task. It serves a single project in microservice form, i.e., each project starts one task scheduling microservice to manage the scheduling of that project's flowchart heterogeneous computation. The microservice handles the distributed heterogeneous computation scheduling process as follows:
1. Based on the flowchart execution configuration template generated by the flowchart computation configuration module, the task scheduling microservice initializes the heterogeneous computing scheduling model and creates the corresponding server-side project directory structure according to that model.
2. The task scheduling microservice by default first executes the first preprocessing factor, setting the state of that preprocessing factor, and of the flowchart flow unit, to "in progress". During execution, a calculation program can report progress to the task scheduling microservice through the interface provided by the third-party algorithm access SDK; when the microservice receives the end message issued by the calculation program, it marks that calculation factor as completed. After executing a preprocessing calculation factor, the microservice executes the next one or more calculation factors weakly associated with it, but executes the strongly associated next calculation factor(s) only after receiving the end message of the preprocessing calculation factor, where "calculation factor" covers the preprocessing, processing, and post-processing factors. When all calculation factors in a flowchart unit have finished, the task scheduling microservice sets the state of that flowchart unit to completed.
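The strong/weak rule in step 2 — weakly associated factors may start as soon as the predecessor starts, while strongly associated factors wait for its end message — reduces to a readiness check. A sketch, with illustrative state names:

```python
def ready(dep_kind: str, predecessor_state: str) -> bool:
    """Whether a target calculation factor may start.

    dep_kind: "strong" or "weak".
    predecessor_state: "pending", "in_progress" (started / has output),
    or "completed" (end message received by the scheduling microservice).
    """
    if dep_kind == "strong":
        return predecessor_state == "completed"
    if dep_kind == "weak":
        return predecessor_state in ("in_progress", "completed")
    raise ValueError(f"unknown dependency kind: {dep_kind}")
```

The scheduler would evaluate this predicate for each pending factor whenever a predecessor changes state.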
3. When the calculation factor scheduled by the task scheduling microservice is a Hadoop program, an execution script for the preprocessing factor is generated and the task is committed to the Hadoop task execution daemon for execution. When the preprocessing factor is a PC program, the task is committed to the PC cluster resource management service, which allocates resources from the existing PC cluster computing resources and pushes the task to the designated PC task execution daemon to execute the calculation factor.
4. The task scheduling microservice receives and saves the data-source data submitted to it by calculation factors; all calculation nodes in the flowchart can retrieve the data-source data through the get-data-source interface of the third-party algorithm access SDK. The microservice receives and saves the output data submitted by a calculation factor, and the calculation factors associated with that factor can obtain the output data through the SDK's get-input-data interface. The microservice likewise receives and saves the global data submitted by calculation factors, and all calculation factors in the flowchart can obtain it through the SDK's get-global-data interface.
5. The task scheduling microservice provides a processing parallel-computation scheduling mechanism: a preprocessing calculation factor is configured before the processing factor of any flowchart unit that requires parallel computation, in order to plan the parallel tasks. The preprocessing calculation factor plans tasks based on derived data (in this patent, derived data includes data-source data, input data, and global data). The tasks may include Hadoop parallel tasks and PC parallel tasks, i.e., mixed processing of the same flowchart unit with heterogeneous algorithms is supported. Figure 3 is a schematic diagram of the parallel task partition mechanism according to the invention. A flowchart unit 301 is configured with one preprocessing calculation factor 302 and one processing calculation factor 303; the preprocessing calculation factor 302 is configured with one preprocessing program 304, and the processing calculation factor 303 is configured with four kinds of heterogeneous calculation programs: Hadoop program 1, Hadoop program 2 (306), PC program (307), and PC program (308). When the task scheduling microservice executes flowchart unit 301, it executes the preprocessing calculation factor 302 according to the state of the calculation factor it depends on, thereby starting the preprocessing program 304. The preprocessing program 304 obtains input data 309, 310, 311, 312 from the task scheduling microservice through the input-data interface and, according to its own business scenario (one task handles one datum), divides the input data into four tasks, each assigned a different algorithm program. According to the algorithm program types configured for the business, the preprocessing factor 302 calls the submit-Hadoop-parallel-computation interface to submit Hadoop parallel computation tasks (313) to the task scheduling microservice, and calls the submit-PC-parallel-computation-task interface to submit the PC parallel computation tasks. After the task scheduling microservice receives the parallel tasks submitted by the preprocessing calculation factor, it executes the processing flow for those tasks, thereby realizing computation of the same flowchart unit with different heterogeneous algorithms. Moreover, the data each algorithm program needs for each task are transparent to the software user: the user does not need to care about the data an algorithm program requires and only needs to operate on the task, and the algorithm program automatically loads the data it needs through the scheduling platform at startup.
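The Figure 3 walk-through — one preprocessing program splitting four inputs among two Hadoop and two PC programs — amounts to routing each planned task to the matching submission interface. A sketch, with the two submit calls stubbed out as callables:

```python
def dispatch(tasks, submit_hadoop, submit_pc):
    """Route planned parallel tasks to the Hadoop or PC submission interface."""
    counts = {"hadoop": 0, "pc": 0}
    for t in tasks:
        if t["executor"] == "hadoop":
            submit_hadoop(t)
        elif t["executor"] == "pc":
            submit_pc(t)
        else:
            raise ValueError(f"unknown executor: {t['executor']}")
        counts[t["executor"]] += 1
    return counts

# Stand-ins for the microservice's two queues
hadoop_q, pc_q = [], []
counts = dispatch(
    [{"executor": "hadoop", "program": "h1"},
     {"executor": "hadoop", "program": "h2"},
     {"executor": "pc", "program": "p1"},
     {"executor": "pc", "program": "p2"}],
    hadoop_q.append, pc_q.append)
```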
The PC cluster resource management service is responsible for managing the PC cluster resources that participate in cluster computing. The PC task execution daemon reports the GPU, CPU, and memory computing-resource information of its PC node to the PC cluster resource management service, which receives and saves that information. When the PC cluster resource management service receives a PC parallel computation task submitted by the task scheduling microservice, it uses the task ID to obtain, from the flowchart computation configuration module, the calculation program that executes the task and the computing-resource parameters the program needs; it performs dynamic planning and allocation based on those parameters and the saved resource information, pushes the task to the PC task execution daemon of the designated PC node, and receives task-execution feedback from that daemon. When task execution ends, the service releases the PC node's resources and includes them in resource planning for new tasks. When task execution fails, the service can re-plan the task and push it to the PC task execution daemon of another PC node for re-execution; once the task exceeds the configured maximum number of attempts, its failure-retry process is stopped.
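The node-selection and bounded-retry behaviour just described might be sketched as follows; the matching rule and field names are assumptions, and a real service would plan against GPU, CPU, and memory together:

```python
def pick_node(nodes, need_cpu, need_mem, exclude=()):
    """Pick the first node meeting the task's resource parameters."""
    for name, res in nodes.items():
        if name in exclude:
            continue
        if res["cpu"] >= need_cpu and res["mem"] >= need_mem:
            return name
    return None

def run_with_retry(nodes, need_cpu, need_mem, execute, max_attempts=3):
    """Re-plan onto another node on failure, up to max_attempts."""
    tried = []
    for _ in range(max_attempts):
        node = pick_node(nodes, need_cpu, need_mem, exclude=tried)
        if node is None:
            break
        if execute(node):
            return node
        tried.append(node)   # failed on this node; re-plan on a different one
    return None              # retry process stopped
```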
The PC task execution daemon is a PC daemon with a user interface that can execute PC cluster tasks, including automated algorithm programs and human-computer interaction programs. After receiving a task pushed by the PC cluster resource management service, the daemon executes it according to the category of the task's calculation program. If the category is automatic, an execution script is automatically generated that records the information required by the task's automated algorithm, such as platform parameters, input data, and working directory, and the automated algorithm is started with the script file as its startup parameter. During algorithm execution the daemon continuously monitors the automated algorithm's process state; if it detects that the process exits abnormally or times out, it marks the task execution as failed and feeds the task's execution state back to the PC cluster resource management service for subsequent exception handling. If the category of the calculation program is passive, the task is shown in a task list; when an operator claims and starts the task, the PC task execution daemon generates an execution script for the passive human-computer-interaction task, recording the platform parameters, input data, working directory, and other information required by the designated interactive algorithm program, and starts the algorithm with the script file as its startup parameter.
The Hadoop task execution daemon executes Hadoop computing tasks based on MapReduce and Yarn. It receives a Hadoop computing task submitted by the task scheduling microservice and judges the task's category. If it is a unique computing task, the task is executed through MapReduce, its uniqueness is guaranteed via Zookeeper, and a heartbeat is maintained; when the unique task's heartbeat times out, the task is retried a configurable number of times. If it is a parallel computing task, it is executed through MapReduce; when parallel task execution fails, the task is retried a configurable number of times. Cluster resource allocation is guaranteed by Yarn.
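The heartbeat-timeout and configurable-retry handling for a unique task could be sketched as below. The timing model is simplified to pre-recorded heartbeat timestamps per attempt, and the Zookeeper-based uniqueness guarantee is not shown:

```python
def supervise_unique_task(heartbeats, timeout, run_once, max_retries):
    """Retry a unique task whose heartbeat gap exceeds `timeout`.

    heartbeats: per-attempt lists of heartbeat timestamps (simulated input).
    run_once(attempt) -> True on success. Returns True if any attempt succeeds.
    """
    for attempt in range(max_retries + 1):
        beats = heartbeats[attempt]
        gaps = [b - a for a, b in zip(beats, beats[1:])]
        timed_out = any(g > timeout for g in gaps)
        if not timed_out and run_once(attempt):
            return True
        # heartbeat timeout or failure: retry, up to max_retries extra attempts
    return False
```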
Because the invention uses user-configured distributed computing flows, flowchart unit nodes are subdivided: preprocessing, processing, and post-processing calculation factors are configured for flowchart nodes, and different types of algorithm programs can be configured for the calculation factors. The task scheduling microservice schedules the entire computation flowchart based on the user-configured template. A heterogeneous system architecture of Hadoop task execution daemons running on Linux and PC task execution daemons is used to execute algorithm programs under the Hadoop and PC heterogeneous environments and to provide fault-tolerant processing. Dynamic allocation of PC cluster resources is realized through the PC cluster resource management service. In addition, an SDK is provided for linking algorithm programs into the distributed heterogeneous computing platform; it defines the general data description structure for identifying data circulating between algorithms and establishes communication with the platform through RPC interfaces, so that an algorithm program can request derived data sources from the computing platform, provide output data, submit parallel tasks, and report state and progress. The invention thereby obtains at least the following beneficial effects:
1. The business flow is separated from the computation scheduling flow, allowing the computation scheduling flow to be flexibly configured at the algorithm-program level independently of the business flow. The flowchart computation template is user-defined according to the actual business scenario, and flowchart units are refined: a flowchart unit can be configured with one processing calculation factor and multiple preprocessing and post-processing calculation factors, each configured with calculation programs, making whole-flow execution very flexible and forming the "flowchart unit - calculation factor - calculation program" association structure. Introducing the concept of the calculation factor decouples flowchart units from calculation programs: when processing different data of the same business flow, only the calculation factors of the flowchart units need to be adjusted, replacing different preprocessing and post-processing programs, without adjusting the whole flowchart computation template.
2. Automatic data flow between heterogeneous algorithm programs is realized. Preprocessing and post-processing programs can be configured around the business processing program within a processing calculation factor, decoupling the business processing programs between upstream and downstream flow units. The preprocessing and post-processing programs are developed on the third-party algorithm access SDK development library and bridge the processing programs of upstream and downstream flowchart units, solving the data circulation and adaptation problems between heterogeneous algorithm programs in flowchart computation scheduling. An algorithm program no longer needs to be adapted to its upstream and downstream algorithm programs; only the preprocessing and post-processing programs between heterogeneous algorithms need to be dynamically configured.
3. Mixed scheduling is realized, within the whole flow or a single flow unit, among automated programs running in heterogeneous environments, single-machine human-computer-interaction tools, and C/S-mode cooperative tool algorithms. With the system architecture shown in Figure 1, heterogeneous algorithms under Linux and Windows environments can be executed on a per-task basis, realizing collaborative work among multiple tools.
4. More fine-grained parallel processing of flowchart element nodes. The processing calculating factor of a flowchart element node supports configuring a variety of different calculation programs, and a pretreatment calculating factor for dividing parallel tasks can be configured before the processing calculating factor. After obtaining the input data, the pretreatment calculating factor divides the data into more fine-grained tasks according to the actual business scenario, can designate the calculation program that executes each task, and submits the parallel calculation tasks to the task scheduling micro-service, thereby achieving more fine-grained parallel processing of the flowchart element node.
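The task-division step can be sketched as follows; the block size, field names and the program name "Soft1-3" are illustrative assumptions:

```python
def divide_into_tasks(data, block_size, program):
    """Split input data into blocks and pair each block with the calculation
    program designated to execute it, yielding "task-data-software" records."""
    tasks = []
    for i in range(0, len(data), block_size):
        tasks.append({"task_id": len(tasks),
                      "data": data[i:i + block_size],
                      "software": program})
    return tasks

# Ten input items divided into fine-grained blocks of three.
tasks = divide_into_tasks(list(range(10)), 3, "Soft1-3")
```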
5. Streaming task operation is supported. A pretreatment program can dynamically acquire continually arriving data through the data-listening mechanism of the third-party algorithm access SDK. After new data is received, the newly added data is divided into parallel tasks and submitted as parallel calculation tasks, realizing streaming task operation on streaming data.
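A hedged sketch of this listening mechanism, with a stand-in `Listener` class in place of the SDK's data-listening interface (names are illustrative):

```python
class Listener:
    """Stand-in for the SDK's data-listening mechanism: delivers each
    newly arrived batch of data to the registered callbacks."""
    def __init__(self):
        self.callbacks = []
        self.submitted = []   # parallel tasks submitted so far

    def register(self, cb):
        self.callbacks.append(cb)

    def push(self, new_data):   # platform delivers freshly arrived data
        for cb in self.callbacks:
            cb(new_data)

listener = Listener()

def on_new_data(data):
    # Divide the increment into per-item parallel tasks and submit them.
    for item in data:
        listener.submitted.append({"task": item})

listener.register(on_new_data)
listener.push([10, 20])   # first increment of the stream
listener.push([30])       # later increment, handled as it arrives
```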
6. Data versions are controllable. During execution, the distributed flowchart heterogeneous computing system records the "task-data" information of each step, and the data flow in the whole processing flow is controlled by the platform, which is of great significance in data production.
7. The development threshold for a calculation program to access cluster flow scheduling is reduced. The distributed flowchart heterogeneous computing scheduling framework makes general concerns such as cluster access, data flow and node-state scheduling transparent to the calculation program; the calculation program only needs to attend to the business processing itself, without handling affairs beyond it.
It should be appreciated that the above general description and the following specific embodiments are merely illustrative and explanatory, and do not limit the scope of the claimed invention.
Description of the drawings
The invention is described in further detail below with reference to the accompanying drawings.
Figure 1 is a system architecture diagram according to the present invention.
Figure 2 is a configuration structure diagram of the distributed heterogeneous flowchart according to the present invention.
Figure 3 is a schematic diagram of the parallel task division mechanism according to the present invention.
Figure 4 is a schematic diagram of a preferred implementation example of the distributed flowchart heterogeneous computing scheduling method provided by a specific embodiment of the invention.
Figure 5 is a schematic diagram of the internal configuration of flowchart element 1 in the preferred implementation example of the present invention.
Figure 6 is a schematic diagram of the internal configuration of flowchart element 2 in the preferred implementation example of the present invention.
Figure 7 is a schematic diagram of the internal configuration of flowchart element 5 in the preferred implementation example of the present invention.
Specific embodiments
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the spirit of the disclosed content is clearly illustrated below with the accompanying drawings and a detailed description. After understanding the embodiments of the present disclosure, any person skilled in the art may make changes and modifications based on the technology taught herein without departing from its spirit and scope. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention, not to limit it.
As shown in Figure 4, which is a schematic diagram of a preferred implementation example of the distributed flowchart heterogeneous computing scheduling method provided by a specific embodiment of the invention, a specific business processing flow is defined in the flowchart calculation configuration module according to the actual business scenario, and calculating factors are configured for each flowchart element.
As shown in Figure 5, which is a schematic diagram of the internal configuration of flowchart element 1 in the preferred implementation example, flowchart element 401 in Figure 4 is configured with three calculating factors: pretreatment 501, pretreatment 502 and processing 503. Pretreatment calculating factor 501 is configured with calculation program Soft1-1, pretreatment calculating factor 502 is configured with calculation program Soft1-2, and processing calculating factor 503 is configured with calculation program Soft1-3.
When executing the flowchart, the task scheduling micro-service first executes pretreatment calculating factor 501 by default. Since calculation program Soft1-1 has the passive-trigger attribute and is a Windows application running on a PC, the task scheduling micro-service only marks the calculating factor state as started and marks flowchart element 401 as started. The PC task execution daemon synchronizes the scheduling flow template from the task scheduling micro-service and displays on its interface the flowchart nodes configured with PC programs. When a PC calculation program with the passive-trigger attribute is configured on the first flow node, its execution can be triggered through the right mouse button, which generates a common execution script and starts the corresponding calculation program. The configuration of pretreatment 501 in Figure 5 meets this condition, so the PC task execution daemon generates an execution script and starts Soft1-1. Soft1-1 is a human-computer interaction program with an interface; it packages local data using the data structure of the general file description structure body and calls the set-data-resource interface provided by the third-party algorithm access SDK to submit data source Data1-1 to the platform. It should be noted that the data mentioned in the specific embodiments of the invention includes both structured and unstructured data. The set-data-source interface of the third-party algorithm access SDK uploads data source Data1-1 to the node directory allocated by the server and converts the Windows paths in the general file structure body into Linux paths. Data source Data1-1 is global data, and all calculation programs in the flowchart can obtain it through the get-data-resource interface. After submitting the data source, Soft1-1 sends an end message to the task scheduling micro-service, which marks the state of the pretreatment calculating factor as finished.
Since pretreatment calculating factor 502 has a strong dependence on pretreatment calculating factor 501, pretreatment calculating factor 502 starts executing only after pretreatment calculating factor 501 has completed: the task scheduling micro-service marks pretreatment calculating factor 501 with the completed state and then starts executing pretreatment calculating factor 502. Calculation program Soft1-2, configured on pretreatment calculating factor 502, is a pretreatment program running on Hadoop; the task scheduling micro-service generates a start script and starts it. Soft1-2 obtains data source Data1-1 by calling the get-data-source interface of the third-party algorithm access SDK, divides Data1-1 into multiple data blocks Data1-2-1 to Data1-2-n according to its own business demand, and generates a "task-data" parallel calculation task data structure with each data block as a unit. It obtains through an interface the software coding configured on the next calculating factor and distributes the designated execution software through the "task-data" interface, forming the "task-data-software" association. Since processing calculating factor 503 is configured with only one program, Soft1-3, running on Hadoop, pretreatment program Soft1-2 submits the divided tasks to the task scheduling micro-service for execution by calling the submit-Hadoop-parallel-task interface of the third-party algorithm access SDK. After receiving the submitted Hadoop parallel tasks, the task scheduling micro-service generates execution scripts, and the Hadoop task execution daemon starts the parallel calculation tasks. Each process of Soft1-3 obtains the data block associated with its task through the get-input-data interface provided by the third-party algorithm access SDK; after processing its data block, it calls the set-output-data interface of the SDK to output to the platform the output data Data1-3-1 to Data1-3-n generated by the calculation program, and when the calculation completes it calls the end interface of the SDK to notify the task scheduling micro-service that the calculation has ended. The task scheduling micro-service aggregates the end notifications of the tasks; when all parallel tasks have ended and the preceding pretreatment has ended, it marks execution of the calculating factor node as ended, and when all calculating factors in the flowchart element have finished calculating, it marks the execution state of the flowchart element as ended.
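The end-notification aggregation described above can be sketched as follows; the `FactorTracker` class is an illustrative stand-in for the micro-service's bookkeeping, and the task identifiers are hypothetical:

```python
class FactorTracker:
    """Marks a calculating-factor node finished only when every
    parallel task has reported its end notification."""
    def __init__(self, task_ids):
        self.pending = set(task_ids)
        self.finished = False

    def notify_end(self, task_id):
        self.pending.discard(task_id)
        if not self.pending:
            self.finished = True   # all parallel tasks done -> factor done

tracker = FactorTracker(["t1", "t2", "t3"])
tracker.notify_end("t1")
tracker.notify_end("t2")
partial = tracker.finished    # still False: "t3" has not reported yet
tracker.notify_end("t3")      # last notification marks the factor finished
```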
As shown in Figure 6, which is a schematic diagram of the internal configuration of flowchart element 2 in the preferred implementation example, flowchart element 402 in Figure 4 is configured with two calculating factors: pretreatment calculating factor 601 and processing calculating factor 602. Pretreatment calculating factor 601 is configured with calculation program Soft2-1, and processing calculating factor 602 is configured with three calculation programs: Soft2-2, Soft2-3 and Soft2-4. These three programs constitute a C/S-mode module set, in which Soft2-2 is a server-side program running on Hadoop, and Soft2-3 and Soft2-4 are two different client human-computer interaction programs running on Windows.
Since flowchart element 402 has a weak dependence on flowchart element 401 in Figure 4, flowchart element 402 can be executed as soon as the processing calculating factor in flowchart element 401 has output data. The task scheduling micro-service starts pretreatment program Soft2-1 running on Hadoop, marks the pretreatment node state as running, and marks flowchart element 402 as running. Soft2-1 obtains the data source by calling the get-data-resource interface, takes from the data source the input data Data2-1-1 needed to start calculation program Soft2-2, generates the "task-data-software" data structure and submits a parallel calculation task to the task scheduling micro-service, marking the parallel calculation task type with the uniqueness attribute, indicating that the calculation task exists uniquely on the platform. Soft2-1 dynamically obtains the streaming data output from flowchart element 401 through a registered input-data listening callback function; after receiving new input data, it divides the input data into fine-grained tasks according to the actual business demand, designates the client calculation programs Soft2-3 and Soft2-4 to execute the tasks, and immediately calls the submit-PC-parallel-task interface to submit PC parallel tasks to the task scheduling micro-service. Since calculation programs Soft2-3 and Soft2-4 require a person to fetch a task before execution starts, the PC parallel task data must also be marked with the passive attribute. When the task scheduling micro-service detects that a PC parallel task carries the passive attribute, it no longer submits it to the PC cluster resource management service for resource allocation, but pushes it to the PC task execution daemon for interface display; the operator can designate and execute the PC parallel task through the start menu of the PC task execution daemon. Since the calculation program configured in the processing calculating factor of flowchart element 405 needs the output data Data2-2-1-1 of calculation program Soft2-2, Soft2-2 must, when setting its output data, call the set-global-data interface to set Data2-2-1-1 as global data.
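The global-data mechanism can be sketched as follows; the `Platform` class is a hypothetical stand-in for the platform's data interfaces, and the unit and data-source names follow the example above:

```python
class Platform:
    """Output registered as global is visible to every calculation program
    in the flowchart; ordinary output stays scoped to its own element."""
    def __init__(self):
        self._global = {}
        self._scoped = {}

    def set_output(self, unit, name, value, global_data=False):
        if global_data:
            self._global[name] = value
        else:
            self._scoped.setdefault(unit, {})[name] = value

    def get_global(self, name):
        return self._global[name]

p = Platform()
# Soft2-2 registers its output as global data ...
p.set_output("unit-2", "Data2-2-1-1", [1, 2, 3], global_data=True)
# ... so flowchart element 5's programs can retrieve it later.
shared = p.get_global("Data2-2-1-1")
```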
As shown in Figure 7, which is a schematic diagram of the internal configuration of flowchart element 5 in the preferred implementation example, flowchart element 405 in Figure 4 is configured with one pretreatment calculating factor 701 and one processing calculating factor 702. Pretreatment calculating factor 701 is configured with pretreatment program Soft5-1 running on Hadoop, and processing calculating factor 702 is configured with three different calculation programs, Soft5-2, Soft5-3 and Soft5-4, running on Windows.
Since flowchart element 405 has a strong dependence on flowchart elements 403 and 404, the task scheduling micro-service starts executing flowchart element 405 only when both flowchart element 403 and flowchart element 404 have completed. The task scheduling micro-service starts pretreatment program Soft5-1 running on Hadoop. Soft5-1 obtains the output data of the preceding flowchart elements by calling the get-input-data interface, and obtains the global data Data2-2-1-1 output from flowchart element 402 through the get-global-data interface. Soft5-1 performs a fine-grained "task-data" division of the acquired data sources and, according to business conditions, assigns the divided tasks by certain rules to be executed by calculation programs Soft5-2, Soft5-3 and Soft5-4. After the task scheduling micro-service receives the tasks and detects that they are tasks executed by the PC cluster, it submits them to the PC cluster resource management service. The PC cluster resource management service dynamically plans and matches PC cluster resources to the tasks according to the PC cluster resource information registered by the PC task execution daemons, and pushes each task to the PC task execution daemon of the matched computing resource. After receiving a task, the PC task execution daemon generates an execution script and starts the corresponding calculation program. If the task execution fails, the PC task execution daemon re-executes the task up to 4 times; if all 4 retries fail, it feeds the task back to the PC cluster resource management service, which reassigns the task to another computing node with free resources for another attempt; if it still fails, the retry process is terminated and the task is marked as a failed task.
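The retry-and-reassign policy above can be sketched as follows; the node names and the `attempt` callback are illustrative, and the sketch generalizes the "retry up to 4 times, then hand to another node" rule to a list of candidate nodes:

```python
def run_with_retries(task, nodes, attempt):
    """attempt(node, task) -> bool. Try each candidate node up to 4 times;
    mark the task failed only after every node has been exhausted."""
    for node in nodes:
        for _ in range(4):              # up to 4 retries per node
            if attempt(node, task):
                return "succeeded"
    return "failed"

calls = []
def flaky(node, task):
    # Always fails on node-A; succeeds on node-B's second attempt.
    calls.append(node)
    return node == "node-B" and len(calls) >= 6

outcome = run_with_retries("task-7", ["node-A", "node-B"], flaky)
always_fail = run_with_retries("task-8", ["node-A"], lambda n, t: False)
```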
In Figure 4, flowchart element 406 has a weak dependence on flowchart element 405, so the task scheduling micro-service begins executing flowchart element 406 as soon as flowchart element 405 has output. Flowchart element 406 is provided with condition control valve 409, which is configured with two condition threshold values, 1 and 2. The algorithm program of the processing node in flowchart element 406 can set the threshold value according to its own execution conditions: if it sets threshold value 1 to the task scheduling micro-service, the output of flowchart element 406 is directed to flowchart element 402 so that the output is processed again; if it sets threshold value 2, the output of flowchart element 406 is directed to flowchart element 408 for processing. Flowchart element 407 has a strong dependence on flowchart element 405 and starts executing only when flowchart element 405 completes. Flowchart element 408 has a weak dependence on flowchart element 406 and a strong dependence on flowchart element 407, so flowchart element 408 can be started only after flowchart element 407 has completed.
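The strong/weak dependency rule that drives these scheduling decisions can be sketched as follows; the unit names follow Figure 4, while the three-value state model is an assumption for illustration:

```python
def ready(unit, deps, state):
    """deps maps a unit to a list of (predecessor, 'strong'|'weak') pairs;
    state[p] is one of 'idle', 'has_output', 'complete'.
    Strong dependence requires completion; weak dependence only output."""
    for pred, kind in deps.get(unit, []):
        if kind == "strong" and state[pred] != "complete":
            return False
        if kind == "weak" and state[pred] == "idle":
            return False
    return True

deps = {"U405": [("U403", "strong"), ("U404", "strong")],
        "U406": [("U405", "weak")]}
state = {"U403": "complete", "U404": "has_output", "U405": "has_output"}

can_run_405 = ready("U405", deps, state)  # blocked: U404 is not complete
can_run_406 = ready("U406", deps, state)  # allowed: U405 already has output
```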
Claims (9)
1. A distributed flowchart heterogeneous computing scheduling method, characterized by: custom-configuring a distributed heterogeneous computing scheduling flowchart; scheduling and managing the calculation flow with a centralized scheduling micro-service; performing reasonable resource allocation for parallel calculation tasks in combination with the Hadoop task planning and PC task planning services; receiving and executing heterogeneous algorithms through daemons running under Linux and Windows heterogeneous environments; seamless automatic data flow between heterogeneous algorithms; dynamic processing of streaming data as streaming tasks; and a secondary development interface for accessing the distributed heterogeneous computing platform.
2. The custom-configured distributed heterogeneous computing scheduling flowchart according to claim 1, characterized in that: the business flowchart is custom-defined, and a flowchart element can be configured with multiple pretreatment, processing and post-processing calculating factors; each pretreatment and post-processing calculating factor can be configured with one calculation program, while a processing calculating factor can be configured with a variety of different types of calculation programs; strong and weak dependency relationships can be configured between flowchart elements, and between calculating factors; a condition control valve can be set in a flowchart element to make flowchart branch scheduling decisions.
3. The strong and weak dependency relationships according to claim 2, characterized in that: between strongly dependent flowchart elements, the dependent flowchart element can be executed only after the flowchart element it depends on has completed; between weakly dependent flowchart elements, the dependent flowchart element can be executed as soon as the flowchart element it depends on has output; between strongly dependent calculating factors, the dependent calculating factor can be executed only after the calculating factor it depends on has completed; between weakly dependent calculating factors, the dependent calculating factor can be executed as soon as the calculating factor it depends on has output.
4. The method of scheduling and managing the calculation flow with a centralized scheduling micro-service according to claim 1, characterized in that: one calculation flowchart runs one scheduling micro-service, which initializes the scheduling model through the custom-configured distributed heterogeneous calculation flowchart configuration template, executes calculating factors according to their dependency relationships, receives and saves the output data of each calculation node, provides data-flow control between calculating factors and between flowchart elements, provides the parallel calculation scheduling mechanism, and maintains the execution states of the calculating factors and the flowchart elements.
5. The parallel calculation scheduling mechanism according to claim 4, characterized in that: a pretreatment calculating factor is configured before the parallel calculation factor and is configured with a pretreatment program for dividing parallel calculation tasks; the pretreatment program divides the obtained source data into minimum processing units, generates "task-data" associations in the form of tasks, may designate the calculation program that executes each task, and submits the parallel calculation tasks to the scheduling micro-service to trigger parallel calculation.
6. The method for seamless automatic data flow between heterogeneous algorithms according to claim 1, characterized in that: data is described uniformly based on a general file structure body; an adaptation program for data conversion is configured between different heterogeneous programs; after obtaining the output data of the preceding algorithm program, the adaptation program processes that output data, converts it into the data format matching the next algorithm program, and outputs the data to the next calculating factor.
7. The method for dynamic processing of streaming data as streaming tasks according to claim 1, characterized in that: the output data of a calculating factor is dynamically acquired through listening; newly added data is bound as "task-data" in the form of tasks and submitted to the platform for execution, thereby forming dynamic processing of streaming data as streaming tasks.
8. The secondary development interface for accessing the distributed heterogeneous computing platform according to claim 1, characterized in that: structured and unstructured data are described with the general file description structure body; a calculation program developed on the secondary development interface has the abilities to set output data to the platform, set data visibility permissions, request source data, listen to source data, submit parallel tasks, set condition threshold values, and feed back task execution progress and state.
9. The method for controlling the calculation flow by setting condition threshold values according to claim 8, characterized in that: a flowchart element is configured with a condition control valve; the condition control valve is configured with threshold values and the conditional branch triggered by each threshold value; condition threshold values can be set manually or by the calculation program during calculation; when a specified condition threshold value is met, the branch pointed to by that condition threshold value is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910584305.4A CN110287016A (en) | 2019-07-01 | 2019-07-01 | A kind of distribution flow chart Heterogeneous Computing dispatching method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110287016A true CN110287016A (en) | 2019-09-27 |
Family
ID=68020346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910584305.4A Pending CN110287016A (en) | 2019-07-01 | 2019-07-01 | A kind of distribution flow chart Heterogeneous Computing dispatching method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110287016A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111984648A (en) * | 2020-08-19 | 2020-11-24 | 上海翘腾科技有限公司 | Data initialization method and system under micro-service architecture |
CN112232115A (en) * | 2020-09-07 | 2021-01-15 | 北京北大千方科技有限公司 | Calculation factor implantation method, medium and equipment |
CN112698878A (en) * | 2020-12-18 | 2021-04-23 | 浙江中控技术股份有限公司 | Calculation method and system based on algorithm microservice |
CN112767513A (en) * | 2020-12-31 | 2021-05-07 | 浙江中控技术股份有限公司 | Visual flow chart, event synchronous configuration tool and flow chart drawing method |
CN112860450A (en) * | 2020-12-04 | 2021-05-28 | 武汉悦学帮网络技术有限公司 | Request processing method and device |
CN113497814A (en) * | 2020-03-19 | 2021-10-12 | 中科星图股份有限公司 | Satellite image processing algorithm hybrid scheduling system and method |
CN114866514A (en) * | 2022-04-29 | 2022-08-05 | 中国科学院信息工程研究所 | Multi-user data flow control and processing method, device, equipment and medium |
CN114924877A (en) * | 2022-05-17 | 2022-08-19 | 江苏泰坦智慧科技有限公司 | Dynamic allocation calculation method, device and equipment based on data stream |
CN115617533A (en) * | 2022-12-14 | 2023-01-17 | 上海登临科技有限公司 | Process switching management method in heterogeneous computing and computing device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120159506A1 (en) * | 2010-12-20 | 2012-06-21 | Microsoft Corporation | Scheduling and management in a personal datacenter |
CN105022670A (en) * | 2015-07-17 | 2015-11-04 | 中国海洋大学 | Heterogeneous distributed task processing system and processing method in cloud computing platform |
AU2017100410A4 (en) * | 2014-09-22 | 2017-05-11 | Tongji University | Method and system for large-scale real-time traffic index service based on distributed framework |
CN106775632A (en) * | 2016-11-21 | 2017-05-31 | 中国科学院遥感与数字地球研究所 | A kind of operation flow can flexible expansion high-performance geographic information processing method and system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190927 |