CN110287016A - Distributed flowchart heterogeneous computing scheduling method - Google Patents


Info

Publication number
CN110287016A
CN110287016A
Authority
CN
China
Prior art keywords
data, task, factor, algorithm, flow chart
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910584305.4A
Other languages
Chinese (zh)
Inventor
李建雄
李鹏
黄仰
黄朝燊
郑仙侠
陈奇
Current Assignee
Wuhan Meg Information Technology Co Ltd
Original Assignee
Wuhan Meg Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan Meg Information Technology Co Ltd
Priority to CN201910584305.4A
Publication of CN110287016A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals


Abstract

The present invention relates to the field of computer technology. To solve the problems faced in current distributed-cluster job processing flows, namely scheduling the execution of heterogeneous algorithms, seamless data circulation between heterogeneous algorithms, fine-grained division of parallel tasks, and dynamic processing of streaming data, a distributed flowchart heterogeneous computing scheduling method is proposed. It comprises user-defined configuration of computing flowcharts; centralized scheduling and management of the computing process; mixed scheduling of heterogeneous algorithms; management of the input and output data of the computing programs in the flowchart to form a streaming-data processing mechanism; a multi-language access SDK for heterogeneous algorithms; configurable pre-processing programs that solve the problem of seamless data circulation between heterogeneous algorithms; and configurable pre-processing programs that perform fine-grained division of parallel tasks. Together these constitute a complete solution for scheduling and managing heterogeneous algorithm computing flows under a distributed cluster working mode.

Description

Distributed flowchart heterogeneous computing scheduling method
Technical Field
The present invention relates to the field of computer technology, and more particularly to a distributed flowchart heterogeneous computing scheduling method, specifically a scheduling management method for big-data heterogeneous algorithm processing flows.
Background Art
With the recent diversification of data acquisition sources, the demand for big-data processing in every industry has become increasingly apparent. Data production, including in the field of spatial data production, generally requires multiple processes, with various software algorithms performing the data processing step by step. Big-data production is characterized by large data volumes, complex processing flows, and diverse processing tools, so a flow-based cluster computing mode is needed to improve overall data production efficiency. The practicality and feasibility of cluster computing are mainly reflected in the scheduling and management of workflows, and existing computing scheduling frameworks have the following shortcomings:
1. They provide logic-controlled scheduling based only on the flowchart: the scheduling granularity is limited to flowchart unit nodes, and a flowchart unit node cannot be divided for finer-grained execution, whereas real data processing generally requires parallel processing within a single flowchart unit and dynamic control of each parallel task.
2. The scheduling flow only executes the next one or more related flow units after the current flow unit has finished running, and cannot support dynamic streaming task scheduling.
3. The tool algorithms used in real data production include automated programs running in heterogeneous environments, single-machine human-computer interaction tools, and C/S-mode cooperative tools, yet current scheduling frameworks impose strict requirements on algorithms and can only run programs in a single environment such as Linux, PC, or a virtual machine; they cannot perform mixed scheduling of different types of tool algorithms across the whole flow or within a single flow unit.
4. The framework itself provides no support for data circulation between algorithms. A cluster computing system developed on such a framework can only satisfy a single, specific flow application scenario and cannot dynamically extend the range of scenarios the system can handle. Algorithm developers must not only learn the framework's development techniques and adapt their algorithms for access, but also manage data conversion and the entire data flow, which raises the technical threshold for developing cluster computing applications.
In conclusion, there is at present no complete, universal distributed flowchart scheduling computing method that meets the complex and diverse application scenarios of real data production.
Summary of the Invention
To solve the above problems, the present invention provides a distributed flowchart heterogeneous computing scheduling method and framework. Its purpose is to provide a distributed flowchart heterogeneous computing scheduling framework and system that solve the prior-art problems of insufficiently fine-grained flow scheduling control, the inability to process streaming task data, the inability to mix-schedule heterogeneous algorithms, and the inability of data to circulate seamlessly and automatically between heterogeneous algorithms.
To achieve the above object, the distributed flowchart heterogeneous computing method provided by the invention is realized as follows:
As shown in Figure 1, the system architecture according to the invention mainly comprises six parts: a flowchart computing configuration module 101, a task scheduling microservice 102, a PC cluster resource management service 103, a Hadoop task execution daemon 104, a PC task execution daemon 105, and a third-party algorithm access SDK 106. The specific method for planning and executing the scheduling of distributed flowchart heterogeneous computing is as follows:
A project is created through the flowchart computing configuration module to meet multi-tenant requirements, and a flowchart is created within the project according to the real data processing flow. As shown in Figure 2, which illustrates the distributed heterogeneous flowchart configuration structure according to the invention, strong dependencies 202 (solid arrows) and weak dependencies 203 (dashed arrows) can be configured between the flowchart units 201 of the flowchart. A strong dependency 202 means that the target flowchart unit 204 can only be executed after the associated flowchart unit 203 has finished executing; a weak dependency 201 means that the target flowchart unit 205 can be executed as soon as the associated flowchart unit 201 starts executing. Each flowchart unit can be configured with one processing computing factor 207, multiple pre-processing computing factors 206, and a post-processing computing factor 208. A pre-processing or post-processing computing factor can be configured with only one tool algorithm, while a processing computing factor can be configured with multiple tool algorithms. Strong 209 and weak 210 dependency associations can be configured between computing factors: a strong dependency 210 means that the target computing factor 207 can only be executed after its preceding computing factor 211 has completed, while a weak dependency 209 means that the target computing factor 211 can be executed as soon as its preceding computing factor 206 has produced output. A condition control valve 212 can be configured for a flowchart unit; the condition valve can be configured with a custom condition threshold, and a computing program can set the condition threshold after it finishes running so that only the flow branches satisfying the condition are executed.
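The strong/weak dependency semantics described above (strong: wait for the upstream to complete; weak: start as soon as the upstream starts producing) can be sketched in a few lines of Python. All class names, field names, and algorithm names here are illustrative, not from the patent:

```python
from dataclasses import dataclass, field

STRONG, WEAK = "strong", "weak"

@dataclass
class Factor:
    name: str
    kind: str                                  # "pre", "process", or "post"
    programs: list = field(default_factory=list)

@dataclass
class FlowUnit:
    name: str
    process: Factor = None                     # exactly one processing factor
    pre: list = field(default_factory=list)    # multiple pre-processing factors allowed
    post: list = field(default_factory=list)
    deps: list = field(default_factory=list)   # (upstream_unit, STRONG | WEAK)

unit_a = FlowUnit("A", process=Factor("proc_a", "process", ["alg1", "alg2"]))
unit_b = FlowUnit("B", process=Factor("proc_b", "process", ["alg3"]),
                  deps=[(unit_a, STRONG)])
unit_c = FlowUnit("C", process=Factor("proc_c", "process", ["alg4"]),
                  deps=[(unit_a, WEAK)])

def ready(unit, finished, started):
    """A unit may run when every strong upstream has finished and
    every weak upstream has at least started executing."""
    return all((up.name in finished) if kind == STRONG else (up.name in started)
               for up, kind in unit.deps)
```

Under this rule, once unit A starts, the weakly linked unit C becomes runnable immediately, while the strongly linked unit B stays blocked until A finishes.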
The third-party algorithm access SDK is a secondary-development SDK for linking software algorithms into the scheduling framework. It provides so, dll, and jar development libraries to support access by heterogeneous algorithms running in heterogeneous environments such as Windows, Linux, and virtual machines. The SDK is responsible for the initialization work of accessing the cluster, communicates over RPC interfaces with the cluster center services on which it depends, such as the task scheduling microservice and unified user login, and externally provides a secondary-development interface for algorithm programs to access the platform, so that platform service matters such as cluster access, task flow management, and data flow management are transparent to the accessing algorithm. An algorithm program only needs a small amount of adaptation based on the simple SDK interface to be linked into the platform's computing scheduling. The main methods provided are as follows:
1. A general file description structure is provided as the agreed format for identifying files between heterogeneous algorithms. The structure has five attributes: file type, the code of the algorithm module that generated the file, the platform-registered custom data type, the file's absolute path, and custom parameters. The file type marks whether the item is a file or a folder, and the algorithm module code marks which algorithm generated the file. The platform-registered custom data type is a unique code for a data type registered with the platform: the platform provides a data-type registration interface through which a specified data type can be registered, and the platform generates a globally unique identification code for it, so an algorithm program can judge from this attribute whether a file is of the specified data type it should process. The file absolute path is a path the algorithm program can read directly. The custom parameter attribute carries custom parameters that need to be passed during algorithm execution; for example, if the file description structure describes a TIFF image, information such as the image's extent and seven-parameter transformation can be organized in a custom data structure and assigned to the custom parameter attribute for business parameter passing.
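As a sketch only, the five-attribute general file description structure could be rendered as a Python dataclass. The field names, module code, and type codes below are assumptions for illustration; the patent does not fix a concrete syntax:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class FileDescriptor:
    file_type: str        # "file" or "folder"
    producer_module: str  # code of the algorithm module that generated it
    registered_type: str  # platform-wide unique code of a registered data type
    absolute_path: str    # path the algorithm program can read directly
    custom_params: dict = field(default_factory=dict)

# Hypothetical descriptor for a TIFF image with an extent parameter.
desc = FileDescriptor(
    file_type="file",
    producer_module="ALG-ORTHO-01",
    registered_type="DT-TIFF-IMG",
    absolute_path="/share/job42/scene.tif",
    custom_params={"extent": [114.0, 30.0, 114.5, 30.5]},
)

def wants(descriptor, expected_type):
    # An algorithm checks the registered type code to decide whether
    # the file is one of the specified data types it should process.
    return descriptor.registered_type == expected_type
```

The registered-type check is what lets heterogeneous algorithms filter incoming files without parsing each other's formats.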
2. An interface for setting the data source is provided. The data source is described with the general file description structure and follows its usage protocol. As the input data source of the entire scheduling flow, the data source is visible to all computing factors, and incremental addition of data sources is supported. When the data source is local data, the setting interface first submits the local data to a specified directory on the server and then submits the data source to the platform. An interface for obtaining the data source is also provided, together with a callback registration interface for monitoring data-source changes; a computing program can register a callback to monitor changes to the data source and thereby gain the ability to process a streaming data source dynamically.
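A minimal, hypothetical rendering of the data-source interface with incremental addition and change callbacks might look like this; the class and method names are invented for illustration:

```python
class DataSource:
    def __init__(self):
        self._items, self._listeners = [], []

    def set_data_source(self, descriptors):
        # Incremental addition: new descriptors are appended rather than
        # replacing earlier ones, and every registered callback is notified
        # so streaming algorithms can react to the change.
        self._items.extend(descriptors)
        for cb in self._listeners:
            cb(list(descriptors))

    def get_data_source(self):
        return list(self._items)

    def register_listener(self, callback):
        self._listeners.append(callback)

src = DataSource()
seen = []
src.register_listener(lambda new: seen.extend(new))
src.set_data_source(["tile_001.tif"])
src.set_data_source(["tile_002.tif"])   # incremental: tile_001 is kept
```

The callback is what gives a computing program the "dynamic processing capacity for a streaming data source" described above.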
3. An interface for setting output data is provided. The output data is a map set of one or more "task code - data list" entries, where the task code is transparent to the accessing computing program and the data in the data list are described with the general file description structure, following its usage protocol. The output data of a computing factor serves as the input data of the next one or more computing factors associated with it. An interface for obtaining the input data of a specified computing factor is also provided; the input data is the output data of the preceding computing factor, likewise described with the general file description structure and following its usage protocol. A callback registration interface for monitoring the input data is provided as well, so a computing program can register a callback through the interface to monitor input-data changes and thereby gain the ability to process streaming input data dynamically.
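The "task code - data list" map contract can be illustrated with a small in-memory sketch; `FactorIO` and its method names are invented for illustration and are not the SDK's real API:

```python
from collections import defaultdict

class FactorIO:
    def __init__(self):
        self._outputs = defaultdict(list)   # task_code -> [file descriptors]

    def set_output(self, task_code, descriptors):
        # An upstream factor appends descriptors under a task code;
        # the task code itself is transparent to the algorithm program.
        self._outputs[task_code].extend(descriptors)

    def get_input_for(self, task_code):
        # A downstream factor's input is the upstream factor's output.
        return list(self._outputs.get(task_code, []))

io = FactorIO()
io.set_output("T1", ["dem.tif"])
io.set_output("T1", ["dom.tif"])
```

In the real framework the map would hold general file description structures rather than bare path strings; strings are used here only to keep the sketch short.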
4. Interfaces for setting and obtaining global data are provided. The global data is described with the general file description structure and follows its usage protocol. The global data is visible to all computing factors in the flowchart, i.e. every computing factor can obtain it through the get-global-data interface; this is suitable for data circulation between computing factors that have no direct association in some streaming flowcharts. A registration interface for monitoring global-data callbacks is also provided; a computing program can register a callback to dynamically monitor changes to the global data and perform the corresponding business processing, giving computing factors the ability to process streaming global data dynamically.
5. An interface for submitting Hadoop parallel computing tasks is provided. It is mainly used by a computing program to divide the data it has obtained into fine-grained processing units, to specify a different Hadoop algorithm program for each processing unit, and to submit the divided processing-unit task data structure to the task scheduling microservice for execution as Hadoop parallel tasks.
6. An interface for submitting PC parallel computing tasks is provided. It is mainly used by a computing program to divide the data it has obtained into fine-grained processing units, to specify a different Windows algorithm program for each processing unit, and to submit the divided processing-unit task data structure to the task scheduling microservice for execution as PC parallel tasks.
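Interfaces 5 and 6 can be illustrated together with a hypothetical partition-and-submit sketch. The program-name prefixes, queue names, and one-unit-per-program division are assumptions for illustration, not the SDK's real API:

```python
def partition(items, chunk_size):
    # Fine-grained division of the obtained data into processing units.
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def build_tasks(inputs, programs):
    # One program per processing unit; the programs may be heterogeneous,
    # mixing Hadoop and PC algorithm programs for the same flowchart unit.
    units = partition(inputs, max(1, len(inputs) // len(programs)))
    return [{"unit": u, "program": p} for u, p in zip(units, programs)]

tasks = build_tasks(["a", "b", "c", "d"],
                    ["hadoop:alg1", "hadoop:alg2", "pc:alg3", "pc:alg4"])

def submit(task):
    # Stand-in for the two SDK submission interfaces: route by program type.
    return "hadoop_queue" if task["program"].startswith("hadoop:") else "pc_queue"
```

Splitting four inputs across two Hadoop and two PC programs is exactly the mixed-scheduling scenario the Figure 3 example later walks through.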
In addition to the above key functional interfaces, the third-party algorithm access SDK also provides interfaces that business algorithms may need, such as setting condition thresholds, feeding back progress and task running state to the platform, initialization, and resource release.
The task scheduling microservice serves as the scheduling and management center for the execution of the entire computing task. It serves a single project in microservice form, i.e. one project starts one task scheduling microservice to manage the scheduling of that project's flowchart heterogeneous computing. The microservice handles the distributed heterogeneous computing scheduling process as follows:
1. Based on the flowchart execution configuration template generated by the flowchart computing configuration module, the task scheduling microservice initializes the heterogeneous computing scheduling model and creates the corresponding server-side project directory structure according to that model.
2. By default, the task scheduling microservice first executes the first pre-processing factor, setting the state of that factor, and of its flowchart unit, to in-progress. During execution, a computing program can report progress to the task scheduling microservice through the interface provided by the third-party algorithm access SDK; when the microservice receives the end message issued by the computing program, it marks the computing factor as completed. The microservice executes the next one or more weakly linked computing factors while the pre-processing computing factor is still executing, but executes the next one or more strongly linked computing factors only after receiving the end message of the pre-processing computing factor, where a computing factor may be a pre-processing, processing, or post-processing factor. When all computing factors in a flowchart unit have finished, the task scheduling microservice sets the state of that flowchart unit to completed.
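The factor-level scheduling rule of step 2 can be sketched as a small state machine: the end message marks a factor done, weakly linked successors may start while the predecessor is in progress, strongly linked successors wait for completion, and the unit completes when every factor is done. All names are illustrative:

```python
class Scheduler:
    def __init__(self, factors):
        self.status = {f: "pending" for f in factors}

    def start(self, factor):
        self.status[factor] = "in_progress"

    def on_end_message(self, factor):
        # The end message from the computing program marks the factor completed.
        self.status[factor] = "done"

    def can_run(self, factor, predecessor, link):
        prev = self.status.get(predecessor)
        if link == "weak":
            # Weak link: predecessor has started (and so may have output).
            return prev in ("in_progress", "done")
        return prev == "done"          # strong link: wait for the end message

    def unit_done(self):
        # The flowchart unit completes when every factor has finished.
        return all(s == "done" for s in self.status.values())

sched = Scheduler(["pre1", "proc", "post"])
sched.start("pre1")
```

This mirrors the unit-level dependency rule, applied one level down at factor granularity.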
3. When the computing factor scheduled by the task scheduling microservice is a Hadoop program, the microservice generates an execution script for the factor and submits the task to the Hadoop task execution daemon for execution. When the factor is a PC program, the task is submitted to the PC cluster resource management service, which allocates resources from the existing PC cluster computing resources and pushes the task to the designated PC task execution daemon to execute the computing factor.
4. The task scheduling microservice receives and saves the data-source data submitted by computing factors; all computing nodes in the flowchart can fetch the data-source data through the get-data-source interface of the third-party algorithm access SDK. It receives and saves the output data submitted by computing factors, and the computing factors associated with a factor can obtain that output data through the SDK's get-input-data interface. It receives and saves the global data submitted by computing factors, and all computing factors in the flowchart can obtain the global data through the SDK's global-data interface.
5. The task scheduling microservice provides a processing parallel computing scheduling mechanism: a pre-processing computing factor is configured before the processing factor of a flowchart unit that requires parallel computation, in order to plan the parallel tasks. The pre-processing computing factor performs task planning based on the obtained data (note: in this patent, obtained data covers data-source data, input data, and global data), and the planned tasks may include both Hadoop parallel tasks and PC parallel tasks, i.e. mixed processing of the same flowchart unit with heterogeneous algorithms is supported. As shown in Figure 3, a schematic diagram of the parallel task division mechanism according to the invention, flowchart unit 301 is configured with a pre-processing computing factor 302 and a processing computing factor 303. The pre-processing computing factor 302 is configured with a pre-processing program 304, and the processing computing factor 303 is configured with four kinds of heterogeneous computing programs: Hadoop program 1, Hadoop program 2 306, PC program 307, and PC program 308. When executing flowchart unit 301, the task scheduling microservice executes the pre-processing computing factor 302 according to the state of the computing factors on which factor 302 depends, thereby starting the pre-processing program 304. Through the input-data interface, the pre-processing program 304 obtains input data 309, 310, 311, 312 from the task scheduling microservice and, according to its own business scenario (one task processes one piece of data), divides the input data into four tasks, each of which specifies a different algorithm program for processing. According to the algorithm program types configured for the business, the pre-processing computing factor 302 calls the submit-Hadoop-parallel-computation interface to submit Hadoop parallel computing tasks 313, 312 to the task scheduling microservice, and calls the submit-PC-parallel-computation interface to submit PC parallel computing tasks to the task scheduling microservice. After receiving the parallel tasks submitted by the pre-processing computing factor, the task scheduling microservice executes the task processing flow, thereby realizing computation on the same flowchart unit with different heterogeneous algorithms. On the other hand, the data required by each task of each algorithm program are transparent to the software user: the user does not need to pay attention to the data an algorithm program needs and only needs to operate on the task, since the algorithm program automatically loads the needed data through the scheduling platform when it starts.
The PC cluster resource management service is responsible for managing the PC cluster resources participating in cluster computing. The PC task execution daemon reports the GPU, CPU, and memory computing-resource information of its PC node to the PC cluster resource management service, which receives and saves this information. When the PC cluster resource management service receives a PC parallel computing task submitted by the task scheduling microservice, it uses the task ID to obtain from the flowchart computing configuration module the computing program that executes the task and the computing-resource parameters that program needs, performs dynamic planning and allocation based on those parameters and the saved resource information, pushes the task to the PC task execution daemon of the designated PC node, and receives task execution feedback from that daemon. When task execution ends, the PC cluster resource management service releases the PC node's resources so that they can participate in resource planning for new tasks. When task execution fails, the service re-plans the task and pushes it to the PC task execution daemon of another PC node for re-execution; when the task exceeds the configured maximum number of attempts, the failure-retry process for that task is stopped.
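The allocate-and-retry policy of the PC cluster resource management service could be sketched as follows. The node names, the memory-only resource model, and the retry count are illustrative simplifications of the GPU/CPU/memory planning described above:

```python
def pick_node(nodes, need_mem):
    # Choose a free node with enough memory; resources are released
    # (free flipped back) when a task completes in the real service.
    for name, info in nodes.items():
        if info["free"] and info["mem"] >= need_mem:
            return name
    return None

def run_with_retry(nodes, need_mem, attempt_fn, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        node = pick_node(nodes, need_mem)
        if node is None:
            return ("no_resource", attempt)
        if attempt_fn(node, attempt):
            return ("ok", attempt)
        nodes[node]["free"] = False     # exclude the failed node and re-plan
    return ("failed", max_attempts)

nodes = {"pc1": {"free": True, "mem": 8}, "pc2": {"free": True, "mem": 16}}
# Hypothetical run that fails on pc1 and succeeds after re-planning on pc2.
result = run_with_retry(nodes, 8, lambda node, n: node == "pc2")
```

The key behavior is that a failure triggers re-planning onto a different node, bounded by a maximum attempt count, matching the paragraph above.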
The PC task execution daemon is a PC daemon with a user interface that can execute PC cluster tasks, including automated algorithm programs and human-computer interaction programs. After receiving a task pushed by the PC cluster resource management service, the PC task execution daemon executes it according to the classification of the task's computing program. If the computing program is classified as automatic, an execution script is generated automatically; the script records the information required by the task's automated algorithm, such as platform parameters, input data, and working directory, and the script file is used as the start-up parameter to launch the automated algorithm. While the algorithm runs, the daemon continuously monitors the automated algorithm's process state; if the process exits abnormally or times out, the task is marked as failed and its execution state is fed back to the PC cluster resource management service for subsequent exception handling. If the computing program is classified as passive, the task is displayed in a task list; when an operator fetches and starts the task, the PC task execution daemon generates an execution script for the passive human-computer interaction task, which records the information required by the designated human-computer interaction algorithm program, such as platform parameters, input data, and working directory, and the script file is used as the start-up parameter to launch the algorithm.
The Hadoop task execution daemon executes Hadoop computing tasks based on MapReduce and Yarn. It receives Hadoop computing tasks submitted by the task scheduling microservice and judges the task category. If the task is a unique computing task, it is executed through MapReduce, with uniqueness guaranteed for the task via Zookeeper and a heartbeat maintained; when the heartbeat of the unique task times out, the task is retried a configurable number of times. If the task is a parallel computing task, it is executed through MapReduce, and when the parallel computing task fails it is likewise retried a configurable number of times. Allocation of cluster resources is guaranteed by Yarn.
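The heartbeat-timeout retry for unique tasks might be sketched like this. The timeout and retry values are illustrative only, and the class is a standalone model, not part of any real Hadoop or Zookeeper API:

```python
import time

class HeartbeatMonitor:
    def __init__(self, timeout, max_retries):
        self.timeout, self.max_retries = timeout, max_retries
        self.retries = 0
        self.last_beat = time.monotonic()

    def beat(self):
        # Called periodically by the running unique task.
        self.last_beat = time.monotonic()

    def check(self, now=None):
        # Called by the daemon; a timed-out heartbeat triggers a retry
        # until the configurable retry count is exhausted.
        now = time.monotonic() if now is None else now
        if now - self.last_beat <= self.timeout:
            return "alive"
        if self.retries < self.max_retries:
            self.retries += 1
            self.last_beat = now        # the retried task starts beating again
            return "retry"
        return "failed"

mon = HeartbeatMonitor(timeout=30.0, max_retries=2)
```

In the described system, Zookeeper would additionally guarantee that at most one instance of the unique task runs at a time; this sketch covers only the timeout-and-retry side.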
Because the invention uses user-configured distributed computing flows, flowchart unit nodes are subdivided: pre-processing, processing, and post-processing computing factors are configured for flowchart nodes, and different types of algorithm programs can be configured for the computing factors. The task scheduling microservice schedules the entire computing flowchart based on the user-configured template. A heterogeneous system architecture combining a Hadoop task execution daemon running on Linux and a PC task execution daemon is used to execute algorithm programs in the Hadoop and PC heterogeneous environments and to provide fault-tolerant handling. Dynamic allocation of PC cluster resources is realized through the PC cluster resource management service. In addition, an SDK is provided for linking algorithm programs into the distributed heterogeneous computing platform: it defines the general data description structure used to identify data circulating between algorithms and establishes a communication connection with the platform over RPC interfaces, so that algorithm programs can request their data from the computing platform, provide output data, submit parallel tasks, and report state and progress, thereby achieving at least the following beneficial effects:
1. Business flows are separated from computing scheduling flows, allowing the computing scheduling flow to be flexibly configured at the algorithm-program level independently of the business flow. Flowchart computing templates are customized according to the actual business scenario, and flowchart units are refined: a flowchart unit can be configured with one processing computing factor plus multiple pre-processing and post-processing computing factors, each factor configured with computing programs, which makes the execution of the whole flow very flexible and forms the "flowchart unit - computing factor - computing program" association structure. Introducing the concept of computing factors decouples flowchart units from computing programs: when processing different data of the same business flow, only the computing factors of the flowchart units need to be adjusted, replacing different pre-processing and post-processing programs, without adjusting the entire flowchart computing template.
2. Automatic data circulation between heterogeneous algorithm programs is realized. Pre-processing and post-processing programs can be configured around the business processing program of a processing computing factor, decoupling the business processing programs of the upstream and downstream flow units. The pre-processing and post-processing programs, developed on the third-party algorithm access SDK development library, bridge the processing programs of the upstream and downstream flowchart units, solving the data circulation and adaptation problems between heterogeneous algorithm programs in flowchart computing scheduling: an algorithm program no longer needs to be adapted to its upstream and downstream algorithm programs; only the pre-processing and post-processing programs between heterogeneous algorithms need to be dynamically configured.
3. Within the whole process, or within a single flowchart unit, automated programs, single-machine human-computer interaction tools, and C/S-mode collaborative tool algorithms running in heterogeneous environments can be scheduled together. Because the system architecture shown in Figure 1 is used, heterogeneous algorithms under Linux and Windows can be executed on a per-task basis, realizing collaboration among multiple tools.
4. Flowchart unit nodes are processed in parallel at a finer granularity. The processing calculation factor of a flowchart unit node supports multiple different calculation programs, and a preprocessing calculation factor dedicated to dividing parallel tasks can be configured for it. After the preprocessing calculation factor obtains the input data, it divides the data into finer-grained tasks according to the actual business scenario, can specify the calculation program that executes each task, and submits the parallel calculation tasks to the task scheduling microservice, thereby achieving finer-grained parallel processing of flowchart unit nodes.
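The division step above can be sketched as follows. This is an assumption-laden illustration: the chunking rule and the fields of the "task-data" record (`task_id`, `data`, `software`) are invented here, not the platform's real schema:

```python
def divide_parallel_tasks(data, chunk_size, program):
    """Split input data into blocks and emit one 'task-data' record per block,
    each bound to the calculation program that should execute it."""
    tasks = []
    for i in range(0, len(data), chunk_size):
        block = data[i:i + chunk_size]
        tasks.append({"task_id": len(tasks), "data": block, "software": program})
    return tasks

tasks = divide_parallel_tasks(list(range(10)), chunk_size=4, program="Soft1-3")
assert len(tasks) == 3                    # blocks of 4, 4, and 2 items
assert tasks[2]["data"] == [8, 9]
```

Binding the executing program into each record is what turns "task-data" into the "task-data-software" association used later in the embodiments.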
5. Streaming task operation is supported. A preprocessing program can dynamically acquire continuously arriving data through the data-listening mechanism of the third-party algorithm access SDK; after receiving new data, it divides the newly added data into parallel tasks and submits the parallel calculation tasks, thereby realizing streaming task operation on stream data.
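The listener mechanism can be sketched as a callback object. Hypothetical names throughout (`StreamPreprocessor`, `on_new_data`); the real SDK's registration call is not specified in the text:

```python
class StreamPreprocessor:
    """Sketch of a preprocessing program that listens for newly arriving
    data and submits one parallel task per new item."""
    def __init__(self, submit):
        self.submit = submit          # callable handing tasks to the scheduler

    def on_new_data(self, batch):     # would be registered as the SDK data listener
        for item in batch:
            self.submit({"task": "process", "data": item})

submitted = []
pre = StreamPreprocessor(submitted.append)
pre.on_new_data(["rec1", "rec2"])     # simulate the platform pushing new data
pre.on_new_data(["rec3"])
assert len(submitted) == 3
```

Because the preprocessor never polls, each arriving batch is turned into tasks immediately, which is what makes the operation streaming rather than batch.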
6. Data versions are controllable. The distributed flowchart heterogeneous computing system records every piece of "task-data" information during execution, so the data flow in the entire processing flow is controlled by the platform, which is of significant importance in data production.
7. The development threshold for connecting a calculation program to cluster flow scheduling is lowered. The distributed flowchart heterogeneous computing scheduling framework makes general concerns such as cluster access, data transfer, and node state scheduling transparent to the calculation program; the calculation program only needs to focus on the business processing itself, without handling anything beyond it.
It should be understood that the above general description and the following specific embodiments are merely exemplary and explanatory, and do not limit the scope of the claimed invention.
Brief Description of the Drawings
The invention is described in further detail below with reference to the accompanying drawings.
Figure 1 is the system architecture diagram involved in the present invention.
Figure 2 is the distributed heterogeneous flowchart configuration structure diagram involved in the present invention.
Figure 3 is a schematic diagram of the parallel task partitioning mechanism involved in the present invention.
Figure 4 is a schematic diagram of a preferred implementation example of a distributed flowchart heterogeneous computing scheduling method provided by a specific embodiment of the present invention.
Figure 5 is a schematic diagram of the internal configuration of flowchart unit 1 in the preferred implementation example of the present invention.
Figure 6 is a schematic diagram of the internal configuration of flowchart unit 2 in the preferred implementation example of the present invention.
Figure 7 is a schematic diagram of the internal configuration of flowchart unit 5 in the preferred implementation example of the present invention.
Detailed Description of the Embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the spirit of the disclosure is illustrated below with reference to the drawings and detailed description. After understanding the embodiments of the present disclosure, any person skilled in the art may make changes and modifications based on the techniques taught herein without departing from the spirit and scope of the disclosure.
The exemplary embodiments of the present invention and their descriptions are used to explain the present invention, not to limit it.
As shown in Figure 4, a schematic diagram of a preferred implementation example of a distributed flowchart heterogeneous computing scheduling method provided by a specific embodiment of the present invention, a specific business processing flow is defined in the flowchart calculation configuration module according to the actual business scenario, and calculation factors are configured for each flowchart unit.
As shown in Figure 5, a schematic diagram of the internal configuration of flowchart unit 1 in the example, flowchart unit 401 in Figure 4 is configured with three calculation factors: preprocessing 501, preprocessing 502, and processing 503. Preprocessing calculation factor 501 is configured with calculation program Soft1-1, preprocessing calculation factor 502 with calculation program Soft1-2, and processing calculation factor 503 with calculation program Soft1-3.
The task scheduling microservice first executes preprocessing calculation factor 501 in the flowchart by default. Because calculation program Soft1-1 has the passive-trigger attribute and is a Windows application running on a PC, the task scheduling microservice only marks the calculation factor state as started and marks flowchart unit 401 as started. The PC task-execution daemon synchronizes the scheduling flow template of the task scheduling microservice and displays the flowchart nodes configured with PC programs on its interface. When a PC calculation program with the passive-trigger attribute is configured on a flowchart node, execution can be triggered with the right mouse button, which generates a public execution script and starts the corresponding calculation program. Since the configuration of preprocessing 501 in Figure 5 meets this condition, the PC task-execution daemon generates an execution script and starts Soft1-1. Soft1-1 is a human-computer interaction program with an interface; it packages local data using the data structure of the general file description structure body and calls the set-data-resource interface provided by the third-party algorithm access SDK to submit data source Data1-1 to the platform. It should be noted that the data mentioned in this embodiment include both structured and unstructured data. The set-data-source interface of the third-party algorithm access SDK uploads data source Data1-1 to the node directory allocated by the server and converts the Windows paths in the general file structure body into Linux paths. Data source Data1-1 is global data: all calculation programs in the flowchart can obtain Data1-1 through the get-data-resource interface. After submitting the data source, Soft1-1 sends an end message to the task scheduling microservice, which marks the state of the preprocessing calculation factor as finished.
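The Windows-to-Linux path rewrite performed when a data source is uploaded could look like this minimal sketch. The mapping rule is an assumption — the patent does not specify how drive letters map onto the server-allocated node directory:

```python
from pathlib import PureWindowsPath, PurePosixPath

def to_linux_path(win_path: str, node_dir: str) -> str:
    """Rewrite a Windows path from the general file structure body into a
    Linux path rooted at the node directory allocated by the server."""
    p = PureWindowsPath(win_path)
    # Drop the drive letter ('C:\\') if present and re-root under node_dir.
    rel = p.parts[1:] if p.drive else p.parts
    return str(PurePosixPath(node_dir, *rel))

print(to_linux_path(r"C:\data\Data1-1\img.tif", "/hadoop/node7"))
# → /hadoop/node7/data/Data1-1/img.tif
```

Using `PureWindowsPath`/`PurePosixPath` keeps the conversion purely lexical, so it runs correctly on either operating system without touching the filesystem.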
Because preprocessing calculation factor 502 has a strong dependency on preprocessing calculation factor 501, factor 502 starts only after factor 501 has finished executing: after the task scheduling microservice marks factor 501 as completed, it starts executing factor 502. Calculation program Soft1-2, configured on preprocessing calculation factor 502, is a preprocessing program running on Hadoop. The task scheduling microservice generates a startup script and starts Soft1-2, which obtains data source Data1-1 by calling the get-data-source interface of the third-party algorithm access SDK. According to its own business requirements, Soft1-2 divides data source Data1-1 into multiple data blocks Data1-2-1 to Data1-2-n and generates a "task-data" parallel calculation task data structure for each data block. Through an interface it obtains the software code configured on the next calculation factor and assigns a customized execution software to each "task-data" entry, forming the "task-data-software" association. Because processing calculation factor 503 is configured with only one program, Soft1-3, which runs on Hadoop, the preprocessing program Soft1-2 submits the divided tasks to the task scheduling microservice by calling the submit-Hadoop-parallel-task interface of the third-party algorithm access SDK. After receiving the submitted Hadoop parallel tasks, the task scheduling microservice generates execution scripts, and the Hadoop task-execution daemon starts the parallel calculation tasks. Each process of Soft1-3 obtains the data block associated with its task through the get-input-data interface provided by the third-party algorithm access SDK. After processing a data block, it calls the set-output-data interface of the SDK to output the calculation results Data1-3-1 to Data1-3-n to the platform; after the calculation completes, it calls the end interface of the SDK to notify the task scheduling microservice. The task scheduling microservice aggregates the end notifications of the tasks: when all parallel tasks have ended and the preceding preprocessing has finished, it marks the calculation factor node as finished, and when all calculation factors in the flowchart unit have finished, it marks the execution state of the flowchart unit as finished.
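The aggregation of end notifications — a calculation factor finishes only when its last outstanding parallel task has reported in — can be sketched as a small tracker. Names are hypothetical; the real microservice presumably keeps similar state per flowchart unit as well:

```python
class FactorTracker:
    """Counts outstanding parallel tasks for one calculation factor and
    flips the factor state to 'finished' when the last one ends."""
    def __init__(self, task_ids):
        self.pending = set(task_ids)
        self.state = "running"

    def on_task_end(self, task_id):
        self.pending.discard(task_id)   # tolerate duplicate notifications
        if not self.pending:
            self.state = "finished"

t = FactorTracker(["t1", "t2", "t3"])
t.on_task_end("t1")
t.on_task_end("t2")
assert t.state == "running"             # one task still outstanding
t.on_task_end("t3")
assert t.state == "finished"
```

Using a set with `discard` makes the tracker idempotent, so a retried or duplicated end notification cannot finish a factor twice or crash the scheduler.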
As shown in Figure 6, a schematic diagram of the internal configuration of flowchart unit 2 in the example, flowchart unit 402 in Figure 4 is configured with two calculation factors: preprocessing calculation factor 601 and processing calculation factor 602. Preprocessing calculation factor 601 is configured with calculation program Soft2-1; processing calculation factor 602 is configured with three calculation programs, Soft2-2, Soft2-3, and Soft2-4, which together form a C/S-mode module set: Soft2-2 is a server-side program running on Hadoop, while Soft2-3 and Soft2-4 are two different client human-computer interaction programs running on Windows.
Because flowchart unit 402 and flowchart unit 401 in Figure 4 have a weak dependency, flowchart unit 402 can execute as soon as the processing calculation factor of flowchart unit 401 has output data. The task scheduling microservice starts the preprocessing program Soft2-1 running on Hadoop, marks the preprocessing node state as running, and marks flowchart unit 402 as running. Soft2-1 obtains the data source by calling the get-data-resource interface, extracts from it the input data Data2-1-1 needed to start calculation program Soft2-2, generates the "task-data-software" data structure, and submits a parallel calculation task to the task scheduling microservice; this parallel calculation task is marked with the unique attribute, indicating that the calculation task exists uniquely on the platform. Soft2-1 dynamically obtains the stream data output by flowchart unit 401 by registering an input-data-listening callback function. After receiving new input data, it divides the input data into fine-grained tasks according to actual business requirements, specifies the client calculation programs Soft2-3 and Soft2-4 to execute them, and immediately calls the submit-PC-parallel-task interface to submit PC parallel tasks to the task scheduling microservice. Because calculation programs Soft2-3 and Soft2-4 require an operator to fetch a task manually before execution can begin, the PC parallel task data are additionally marked with the passive attribute. When the task scheduling microservice finds that a PC parallel task has the passive attribute, it no longer submits it to the PC cluster resource management service for resource allocation but pushes it to the PC task-execution daemon for display on its interface, where an operator can specify and execute the PC parallel task through the start menu of the PC task-execution daemon. Because the calculation programs configured on the processing calculation factor of flowchart unit 405 need the output data Data2-2-1-1 of calculation program Soft2-2, Soft2-2 must call the set-global-data interface when setting its output data to mark Data2-2-1-1 as global data.
As shown in Figure 7, a schematic diagram of the internal configuration of flowchart unit 5 in the example, flowchart unit 405 in Figure 4 is configured with one preprocessing calculation factor 701 and one processing calculation factor 702. Preprocessing calculation factor 701 is configured with the preprocessing program Soft5-1 running on Hadoop, and processing calculation factor 702 is configured with three different calculation programs, Soft5-2, Soft5-3, and Soft5-4, running on Windows.
Because flowchart unit 405 has strong dependencies on flowchart units 403 and 404, the task scheduling microservice starts executing flowchart unit 405 only when both flowchart unit 403 and flowchart unit 404 have completed. The task scheduling microservice starts the preprocessing program Soft5-1 running on Hadoop; Soft5-1 obtains the output data of the preceding flowchart units through the get-input-data interface and obtains the global data Data2-2-1-1 output by flowchart unit 402 through the get-global-data interface. Soft5-1 divides the acquired data sources into fine-grained "task-data" entries and, according to business conditions, assigns the divided tasks by certain rules to be executed by calculation programs Soft5-2, Soft5-3, and Soft5-4. After receiving the tasks, the task scheduling microservice detects that they are tasks to be executed on the PC cluster and submits them to the PC cluster resource management service, which dynamically plans and matches PC cluster resources for the tasks according to the resource information registered by the PC task-execution daemons and pushes each task to a PC task-execution daemon with available computing resources. After receiving a task, the PC task-execution daemon generates an execution script and starts the corresponding calculation program. If a task execution fails, the PC task-execution daemon re-executes the task up to 4 times; if all 4 retries fail, it reports the task back to the PC cluster resource management service, which reassigns the task to another compute node with free resources for another attempt. If that also fails, the retry process ends and the task is marked as a failed task.
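The fault-tolerance path above can be sketched as follows. One assumption is flagged in the code: the text does not say how many attempts the reassigned node gets, so this sketch gives it a single attempt:

```python
def run_with_retry(execute, reassign, max_retries=4):
    """Sketch of the fault-tolerant path: retry up to 4 times on the assigned
    node, then let the resource manager move the task to another free node,
    and finally mark the task failed."""
    for _ in range(max_retries):
        if execute():
            return "done"
    # Assumption: the reassigned node gets one attempt (not specified in the text).
    if reassign() and execute():
        return "done"
    return "failed"

calls = []
always_fail = lambda: (calls.append(1), False)[1]
assert run_with_retry(always_fail, lambda: True) == "failed"
assert len(calls) == 5          # 4 local retries + 1 attempt after reassignment
```

Separating `execute` from `reassign` mirrors the division of labor in the text: the daemon owns retries, while the PC cluster resource management service owns node reassignment.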
Flowchart unit 406 and flowchart unit 405 in Figure 4 have a weak dependency: as soon as flowchart unit 405 has output, the task scheduling microservice begins executing flowchart unit 406. Flowchart unit 406 is provided with a condition control valve 409 configured with two condition thresholds, 1 and 2. The algorithm program of the processing node in flowchart unit 406 can set the threshold according to its own execution conditions: if it sets threshold 1 at the task scheduling microservice, the output of flowchart unit 406 is directed to flowchart unit 402, which reprocesses that output; if it sets threshold 2, the output of flowchart unit 406 is directed to flowchart unit 408 for processing. Flowchart unit 407 has a strong dependency on flowchart unit 405, so flowchart unit 407 starts executing only when flowchart unit 405 has completed. Flowchart unit 408 has a weak dependency on flowchart unit 406 and a strong dependency on flowchart unit 407, so flowchart unit 408 can start executing only after flowchart unit 407 has completed.
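The branch decision made by condition control valve 409 can be sketched as a threshold-to-branch lookup (an illustrative sketch; the unit names are taken from the example, the function name is invented):

```python
def route_output(threshold):
    """Condition control valve 409: threshold 1 loops unit 406's output back
    to unit 402 for reprocessing; threshold 2 sends it on to unit 408."""
    branches = {1: "unit402", 2: "unit408"}
    if threshold not in branches:
        raise ValueError(f"unconfigured condition threshold: {threshold}")
    return branches[threshold]

assert route_output(1) == "unit402"   # loop back and reprocess
assert route_output(2) == "unit408"   # continue downstream
```

Because the algorithm program sets the threshold at run time, the same static flowchart can express both a reprocessing loop and straight-through flow without being edited.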

Claims (9)

1. A distributed flowchart heterogeneous computing scheduling method, characterized by comprising: custom-configuring a distributed heterogeneous computing scheduling flowchart; scheduling and managing the calculation process with a centralized scheduling microservice; performing reasonable resource allocation for parallel calculation tasks in combination with Hadoop task planning and PC task planning services; receiving and executing heterogeneous algorithms through daemons running under the Linux and Windows heterogeneous environments; seamless automatic data flow between heterogeneous algorithms; dynamic streaming-task processing of stream data; and a secondary development interface for accessing the distributed heterogeneous computing platform.
2. The custom-configured distributed heterogeneous computing scheduling flowchart according to claim 1, characterized in that: the business flowchart is custom-defined, and a flowchart unit can be configured with multiple preprocessing, processing, and post-processing calculation factors; each preprocessing or post-processing calculation factor can be configured with one calculation program, while a processing calculation factor can be configured with multiple different types of calculation programs; strong or weak dependencies can be configured between flowchart units, and strong or weak dependencies can be configured between calculation factors; a condition control valve can be set on a flowchart unit to make flowchart branch scheduling decisions.
3. The strong and weak dependencies according to claim 2, characterized in that: between strongly dependent flowchart units, the dependent flowchart unit can execute only after the flowchart unit it depends on has completed, whereas between weakly dependent flowchart units, the dependent flowchart unit can execute as soon as the flowchart unit it depends on has output; likewise, between strongly dependent calculation factors, the dependent calculation factor can execute only after the calculation factor it depends on has completed, whereas between weakly dependent calculation factors, the dependent calculation factor can execute as soon as the calculation factor it depends on has output.
4. The method for scheduling and managing the calculation process with a centralized scheduling microservice according to claim 1, characterized in that: one calculation flowchart runs one scheduling microservice; the scheduling model is initialized from the custom-configured distributed heterogeneous calculation flowchart configuration template; calculation factors are executed according to their dependencies; the output data of each compute node are received and saved; data flow control between calculation factors and between flowchart units is provided; a parallel computation scheduling mechanism is provided; and the execution states of calculation factors and flowchart units are maintained.
5. The parallel computation scheduling mechanism according to claim 4, characterized in that: a preprocessing calculation factor is configured before the parallel calculation factor and is configured with a preprocessing program for dividing parallel calculation tasks; the preprocessing program divides the obtained source data into minimum processing units, generates "task-data" associations in task form, can specify the calculation program that executes each task, and submits the parallel calculation tasks to the scheduling microservice to trigger parallel computation.
6. The method for seamless automatic data flow between heterogeneous algorithms according to claim 1, characterized in that: data are uniformly defined based on a general file structure body; adaptation programs for data conversion are configured between different heterogeneous programs; after obtaining the output data of the preceding algorithm program, an adaptation program processes the output data, converts them into the data format matching the next algorithm program, and outputs the converted data to the next calculation factor.
7. The method for dynamically processing streaming tasks on stream data according to claim 1, characterized in that: by listening to the source data of a calculation factor, data are acquired dynamically; newly added data are bound as "task-data" in task form and submitted to the platform for execution, thereby performing dynamic streaming-task processing on stream data.
8. The secondary development interface for accessing the distributed heterogeneous computing platform according to claim 1, characterized in that: structured and unstructured data are described using a general file description structure body; a calculation program developed on the secondary development interface has the abilities to set output data on the platform, set data visibility permissions, request source data, listen to source data, submit parallel tasks, set condition thresholds, and report task execution progress and state.
9. The method for controlling the calculation process by setting condition thresholds according to claim 8, characterized in that: a flowchart unit is configured with a condition control valve; the condition control valve is configured with condition thresholds and the branch triggered by each threshold; setting a condition threshold manually or by a calculation program during calculation is supported; when a specified condition threshold is met, the branch pointed to by that condition threshold is executed.
CN201910584305.4A 2019-07-01 2019-07-01 A kind of distribution flow chart Heterogeneous Computing dispatching method Pending CN110287016A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910584305.4A CN110287016A (en) 2019-07-01 2019-07-01 A kind of distribution flow chart Heterogeneous Computing dispatching method


Publications (1)

Publication Number Publication Date
CN110287016A true CN110287016A (en) 2019-09-27

Family

ID=68020346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910584305.4A Pending CN110287016A (en) 2019-07-01 2019-07-01 A kind of distribution flow chart Heterogeneous Computing dispatching method

Country Status (1)

Country Link
CN (1) CN110287016A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984648A (en) * 2020-08-19 2020-11-24 上海翘腾科技有限公司 Data initialization method and system under micro-service architecture
CN112232115A (en) * 2020-09-07 2021-01-15 北京北大千方科技有限公司 Calculation factor implantation method, medium and equipment
CN112698878A (en) * 2020-12-18 2021-04-23 浙江中控技术股份有限公司 Calculation method and system based on algorithm microservice
CN112767513A (en) * 2020-12-31 2021-05-07 浙江中控技术股份有限公司 Visual flow chart, event synchronous configuration tool and flow chart drawing method
CN112860450A (en) * 2020-12-04 2021-05-28 武汉悦学帮网络技术有限公司 Request processing method and device
CN113497814A (en) * 2020-03-19 2021-10-12 中科星图股份有限公司 Satellite image processing algorithm hybrid scheduling system and method
CN114866514A (en) * 2022-04-29 2022-08-05 中国科学院信息工程研究所 Multi-user data flow control and processing method, device, equipment and medium
CN114924877A (en) * 2022-05-17 2022-08-19 江苏泰坦智慧科技有限公司 Dynamic allocation calculation method, device and equipment based on data stream
CN115617533A (en) * 2022-12-14 2023-01-17 上海登临科技有限公司 Process switching management method in heterogeneous computing and computing device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120159506A1 (en) * 2010-12-20 2012-06-21 Microsoft Corporation Scheduling and management in a personal datacenter
CN105022670A (en) * 2015-07-17 2015-11-04 中国海洋大学 Heterogeneous distributed task processing system and processing method in cloud computing platform
AU2017100410A4 (en) * 2014-09-22 2017-05-11 Tongji University Method and system for large-scale real-time traffic index service based on distributed framework
CN106775632A (en) * 2016-11-21 2017-05-31 中国科学院遥感与数字地球研究所 A kind of operation flow can flexible expansion high-performance geographic information processing method and system




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190927
