WO2013138982A1 - A parallel processing method and apparatus - Google Patents
A parallel processing method and apparatus
- Publication number
- WO2013138982A1 (PCT Application PCT/CN2012/072545)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- task
- steps
- information
- processing
- service
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
Definitions
- the embodiments of the present invention relate to the field of computer technologies, and in particular, to a parallel processing method and apparatus.
- the above Hadoop system or EMR system must process tasks strictly according to the two steps of map and reduce.
- map refers to processing the original documents according to the map rules and outputting intermediate results.
- the intermediate results are merged according to the reduce rules. If a task requires more than two processing steps, it has to be submitted multiple times, with the user's task running parameters entered each time, to complete the multi-step processing, which makes the system complicated for the user to use.
- an embodiment of the present invention provides a parallel processing method, including
- Receiving a plurality of task processing requests, and determining the service information corresponding to the tasks according to the service identifiers carried in the task processing requests; and, according to the service information corresponding to the tasks, using a plurality of task steps to process the multiple tasks in parallel,
- the number of steps of the multiple task steps is greater than or equal to 2.
- an embodiment of the present invention provides a parallel processing apparatus, including
- a receiving unit configured to receive a plurality of task processing requests, and determine, according to the service identifier carried in the task processing request, service information corresponding to the task;
- the processing unit is configured to perform parallel processing on the multiple tasks by using multiple task steps according to the service information corresponding to the task, where the number of steps of the multiple task steps is greater than or equal to 2.
- the parallel processing method and apparatus of the embodiments of the present invention process the tasks in parallel by using multiple task steps and determine the service information used to process a task from the service identifier, so the user's task running parameters do not need to be submitted repeatedly and multi-step processing is simplified.
- FIG. 1 is a schematic flow chart of a parallel processing method according to an embodiment of the present invention.
- FIG. 2 is a schematic structural diagram of a parallel processing device according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of an application scenario of a parallel processing device according to an embodiment of the present invention.
- FIG. 4 is a first schematic diagram of a relationship of task steps in a parallel processing method according to an embodiment of the present invention.
- FIG. 5 is a second schematic diagram of a relationship of task steps in a parallel processing method according to an embodiment of the present invention.
- FIG. 6 is a schematic flowchart of a parallel processing method in an application scenario according to an embodiment of the present invention.
- an embodiment of the present invention provides a parallel processing method, including:
- the plurality of task steps can be understood as processing the task using a number of steps greater than or equal to 2.
- the parallel processing method of the embodiment of the present invention uses multiple task steps to process the tasks in parallel and determines the service information used to process a task from the service identifier; the user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply, overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the method may further include: acquiring a user-defined service definition file before receiving the multiple task processing requests.
- a service identifier is generated, and a correspondence between the service identifier and the service information is established.
- the user-defined service definition file can be used as a processing template for a certain type of service, and thus serves as a basis for running multiple tasks under the service.
- when a task is submitted, providing the service identifier is sufficient to determine the service information. It is not necessary to enter the task running parameters each time, which reduces the user's operations at task run time and makes the system easier to use.
- the service information may include:
- Task definition information: used to define the fault tolerance level, the computing model, and the like of the task.
- Task split information: used to split the task into multiple task steps, and the like.
- Task step association information: used to define the processing order between multiple task steps.
- Task step information: used to define the running information of each task step.
- the running information includes: resource information, the user program, and user settings.
- the running information may further include: a processing mode of the multiple task steps, where the processing mode is a serial processing mode or a parallel processing mode.
- when the processing mode of the multiple task steps is the serial processing mode, all outputs of the previous task step pass an integrity check before they serve as the input of the next task step.
- that is, each task step has multiple outputs, and all of the outputs must pass the integrity check before they can enter the next task step as its input.
- when the processing mode of the multiple task steps is the parallel processing mode, any single output of the previous task step serves directly as an input of the next task step. That is, each task step has multiple outputs, but not all of them need to pass an integrity check; any single output of a task step can enter the next task step as its input. It can be seen that multiple task steps can be processed in parallel, which improves the processing capability and overcomes the defect of the Hadoop system and the EMR system that exactly two steps are required and must be processed strictly in serial order.
- the manner of processing the task by using multiple task steps according to the service information corresponding to the task may include:
- the task is split into multiple task steps according to the task split information in the service information.
- the parallel processing method of the embodiment of the present invention may further include:
- an embodiment of the present invention provides a parallel processing apparatus, including: a receiving unit 21, configured to receive a plurality of task processing requests and determine, according to the
- service identifier (ID)
- carried in each task processing request, the service information corresponding to the task;
- the processing unit 22 is configured to perform parallel processing on the multiple tasks by using multiple task steps according to the service information corresponding to the task, where the number of steps of the multiple task steps is greater than or equal to 2.
- the parallel processing device of the embodiment of the present invention uses multiple task steps to process the tasks in parallel and determines the service information used to process a task from the service identifier; the user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply,
- overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the obtaining unit is configured to obtain a user-defined service definition file.
- the parsing unit is configured to parse the service definition file, obtain the service information, generate a service identifier, and establish a correspondence between the service identifier and the service information.
- a storage unit configured to store a correspondence between the service identifier and the service information.
- the business information may include:
- Task definition information Used to define the fault tolerance level and calculation model of the task.
- Task split information Used to split a task into multiple task steps.
- Task step association information: used to define the processing order between multiple task steps.
- Task step information: used to define the running information of each task step.
- the running information includes: resource information, the user program, and user settings. Further, the processing unit 22 may specifically be configured to:
- the task is split into multiple task steps according to the task split information in the service information.
- according to the task step association information in the service information, use the requested resources and invoke the user program to process the task, until the processing of the multiple task steps is completed according to the processing order between the multiple task steps.
- the running information may further include: a processing mode of the multiple task steps, the processing mode being a serial processing mode or a parallel processing mode, and the processing unit 22 may specifically be configured to:
- when the processing mode is the serial processing mode, pass all outputs of the previous task step through an integrity check before they serve as the input of the next task step; and when the processing mode is the parallel processing mode, use any single output of the previous task step directly as an input of the next task step.
- the processing unit 22 may further be configured to:
- process higher-priority tasks first according to the priority order of the tasks, or adjust the priorities of the tasks and process the higher-priority tasks first.
- the manner in which the priority of the task is adjusted may include: Priority adjustment based on the waiting time of the task and/or the completion time of the task.
- the parallel processing device of the embodiment of the present invention can be understood by referring to the parallel processing method of the foregoing embodiment, and the same content is not described herein.
- FIG. 3 is a schematic structural diagram of an application scenario of a parallel processing device according to an embodiment of the present invention.
- Web Service 31: responsible for accepting and forwarding the user's web requests, for example receiving a user's request to define a service definition file and forwarding it to the service definition module 32.
- the Web Service is an application service, which can be understood by referring to the prior art, and will not be described herein.
- the service definition module 32 is responsible for providing an interface for the user to define a service definition file.
- the business definition file contains business information.
- the service information may include task definition information, task split information, task step association information, and task step information.
- the service information is the basis for the task scheduler to perform task scheduling processing. The task scheduler and service information will be specifically explained below.
- the task parsing module 33 receives the service definition file defined by the user (the service definition file type being, for example, JSON), parses the service definition file, obtains the user-defined service information, stores it in the database 34, and returns the service identifier (ID) corresponding to the service information.
- the database 34 can be a distributed database, and the distributed database can be understood by referring to the prior art, and will not be described herein.
- the task scheduler 35 is responsible for accepting the task processing requests sent by the Web Service.
- the task scheduler 35 can be used to adapt different computing models according to the needs of the service, for example the map/reduce model of the Hadoop system, or a multi-step scheduling model (number of steps greater than or equal to 2).
- the task scheduler 35 can also be used to implement priority ordering or priority adjustment, or to split tasks and allocate resources and task control to them.
- the resource manager 36 is responsible for satisfying and releasing the resource requests of the task scheduler 35.
- the main functions of the resource manager 36 can include resource management, resource matching, and automatic resource scaling.
- the task running module 37 is responsible for task processing: it invokes the processing programs developed by the user and processes the tasks distributed by the task scheduler 35.
- the cluster management module 38 is responsible for the deployment and monitoring of the clusters of the parallel tasks.
- the bottom layer supports various heterogeneous hardware such as physical machines and VMs (virtual machines).
- Physical machines can include personal computers, workstations, or various application servers, and so on.
- the business information may include: task definition information, task split information, task step association information, and task step information.
- CMR stands for Cloud MapReduce, which can be understood as a multi-step computing model.
- the computing model may also be the computing model of the Hadoop system or the EMR system, implementing two-step processing, so that the parallel processing apparatus of the embodiment of the present invention remains compatible with two-step processing.
- Task split information, for example:
  <SplitInfo>
    <JarRelativePath>opt/Package/user.jar</JarRelativePath> is interpreted as the jar package address
    <DownloadProtocol>LocalPath</DownloadProtocol> is interpreted as the download method
    <StepExecClass>Splitter.transSplitter</StepExecClass> is interpreted as the split handler function
  </SplitInfo>
- the task submitted by the user can be split according to the user's task split information (or the split function provided by the system can be used by default), and the split results are then processed by the subsequent task steps.
- Task step association information: multiple (greater than or equal to 2) task steps can be defined, rather than only the map and reduce steps of the Hadoop system, together with the processing order between the multiple task steps, i.e. the relationship of the multiple task steps. This overcomes the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the relationship of the multiple task steps is defined as follows:
- StepName (the step name)
- StepRatio, interpreted as the ratio of running task processes between Steps
- StepRatio is the proportional relationship of task running processes between the Steps; adjusting StepRatio adjusts the task running processes.
- according to the task step relationship, the task manager schedules the tasks and keeps running: a result produced by one Step serves as the input of the next Step, for example Step1 is followed by Step2 and then by Step3.
- a task step relationship including a fork places, for example, Step21 and Step22 in parallel after Step1.
- Task step information can include the resource information for each Step run, the user program, and other user settings.
- Task step information such as:
- <ImageID> is interpreted as the Step running environment: image; <SpecID>vsp-111</SpecID> is interpreted as the Step running environment: specification
- <IsSequence>false</IsSequence> is interpreted as whether the next step runs only after this step finishes processing
- <ScriptLanguage></ScriptLanguage> is interpreted as the scripting language, shell or perl
- <ScriptUrl></ScriptUrl> is interpreted as the address of the pre-processing script
- <thresholdnum/> is interpreted as the autoscale policy (scale out when the number of tasks in the queue exceeds this value)
- <overNum> is interpreted as: when scaling out, the excess count in the queue % overNum is the number of workers (execution processes) to start
- the scheduler requests resources according to the task step information and processes the tasks.
- FIG. 6, in combination with the parallel processing apparatus shown in FIG. 3, takes the media transcoding service as an example: multiple transcoding tasks can be submitted, and each transcoding task includes three steps of splitting, transcoding, and merging,
- corresponding to step1, step2, and step3 respectively; the parallel processing method of the embodiment of the present invention then includes the following steps:
- Step 61 The user logs in.
- Step 62 The service definition module receives the user's request to define a service definition file, and returns a service definition page.
- Step 63 The service definition module completes the definition of the service, and submits the generated service definition file to the service parsing module.
- Step 64 The service parsing module receives the service definition file, parses the file, and obtains the user-defined service information.
- Step 65 The service parsing module saves the service information in the database, and returns the service ID of the service information.
- Step 66 The user submits the task, and the web service receives the task processing request submitted by the user.
- the task processing request may include the user's input and output and the service ID used.
- Step 67 The web service forwards the task processing request to the task scheduler.
- Step 68 The task scheduler finds the user-defined service information according to the task processing request, obtains the service information from the database, and returns a submission success to the user.
- in this implementation, the task processing request carries the service ID.
- Step 69 The task scheduler obtains the user's application program according to the task split information in the service information.
- the task scheduler splits the task into multiple small tasks, such as step1, step2, and step3, according to the task split information, so that parallel processing can proceed faster.
- Step 610 The task scheduler requests resources from the resource manager.
- according to the running information in the task step information of step1, the task scheduler requests the needed resources from the resource manager (including the specification and image of the machine to run, and the resources used when the user's task runs: CPU, memory, virtual memory, hard disk, network bandwidth, etc.).
- the resource manager returns the matching resource identifier based on the information provided by the task manager.
- the task scheduler may sort the tasks by priority and select higher-priority tasks for concurrent processing, or the task scheduler may adjust the priorities.
- Step 611 Run the task on the requested resources.
- when the resources are started, a task running module is started for task running and management.
- the task scheduler sends information to the task running module on the resources.
- Step 612 The task running module fetches the user application of step1 from the file storage component.
- Step 613 The task running module runs step1.
- if step1 produces results, the task scheduler finds Step2 according to the task step association information and again requests the resources corresponding to Step2 from the resource manager to run the task, until Step3 finishes executing.
- when scheduling the task steps, the task scheduler specifies the input and output of the intermediate steps, and the service definition file defines, for each step, whether output happens only after the step completes. If a step is defined as requiring no integrity check and outputting immediately, each individual result of the step can be passed directly as input to the next step, which continues to run, achieving parallel processing of the steps. Conversely, if an integrity check is required, each Step outputs a batch of intermediate results, which need some processing and must be fully output before the next step starts, achieving serial processing of the task.
- the service definition module, the service parsing module, the task scheduler, and the resource manager can be deployed on the same server or on different servers.
- the task running module and the file storage component can be deployed on the same or different physical machines or VMs.
- a service definition file submitted once (steps 61-65) can be used for running tasks of the same type; that is, when the user submits multiple tasks of the same type, they can share the service definition file submitted in steps 61-65.
- the user can submit multiple tasks, and the Web Service forwards them to the task scheduler for parallel processing, so that one service definition file serves multiple tasks and the tasks run in parallel.
- Step 660 The user submits a task, and the Web Service receives the user's task processing request (including the user's input and output and the service ID used).
- Step 670 The Web Service forwards the request (including the user's input and output and the service ID used) to the task scheduler.
- Step 680 The task scheduler finds the user-defined service information according to the information in the request (the service ID), obtains the service information from the database, and returns a submission success to the user.
- Step 690 The task scheduler obtains the user's application program according to the task split information in the service information.
- Step 6100 The task scheduler requests resources from the resource manager.
- the resource manager returns the matching resource ID based on the information provided by the task manager.
- Step 6110 The task scheduler notifies the task running module on the requested resources to run step1.
- Step 6120 The task running module fetches the user application of step1 from the file storage component.
- Step 6130 The task running module runs step1.
- the task is processed in parallel by using multiple task steps, which overcomes the defect that the Hadoop system and the EMR system need to be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the user can submit multiple tasks, so that one service definition file serves multiple tasks of the same type, and the tasks run in parallel.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to the field of computer technology, and in particular to a parallel processing method and apparatus. The parallel processing method comprises: receiving a plurality of task processing requests, and determining the service information corresponding to the tasks according to the service identifiers carried in the task processing requests; using a plurality of task steps to perform parallel processing on said plurality of tasks according to said service information corresponding to the tasks, wherein the number of the steps of the plurality of task steps is greater than or equal to two. The parallel processing method and apparatus in the embodiments of the present invention use a plurality of task steps to perform parallel processing on the tasks, and determine the service information according to the service identifiers to perform task processing. It is not necessary to submit the task running parameters of the user repeatedly, and the processing of a plurality of steps is simplified.
Description
A parallel processing method and apparatus
Technical field
The embodiments of the present invention relate to the field of computer technologies, and in particular, to a parallel processing method and apparatus.
Background of the invention
With the development of the Internet, the era of information explosion has arrived, and parallel processing of massive information can improve processing efficiency. Currently, well-known parallel processing systems include the Hadoop system (a distributed system infrastructure) and the EMR (Elastic MapReduce) system.
However, for the parallel processing of tasks, the above Hadoop system or EMR system must process strictly according to the two steps of map and reduce: map refers to processing the original documents according to the map rules and outputting intermediate results, and reduce refers to merging the intermediate results according to the reduce rules. If a task requires more than two processing steps, it has to be submitted multiple times, with the user's task running parameters entered each time, to complete the multi-step processing. This is therefore complicated for the user.
Summary of the invention
The object of the embodiments of the present invention is to provide a parallel processing method and apparatus that simplify multi-step processing.
In one aspect, an embodiment of the present invention provides a parallel processing method, including:
receiving a plurality of task processing requests, and determining, according to the service identifiers carried in the task processing requests, the service information corresponding to the tasks; and
performing, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
In another aspect, an embodiment of the present invention provides a parallel processing apparatus, including:
a receiving unit, configured to receive a plurality of task processing requests, and determine, according to the service identifiers carried in the task processing requests, the service information corresponding to the tasks; and
a processing unit, configured to perform, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
In the parallel processing method and apparatus of the embodiments of the present invention, the tasks are processed in parallel by using a plurality of task steps, and the service information used to process a task is determined from the service identifier, so that the user's task running parameters do not need to be submitted repeatedly and multi-step processing is simplified.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a parallel processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a parallel processing apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a parallel processing apparatus in an application scenario according to an embodiment of the present invention;
FIG. 4 is a first schematic diagram of a task step relationship in a parallel processing method according to an embodiment of the present invention;
FIG. 5 is a second schematic diagram of a task step relationship in a parallel processing method according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a parallel processing method in an application scenario according to an embodiment of the present invention.
Mode for carrying out the invention
As shown in FIG. 1, an embodiment of the present invention provides a parallel processing method, including:
11. Receiving a plurality of task processing requests, and determining, according to the service identifier (ID) carried in each task processing request, the service information corresponding to the task.
12. Performing, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
Here, processing with a plurality of task steps means that the task is processed using a number of steps greater than or equal to 2.
In the parallel processing method of this embodiment, the tasks are processed in parallel with a plurality of task steps, and the service information used to process a task is determined from the service identifier. The user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply, overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
In the parallel processing method of this embodiment, before the plurality of task processing requests are received, the method may further include:
acquiring a user-defined service definition file;
parsing the service definition file to obtain the service information; and
generating a service identifier and establishing a correspondence between the service identifier and the service information.
In this implementation, the user-defined service definition file can serve as a processing template for a certain type of service, and thus as the basis for running multiple tasks under that service. When a task is submitted, providing the service identifier is sufficient to determine the service information, so the task running parameters do not have to be entered every time, which reduces the user's operations at task run time and makes the system easier to use.
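The flow above (define the service once, then submit tasks by service ID) is not given in code in this description; as an illustration only, a minimal Java sketch of such a registry might look like the following. The class and method names are hypothetical and the parsing is stubbed out.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the parsed service information described in the text. */
class ServiceInfo {
    String rawDefinition;                       // the original definition file content
    static ServiceInfo parse(String content) {  // real parsing (e.g. JSON) omitted
        ServiceInfo info = new ServiceInfo();
        info.rawDefinition = content;
        return info;
    }
}

/** Minimal sketch: register a service definition once, reuse it by service ID. */
class ServiceRegistry {
    private final Map<String, ServiceInfo> services = new ConcurrentHashMap<>();

    /** Parse a user-submitted definition file and return the generated service ID. */
    String register(String definitionFileContent) {
        ServiceInfo info = ServiceInfo.parse(definitionFileContent);
        String serviceId = UUID.randomUUID().toString();
        services.put(serviceId, info);          // stored; the embodiment stores this in a database
        return serviceId;                       // returned to the user for later task submissions
    }

    /** A task submission only needs to carry the service ID. */
    ServiceInfo lookup(String serviceId) {
        return services.get(serviceId);
    }
}
```

Because later task submissions carry only the service ID, the registry lookup replaces re-entering the task running parameters on every submission.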
Specifically, the service information may include:
Task definition information: used to define the fault tolerance level, the computing model, and the like of the task.
Task split information: used to split the task into multiple task steps, and the like.
Task step association information: used to define the processing order between the multiple task steps.
Task step information: used to define the running information of each task step, where the running information includes resource information, the user program, user settings, and the like.
Optionally, the running information may further include a processing mode of the multiple task steps, the processing mode being a serial processing mode or a parallel processing mode.
When the processing mode of the multiple task steps is the serial processing mode, all outputs of the previous task step pass an integrity check before serving as the input of the next task step. That is, each task step has multiple outputs, and only after all of them pass the integrity check can they enter the next task step as its input.
When the processing mode of the multiple task steps is the parallel processing mode, any single output of the previous task step serves directly as an input of the next task step. That is, each task step has multiple outputs, but not all of them need to pass an integrity check; any single output of a task step can enter the next task step as its input.
It can be seen that multiple task steps can then be processed in parallel, which improves processing capability and overcomes the defect of the Hadoop system and the EMR system that exactly two steps are required and must be processed strictly in serial order.
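The two modes can be illustrated with a small Java sketch, under the assumption that a step's outputs are simple values and that the integrity check merely verifies that the batch is complete; none of these names appear in this description.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of the two forwarding modes for a step's outputs. */
class OutputForwarder {
    /** Parallel mode: each output is passed to the next step as soon as it appears. */
    static void forwardImmediately(List<String> outputs, Consumer<String> nextStep) {
        for (String out : outputs) {
            nextStep.accept(out);                    // no integrity check, no waiting
        }
    }

    /** Serial mode: wait for the complete batch, check it, then hand it over. */
    static void forwardAfterIntegrityCheck(List<String> outputs, Consumer<List<String>> nextStep) {
        if (!integrityCheck(outputs)) {
            throw new IllegalStateException("incomplete or corrupted step output");
        }
        nextStep.accept(outputs);                    // the whole batch becomes the next input
    }

    /** Placeholder check: a real system might verify counts, checksums, etc. */
    private static boolean integrityCheck(List<String> outputs) {
        return outputs != null && !outputs.isEmpty();
    }
}
```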
Further, the manner of processing the tasks with multiple task steps according to the service information corresponding to the tasks may include:
splitting the task into multiple task steps according to the task split information in the service information;
obtaining the user program of each task step and requesting resources for the task step according to the task step information in the service information; and
using the requested resources and invoking the user program to process the task according to the task step association information in the service information, until the processing of the multiple task steps is completed according to the processing order between the multiple task steps.
Optionally, the parallel processing method of this embodiment may further include:
processing higher-priority tasks first according to the priority order of the tasks; or
adjusting the priorities of the tasks and processing higher-priority tasks first.
The priorities of the tasks may be adjusted according to the waiting time of a task and/or the completion time of a task.
As shown in FIG. 2, corresponding to the parallel processing method of the above embodiment, an embodiment of the present invention provides a parallel processing apparatus, including:
a receiving unit 21, configured to receive a plurality of task processing requests, and determine, according to the service identifier (ID) carried in each task processing request, the service information corresponding to the task; and
a processing unit 22, configured to perform, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
The parallel processing apparatus of this embodiment processes the tasks in parallel with a plurality of task steps, and determines the service information used to process a task from the service identifier. The user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply, overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
The parallel processing apparatus of this embodiment may further include:
an obtaining unit, configured to obtain a user-defined service definition file;
a parsing unit, configured to parse the service definition file, obtain the service information, generate a service identifier, and establish a correspondence between the service identifier and the service information; and
a storage unit, configured to store the correspondence between the service identifier and the service information.
Specifically, the service information may include:
Task definition information: used to define the fault tolerance level, the computing model, and the like of the task.
Task split information: used to split the task into multiple task steps, and the like.
Task step association information: used to define the processing order between the multiple task steps.
Task step information: used to define the running information of each task step, where the running information includes resource information, the user program, user settings, and the like.
Further, the processing unit 22 may specifically be configured to:
split the task into multiple task steps according to the task split information in the service information;
obtain the user program of each task step and request resources for the task step according to the task step information in the service information; and
use the requested resources and invoke the user program to process the task according to the task step association information in the service information, until the processing of the multiple task steps is completed according to the processing order between the multiple task steps.
Optionally, the running information may further include a processing mode of the multiple task steps, the processing mode being a serial processing mode or a parallel processing mode, and the processing unit 22 may specifically be configured to:
when the processing mode of the multiple task steps is the serial processing mode, pass all outputs of the previous task step through an integrity check before they serve as the input of the next task step; and
when the processing mode of the multiple task steps is the parallel processing mode, use any single output of the previous task step directly as an input of the next task step.
Optionally, the processing unit 22 may further be configured to:
process higher-priority tasks first according to the priority order of the tasks, or adjust the priorities of the tasks and process higher-priority tasks first.
The priorities of the tasks may be adjusted according to the waiting time of a task and/or the completion time of a task.
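This description does not fix a concrete adjustment formula; the following Java sketch shows one plausible aging rule (the effective priority grows with waiting time), purely as an illustration. The class names and the one-point-per-minute rule are assumptions.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical task record with a base priority and a submission timestamp. */
class QueuedTask {
    final String id;
    final int basePriority;       // larger = more important
    final long submittedAtMillis;

    QueuedTask(String id, int basePriority, long submittedAtMillis) {
        this.id = id;
        this.basePriority = basePriority;
        this.submittedAtMillis = submittedAtMillis;
    }

    /** Illustrative aging rule: +1 priority for every 60 s of waiting. */
    int effectivePriority(long nowMillis) {
        long waitedSeconds = (nowMillis - submittedAtMillis) / 1000;
        return basePriority + (int) (waitedSeconds / 60);
    }
}

class PriorityScheduler {
    QueuedTask pickNext(Iterable<QueuedTask> pending) {
        long now = System.currentTimeMillis();
        PriorityQueue<QueuedTask> queue = new PriorityQueue<>(
                Comparator.comparingInt((QueuedTask t) -> t.effectivePriority(now)).reversed());
        for (QueuedTask t : pending) {
            queue.add(t);
        }
        return queue.poll();      // the highest effective priority runs first
    }
}
```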
The parallel processing apparatus of this embodiment can be understood with reference to the parallel processing method of the above embodiment, and the same content is not repeated here.
FIG. 3 is a schematic structural diagram of a parallel processing apparatus in an application scenario according to an embodiment of the present invention.
The Web Service 31 is responsible for accepting and forwarding the user's web requests, for example receiving a user's request to define a service definition file and forwarding it to the service definition module 32. The Web Service is an application service and can be understood with reference to the prior art, so it is not described here.
The service definition module 32 is responsible for providing an interface for the user to define a service definition file. The service definition file contains the service information, which may include task definition information, task split information, task step association information, and task step information. The service information is the basis on which the task scheduler performs task scheduling; the task scheduler and the service information are explained in detail below.
The task parsing module 33 is responsible for receiving the user-defined service definition file (whose type may be, for example, JSON), parsing the service definition file, obtaining the user-defined service information, saving it in the database 34, and returning the service identifier (ID) corresponding to the service information. The database 34 may be a distributed database, which can be understood with reference to the prior art and is not described here.
The task scheduler 35 is responsible for accepting the task processing requests sent by the Web Service. The task scheduler 35 can adapt different computing models according to the needs of the service, for example the map/reduce model of the Hadoop system, or a multi-step scheduling model (number of steps greater than or equal to 2). The task scheduler 35 can also be used to implement priority ordering or priority adjustment, or to split tasks, allocate resources to tasks, and control the tasks.
The resource manager 36 is responsible for satisfying and releasing the resource requests of the task scheduler 35. Specifically, the main functions of the resource manager 36 may include resource management, resource matching, and automatic resource scaling.
The task running module 37 is responsible for task processing: it invokes the processing programs developed by the user and processes the tasks distributed by the task scheduler 35.
The cluster management module 38 is responsible for the automated deployment and monitoring of the clusters that process the parallel tasks.
The bottom layer supports various heterogeneous hardware 39 such as physical machines and VMs (Virtual Machines). Physical machines may include personal computers, workstations, various application servers, and so on.
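The modules above are described only at the block-diagram level. Purely as an illustration, their responsibilities could be captured by interfaces such as the following; every name and signature here is an assumption rather than part of this description.

```java
import java.util.List;

/** Hypothetical interface mirroring the task scheduler 35 of FIG. 3. */
interface TaskScheduler {
    /** Accept a task request forwarded by the Web Service (input/output locations plus service ID). */
    void submit(String serviceId, String input, String output);
}

/** Hypothetical interface mirroring the resource manager 36. */
interface ResourceManager {
    /** Return identifiers of resources matching the given requirements, or an empty list. */
    List<String> request(String specId, String imageId, int cpuPercent, int memoryMb);

    void release(List<String> resourceIds);
}

/** Hypothetical interface mirroring the task running module 37. */
interface TaskRunningModule {
    /** Run one task step with the user program fetched from the file storage component. */
    void runStep(String stepName, String userProgramUrl, List<String> inputs);
}
```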
Specifically, the service information may include: task definition information, task split information, task step association information, and task step information.
(1) The task definition information includes the fault tolerance level FaultTolerance and the computing model ProgramModel, for example:
FaultTolerance="Normal"
ProgramModel="CMR"
Here, CMR stands for Cloud MapReduce and can be understood as a multi-step computing model.
Optionally, the computing model may also be the computing model of the Hadoop system or the EMR system, implementing two-step processing, so that the parallel processing apparatus of this embodiment remains compatible with two-step processing.
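As an illustration only (these types are not defined in this description), the task definition information could be mirrored in code as follows; keeping the two-step model as just another value of the computing model is what allows the apparatus to stay compatible with Hadoop/EMR-style jobs.

```java
/** Hypothetical mirror of the task definition information shown above. */
enum ProgramModel { CMR, MAP_REDUCE }          // CMR = multi-step model; MAP_REDUCE = two-step compatibility
enum FaultTolerance { NORMAL, HIGH }           // the description only names the "Normal" level

class TaskDefinition {
    final FaultTolerance faultTolerance;
    final ProgramModel programModel;

    TaskDefinition(FaultTolerance faultTolerance, ProgramModel programModel) {
        this.faultTolerance = faultTolerance;
        this.programModel = programModel;
    }
}
```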
(2) Task split information, for example:
<SplitInfo>
  <JarRelativePath>opt/Package/user.jar</JarRelativePath> interpreted as the jar package address
  <DownloadProtocol>LocalPath</DownloadProtocol> interpreted as the download method
  <StepExecClass>Splitter.transSplitter</StepExecClass> interpreted as the split handler function
</SplitInfo>
According to the user's task split information, the task submitted by the user can be split (or the split function provided by the system can be used by default), and the split results are then processed by the subsequent task steps.
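The <StepExecClass> entry names the user's split handler, whose code is not shown in this description. Below is a minimal Java sketch of what such a handler could look like, assuming a simple interface that cuts a task input into independent pieces; the interface and class names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract for the split handler referenced by <StepExecClass>. */
interface Splitter {
    /** Cut one submitted task input into independent pieces for parallel processing. */
    List<String> split(String taskInput);
}

/** Illustrative handler in the spirit of "Splitter.transSplitter": fixed-size chunks. */
class TransSplitter implements Splitter {
    private final int chunkSize;

    TransSplitter(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    @Override
    public List<String> split(String taskInput) {
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < taskInput.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, taskInput.length());
            pieces.add(taskInput.substring(start, end)); // each piece feeds the next task step
        }
        return pieces;
    }
}
```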
(3) Task step association information: multiple (greater than or equal to 2) task steps can be defined, rather than only the map and reduce steps of the Hadoop system, together with the processing order between the multiple task steps, i.e. the relationship of the multiple task steps. This overcomes the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing. The relationship of the multiple task steps is defined as follows:
<StepRelation>
  <StepName>**</StepName> interpreted as the step name
  <StepRatio>**</StepRatio> interpreted as the ratio of running task processes between steps
  <Previous>**</Previous> interpreted as the previous step
  <Next>**</Next> interpreted as the next step
</StepRelation>
Here, StepRatio is the proportional relationship of task running processes between the steps; adjusting StepRatio adjusts the task running processes.
In the task step relationship shown in FIG. 4, the task manager schedules the tasks according to the step relationship and keeps running: a result produced by one step serves as the input of the next step, for example Step1 is followed by Step2 and then by Step3.
FIG. 5 shows a task step relationship that includes a fork, for example Step21 and Step22 placed in parallel after Step1.
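The <Previous>/<Next> entries describe a small dependency graph: a chain in FIG. 4 and a fork in FIG. 5. As an illustration only (none of these classes appear in this description), dispatching steps in the declared order might be sketched in Java as follows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical step-relation graph built from <Previous>/<Next> entries. */
class StepRelationGraph {
    private final Map<String, List<String>> next = new HashMap<>();

    /** Record that 'successor' runs after 'step' (a fork simply adds several successors). */
    void addEdge(String step, String successor) {
        next.computeIfAbsent(step, k -> new ArrayList<>()).add(successor);
    }

    /** Dispatch steps starting from the first one, following the declared order. */
    void dispatchFrom(String firstStep) {
        List<String> frontier = new ArrayList<>();
        frontier.add(firstStep);
        while (!frontier.isEmpty()) {
            List<String> following = new ArrayList<>();
            for (String step : frontier) {
                System.out.println("running " + step);            // stands in for real execution
                following.addAll(next.getOrDefault(step, List.of()));
            }
            frontier = following;                                  // e.g. Step1 -> {Step21, Step22}
        }
    }
}
```

For the forked relationship of FIG. 5 this would be used as addEdge("Step1", "Step21"); addEdge("Step1", "Step22"); dispatchFrom("Step1");.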
(4) Task step information: the task step information may include the resource information for running each step, the user program, and other user settings. For example:
<StepDef>
  <StepName>**</StepName> interpreted as the step name
  <StepRelatedFile>
    <JarRelativePath>/opt/Package/user.jar</JarRelativePath> interpreted as the user's jar package
    <DownloadProtocol>LocalPath</DownloadProtocol> interpreted as the download method
    <StepExecClass>**</StepExecClass> interpreted as the run function of this step
    <Integritycheck>false</Integritycheck> interpreted as the integrity check
    <Partitioner>false</Partitioner> interpreted as Partition
    <Combine>false</Combine> interpreted as Combine
  </StepRelatedFile>
  <ResourceRequirement> interpreted as the resource requirements
    <Processor_percent></Processor_percent> interpreted as CPU usage
    <Memory>**</Memory> interpreted as the memory requirement
    <Swap>**</Swap> interpreted as the virtual memory requirement
    <Bandwidth>**</Bandwidth> interpreted as the bandwidth requirement
    <Disk>**</Disk> interpreted as the disk requirement
  </ResourceRequirement>
  <IsExclusiveVM>false</IsExclusiveVM> interpreted as whether the step needs an exclusive machine
  <FaultToleranceLevel>Normal</FaultToleranceLevel> interpreted as the fault tolerance level
  <ImageID>img-1111</ImageID> interpreted as the step running environment: image
  <SpecID>vsp-111</SpecID> interpreted as the step running environment: specification
  <IsStateful>false</IsStateful> interpreted as whether the step is stateful
  <IsPreemptive>false</IsPreemptive> interpreted as whether other resources can be preempted
  <SendImmediately>false</SendImmediately> interpreted as whether to output immediately
  <IsSequence>false</IsSequence> interpreted as whether the next step runs only after this step finishes processing
  <Pre-Process>
    <ScriptLanguage></ScriptLanguage> interpreted as the scripting language, shell or perl
    <ScriptUrl></ScriptUrl> interpreted as the address of the pre-processing script
  </Pre-Process>
  <AutoScale/> interpreted as whether to autoscale
  <thresholdnum/> interpreted as the autoscale policy (scale out when the number of tasks in the queue exceeds this value)
  <overNum/> interpreted as: when scaling out, the excess count in the queue % overNum is the number of workers (execution processes) to start
</StepDef>
Here, "%" is the division operator.
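Read together, <thresholdnum> and <overNum> describe a simple scale-out rule: once the queue length exceeds the threshold, the excess divided by overNum gives the number of workers to start. A minimal Java sketch of that rule follows; the class and field names are illustrative, and the rounding behaviour is an assumption since the description does not specify it.

```java
/** Hypothetical autoscale rule built from <thresholdnum> and <overNum>. */
class AutoScalePolicy {
    private final int thresholdNum; // scale out once the queue grows past this size
    private final int overNum;      // one extra worker per this many excess tasks

    AutoScalePolicy(int thresholdNum, int overNum) {
        this.thresholdNum = thresholdNum;
        this.overNum = overNum;
    }

    /** Number of additional workers to start for the current queue length. */
    int workersToStart(int queuedTasks) {
        if (queuedTasks <= thresholdNum) {
            return 0;                              // below the threshold: no scale-out
        }
        int excess = queuedTasks - thresholdNum;
        return excess / overNum;                   // "%" in the text denotes division; rounding not specified
    }
}
```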
The scheduler requests resources according to the task step information and processes the tasks.
As shown in FIG. 6, in combination with the parallel processing apparatus shown in FIG. 3, the media transcoding service is taken as an example. Multiple transcoding tasks can be submitted, and each transcoding task includes three steps of splitting, transcoding, and merging, corresponding to step1, step2, and step3 respectively. The parallel processing method of this embodiment then includes:
Step 61: The user logs in.
Step 62: The service definition module receives the user's request to define a service definition file and returns a service definition page.
Step 63: The service definition module completes the definition of the service and submits the generated service definition file to the service parsing module.
Step 64: The service parsing module receives the service definition file, parses it, and obtains the user-defined service information.
Step 65: The service parsing module saves the service information in the database and returns the service ID of the service information.
Step 66: The user submits a task, and the Web Service receives the task processing request submitted by the user. In this implementation, the task processing request may include the user's input and output and the service ID used.
Step 67: The Web Service forwards the task processing request to the task scheduler.
Step 68: The task scheduler finds the user-defined service information according to the task processing request, obtains the service information from the database, and returns a submission success to the user. In this implementation, the task processing request carries the service ID.
Step 69: The task scheduler obtains the user's application program according to the task split information in the service information.
Specifically, the task scheduler splits the task into multiple small tasks, such as step1, step2, and step3, according to the task split information, so that parallel processing can proceed faster.
Step 610: The task scheduler requests resources from the resource manager.
According to the running information in the task step information of step1, the task scheduler requests the needed resources from the resource manager (including the specification and image of the machine to run, and the resources used when the user's task runs: CPU, memory, virtual memory, hard disk, network bandwidth, and so on).
The resource manager returns the identifiers of matching resources according to the information provided by the task manager.
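As a hypothetical illustration of the matching performed by the resource manager in step 610, the requirement bundle and the matching could be sketched in Java as follows; all types and fields are assumptions, not part of this description.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical resource requirement carried in the step 610 request. */
class ResourceRequirement {
    final int cpuPercent, memoryMb, diskGb, bandwidthMbps;
    final String imageId, specId;

    ResourceRequirement(int cpuPercent, int memoryMb, int diskGb, int bandwidthMbps,
                        String imageId, String specId) {
        this.cpuPercent = cpuPercent;
        this.memoryMb = memoryMb;
        this.diskGb = diskGb;
        this.bandwidthMbps = bandwidthMbps;
        this.imageId = imageId;
        this.specId = specId;
    }
}

/** Hypothetical resource record held by the resource manager. */
class ResourceSlot {
    final String resourceId, imageId, specId;
    final int freeCpuPercent, freeMemoryMb, freeDiskGb, freeBandwidthMbps;

    ResourceSlot(String resourceId, String imageId, String specId,
                 int freeCpuPercent, int freeMemoryMb, int freeDiskGb, int freeBandwidthMbps) {
        this.resourceId = resourceId;
        this.imageId = imageId;
        this.specId = specId;
        this.freeCpuPercent = freeCpuPercent;
        this.freeMemoryMb = freeMemoryMb;
        this.freeDiskGb = freeDiskGb;
        this.freeBandwidthMbps = freeBandwidthMbps;
    }
}

/** Return the identifier of the first slot that satisfies every requirement, if any. */
class ResourceMatcher {
    Optional<String> match(ResourceRequirement req, List<ResourceSlot> slots) {
        return slots.stream()
                .filter(s -> s.imageId.equals(req.imageId) && s.specId.equals(req.specId))
                .filter(s -> s.freeCpuPercent >= req.cpuPercent
                        && s.freeMemoryMb >= req.memoryMb
                        && s.freeDiskGb >= req.diskGb
                        && s.freeBandwidthMbps >= req.bandwidthMbps)
                .map(s -> s.resourceId)
                .findFirst();
    }
}
```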
可选的, 任务调度器可以根据任务的优先级排序, 选择优先级高的任务进行并发处理, 或者任 务调度器对优先级进行调整。 Optionally, the task scheduler may perform priority processing according to the priority order of the tasks, select a task with a higher priority, or adjust the priority by the task scheduler.
Step 611: Run the task on the requested resources.
Specifically, when a resource is started, a task running module is started on it for task running and management. The task scheduler sends information to the task running module in the resource.
Step 612: The task running module obtains the user application program of step1 from the file storage component.
Step 613: The task running module runs step1.
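The behaviour of steps 611-613 on the runner side can be pictured with the following minimal sketch, in which a task running module fetches the user application of a step from a stand-in file storage component and runs it; the storage mapping and the toy applications are assumptions made purely for illustration.

```python
from typing import Callable, Dict

# Hypothetical file storage component: maps a step name to its user application.
FILE_STORAGE: Dict[str, Callable[[str], str]] = {
    "step1": lambda media: f"slices({media})",        # split
    "step2": lambda slices: f"transcoded({slices})",  # transcode
    "step3": lambda parts: f"merged({parts})",        # merge
}

class TaskRunningModule:
    """Started together with the resource; runs and manages one task step."""

    def run_step(self, step_name: str, step_input: str) -> str:
        # Step 612: obtain the user application of the step from file storage.
        user_app = FILE_STORAGE[step_name]
        # Step 613: run the step on the given input and return its output.
        return user_app(step_input)

runner = TaskRunningModule()
print(runner.run_step("step1", "movie.mp4"))  # slices(movie.mp4)
```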
If step1 produces a processing result, the task scheduler finds step2 according to the task step association information, applies to the resource manager again for the resource corresponding to step2 to run the task, and so on until step3 finishes executing.
Because the task is processed in multiple steps, steps 68-613 are performed for each step until the entire task has been dispatched successfully. Optionally, when the task scheduler schedules the task steps, the input and output of the intermediate steps need to be specified, and whether each step outputs only after it has completed is defined in the service definition file. If it is defined that no integrity check is performed and output is immediate, then each individual result output by a step in the task can be used directly as the input of the next step, which continues to run, achieving parallel processing of the steps. Conversely, if it is defined that an integrity check is required, each step outputs a batch of intermediate results, the intermediate results undergo certain processing, and only after the output is complete does the next step proceed, achieving serial processing of the task.
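The two step-processing modes just described can be pictured with the sketch below: in the serial mode a step's complete output batch passes an integrity check before the next step starts, while in the parallel mode each individual result is handed to the next step as soon as it is produced; the helper names and the trivial integrity check are assumptions for this illustration only.

```python
from typing import Callable, Iterable, Iterator, List

Step = Callable[[str], str]

def run_serial(steps: List[Step], inputs: List[str]) -> List[str]:
    """Serial mode: a step's complete output batch is integrity-checked
    before it becomes the input of the following step."""
    batch = inputs
    for step in steps:
        batch = [step(item) for item in batch]       # produce the whole batch
        assert all(batch), "integrity check failed"  # hypothetical check
    return batch

def run_pipelined(steps: List[Step], inputs: Iterable[str]) -> Iterator[str]:
    """Parallel mode: any single output of a step is passed on immediately,
    so the steps of one task overlap instead of waiting for whole batches."""
    stream: Iterable[str] = inputs
    for step in steps:
        stream = map(step, stream)  # lazily chain; no waiting for the batch
    return iter(stream)

step1 = lambda x: f"slice({x})"
step2 = lambda x: f"transcode({x})"
step3 = lambda x: f"merge({x})"

print(run_serial([step1, step2, step3], ["a.mp4", "b.mp4"]))
print(list(run_pipelined([step1, step2, step3], ["a.mp4", "b.mp4"])))
```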
Optionally, the service definition module, the service parsing module, the task scheduler, and the resource manager may be deployed on the same server or on different servers. The task running module and the file storage component may be deployed on the same physical machine or VM or on different ones.
It should also be noted that a service definition file submitted once by the user (steps 61-65) can be used to run tasks of the same type; that is, when the user submits multiple tasks of the same type, they can share the single service definition file submitted in steps 61-65.
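For the three-step transcoding service of this example, such a user-defined service definition file might look roughly like the sketch below, carrying the four kinds of service information described in this document (task definition, task split, task step association, and task step information); the concrete field names, format, and values are assumptions, not a format fixed by the document.

```python
import json

# Hypothetical service definition for the split/transcode/merge service.
service_definition = {
    "task_definition": {"fault_tolerance_level": 1, "compute_model": "batch"},
    "task_split": {"steps": ["step1", "step2", "step3"]},
    "task_step_association": {          # processing order between the steps
        "order": [["step1", "step2"], ["step2", "step3"]],
        "mode": "parallel",             # or "serial" (integrity-checked)
    },
    "task_step_info": {
        "step1": {"user_program": "splitter",   "cpu": 2, "memory_mb": 4096},
        "step2": {"user_program": "transcoder", "cpu": 8, "memory_mb": 16384},
        "step3": {"user_program": "merger",     "cpu": 2, "memory_mb": 4096},
    },
}

print(json.dumps(service_definition, indent=2))
```

After parsing (step 64), the service parsing module stores this information in the database and returns a service ID for it (step 65); later task submissions of the same type reference that ID instead of resubmitting the file.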
The user may submit multiple tasks, and the Web Service forwards them to the task scheduler for parallel processing, so that one service definition file serves multiple tasks and the tasks run in parallel, in a flow similar to steps 66-613:
Step 660: The user submits a task, and the Web Service receives the task processing request submitted by the user (including the user's input and output and the service ID that is used).
Step 670: The Web Service forwards the request (including the user's input and output and the service ID that is used) to the task scheduler.
Step 680: The task scheduler finds the user-defined service information according to the requested information (the service ID), obtains the service information from the database, and returns a submission success to the user.
Step 690: The task scheduler obtains the user's application program according to the task split information in the service information.
Step 6100: The task scheduler applies for resources from the resource manager. The resource manager returns a matching resource identifier according to the information provided by the task scheduler.
Step 6110: The task scheduler notifies the task running module of the resources applied for step1.
Step 6120: The task running module obtains the user application program of step1 from the file storage component.
Step 6130: The task running module runs step1.
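As a rough, hypothetical illustration of this second flow (steps 660-6130), the sketch below has a Web Service forward several tasks that share the same service ID to a scheduler, which looks up the stored service information and drives each task through its steps; the in-memory "database" and all names are placeholders invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory stand-in for the database of parsed service information.
SERVICE_DB: Dict[str, Dict] = {"svc-42": {"steps": ["step1", "step2", "step3"]}}

def scheduler_handle(task: Dict) -> str:
    """Steps 680-6130, condensed: look up the service information by service ID
    and drive the task through its steps."""
    service = SERVICE_DB[task["service_id"]]
    result = task["input"]
    for step in service["steps"]:
        result = f"{step}({result})"
    return result

def web_service_submit(tasks: List[Dict]) -> List[str]:
    # Steps 660-670: the Web Service forwards each submitted task to the
    # scheduler; the tasks are processed in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(scheduler_handle, tasks))

tasks = [{"service_id": "svc-42", "input": "video%d.mp4" % i} for i in range(3)]
print(web_service_submit(tasks))
```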
It can be seen that processing tasks in parallel with multiple task steps overcomes the defect of the Hadoop system and the EMR system, which require a task to be submitted multiple times, with the user's task running parameters entered each time, in order to complete multi-step processing.
Moreover, the user can submit multiple tasks, so that one service definition file serves multiple tasks of the same type and the tasks run in parallel.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and when the program is executed, the processes of the embodiments of the foregoing methods may be included. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Claims
1. A parallel processing method, comprising:
receiving a plurality of task processing requests, and determining, according to a service identifier carried in the task processing requests, service information corresponding to the tasks; and performing, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, wherein the number of steps of the plurality of task steps is greater than or equal to 2.
2. The method according to claim 1, wherein before the receiving a plurality of task processing requests, the method further comprises:
obtaining a service definition file defined by a user;
parsing the service definition file to obtain the service information; and
generating a service identifier, and establishing a correspondence between the service identifier and the service information.
3. The method according to claim 1 or 2, wherein the service information comprises:
task definition information: used to define a fault tolerance level and a computing model of a task;
task split information: used to split the task into a plurality of task steps;
task step association information: used to define a processing order among the plurality of task steps; and
task step information: used to define running information of each task step, wherein the running information comprises resource information, a user program, and user settings.
4. The method according to claim 3, wherein the processing of the task by using a plurality of task steps according to the service information corresponding to the task comprises:
splitting the task into a plurality of task steps according to the task split information in the service information;
obtaining, according to the task step information in the service information, the user program of each task step, and applying for resources for the task steps; and
invoking, according to the task step association information in the service information and on the requested resources, the user program to process the task, until the processing of the plurality of task steps is completed in the processing order among the plurality of task steps.
5. The method according to claim 4, wherein the running information further comprises a processing mode of the plurality of task steps, the processing mode being a serial processing mode or a parallel processing mode:
when the processing mode of the plurality of task steps is the serial processing mode, all outputs of a previous task step of the plurality of task steps are used, after passing an integrity check, as inputs of a subsequent task step of the plurality of task steps; and
when the processing mode of the plurality of task steps is the parallel processing mode, any single output of a previous task step of the plurality of task steps is used directly as an input of a subsequent task step of the plurality of task steps.
6. A parallel processing apparatus, comprising:
a receiving unit, configured to receive a plurality of task processing requests, and determine, according to a service identifier carried in the task processing requests, service information corresponding to the tasks; and
a processing unit, configured to perform, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, wherein the number of steps of the plurality of task steps is greater than 2.
7. The apparatus according to claim 6, further comprising:
an obtaining unit, configured to obtain a service definition file defined by a user;
a parsing unit, configured to parse the service definition file, obtain the service information, generate a service identifier, and establish a correspondence between the service identifier and the service information; and
a storage unit, configured to store the correspondence between the service identifier and the service information.
8. The apparatus according to claim 6 or 7, wherein the service information comprises:
task definition information: used to define a fault tolerance level and a computing model of a task;
task split information: used to split the task into a plurality of task steps;
task step association information: used to define a processing order among the plurality of task steps; and
task step information: used to define running information of each task step, wherein the running information comprises resource information, a user program, and user settings.
9. The apparatus according to claim 8, wherein the processing unit is specifically configured to:
split the task into a plurality of task steps according to the task split information in the service information;
obtain, according to the task step information in the service information, the user program of each task step, and apply for resources for the task steps; and
invoke, according to the task step association information in the service information and on the requested resources, the user program to process the task, until the processing of the plurality of task steps is completed in the processing order among the plurality of task steps.
10. The apparatus according to claim 9, wherein the running information further comprises a processing mode of the plurality of task steps, the processing mode being a serial processing mode or a parallel processing mode, and the processing unit is further specifically configured to:
when the processing mode of the plurality of task steps is the serial processing mode, use all outputs of a previous task step of the plurality of task steps, after an integrity check, as inputs of a subsequent task step of the plurality of task steps; and
when the processing mode of the plurality of task steps is the parallel processing mode, use any single output of a previous task step of the plurality of task steps directly as an input of a subsequent task step of the plurality of task steps.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201280000701.4A CN103502941B (en) | 2012-03-19 | 2012-03-19 | A kind of method for parallel processing and device |
PCT/CN2012/072545 WO2013138982A1 (en) | 2012-03-19 | 2012-03-19 | A parallel processing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/072545 WO2013138982A1 (en) | 2012-03-19 | 2012-03-19 | A parallel processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013138982A1 true WO2013138982A1 (en) | 2013-09-26 |
Family
ID=49221785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2012/072545 WO2013138982A1 (en) | 2012-03-19 | 2012-03-19 | A parallel processing method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103502941B (en) |
WO (1) | WO2013138982A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107016480A (en) * | 2016-01-28 | 2017-08-04 | 五八同城信息技术有限公司 | Method for scheduling task, apparatus and system |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818097B (en) * | 2016-09-12 | 2020-06-30 | 平安科技(深圳)有限公司 | Data processing method and device |
CN108898482B (en) * | 2018-07-09 | 2021-02-26 | 中国建设银行股份有限公司 | Multi-product signing method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5812809A (en) * | 1989-09-04 | 1998-09-22 | Mitsubishi Denki Kabushiki Kaisha | Data processing system capable of execution of plural instructions in parallel |
CN1988463A (en) * | 2005-12-21 | 2007-06-27 | 国际商业机器公司 | Method and system for large message broadcast |
CN101110022A (en) * | 2007-08-30 | 2008-01-23 | 济南卓信智能科技有限公司 | Method for implementing workflow model by software |
CN101770402A (en) * | 2008-12-29 | 2010-07-07 | 中国移动通信集团公司 | Map task scheduling method, equipment and system in MapReduce system |
CN102033748A (en) * | 2010-12-03 | 2011-04-27 | 中国科学院软件研究所 | Method for generating data processing flow codes |
Also Published As
Publication number | Publication date |
---|---|
CN103502941A (en) | 2014-01-08 |
CN103502941B (en) | 2017-11-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7197612B2 (en) | Execution of auxiliary functions on on-demand network code execution systems | |
US11243953B2 (en) | Mapreduce implementation in an on-demand network code execution system and stream data processing system | |
US11336583B2 (en) | Background processes in update load balancers of an auto scaling group | |
CN112513811B (en) | Operating system customization in on-demand network code execution systems | |
US10817331B2 (en) | Execution of auxiliary functions in an on-demand network code execution system | |
US11010188B1 (en) | Simulated data object storage using on-demand computation of data objects | |
US11422844B1 (en) | Client-specified network interface configuration for serverless container management service | |
Ge et al. | GA-based task scheduler for the cloud computing systems | |
US11392422B1 (en) | Service-managed containers for container orchestration service | |
US20200218579A1 (en) | Selecting a cloud service provider | |
US11838384B2 (en) | Intelligent scheduling apparatus and method | |
US10038640B2 (en) | Managing state for updates to load balancers of an auto scaling group | |
CN109726004B (en) | Data processing method and device | |
CN106775948B (en) | Cloud task scheduling method and device based on priority | |
CN111078516A (en) | Distributed performance test method and device and electronic equipment | |
WO2013123650A1 (en) | Method for virtual machine assignment and device for virtual machine assignment | |
CN113055199B (en) | Gateway access method and device and gateway equipment | |
US11861386B1 (en) | Application gateways in an on-demand network code execution system | |
US11144359B1 (en) | Managing sandbox reuse in an on-demand code execution system | |
WO2022257247A1 (en) | Data processing method and apparatus, and computer-readable storage medium | |
WO2013138982A1 (en) | A parallel processing method and apparatus | |
CN115640113A (en) | Multi-plane flexible scheduling method | |
EP3539278B1 (en) | Method and system for affinity load balancing | |
Yeh et al. | Realizing integrated prioritized service in the Hadoop cloud system | |
Kurdi et al. | A hybrid approach for scheduling virtual machines in private clouds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12872085; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 12872085; Country of ref document: EP; Kind code of ref document: A1 |