WO2013138982A1 - A parallel processing method and apparatus - Google Patents
A parallel processing method and apparatus
- Publication number
- WO2013138982A1 (PCT Application PCT/CN2012/072545)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- task
- steps
- information
- processing
- service
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
Definitions
- the embodiments of the present invention relate to the field of computer technologies, and in particular, to a parallel processing method and apparatus.
- the above Hadoop system or EMR system must process tasks strictly according to the two steps of map and reduce.
- map refers to processing the original documents according to the map rules and outputting intermediate results.
- the intermediate results are merged according to the reduce rules. If a task requires more than two processing steps, it has to be submitted multiple times, with the user's task running parameters entered each time, to complete the multi-step processing, which makes the system complicated for the user to use.
- an embodiment of the present invention provides a parallel processing method, including
- Receiving a plurality of task processing requests, and determining the service information corresponding to the tasks according to the service identifiers carried in the task processing requests; and, according to the service information corresponding to the tasks, using a plurality of task steps to process the multiple tasks in parallel,
- the number of steps of the multiple task steps is greater than or equal to 2.
- an embodiment of the present invention provides a parallel processing apparatus, including
- a receiving unit configured to receive a plurality of task processing requests, and determine, according to the service identifier carried in the task processing request, service information corresponding to the task;
- the processing unit is configured to perform parallel processing on the multiple tasks by using multiple task steps according to the service information corresponding to the task, where the number of steps of the multiple task steps is greater than or equal to 2.
- the parallel processing method and apparatus of the embodiments of the present invention process the tasks in parallel by using multiple task steps and determine the service information used to process a task from the service identifier, so the user's task running parameters do not need to be submitted repeatedly and multi-step processing is simplified.
- FIG. 1 is a schematic flow chart of a parallel processing method according to an embodiment of the present invention.
- FIG. 2 is a schematic structural diagram of a parallel processing device according to an embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of an application scenario of a parallel processing device according to an embodiment of the present invention.
- FIG. 4 is a first schematic diagram of a relationship of task steps in a parallel processing method according to an embodiment of the present invention.
- FIG. 5 is a second schematic diagram of a relationship of task steps in a parallel processing method according to an embodiment of the present invention.
- FIG. 6 is a schematic flowchart of a parallel processing method in an application scenario according to an embodiment of the present invention.
- an embodiment of the present invention provides a parallel processing method, including:
- the plurality of task steps can be understood as processing the task using a number of steps greater than or equal to 2.
- the parallel processing method of the embodiment of the present invention uses multiple task steps to process the tasks in parallel and determines the service information used to process a task from the service identifier; the user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply, overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the method may further include: acquiring a user-defined service definition file before receiving the multiple task processing requests.
- a service identifier is generated, and a correspondence between the service identifier and the service information is established.
- the user-defined service definition file can be used as a processing template for a certain type of service, and thus serves as a basis for running multiple tasks under the service.
- when a task is submitted, providing the service identifier is sufficient to determine the service information. It is not necessary to enter the task running parameters each time, which reduces the user's operations at task run time and makes the system easier to use.
- the service information may include:
- Task definition information: used to define the fault tolerance level, the computing model, and the like of the task.
- Task split information: used to split the task into multiple task steps, and the like.
- Task step association information: used to define the processing order between multiple task steps.
- Task step information: used to define the running information of each task step.
- the running information includes: resource information, the user program, and user settings.
- the running information may further include: a processing mode of the multiple task steps, where the processing mode is a serial processing mode or a parallel processing mode.
- when the processing mode of the multiple task steps is the serial processing mode, all outputs of the previous task step pass an integrity check before they serve as the input of the next task step.
- that is, each task step has multiple outputs, and all of the outputs must pass the integrity check before they can enter the next task step as its input.
- when the processing mode of the multiple task steps is the parallel processing mode, any single output of the previous task step serves directly as an input of the next task step. That is, each task step has multiple outputs, but not all of them need to pass an integrity check; any single output of a task step can enter the next task step as its input. It can be seen that multiple task steps can be processed in parallel, which improves the processing capability and overcomes the defect of the Hadoop system and the EMR system that exactly two steps are required and must be processed strictly in serial order.
- the manner of processing the task by using multiple task steps according to the service information corresponding to the task may include:
- the task is split into multiple task steps according to the task split information in the service information.
- the parallel processing method of the embodiment of the present invention may further include:
- an embodiment of the present invention provides a parallel processing apparatus, including: a receiving unit 21, configured to receive a plurality of task processing requests and determine, according to the
- service identifier (ID)
- carried in each task processing request, the service information corresponding to the task;
- the processing unit 22 is configured to perform parallel processing on the multiple tasks by using multiple task steps according to the service information corresponding to the task, where the number of steps of the multiple task steps is greater than or equal to 2.
- the parallel processing device of the embodiment of the present invention uses multiple task steps to process the tasks in parallel and determines the service information used to process a task from the service identifier; the user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply,
- overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the obtaining unit is configured to obtain a user-defined service definition file.
- the parsing unit is configured to parse the service definition file, obtain the service information, generate a service identifier, and establish a correspondence between the service identifier and the service information.
- a storage unit configured to store a correspondence between the service identifier and the service information.
- the business information may include:
- Task definition information Used to define the fault tolerance level and calculation model of the task.
- Task split information Used to split a task into multiple task steps.
- Task step association information: used to define the processing order between multiple task steps.
- Task step information: used to define the running information of each task step.
- the running information includes: resource information, the user program, and user settings. Further, the processing unit 22 may specifically be configured to:
- the task is split into multiple task steps according to the task split information in the service information.
- according to the task step association information in the service information, use the requested resources and invoke the user program to process the task, until the processing of the multiple task steps is completed according to the processing order between the multiple task steps.
- the running information may further include: a processing mode of the multiple task steps, the processing mode being a serial processing mode or a parallel processing mode, and the processing unit 22 may specifically be configured to:
- when the processing mode is the serial processing mode, pass all outputs of the previous task step through an integrity check before they serve as the input of the next task step; and when the processing mode is the parallel processing mode, use any single output of the previous task step directly as an input of the next task step.
- the processing unit 22 may further be configured to:
- process higher-priority tasks first according to the priority order of the tasks, or adjust the priorities of the tasks and process the higher-priority tasks first.
- the manner in which the priority of the task is adjusted may include: Priority adjustment based on the waiting time of the task and/or the completion time of the task.
- the parallel processing device of the embodiment of the present invention can be understood by referring to the parallel processing method of the foregoing embodiment, and the same content is not described herein.
- FIG. 3 is a schematic structural diagram of an application scenario of a parallel processing device according to an embodiment of the present invention.
- Web Service 31: responsible for accepting and forwarding the user's web requests, for example receiving a user's request to define a service definition file and forwarding it to the service definition module 32.
- the Web Service is an application service, which can be understood by referring to the prior art, and will not be described herein.
- the service definition module 32 is responsible for providing an interface for the user to define a service definition file.
- the business definition file contains business information.
- the service information may include task definition information, task split information, task step association information, and task step information.
- the service information is the basis for the task scheduler to perform task scheduling processing. The task scheduler and service information will be specifically explained below.
- the task parsing module 33 receives the service definition file defined by the user (the service definition file type being, for example, JSON), parses the service definition file, obtains the user-defined service information, stores it in the database 34, and returns the service identifier (ID) corresponding to the service information.
- the database 34 can be a distributed database, and the distributed database can be understood by referring to the prior art, and will not be described herein.
- the task scheduler 35 is responsible for accepting the task processing requests sent by the Web Service.
- the task scheduler 35 can be used to adapt different computing models according to the needs of the service, for example the map/reduce model of the Hadoop system, or a multi-step scheduling model (number of steps greater than or equal to 2).
- the task scheduler 35 can also be used to implement priority ordering or priority adjustment, or to split tasks and allocate resources and task control to them.
- the resource manager 36 is responsible for satisfying and releasing the resource requests of the task scheduler 35.
- the main functions of the resource manager 36 can include resource management, resource matching, and automatic resource scaling.
- the task running module 37 is responsible for task processing: it invokes the processing programs developed by the user and processes the tasks distributed by the task scheduler 35.
- the cluster management module 38 is responsible for the deployment and monitoring of the clusters of the parallel tasks.
- the bottom layer supports various heterogeneous hardware such as physical machines and VMs (virtual machines).
- Physical machines can include personal computers, workstations, or various application servers, and so on.
- the business information may include: task definition information, task split information, task step association information, and task step information.
- CMR stands for Cloud MapReduce, which can be understood as a multi-step computing model.
- the computing model may also be the computing model of the Hadoop system or the EMR system, implementing two-step processing, so that the parallel processing apparatus of the embodiment of the present invention remains compatible with two-step processing.
- Task split information, for example:
  <SplitInfo>
    <JarRelativePath>opt/Package/user.jar</JarRelativePath> is interpreted as the jar package address
    <DownloadProtocol>LocalPath</DownloadProtocol> is interpreted as the download method
    <StepExecClass>Splitter.transSplitter</StepExecClass> is interpreted as the split handler function
  </SplitInfo>
- the task submitted by the user can be split according to the user's task split information (or the split function provided by the system can be used by default), and the split results are then processed by the subsequent task steps.
- Task step association information: multiple (greater than or equal to 2) task steps can be defined, rather than only the map and reduce steps of the Hadoop system, together with the processing order between the multiple task steps, i.e. the relationship of the multiple task steps. This overcomes the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the relationship of the multiple task steps is defined as follows:
- StepName (the step name)
- StepRatio, interpreted as the ratio of running task processes between Steps
- StepRatio is the proportional relationship of task running processes between the Steps; adjusting StepRatio adjusts the task running processes.
- according to the task step relationship, the task manager schedules the tasks and keeps running: a result produced by one Step serves as the input of the next Step, for example Step1 is followed by Step2 and then by Step3.
- a task step relationship including a fork places, for example, Step21 and Step22 in parallel after Step1.
- Task step information can include the resource information for each Step run, the user program, and other user settings.
- Task step information such as:
- <ImageID> is interpreted as the Step running environment: image; <SpecID>vsp-111</SpecID> is interpreted as the Step running environment: specification
- <IsSequence>false</IsSequence> is interpreted as whether the next step runs only after this step finishes processing
- <ScriptLanguage></ScriptLanguage> is interpreted as the scripting language, shell or perl
- <ScriptUrl></ScriptUrl> is interpreted as the address of the pre-processing script
- <thresholdnum/> is interpreted as the autoscale policy (scale out when the number of tasks in the queue exceeds this value)
- <overNum> is interpreted as: when scaling out, the excess count in the queue % overNum is the number of workers (execution processes) to start
- the scheduler requests resources according to the task step information and processes the tasks.
- FIG. 6, in combination with the parallel processing apparatus shown in FIG. 3, takes the media transcoding service as an example: multiple transcoding tasks can be submitted, and each transcoding task includes three steps of splitting, transcoding, and merging,
- corresponding to step1, step2, and step3 respectively; the parallel processing method of the embodiment of the present invention then includes the following steps:
- Step 61 The user logs in.
- Step 62 The service definition module receives the user's request to define a service definition file, and returns a service definition page.
- Step 63 The service definition module completes the definition of the service, and submits the generated service definition file to the service parsing module.
- Step 64 The service parsing module receives the service definition file, parses the file, and obtains the user-defined service information.
- Step 65 The service parsing module saves the service information in the database, and returns the service ID of the service information.
- Step 66 The user submits the task, and the web service receives the task processing request submitted by the user.
- the task processing request may include the user's input and output and the service ID used.
- Step 67 The web service forwards the task processing request to the task scheduler.
- Step 68 The task scheduler finds the user-defined service information according to the task processing request, obtains the service information from the database, and returns a submission success to the user.
- in this implementation, the task processing request carries the service ID.
- Step 69 The task scheduler obtains the user's application program according to the task split information in the service information.
- the task scheduler splits the task into multiple small tasks, such as step1, step2, and step3, according to the task split information, so that parallel processing can proceed faster.
- Step 610 The task scheduler requests resources from the resource manager.
- according to the running information in the task step information of step1, the task scheduler requests the needed resources from the resource manager (including the specification and image of the machine to run, and the resources used when the user's task runs: CPU, memory, virtual memory, hard disk, network bandwidth, etc.).
- the resource manager returns the matching resource identifier based on the information provided by the task manager.
- the task scheduler may sort the tasks by priority and select higher-priority tasks for concurrent processing, or the task scheduler may adjust the priorities.
- Step 611 Run the task on the requested resources.
- when the resources are started, a task running module is started for task running and management.
- the task scheduler sends information to the task running module on the resources.
- Step 612 The task running module fetches the user application of step1 from the file storage component.
- Step 613 The task running module runs step1.
- if step1 produces results, the task scheduler finds Step2 according to the task step association information and again requests the resources corresponding to Step2 from the resource manager to run the task, until Step3 finishes executing.
- when scheduling the task steps, the task scheduler specifies the input and output of the intermediate steps, and the service definition file defines, for each step, whether output happens only after the step completes. If a step is defined as requiring no integrity check and outputting immediately, each individual result of the step can be passed directly as input to the next step, which continues to run, achieving parallel processing of the steps. Conversely, if an integrity check is required, each Step outputs a batch of intermediate results, which need some processing and must be fully output before the next step starts, achieving serial processing of the task.
- the service definition module, the service parsing module, the task scheduler, and the resource manager can be deployed on the same server or on different servers.
- the task running module and the file storage component can be deployed on the same or different physical machines or VMs.
- a service definition file submitted once (steps 61-65) can be used for running tasks of the same type; that is, when the user submits multiple tasks of the same type, they can share the service definition file submitted in steps 61-65.
- the user can submit multiple tasks, and the Web Service forwards them to the task scheduler for parallel processing, so that one service definition file serves multiple tasks and the tasks run in parallel.
- Step 660 The user submits a task, and the Web Service receives the user's task processing request (including the user's input and output and the service ID used).
- Step 670 The Web Service forwards the request (including the user's input and output and the service ID used) to the task scheduler.
- Step 680 The task scheduler finds the user-defined service information according to the information in the request (the service ID), obtains the service information from the database, and returns a submission success to the user.
- Step 690 The task scheduler obtains the user's application program according to the task split information in the service information.
- Step 6100 The task scheduler requests resources from the resource manager.
- the resource manager returns the matching resource ID based on the information provided by the task manager.
- Step 6110 The task scheduler notifies the task running module on the requested resources to run step1.
- Step 6120 The task running module fetches the user application of step1 from the file storage component.
- Step 6130 The task running module runs step1.
- the task is processed in parallel by using multiple task steps, which overcomes the defect that the Hadoop system and the EMR system need to be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
- the user can submit multiple tasks, so that one service definition file serves multiple tasks of the same type, and the tasks run in parallel.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to the field of computer technology, and in particular to a parallel processing method and apparatus. The parallel processing method comprises: receiving a plurality of task processing requests, and determining the service information corresponding to the tasks according to the service identifiers carried in the task processing requests; using a plurality of task steps to perform parallel processing on said plurality of tasks according to said service information corresponding to the tasks, wherein the number of the steps of the plurality of task steps is greater than or equal to two. The parallel processing method and apparatus in the embodiments of the present invention use a plurality of task steps to perform parallel processing on the tasks, and determine the service information according to the service identifiers to perform task processing. It is not necessary to submit the task running parameters of the user repeatedly, and the processing of a plurality of steps is simplified.
Description
A parallel processing method and apparatus
Technical field
The embodiments of the present invention relate to the field of computer technologies, and in particular, to a parallel processing method and apparatus.
Background of the invention
With the development of the Internet, the era of information explosion has arrived, and parallel processing of massive information can improve processing efficiency. Currently, well-known parallel processing systems include the Hadoop system (a distributed system infrastructure) and the EMR (Elastic MapReduce) system.
However, for the parallel processing of tasks, the above Hadoop system or EMR system must process strictly according to the two steps of map and reduce: map refers to processing the original documents according to the map rules and outputting intermediate results, and reduce refers to merging the intermediate results according to the reduce rules. If a task requires more than two processing steps, it has to be submitted multiple times, with the user's task running parameters entered each time, to complete the multi-step processing. This is therefore complicated for the user.
Summary of the invention
The object of the embodiments of the present invention is to provide a parallel processing method and apparatus that simplify multi-step processing.
In one aspect, an embodiment of the present invention provides a parallel processing method, including:
receiving a plurality of task processing requests, and determining, according to the service identifiers carried in the task processing requests, the service information corresponding to the tasks; and
performing, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
In another aspect, an embodiment of the present invention provides a parallel processing apparatus, including:
a receiving unit, configured to receive a plurality of task processing requests, and determine, according to the service identifiers carried in the task processing requests, the service information corresponding to the tasks; and
a processing unit, configured to perform, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
In the parallel processing method and apparatus of the embodiments of the present invention, the tasks are processed in parallel by using a plurality of task steps, and the service information used to process a task is determined from the service identifier, so that the user's task running parameters do not need to be submitted repeatedly and multi-step processing is simplified.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a parallel processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a parallel processing apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a parallel processing apparatus in an application scenario according to an embodiment of the present invention;
FIG. 4 is a first schematic diagram of a task step relationship in a parallel processing method according to an embodiment of the present invention;
FIG. 5 is a second schematic diagram of a task step relationship in a parallel processing method according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a parallel processing method in an application scenario according to an embodiment of the present invention.
Mode for carrying out the invention
As shown in FIG. 1, an embodiment of the present invention provides a parallel processing method, including:
11. Receiving a plurality of task processing requests, and determining, according to the service identifier (ID) carried in each task processing request, the service information corresponding to the task.
12. Performing, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
Here, processing with a plurality of task steps means that the task is processed using a number of steps greater than or equal to 2.
In the parallel processing method of this embodiment, the tasks are processed in parallel with a plurality of task steps, and the service information used to process a task is determined from the service identifier. The user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply, overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
In the parallel processing method of this embodiment, before the plurality of task processing requests are received, the method may further include:
acquiring a user-defined service definition file;
parsing the service definition file to obtain the service information; and
generating a service identifier and establishing a correspondence between the service identifier and the service information.
In this implementation, the user-defined service definition file can serve as a processing template for a certain type of service, and thus as the basis for running multiple tasks under that service. When a task is submitted, providing the service identifier is sufficient to determine the service information, so the task running parameters do not have to be entered every time, which reduces the user's operations at task run time and makes the system easier to use.
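The flow above (define the service once, then submit tasks by service ID) is not given in code in this description; as an illustration only, a minimal Java sketch of such a registry might look like the following. The class and method names are hypothetical and the parsing is stubbed out.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the parsed service information described in the text. */
class ServiceInfo {
    String rawDefinition;                       // the original definition file content
    static ServiceInfo parse(String content) {  // real parsing (e.g. JSON) omitted
        ServiceInfo info = new ServiceInfo();
        info.rawDefinition = content;
        return info;
    }
}

/** Minimal sketch: register a service definition once, reuse it by service ID. */
class ServiceRegistry {
    private final Map<String, ServiceInfo> services = new ConcurrentHashMap<>();

    /** Parse a user-submitted definition file and return the generated service ID. */
    String register(String definitionFileContent) {
        ServiceInfo info = ServiceInfo.parse(definitionFileContent);
        String serviceId = UUID.randomUUID().toString();
        services.put(serviceId, info);          // stored; the embodiment stores this in a database
        return serviceId;                       // returned to the user for later task submissions
    }

    /** A task submission only needs to carry the service ID. */
    ServiceInfo lookup(String serviceId) {
        return services.get(serviceId);
    }
}
```

Because later task submissions carry only the service ID, the registry lookup replaces re-entering the task running parameters on every submission.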
Specifically, the service information may include:
Task definition information: used to define the fault tolerance level, the computing model, and the like of the task.
Task split information: used to split the task into multiple task steps, and the like.
Task step association information: used to define the processing order between the multiple task steps.
Task step information: used to define the running information of each task step, where the running information includes resource information, the user program, user settings, and the like.
Optionally, the running information may further include a processing mode of the multiple task steps, the processing mode being a serial processing mode or a parallel processing mode.
When the processing mode of the multiple task steps is the serial processing mode, all outputs of the previous task step pass an integrity check before serving as the input of the next task step. That is, each task step has multiple outputs, and only after all of them pass the integrity check can they enter the next task step as its input.
When the processing mode of the multiple task steps is the parallel processing mode, any single output of the previous task step serves directly as an input of the next task step. That is, each task step has multiple outputs, but not all of them need to pass an integrity check; any single output of a task step can enter the next task step as its input.
It can be seen that multiple task steps can then be processed in parallel, which improves processing capability and overcomes the defect of the Hadoop system and the EMR system that exactly two steps are required and must be processed strictly in serial order.
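The two modes can be illustrated with a small Java sketch, under the assumption that a step's outputs are simple values and that the integrity check merely verifies that the batch is complete; none of these names appear in this description.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of the two forwarding modes for a step's outputs. */
class OutputForwarder {
    /** Parallel mode: each output is passed to the next step as soon as it appears. */
    static void forwardImmediately(List<String> outputs, Consumer<String> nextStep) {
        for (String out : outputs) {
            nextStep.accept(out);                    // no integrity check, no waiting
        }
    }

    /** Serial mode: wait for the complete batch, check it, then hand it over. */
    static void forwardAfterIntegrityCheck(List<String> outputs, Consumer<List<String>> nextStep) {
        if (!integrityCheck(outputs)) {
            throw new IllegalStateException("incomplete or corrupted step output");
        }
        nextStep.accept(outputs);                    // the whole batch becomes the next input
    }

    /** Placeholder check: a real system might verify counts, checksums, etc. */
    private static boolean integrityCheck(List<String> outputs) {
        return outputs != null && !outputs.isEmpty();
    }
}
```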
Further, the manner of processing the tasks with multiple task steps according to the service information corresponding to the tasks may include:
splitting the task into multiple task steps according to the task split information in the service information;
obtaining the user program of each task step and requesting resources for the task step according to the task step information in the service information; and
using the requested resources and invoking the user program to process the task according to the task step association information in the service information, until the processing of the multiple task steps is completed according to the processing order between the multiple task steps.
Optionally, the parallel processing method of this embodiment may further include:
processing higher-priority tasks first according to the priority order of the tasks; or
adjusting the priorities of the tasks and processing higher-priority tasks first.
The priorities of the tasks may be adjusted according to the waiting time of a task and/or the completion time of a task.
As shown in FIG. 2, corresponding to the parallel processing method of the above embodiment, an embodiment of the present invention provides a parallel processing apparatus, including:
a receiving unit 21, configured to receive a plurality of task processing requests, and determine, according to the service identifier (ID) carried in each task processing request, the service information corresponding to the task; and
a processing unit 22, configured to perform, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, where the number of the task steps is greater than or equal to 2.
The parallel processing apparatus of this embodiment processes the tasks in parallel with a plurality of task steps, and determines the service information used to process a task from the service identifier. The user's task running parameters do not need to be submitted repeatedly, so multi-step processing is implemented simply, overcoming the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing.
The parallel processing apparatus of this embodiment may further include:
an obtaining unit, configured to obtain a user-defined service definition file;
a parsing unit, configured to parse the service definition file, obtain the service information, generate a service identifier, and establish a correspondence between the service identifier and the service information; and
a storage unit, configured to store the correspondence between the service identifier and the service information.
Specifically, the service information may include:
Task definition information: used to define the fault tolerance level, the computing model, and the like of the task.
Task split information: used to split the task into multiple task steps, and the like.
Task step association information: used to define the processing order between the multiple task steps.
Task step information: used to define the running information of each task step, where the running information includes resource information, the user program, user settings, and the like.
Further, the processing unit 22 may specifically be configured to:
split the task into multiple task steps according to the task split information in the service information;
obtain the user program of each task step and request resources for the task step according to the task step information in the service information; and
use the requested resources and invoke the user program to process the task according to the task step association information in the service information, until the processing of the multiple task steps is completed according to the processing order between the multiple task steps.
Optionally, the running information may further include a processing mode of the multiple task steps, the processing mode being a serial processing mode or a parallel processing mode, and the processing unit 22 may specifically be configured to:
when the processing mode of the multiple task steps is the serial processing mode, pass all outputs of the previous task step through an integrity check before they serve as the input of the next task step; and
when the processing mode of the multiple task steps is the parallel processing mode, use any single output of the previous task step directly as an input of the next task step.
Optionally, the processing unit 22 may further be configured to:
process higher-priority tasks first according to the priority order of the tasks, or adjust the priorities of the tasks and process higher-priority tasks first.
The priorities of the tasks may be adjusted according to the waiting time of a task and/or the completion time of a task.
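This description does not fix a concrete adjustment formula; the following Java sketch shows one plausible aging rule (the effective priority grows with waiting time), purely as an illustration. The class names and the one-point-per-minute rule are assumptions.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical task record with a base priority and a submission timestamp. */
class QueuedTask {
    final String id;
    final int basePriority;       // larger = more important
    final long submittedAtMillis;

    QueuedTask(String id, int basePriority, long submittedAtMillis) {
        this.id = id;
        this.basePriority = basePriority;
        this.submittedAtMillis = submittedAtMillis;
    }

    /** Illustrative aging rule: +1 priority for every 60 s of waiting. */
    int effectivePriority(long nowMillis) {
        long waitedSeconds = (nowMillis - submittedAtMillis) / 1000;
        return basePriority + (int) (waitedSeconds / 60);
    }
}

class PriorityScheduler {
    QueuedTask pickNext(Iterable<QueuedTask> pending) {
        long now = System.currentTimeMillis();
        PriorityQueue<QueuedTask> queue = new PriorityQueue<>(
                Comparator.comparingInt((QueuedTask t) -> t.effectivePriority(now)).reversed());
        for (QueuedTask t : pending) {
            queue.add(t);
        }
        return queue.poll();      // the highest effective priority runs first
    }
}
```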
The parallel processing apparatus of this embodiment can be understood with reference to the parallel processing method of the above embodiment, and the same content is not repeated here.
FIG. 3 is a schematic structural diagram of a parallel processing apparatus in an application scenario according to an embodiment of the present invention.
The Web Service 31 is responsible for accepting and forwarding the user's web requests, for example receiving a user's request to define a service definition file and forwarding it to the service definition module 32. The Web Service is an application service and can be understood with reference to the prior art, so it is not described here.
The service definition module 32 is responsible for providing an interface for the user to define a service definition file. The service definition file contains the service information, which may include task definition information, task split information, task step association information, and task step information. The service information is the basis on which the task scheduler performs task scheduling; the task scheduler and the service information are explained in detail below.
The task parsing module 33 is responsible for receiving the user-defined service definition file (whose type may be, for example, JSON), parsing the service definition file, obtaining the user-defined service information, saving it in the database 34, and returning the service identifier (ID) corresponding to the service information. The database 34 may be a distributed database, which can be understood with reference to the prior art and is not described here.
The task scheduler 35 is responsible for accepting the task processing requests sent by the Web Service. The task scheduler 35 can adapt different computing models according to the needs of the service, for example the map/reduce model of the Hadoop system, or a multi-step scheduling model (number of steps greater than or equal to 2). The task scheduler 35 can also be used to implement priority ordering or priority adjustment, or to split tasks, allocate resources to tasks, and control the tasks.
The resource manager 36 is responsible for satisfying and releasing the resource requests of the task scheduler 35. Specifically, the main functions of the resource manager 36 may include resource management, resource matching, and automatic resource scaling.
The task running module 37 is responsible for task processing: it invokes the processing programs developed by the user and processes the tasks distributed by the task scheduler 35.
The cluster management module 38 is responsible for the automated deployment and monitoring of the clusters that process the parallel tasks.
The bottom layer supports various heterogeneous hardware 39 such as physical machines and VMs (Virtual Machines). Physical machines may include personal computers, workstations, various application servers, and so on.
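The modules above are described only at the block-diagram level. Purely as an illustration, their responsibilities could be captured by interfaces such as the following; every name and signature here is an assumption rather than part of this description.

```java
import java.util.List;

/** Hypothetical interface mirroring the task scheduler 35 of FIG. 3. */
interface TaskScheduler {
    /** Accept a task request forwarded by the Web Service (input/output locations plus service ID). */
    void submit(String serviceId, String input, String output);
}

/** Hypothetical interface mirroring the resource manager 36. */
interface ResourceManager {
    /** Return identifiers of resources matching the given requirements, or an empty list. */
    List<String> request(String specId, String imageId, int cpuPercent, int memoryMb);

    void release(List<String> resourceIds);
}

/** Hypothetical interface mirroring the task running module 37. */
interface TaskRunningModule {
    /** Run one task step with the user program fetched from the file storage component. */
    void runStep(String stepName, String userProgramUrl, List<String> inputs);
}
```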
Specifically, the service information may include: task definition information, task split information, task step association information, and task step information.
(1) The task definition information includes the fault tolerance level FaultTolerance and the computing model ProgramModel, for example:
FaultTolerance="Normal"
ProgramModel="CMR"
Here, CMR stands for Cloud MapReduce and can be understood as a multi-step computing model.
Optionally, the computing model may also be the computing model of the Hadoop system or the EMR system, implementing two-step processing, so that the parallel processing apparatus of this embodiment remains compatible with two-step processing.
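As an illustration only (these types are not defined in this description), the task definition information could be mirrored in code as follows; keeping the two-step model as just another value of the computing model is what allows the apparatus to stay compatible with Hadoop/EMR-style jobs.

```java
/** Hypothetical mirror of the task definition information shown above. */
enum ProgramModel { CMR, MAP_REDUCE }          // CMR = multi-step model; MAP_REDUCE = two-step compatibility
enum FaultTolerance { NORMAL, HIGH }           // the description only names the "Normal" level

class TaskDefinition {
    final FaultTolerance faultTolerance;
    final ProgramModel programModel;

    TaskDefinition(FaultTolerance faultTolerance, ProgramModel programModel) {
        this.faultTolerance = faultTolerance;
        this.programModel = programModel;
    }
}
```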
(2) Task split information, for example:
<SplitInfo>
  <JarRelativePath>opt/Package/user.jar</JarRelativePath> interpreted as the jar package address
  <DownloadProtocol>LocalPath</DownloadProtocol> interpreted as the download method
  <StepExecClass>Splitter.transSplitter</StepExecClass> interpreted as the split handler function
</SplitInfo>
According to the user's task split information, the task submitted by the user can be split (or the split function provided by the system can be used by default), and the split results are then processed by the subsequent task steps.
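The <StepExecClass> entry names the user's split handler, whose code is not shown in this description. Below is a minimal Java sketch of what such a handler could look like, assuming a simple interface that cuts a task input into independent pieces; the interface and class names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical contract for the split handler referenced by <StepExecClass>. */
interface Splitter {
    /** Cut one submitted task input into independent pieces for parallel processing. */
    List<String> split(String taskInput);
}

/** Illustrative handler in the spirit of "Splitter.transSplitter": fixed-size chunks. */
class TransSplitter implements Splitter {
    private final int chunkSize;

    TransSplitter(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    @Override
    public List<String> split(String taskInput) {
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < taskInput.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, taskInput.length());
            pieces.add(taskInput.substring(start, end)); // each piece feeds the next task step
        }
        return pieces;
    }
}
```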
(3) Task step association information: multiple (greater than or equal to 2) task steps can be defined, rather than only the map and reduce steps of the Hadoop system, together with the processing order between the multiple task steps, i.e. the relationship of the multiple task steps. This overcomes the defect of the Hadoop system and the EMR system that a task must be submitted multiple times, with the user's task running parameters entered each time, to complete multi-step processing. The relationship of the multiple task steps is defined as follows:
<StepRelation>
  <StepName>**</StepName> interpreted as the step name
  <StepRatio>**</StepRatio> interpreted as the ratio of running task processes between steps
  <Previous>**</Previous> interpreted as the previous step
  <Next>**</Next> interpreted as the next step
</StepRelation>
Here, StepRatio is the proportional relationship of task running processes between the steps; adjusting StepRatio adjusts the task running processes.
In the task step relationship shown in FIG. 4, the task manager schedules the tasks according to the step relationship and keeps running: a result produced by one step serves as the input of the next step, for example Step1 is followed by Step2 and then by Step3.
FIG. 5 shows a task step relationship that includes a fork, for example Step21 and Step22 placed in parallel after Step1.
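The <Previous>/<Next> entries describe a small dependency graph: a chain in FIG. 4 and a fork in FIG. 5. As an illustration only (none of these classes appear in this description), dispatching steps in the declared order might be sketched in Java as follows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical step-relation graph built from <Previous>/<Next> entries. */
class StepRelationGraph {
    private final Map<String, List<String>> next = new HashMap<>();

    /** Record that 'successor' runs after 'step' (a fork simply adds several successors). */
    void addEdge(String step, String successor) {
        next.computeIfAbsent(step, k -> new ArrayList<>()).add(successor);
    }

    /** Dispatch steps starting from the first one, following the declared order. */
    void dispatchFrom(String firstStep) {
        List<String> frontier = new ArrayList<>();
        frontier.add(firstStep);
        while (!frontier.isEmpty()) {
            List<String> following = new ArrayList<>();
            for (String step : frontier) {
                System.out.println("running " + step);            // stands in for real execution
                following.addAll(next.getOrDefault(step, List.of()));
            }
            frontier = following;                                  // e.g. Step1 -> {Step21, Step22}
        }
    }
}
```

For the forked relationship of FIG. 5 this would be used as addEdge("Step1", "Step21"); addEdge("Step1", "Step22"); dispatchFrom("Step1");.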
(4) Task step information: the task step information may include the resource information for running each step, the user program, and other user settings. For example:
<StepDef>
  <StepName>**</StepName> interpreted as the step name
  <StepRelatedFile>
    <JarRelativePath>/opt/Package/user.jar</JarRelativePath> interpreted as the user's jar package
    <DownloadProtocol>LocalPath</DownloadProtocol> interpreted as the download method
    <StepExecClass>**</StepExecClass> interpreted as the run function of this step
    <Integritycheck>false</Integritycheck> interpreted as the integrity check
    <Partitioner>false</Partitioner> interpreted as Partition
    <Combine>false</Combine> interpreted as Combine
  </StepRelatedFile>
  <ResourceRequirement> interpreted as the resource requirements
    <Processor_percent></Processor_percent> interpreted as CPU usage
    <Memory>**</Memory> interpreted as the memory requirement
    <Swap>**</Swap> interpreted as the virtual memory requirement
    <Bandwidth>**</Bandwidth> interpreted as the bandwidth requirement
    <Disk>**</Disk> interpreted as the disk requirement
  </ResourceRequirement>
  <IsExclusiveVM>false</IsExclusiveVM> interpreted as whether the step needs an exclusive machine
  <FaultToleranceLevel>Normal</FaultToleranceLevel> interpreted as the fault tolerance level
  <ImageID>img-1111</ImageID> interpreted as the step running environment: image
  <SpecID>vsp-111</SpecID> interpreted as the step running environment: specification
  <IsStateful>false</IsStateful> interpreted as whether the step is stateful
  <IsPreemptive>false</IsPreemptive> interpreted as whether other resources can be preempted
  <SendImmediately>false</SendImmediately> interpreted as whether to output immediately
  <IsSequence>false</IsSequence> interpreted as whether the next step runs only after this step finishes processing
  <Pre-Process>
    <ScriptLanguage></ScriptLanguage> interpreted as the scripting language, shell or perl
    <ScriptUrl></ScriptUrl> interpreted as the address of the pre-processing script
  </Pre-Process>
  <AutoScale/> interpreted as whether to autoscale
  <thresholdnum/> interpreted as the autoscale policy (scale out when the number of tasks in the queue exceeds this value)
  <overNum/> interpreted as: when scaling out, the excess count in the queue % overNum is the number of workers (execution processes) to start
</StepDef>
Here, "%" is the division operator.
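Read together, <thresholdnum> and <overNum> describe a simple scale-out rule: once the queue length exceeds the threshold, the excess divided by overNum gives the number of workers to start. A minimal Java sketch of that rule follows; the class and field names are illustrative, and the rounding behaviour is an assumption since the description does not specify it.

```java
/** Hypothetical autoscale rule built from <thresholdnum> and <overNum>. */
class AutoScalePolicy {
    private final int thresholdNum; // scale out once the queue grows past this size
    private final int overNum;      // one extra worker per this many excess tasks

    AutoScalePolicy(int thresholdNum, int overNum) {
        this.thresholdNum = thresholdNum;
        this.overNum = overNum;
    }

    /** Number of additional workers to start for the current queue length. */
    int workersToStart(int queuedTasks) {
        if (queuedTasks <= thresholdNum) {
            return 0;                              // below the threshold: no scale-out
        }
        int excess = queuedTasks - thresholdNum;
        return excess / overNum;                   // "%" in the text denotes division; rounding not specified
    }
}
```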
The scheduler requests resources according to the task step information and processes the tasks.
As shown in FIG. 6, in combination with the parallel processing apparatus shown in FIG. 3, the media transcoding service is taken as an example. Multiple transcoding tasks can be submitted, and each transcoding task includes three steps of splitting, transcoding, and merging, corresponding to step1, step2, and step3 respectively. The parallel processing method of this embodiment then includes:
Step 61: The user logs in.
Step 62: The service definition module receives the user's request to define a service definition file and returns a service definition page.
Step 63: The service definition module completes the definition of the service and submits the generated service definition file to the service parsing module.
Step 64: The service parsing module receives the service definition file, parses it, and obtains the user-defined service information.
Step 65: The service parsing module saves the service information in the database and returns the service ID of the service information.
Step 66: The user submits a task, and the Web Service receives the task processing request submitted by the user. In this implementation, the task processing request may include the user's input and output and the service ID used.
Step 67: The Web Service forwards the task processing request to the task scheduler.
Step 68: The task scheduler finds the user-defined service information according to the task processing request, obtains the service information from the database, and returns a submission success to the user. In this implementation, the task processing request carries the service ID.
Step 69: The task scheduler obtains the user's application program according to the task split information in the service information.
Specifically, the task scheduler splits the task into multiple small tasks, such as step1, step2, and step3, according to the task split information, so that parallel processing can proceed faster.
Step 610: The task scheduler requests resources from the resource manager.
According to the running information in the task step information of step1, the task scheduler requests the needed resources from the resource manager (including the specification and image of the machine to run, and the resources used when the user's task runs: CPU, memory, virtual memory, hard disk, network bandwidth, and so on).
The resource manager returns the identifiers of matching resources according to the information provided by the task manager.
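As a hypothetical illustration of the matching performed by the resource manager in step 610, the requirement bundle and the matching could be sketched in Java as follows; all types and fields are assumptions, not part of this description.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical resource requirement carried in the step 610 request. */
class ResourceRequirement {
    final int cpuPercent, memoryMb, diskGb, bandwidthMbps;
    final String imageId, specId;

    ResourceRequirement(int cpuPercent, int memoryMb, int diskGb, int bandwidthMbps,
                        String imageId, String specId) {
        this.cpuPercent = cpuPercent;
        this.memoryMb = memoryMb;
        this.diskGb = diskGb;
        this.bandwidthMbps = bandwidthMbps;
        this.imageId = imageId;
        this.specId = specId;
    }
}

/** Hypothetical resource record held by the resource manager. */
class ResourceSlot {
    final String resourceId, imageId, specId;
    final int freeCpuPercent, freeMemoryMb, freeDiskGb, freeBandwidthMbps;

    ResourceSlot(String resourceId, String imageId, String specId,
                 int freeCpuPercent, int freeMemoryMb, int freeDiskGb, int freeBandwidthMbps) {
        this.resourceId = resourceId;
        this.imageId = imageId;
        this.specId = specId;
        this.freeCpuPercent = freeCpuPercent;
        this.freeMemoryMb = freeMemoryMb;
        this.freeDiskGb = freeDiskGb;
        this.freeBandwidthMbps = freeBandwidthMbps;
    }
}

/** Return the identifier of the first slot that satisfies every requirement, if any. */
class ResourceMatcher {
    Optional<String> match(ResourceRequirement req, List<ResourceSlot> slots) {
        return slots.stream()
                .filter(s -> s.imageId.equals(req.imageId) && s.specId.equals(req.specId))
                .filter(s -> s.freeCpuPercent >= req.cpuPercent
                        && s.freeMemoryMb >= req.memoryMb
                        && s.freeDiskGb >= req.diskGb
                        && s.freeBandwidthMbps >= req.bandwidthMbps)
                .map(s -> s.resourceId)
                .findFirst();
    }
}
```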
可选的, 任务调度器可以根据任务的优先级排序, 选择优先级高的任务进行并发处理, 或者任 务调度器对优先级进行调整。 Optionally, the task scheduler may perform priority processing according to the priority order of the tasks, select a task with a higher priority, or adjust the priority by the task scheduler.
Step 611: Run the task on the requested resources.
Specifically, when a resource is started, a task running module is started on it for task running and management. The task scheduler sends information to the task running module in the resource.
Step 612: The task running module obtains the user application program of step1 from the file storage component.
Step 613: The task running module runs step1.
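The behaviour of steps 611-613 on the runner side can be pictured with the following minimal sketch, in which a task running module fetches the user application of a step from a stand-in file storage component and runs it; the storage mapping and the toy applications are assumptions made purely for illustration.

```python
from typing import Callable, Dict

# Hypothetical file storage component: maps a step name to its user application.
FILE_STORAGE: Dict[str, Callable[[str], str]] = {
    "step1": lambda media: f"slices({media})",        # split
    "step2": lambda slices: f"transcoded({slices})",  # transcode
    "step3": lambda parts: f"merged({parts})",        # merge
}

class TaskRunningModule:
    """Started together with the resource; runs and manages one task step."""

    def run_step(self, step_name: str, step_input: str) -> str:
        # Step 612: obtain the user application of the step from file storage.
        user_app = FILE_STORAGE[step_name]
        # Step 613: run the step on the given input and return its output.
        return user_app(step_input)

runner = TaskRunningModule()
print(runner.run_step("step1", "movie.mp4"))  # slices(movie.mp4)
```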
If step1 produces a processing result, the task scheduler finds step2 according to the task step association information, applies to the resource manager again for the resource corresponding to step2 to run the task, and so on until step3 finishes executing.
Because the task is processed in multiple steps, steps 68-613 are performed for each step until the entire task has been dispatched successfully. Optionally, when the task scheduler schedules the task steps, the input and output of the intermediate steps need to be specified, and whether each step outputs only after it has completed is defined in the service definition file. If it is defined that no integrity check is performed and output is immediate, then each individual result output by a step in the task can be used directly as the input of the next step, which continues to run, achieving parallel processing of the steps. Conversely, if it is defined that an integrity check is required, each step outputs a batch of intermediate results, the intermediate results undergo certain processing, and only after the output is complete does the next step proceed, achieving serial processing of the task.
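The two step-processing modes just described can be pictured with the sketch below: in the serial mode a step's complete output batch passes an integrity check before the next step starts, while in the parallel mode each individual result is handed to the next step as soon as it is produced; the helper names and the trivial integrity check are assumptions for this illustration only.

```python
from typing import Callable, Iterable, Iterator, List

Step = Callable[[str], str]

def run_serial(steps: List[Step], inputs: List[str]) -> List[str]:
    """Serial mode: a step's complete output batch is integrity-checked
    before it becomes the input of the following step."""
    batch = inputs
    for step in steps:
        batch = [step(item) for item in batch]       # produce the whole batch
        assert all(batch), "integrity check failed"  # hypothetical check
    return batch

def run_pipelined(steps: List[Step], inputs: Iterable[str]) -> Iterator[str]:
    """Parallel mode: any single output of a step is passed on immediately,
    so the steps of one task overlap instead of waiting for whole batches."""
    stream: Iterable[str] = inputs
    for step in steps:
        stream = map(step, stream)  # lazily chain; no waiting for the batch
    return iter(stream)

step1 = lambda x: f"slice({x})"
step2 = lambda x: f"transcode({x})"
step3 = lambda x: f"merge({x})"

print(run_serial([step1, step2, step3], ["a.mp4", "b.mp4"]))
print(list(run_pipelined([step1, step2, step3], ["a.mp4", "b.mp4"])))
```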
Optionally, the service definition module, the service parsing module, the task scheduler, and the resource manager may be deployed on the same server or on different servers. The task running module and the file storage component may be deployed on the same physical machine or VM or on different ones.
It should also be noted that a service definition file submitted once by the user (steps 61-65) can be used to run tasks of the same type; that is, when the user submits multiple tasks of the same type, they can share the single service definition file submitted in steps 61-65.
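For the three-step transcoding service of this example, such a user-defined service definition file might look roughly like the sketch below, carrying the four kinds of service information described in this document (task definition, task split, task step association, and task step information); the concrete field names, format, and values are assumptions, not a format fixed by the document.

```python
import json

# Hypothetical service definition for the split/transcode/merge service.
service_definition = {
    "task_definition": {"fault_tolerance_level": 1, "compute_model": "batch"},
    "task_split": {"steps": ["step1", "step2", "step3"]},
    "task_step_association": {          # processing order between the steps
        "order": [["step1", "step2"], ["step2", "step3"]],
        "mode": "parallel",             # or "serial" (integrity-checked)
    },
    "task_step_info": {
        "step1": {"user_program": "splitter",   "cpu": 2, "memory_mb": 4096},
        "step2": {"user_program": "transcoder", "cpu": 8, "memory_mb": 16384},
        "step3": {"user_program": "merger",     "cpu": 2, "memory_mb": 4096},
    },
}

print(json.dumps(service_definition, indent=2))
```

After parsing (step 64), the service parsing module stores this information in the database and returns a service ID for it (step 65); later task submissions of the same type reference that ID instead of resubmitting the file.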
The user may submit multiple tasks, and the Web Service forwards them to the task scheduler for parallel processing, so that one service definition file serves multiple tasks and the tasks run in parallel, in a flow similar to steps 66-613:
Step 660: The user submits a task, and the Web Service receives the task processing request submitted by the user (including the user's input and output and the service ID that is used).
Step 670: The Web Service forwards the request (including the user's input and output and the service ID that is used) to the task scheduler.
Step 680: The task scheduler finds the user-defined service information according to the requested information (the service ID), obtains the service information from the database, and returns a submission success to the user.
Step 690: The task scheduler obtains the user's application program according to the task split information in the service information.
Step 6100: The task scheduler applies for resources from the resource manager. The resource manager returns a matching resource identifier according to the information provided by the task scheduler.
Step 6110: The task scheduler notifies the task running module of the resources applied for step1.
Step 6120: The task running module obtains the user application program of step1 from the file storage component.
Step 6130: The task running module runs step1.
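As a rough, hypothetical illustration of this second flow (steps 660-6130), the sketch below has a Web Service forward several tasks that share the same service ID to a scheduler, which looks up the stored service information and drives each task through its steps; the in-memory "database" and all names are placeholders invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory stand-in for the database of parsed service information.
SERVICE_DB: Dict[str, Dict] = {"svc-42": {"steps": ["step1", "step2", "step3"]}}

def scheduler_handle(task: Dict) -> str:
    """Steps 680-6130, condensed: look up the service information by service ID
    and drive the task through its steps."""
    service = SERVICE_DB[task["service_id"]]
    result = task["input"]
    for step in service["steps"]:
        result = f"{step}({result})"
    return result

def web_service_submit(tasks: List[Dict]) -> List[str]:
    # Steps 660-670: the Web Service forwards each submitted task to the
    # scheduler; the tasks are processed in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(scheduler_handle, tasks))

tasks = [{"service_id": "svc-42", "input": "video%d.mp4" % i} for i in range(3)]
print(web_service_submit(tasks))
```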
It can be seen that processing tasks in parallel with multiple task steps overcomes the defect of the Hadoop system and the EMR system, which require a task to be submitted multiple times, with the user's task running parameters entered each time, in order to complete multi-step processing.
Moreover, the user can submit multiple tasks, so that one service definition file serves multiple tasks of the same type and the tasks run in parallel.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and when the program is executed, the processes of the embodiments of the foregoing methods may be included. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Claims
1. A parallel processing method, comprising:
receiving a plurality of task processing requests, and determining, according to a service identifier carried in the task processing requests, service information corresponding to the tasks; and performing, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, wherein the number of steps of the plurality of task steps is greater than or equal to 2.
2. The method according to claim 1, wherein before the receiving a plurality of task processing requests, the method further comprises:
obtaining a service definition file defined by a user;
parsing the service definition file to obtain the service information; and
generating a service identifier, and establishing a correspondence between the service identifier and the service information.
3. The method according to claim 1 or 2, wherein the service information comprises:
task definition information: used to define a fault tolerance level and a computing model of a task;
task split information: used to split the task into a plurality of task steps;
task step association information: used to define a processing order among the plurality of task steps; and
task step information: used to define running information of each task step, wherein the running information comprises resource information, a user program, and user settings.
4. The method according to claim 3, wherein the processing of the task by using a plurality of task steps according to the service information corresponding to the task comprises:
splitting the task into a plurality of task steps according to the task split information in the service information;
obtaining, according to the task step information in the service information, the user program of each task step, and applying for resources for the task steps; and
invoking, according to the task step association information in the service information and on the requested resources, the user program to process the task, until the processing of the plurality of task steps is completed in the processing order among the plurality of task steps.
5. The method according to claim 4, wherein the running information further comprises a processing mode of the plurality of task steps, the processing mode being a serial processing mode or a parallel processing mode:
when the processing mode of the plurality of task steps is the serial processing mode, all outputs of a previous task step of the plurality of task steps are used, after passing an integrity check, as inputs of a subsequent task step of the plurality of task steps; and
when the processing mode of the plurality of task steps is the parallel processing mode, any single output of a previous task step of the plurality of task steps is used directly as an input of a subsequent task step of the plurality of task steps.
6. A parallel processing apparatus, comprising:
a receiving unit, configured to receive a plurality of task processing requests, and determine, according to a service identifier carried in the task processing requests, service information corresponding to the tasks; and
a processing unit, configured to perform, according to the service information corresponding to the tasks, parallel processing on the plurality of tasks by using a plurality of task steps, wherein the number of steps of the plurality of task steps is greater than 2.
7. The apparatus according to claim 6, further comprising:
an obtaining unit, configured to obtain a service definition file defined by a user;
a parsing unit, configured to parse the service definition file, obtain the service information, generate a service identifier, and establish a correspondence between the service identifier and the service information; and
a storage unit, configured to store the correspondence between the service identifier and the service information.
8. The apparatus according to claim 6 or 7, wherein the service information comprises:
task definition information: used to define a fault tolerance level and a computing model of a task;
task split information: used to split the task into a plurality of task steps;
task step association information: used to define a processing order among the plurality of task steps; and
task step information: used to define running information of each task step, wherein the running information comprises resource information, a user program, and user settings.
9. The apparatus according to claim 8, wherein the processing unit is specifically configured to:
split the task into a plurality of task steps according to the task split information in the service information;
obtain, according to the task step information in the service information, the user program of each task step, and apply for resources for the task steps; and
invoke, according to the task step association information in the service information and on the requested resources, the user program to process the task, until the processing of the plurality of task steps is completed in the processing order among the plurality of task steps.
10. The apparatus according to claim 9, wherein the running information further comprises a processing mode of the plurality of task steps, the processing mode being a serial processing mode or a parallel processing mode, and the processing unit is further specifically configured to:
when the processing mode of the plurality of task steps is the serial processing mode, use all outputs of a previous task step of the plurality of task steps, after an integrity check, as inputs of a subsequent task step of the plurality of task steps; and
when the processing mode of the plurality of task steps is the parallel processing mode, use any single output of a previous task step of the plurality of task steps directly as an input of a subsequent task step of the plurality of task steps.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201280000701.4A CN103502941B (en) | 2012-03-19 | 2012-03-19 | A kind of method for parallel processing and device |
PCT/CN2012/072545 WO2013138982A1 (en) | 2012-03-19 | 2012-03-19 | A parallel processing method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2012/072545 WO2013138982A1 (en) | 2012-03-19 | 2012-03-19 | A parallel processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013138982A1 true WO2013138982A1 (en) | 2013-09-26 |
Family
ID=49221785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2012/072545 WO2013138982A1 (en) | 2012-03-19 | 2012-03-19 | A parallel processing method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103502941B (en) |
WO (1) | WO2013138982A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107016480A (en) * | 2016-01-28 | 2017-08-04 | 五八同城信息技术有限公司 | Method for scheduling task, apparatus and system |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818097B (en) * | 2016-09-12 | 2020-06-30 | 平安科技(深圳)有限公司 | Data processing method and device |
CN108898482B (en) * | 2018-07-09 | 2021-02-26 | 中国建设银行股份有限公司 | Multi-product signing method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5812809A (en) * | 1989-09-04 | 1998-09-22 | Mitsubishi Denki Kabushiki Kaisha | Data processing system capable of execution of plural instructions in parallel |
CN1988463A (en) * | 2005-12-21 | 2007-06-27 | 国际商业机器公司 | Method and system for large message broadcast |
CN101110022A (en) * | 2007-08-30 | 2008-01-23 | 济南卓信智能科技有限公司 | Method for implementing workflow model by software |
CN101770402A (en) * | 2008-12-29 | 2010-07-07 | 中国移动通信集团公司 | Map task scheduling method, equipment and system in MapReduce system |
CN102033748A (en) * | 2010-12-03 | 2011-04-27 | 中国科学院软件研究所 | Method for generating data processing flow codes |
Also Published As
Publication number | Publication date |
---|---|
CN103502941A (en) | 2014-01-08 |
CN103502941B (en) | 2017-11-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7197612B2 (en) | Execution of auxiliary functions on on-demand network code execution systems | |
US11243953B2 (en) | Mapreduce implementation in an on-demand network code execution system and stream data processing system | |
US11336583B2 (en) | Background processes in update load balancers of an auto scaling group | |
CN112513811B (en) | Operating system customization in on-demand network code execution systems | |
US10817331B2 (en) | Execution of auxiliary functions in an on-demand network code execution system | |
US11010188B1 (en) | Simulated data object storage using on-demand computation of data objects | |
US11422844B1 (en) | Client-specified network interface configuration for serverless container management service | |
Ge et al. | GA-based task scheduler for the cloud computing systems | |
US11392422B1 (en) | Service-managed containers for container orchestration service | |
US20200218579A1 (en) | Selecting a cloud service provider | |
US11838384B2 (en) | Intelligent scheduling apparatus and method | |
US10038640B2 (en) | Managing state for updates to load balancers of an auto scaling group | |
CN109726004B (en) | Data processing method and device | |
CN106775948B (en) | Cloud task scheduling method and device based on priority | |
CN111078516A (en) | Distributed performance test method and device and electronic equipment | |
WO2013123650A1 (en) | Method for virtual machine assignment and device for virtual machine assignment | |
CN113055199B (en) | Gateway access method and device and gateway equipment | |
US11861386B1 (en) | Application gateways in an on-demand network code execution system | |
US11144359B1 (en) | Managing sandbox reuse in an on-demand code execution system | |
WO2022257247A1 (en) | Data processing method and apparatus, and computer-readable storage medium | |
WO2013138982A1 (en) | A parallel processing method and apparatus | |
CN115640113A (en) | Multi-plane flexible scheduling method | |
EP3539278B1 (en) | Method and system for affinity load balancing | |
Yeh et al. | Realizing integrated prioritized service in the Hadoop cloud system | |
Kurdi et al. | A hybrid approach for scheduling virtual machines in private clouds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12872085; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 12872085; Country of ref document: EP; Kind code of ref document: A1 |