CN102279730A - Parallel data processing method, device and system - Google Patents

Parallel data processing method, device and system

Info

Publication number
CN102279730A
Authority
CN
China
Prior art keywords
task
main equipment
slave unit
state
execution
Application number
CN2010102008917A
Other languages
Chinese (zh)
Other versions
CN102279730B (en)
Inventor
樊航成
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Application filed by 阿里巴巴集团控股有限公司
Priority to CN201010200891.7A
Publication of CN102279730A
Application granted
Publication of CN102279730B


Abstract

The embodiments of the invention disclose a parallel data processing method, device and system. The method comprises the following steps: a master device obtains the data to be processed from a data source and creates one task for each piece of data to be processed; upon receiving a task acquisition request message sent by a slave device, the master device allocates a task to the requesting slave device, merges the execution results returned by slave devices, and dynamically records the execution state of each task, the execution states comprising not executed, executing, executed and merged; and the master device outputs the execution results of merged tasks. The embodiments of the invention allow the cluster scale of the system to be adjusted quickly when resources are insufficient or wasted.

Description

Parallel data processing method and device, and parallel data processing system

Technical field

The present application relates to the fields of communication and computer technology, and in particular to a parallel data processing method, a parallel data processing device and a parallel data processing system.

Background

With the development of Web 2.0 technology, the business data in Internet applications and Internet platforms, such as user behavior data and platform system data, is growing at a massive scale. To meet the need to process such massive business data (for example, an Internet site platform must analyze and compute over its user behavior data and platform system data), distributed parallel data processing technology emerged: multiple computers cooperate with one another to jointly process the massive data.

Currently, on large Internet site platforms, the most widely used distributed parallel computing framework is the Hadoop framework. Fig. 1 is a schematic structural diagram of the Hadoop framework in the prior art. As shown in Fig. 1, the system includes one master device (Master) and a cluster of slave devices (Slave). Logically, each slave device has a data node (DataNode) and a task tracker (TaskTracker). The DataNode stores business data, and the TaskTracker executes the tasks pushed by the master device: it processes the business data stored in the DataNode and partially merges the task execution results. Logically, the master device includes a name node (NameNode) and a job tracker (JobTracker). The NameNode manages the business data stored in each slave device, and the JobTracker starts, tracks and schedules each slave device.

However, the inventor found in research that, in the existing Hadoop system, the master device manages the information of all slave devices in the cluster by maintaining a node information list. It formulates a task allocation algorithm based on all the slave-device information in that list and pushes tasks to each slave device according to the algorithm. Consequently, when the system runs short of resources and slave devices must be added dynamically, or when resources are wasted and slave devices must be removed, the master device must first update the node information list it maintains. It must then formulate a new task allocation algorithm based on the updated list, and only then push tasks to the slave devices according to that algorithm so that they process data in parallel.

Thus, in the existing Hadoop system, the parallel data processing method and the corresponding parallel data processing system involve a cumbersome procedure whenever slave devices are added or removed. This hinders dynamic expansion or removal of slave devices, so the cluster scale cannot be adjusted quickly when resources are insufficient or wasted.

Summary of the invention

To solve the above technical problem, the embodiments of the present application provide a parallel data processing method and a parallel data processing system that allow the system to adjust its cluster scale quickly when resources are insufficient or wasted.

The embodiments of the present application disclose the following technical solutions:

A parallel data processing method comprises the following steps. A master device learns from a data source which data needs to be processed and creates one task for each piece of data to be processed. Upon receiving a task acquisition request message sent by a slave device, the master device allocates a task to the requesting slave device and merges the execution results returned by slave devices. It also dynamically records the execution state of each task; the execution states are not executed, executing, executed and merged. Finally, the master device outputs the execution results of merged tasks.

A parallel data processing device comprises five modules. A task creation module learns from a data source which data needs to be processed and creates one task for each piece of data to be processed. A task allocation module allocates a task to a requesting slave device when it receives a task acquisition request message sent by that slave device. A merging module merges the execution results returned by slave devices. A dynamic recording module dynamically records the execution state of each task; the execution states are not executed, executing, executed and merged. A result output module outputs the execution results of merged tasks.

As the above embodiments show, the master device no longer pushes tasks to slave devices. Instead, it allocates a task to a slave device when it receives a task acquisition request message from that slave device. Moreover, the master device no longer manages the information of all slave devices in the cluster through a node information list. Instead, it creates one task for each piece of data to be processed and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster and request tasks at any time, or leave the cluster at any time. The cluster scale can thus be adjusted quickly when resources are insufficient or wasted.

Brief description of the drawings

The drawings needed to describe the embodiments and the prior art are briefly introduced below, so that the technical solutions in the embodiments of the present application and in the prior art can be described more clearly. A person of ordinary skill in the art can derive other drawings from these drawings without creative effort.

Fig. 1 is a schematic structural diagram of the Hadoop system framework in the prior art;

Fig. 2 is a flowchart of an embodiment of a parallel data processing method of the present application;

Fig. 3 is a schematic diagram of a system application scenario of the present application;

Fig. 4 is a state transition diagram of a task in the present application;

Fig. 5 is an interaction diagram of parallel data processing in the present application;

Fig. 6 is a structural diagram of an embodiment of a parallel data processing device of the present application;

Fig. 7 is a structural diagram of another embodiment of a parallel data processing device of the present application;

Fig. 8 is a structural diagram of an embodiment of the task creation module of the present application;

Fig. 9 is a structural diagram of an embodiment of a parallel data processing system of the present application.

Detailed description of the embodiments

To make the above objectives, features and advantages of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings.

Embodiment one

Fig. 2 is a flowchart of an embodiment of a parallel data processing method of the present application. The method comprises the following steps:

Step 201: The master device learns from a data source which data needs to be processed and creates one task for each piece of data to be processed.

Specifically, this step may proceed as follows. The master device obtains from the data source an identification list of the data to be processed, which holds the data identifiers of all such data. For each piece of data, the master device extracts its identifier from the identification list, creates a task for it, and puts the extracted identifier into that task.

For example, Fig. 3 is a schematic diagram of a system application scenario of the present application. As shown in Fig. 3, the data source holds an identification list of the data to be processed, containing the data identifiers of all such data. The address information of a piece of data may, for example, serve as its data identifier. After obtaining the identification list from the data source, the master device can tell from each data identifier which data resources need to be processed. After creating a task for a piece of data, the master device extracts that data's identifier from the identification list and puts it into the task. For example, after creating task 1 for data A, the master device extracts the data identifier of data A from the identification list and puts it into task 1.
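The following is a minimal Python sketch of Step 201. It assumes that the identification list is simply a list of address strings and that a task only needs an id, a data identifier and a state. The `Task` class, the `create_tasks` function and the example addresses are hypothetical illustrations, not names taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    data_id: str                   # identifier (here: an address) of the data to process
    state: str = "not executed"    # every newly created task starts as not executed


def create_tasks(identification_list):
    """Create one task per entry of the identification list (Step 201)."""
    tasks = []
    for task_id, data_id in enumerate(identification_list, start=1):
        # Extract the data identifier and put it into the newly created task.
        tasks.append(Task(task_id=task_id, data_id=data_id))
    return tasks


# Hypothetical identification list obtained from the data source.
tasks = create_tasks(["ftp://example/A", "ftp://example/B"])
```

In this sketch the task carries only the identifier and not the data itself, so the master device never has to load the data.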

Step 202: When the master device receives a task acquisition request message sent by a slave device, it allocates a task to the requesting slave device and merges the execution results returned by slave devices. It also dynamically records the execution state of each task; the execution states are not executed, executing, executed and merged.

Unlike the task push mechanism of the existing Hadoop system, the master device in the embodiments of the present application allocates tasks on request. The master device allocates a task to a slave device only when it receives a task acquisition request message from that slave device. When a slave device returns the execution result of a task, the master device merges the returned result. Meanwhile, the master device dynamically records the execution state of each task; the execution states are not executed, executing, executed and merged.

The master device records the execution state of each task through three marking steps. After creating a task, it marks the task as not executed. After receiving an execution result returned by a slave device, it marks the completed task as executed. When it finds a task in the executed state and has merged that task's execution result, it marks the task as merged.

Three events occur independently of one another: the master device creating a task, the master device receiving an execution result from a slave device, and the periodic check for tasks in the executed state. The embodiments of the present application therefore do not restrict the order of the three marking steps above. For example, whenever the master device creates a new task, it marks the task as not executed, because the task has not yet been allocated to or executed by any slave device. Whenever the master device receives the execution result of some task A from a slave device, it marks task A as executed. Whenever the period for checking for tasks in the executed state elapses, the master device merges the execution results of those tasks and marks them as merged.
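The three marking steps can be sketched as three independent event handlers. In this sketch, merging is simplified to appending each result to a list, because the real merge is application-specific. The "executing" mark, which is set when a task is allocated, is left out to keep the sketch focused. `State`, `StateRecorder` and the handler names are hypothetical.

```python
from enum import Enum


class State(Enum):
    NOT_EXECUTED = "not executed"
    EXECUTING = "executing"
    EXECUTED = "executed"
    MERGED = "merged"


class StateRecorder:
    """Hypothetical helper: the three marking steps of the master device."""

    def __init__(self):
        self.states = {}    # task_id -> State
        self.results = {}   # task_id -> result returned by a slave, not yet merged
        self.merged = []    # merged results (merging simplified to list append)

    def on_task_created(self, task_id):
        # Marking 1: a new task has not been allocated or executed yet.
        self.states[task_id] = State.NOT_EXECUTED

    def on_result_received(self, task_id, result):
        # Marking 2: a slave returned the result of this task.
        self.results[task_id] = result
        self.states[task_id] = State.EXECUTED

    def on_check_period(self):
        # Marking 3: merge every executed task, then mark it merged.
        for task_id, state in self.states.items():
            if state is State.EXECUTED:
                self.merged.append(self.results.pop(task_id))
                self.states[task_id] = State.MERGED


recorder = StateRecorder()
recorder.on_task_created(1)
recorder.on_task_created(2)
recorder.on_result_received(1, 42)   # in practice task 1 would be executing first
recorder.on_check_period()
```

Each handler touches only the state of its own task, so the handlers can fire in any order. This matches the statement above that the three marking steps have no fixed sequence.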

Instead of checking periodically for tasks in the executed state, the master device may check for such tasks every time it receives an execution result from a slave device. The embodiments of the present application do not restrict this choice, although the periodic approach can effectively reduce system power consumption.

Step 203: The master device outputs the execution results of merged tasks.

The master device may output the results of merged tasks as they become available, or it may output all merged execution results once every task is in the merged state. For example, the master device periodically checks the execution states of all tasks. When every task is in the merged state, it outputs the execution results of all merged tasks.
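The following sketch shows the two output options as a check that the master device could run on every period. It assumes that task states are kept as plain strings and that merged results are collected in a list. `output_results` and its `wait_for_all` flag are hypothetical names.

```python
def output_results(states, merged_results, wait_for_all=True):
    """Periodic output check for Step 203 (hypothetical helper).

    states: task_id -> state string; merged_results: results merged so far.
    With wait_for_all=False the results merged so far are output right away;
    otherwise nothing is output until every task is in the merged state.
    """
    if not wait_for_all:
        return list(merged_results)
    if states and all(state == "merged" for state in states.values()):
        return list(merged_results)
    return None  # some task is not merged yet; wait for the next check


partial = output_results({1: "merged", 2: "executing"}, [10])
final = output_results({1: "merged", 2: "merged"}, [10, 20])
early = output_results({1: "merged", 2: "executing"}, [10], wait_for_all=False)
```

Here `partial` is `None` because task 2 is still executing. `final` and `early` both return the merged results, the first because every task is merged and the second because early output was requested.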

Fig. 4 is the state transition diagram of a task in the present application. As shown in Fig. 4, a newly created task is in the not-executed state. When a slave device requests a task, the master device selects a not-executed task and allocates it to the slave device, and the task changes from not executed to executing. If the master device does not receive the slave device's execution result within a preset time after allocation, the task changes from executing back to not executed. After the slave device completes the task and returns the execution result to the master device, the task changes from executing to executed. After the master device merges a task in the executed state, the task changes from executed to merged.
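The state transition diagram of Fig. 4 can be written as a lookup table keyed by (current state, event). The event names "assign", "timeout", "result" and "merge" are hypothetical labels for the four transitions described above. Any pair that is missing from the table is rejected as an illegal transition.

```python
# Allowed task state transitions from Fig. 4, keyed by (current state, event).
TRANSITIONS = {
    ("not executed", "assign"): "executing",     # a slave requested and received the task
    ("executing", "timeout"): "not executed",    # no result within the preset time
    ("executing", "result"): "executed",         # the slave returned its result
    ("executed", "merge"): "merged",             # the master merged the result
}


def next_state(state, event):
    """Return the new state, or raise ValueError if Fig. 4 has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}") from None


# Replay a life cycle in which the first allocation times out.
state = "not executed"
for event in ["assign", "timeout", "assign", "result", "merge"]:
    state = next_state(state, event)
```

After the replay, `state` is `"merged"`. Calling `next_state("merged", "assign")` raises `ValueError`, because a merged task is never allocated again.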

In the prior art, the master device must repeatedly poll multiple slave devices for their execution status to detect whether task processing has become abnormal. This lowers task execution efficiency as well as system stability and availability. To improve all three, the method of the embodiments of the present application preferably includes an additional step. Among the tasks in the executing state, the master device periodically checks whether any task has failed to return an execution result within a preset time. If so, it re-marks each such task as not executed.

For example, after allocating task A to a slave device, the master device marks task A as executing and starts a timer whose duration is the preset time. When the timer expires, if the master device still has not received the execution result of task A, it re-marks task A as not executed.
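This sketch replaces one timer per task with a deadline per executing task that is compared against a clock on each periodic check. The result is the same, and the sketch can be tested with a simulated clock. `TimeoutTracker`, its methods and the 30-second preset time are hypothetical choices.

```python
import time


class TimeoutTracker:
    """Hypothetical helper: re-marks executing tasks whose result is overdue."""

    def __init__(self, preset_time, clock=time.monotonic):
        self.preset_time = preset_time   # seconds a slave has to return a result
        self.clock = clock               # injectable so the example can simulate time
        self.deadlines = {}              # task_id -> deadline, for executing tasks only
        self.states = {}                 # task_id -> state string

    def on_allocated(self, task_id):
        # Mark the task executing and start its "timer".
        self.states[task_id] = "executing"
        self.deadlines[task_id] = self.clock() + self.preset_time

    def on_result(self, task_id):
        self.states[task_id] = "executed"
        self.deadlines.pop(task_id, None)    # stop the timer

    def check_timeouts(self):
        """Periodic check; returns the ids re-marked as not executed."""
        now = self.clock()
        expired = [t for t, deadline in self.deadlines.items() if now >= deadline]
        for task_id in expired:
            self.states[task_id] = "not executed"
            del self.deadlines[task_id]
        return expired


now = [0.0]                                   # simulated clock
tracker = TimeoutTracker(preset_time=30, clock=lambda: now[0])
tracker.on_allocated("A")
tracker.on_allocated("B")
now[0] = 10.0
tracker.on_result("B")                        # B answers in time
now[0] = 31.0
expired = tracker.check_timeouts()            # A has been silent for 31 s
```

The master device never contacts the slave device during this process. It only compares deadlines, which is the point of replacing the polling described for the prior art.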

At this point, the not-executed tasks on the master device side fall into two groups: newly created tasks, and tasks re-marked as not executed because they were not completed within the preset time. When the master device receives a task acquisition request message from a slave device, it may allocate any task currently in the not-executed state to that slave device. Preferably, newly created not-executed tasks are allocated to requesting slave devices first. Once all of them have been allocated, the re-marked tasks are allocated in the order in which they were first allocated.

As a simple illustration, suppose there are five not-executed tasks on the master device side. Tasks 1, 2 and 3 are newly created. Tasks 4 and 5 have been re-marked as not executed, and task 4 was first allocated earlier than task 5. The master device first allocates tasks 1, 2 and 3 to slave devices. After all three have been allocated, it allocates task 4 and then task 5.
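The preferred allocation order can be implemented with a FIFO queue for new tasks and a min-heap of re-marked tasks keyed by the order of their first allocation. In this hypothetical `TaskPool`, each call to `allocate` answers one task acquisition request. The example reproduces the five-task scenario above.

```python
import heapq
from collections import deque


class TaskPool:
    """Hypothetical allocator for tasks in the not-executed state."""

    def __init__(self):
        self.new_tasks = deque()    # never allocated yet; served first, FIFO
        self.requeued = []          # heap of (first allocation order, task_id)
        self.first_order = {}       # task_id -> order of its first allocation
        self.counter = 0

    def add_new(self, task_id):
        self.new_tasks.append(task_id)

    def allocate(self):
        """Answer one task acquisition request; None if nothing is available."""
        if self.new_tasks:
            task_id = self.new_tasks.popleft()
        elif self.requeued:
            _, task_id = heapq.heappop(self.requeued)
        else:
            return None
        self.first_order.setdefault(task_id, self.counter)  # keep the first one only
        self.counter += 1
        return task_id

    def remark_not_executed(self, task_id):
        """Called when the task timed out; it rejoins the pool."""
        heapq.heappush(self.requeued, (self.first_order[task_id], task_id))


# Tasks 4 and 5 were allocated earlier (4 first) and later re-marked;
# tasks 1, 2 and 3 are newly created.
pool = TaskPool()
pool.add_new(4)
pool.add_new(5)
pool.allocate()
pool.allocate()
pool.remark_not_executed(5)
pool.remark_not_executed(4)
for task_id in (1, 2, 3):
    pool.add_new(task_id)
order = [pool.allocate() for _ in range(5)]
```

`order` comes out as `[1, 2, 3, 4, 5]`. Task 4 is served before task 5 even though it was re-marked later, because the heap is ordered by first-allocation time and not by re-marking time.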

As the above embodiment shows, the master device no longer pushes tasks to slave devices. Instead, it allocates a task to a slave device when it receives a task acquisition request message from that slave device. Moreover, the master device no longer manages the information of all slave devices in the cluster through a node information list. Instead, it creates one task for each piece of data to be processed and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster and request tasks at any time, or leave the cluster at any time. The cluster scale can thus be adjusted quickly when resources are insufficient or wasted.

In addition, the master device no longer monitors the slave devices' task execution in full; it only maintains task states. If a task's result is not returned within a certain time after allocation, the master device treats that task's execution as abnormal, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency and system stability and availability.

Embodiment two

The parallel data processing method is described in detail below in terms of the interaction between the master device and the slave devices. Fig. 5 is an interaction diagram of parallel data processing in the present application. As shown in Fig. 5, the interaction flow comprises the following steps:

Step 501: The master device obtains from the data source the identification list of the data to be processed.

The data source may be an FTP server, a database (DB) or a file system. The identification list tells the master device which data is to be processed.

Step 502: The master device creates a task for each piece of data identified in the identification list, maintains all tasks in a task queue, and marks each newly created task as not executed.

When creating a task, the master device also puts the data identifier of the corresponding data into the task.

Step 503: The master device receives a task acquisition request message sent by a slave device.

Step 504: The master device takes a not-executed task from the task queue, allocates it to the requesting slave device, and changes the task's state from not executed to executing.

Step 505: After receiving the task sent by the master device, the slave device parses the task to obtain the data identifier of the data to be processed.

Step 506: The slave device fetches the data to be processed from the data source according to the data identifier.

Step 507: The slave device analyzes and computes the fetched data.

Steps 505 to 507 make up the slave device's execution of the task. The data may be analyzed and computed with methods known in the prior art, so the embodiments of the present application do not describe them further.

Step 508: The slave device returns the result of the computation and analysis to the master device and sends the master device a request message to acquire the next task.

Step 509: After receiving the execution result returned by the slave device, the master device changes the task's state from executing to executed.

Step 510: The master device checks whether the task queue contains any tasks in the executed state. If so, it merges their execution results and changes their state from executed to merged. If not, it waits for the next check.

The master device may check for executed tasks periodically, or the next check may be triggered whenever a slave device returns an execution result.

Step 511: The master device checks whether all tasks in the task queue are in the merged state. If so, it outputs the execution results of all merged tasks. If not, it waits for the next check.

The master device may check for merged tasks periodically.

Step 512: Among the tasks in the executing state, the master device checks whether any task has failed to return an execution result within the preset time. If so, it re-marks each such task as not executed.

Steps 510 to 512 have no strict execution order relative to steps 501 to 509, nor among themselves. Each of them runs whenever its next check time arrives.
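On the slave side, steps 503 to 508 reduce to a simple request and execute loop. In this sketch the transport, the data source and the computation are passed in as plain callables, all of which are hypothetical stand-ins. The sketch also assumes that a slave device leaves the cluster once the master device has nothing left to allocate. The patent itself does not say what an idle slave device does.

```python
def run_slave(request_task, fetch_data, compute, report):
    """Slave-side loop for Steps 503-508; returns the number of tasks executed.

    All four callables are hypothetical stand-ins for the real transport:
    request_task() -> {"task_id": ..., "data_id": ...} or None,
    fetch_data(data_id) -> data, compute(data) -> result,
    report(task_id, result) sends the result back to the master.
    """
    executed = 0
    while True:
        task = request_task()                 # Step 503: request a task
        if task is None:                      # nothing allocated: leave the cluster
            return executed
        data_id = task["data_id"]             # Step 505: parse the data identifier
        data = fetch_data(data_id)            # Step 506: fetch from the data source
        result = compute(data)                # Step 507: analyze and compute
        report(task["task_id"], result)       # Step 508: return the result, then
        executed += 1                         # loop to request the next task


pending = [{"task_id": 1, "data_id": "A"}, {"task_id": 2, "data_id": "B"}]
data_source = {"A": [1, 2, 3], "B": [4, 5]}
reported = {}
count = run_slave(
    request_task=lambda: pending.pop(0) if pending else None,
    fetch_data=data_source.__getitem__,
    compute=sum,
    report=reported.__setitem__,
)
```

Note that the slave device fetches the data itself using the identifier, so the master device only ever exchanges small task descriptors and results.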

As the above embodiment shows, the master device no longer pushes tasks to slave devices. Instead, it allocates a task to a slave device when it receives a task acquisition request message from that slave device. Moreover, the master device no longer manages the information of all slave devices in the cluster through a node information list. Instead, it creates one task for each piece of data to be processed and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster and request tasks at any time, or leave the cluster at any time. The cluster scale can thus be adjusted quickly when resources are insufficient or wasted.

In addition, the master device no longer monitors the slave devices' task execution in full; it only maintains task states. If a task's result is not returned within a certain time after allocation, the master device treats that task's execution as abnormal, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency and system stability and availability.

Embodiment three

Corresponding to the parallel data processing method above, an embodiment of the present application further provides a parallel data processing device. Fig. 6 is a structural diagram of an embodiment of a parallel data processing device of the present application. The device comprises a task creation module 601, a task allocation module 602, a merging module 603, a dynamic recording module 604 and a result output module 605. The internal structure and connections of the device are described below together with its working principle.

The task creation module 601 is configured to learn from a data source which data needs to be processed, and to create one task for each piece of data to be processed;

The task allocation module 602 is configured to allocate a task to a requesting slave device upon receiving a task acquisition request message sent by that slave device;

The merging module 603 is configured to merge the execution results returned by slave devices;

The dynamic recording module 604 is configured to dynamically record the execution state of each task, the execution states comprising not executed, executing, executed and merged;

The result output module 605 is configured to output the execution results of merged tasks.
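The following sketch shows how the five modules of Fig. 6 could map onto the methods of a single master-side class, exercised by one slave device running sequentially. It is a simplification: the class name is hypothetical, the merge function is supplied by the caller (here, addition), and timeouts and the preferred allocation order are left out.

```python
class ParallelDataProcessingDevice:
    """Hypothetical master-side device composed of modules 601-605 (Fig. 6)."""

    def __init__(self, merge_fn):
        self.merge_fn = merge_fn   # application-specific merge (an assumption)
        self.states = {}           # dynamic recording module 604: task_id -> state
        self.data_ids = {}         # task_id -> data identifier
        self.results = {}          # task_id -> result waiting to be merged
        self.merged = None         # running merged result

    def create_tasks(self, identification_list):        # task creation module 601
        for task_id, data_id in enumerate(identification_list, start=1):
            self.data_ids[task_id] = data_id
            self.states[task_id] = "not executed"

    def allocate(self):                                 # task allocation module 602
        for task_id, state in self.states.items():
            if state == "not executed":
                self.states[task_id] = "executing"
                return task_id, self.data_ids[task_id]
        return None

    def receive_result(self, task_id, result):          # recorded by module 604
        self.results[task_id] = result
        self.states[task_id] = "executed"

    def merge(self):                                    # merging module 603
        executed = [t for t, s in self.states.items() if s == "executed"]
        for task_id in executed:
            result = self.results.pop(task_id)
            if self.merged is None:
                self.merged = result
            else:
                self.merged = self.merge_fn(self.merged, result)
            self.states[task_id] = "merged"

    def output(self):                                   # result output module 605
        if all(state == "merged" for state in self.states.values()):
            return self.merged
        return None


device = ParallelDataProcessingDevice(merge_fn=lambda a, b: a + b)
device.create_tasks(["A", "B", "C"])
data_source = {"A": 1, "B": 2, "C": 3}
while (assignment := device.allocate()) is not None:   # one slave, sequentially
    task_id, data_id = assignment
    device.receive_result(task_id, data_source[data_id] * 10)
before_merge = device.output()
device.merge()
```

`before_merge` is `None` because the tasks are only executed, not yet merged. After `merge()`, `device.output()` returns `60`.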

Preferably, as shown in Fig. 7, which is a structural diagram of another embodiment of a parallel data processing device of the present application, the device further comprises a re-marking module 606. Among the tasks in the executing state, this module checks whether any task has failed to return an execution result within the preset time, and if so, re-marks each such task as not executed.

Preferably, as shown in Fig. 8, which is a structural diagram of an embodiment of the task creation module of the present application, the task creation module comprises a list acquisition submodule 801 and an identifier extraction submodule 802, wherein:

The list acquisition submodule 801 is configured to obtain from the data source the identification list of the data to be processed, the identification list holding the data identifiers of all the data to be processed;

The identifier extraction submodule 802 is configured to extract the identifier of each piece of data from the identification list and, after a task has been created for that data, to put the extracted identifier into the task.

Preferably, the dynamic recording module comprises three submodules. A first marking submodule marks a newly created task as not executed. A second marking submodule marks a completed task as executed after the execution result returned by a slave device is received. A third marking submodule marks a task as merged when a periodic check finds the task in the executed state and its execution result has been merged.

For the parallel data processing device in Fig. 7, the task allocation module preferably comprises two submodules. A first allocation submodule allocates newly created not-executed tasks to requesting slave devices first. A second allocation submodule then allocates the tasks re-marked as not executed to requesting slave devices, in the order in which they were first allocated, once all newly created not-executed tasks have been allocated.

As the above embodiment shows, the master device no longer pushes tasks to slave devices. Instead, it allocates a task to a slave device when it receives a task acquisition request message from that slave device. Moreover, the master device no longer manages the information of all slave devices in the cluster through a node information list. Instead, it creates one task for each piece of data to be processed and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster and request tasks at any time, or leave the cluster at any time. The cluster scale can thus be adjusted quickly when resources are insufficient or wasted.

In addition, the master device no longer monitors the slave devices' task execution in full; it only maintains task states. If a task's result is not returned within a certain time after allocation, the master device treats that task's execution as abnormal, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency and system stability and availability.

Embodiment four

An embodiment of the present application further provides a parallel data processing system. Fig. 9 is a structural diagram of an embodiment of a parallel data processing system of the present application. The system comprises a cluster formed by one master device 901 and a plurality of slave devices 902. The internal structure and connections of the system are described below together with its working principle.

The master device 901 is configured to: learn from a data source which data needs to be processed; create one task for each piece of data to be processed; allocate a task to a requesting slave device upon receiving a task acquisition request message sent by that slave device; merge the execution results returned by slave devices; dynamically record the execution state of each task, the execution states comprising not executed, executing, executed and merged; and output the merged results of merged tasks;

The slave device 902 is configured to send a task acquisition request message to the master device and, after receiving a task allocated by the master device, to execute the task and return the execution result to the master device.

Preferably, the master device 901 is further configured to check, among the tasks in the executing state, whether any task has failed to return an execution result within the preset time, and if so, to re-mark each such task as not executed.
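The elastic scaling claimed for this system can be illustrated with threads standing in for separate slave machines. The master below keeps only a lock-protected list of not-executed tasks and a running merged result, and it never records which slave devices exist. One slave device processes five tasks and leaves; three others join later and finish the remaining work. This is a simplified sketch under several assumptions: timeouts and the four-state record are left out, the "computation" is squaring, merging is summation, and `Master` and `slave` are hypothetical names.

```python
import threading


class Master:
    """Hypothetical master 901: tracks tasks only, keeps no list of slaves."""

    def __init__(self, data_ids):
        self.lock = threading.Lock()
        self.not_executed = list(data_ids)   # tasks waiting for a request
        self.unmerged = len(self.not_executed)
        self.total = 0                       # merged result (here: a running sum)

    def request_task(self):
        # Any slave may call this at any time; the caller is not registered.
        with self.lock:
            return self.not_executed.pop() if self.not_executed else None

    def return_result(self, result):
        with self.lock:
            self.total += result             # merge the returned result
            self.unmerged -= 1


def slave(master, data_source, max_tasks=None):
    """Hypothetical slave 902: requests tasks until none remain or it leaves."""
    done = 0
    while max_tasks is None or done < max_tasks:
        data_id = master.request_task()
        if data_id is None:
            break
        master.return_result(data_source[data_id] ** 2)   # the "computation"
        done += 1
    return done


data_source = {f"d{i}": i for i in range(100)}
master = Master(data_source)
# One slave joins, processes five tasks and leaves the cluster ...
first = threading.Thread(target=slave, args=(master, data_source, 5))
first.start()
first.join()
# ... then three more slaves join and finish the remaining work.
later = [threading.Thread(target=slave, args=(master, data_source)) for _ in range(3)]
for thread in later:
    thread.start()
for thread in later:
    thread.join()
```

The number of slave threads can change between runs without any change to `Master`. In this toy setting, that is the property the description attributes to request-based allocation.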

As the above embodiment shows, the master device no longer pushes tasks to slave devices. Instead, it allocates a task to a slave device when it receives a task acquisition request message from that slave device. Moreover, the master device no longer manages the information of all slave devices in the cluster through a node information list. Instead, it creates one task for each piece of data to be processed and dynamically records the execution state of each task. Therefore, from the master device's point of view, a slave device can join the cluster and request tasks at any time, or leave the cluster at any time. The cluster scale can thus be adjusted quickly when resources are insufficient or wasted.

In addition, the master device no longer monitors the slave devices' task execution in full; it only maintains task states. If a task's result is not returned within a certain time after allocation, the master device treats that task's execution as abnormal, re-marks the task as not executed, and allocates it again. This further improves task execution efficiency and system stability and availability.

It should be noted that a person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

The parallel data processing method, device, and parallel data processing system provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the application; the above description of the embodiments is intended only to help in understanding the method of the application and its core idea. Meanwhile, a person of ordinary skill in the art may, based on the idea of the application, make changes to the specific implementations and the scope of application. In summary, this description should not be construed as limiting the application.

Claims (10)

1. A parallel data processing method, characterized by comprising:
The main equipment learns, from a data source, the pending data that needs to be processed, and creates a task for each piece of pending data;
When receiving a task-acquisition request message sent by a slave unit, the main equipment allocates a task to the requesting slave unit, merges the execution results returned by the slave units, and dynamically records the execution state of each task, the execution states comprising not executed, in execution, executed, and merged;
The main equipment outputs the execution results of the merged tasks.
2. The parallel data processing method according to claim 1, characterized in that the method further comprises:
Among the tasks in the in-execution state, the main equipment checks whether there is any task that has not returned an execution result within a preset time and, if so, re-marks each such task as not executed.
3. The parallel data processing method according to claim 1 or 2, characterized in that the main equipment learning, from the data source, the pending data that needs to be processed and creating a task for each piece of pending data specifically comprises:
The main equipment obtains, from the data source, an identification list of the pending data that needs to be processed, the identification list maintaining the data identifiers of all pending data;
The main equipment extracts the identifier of each piece of pending data from the identification list and, after creating a task for each piece of pending data, puts the extracted identifier into the task.
4. The parallel data processing method according to claim 1 or 2, characterized in that the main equipment dynamically recording the execution state of each task comprises:
After the main equipment creates a task, it marks the created task as not executed; and,
After the main equipment receives the execution result returned by a slave unit, it marks the completed task as executed; and,
When the main equipment finds a task in the executed state and has merged its execution result, it marks the merged task as merged.
5. The parallel data processing method according to claim 2, characterized in that the main equipment allocating a task to the requesting slave unit when receiving a task-acquisition request message sent by the slave unit specifically comprises:
Preferentially allocating tasks that are newly created and in the not-executed state to the requesting slave unit;
After all newly created tasks in the not-executed state have been allocated, allocating the tasks that have been re-marked as not executed to the requesting slave units in turn, in the order in which they were first allocated.
6. A parallel data processing device, characterized by comprising:
A task creation module, configured to learn, from a data source, the pending data that needs to be processed, and to create a task for each piece of pending data;
A task allocation module, configured to allocate a task to the requesting slave unit when receiving a task-acquisition request message sent by a slave unit;
A merging module, configured to merge the execution results returned by the slave units;
A dynamic recording module, configured to dynamically record the execution state of each task, the execution states comprising not executed, in execution, executed, and merged;
A result output module, configured to output the execution results of the merged tasks.
7. The parallel data processing device according to claim 6, characterized by further comprising:
A re-marking module, configured to check, among the tasks in the in-execution state, whether there is any task that has not returned an execution result within a preset time and, if so, to re-mark each such task as not executed.
8. The parallel data processing device according to claim 7 or 8, characterized in that the task creation module comprises:
A list acquisition submodule, configured to obtain, from the data source, an identification list of the pending data that needs to be processed, the identification list maintaining the data identifiers of all pending data;
An identifier extraction submodule, configured to extract the identifier of each piece of pending data from the identification list and, after a task has been created for each piece of pending data, to put the extracted identifier into the task.
9. A parallel data processing system, characterized by comprising one main equipment and a plurality of slave units, wherein:
The main equipment is configured to: learn, from a data source, the pending data that needs to be processed; create a task for each piece of pending data; upon receiving a task-acquisition request message sent by a slave unit, allocate a task to the requesting slave unit; merge the execution results returned by the slave units; dynamically record the execution state of each task, the execution states comprising not executed, in execution, executed, and merged; and output the merged result of the merged tasks;
The slave unit is configured to send a task-acquisition request message to the main equipment and, after receiving a task allocated by the main equipment, to execute the allocated task and return the execution result to the main equipment.
10. The parallel data processing system according to claim 9, characterized in that the main equipment is further configured to check, among the tasks in the in-execution state, whether there is any task that has not returned an execution result within a preset time and, if so, to re-mark each such task as not executed.
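Claims 3 and 5, together with the re-marking of claim 2, can be read as a small scheduling policy: tasks carry the identifiers taken from the identification list, newly created tasks are allocated first, and re-marked tasks follow in the order in which they were first allocated. The Python sketch below is one hypothetical reading; `Scheduler`, the heap keyed by first-allocation order, and setting the in-execution state at allocation time are assumptions, not details stated in the claims.

```python
import heapq
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class State(Enum):
    NOT_EXECUTED = "not executed"
    IN_EXECUTION = "in execution"
    EXECUTED = "executed"
    MERGED = "merged"


@dataclass
class Task:
    task_id: int
    data_id: str                       # claim 3: identifier taken from the identification list
    state: State = State.NOT_EXECUTED  # a newly created task starts as not executed


class Scheduler:
    """Hypothetical main-equipment scheduler illustrating claims 2, 3 and 5."""

    def __init__(self, identification_list: Iterable[str]) -> None:
        # Claim 3: create one task per data identifier and put the identifier into it.
        self.tasks = [Task(i, data_id) for i, data_id in enumerate(identification_list)]
        self.fresh = deque(t.task_id for t in self.tasks)  # created but never allocated
        self.requeued: List[Tuple[int, int]] = []          # heap of (first-allocation order, task id)
        self.first_alloc: Dict[int, int] = {}              # task id -> first-allocation order

    def allocate(self) -> Optional[Task]:
        """Claim 5: fresh not-executed tasks first, then re-marked ones by first allocation."""
        if self.fresh:
            task = self.tasks[self.fresh.popleft()]
            self.first_alloc[task.task_id] = len(self.first_alloc)
        elif self.requeued:
            task = self.tasks[heapq.heappop(self.requeued)[1]]
        else:
            return None
        task.state = State.IN_EXECUTION  # assumed: state set at allocation time
        return task

    def remark_not_executed(self, task_id: int) -> None:
        """Claim 2: a task with no result within the preset time goes back to not executed."""
        task = self.tasks[task_id]
        if task.state is State.IN_EXECUTION:
            task.state = State.NOT_EXECUTED
            heapq.heappush(self.requeued, (self.first_alloc[task_id], task_id))
```

Keying the heap by first-allocation order means a task that was re-marked later, but first allocated earlier, is still handed out first, which is how this sketch interprets "in the order in which they were first allocated".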

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010200891.7A CN102279730B (en) 2010-06-10 2010-06-10 Parallel data processing method, device and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010200891.7A CN102279730B (en) 2010-06-10 2010-06-10 Parallel data processing method, device and system
HK12101872.7A HK1161386A1 (en) 2010-06-10 2012-02-24 Method, device and system for parallel data processing

Publications (2)

Publication Number Publication Date
CN102279730A 2011-12-14
CN102279730B 2014-02-05

Family

ID=45105202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010200891.7A CN102279730B (en) 2010-06-10 2010-06-10 Parallel data processing method, device and system

Country Status (2)

Country Link
CN (1) CN102279730B (en)
HK (1) HK1161386A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561767A (en) * 2008-04-16 2009-10-21 上海聚力传媒技术有限公司 Method and device for executing tasks based on operating system
CN101566957A (en) * 2008-04-25 2009-10-28 恩益禧电子股份有限公司 Information processing system and task execution control method
JP2010039526A (en) * 2008-07-31 2010-02-18 Toshiba Corp Computer program and master computer

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188306A (en) * 2011-12-30 2013-07-03 中国移动通信集团公司 Distributed preprocessing method and distributed preprocessing system
CN103188306B (en) * 2011-12-30 2016-04-27 中国移动通信集团公司 Distributed preprocess method and system
CN103294527A (en) * 2012-02-29 2013-09-11 深圳市思乐网络技术有限责任公司 Method, system, and server for processing network task
CN103729257B (en) * 2012-10-16 2017-04-12 阿里巴巴集团控股有限公司 Distributed parallel computing method and system
CN103729257A (en) * 2012-10-16 2014-04-16 阿里巴巴集团控股有限公司 Distributed parallel computing method and system
CN104102475A (en) * 2013-04-11 2014-10-15 腾讯科技(深圳)有限公司 Method, device and system for processing distributed type parallel tasks
CN104102475B (en) * 2013-04-11 2018-10-02 腾讯科技(深圳)有限公司 The method, apparatus and system of distributed parallel task processing
CN103475520B (en) * 2013-09-10 2017-04-26 聚好看科技股份有限公司 Service processing control method and device in distribution network
CN103475520A (en) * 2013-09-10 2013-12-25 青岛海信传媒网络技术有限公司 Service processing control method and device in distribution network
CN103559036A (en) * 2013-11-04 2014-02-05 北京中搜网络技术股份有限公司 Data batch processing system and method based on Hadoop
CN103685492B (en) * 2013-12-03 2017-01-25 北京智谷睿拓技术服务有限公司 Dispatching method, dispatching device and application of Hadoop trunking system
CN104462304A (en) * 2014-11-28 2015-03-25 北京奇虎科技有限公司 Information processing method and device
CN105204941A (en) * 2015-08-18 2015-12-30 耿懿超 Data processing method and data processing device
CN105844717A (en) * 2016-01-08 2016-08-10 乐卡汽车智能科技(北京)有限公司 Information processing method and system, and control device
CN106201984A (en) * 2016-07-15 2016-12-07 青岛海信电器股份有限公司 A kind of method for reading data and device
CN106850409A (en) * 2017-01-24 2017-06-13 腾讯科技(深圳)有限公司 A kind of method of message chain rupture task treatment, equipment and system
CN106850409B (en) * 2017-01-24 2019-12-10 腾讯科技(深圳)有限公司 Method, equipment and system for processing message chain breaking task
CN107402956B (en) * 2017-06-07 2020-02-21 网易有道信息技术(杭州)有限公司 Data processing method and device for large task and computer readable storage medium
CN107402956A (en) * 2017-06-07 2017-11-28 网易(杭州)网络有限公司 Data processing method, equipment and the computer-readable recording medium of big task
WO2019019400A1 (en) * 2017-07-24 2019-01-31 上海壹账通金融科技有限公司 Task distributed processing method, device, storage medium and server
CN108153678A (en) * 2018-01-17 2018-06-12 北京网信云服信息科技有限公司 A kind of test assignment processing method and processing device
CN109146250A (en) * 2018-07-24 2019-01-04 武汉空心科技有限公司 Task exploitation delivery method and system based on page metering
CN109255515A (en) * 2018-07-24 2019-01-22 武汉空心科技有限公司 A kind of task exploitation cloud platform based on page metering and unit time distribution

Also Published As

Publication number Publication date
CN102279730B (en) 2014-02-05
HK1161386A1 (en) 2012-08-24

Similar Documents

Publication Publication Date Title
Wang et al. Optimizing load balancing and data-locality with data-aware scheduling
Ghazi et al. Hadoop, MapReduce and HDFS: a developers perspective
US9407677B2 (en) High performance data streaming
US20160283282A1 (en) Optimization of map-reduce shuffle performance through shuffler i/o pipeline actions and planning
US10067791B2 (en) Methods and apparatus for resource management in cluster computing
CN102880503B (en) Data analysis system and data analysis method
US9753980B1 (en) M X N dispatching in large scale distributed system
Kulkarni et al. Survey on Hadoop and Introduction to YARN.
US8893148B2 (en) Performing setup operations for receiving different amounts of data while processors are performing message passing interface tasks
US8812627B2 (en) System and method for installation and management of cloud-independent multi-tenant applications
US8261266B2 (en) Deploying a virtual machine having a virtual hardware configuration matching an improved hardware profile with respect to execution of an application
KR101691126B1 (en) Fault tolerant batch processing
Gu et al. SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters
CN101908003B (en) Multi-core dispatching of parallelization inquiry
US7650331B1 (en) System and method for efficient large-scale data processing
CN102360309B (en) Scheduling system and scheduling execution method of multi-core heterogeneous system on chip
US7647590B2 (en) Parallel computing system using coordinator and master nodes for load balancing and distributing work
CN105117289B (en) Method for allocating tasks, apparatus and system based on cloud test platform
US9619430B2 (en) Active non-volatile memory post-processing
US20130061220A1 (en) Method for on-demand inter-cloud load provisioning for transient bursts of computing needs
US20120209943A1 (en) Apparatus and method for controlling distributed memory cluster
US9594637B2 (en) Deploying parallel data integration applications to distributed computing environments
US8127300B2 (en) Hardware based dynamic load balancing of message passing interface tasks
JP2013218700A (en) Distributed processing system, scheduler node and scheduling method of distributed processing system, and program generation apparatus therefor
US20150143380A1 (en) Scheduling workloads and making provision decisions of computer resources in a computing environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
CB03 Change of inventor or designer information

Inventor after: Cen Wenchu

Inventor before: Fan Hangcheng

C53 Correction of patent for invention or patent application
COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: FAN HANGCHENG TO: CEN WENCHU

REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1161386

Country of ref document: HK

C14 Grant of patent or utility model
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: GR

Ref document number: 1161386

Country of ref document: HK