Summary of the invention
The inventors have found that, in related distributed processing approaches, each device is tied to a particular data shard, which makes scheduling multiple devices complicated: when a device goes down, processing of its data shard stalls and the normal execution of the task is affected.
One purpose of the disclosure is to improve the reliability of distributed data processing.
According to one aspect of some embodiments of the present disclosure, a data processing method is provided, comprising: putting tasks into a task queue; each data processing device fetching tasks from the task queue; and, after completing a task, the data processing device fetching another task from the task queue, until no unexecuted task remains in the task queue.
In some embodiments, each data processing device fetches from the task queue a number of tasks corresponding to the number of task-processing threads it is currently running.
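Fetching as many tasks as there are threads available can be sketched as follows. This is a minimal illustration, assuming Python's in-memory queue.Queue stands in for the shared task queue (the disclosure uses Redis) and that the caller already knows how many of its threads are idle; the function name fetch_tasks is hypothetical.

```python
import queue

def fetch_tasks(task_queue: queue.Queue, idle_threads: int) -> list:
    """Take at most `idle_threads` tasks without blocking; fewer if the queue runs dry."""
    fetched = []
    for _ in range(idle_threads):
        try:
            fetched.append(task_queue.get_nowait())
        except queue.Empty:
            break  # no unexecuted tasks left in the queue
    return fetched

# Usage: a device with 3 idle threads claims tasks in queue order.
q = queue.Queue()
for name in ["t1", "t2", "t3", "t4", "t5"]:
    q.put(name)
print(fetch_tasks(q, 3))  # ['t1', 't2', 't3']
print(fetch_tasks(q, 3))  # ['t4', 't5']
```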
In some embodiments, putting tasks into the task queue includes: obtaining information on each data table associated with a general task; splitting the general task according to the data table information and associating each resulting task with the information of its corresponding data table to generate subtasks; and putting the subtasks into the task queue, so that the data processing devices fetch and process them.
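Splitting a general task into one subtask per data table can be sketched like this. It is an illustration under assumptions: the data table information is reduced to a table name suffix, the Subtask fields and function names are hypothetical, and an in-memory queue.Queue stands in for the Redis task queue.

```python
import queue
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Subtask:
    parent: str        # name of the general task this subtask was split from
    table_suffix: str  # data table information: which table the subtask covers

def split_general_task(parent: str, table_suffixes: List[str]) -> List[Subtask]:
    """Split one general task into one subtask per associated data table."""
    return [Subtask(parent, suffix) for suffix in table_suffixes]

def enqueue_subtasks(task_queue: queue.Queue, subtasks: List[Subtask]) -> None:
    for sub in subtasks:
        task_queue.put(sub)

q = queue.Queue()
enqueue_subtasks(q, split_general_task("expire_points", ["_00", "_01", "_02"]))
print(q.qsize())             # 3
print(q.get().table_suffix)  # _00
```

Splitting per table means no two subtasks operate on the same table, which is what lets any device pick up any subtask.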
In some embodiments, the data processing method further includes: obtaining the data processing result of each task executed by each data processing device; and summarizing the data processing results.
In some embodiments, the data processing method further includes: preventing, according to a task suspension instruction, the task queue from providing tasks to the data processing devices.
In some embodiments, the data processing method further includes: allowing, according to a task execution instruction, the task queue to provide tasks to the data processing devices.
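The suspend and resume instructions can be modeled as a gate in front of the queue. This is a sketch, assuming a threading.Event acts as the gate and an in-memory queue.Queue stands in for Redis; the class and method names are hypothetical.

```python
import queue
import threading

class PausableTaskQueue:
    """Task queue whose hand-out can be suspended and resumed by the task manager."""

    def __init__(self) -> None:
        self._tasks = queue.Queue()
        self._open = threading.Event()
        self._open.set()  # tasks are handed out by default

    def put(self, task) -> None:
        self._tasks.put(task)

    def pause(self) -> None:   # task suspension instruction
        self._open.clear()

    def resume(self) -> None:  # task execution instruction
        self._open.set()

    def get(self, timeout: float = 0.1):
        """Return the next task, or None if paused or empty within `timeout` seconds."""
        if not self._open.wait(timeout):
            return None  # paused: the device idles instead of receiving work
        try:
            return self._tasks.get(timeout=timeout)
        except queue.Empty:
            return None

tq = PausableTaskQueue()
tq.put("subtask-1")
tq.pause()
print(tq.get())  # None
tq.resume()
print(tq.get())  # subtask-1
```

Pausing leaves queued tasks in place, so resuming continues exactly where hand-out stopped.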
In some embodiments, the data processing method further includes: each data processing device starting or stopping fetching tasks from the task queue according to a switch control.
In some embodiments, the task queue is implemented with Redis (REmote DIctionary Server).
With this method, multiple data processing devices each read tasks from the task queue and process them independently, so that tasks are executed in a distributed manner across devices. This avoids the constraint of binding devices to data shards and improves the reliability of distributed data processing.
According to another aspect of some embodiments of the present disclosure, a data processing system is provided, comprising: a task management device configured to put tasks into a task queue; and multiple data processing devices configured to fetch tasks from the task queue and, after completing a task, fetch another task from the task queue, until no unexecuted task remains in the task queue.
In some embodiments, each data processing device is configured to fetch from the task queue a number of tasks corresponding to the number of task-processing threads it is currently running.
In some embodiments, the task management device is configured to: obtain information on each data table associated with a general task; split the general task according to the data table information, associating each resulting task with the information of its corresponding data table to generate subtasks; and put the subtasks into the task queue, so that the data processing devices fetch and process them.
In some embodiments, the task management device is further configured to obtain the data processing result of each task executed by each data processing device, and to summarize the data processing results.
In some embodiments, the task management device is further configured to suspend providing tasks to the data processing devices according to a task suspension instruction.
In some embodiments, the task management device is further configured to allow tasks to be provided to the data processing devices according to a task execution instruction.
In some embodiments, each data processing device is further configured to start or stop fetching tasks from the task queue according to a switch control.
In some embodiments, the task queue is implemented with Redis.
According to yet another aspect of some embodiments of the present disclosure, a data processing system is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to perform any of the data processing methods above based on instructions stored in the memory.
Such a data processing system uses multiple data processing devices that each read tasks from the task queue and process them independently, so that tasks are executed in a distributed manner across devices. This avoids binding devices to data shards and improves the reliability of distributed data processing.
According to still another aspect of some embodiments of the present disclosure, a computer-readable storage medium is provided, having stored thereon computer program instructions that, when executed by a processor, implement the steps of any of the data processing methods above.
By executing the instructions on such a computer-readable storage medium, multiple data processing devices can each read tasks from the task queue and process them independently, so that tasks are executed in a distributed manner across devices. This avoids binding devices to data shards and improves the reliability of distributed data processing.
Detailed description of the embodiments
The technical solutions of the present disclosure are described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of an embodiment of the data processing method of the present disclosure.
In step 101, tasks are put into the task queue. In some embodiments, a general task may be split into several subtasks that are put into the task queue. In some embodiments, the task queue may be a Redis queue, thereby making full use of Redis to ensure both applicability to distributed databases and reliable operation of the data processing method.
In step 102, each data processing device fetches tasks from the task queue. In some embodiments, a data processing device may process data with multiple threads, each of which fetches its own tasks from the task queue and processes them, which reduces the deployment cost of the data processing devices and improves data processing efficiency.
In some embodiments, once started, a data processing device may actively request tasks from the task queue and fetch them in queue order. Data processing devices can be added at any time, and can also be shut down without interrupting the execution of tasks they have already fetched.
In step 103, the data processing device executes the fetched task; once the task is completed, step 104 is executed.
In step 104, the data processing device attempts to fetch a task from the task queue. If the task queue contains an unexecuted task, step 105 is executed; if it contains no unexecuted task, i.e., no task is waiting, step 106 is executed.
In step 105, the data processing device fetches another task from the task queue and then returns to step 103.
In step 106, the data processing device pauses fetching tasks until a task appears in the task queue again.
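The loop of steps 101 to 106 can be sketched with Python's standard library. This assumes threads stand in for separate data processing devices and an in-memory queue.Queue stands in for the Redis task queue; the name device_loop and the squaring "work" are hypothetical placeholders. For brevity, a device here exits when the queue is empty instead of waiting for new tasks as in step 106.

```python
import queue
import threading

def device_loop(device_id, task_queue, results, lock):
    """Keep taking tasks until the queue holds no unexecuted task."""
    while True:
        try:
            task = task_queue.get_nowait()  # steps 102/105: take a task
        except queue.Empty:
            return                          # step 106 (simplified): stop taking tasks
        outcome = task * task               # step 103: placeholder processing
        with lock:
            results[task] = (device_id, outcome)

task_queue = queue.Queue()
for n in range(20):                         # step 101: put tasks into the queue
    task_queue.put(n)

results, lock = {}, threading.Lock()
devices = [threading.Thread(target=device_loop, args=(d, task_queue, results, lock))
           for d in range(3)]
for t in devices:
    t.start()
for t in devices:
    t.join()

print(len(results))  # 20: every task processed exactly once
```

No device owns a fixed slice of the work: if one thread stopped early, the others would simply drain the remaining tasks.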
With this method, multiple data processing devices each read tasks from the task queue and process them independently, so that tasks are executed in a distributed manner across devices. This avoids the constraint of binding devices to data shards and improves the reliability of distributed data processing.
In some embodiments, the tasks put into the task queue have no required execution order, i.e., one task need not finish before another starts, so that each data processing device can take and execute tasks from the task queue according to its own processing capacity. In another embodiment, if tasks with a required execution order are put into the task queue, then after the first task is fetched by a data processing device, the task queue suspends providing tasks until the device that fetched the first task reports that it is complete; only then does the task queue allow data processing devices to fetch the subsequent task. This keeps the data processing results accurate.
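The ordered-task case can be sketched as a dispatcher that withholds the successor while its predecessor is in flight. This is an illustration only: the class and method names are hypothetical, and the sketch is single-threaded; a queue shared by several devices would need the check and hand-out to be performed atomically.

```python
from collections import deque

class OrderedDispatcher:
    """Hands out tasks that must run in sequence: while one task is in flight,
    no further task is provided until that task is reported complete."""

    def __init__(self, ordered_tasks):
        self._pending = deque(ordered_tasks)
        self._in_flight = None  # task fetched by a device but not yet completed

    def get(self):
        if self._in_flight is not None or not self._pending:
            return None  # hand-out suspended, or nothing left
        self._in_flight = self._pending.popleft()
        return self._in_flight

    def complete(self, task):
        if task == self._in_flight:
            self._in_flight = None  # predecessor done: successor may be fetched

d = OrderedDispatcher(["create_table", "load_rows"])
first = d.get()
print(first)    # create_table
print(d.get())  # None (load_rows is withheld)
d.complete(first)
print(d.get())  # load_rows
```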
In some embodiments, staff control the working state of the data processing devices, e.g., starting or shutting down devices and controlling the number of threads on each device, so that the computing resources of the distributed processing network can be configured according to the state of each device and the quantity or difficulty of the tasks to be processed, reducing wasted resources. In another embodiment, a data processing device can adjust its own operating state according to its current load, for example: when all threads are executing tasks, adding a new thread if the device's performance allows; when a thread is idle, closing that thread; and when the whole device is idle, entering a device sleep state. This further increases automation and reduces resource waste.
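The self-adjustment rules above can be expressed as a small policy function. This is a sketch under assumptions: the function name, the one-thread-at-a-time growth, and the max_threads cap standing in for "if the device's performance allows" are all hypothetical.

```python
from typing import Tuple

def adjust_threads(busy: int, idle: int, max_threads: int) -> Tuple[int, bool]:
    """Return (target_thread_count, should_sleep) for a device's current load.

    - every thread busy and capacity left: add one thread
    - some threads idle: close the idle ones
    - no thread busy: put the whole device to sleep
    """
    total = busy + idle
    if busy == 0:
        return 0, True
    if idle == 0 and total < max_threads:
        return total + 1, False
    return busy, False

print(adjust_threads(busy=4, idle=0, max_threads=8))  # (5, False)
print(adjust_threads(busy=2, idle=3, max_threads=8))  # (2, False)
print(adjust_threads(busy=0, idle=4, max_threads=8))  # (0, True)
```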
Fig. 2 is a flowchart of another embodiment of the data processing method of the present disclosure, including steps 201 to 207. The steps on the left are performed by the task management device, and those on the right by the data processing devices.
In step 201, information on each data table associated with the general task is obtained. In some embodiments, the data table information may include a table name suffix, which identifies the data tables that a task needs to operate on.
In step 202, the general task is split according to the data table information, and each resulting task is associated with the information of its corresponding data table to generate subtasks. In some embodiments, to avoid interference between tasks in the task queue, tasks may be split on a per-data-table basis.
In step 203, the subtasks are put into the task queue.
In step 204, each data processing device fetches a subtask from the task queue. The device can determine the data table associated with the task to be executed from the data table information in the subtask.
In step 205, the data processing device executes the fetched subtask.
In step 206, it is determined whether execution is complete. If the data processing device has finished executing the subtask, step 207 is executed; otherwise, execution of step 205 continues.
In step 207, the data processing device attempts to fetch a subtask from the task queue. If the task queue contains an unexecuted subtask, step 204 is executed; if it contains none, i.e., no task is waiting, the data processing device stops fetching tasks from the task queue.
With this method, a task can be split according to its related data tables, which makes it easy for data processing devices to fetch and execute subtasks without restricting which device executes a given task. This improves task execution efficiency as well as the reliability and execution efficiency of the distributed processing system.
In some embodiments, for some data processing tasks, after the subtasks for all data tables have been completed, the data obtained in the subtasks also needs to be summarized, so the data processing method may further include steps 208 and 209.
In step 208, the data processing result of each subtask is obtained from the task execution results of the data processing devices.
In step 209, the data processing results are summarized to obtain the final result. In some embodiments, the summarized result may undergo data integration processing and be stored or sent to a predetermined location as the task requires.
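Steps 208 and 209 can be sketched as merging per-table partial results into one total. This assumes each subtask returns additive numeric statistics keyed by name; the function name, the metric names, and the figures in the usage example are hypothetical illustrations, not data from the disclosure.

```python
from collections import Counter

def summarize(subtask_results):
    """Merge per-table partial results (name -> count) into one total."""
    total = Counter()
    for partial in subtask_results.values():
        total.update(partial)  # Counter.update adds counts for matching keys
    return dict(total)

partials = {
    "_00": {"active_users": 120, "expired_points": 3400},
    "_01": {"active_users": 95, "expired_points": 2100},
}
print(summarize(partials))  # {'active_users': 215, 'expired_points': 5500}
```

Non-additive results (lists of extracted rows, for example) would need a different merge, such as concatenation, before integration.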
With this method, the data processing results of the subtasks can be summarized so that the result of the general task is returned as a whole; splitting the task does not change the form in which the final result is obtained, which improves user-friendliness.
In some embodiments, a task may update data in data tables, e.g., periodically deducting users' expired points, or may extract required data from data tables, e.g., compiling user information into a data report. The data processing methods for these two cases are shown in Figs. 3A and 3B, respectively, where the left side of each flowchart is performed by the task management device and the right side by each data processing device. Fig. 3A is a flowchart of an embodiment of the data processing method of the present disclosure in the scenario of deducting users' expired points.
In step 311, the data tables holding the points of the users involved in the points processing task are determined.
In step 312, when the data is sharded across multiple databases and tables, the points processing task is split according to data table name suffixes, and each resulting task is associated with its table name suffix to generate points processing subtasks.
In step 313, the subtasks are put into the task queue, which may be a Redis queue.
In step 314, each data processing device fetches a points processing subtask from the task queue. The device can determine the data table associated with the task from the data table information in the subtask and deduct the users' expired points in that table.
In step 315, the subtask is executed according to its table name suffix, which may include first querying the corresponding batch of data by the subtask's table name and then changing the state of the corresponding points data. In some embodiments, a completed task may be put into a blocking queue marked as completed, or put back into the task queue marked as completed.
In some embodiments, a producer/consumer pattern is used here: the data processing device acts as the producer and has a corresponding task sub-thread as the consumer; after putting a batch into the queue, the producer loops to fetch the next batch of data, and the batches are processed in turn.
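The producer/consumer arrangement inside one device can be sketched with a bounded blocking queue between the device's main loop and its sub-thread. Assumptions: fetch_batch is a hypothetical stand-in that slices an in-memory list instead of querying the sharded table, the batch size and buffer size are arbitrary, and upper-casing is placeholder processing.

```python
import queue
import threading

SENTINEL = object()  # tells the consumer there are no more batches

def fetch_batch(table_suffix, offset, size, rows):
    """Hypothetical stand-in for querying one batch of rows from a table."""
    return rows[offset:offset + size]

def run_subtask(table_suffix, rows, batch_size=2):
    buffer = queue.Queue(maxsize=4)  # blocking queue between producer and consumer
    processed = []

    def consumer():  # the task sub-thread
        while True:
            batch = buffer.get()
            if batch is SENTINEL:
                return
            processed.extend(r.upper() for r in batch)  # placeholder processing

    worker = threading.Thread(target=consumer)
    worker.start()
    offset = 0
    while True:  # producer: loop over batches
        batch = fetch_batch(table_suffix, offset, batch_size, rows)
        if not batch:
            break
        buffer.put(batch)  # blocks if the consumer falls behind
        offset += batch_size
    buffer.put(SENTINEL)
    worker.join()
    return processed

print(run_subtask("_00", ["a", "b", "c", "d", "e"]))  # ['A', 'B', 'C', 'D', 'E']
```

The bounded buffer lets the next query overlap with processing of the previous batch while capping how many batches are held in memory.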
In some embodiments, when a data processing device or one of its threads fails to execute a subtask, the subtask can be retried; if it still fails after a predetermined number of retries, its status is set to failed and it is put back into the task queue, so that failed subtasks are handled together after the subtasks in the unprocessed state have been processed.
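Retry-then-requeue can be sketched as follows. The retry limit, the dictionary-based subtask, and the status strings are hypothetical, and an in-memory FIFO queue.Queue stands in for the Redis queue.

```python
import queue

MAX_RETRIES = 3  # hypothetical "predetermined number" of attempts

def execute_with_retry(subtask, run, task_queue):
    """Try run(subtask) up to MAX_RETRIES times; on persistent failure,
    mark the subtask failed and put it back into the task queue."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            run(subtask)
            subtask["status"] = "completed"
            return True
        except Exception:
            subtask["attempts"] = attempt
    subtask["status"] = "failed"
    task_queue.put(subtask)
    return False

def flaky(sub):
    raise RuntimeError("db timeout")

q = queue.Queue()
sub = {"table_suffix": "_07", "status": "unprocessed"}
print(execute_with_retry(sub, flaky, q))  # False
print(q.get()["status"])                  # failed
```

Because the queue is FIFO, a re-queued failed subtask lands behind any subtasks still waiting, so failures are naturally dealt with after the unprocessed work.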
In step 316, once the data processing device has finished executing the current task, step 317 is executed.
In step 317, it is determined whether the points processing task queue contains an unexecuted subtask. If it does, step 314 is executed to fetch and execute the next subtask in the task queue; if not, execution of the task is complete and data processing ends. In some embodiments, if there are subtasks in the failed state, they can be retried or staff can be prompted to handle them.
With this method, the points or other database data of a large number of users can be processed, and distributed task management can be implemented quickly without relying on a framework.
Fig. 3B is a flowchart of an embodiment of the data processing method of the present disclosure in the data report extraction scenario.
In step 321, the data tables associated with the data statistics task are determined.
In step 322, the data statistics task is split according to data table name suffixes, and each resulting task is associated with its table name suffix to generate data statistics subtasks.
In step 323, the data statistics subtasks are put into the task queue.
In step 324, a data processing device fetches a subtask from the task queue.
In step 325, the corresponding single-table data is extracted according to the table name suffix associated with the subtask, and the corresponding data statistics subtask is executed.
In some embodiments, a completed task may be put into a blocking queue marked as completed, or put back into the task queue marked as completed. In some embodiments, a producer/consumer pattern is used here: the data processing device acts as the producer and has a corresponding task sub-thread as the consumer; after putting a batch into the queue, the producer loops to fetch the next batch of data, and the batches are processed in turn.
In step 326, once the data processing device has finished executing the current task, step 327 is executed.
In step 327, it is determined whether the task queue is empty. If the queue is not empty, step 324 is executed to fetch and execute the next subtask in the task queue; if the queue is empty, step 328 is executed. In some embodiments, if there are subtasks in the failed state, they can be retried or staff can be prompted to handle them. In some embodiments, if processed subtasks were marked as completed and put back into the queue in step 325, this step instead determines whether the task queue still contains any subtask in the unprocessed state: if so, step 324 is executed; if not, step 328 is executed.
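In the variant where completed subtasks are put back into the queue with a status mark, the step 327 decision reduces to a status check. A minimal sketch, assuming each queued entry carries a "status" field with hypothetical values "unprocessed", "completed", or "failed"; the function name next_step is also hypothetical.

```python
def next_step(queue_snapshot):
    """Step 327, status-marking variant: choose the device's next step from the
    statuses of the subtasks currently held in the task queue."""
    if any(sub["status"] == "unprocessed" for sub in queue_snapshot):
        return 324  # unprocessed work remains: fetch the next subtask
    return 328      # only completed or failed entries remain: collect results

print(next_step([{"status": "completed"}, {"status": "unprocessed"}]))  # 324
print(next_step([{"status": "completed"}, {"status": "failed"}]))       # 328
```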
In step 328, the data statistics result of each task executed by each data processing device is obtained.
In step 329, the data statistics results are summarized to obtain the complete statistics. In some embodiments, the data extracted in all subtasks may be merged to produce report data, or the report may be sent by email or similar means to the staff who need it.
With this method, the degraded performance of computing statistics over database data when databases and tables are sharded can be addressed, and the efficiency of generating data reports is improved.
In some embodiments, a data processing device may be configured to query the task queue and fetch tasks at a preset frequency, so that a task entering the queue is discovered and executed promptly, improving task execution efficiency.
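Polling at a preset frequency can be sketched as below. Assumptions: an in-memory queue.Queue stands in for Redis, the polling interval and the stop-after-N-empty-polls rule are hypothetical (a real device would typically keep polling), and a threading.Timer simulates a task that enters the queue later.

```python
import queue
import threading
import time

def poll_for_tasks(task_queue, interval_s, max_idle_polls):
    """Check the queue every `interval_s` seconds; stop after `max_idle_polls`
    consecutive empty polls."""
    handled, idle = [], 0
    while idle < max_idle_polls:
        try:
            handled.append(task_queue.get_nowait())
            idle = 0
        except queue.Empty:
            idle += 1
            time.sleep(interval_s)  # wait one polling period before asking again
    return handled

q = queue.Queue()
q.put("early")
threading.Timer(0.05, q.put, args=("late",)).start()  # a task that arrives later
print(poll_for_tasks(q, interval_s=0.02, max_idle_polls=25))  # ['early', 'late']
```

A shorter interval finds new tasks sooner at the cost of more queries against the queue.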
In some embodiments, the task management device may be configured with a scheduling switch that can be turned on or off at any time. During task execution, developers may need to perform some data verification without wanting to shut the task down; in that case they can trigger a pause switch, which issues a task suspension instruction that puts the data processing devices into an idle waiting state and prevents the task queue from providing tasks to them. In some embodiments, controls may be provided so that staff can conveniently start, stop, or pause tasks at any time.
With this method, the flexibility and user-friendliness of task execution are improved.
In some embodiments, a management interface may be provided for operating and controlling at least one of the task management device and the data processing devices, making interaction more intuitive and convenient.
Fig. 4 is a schematic diagram of an embodiment of the data processing system of the present disclosure. The task management device 41 can put tasks into the task queue. In some embodiments, the task management device 41 can split a general task into several subtasks and put them into the task queue. The data processing devices 421 to 42n can fetch tasks from the task queue and, after completing a task, fetch another task from the task queue, until the task queue is empty. In some embodiments, a data processing device can process data with multiple threads, each of which fetches its own tasks from the task queue and processes them, which reduces the deployment cost of the data processing devices and improves data processing efficiency.
Such a data processing system uses multiple data processing devices that each read tasks from the task queue and process them independently, so that tasks are executed in a distributed manner across devices. This avoids binding devices to data shards and improves the reliability of distributed data processing.
In some embodiments, the task management device 41 may first obtain information on each data table associated with the general task, then split the general task according to the data table information, associate each resulting task with the information of its corresponding data table to generate subtasks, and put the subtasks into the task queue so that the data processing devices fetch and process them.
Such a data processing system can split a task according to its related data tables, which makes it easy for the data processing devices to fetch and execute subtasks. While improving task execution efficiency, it does not restrict which device executes a task, improving the reliability and execution efficiency of the distributed processing system.
In some embodiments, for some data processing tasks, after all subtasks belonging to the same task have been processed, the task management device 41 may also summarize the data obtained in the subtasks, so that the result of the general task is returned as a whole; splitting the task does not change the form in which the final result is obtained, which improves user-friendliness.
In some embodiments, the task management device 41 can suspend providing tasks to the data processing devices according to a task suspension instruction triggered by staff, allow tasks to be provided according to a task execution instruction, stop providing tasks according to a task stop instruction, and so on, improving the flexibility and user-friendliness of task execution.
In some embodiments, a data processing device can start fetching tasks from the task queue in response to a staff start operation and stop fetching tasks in response to a staff shutdown operation. It can also control its number of threads, or switch between active and sleep states, according to how busy the device or its threads are, thereby reducing deployment cost and improving data processing efficiency.
Fig. 5 is a structural schematic diagram of an embodiment of the data processing system of the present disclosure. Each part of the data processing system includes a memory 501 and a processor 502. The memory 501 may be a magnetic disk, flash memory, or any other non-volatile storage medium, and stores the instructions of the corresponding embodiments of the data processing method above. The processor 502 is coupled to the memory 501 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 502 executes the instructions stored in the memory, which can improve the reliability of distributed data processing.
In some embodiments, as shown in Fig. 6, the data processing system 600 includes a memory 601 and a processor 602. The processor 602 is coupled to the memory 601 through a bus 603. The data processing system 600 may also be connected to an external storage device 605 through a storage interface 604 to access external data, and may be connected to a network or another computer system (not shown) through a network interface 606. These are not described in detail here.
In this embodiment, instructions are stored in the memory and then executed by the processor, which can improve the reliability of distributed data processing.
In another embodiment, a computer-readable storage medium has computer program instructions stored thereon which, when executed by a processor, implement the steps of the method in the corresponding embodiments of the data processing method. Those skilled in the art should understand that embodiments of the present disclosure may be provided as a method, a device, or a computer program product. Accordingly, the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the disclosure may take the form of a computer program product implemented on one or more non-transitory computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The present disclosure has thus been described in detail. To avoid obscuring the concepts of the disclosure, some details well known in the art have not been described. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The methods and devices of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is for illustration only; the steps of the methods of the disclosure are not limited to the order specifically described above unless otherwise stated. Furthermore, in some embodiments, the disclosure may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the methods according to the disclosure. Thus, the disclosure also covers a recording medium storing programs for executing the methods according to the disclosure.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present disclosure. Although the disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific embodiments may still be modified, or some technical features may be replaced with equivalents, without departing from the spirit of the technical solutions of the disclosure; all such modifications and replacements shall fall within the scope of the technical solutions claimed by the disclosure.