CN110362401A - Batch data processing method and apparatus, storage medium, and member host in a cluster - Google Patents
Batch data processing method and apparatus, storage medium, and member host in a cluster
- Publication number
- CN110362401A CN110362401A CN201910553729.4A CN201910553729A CN110362401A CN 110362401 A CN110362401 A CN 110362401A CN 201910553729 A CN201910553729 A CN 201910553729A CN 110362401 A CN110362401 A CN 110362401A
- Authority
- CN
- China
- Prior art keywords
- subtask
- data
- cluster
- message
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/5044—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a batch data processing method and apparatus, a storage medium, and a member host in a cluster. The method comprises: obtaining the batch of pending data corresponding to the current task node, and randomly partitioning the batch of pending data to obtain a preset quantity of partition data; generating a subtask for each corresponding partition according to the preset quantity of partition data, storing the preset quantity of subtasks in the form of a queue into a preset queue list, and sending a task processing message through message middleware to each member host in the cluster, so that when each member host in the cluster hears the task processing message, it obtains a subtask from the preset queue list and processes it. Based on big data, the batch of data is processed by the cluster; tasks can be distributed to each member host in the cluster and executed simultaneously, realizing maximum use of resources. The data partitioning does not depend on a database, the number of subtasks can be extended arbitrarily, and the concurrent processing speed of the data is increased.
Description
Technical field
The present invention relates to the technical field of big data, and more particularly to a batch data processing method and apparatus, a storage medium, and a member host in a cluster.
Background technique
Currently, when large batches of data are processed, the batch application is deployed separately; when no batch is running, the idle batch application wastes resources. Large batch tasks are executed in a single thread, so task execution is slow. Data partitioning for large batch tasks depends on Oracle hash partitioning: once a table in Oracle reaches the order of hundreds of millions of rows, or a single table reaches 2 GB in size, query efficiency drops noticeably, and the table must be partitioned, splitting it along the row dimension to avoid a single table holding too much data. For data whose regularity is weak and whose value range is hard to determine, hash partitioning is imposed by force, the number of partitions must be set to a power of two, and scalability is poor. Therefore, how to improve the processing efficiency of large batches of data and realize maximum use of resources is a technical problem to be solved urgently.
The above content is only used to facilitate understanding of the technical solution of the present invention, and does not constitute an admission that the above content is prior art.
Summary of the invention
The main purpose of the present invention is to provide a batch data processing method and apparatus, a storage medium, and a member host in a cluster, aiming to solve the technical problem in the prior art that large-batch data processing is inefficient and resource utilization is low.
To achieve the above object, the present invention provides a batch data processing method, comprising the following steps:
Obtaining the batch of pending data corresponding to the current task node, and randomly partitioning the batch of pending data to obtain a preset quantity of partition data;
Generating a subtask for each corresponding partition according to the preset quantity of partition data, storing the preset quantity of subtasks in the form of a queue into a preset queue list, and sending a task processing message through message middleware to each member host in the cluster, so that when each member host in the cluster hears the task processing message, it obtains a subtask from the preset queue list and processes it.
Preferably, after the step of generating subtasks for the corresponding partitions according to the preset quantity of partition data, storing the preset quantity of subtasks in the form of a queue into a preset queue list, and sending a task processing message through message middleware to each member host in the cluster, so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list and processes it, the batch data processing method further comprises:
When a subtask is obtained from the preset queue list, subtracting one from the subtask count in the preset queue list to obtain the remaining subtask count.
Preferably, after the step of subtracting one from the subtask count in the preset queue list when a subtask is obtained from the preset queue list, to obtain the remaining subtask count, the batch data processing method further comprises:
When the remaining subtask count is zero, determining that the obtained subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending an all-subtasks-processed message through the message middleware to the cluster in cluster consumption mode.
Preferably, after the step of determining, when the remaining subtask count is zero, that the obtained subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending the all-subtasks-processed message through the message middleware to the cluster in cluster consumption mode, the batch data processing method further comprises:
Upon hearing the all-subtasks-processed message, judging whether all subtasks have been processed successfully; if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
Preferably, after the step of generating subtasks for the corresponding partitions according to the preset quantity of partition data, storing the preset quantity of subtasks in the form of a queue into a preset queue list, and sending a task processing message through message middleware to each member host in the cluster, so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list and processes it, the batch data processing method further comprises:
Listening in broadcast consumption mode, and, upon hearing the task processing message, obtaining a subtask from the preset queue list and processing it.
Preferably, the step of listening in broadcast consumption mode and, upon hearing the task processing message, obtaining a subtask from the preset queue list for processing comprises:
Listening in broadcast consumption mode;
Upon hearing the task processing message, obtaining a subtask from the preset queue list and processing it;
Blocking the subtask-obtaining function of the member hosts in the cluster other than the member host obtaining the subtask.
Preferably, the step of obtaining a subtask from the preset queue list for processing upon hearing the task processing message comprises:
Upon hearing the task processing message, calculating the CPU usage, obtaining multiple subtasks from the preset queue list according to the CPU usage, and processing the multiple subtasks concurrently with multiple threads.
In addition, to achieve the above object, the present invention also proposes a member host in a cluster. The member host comprises a memory, a processor, and a batch data processing program stored on the memory and executable on the processor, the batch data processing program being configured to implement the steps of the batch data processing method described above.
In addition, to achieve the above object, the present invention also proposes a storage medium on which a batch data processing program is stored; when the batch data processing program is executed by a processor, the steps of the batch data processing method described above are implemented.
In addition, to achieve the above object, the present invention also proposes a batch data processing apparatus, comprising:
A random partitioning module, configured to obtain the batch of pending data corresponding to the current task node and randomly partition the batch of pending data to obtain a preset quantity of partition data;
A generation module, configured to generate subtasks for the corresponding partitions according to the preset quantity of partition data, store the preset quantity of subtasks in the form of a queue into a preset queue list, and send a task processing message through message middleware to each member host in the cluster, so that when each member host in the cluster hears the task processing message, it obtains a subtask from the preset queue list and processes it.
In the present invention, the batch of pending data corresponding to the current task node is obtained and randomly partitioned to obtain a preset quantity of partition data; the partitioning does not depend on a database, the number of tasks can be extended arbitrarily, and the concurrent processing speed of the data is increased. Subtasks for the corresponding partitions are generated according to the preset quantity of partition data, the preset quantity of subtasks is stored in the form of a queue into a preset queue list, and a task processing message is sent through message middleware to each member host in the cluster, so that when each member host in the cluster hears the task processing message, it obtains a subtask from the preset queue list and processes it. Based on big data, the batch of data is processed by the cluster; tasks can be distributed and executed simultaneously, realizing maximum use of resources.
Detailed description of the invention
Fig. 1 is a schematic structural diagram of a member host in a cluster in the hardware running environment according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the batch data processing method of the present invention;
Fig. 3 is a schematic flowchart of a second embodiment of the batch data processing method of the present invention;
Fig. 4 is a schematic flowchart of a third embodiment of the batch data processing method of the present invention;
Fig. 5 is a structural block diagram of a first embodiment of the batch data processing apparatus of the present invention.
The realization of the object, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a member host in a cluster in the hardware running environment according to an embodiment of the present invention.
As shown in Fig. 1, the member host in the cluster may include: a processor 1001, such as a central processing unit (CPU); a communication bus 1002; a user interface 1003; a network interface 1004; and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display (Display); optionally, the user interface 1003 may also include standard wired and wireless interfaces, and in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless Fidelity (WI-FI) interface). The memory 1005 may be high-speed random access memory (RAM) or stable non-volatile memory (NVM), such as magnetic disk storage; optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
It will be understood by those skilled in the art that the structure shown in Fig. 1 does not constitute a limitation on the member host in the cluster, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a batch data processing program.
In the member host in the cluster shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and perform data communication with the background server; the user interface 1003 is mainly used to connect to user equipment; the member host in the cluster calls the batch data processing program stored in the memory 1005 through the processor 1001 and executes the batch data processing method provided by the embodiments of the present invention.
Based on the above hardware structure, embodiments of the batch data processing method of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the first embodiment of the batch data processing method of the present invention, and the first embodiment of the batch data processing method of the present invention is proposed.
In the first embodiment, the batch data processing method comprises the following steps:
Step S10: obtaining the batch of pending data corresponding to the current task node, and randomly partitioning the batch of pending data to obtain a preset quantity of partition data.
It should be understood that the execution subject of this embodiment is a member host in the cluster, where a member host in the cluster may be an electronic device such as a personal computer or a server. The cluster includes multiple member hosts, all of which have the same functions; tasks can be distributed among them and executed simultaneously, realizing maximum use of resources. The execution subject of this embodiment is any member host in the cluster. The batch task is started at a scheduled time through a preset open-source project, which is quartz. After the batch task starts, the current task node is obtained, and the batch of pending data corresponding to the current task node is randomly partitioned, generally into 62 partitions of partition data labeled a-z, A-Z, and 0-9, i.e. the preset quantity is 62; the partitioning can also be extended into a larger number of partitions. The data partitioning does not depend on a database, the number of subtasks can be extended arbitrarily, and the concurrent processing speed of the data is increased.
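The 62-way random partitioning described above can be sketched as follows. This is a minimal illustration in Python: the record names are hypothetical, and the patent does not prescribe how records map to labels beyond the assignment being random and independent of any database.

```python
import random
import string

# The 62 partition labels a-z, A-Z, 0-9 described above.
PARTITION_LABELS = (
    list(string.ascii_lowercase) + list(string.ascii_uppercase) + list(string.digits)
)

def random_partition(records, labels=PARTITION_LABELS):
    """Randomly assign each pending record to one of the preset partitions.

    Unlike database hash partitioning, the label list is arbitrary: extend
    it and the number of partitions (and later subtasks) grows with it.
    """
    partitions = {label: [] for label in labels}
    for record in records:
        partitions[random.choice(labels)].append(record)
    return partitions

batch = [f"order-{i}" for i in range(10_000)]  # hypothetical pending data
partitions = random_partition(batch)
```

Every record lands in exactly one partition, so the preset quantity of partition data (62 here) is produced without touching Oracle or any other database.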
Step S20: generating subtasks for the corresponding partitions according to the preset quantity of partition data, storing the preset quantity of subtasks in the form of a queue into a preset queue list, and sending a task processing message through message middleware to each member host in the cluster, so that when each member host in the cluster hears the task processing message, it obtains a subtask from the preset queue list and processes it.
It will be appreciated that after the partitioning of the batch of pending data is completed, the preset quantity of subtasks corresponding to the partitions is generated. Usually the preset quantity is 62, i.e. 62 subtasks labeled a-z, A-Z, and 0-9; the number of subtasks can also be extended into a larger number. These 62 subtasks are put into the preset queue list in the manner of a queue; the preset queue list is a redis queue list. The task processing message is then sent through the message middleware to inform each host in the cluster that there are tasks to be processed.
It should be noted that the cluster listens in broadcast consumption mode, so every member host in the cluster receives the task processing message, then obtains a subtask from the redis queue list and calls the processing class corresponding to the obtained subtask for processing. The processing class is usually the algorithm or rule corresponding to the subtask; the subtask is executed according to the processing class to obtain result data. For example, if the subtask is to calculate the commission of order A, and the type of order A is a consumer loan, then the processing class is the commission calculation formula for consumer-loan orders; the commission of order A is calculated according to that formula, and when the calculation is complete, the subtask processing is complete.
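The enqueue-then-consume flow above can be sketched as follows. An in-memory deque stands in for the redis queue list, the middleware publish is stubbed out in a comment, and the consumer-loan commission rate of 1.5% is an invented placeholder, not a figure from the patent.

```python
from collections import deque

# Stand-in for the redis queue list; RPUSH/LPOP become append/popleft.
queue_list = deque()

# Registry of processing classes: order type -> handler. The 1.5%
# consumer-loan commission formula is a hypothetical example.
def consumer_loan_commission(order):
    return round(order["amount"] * 0.015, 2)

HANDLERS = {"consumer_loan": consumer_loan_commission}

def enqueue_subtasks(partitions):
    """Generate one subtask per partition and push it onto the queue list."""
    for label, records in partitions.items():
        queue_list.append({"label": label, "records": records})
    # Here the producer would also publish the "task processing" message
    # through the message middleware; that step is stubbed out.

def consume_one():
    """What a member host does on hearing the task processing message."""
    subtask = queue_list.popleft()
    results = []
    for order in subtask["records"]:
        handler = HANDLERS[order["type"]]  # look up the processing class
        results.append(handler(order))
    return subtask["label"], results

enqueue_subtasks({"a": [{"type": "consumer_loan", "amount": 1000.0}]})
label, results = consume_one()
```

The key property illustrated is the separation of concerns: the queue list only carries the subtask, while the processing class chosen per record does the actual calculation.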
In a specific implementation, when each subtask is executed, processing errors and abnormal data are logged. A large number of sample abnormal data items and corresponding sample abnormal points may be obtained in advance, and a convolutional neural network model may learn the sample abnormal data and corresponding sample abnormal points to obtain an abnormality recognition model. The abnormality recognition model identifies errors or abnormal data in the log records, locates the abnormal points, and performs abnormality handling, so that the subtask is completed successfully.
In this embodiment, the batch of pending data corresponding to the current task node is obtained and randomly partitioned to obtain a preset quantity of partition data; the partitioning does not depend on a database, the number of tasks can be extended arbitrarily, and the concurrent processing speed of the data is increased. Subtasks for the corresponding partitions are generated according to the preset quantity of partition data, the preset quantity of subtasks is stored in the form of a queue into a preset queue list, and a task processing message is sent through message middleware to each member host in the cluster, so that when each member host hears the task processing message, it obtains a subtask from the preset queue list and processes it. Based on big data, the batch of data is processed by the cluster; tasks can be distributed and executed simultaneously, realizing maximum use of resources.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of the second embodiment of the batch data processing method of the present invention. Based on the first embodiment shown in Fig. 2 above, the second embodiment of the batch data processing method of the present invention is proposed.
In the second embodiment, after the step S20, the method further includes:
Step S30: when a subtask is obtained from the preset queue list, subtracting one from the subtask count in the preset queue list to obtain the remaining subtask count.
It should be understood that, to prevent a subtask from being obtained or executed more than once, each member host obtains subtasks from the preset queue list under a distributed lock implemented with a redis+lua script. When any member host B obtains a subtask from the preset queue list, the subtask-obtaining function of the other member hosts is blocked, i.e. no other member host is allowed to obtain a subtask at that moment, which keeps the data synchronized while greatly reducing the time overhead introduced by locking. When member host B obtains a subtask from the preset queue list, one is subtracted from the preset quantity in the preset queue list to obtain the remaining subtask count; for example, if the preset quantity is 62 and member host B obtains one subtask from the preset queue list, the remaining subtask count is 61. After member host B has obtained a subtask from the preset queue list, the subtask-obtaining function is reopened to the other member hosts, which can then obtain subtasks from the preset queue list for processing until all subtasks in the preset queue list have been obtained and executed.
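The obtain-and-decrement step can be sketched as follows. In the patent the atomicity comes from a distributed lock built with a redis+lua script; here a local `threading.Lock` stands in for that lock, so the pop and the remaining-count read happen as one unit and no two hosts can obtain the same subtask.

```python
import threading
from collections import deque

class LockedQueueList:
    """In-memory stand-in for the redis queue list guarded by the
    redis+lua distributed lock described above."""

    def __init__(self, subtasks):
        self._queue = deque(subtasks)
        self._lock = threading.Lock()  # stands in for the redis+lua lock

    def obtain_subtask(self):
        # While one host holds the lock, the obtain function of every
        # other host is blocked; it reopens when the lock is released.
        with self._lock:
            if not self._queue:
                return None, 0
            subtask = self._queue.popleft()
            remaining = len(self._queue)  # count after subtracting one
            return subtask, remaining

queue = LockedQueueList([f"subtask-{c}" for c in "abc"])
first, remaining = queue.obtain_subtask()  # remaining drops from 3 to 2
```

When `remaining` reaches zero, the host that obtained that subtask knows it holds the last one, which is exactly the condition step S40 builds on.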
In the second embodiment, after the step S30, the method further includes:
Step S40: when the remaining subtask count is zero, determining that the obtained subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending an all-subtasks-processed message through the message middleware to the cluster in cluster consumption mode.
It should be noted that when any member host C in the cluster obtains a subtask from the preset queue list, one is subtracted from the subtask count in the preset queue list to obtain the remaining subtask count. If the remaining subtask count is zero, the subtask obtained by member host C from the preset queue list is the last subtask; when member host C has finished processing the last subtask, it sends the all-subtasks-processed message through the message middleware to the cluster in cluster consumption mode.
In a specific implementation, so that each host in the cluster can continuously and efficiently process more tasks, when member host C in the cluster has finished processing the last subtask, it sends the all-subtasks-processed message through the message middleware to the cluster, so that each host in the cluster can proceed to obtain the next task node and process that node's tasks. The all-subtasks-processed message uses cluster consumption mode, so only one host hears it.
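The two consumption modes can be illustrated with a toy broker: broadcast delivery for the task processing message, single-consumer delivery for the completion message. The round-robin choice of the single consumer is an assumption; the text only requires that exactly one host hears a cluster-consumed message.

```python
import itertools

class Broker:
    """Toy message middleware illustrating the two consumption modes:
    broadcast delivers to every member host, cluster consumption
    delivers each message to exactly one host."""

    def __init__(self, hosts):
        self.hosts = hosts
        self._next = itertools.cycle(hosts)  # round-robin pick (assumption)

    def broadcast(self, message):
        for host in self.hosts:
            host.append(message)

    def cluster_consume(self, message):
        next(self._next).append(message)

hosts = [[], [], []]  # each list is one member host's inbox
broker = Broker(hosts)
broker.broadcast("task processing")               # all three hosts hear it
broker.cluster_consume("all subtasks processed")  # exactly one host hears it
```

This mirrors the division of labor in the text: every host must learn that subtasks exist, but only one host (member host D in the description below) needs to react to completion by advancing to the next task node.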
In the second embodiment, after the step S40, the method further includes:
Upon hearing the all-subtasks-processed message, judging whether all subtasks have been processed successfully; if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
It will be appreciated that member host C is the member host in the cluster that processed the last subtask, and member host C sends the all-subtasks-processed message through the message middleware. The consumption mode of the all-subtasks-processed message is cluster consumption, so only one member host, for example member host D, hears the message; member host D may be any member host in the cluster.
In a specific implementation, when member host D hears the all-subtasks-processed message, it judges whether every subtask has been processed successfully. If all subtasks have been processed successfully, the next task node is entered and executed automatically; if not all subtasks have been processed successfully, obtaining the next task node is suspended while the problem is investigated. Once the problem has been investigated and handled, the task processing system is woken up again and continues by obtaining the next task node and executing the next task. Subtask data processing is usually monitored, and if abnormal data is detected, an alert is raised and the error is checked manually.
In the second embodiment, when a subtask is obtained from the preset queue list, one is subtracted from the subtask count in the preset queue list to obtain the remaining subtask count; when the remaining subtask count is zero, the obtained subtask is determined to be the last subtask, the processing progress of the last subtask is monitored, and when the last subtask is heard to have been processed, the all-subtasks-processed message is sent through the message middleware to the cluster in cluster consumption mode. By counting the subtasks in the queue list, the execution progress of the subtasks is tracked, and when all subtasks have been processed, the completion message is sent promptly, so that the member hosts in the cluster obtain the next task node and process the next task in time, improving the processing efficiency of large batches of data.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of the third embodiment of the batch data processing method of the present invention. Based on the second embodiment shown in Fig. 3 above, the third embodiment of the batch data processing method of the present invention is proposed.
In the third embodiment, after the step S20, the method further includes:
Step S201: listening in broadcast consumption mode, and, upon hearing the task processing message, obtaining a subtask from the preset queue list and processing it.
It should be understood that the cluster includes multiple member hosts, all of which have the same functions; tasks can be distributed among them and executed simultaneously, realizing maximum use of resources. Each member host in the cluster listens in broadcast consumption mode, so every host in the cluster receives the task processing message, then obtains a subtask from the redis queue list and calls the corresponding processing class for processing, logging any processing errors or abnormal data.
Further, the step S201 comprises:
Listening in broadcast consumption mode;
Upon hearing the task processing message, obtaining a subtask from the preset queue list and processing it;
Blocking the subtask-obtaining function of the member hosts in the cluster other than the member host obtaining the subtask.
It should be noted that all hosts in the cluster can be assigned tasks and can execute tasks simultaneously. To prevent a subtask from being obtained or executed more than once, each member host obtains subtasks from the preset queue list under a distributed lock implemented with a redis+lua script. When any member host B obtains a subtask from the preset queue list, the subtask-obtaining function of the member hosts in the cluster other than the member host obtaining the subtask is blocked, and no other member host is allowed to obtain a subtask at that moment. After member host B has obtained one or more subtasks from the preset queue list, the subtask-obtaining function is reopened to the other member hosts, which can then obtain subtasks from the preset queue list for processing until all subtasks in the preset queue list have been obtained and executed. Implementing the distributed lock with a redis+lua script greatly reduces the time overhead introduced by locking while keeping the data synchronized.
In this embodiment, the step of obtaining a subtask from the preset queue list for processing upon hearing the task processing message comprises:
Upon hearing the task processing message, calculating the CPU usage, obtaining multiple subtasks from the preset queue list according to the CPU usage, and processing the multiple subtasks concurrently with multiple threads.
It will be appreciated that each member host in the cluster can process the subtasks of different partitions at the same time, and each member host processes tasks concurrently with multiple threads, so processing is fast. The number of concurrently processed tasks is determined by the CPU occupancy; usually five threads process concurrently, which increases the processing speed of the batch subtasks. Each host in the cluster can calculate its own CPU usage and, based on that CPU usage, judge how many subtasks to obtain, obtaining multiple subtasks only on the premise that processing speed is not affected, so that multiple threads concurrently process the obtained subtasks and the processing efficiency of the subtasks in the preset queue list is improved.
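The CPU-aware claim size plus the usual five-thread pool can be sketched as follows. The usage thresholds are illustrative assumptions; the text only says that the number of subtasks obtained follows the host's CPU usage and that five concurrent threads are typical.

```python
from concurrent.futures import ThreadPoolExecutor

def subtasks_to_claim(cpu_usage_percent, max_claim=5):
    """Map the host's CPU usage to how many subtasks to obtain at once.

    The 90%/70% thresholds are invented for illustration: a busy host
    claims fewer subtasks so processing speed is not affected.
    """
    if cpu_usage_percent >= 90:
        return 1
    if cpu_usage_percent >= 70:
        return 2
    return max_claim

def process(subtask):
    # Placeholder for calling the subtask's processing class.
    return f"done:{subtask}"

claim = subtasks_to_claim(cpu_usage_percent=40)   # lightly loaded host
subtasks = [f"subtask-{i}" for i in range(claim)]
with ThreadPoolExecutor(max_workers=5) as pool:   # the usual five threads
    results = list(pool.map(process, subtasks))
```

A host near saturation thus claims a single subtask, while an idle host claims a full batch and fans it out across the thread pool.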
In this embodiment, listening is performed in broadcast consumption mode; upon hearing the task processing message, a subtask is obtained from the preset queue list and processed. All member hosts in the cluster can receive the task processing message, and the member hosts in the cluster can be assigned tasks and can execute tasks simultaneously, realizing maximum use of resources and also improving subtask processing efficiency.
In addition, an embodiment of the present invention also proposes a storage medium on which a batch data execution program is stored; when the batch data execution program is executed by a processor, the steps of the batch data execution method described above are implemented.
In addition, referring to Fig. 5, an embodiment of the present invention also proposes a batch data execution device, which includes:
a random partition module 10, configured to obtain the batch of pending data corresponding to the current task node and randomly partition the batch of pending data to obtain a preset number of partitions of data.
It should be understood that the cluster includes multiple member hosts which all have the same functions and can distribute tasks and execute them simultaneously, maximizing resource usage. A batch task is started on schedule by a preset open-source component; the preset open-source component is Quartz. After the batch task starts, the current task node is obtained and the batch of pending data corresponding to the current task node is randomly partitioned, generally into the 62 partitions a-z, A-Z and 0-9, i.e. the preset number is 62. The partitioning can also be extended into a larger number of partitions. Since the data partitioning does not depend on the database, the number of subtasks can be extended arbitrarily, increasing the concurrent processing speed of the data.
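The patent only says the pending data is "randomly partitioned" into the 62 labels; the sketch below assumes uniform random assignment of each record to a label, with the label set kept extensible as described:

```python
import string
import random

# The 62 partition labels a-z, A-Z, 0-9 described above.
PARTITION_LABELS = string.ascii_lowercase + string.ascii_uppercase + string.digits

def partition_batch(records, labels=PARTITION_LABELS, seed=None):
    """Randomly assign each pending record to one of the preset partitions.
    No database is involved, so `labels` can be extended arbitrarily to
    raise the number of subtasks and the degree of concurrency."""
    rng = random.Random(seed)
    partitions = {label: [] for label in labels}
    for record in records:
        partitions[rng.choice(labels)].append(record)
    return partitions

parts = partition_batch(range(1000), seed=42)
```

Each partition then becomes one subtask, so a larger label set directly yields more units of parallel work.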
a generation module 20, configured to generate subtasks for the corresponding partitions according to the preset number of partitions of data, store the preset number of subtasks into a preset queue list in the form of a queue, and send a task processing message to each member host in the cluster through message middleware, so that when each member host in the cluster hears the task processing message it obtains subtasks from the preset queue list for processing.
It will be appreciated that after the batch of pending data has been partitioned, the preset number of subtasks corresponding to the partitions is generated; the preset number is usually 62, i.e. 62 subtasks labeled a-z, A-Z and 0-9, and the number of subtasks can also be expanded into a larger number. These 62 subtasks are put into the preset queue list in the form of a queue; the preset queue list is a redis queue list. The task processing message is then sent through the message middleware, informing each host in the cluster that there are tasks to process.
It should be noted that the cluster listens in broadcast consumption mode, so every member host in the cluster receives the task processing message, then obtains a subtask from the redis queue list and invokes the processing class corresponding to the obtained subtask; the processing class is usually the algorithm, rule, etc. that the subtask executes. The subtask is executed according to the processing class to obtain the result data. For example, suppose the subtask is calculating the commission of order A and the type of order A is a consumer loan; the processing class is then the commission calculation formula for consumer-loan orders, the commission of order A is calculated according to that formula, and once the calculation finishes the subtask is complete.
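The commission formulas themselves are not disclosed, so the sketch below uses hypothetical rates purely to illustrate the dispatch from order type to processing class that the example describes:

```python
# Hypothetical processing classes keyed by order type; the rates are
# illustrative assumptions, not values from the patent.
def consumer_loan_commission(order):
    return round(order["amount"] * 0.015, 2)   # assumed 1.5% rate

def default_commission(order):
    return round(order["amount"] * 0.01, 2)    # assumed 1% fallback rate

PROCESSING_CLASSES = {
    "consumer_loan": consumer_loan_commission,
}

def execute_subtask(order):
    """Look up the processing class for the order's type and run it."""
    handler = PROCESSING_CLASSES.get(order["type"], default_commission)
    return handler(order)

order_a = {"id": "A", "type": "consumer_loan", "amount": 10000.0}
commission = execute_subtask(order_a)
```

New order types are supported by registering another entry in `PROCESSING_CLASSES`, without touching the subtask execution path.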
In a concrete implementation, errors and abnormal data encountered while executing each subtask are logged. A large number of sample abnormal-data records and the corresponding sample abnormal points can be obtained in advance, and a convolutional neural network model learns the sample abnormal data and the corresponding sample abnormal points to obtain an anomaly recognition model. The anomaly recognition model identifies errors and abnormal data in the log records and performs exception handling, locating the abnormal points so that the subtask can be processed successfully.
In the present embodiment, the batch of pending data corresponding to the current task node is obtained and randomly partitioned into a preset number of partitions of data; since the partitioning does not depend on the database, the number of tasks can be extended arbitrarily and the concurrent processing speed of the data increases. Subtasks for the corresponding partitions are generated according to the preset number of partitions of data, the preset number of subtasks is stored into a preset queue list in the form of a queue, and a task processing message is sent to each member host in the cluster through message middleware, so that when each member host in the cluster hears the task processing message it obtains subtasks from the preset queue list for processing. Based on big data, large volumes of data are processed by the cluster, and distributed tasks can also be executed simultaneously, maximizing resource usage.
In one embodiment, the batch data execution device further includes:
a computing module, configured to subtract one from the subtask count in the preset queue list when a subtask is obtained from the preset queue list, obtaining the remaining subtask count.
In one embodiment, the batch data execution device further includes:
a sending module, configured to, when the remaining subtask count is zero, regard the obtained subtask as the last subtask, monitor the processing progress of the last subtask and, upon hearing that the last subtask has been processed, send an all-subtasks-processed message to the cluster through the message middleware in cluster consumption mode.
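The computing and sending modules above can be sketched together as a small tracker; the class and message name are hypothetical, and the actual middleware send is stubbed as appending to a list:

```python
from dataclasses import dataclass, field

@dataclass
class SubtaskTracker:
    """Tracks the remaining-subtask count and records when the host that
    took the last subtask should send the all-processed message."""
    remaining: int
    messages: list = field(default_factory=list)

    def on_subtask_obtained(self):
        self.remaining -= 1      # computing module: subtract one from the count
        return self.remaining    # the remaining subtask count

    def on_subtask_finished(self, was_last):
        if was_last:             # sending module: fires exactly once, in
            self.messages.append("ALL_SUBTASKS_PROCESSED")  # cluster mode

tracker = SubtaskTracker(remaining=3)
for _ in range(3):
    left = tracker.on_subtask_obtained()
    tracker.on_subtask_finished(was_last=(left == 0))
```

Only the host whose fetch drives the count to zero sends the completion message, so exactly one message reaches the cluster regardless of how many hosts processed subtasks.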
In one embodiment, the batch data execution device further includes:
an obtaining module, configured to, upon hearing the all-subtasks-processed message, judge whether all subtasks were processed successfully and, if all subtasks were processed successfully, obtain the next task node and execute the next task.
In one embodiment, the obtaining module is further configured to listen in broadcast consumption mode and, upon hearing the task processing message, obtain subtasks from the preset queue list for processing.
In one embodiment, the obtaining module is further configured to listen in broadcast consumption mode; upon hearing the task processing message, obtain subtasks from the preset queue list for processing; and block the subtask-obtaining function of the other member hosts in the cluster except the member host currently obtaining subtasks.
In one embodiment, the obtaining module is further configured to calculate the CPU usage upon hearing the task processing message, obtain multiple subtasks from the preset queue list according to the CPU usage, and process the multiple subtasks concurrently with multiple threads.
For other embodiments or specific implementations of the batch data execution device of the present invention, refer to the method embodiments above; details are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or system that includes that element.
The serial numbers of the above embodiments of the invention are for description only and do not represent the merits of the embodiments. In a unit claim enumerating several devices, several of these devices can be embodied by the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as labels.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a read-only memory (ROM)/random access memory (RAM), magnetic disk or optical disc) and including several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A batch data execution method, characterized in that the batch data execution method comprises the following steps:
obtaining the batch of pending data corresponding to a current task node, randomly partitioning the batch of pending data, and obtaining a preset number of partitions of data;
generating subtasks for the corresponding partitions according to the preset number of partitions of data, storing the preset number of subtasks into a preset queue list in the form of a queue, and sending a task processing message to each member host in a cluster through message middleware, so that when each member host in the cluster hears the task processing message it obtains subtasks from the preset queue list for processing.
2. The batch data execution method of claim 1, characterized in that after generating subtasks for the corresponding partitions according to the preset number of partitions of data, storing the preset number of subtasks into a preset queue list in the form of a queue, and sending a task processing message to each member host in the cluster through message middleware so that each member host in the cluster obtains subtasks from the preset queue list for processing upon hearing the task processing message, the batch data execution method further comprises:
when a subtask is obtained from the preset queue list, subtracting one from the subtask count in the preset queue list to obtain the remaining subtask count.
3. The batch data execution method of claim 2, characterized in that after subtracting one from the subtask count in the preset queue list when a subtask is obtained from the preset queue list to obtain the remaining subtask count, the batch data execution method further comprises:
when the remaining subtask count is zero, regarding the obtained subtask as the last subtask, monitoring the processing progress of the last subtask and, upon hearing that the last subtask has been processed, sending an all-subtasks-processed message to the cluster through the message middleware in cluster consumption mode.
4. The batch data execution method of claim 3, characterized in that after regarding the obtained subtask as the last subtask when the remaining subtask count is zero, monitoring the processing progress of the last subtask and, upon hearing that the last subtask has been processed, sending an all-subtasks-processed message to the cluster through the message middleware in cluster consumption mode, the batch data execution method further comprises:
upon hearing the all-subtasks-processed message, judging whether all subtasks were processed successfully and, if all subtasks were processed successfully, obtaining the next task node and executing the next task.
5. The batch data execution method of any one of claims 1-4, characterized in that after generating subtasks for the corresponding partitions according to the preset number of partitions of data, storing the preset number of subtasks into a preset queue list in the form of a queue, and sending a task processing message to each member host in the cluster through message middleware so that each member host in the cluster obtains subtasks from the preset queue list for processing upon hearing the task processing message, the batch data execution method further comprises:
listening in broadcast consumption mode and, upon hearing the task processing message, obtaining subtasks from the preset queue list for processing.
6. The batch data execution method of claim 5, characterized in that listening in broadcast consumption mode and obtaining subtasks from the preset queue list for processing upon hearing the task processing message comprises:
listening in broadcast consumption mode;
upon hearing the task processing message, obtaining subtasks from the preset queue list for processing;
blocking the subtask-obtaining function of the other member hosts in the cluster except the member host currently obtaining subtasks.
7. The batch data execution method of claim 6, characterized in that obtaining subtasks from the preset queue list for processing upon hearing the task processing message comprises:
upon hearing the task processing message, calculating the CPU usage, obtaining multiple subtasks from the preset queue list according to the CPU usage, and processing the multiple subtasks concurrently with multiple threads.
8. A member host in a cluster, characterized in that the member host in the cluster comprises: a memory, a processor, and a batch data execution program stored on the memory and runnable on the processor, wherein when the batch data execution program is executed by the processor, the steps of the batch data execution method of any one of claims 1 to 7 are implemented.
9. A storage medium, characterized in that a batch data execution program is stored on the storage medium, and when the batch data execution program is executed by a processor, the steps of the batch data execution method of any one of claims 1 to 7 are implemented.
10. A batch data execution device, characterized in that the batch data execution device comprises:
a random partition module, configured to obtain the batch of pending data corresponding to a current task node, randomly partition the batch of pending data, and obtain a preset number of partitions of data;
a generation module, configured to generate subtasks for the corresponding partitions according to the preset number of partitions of data, store the preset number of subtasks into a preset queue list in the form of a queue, and send a task processing message to each member host in a cluster through message middleware, so that when each member host in the cluster hears the task processing message it obtains subtasks from the preset queue list for processing.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910553729.4A CN110362401A (en) | 2019-06-20 | 2019-06-20 | Data run the member host in batch method, apparatus, storage medium and cluster |
PCT/CN2019/121210 WO2020253116A1 (en) | 2019-06-20 | 2019-11-27 | Batch data execution method, device, storage medium, and member host in cluster |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910553729.4A CN110362401A (en) | 2019-06-20 | 2019-06-20 | Data run the member host in batch method, apparatus, storage medium and cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110362401A true CN110362401A (en) | 2019-10-22 |
Family
ID=68217029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910553729.4A Pending CN110362401A (en) | 2019-06-20 | 2019-06-20 | Data run the member host in batch method, apparatus, storage medium and cluster |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110362401A (en) |
WO (1) | WO2020253116A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114168275B (en) * | 2021-10-28 | 2022-10-18 | 厦门国际银行股份有限公司 | Task scheduling method, system, terminal device and storage medium |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100306776A1 (en) * | 2009-05-28 | 2010-12-02 | Palo Alto Research Center Incorporated | Data center batch job quality of service control |
US20140082170A1 (en) * | 2012-09-19 | 2014-03-20 | Oracle International Corporation | System and method for small batching processing of usage requests |
CN104092794A (en) * | 2014-07-25 | 2014-10-08 | 中国工商银行股份有限公司 | Batch course processing method and system |
CN106648850A (en) * | 2015-11-02 | 2017-05-10 | 佳能株式会社 | Information processing apparatus and method of controlling the same |
US20170242726A1 (en) * | 2016-02-18 | 2017-08-24 | Red Hat, Inc. | Batched commit in distributed transactions |
CN107291911A (en) * | 2017-06-26 | 2017-10-24 | 北京奇艺世纪科技有限公司 | A kind of method for detecting abnormality and device |
CN108255619A (en) * | 2017-12-28 | 2018-07-06 | 新华三大数据技术有限公司 | A kind of data processing method and device |
CN108564167A (en) * | 2018-04-09 | 2018-09-21 | 杭州乾圆科技有限公司 | The recognition methods of abnormal data among a kind of data set |
CN108733477A (en) * | 2017-04-20 | 2018-11-02 | 中国移动通信集团湖北有限公司 | The method, apparatus and equipment of data clusterization processing |
CN108985632A (en) * | 2018-07-16 | 2018-12-11 | 国网上海市电力公司 | A kind of electricity consumption data abnormality detection model based on isolated forest algorithm |
CN109144731A (en) * | 2018-08-31 | 2019-01-04 | 中国平安人寿保险股份有限公司 | Data processing method, device, computer equipment and storage medium |
CN109298225A (en) * | 2018-09-29 | 2019-02-01 | 国网四川省电力公司电力科学研究院 | A kind of voltage metric data abnormality automatic identification model and method |
CN109299135A (en) * | 2018-11-26 | 2019-02-01 | 平安科技(深圳)有限公司 | Abnormal inquiry recognition methods, identification equipment and medium based on identification model |
CN109558600A (en) * | 2018-11-14 | 2019-04-02 | 北京字节跳动网络技术有限公司 | Translation processing method and device |
CN109672627A (en) * | 2018-09-26 | 2019-04-23 | 深圳壹账通智能科技有限公司 | Method for processing business, platform, equipment and storage medium based on cluster server |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8245081B2 (en) * | 2010-02-10 | 2012-08-14 | Vmware, Inc. | Error reporting through observation correlation |
CN109461068A (en) * | 2018-09-13 | 2019-03-12 | 深圳壹账通智能科技有限公司 | Judgment method, device, equipment and the computer readable storage medium of fraud |
CN110362401A (en) * | 2019-06-20 | 2019-10-22 | 深圳壹账通智能科技有限公司 | Data run the member host in batch method, apparatus, storage medium and cluster |
- 2019-06-20: CN application CN201910553729.4A filed (status: pending)
- 2019-11-27: WO application PCT/CN2019/121210 filed (application filing)
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020253116A1 (en) * | 2019-06-20 | 2020-12-24 | 深圳壹账通智能科技有限公司 | Batch data execution method, device, storage medium, and member host in cluster |
CN113568761A (en) * | 2020-04-28 | 2021-10-29 | 中国联合网络通信集团有限公司 | Data processing method, device, equipment and storage medium |
CN113568761B (en) * | 2020-04-28 | 2023-06-27 | 中国联合网络通信集团有限公司 | Data processing method, device, equipment and storage medium |
CN111679920A (en) * | 2020-06-08 | 2020-09-18 | 中国银行股份有限公司 | Method and device for processing batch equity data |
CN112148505A (en) * | 2020-09-18 | 2020-12-29 | 京东数字科技控股股份有限公司 | Data batching system, method, electronic device and storage medium |
CN113537937A (en) * | 2021-07-16 | 2021-10-22 | 重庆富民银行股份有限公司 | Task arrangement method, device and equipment based on topological sorting and storage medium |
CN113485812A (en) * | 2021-07-23 | 2021-10-08 | 重庆富民银行股份有限公司 | Partition parallel processing method and system based on large data volume task |
CN113485812B (en) * | 2021-07-23 | 2023-12-12 | 重庆富民银行股份有限公司 | Partition parallel processing method and system based on large-data-volume task |
CN114240109A (en) * | 2021-12-06 | 2022-03-25 | 中电金信软件有限公司 | Method, device and system for cross-region processing batch running task |
CN116501499A (en) * | 2023-05-17 | 2023-07-28 | 建信金融科技有限责任公司 | Data batch running method and device, electronic equipment and storage medium |
CN116501499B (en) * | 2023-05-17 | 2023-09-19 | 建信金融科技有限责任公司 | Data batch running method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2020253116A1 (en) | 2020-12-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110362401A (en) | Data run the member host in batch method, apparatus, storage medium and cluster | |
CN107729139B (en) | Method and device for concurrently acquiring resources | |
US20150347305A1 (en) | Method and apparatus for outputting log information | |
CN109656782A (en) | Visual scheduling monitoring method, device and server | |
CN108566290A (en) | service configuration management method, system, storage medium and server | |
CN111190753B (en) | Distributed task processing method and device, storage medium and computer equipment | |
CN110430068B (en) | Characteristic engineering arrangement method and device | |
CN112306719B (en) | Task scheduling method and device | |
CN109840142A (en) | Thread control method, device, electronic equipment and storage medium based on cloud monitoring | |
CN110300067A (en) | Queue regulation method, device, equipment and computer readable storage medium | |
US11656902B2 (en) | Distributed container image construction scheduling system and method | |
CN114610474A (en) | Multi-strategy job scheduling method and system in heterogeneous supercomputing environment | |
WO2024082853A1 (en) | Method and system for application performance optimization in high-performance computing | |
CN108512782A (en) | Accesses control list is grouped method of adjustment, the network equipment and system | |
CN110221936A (en) | Database alert processing method, device, equipment and computer readable storage medium | |
CN113110867A (en) | RPA robot management method, device, server and storage medium | |
CN111104281B (en) | Game performance monitoring method, device, system and storage medium | |
CN112395062A (en) | Task processing method, device, equipment and computer readable storage medium | |
CN111831452A (en) | Task execution method and device, storage medium and electronic device | |
CN109670932B (en) | Credit data accounting method, apparatus, system and computer storage medium | |
CN104092794B (en) | Batch process handling method and system | |
CN115563160A (en) | Data processing method, data processing device, computer equipment and computer readable storage medium | |
CN115712572A (en) | Task testing method and device, storage medium and electronic device | |
CN111008146A (en) | Method and system for testing safety of cloud host | |
CN115344370A (en) | Task scheduling method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191022 ||