CN110362401A - Batch data processing method, apparatus, storage medium, and member host in a cluster - Google Patents


Info

Publication number
CN110362401A
Authority
CN
China
Prior art keywords
subtask
data
cluster
message
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910553729.4A
Other languages
Chinese (zh)
Inventor
符修亮
叶松
梁群峰
吕林澧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd filed Critical OneConnect Smart Technology Co Ltd
Priority to CN201910553729.4A priority Critical patent/CN110362401A/en
Publication of CN110362401A publication Critical patent/CN110362401A/en
Priority to PCT/CN2019/121210 priority patent/WO2020253116A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering hardware capabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/54 Interprogram communication
    • G06F 9/546 Message passing systems or structures, e.g. queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a batch data processing method, apparatus, storage medium, and member host in a cluster. The method comprises: obtaining the batch of pending data corresponding to the current task node and randomly partitioning it to obtain a preset number of data partitions; generating a subtask for the corresponding partition according to each of the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task-processing message through message middleware to each member host in the cluster, so that when a member host in the cluster receives the task-processing message, it fetches a subtask from the preset queue list for processing. Built on big data, large data volumes are processed by the cluster; tasks can be distributed among the member hosts and executed in parallel, maximizing resource utilization. The data partitioning does not depend on a database, the number of subtasks can be extended arbitrarily, and the concurrency of data processing is increased.

Description

Batch data processing method, apparatus, storage medium, and member host in a cluster
Technical field
The present invention relates to the technical field of big data, and in particular to a batch data processing method, apparatus, storage medium, and member host in a cluster.
Background technique
At present, when large volumes of data need to be processed, the batch application is deployed separately; when no batch job is running, the idle batch application wastes resources. Large batch tasks are executed in a single thread, so task execution is slow. Data partitioning for large batch tasks depends on Oracle hash partitioning: once a table in Oracle reaches the hundred-million-row order of magnitude, or a single table grows to about 2 GB, query efficiency drops noticeably, so the table must be partitioned, splitting it along the row dimension to avoid an oversized single table. For data whose regularity is weak or whose value range is hard to determine, partitioning is forced through a hash function, and the number of partitions must be set to a power of 2, so scalability is poor. Therefore, how to improve the efficiency of large-batch data processing and maximize resource utilization is a technical problem to be solved urgently.
The above content is only intended to aid understanding of the technical solution of the present invention, and does not constitute an admission that it is prior art.
Summary of the invention
The main purpose of the present invention is to provide a batch data processing method, apparatus, storage medium, and member host in a cluster, aiming to solve the technical problems in the prior art of low efficiency in large-batch data processing and low resource utilization.
To achieve the above object, the present invention provides a batch data processing method, which comprises the following steps:
obtaining the batch of pending data corresponding to the current task node, and randomly partitioning the batch of pending data to obtain a preset number of data partitions;
generating a subtask for the corresponding partition according to each of the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task-processing message through message middleware to each member host in the cluster, so that when a member host in the cluster receives the task-processing message, it fetches a subtask from the preset queue list for processing.
Preferably, after the step of generating a subtask for the corresponding partition according to each of the preset number of data partitions, storing the preset number of subtasks as a queue in a preset queue list, and sending a task-processing message through message middleware to each member host in the cluster so that a member host receiving the message fetches a subtask from the preset queue list for processing, the batch data processing method further comprises:
when one subtask is fetched from the preset queue list, decrementing the subtask count in the preset queue list by one to obtain the number of remaining subtasks.
Preferably, after the step of decrementing the subtask count in the preset queue list by one when one subtask is fetched and obtaining the number of remaining subtasks, the batch data processing method further comprises:
when the number of remaining subtasks is zero, treating the fetched subtask as the last subtask and monitoring its processing progress; upon detecting that the last subtask has been processed, sending an all-subtasks-completed message to the cluster through the message middleware in cluster consumption mode.
Preferably, after the step of sending the all-subtasks-completed message to the cluster through the message middleware in cluster consumption mode, the batch data processing method further comprises:
upon receiving the all-subtasks-completed message, determining whether all subtasks have been processed successfully; if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
Preferably, after the step of generating the subtasks, storing them as a queue in the preset queue list, and sending the task-processing message through the message middleware to each member host in the cluster, the batch data processing method further comprises:
listening in broadcast consumption mode, and, upon receiving the task-processing message, fetching a subtask from the preset queue list for processing.
Preferably, the step of listening in broadcast consumption mode and, upon receiving the task-processing message, fetching a subtask from the preset queue list for processing comprises:
listening in broadcast consumption mode;
upon receiving the task-processing message, fetching a subtask from the preset queue list for processing;
blocking the subtask-fetching function of the member hosts in the cluster other than the member host that is fetching a subtask.
Preferably, the step of fetching a subtask from the preset queue list for processing upon receiving the task-processing message comprises:
upon receiving the task-processing message, calculating the CPU usage, fetching multiple subtasks from the preset queue list according to the CPU usage, and processing the multiple subtasks concurrently with multiple threads.
In addition, to achieve the above object, the present invention also provides a member host in a cluster. The member host comprises a memory, a processor, and a batch data processing program stored in the memory and executable on the processor, the program being configured to implement the steps of the batch data processing method described above.
In addition, to achieve the above object, the present invention also provides a storage medium on which a batch data processing program is stored; when executed by a processor, the program implements the steps of the batch data processing method described above.
In addition, to achieve the above object, the present invention also provides a batch data processing apparatus, comprising:
a random partitioning module, configured to obtain the batch of pending data corresponding to the current task node and randomly partition it to obtain a preset number of data partitions;
a generation module, configured to generate a subtask for the corresponding partition according to each of the preset number of data partitions, store the preset number of subtasks as a queue in a preset queue list, and send a task-processing message through message middleware to each member host in the cluster, so that when a member host in the cluster receives the task-processing message, it fetches a subtask from the preset queue list for processing.
In the present invention, the batch of pending data corresponding to the current task node is obtained and randomly partitioned to obtain a preset number of data partitions; the partitioning does not depend on a database, the number of subtasks can be extended arbitrarily, and the concurrency of data processing is increased. Subtasks are generated for the corresponding partitions, stored as a queue in a preset queue list, and a task-processing message is sent through message middleware to each member host in the cluster, so that each member host, upon receiving the message, fetches subtasks from the preset queue list for processing. Built on big data, large data volumes are processed by the cluster; tasks are distributed among the member hosts and executed in parallel, maximizing resource utilization.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the member host in the cluster for the hardware running environment involved in embodiments of the present invention;
Fig. 2 is a flow diagram of a first embodiment of the batch data processing method of the present invention;
Fig. 3 is a flow diagram of a second embodiment of the batch data processing method of the present invention;
Fig. 4 is a flow diagram of a third embodiment of the batch data processing method of the present invention;
Fig. 5 is a structural block diagram of a first embodiment of the batch data processing apparatus of the present invention.
The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of the member host in the cluster for the hardware running environment involved in embodiments of the present invention.
As shown in Fig. 1, the member host in the cluster may comprise: a processor 1001, such as a central processing unit (CPU); a communication bus 1002; a user interface 1003; a network interface 1004; and a memory 1005. The communication bus 1002 implements the connections and communication among these components. The user interface 1003 may comprise a display; optionally, the user interface 1003 may also comprise standard wired and wireless interfaces, and in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally comprise a standard wired interface and a wireless interface (such as a Wireless Fidelity (WI-FI) interface). The memory 1005 may be high-speed random access memory (RAM), or stable non-volatile memory (NVM) such as disk storage; optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the member host in the cluster, which may comprise more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may contain an operating system, a network communication module, a user interface module, and a batch data processing program.
In the member host in the cluster shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to user equipment; the member host in the cluster calls, through the processor 1001, the batch data processing program stored in the memory 1005 and executes the batch data processing method provided by embodiments of the present invention.
Based on the above hardware structure, embodiments of the batch data processing method of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a flow diagram of a first embodiment of the batch data processing method of the present invention; a first embodiment of the batch data processing method of the present invention is proposed.
In the first embodiment, the batch data processing method comprises the following steps:
Step S10: obtain the batch of pending data corresponding to the current task node, and randomly partition the batch of pending data to obtain a preset number of data partitions.
It should be understood that the executing entity of this embodiment is a member host in the cluster, where the member host may be an electronic device such as a PC or a server. The cluster comprises multiple member hosts, all with identical functionality; tasks can be distributed among them and executed in parallel, maximizing resource utilization. The executing entity of this embodiment is any member host in the cluster. A batch job is started on schedule through a preset open-source project, namely Quartz. After the batch job starts, the current task node is obtained, and the batch of pending data corresponding to the current task node is randomly partitioned, typically into 62 partitions keyed by a-z, A-Z, and 0-9, i.e. the preset number is 62; the partitioning can also be extended into a larger number of partitions. The data partitioning does not depend on a database, the number of subtasks can be extended arbitrarily, and the concurrency of data processing is increased.
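As an illustration, the random partitioning step described above can be sketched in Python as follows (a minimal sketch; the patent does not specify an implementation, and the record format is hypothetical):

```python
import random
import string

# The 62 partition keys described above: a-z, A-Z and 0-9.
PARTITION_KEYS = list(string.ascii_lowercase + string.ascii_uppercase + string.digits)

def random_partition(records):
    """Randomly assign each pending record to one of the 62 partitions."""
    partitions = {key: [] for key in PARTITION_KEYS}
    for record in records:
        partitions[random.choice(PARTITION_KEYS)].append(record)
    return partitions

partitions = random_partition([f"order-{i}" for i in range(1000)])
assert len(partitions) == 62                             # preset number of partitions
assert sum(len(v) for v in partitions.values()) == 1000  # no record lost
```

Extending the scheme to more partitions only requires enlarging PARTITION_KEYS, which is why the partition count is not tied to a database or to a power of 2.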
Step S20: generate a subtask for the corresponding partition according to each of the preset number of data partitions, store the preset number of subtasks as a queue in a preset queue list, and send a task-processing message through message middleware to each member host in the cluster, so that when a member host in the cluster receives the task-processing message, it fetches a subtask from the preset queue list for processing.
It will be appreciated that, after the partitioning of the batch of pending data is completed, the preset number of subtasks is generated for the corresponding partitions; the preset number is usually 62, i.e. 62 subtasks keyed by a-z, A-Z, and 0-9, and the number of subtasks can also be extended into a larger number. These 62 subtasks are put into the preset queue list in the form of a queue, where the preset queue list is a Redis queue list; the task-processing message is then sent through the message middleware to inform each host in the cluster that there are tasks to be processed.
It should be noted that the cluster listens in broadcast consumption mode, so every member host in the cluster receives the task-processing message, then fetches a subtask from the Redis queue list and calls the processing class corresponding to the fetched subtask for processing; the processing class usually encapsulates the algorithm or rules for executing the subtask, and executing the subtask according to the processing class yields the result data. For example, if the subtask is to calculate the commission of order A, and the type of order A is a consumer loan, then the processing class is the commission calculation formula for consumer-loan orders; the commission of order A is calculated according to that formula, and when the calculation finishes, the subtask is complete.
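The dispatch from a subtask to its processing class, as in the commission example above, can be sketched as follows (an illustrative Python sketch; the class name, order fields, and flat commission rate are all hypothetical, not taken from the patent):

```python
# Processing class for consumer-loan orders: encapsulates the commission formula.
class ConsumerLoanCommission:
    RATE = 0.01  # assumed flat commission rate, for illustration only

    def process(self, order):
        return order["amount"] * self.RATE

# Registry mapping an order type to its processing class.
HANDLERS = {"consumer_loan": ConsumerLoanCommission()}

def run_subtask(order):
    """Look up the processing class for the order type and execute it."""
    return HANDLERS[order["type"]].process(order)

assert run_subtask({"type": "consumer_loan", "amount": 5000.0}) == 50.0
```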
In a concrete implementation, when each subtask is executed, processing errors and abnormal data are logged. A large number of sample abnormal data items and their corresponding sample abnormal points can be obtained in advance, and an anomaly-recognition model can be obtained by learning the sample abnormal data and the corresponding sample abnormal points with a convolutional neural network model; the anomaly-recognition model then identifies erroneous or abnormal data in the log records, locates the abnormal points, and performs exception handling so that the subtask can be processed successfully.
In this embodiment, the batch of pending data corresponding to the current task node is obtained and randomly partitioned to obtain a preset number of data partitions; the partitioning does not depend on a database, the number of tasks can be extended arbitrarily, and the concurrency of data processing is increased. Subtasks are generated for the corresponding partitions, stored as a queue in a preset queue list, and a task-processing message is sent through message middleware to each member host in the cluster, so that each member host, upon receiving the message, fetches subtasks from the preset queue list for processing. Built on big data, large data volumes are processed by the cluster; tasks are distributed and executed in parallel, maximizing resource utilization.
Referring to Fig. 3, Fig. 3 is a flow diagram of a second embodiment of the batch data processing method of the present invention; based on the first embodiment shown in Fig. 2, a second embodiment of the batch data processing method of the present invention is proposed.
In the second embodiment, after step S20, the method further comprises:
Step S30: when one subtask is fetched from the preset queue list, decrement the subtask count in the preset queue list by one to obtain the number of remaining subtasks.
It should be understood that, to avoid a subtask being fetched or executed repeatedly, each member host fetches subtasks from the preset queue list under a distributed lock implemented with a Redis+Lua script. When any member host B fetches a subtask from the queue list, the subtask-fetching function of the other member hosts is blocked, i.e. the other member hosts are not allowed to fetch subtasks at that moment; this keeps the data synchronized while greatly reducing the time overhead of locking. When member host B fetches one subtask from the preset queue list, the preset quantity is decremented by one to obtain the number of remaining subtasks; for example, if the preset quantity is 62 and member host B fetches one subtask, the number of remaining subtasks is 61. After member host B has fetched its subtask, the subtask-fetching function is reopened to the other member hosts, which can then fetch subtasks from the preset queue list for processing, until all the subtasks in the preset queue list have been fetched and executed.
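The atomic fetch-and-decrement described above can be sketched with an in-process stand-in for the Redis+Lua distributed lock (a minimal illustration; in the patent's setting the lock spans hosts and would run inside Redis as a single Lua script, which this single-process sketch only simulates):

```python
import threading
from collections import deque

class SubtaskQueue:
    """Simulates the preset queue list with an atomic fetch-and-count."""

    def __init__(self, subtasks):
        self._queue = deque(subtasks)
        self._lock = threading.Lock()

    def fetch(self):
        """Atomically pop one subtask and return (subtask, remaining count)."""
        with self._lock:
            if not self._queue:
                return None, 0
            subtask = self._queue.popleft()
            return subtask, len(self._queue)

queue = SubtaskQueue([f"subtask-{k}" for k in "abc"])
task, remaining = queue.fetch()
assert task == "subtask-a" and remaining == 2
```

Because popping and counting happen under one lock, no two workers can fetch the same subtask, and a remaining count of zero reliably identifies the last subtask.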
In the second embodiment, after step S30, the method further comprises:
Step S40: when the number of remaining subtasks is zero, treat the fetched subtask as the last subtask and monitor its processing progress; upon detecting that the last subtask has been processed, send an all-subtasks-completed message to the cluster through the message middleware in cluster consumption mode.
It should be noted that when any member host C in the cluster fetches a subtask from the preset queue list, the subtask count in the preset queue list is decremented by one to obtain the number of remaining subtasks. If the number of remaining subtasks is zero, the subtask fetched by member host C is the last subtask, so when member host C has finished processing it, member host C sends the all-subtasks-completed message to the cluster through the message middleware in cluster consumption mode.
In a concrete implementation, so that each host in the cluster can continuously and efficiently handle more tasks, when member host C has finished processing the last subtask it sends the all-subtasks-completed message through the message middleware, allowing each host in the cluster to go on to obtain the next task node and process that node's tasks. The all-subtasks-completed message is consumed in cluster consumption mode, so only one host receives it.
In the second embodiment, after step S40, the method further comprises:
upon receiving the all-subtasks-completed message, determining whether all subtasks have been processed successfully; if all subtasks have been processed successfully, obtaining the next task node and executing the next task.
It will be appreciated that member host C is the member host in the cluster that processed the last subtask, and member host C sends the all-subtasks-completed message through the message middleware. Because that message is consumed in cluster consumption mode, only one member host, say member host D, receives it, where member host D is any member host in the cluster.
In a concrete implementation, when member host D receives the all-subtasks-completed message, it determines whether every subtask has been processed successfully. If all subtasks have been processed successfully, execution automatically proceeds to the next task node; if not, fetching the next task node is suspended while the problem is investigated, and once the problem has been resolved the task processing system is woken up again to obtain the next task node and execute the next task. Subtask data processing is usually monitored, and if abnormal data is detected an alert is raised for manual troubleshooting.
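The completion check described above can be sketched as follows (an illustrative sketch; the status values and the node-advance representation are hypothetical):

```python
# Advance to the next task node only when every subtask succeeded;
# otherwise pause so the problem can be investigated.
def on_all_subtasks_done(subtask_statuses, current_node):
    if all(status == "success" for status in subtask_statuses):
        return {"action": "advance", "next_node": current_node + 1}
    return {"action": "pause", "next_node": current_node}

assert on_all_subtasks_done(["success", "success"], 1) == {"action": "advance", "next_node": 2}
assert on_all_subtasks_done(["success", "failed"], 1)["action"] == "pause"
```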
In the second embodiment, when one subtask is fetched from the preset queue list, the subtask count in the preset queue list is decremented by one to obtain the number of remaining subtasks; when the number of remaining subtasks is zero, the fetched subtask is treated as the last subtask and its processing progress is monitored; upon detecting that the last subtask has been processed, an all-subtasks-completed message is sent to the cluster through the message middleware in cluster consumption mode. By counting the subtasks in the queue list, the execution progress of the subtasks is tracked, and when all subtasks have been processed the completion message is sent promptly, so that a member host in the cluster obtains the next task node in time and processes the next task, improving the efficiency of large-batch data processing.
Referring to Fig. 4, Fig. 4 is a flow diagram of a third embodiment of the batch data processing method of the present invention; based on the second embodiment shown in Fig. 3, a third embodiment of the batch data processing method of the present invention is proposed.
In the third embodiment, after step S20, the method further comprises:
Step S201: listen in broadcast consumption mode; upon receiving the task-processing message, fetch a subtask from the preset queue list for processing.
It should be understood that the cluster comprises multiple member hosts, all with identical functionality; tasks can be distributed among them and executed in parallel, maximizing resource utilization. Each member host in the cluster listens in broadcast consumption mode, so every host in the cluster receives the task-processing message, then fetches subtasks from the Redis queue list and calls the corresponding processing classes for processing; processing errors and abnormal data are logged.
Further, step S201 comprises:
listening in broadcast consumption mode;
upon receiving the task-processing message, fetching a subtask from the preset queue list for processing;
blocking the subtask-fetching function of the member hosts in the cluster other than the member host that is fetching a subtask.
It should be noted that all the hosts in the cluster can be assigned tasks and can execute tasks simultaneously. To avoid a subtask being fetched or executed repeatedly, each member host fetches subtasks from the preset queue list under a distributed lock implemented with a Redis+Lua script. When any member host B fetches a subtask from the preset queue list, the subtask-fetching function of the member hosts other than the fetching host is blocked, and the other member hosts are not allowed to fetch subtasks at that moment. After member host B has fetched one or more subtasks from the preset queue list, the subtask-fetching function is reopened to the other member hosts, which can then fetch subtasks from the preset queue list for processing, until all the subtasks in the preset queue list have been fetched and executed. Implementing the distributed lock with a Redis+Lua script greatly reduces the time overhead of locking while keeping the data synchronized.
In this embodiment, the step of fetching a subtask from the preset queue list for processing upon receiving the task-processing message comprises:
upon receiving the task-processing message, calculating the CPU usage, fetching multiple subtasks from the preset queue list according to the CPU usage, and processing the multiple subtasks concurrently with multiple threads.
It will be appreciated that the member hosts in the cluster can process subtasks of different partitions simultaneously, and each member host processes its tasks with multiple concurrent threads, so processing is fast. The number of concurrently processed tasks is determined by the CPU usage, typically with 5 threads processing in parallel, which improves the processing speed of the batch subtasks. Each host in the cluster calculates its own CPU usage and uses it to decide how many subtasks to fetch, fetching multiple subtasks without compromising processing speed, so that multiple threads concurrently process the fetched subtasks and the processing efficiency of the subtasks in the preset queue list is improved.
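The CPU-aware fetching and multi-threaded processing described above can be sketched as follows (a minimal illustration; the usage thresholds are hypothetical, and only the 5-thread ceiling comes from the text):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_count(cpu_usage_percent, max_threads=5):
    """Claim fewer subtasks when the host is already busy."""
    if cpu_usage_percent >= 90:
        return 1
    if cpu_usage_percent >= 60:
        return max_threads // 2
    return max_threads

def process_batch(subtasks, cpu_usage_percent):
    """Fetch up to fetch_count subtasks and process them concurrently."""
    n = min(fetch_count(cpu_usage_percent), len(subtasks))
    claimed = subtasks[:n]
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda s: f"{s}:done", claimed))

results = process_batch(["s1", "s2", "s3", "s4", "s5", "s6"], cpu_usage_percent=30)
assert results == ["s1:done", "s2:done", "s3:done", "s4:done", "s5:done"]
```

In a real member host, cpu_usage_percent would come from the operating system rather than being passed in.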
In this embodiment, listening is performed in broadcast consumption mode, and upon receiving the task-processing message a subtask is fetched from the preset queue list for processing. All member hosts in the cluster receive the task-processing message; tasks can be distributed among the member hosts and executed in parallel, maximizing resource utilization and also improving subtask processing efficiency.
In addition, an embodiment of the present invention further provides a storage medium on which a batch data execution program is stored; when the batch data execution program is executed by a processor, the steps of the batch data execution method described above are implemented.
In addition, an embodiment of the present invention further provides a batch data execution device. Referring to Fig. 5, the batch data execution device comprises:
a random partition module 10, configured to obtain batch pending data corresponding to a current task node, and randomly partition the batch pending data to obtain partition data of a preset quantity.
It should be understood that the cluster includes multiple member hosts, all of which provide the same functions and can execute distributed tasks simultaneously, maximizing resource utilization. A batch task is started on schedule by a preset open-source component; the preset open-source component is Quartz. After the batch task starts, the current task node is obtained, and the batch pending data corresponding to the current task node is randomly partitioned, generally into the 62 partitions a-z, A-Z and 0-9, i.e. the preset quantity is 62; the partitions may also be expanded into a larger number of partitions. Because the data partitioning does not depend on the database, the number of subtasks can be extended arbitrarily, which increases the concurrent processing speed of the data.
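The 62-way random partitioning can be sketched as below. The random_partition function and its seed parameter are invented for the example; a fixed seed is used only to make the sketch deterministic.

```python
import random
import string

# The 62 partition keys named in the description: a-z, A-Z, 0-9.
PARTITION_KEYS = string.ascii_lowercase + string.ascii_uppercase + string.digits
assert len(PARTITION_KEYS) == 62

def random_partition(records, keys=PARTITION_KEYS, seed=None):
    """Randomly assign each pending record to one of the partitions."""
    rng = random.Random(seed)
    partitions = {k: [] for k in keys}
    for rec in records:
        partitions[rng.choice(keys)].append(rec)
    return partitions

parts = random_partition(range(1000), seed=7)
assert sum(len(v) for v in parts.values()) == 1000   # no record lost
```

Since the assignment is random rather than derived from a database key, widening keys (say, to two-character prefixes) is all it takes to extend the subtask count.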
a generation module 20, configured to generate subtasks of the corresponding partitions according to the partition data of the preset quantity, store the subtasks of the preset quantity in the form of a queue into a preset queue list, and send a task processing message to each member host in the cluster through a message middleware, so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list for processing.
It will be appreciated that after the partitioning of the batch pending data is completed, subtasks of the preset quantity are generated for the corresponding partitions. The preset quantity is usually 62, i.e. 62 subtasks identified by a-z, A-Z and 0-9; the number of subtasks may also be expanded into a larger number of subtasks. The 62 subtasks are put into the preset queue list in the form of a queue; the preset queue list is a Redis queue list. The task processing message is then sent through the message middleware to inform each host in the cluster that there are tasks to be processed.
It should be noted that the cluster listens in a broadcast consumption mode, so every member host in the cluster receives the task processing message, then obtains a subtask from the Redis queue list and calls the processing class corresponding to the obtained subtask to process it. The processing class is usually the algorithm or rule that the subtask executes; the subtask is executed according to the processing class to obtain result data. For example, if the subtask is to calculate the commission of order A, and the type of order A is a consumption loan, then the processing class is the commission calculation formula for consumption-loan orders; the commission of order A is calculated according to that formula, and once the calculation is done, the subtask processing is complete.
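The processing-class dispatch in the commission example might look like the following sketch. The class name, the rate, and the order fields are all hypothetical; the patent does not give the actual commission formula.

```python
class ConsumptionLoanCommission:
    """Processing class for consumption-loan orders (hypothetical formula)."""
    RATE = 0.015                     # invented commission rate

    def process(self, order):
        return round(order["amount"] * self.RATE, 2)

# Order type -> processing class, i.e. the rule the subtask executes.
PROCESSING_CLASSES = {"consumption_loan": ConsumptionLoanCommission}

def run_subtask(order):
    handler = PROCESSING_CLASSES[order["type"]]()
    return handler.process(order)

order_a = {"id": "A", "type": "consumption_loan", "amount": 10000.0}
print(run_subtask(order_a))   # 150.0
```

Keeping the formula in a per-type class means adding a new order type only requires registering a new processing class, not touching the dispatch loop.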
In a specific implementation, when each subtask is executed, processing errors or abnormal data are logged. A large number of abnormal-data samples and the corresponding abnormal points may be obtained in advance, and a convolutional neural network model may learn from the abnormal-data samples and the corresponding abnormal points to obtain an anomaly recognition model. Errors or abnormal data in the log records are then identified by the anomaly recognition model and handled, so that the abnormal points are located and the subtask completes successfully.
In the present embodiment, the batch pending data corresponding to the current task node is obtained and randomly partitioned to obtain partition data of a preset quantity; since the partitioning does not depend on the database, the number of tasks can be extended arbitrarily and the concurrent processing speed of the data is increased. Subtasks of the corresponding partitions are generated according to the partition data of the preset quantity, the subtasks of the preset quantity are stored in the form of a queue into a preset queue list, and a task processing message is sent through the message middleware to each member host in the cluster, so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list for processing. Based on big data, high-volume data is processed by the cluster, and the distributed tasks may also execute simultaneously, maximizing resource utilization.
In one embodiment, the batch data execution device further comprises:
a computing module, configured to, when one subtask is obtained from the preset queue list, decrement the subtask quantity in the preset queue list by one to obtain a remaining subtask number.
In one embodiment, the batch data execution device further comprises:
a sending module, configured to, when the remaining subtask number is zero, determine that the obtained subtask is the last subtask, monitor the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, send a subtask all-processed message to the cluster through the message middleware in a cluster consumption mode.
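The remaining-count and completion-message flow handled by the computing and sending modules can be sketched as follows. SubtaskCounter and the message string are invented stand-ins, and a plain list plays the role of the message middleware; in the described system the counter would live in Redis (e.g. DECR) so all member hosts see it.

```python
class SubtaskCounter:
    """Tracks remaining subtasks; signals when the last one is taken."""
    def __init__(self, total, publish):
        self.remaining = total
        self.publish = publish       # stands in for the message middleware

    def on_fetched(self):
        # Decrement on every fetch; the host that takes the last subtask
        # is the one that later reports completion to the cluster.
        self.remaining -= 1
        return self.remaining == 0   # True: this was the last subtask

    def on_last_done(self):
        self.publish("all-subtasks-processed")

messages = []
counter = SubtaskCounter(total=3, publish=messages.append)
flags = [counter.on_fetched() for _ in range(3)]
if flags[-1]:
    counter.on_last_done()
print(messages)   # ['all-subtasks-processed']
```

Publishing the completion message in a cluster consumption mode (rather than broadcast) ensures exactly one consumer advances to the next task node.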
In one embodiment, the batch data execution device further comprises:
an obtaining module, configured to, upon hearing the subtask all-processed message, judge whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtain a next task node and execute a next task.
In one embodiment, the obtaining module is further configured to listen in a broadcast consumption mode and, upon hearing the task processing message, obtain a subtask from the preset queue list for processing.
In one embodiment, the obtaining module is further configured to listen in the broadcast consumption mode; upon hearing the task processing message, obtain a subtask from the preset queue list for processing; and lock the subtask-obtaining function for the other member hosts in the cluster except the member host that obtains the subtask.
In one embodiment, the obtaining module is further configured to, upon hearing the task processing message, calculate a CPU usage, obtain multiple subtasks from the preset queue list according to the CPU usage, and process the multiple subtasks concurrently with multiple threads.
For other embodiments or specific implementations of the batch data execution device of the present invention, reference may be made to the method embodiments above, which are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and the like does not indicate any order; these words may be interpreted as labels.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory image (Read Only Memory image, ROM)/random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

1. A batch data execution method, characterized in that the batch data execution method comprises the following steps:
obtaining batch pending data corresponding to a current task node, and randomly partitioning the batch pending data to obtain partition data of a preset quantity;
generating subtasks of the corresponding partitions according to the partition data of the preset quantity, storing the subtasks of the preset quantity in the form of a queue into a preset queue list, and sending a task processing message to each member host in a cluster through a message middleware, so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list for processing.
2. The batch data execution method according to claim 1, characterized in that after the steps of generating subtasks of the corresponding partitions according to the partition data of the preset quantity, storing the subtasks of the preset quantity in the form of a queue into a preset queue list, and sending a task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list for processing, the batch data execution method further comprises:
when one subtask is obtained from the preset queue list, decrementing the subtask quantity in the preset queue list by one to obtain a remaining subtask number.
3. The batch data execution method according to claim 2, characterized in that after the step of, when one subtask is obtained from the preset queue list, decrementing the subtask quantity in the preset queue list by one to obtain the remaining subtask number, the batch data execution method further comprises:
when the remaining subtask number is zero, determining that the obtained subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending a subtask all-processed message to the cluster through the message middleware in a cluster consumption mode.
4. The batch data execution method according to claim 3, characterized in that after the step of, when the remaining subtask number is zero, determining that the obtained subtask is the last subtask, monitoring the processing progress of the last subtask, and, upon hearing that the last subtask has been processed, sending the subtask all-processed message to the cluster through the message middleware in the cluster consumption mode, the batch data execution method further comprises:
upon hearing the subtask all-processed message, judging whether all subtasks have been processed successfully, and if all subtasks have been processed successfully, obtaining a next task node and executing a next task.
5. The batch data execution method according to any one of claims 1-4, characterized in that after the steps of generating subtasks of the corresponding partitions according to the partition data of the preset quantity, storing the subtasks of the preset quantity in the form of a queue into a preset queue list, and sending a task processing message to each member host in the cluster through the message middleware so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list for processing, the batch data execution method further comprises:
listening in a broadcast consumption mode, and, upon hearing the task processing message, obtaining a subtask from the preset queue list for processing.
6. The batch data execution method according to claim 5, characterized in that the step of listening in the broadcast consumption mode and, upon hearing the task processing message, obtaining a subtask from the preset queue list for processing comprises:
listening in the broadcast consumption mode;
upon hearing the task processing message, obtaining a subtask from the preset queue list for processing; and
locking the subtask-obtaining function for the other member hosts in the cluster except the member host that obtains the subtask.
7. The batch data execution method according to claim 6, characterized in that the step of, upon hearing the task processing message, obtaining a subtask from the preset queue list for processing comprises:
upon hearing the task processing message, calculating a CPU usage, obtaining multiple subtasks from the preset queue list according to the CPU usage, and processing the multiple subtasks concurrently with multiple threads.
8. A member host in a cluster, characterized in that the member host in the cluster comprises a memory, a processor, and a batch data execution program stored on the memory and executable on the processor, wherein the batch data execution program, when executed by the processor, implements the steps of the batch data execution method according to any one of claims 1 to 7.
9. A storage medium, characterized in that a batch data execution program is stored on the storage medium, and the batch data execution program, when executed by a processor, implements the steps of the batch data execution method according to any one of claims 1 to 7.
10. A batch data execution device, characterized in that the batch data execution device comprises:
a random partition module, configured to obtain batch pending data corresponding to a current task node, and randomly partition the batch pending data to obtain partition data of a preset quantity;
a generation module, configured to generate subtasks of the corresponding partitions according to the partition data of the preset quantity, store the subtasks of the preset quantity in the form of a queue into a preset queue list, and send a task processing message to each member host in a cluster through a message middleware, so that each member host in the cluster, upon hearing the task processing message, obtains a subtask from the preset queue list for processing.
CN201910553729.4A 2019-06-20 2019-06-20 Data run the member host in batch method, apparatus, storage medium and cluster Pending CN110362401A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910553729.4A CN110362401A (en) 2019-06-20 2019-06-20 Data run the member host in batch method, apparatus, storage medium and cluster
PCT/CN2019/121210 WO2020253116A1 (en) 2019-06-20 2019-11-27 Batch data execution method, device, storage medium, and member host in cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910553729.4A CN110362401A (en) 2019-06-20 2019-06-20 Data run the member host in batch method, apparatus, storage medium and cluster

Publications (1)

Publication Number Publication Date
CN110362401A true CN110362401A (en) 2019-10-22

Family

ID=68217029

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910553729.4A Pending CN110362401A (en) 2019-06-20 2019-06-20 Data run the member host in batch method, apparatus, storage medium and cluster

Country Status (2)

Country Link
CN (1) CN110362401A (en)
WO (1) WO2020253116A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111679920A (en) * 2020-06-08 2020-09-18 中国银行股份有限公司 Method and device for processing batch equity data
WO2020253116A1 (en) * 2019-06-20 2020-12-24 深圳壹账通智能科技有限公司 Batch data execution method, device, storage medium, and member host in cluster
CN112148505A (en) * 2020-09-18 2020-12-29 京东数字科技控股股份有限公司 Data batching system, method, electronic device and storage medium
CN113485812A (en) * 2021-07-23 2021-10-08 重庆富民银行股份有限公司 Partition parallel processing method and system based on large data volume task
CN113537937A (en) * 2021-07-16 2021-10-22 重庆富民银行股份有限公司 Task arrangement method, device and equipment based on topological sorting and storage medium
CN113568761A (en) * 2020-04-28 2021-10-29 中国联合网络通信集团有限公司 Data processing method, device, equipment and storage medium
CN114240109A (en) * 2021-12-06 2022-03-25 中电金信软件有限公司 Method, device and system for cross-region processing batch running task
CN116501499A (en) * 2023-05-17 2023-07-28 建信金融科技有限责任公司 Data batch running method and device, electronic equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168275B (en) * 2021-10-28 2022-10-18 厦门国际银行股份有限公司 Task scheduling method, system, terminal device and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100306776A1 (en) * 2009-05-28 2010-12-02 Palo Alto Research Center Incorporated Data center batch job quality of service control
US20140082170A1 (en) * 2012-09-19 2014-03-20 Oracle International Corporation System and method for small batching processing of usage requests
CN104092794A (en) * 2014-07-25 2014-10-08 中国工商银行股份有限公司 Batch course processing method and system
CN106648850A (en) * 2015-11-02 2017-05-10 佳能株式会社 Information processing apparatus and method of controlling the same
US20170242726A1 (en) * 2016-02-18 2017-08-24 Red Hat, Inc. Batched commit in distributed transactions
CN107291911A (en) * 2017-06-26 2017-10-24 北京奇艺世纪科技有限公司 A kind of method for detecting abnormality and device
CN108255619A (en) * 2017-12-28 2018-07-06 新华三大数据技术有限公司 A kind of data processing method and device
CN108564167A (en) * 2018-04-09 2018-09-21 杭州乾圆科技有限公司 The recognition methods of abnormal data among a kind of data set
CN108733477A (en) * 2017-04-20 2018-11-02 中国移动通信集团湖北有限公司 The method, apparatus and equipment of data clusterization processing
CN108985632A (en) * 2018-07-16 2018-12-11 国网上海市电力公司 A kind of electricity consumption data abnormality detection model based on isolated forest algorithm
CN109144731A (en) * 2018-08-31 2019-01-04 中国平安人寿保险股份有限公司 Data processing method, device, computer equipment and storage medium
CN109298225A (en) * 2018-09-29 2019-02-01 国网四川省电力公司电力科学研究院 A kind of voltage metric data abnormality automatic identification model and method
CN109299135A (en) * 2018-11-26 2019-02-01 平安科技(深圳)有限公司 Abnormal inquiry recognition methods, identification equipment and medium based on identification model
CN109558600A (en) * 2018-11-14 2019-04-02 北京字节跳动网络技术有限公司 Translation processing method and device
CN109672627A (en) * 2018-09-26 2019-04-23 深圳壹账通智能科技有限公司 Method for processing business, platform, equipment and storage medium based on cluster server

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8245081B2 (en) * 2010-02-10 2012-08-14 Vmware, Inc. Error reporting through observation correlation
CN109461068A (en) * 2018-09-13 2019-03-12 深圳壹账通智能科技有限公司 Judgment method, device, equipment and the computer readable storage medium of fraud
CN110362401A (en) * 2019-06-20 2019-10-22 深圳壹账通智能科技有限公司 Data run the member host in batch method, apparatus, storage medium and cluster


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020253116A1 (en) * 2019-06-20 2020-12-24 深圳壹账通智能科技有限公司 Batch data execution method, device, storage medium, and member host in cluster
CN113568761A (en) * 2020-04-28 2021-10-29 中国联合网络通信集团有限公司 Data processing method, device, equipment and storage medium
CN113568761B (en) * 2020-04-28 2023-06-27 中国联合网络通信集团有限公司 Data processing method, device, equipment and storage medium
CN111679920A (en) * 2020-06-08 2020-09-18 中国银行股份有限公司 Method and device for processing batch equity data
CN112148505A (en) * 2020-09-18 2020-12-29 京东数字科技控股股份有限公司 Data batching system, method, electronic device and storage medium
CN113537937A (en) * 2021-07-16 2021-10-22 重庆富民银行股份有限公司 Task arrangement method, device and equipment based on topological sorting and storage medium
CN113485812A (en) * 2021-07-23 2021-10-08 重庆富民银行股份有限公司 Partition parallel processing method and system based on large data volume task
CN113485812B (en) * 2021-07-23 2023-12-12 重庆富民银行股份有限公司 Partition parallel processing method and system based on large-data-volume task
CN114240109A (en) * 2021-12-06 2022-03-25 中电金信软件有限公司 Method, device and system for cross-region processing batch running task
CN116501499A (en) * 2023-05-17 2023-07-28 建信金融科技有限责任公司 Data batch running method and device, electronic equipment and storage medium
CN116501499B (en) * 2023-05-17 2023-09-19 建信金融科技有限责任公司 Data batch running method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2020253116A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
CN110362401A (en) Data run the member host in batch method, apparatus, storage medium and cluster
CN107729139B (en) Method and device for concurrently acquiring resources
US20150347305A1 (en) Method and apparatus for outputting log information
CN109656782A (en) Visual scheduling monitoring method, device and server
CN108566290A (en) service configuration management method, system, storage medium and server
CN111190753B (en) Distributed task processing method and device, storage medium and computer equipment
CN110430068B (en) Characteristic engineering arrangement method and device
CN112306719B (en) Task scheduling method and device
CN109840142A (en) Thread control method, device, electronic equipment and storage medium based on cloud monitoring
CN110300067A (en) Queue regulation method, device, equipment and computer readable storage medium
US11656902B2 (en) Distributed container image construction scheduling system and method
CN114610474A (en) Multi-strategy job scheduling method and system in heterogeneous supercomputing environment
WO2024082853A1 (en) Method and system for application performance optimization in high-performance computing
CN108512782A (en) Accesses control list is grouped method of adjustment, the network equipment and system
CN110221936A (en) Database alert processing method, device, equipment and computer readable storage medium
CN113110867A (en) RPA robot management method, device, server and storage medium
CN111104281B (en) Game performance monitoring method, device, system and storage medium
CN112395062A (en) Task processing method, device, equipment and computer readable storage medium
CN111831452A (en) Task execution method and device, storage medium and electronic device
CN109670932B (en) Credit data accounting method, apparatus, system and computer storage medium
CN104092794B (en) Batch process handling method and system
CN115563160A (en) Data processing method, data processing device, computer equipment and computer readable storage medium
CN115712572A (en) Task testing method and device, storage medium and electronic device
CN111008146A (en) Method and system for testing safety of cloud host
CN115344370A (en) Task scheduling method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191022)